Apache Spark K8s Operator

Apache Spark™ K8s Operator is a subproject of Apache Spark that aims to extend the K8s resource manager to manage Apache Spark applications via the Operator Pattern.

Building Spark K8s Operator

Spark K8s Operator is built using Gradle. To build without running the tests, run:

$ ./gradlew build -x test

Running Tests

$ ./gradlew build

Build Docker Image

$ ./gradlew buildDockerImage

Install Helm Chart

$ ./gradlew spark-operator-api:relocateGeneratedCRD

$ helm install spark-kubernetes-operator --create-namespace -f build-tools/helm/spark-kubernetes-operator/values.yaml build-tools/helm/spark-kubernetes-operator/

Run Spark Pi App

$ kubectl apply -f examples/pi.yaml

$ kubectl get sparkapp
NAME   CURRENT STATE      AGE
pi     ResourceReleased   4m10s
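
The state shown in the `kubectl get sparkapp` output above can also be read from a script, for example to wait until the app is finished before deleting it. The following is a minimal sketch, not part of the operator: the function names `app_state` and `wait_for_state`, the 5-second interval, and the 60-attempt limit are all hypothetical, and it assumes the CURRENT STATE value never contains whitespace.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get sparkapp` table output on stdin
# and print the second column (CURRENT STATE) of the row whose NAME matches $1.
# Assumes the state is a single token such as ResourceReleased.
app_state() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}

# Hypothetical polling loop: return 0 once app $1 reports state $2,
# or return 1 after 60 attempts spaced 5 seconds apart (arbitrary limits).
wait_for_state() {
  app="$1"; want="$2"; tries=0; state=""
  while [ "$tries" -lt 60 ]; do
    state=$(kubectl get sparkapp | app_state "$app")
    [ "$state" = "$want" ] && return 0
    tries=$((tries + 1))
    sleep 5
  done
  echo "timed out waiting for $app; last state: $state" >&2
  return 1
}
```

With these definitions loaded, `wait_for_state pi ResourceReleased` would block until the example app reaches the state shown above, or fail after the attempt limit.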

$ kubectl delete sparkapp/pi

Run Spark Cluster

$ kubectl apply -f examples/prod-cluster-with-three-workers.yaml

$ kubectl get sparkcluster
NAME   CURRENT STATE    AGE
prod   RunningHealthy   10s

$ kubectl port-forward prod-master-0 6066 &

$ ./examples/submit-pi-to-prod.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20240821181327-0000",
  "serverSparkVersion" : "4.0.0-preview2",
  "submissionId" : "driver-20240821181327-0000",
  "success" : true
}

$ curl http://localhost:6066/v1/submissions/status/driver-20240821181327-0000/
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "4.0.0-preview2",
  "submissionId" : "driver-20240821181327-0000",
  "success" : true,
  "workerHostPort" : "10.1.5.188:42099",
  "workerId" : "worker-20240821181236-10.1.5.188-42099"
}
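
When scripting against the status endpoint, it can be handy to pull just the `driverState` field out of the response. The sketch below assumes the response has the shape shown above and that the port-forward to 6066 is still active; the function names `driver_state` and `submission_state` are hypothetical, and the `sed` expression is a lightweight stand-in for a real JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "driverState" field from a
# SubmissionStatusResponse JSON document read on stdin. Prints nothing if
# the field is absent.
driver_state() {
  sed -n 's/.*"driverState"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical wrapper: query the status endpoint used above for
# submission id $1 and print only its driver state.
submission_state() {
  curl -s "http://localhost:6066/v1/submissions/status/$1/" | driver_state
}
```

For the submission in this example, `submission_state driver-20240821181327-0000` would print the driver state reported by the server, which is `FINISHED` in the response above.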

$ kubectl delete sparkcluster prod
sparkcluster.spark.apache.org "prod" deleted

Run Spark Pi App on Apache YuniKorn scheduler

If you have not yet done so, install YuniKorn by following the YuniKorn docs. The commands below install version 1.6.0:

$ helm repo add yunikorn https://apache.github.io/yunikorn-release

$ helm repo update

$ helm install yunikorn yunikorn/yunikorn --namespace yunikorn --version 1.6.0 --create-namespace --set embedAdmissionController=false

Submit a Spark app to a YuniKorn-enabled cluster:

$ kubectl apply -f examples/pi-on-yunikorn.yaml

$ kubectl describe pod pi-on-yunikorn-0-driver
...
Events:
  Type    Reason             Age   From      Message
  ----    ------             ----  ----      -------
  Normal  Scheduling         14s   yunikorn  default/pi-on-yunikorn-0-driver is queued and waiting for allocation
  Normal  Scheduled          14s   yunikorn  Successfully assigned default/pi-on-yunikorn-0-driver to node docker-desktop
  Normal  PodBindSuccessful  14s   yunikorn  Pod default/pi-on-yunikorn-0-driver is successfully bound to node docker-desktop
  Normal  TaskCompleted      6s    yunikorn  Task default/pi-on-yunikorn-0-driver is completed
  Normal  Pulled             13s   kubelet   Container image "apache/spark:4.0.0-preview2" already present on machine
  Normal  Created            13s   kubelet   Created container spark-kubernetes-driver
  Normal  Started            13s   kubelet   Started container spark-kubernetes-driver

$ kubectl delete sparkapp pi-on-yunikorn
sparkapplication.spark.apache.org "pi-on-yunikorn" deleted

Try nightly build for testing

You can currently try the nightly build of spark-kubernetes-operator by installing the snapshot Helm chart:

$ helm install spark-kubernetes-operator \
https://nightlies.apache.org/spark/charts/spark-kubernetes-operator-0.1.0-SNAPSHOT.tgz

Clean Up

Check for existing Spark applications and clusters. If any exist, delete them.

$ kubectl get sparkapp
No resources found in default namespace.

$ kubectl get sparkcluster
No resources found in default namespace.

Remove the Helm chart and CRDs.

$ helm uninstall spark-kubernetes-operator

$ kubectl delete crd sparkapplications.spark.apache.org

$ kubectl delete crd sparkclusters.spark.apache.org
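
The order of the clean-up steps above (check for leftover resources first, then remove the chart and CRDs) can be enforced with a small guard. This is only a sketch: `none_left` and `safe_uninstall` are hypothetical names, it inspects only the current namespace as the commands above do, and it treats an empty `kubectl get ... -o name` listing as "nothing left", which is also what it sees if the listing command itself fails.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin contains no non-empty line,
# i.e. a `kubectl get <kind> -o name` listing printed no resources.
none_left() {
  ! grep -q .
}

# Hypothetical wrapper: refuse to uninstall while Spark applications or
# clusters still exist, then run the removal commands from this section.
safe_uninstall() {
  for kind in sparkapp sparkcluster; do
    if ! kubectl get "$kind" -o name | none_left; then
      echo "found remaining $kind resources; delete them first" >&2
      return 1
    fi
  done
  helm uninstall spark-kubernetes-operator
  kubectl delete crd sparkapplications.spark.apache.org
  kubectl delete crd sparkclusters.spark.apache.org
}
```

Calling `safe_uninstall` stops with an error message if any application or cluster is still listed, and otherwise runs the three removal commands in order.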

If you used a nightly build, also remove the snapshot image.

$ docker rmi apache/spark-kubernetes-operator:main-snapshot

Contributing

Please review the Contribution to Spark guide for information on how to get started contributing to the project.
