- spark 2.4.7 or 3.0.1
- minikube 1.15.3 or 1.19
- kubectl 1.19
Check the compatibility matrix of the fabric8 Kubernetes Client: https://github.com/fabric8io/kubernetes-client/blob/master/README.md#compatibility-matrix
Spark and minikube compatibility matrix:

| | spark 3.0.1 | spark 2.4.7 |
|---|---|---|
| minikube 1.19.2 | ✓ | - |
| minikube 1.15.3 | ✓ | ✓ |
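When checking compatibility on your own machine, a rough way to compare version strings in shell is `sort -V` (GNU version sort). The helper below is a hypothetical sketch, not part of Spark or minikube:

```shell
# Hypothetical helper: version_ge A B succeeds if version A >= version B.
# Relies on GNU sort's -V (version sort), available on typical Linux systems.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: is an installed minikube 1.19.2 at least 1.15.3?
version_ge "1.19.2" "1.15.3" && echo "compatible"   # prints "compatible"
```

In practice you would feed in the version reported by `minikube version` instead of a literal.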
We assume minikube is already installed and working.
- Build the Spark Pi application as a Docker image

```
cd $SPARK_HOME
./bin/docker-image-tool.sh -r <repo> -t my-tag build
./bin/docker-image-tool.sh -r <repo> -t my-tag push   # push the images to Docker Hub
```
Note: `<repo>` is your Docker Hub repository. For instance, running:

```
./bin/docker-image-tool.sh -r mathstana -t 2.4.7 build
```

would produce three Docker images when it finishes:

```
REPOSITORY           TAG     IMAGE ID       CREATED        SIZE
mathstana/spark-r    2.4.7   b34fefd97dc1   21 hours ago   1.11GB
mathstana/spark-py   2.4.7   49c331112048   21 hours ago   1.05GB
mathstana/spark      2.4.7   2658f33347ae   21 hours ago   553MB
```
- Get the k8s master info with:

```
kubectl cluster-info
```
And you would get output like:

```
Kubernetes master is running at https://127.0.0.1:32780
KubeDNS is running at https://127.0.0.1:32780/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
```
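The address after "Kubernetes master is running at" becomes the `--master` value for spark-submit once prefixed with `k8s://`. A small sketch of that extraction, using the example address above as a literal (it varies per cluster):

```shell
# Build the k8s:// master URL for spark-submit from cluster-info style output.
# The address here is the example value from above, not a universal constant.
cluster_info='Kubernetes master is running at https://127.0.0.1:32780'
master_url="k8s://$(printf '%s\n' "$cluster_info" | grep -o 'https://[^ ]*')"
echo "$master_url"   # k8s://https://127.0.0.1:32780
```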
- Configure RBAC

```
kubectl create namespace spark-n
kubectl create serviceaccount --namespace spark-n spark-pi
kubectl create clusterrolebinding spark-c-pi \
  --clusterrole=edit \
  --serviceaccount=spark-n:spark-pi \
  --namespace=spark-n
```
- Submit the Spark Pi job to minikube with the commands below:

```
cd $SPARK_HOME

# Java
./bin/spark-submit \
  --master k8s://https://127.0.0.1:32780 \
  --deploy-mode cluster \
  --name spark-pi-new \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=mathstana/spark:3.0.1 \
  --conf spark.kubernetes.namespace=spark-n \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar 1000

# Python
./bin/spark-submit \
  --master k8s://https://127.0.0.1:32780 \
  --deploy-mode cluster \
  --name python-spark-pi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=mathstana/spark-py:2.4.7 \
  --conf spark.kubernetes.namespace=spark-n \
  local:///opt/spark/examples/src/main/python/pi.py 200
```
The UI associated with any application can be accessed locally using `kubectl port-forward`:

```
kubectl port-forward <driver-pod-name> 4040:4040
```
To get some basic information about the scheduling decisions made around the driver pod, you can run:

```
kubectl describe pod <spark-driver-pod>
```
If the pod has encountered a runtime error, the status can be probed further using:

```
kubectl logs <spark-driver-pod>
```
Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod cleans up the entire Spark application, including all executors, the associated service, etc. The driver pod can be thought of as the Kubernetes representation of the Spark application.
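The inspection and cleanup steps above can be gathered into a small hypothetical helper that prints the relevant commands for a given driver pod (the pod name used here is illustrative):

```shell
# Hypothetical helper: print the debug/cleanup commands for a driver pod.
# Deleting the driver pod is what tears down the whole Spark application.
spark_debug_cmds() {
  pod="$1"
  printf 'kubectl describe pod %s\n' "$pod"   # scheduling decisions
  printf 'kubectl logs %s\n' "$pod"           # runtime errors
  printf 'kubectl delete pod %s\n' "$pod"     # full application cleanup
}

spark_debug_cmds spark-pi-new-driver
```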
You might encounter exceptions like the following when submitting the spark-examples application:

```
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
	...
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [spark-pi-1600850824046-driver] in namespace: [default] failed.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
	...
Caused by: java.net.SocketException: Broken pipe (Write failed)
	...
```
- Solution

Upgrade or downgrade Spark to a version compatible with your minikube version. In this gist, Spark 3.0.1 with minikube 1.19 works well.

More ref: [stackoverflow] Why am I not able to run sparkPi example on a Kubernetes (K8s) cluster?
- Solution

If you start minikube with the Docker driver, check the Docker Desktop resource settings. Docker Desktop allocates only 2GB of memory to containers by default; raising that amount may resolve the issue.
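As a rough back-of-the-envelope illustration (all sizes are assumptions: Spark defaults to about 1g each of driver and executor memory, and containers carry some overhead), a memory budget for the Java example's 3 executors shows why 2GB is too little:

```shell
# Assumed sizes: 1g driver, 1g per executor, ~10% container overhead.
executors=3
executor_mem_mb=1024
driver_mem_mb=1024
overhead_pct=10

total_mb=$(( (executors * executor_mem_mb + driver_mem_mb) * (100 + overhead_pct) / 100 ))
echo "need roughly ${total_mb}MB"   # well above Docker Desktop's 2048MB default
```

With the Docker driver, memory can be raised either in Docker Desktop's settings or when creating the cluster, e.g. `minikube start --memory=4096` (exact flag syntax may differ across minikube versions).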