Output of spark-pi-local.sh for spark-on-kubernetes questions
The challenge below is to eliminate all of the sleep invocations and
still have this script run reliably to completion.
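One untested approach, sketched at each sleep point below, is to replace the fixed delays with bounded polls on an observable condition. A minimal generic helper (the name wait_until, the 2-second interval, and the attempt budget are all made up for illustration):

# Run a probe command repeatedly until it succeeds or the attempt
# budget is exhausted; returns non-zero on timeout.
wait_until() {
  local tries=$1; shift
  while ! "$@" >/dev/null 2>&1; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then return 1; fi
    sleep 2
  done
}
# Hypothetical usage: block until the driver pod object exists.
# wait_until 30 kubectl get -n default pod/spark-pi-driver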
+ helm init
$HELM_HOME has been configured at /Users/jim/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
+ sleep 30
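Instead of this fixed 30-second sleep, the script could block until Tiller's deployment finishes rolling out (a sketch; it assumes Tiller lands in kube-system as tiller-deploy, the helm init default). Passing --wait to helm init itself may accomplish the same thing.

# Wait for the Tiller deployment created by `helm init` to become ready.
kubectl -n kube-system rollout status deployment/tiller-deploy --timeout=120s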
+ '[' default '!=' default ']'
+ helm install incubator/sparkoperator --name spark-test --namespace spark-operator --set sparkJobNamespace=default --set enableMetrics=false
NAME: spark-test
LAST DEPLOYED: Thu Dec 5 15:10:34 2019
NAMESPACE: spark-operator
STATUS: DEPLOYED
RESOURCES:
==> v1/ClusterRole
NAME AGE
spark-test-sparkoperator-cr 0s
==> v1/ClusterRoleBinding
NAME AGE
spark-test-sparkoperator-crb 0s
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
spark-test-sparkoperator 0/1 1 0 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
spark-test-sparkoperator-7c4d75c74b-rxck7 0/1 ContainerCreating 0 0s
==> v1/Role
NAME AGE
spark-role 0s
==> v1/RoleBinding
NAME AGE
spark-role-binding 0s
==> v1/ServiceAccount
NAME SECRETS AGE
spark-test-spark 1 0s
spark-test-sparkoperator 1 0s
==> v1beta1/CustomResourceDefinition
NAME AGE
scheduledsparkapplications.sparkoperator.k8s.io 0s
sparkapplications.sparkoperator.k8s.io 0s
+ sleep 10
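The 10-second sleep after helm install could likewise become a rollout wait on the operator deployment (name taken from the v1/Deployment entry in the RESOURCES listing above); passing --wait to helm install should also work.

# Wait for the spark operator deployment to become ready.
kubectl -n spark-operator rollout status deployment/spark-test-sparkoperator --timeout=120s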
+ kubectl apply --validate=true -f spark-pi.yaml
sparkapplication.sparkoperator.k8s.io/spark-pi created
+ sleep 10
The key question of this example is how to wait reliably for spark-pi to start and finish, and then to know whether it worked.
Note that logs -f knows how to wait until completion...
Note also that there is a race condition: this logs command will fail if the driver pod has not been created yet.
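One way to close that race (a sketch; the 2-second interval and 60-try budget are arbitrary) is to poll until the driver pod exists and reports a phase past Pending before attaching to its logs:

# Poll until the driver pod exists and has left Pending.
for i in $(seq 1 60); do
  phase=$(kubectl get -n default pod/spark-pi-driver \
    -o jsonpath='{.status.phase}' 2>/dev/null) || true
  case "$phase" in
    Running|Succeeded|Failed) break ;;
  esac
  sleep 2
done
kubectl logs -f -n default spark-pi-driver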
+ kubectl logs -f -n default spark-pi-driver
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_K8S_CMD=driver
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.1.1.31 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal
19/12/05 23:10:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/12/05 23:10:56 INFO SparkContext: Running Spark version 2.4.5-SNAPSHOT
19/12/05 23:10:56 INFO SparkContext: Submitted application: Spark Pi
19/12/05 23:10:56 INFO SecurityManager: Changing view acls to: root
19/12/05 23:10:56 INFO SecurityManager: Changing modify acls to: root
19/12/05 23:10:56 INFO SecurityManager: Changing view acls groups to:
19/12/05 23:10:56 INFO SecurityManager: Changing modify acls groups to:
19/12/05 23:10:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/12/05 23:10:57 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
19/12/05 23:10:57 INFO SparkEnv: Registering MapOutputTracker
19/12/05 23:10:57 INFO SparkEnv: Registering BlockManagerMaster
19/12/05 23:10:57 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/05 23:10:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/05 23:10:57 INFO DiskBlockManager: Created local directory at /var/data/spark-11402627-cc4b-4aad-8700-905419c63f8b/blockmgr-1ef55970-57a0-4241-a836-77668a1b54c1
19/12/05 23:10:57 INFO MemoryStore: MemoryStore started with capacity 117.0 MB
19/12/05 23:10:57 INFO SparkEnv: Registering OutputCommitCoordinator
19/12/05 23:10:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/12/05 23:10:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-pi-1575587448751-driver-svc.default.svc:4040
19/12/05 23:10:57 INFO SparkContext: Added JAR file:///opt/spark/examples/jars/spark-examples_2.11-2.4.5-SNAPSHOT.jar at spark://spark-pi-1575587448751-driver-svc.default.svc:7078/jars/spark-examples_2.11-2.4.5-SNAPSHOT.jar with timestamp 1575587457844
19/12/05 23:11:00 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
19/12/05 23:11:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
19/12/05 23:11:00 INFO NettyBlockTransferService: Server created on spark-pi-1575587448751-driver-svc.default.svc:7079
19/12/05 23:11:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/12/05 23:11:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-pi-1575587448751-driver-svc.default.svc, 7079, None)
19/12/05 23:11:00 INFO BlockManagerMasterEndpoint: Registering block manager spark-pi-1575587448751-driver-svc.default.svc:7079 with 117.0 MB RAM, BlockManagerId(driver, spark-pi-1575587448751-driver-svc.default.svc, 7079, None)
19/12/05 23:11:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-pi-1575587448751-driver-svc.default.svc, 7079, None)
19/12/05 23:11:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-pi-1575587448751-driver-svc.default.svc, 7079, None)
19/12/05 23:11:05 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.1.1.32:38306) with ID 1
19/12/05 23:11:06 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
19/12/05 23:11:06 INFO BlockManagerMasterEndpoint: Registering block manager 10.1.1.32:32921 with 117.0 MB RAM, BlockManagerId(1, 10.1.1.32, 32921, None)
19/12/05 23:11:06 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
19/12/05 23:11:06 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
19/12/05 23:11:06 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
19/12/05 23:11:06 INFO DAGScheduler: Parents of final stage: List()
19/12/05 23:11:06 INFO DAGScheduler: Missing parents: List()
19/12/05 23:11:06 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
19/12/05 23:11:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 117.0 MB)
19/12/05 23:11:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1381.0 B, free 117.0 MB)
19/12/05 23:11:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-pi-1575587448751-driver-svc.default.svc:7079 (size: 1381.0 B, free: 117.0 MB)
19/12/05 23:11:06 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
19/12/05 23:11:06 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
19/12/05 23:11:06 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/12/05 23:11:06 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.1.1.32, executor 1, partition 0, PROCESS_LOCAL, 7885 bytes)
19/12/05 23:11:07 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.1.1.32:32921 (size: 1381.0 B, free: 117.0 MB)
19/12/05 23:11:07 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.1.1.32, executor 1, partition 1, PROCESS_LOCAL, 7885 bytes)
19/12/05 23:11:07 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 695 ms on 10.1.1.32 (executor 1) (1/2)
19/12/05 23:11:07 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 96 ms on 10.1.1.32 (executor 1) (2/2)
19/12/05 23:11:07 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/12/05 23:11:07 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.087 s
19/12/05 23:11:07 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.257914 s
Pi is roughly 3.1415557077785388
19/12/05 23:11:07 INFO SparkUI: Stopped Spark web UI at http://spark-pi-1575587448751-driver-svc.default.svc:4040
19/12/05 23:11:07 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/12/05 23:11:07 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/12/05 23:11:07 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/12/05 23:11:07 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/12/05 23:11:07 INFO MemoryStore: MemoryStore cleared
19/12/05 23:11:07 INFO BlockManager: BlockManager stopped
19/12/05 23:11:07 INFO BlockManagerMaster: BlockManagerMaster stopped
19/12/05 23:11:07 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/12/05 23:11:07 INFO SparkContext: Successfully stopped SparkContext
19/12/05 23:11:07 INFO ShutdownHookManager: Shutdown hook called
19/12/05 23:11:07 INFO ShutdownHookManager: Deleting directory /var/data/spark-11402627-cc4b-4aad-8700-905419c63f8b/spark-c44c60a7-9d44-43f7-8564-10c5c6d54645
19/12/05 23:11:07 INFO ShutdownHookManager: Deleting directory /tmp/spark-8a572b48-2c92-489c-b2e4-1ec1734dbedc
++ kubectl get -n default pod/spark-pi-driver '-o=jsonpath={.status.containerStatuses[*].state.terminated.exitCode}'
+ exitCode=0
exitCode is 0
+ kubectl get -n default sparkapplications spark-pi -o 'jsonpath={"ApplicationState:"}{.status.applicationState.state}{"\nExecutorState:"}{.status.executorState.*}{"\n"}'
ApplicationState:COMPLETED
ExecutorState:FAILED
++ kubectl get -n default sparkapplications spark-pi -o 'jsonpath={.status.applicationState.state}'
+ statusCode=COMPLETED
statusCode is COMPLETED
Does a statusCode of COMPLETED imply success in the same way that an exitCode of 0 does?
Why is the reported ExecutorState FAILED?
Shouldn't the spark.stop() call cause the executor to exit cleanly?
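Separate from those questions, a sketch of waiting on the SparkApplication object itself rather than the driver pod, polling until the operator reports a terminal state (COMPLETED, FAILED, and SUBMISSION_FAILED are assumed here to be the terminal states of the v1beta2 API; the 5-second interval is arbitrary and the loop is unbounded for brevity):

wait_for_sparkapp() {
  # Poll the SparkApplication status until the operator reports a
  # terminal state; return 0 on COMPLETED, 1 on failure.
  while true; do
    state=$(kubectl get -n default sparkapplications spark-pi \
      -o 'jsonpath={.status.applicationState.state}' 2>/dev/null)
    case "$state" in
      COMPLETED) return 0 ;;
      FAILED|SUBMISSION_FAILED) return 1 ;;
    esac
    sleep 5
  done
}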
+ helm list
NAME        REVISION  UPDATED                   STATUS    CHART                APP VERSION          NAMESPACE
spark-test  1         Thu Dec 5 15:10:34 2019   DEPLOYED  sparkoperator-0.4.7  v1beta2-1.0.1-2.4.4  spark-operator
+ kubectl delete sparkapplication -n default spark-pi
sparkapplication.sparkoperator.k8s.io "spark-pi" deleted
+ sleep 15
+ helm list
NAME        REVISION  UPDATED                   STATUS    CHART                APP VERSION          NAMESPACE
spark-test  1         Thu Dec 5 15:10:34 2019   DEPLOYED  sparkoperator-0.4.7  v1beta2-1.0.1-2.4.4  spark-operator
Note that a helm delete of spark-test does not remove the spark-operator namespace, so the script is not idempotent.
This is understandable, since the namespace might have pre-existed and might be used elsewhere.
+ helm delete --purge spark-test
release "spark-test" deleted
+ sleep 15
+ helm list
+ '[' default '!=' default ']'
+ helm reset
Tiller (the Helm server-side component) has been uninstalled from your Kubernetes Cluster.
+ sleep 20
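Rather than sleeping 20 seconds after helm reset, the script could wait explicitly for the Tiller deployment to disappear (a sketch; kubectl wait --for=delete needs a reasonably recent kubectl, and its behavior when the object is already gone varies by version, hence the || true):

# Block until the Tiller deployment has been removed.
kubectl -n kube-system wait --for=delete deployment/tiller-deploy --timeout=60s || true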
+ kubectl delete ns spark-operator
namespace "spark-operator" deleted
+ sleep 10
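The final sleep is also avoidable: kubectl delete blocks on finalizers by default in recent versions (--wait=true), or an explicit wait can be added (a sketch; namespace teardown can be slow when finalizers are involved):

# Block until the namespace object is fully gone.
kubectl wait --for=delete namespace/spark-operator --timeout=120s || true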
+ exit 0