Output of spark-pi-local.sh for spark-on-kubernetes questions
The challenge below is to eliminate all of the sleep invocations and make this script run to completion.
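One way to attack the challenge (a sketch, not part of the original script): replace each fixed sleep with a small polling helper that retries a readiness check until it succeeds or a deadline passes. The wait_for name and the timeouts used below are illustrative only.

    wait_for() {
      # Retry "$@" until it succeeds or $1 seconds elapse (uses bash's SECONDS counter).
      local timeout=$1; shift
      local deadline=$((SECONDS + timeout))
      until "$@"; do
        (( SECONDS < deadline )) || { echo "wait_for: timed out: $*" >&2; return 1; }
        sleep 2
      done
    }

The snippets interleaved with the trace below use bare kubectl invocations for clarity, but each check could equally be wrapped in this helper.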
+ helm init
$HELM_HOME has been configured at /Users/jim/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
+ sleep 30
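This sleep 30 only gives Tiller time to start. A more direct wait (a sketch; it assumes the helm init default of deployment/tiller-deploy in the kube-system namespace) would be:

    kubectl rollout status -n kube-system deployment/tiller-deploy --timeout=120s

Helm 2.8+ also accepts helm init --wait, which blocks until Tiller is ready.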
+ '[' default '!=' default ']'
+ helm install incubator/sparkoperator --name spark-test --namespace spark-operator --set sparkJobNamespace=default --set enableMetrics=false
NAME: spark-test
LAST DEPLOYED: Thu Dec 5 15:10:34 2019
NAMESPACE: spark-operator
STATUS: DEPLOYED
RESOURCES:
==> v1/ClusterRole
NAME AGE
spark-test-sparkoperator-cr 0s
==> v1/ClusterRoleBinding
NAME AGE
spark-test-sparkoperator-crb 0s
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
spark-test-sparkoperator 0/1 1 0 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
spark-test-sparkoperator-7c4d75c74b-rxck7 0/1 ContainerCreating 0 0s
==> v1/Role
NAME AGE
spark-role 0s
==> v1/RoleBinding
NAME AGE
spark-role-binding 0s
==> v1/ServiceAccount
NAME SECRETS AGE
spark-test-spark 1 0s
spark-test-sparkoperator 1 0s
==> v1beta1/CustomResourceDefinition
NAME AGE
scheduledsparkapplications.sparkoperator.k8s.io 0s
sparkapplications.sparkoperator.k8s.io 0s
+ sleep 10
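This sleep 10 covers the operator deployment coming up (it is still 0/1 ContainerCreating above). Since the chart names the deployment spark-test-sparkoperator, a direct wait could replace the sleep (sketch):

    kubectl rollout status -n spark-operator deployment/spark-test-sparkoperator --timeout=120s

Passing --wait to helm install would accomplish the same thing at install time.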
+ kubectl apply --validate=true -f spark-pi.yaml
sparkapplication.sparkoperator.k8s.io/spark-pi created
+ sleep 10
The key question of this example is how to wait reliably for spark-pi to start and finish, and then to know whether it worked.
Note that logs -f knows how to wait until completion...
Note also that there is a race condition: this logs command will fail if the driver pod has not started yet.
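One way to close that race (a sketch; the resource names match this run): poll until the driver pod exists and has left the Pending phase, then attach the log follower.

    # Wait until the driver pod reports a phase past Pending, then follow its logs.
    until kubectl get -n default pod/spark-pi-driver -o 'jsonpath={.status.phase}' 2>/dev/null | grep -qE 'Running|Succeeded|Failed'; do
      sleep 2
    done
    kubectl logs -f -n default spark-pi-driver

Polling .status.phase rather than the Ready condition avoids a second race: a fast driver can finish before it is ever observed Ready.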
+ kubectl logs -f -n default spark-pi-driver
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_K8S_CMD=driver
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.1.1.31 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal
19/12/05 23:10:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/12/05 23:10:56 INFO SparkContext: Running Spark version 2.4.5-SNAPSHOT
19/12/05 23:10:56 INFO SparkContext: Submitted application: Spark Pi
19/12/05 23:10:56 INFO SecurityManager: Changing view acls to: root
19/12/05 23:10:56 INFO SecurityManager: Changing modify acls to: root
19/12/05 23:10:56 INFO SecurityManager: Changing view acls groups to:
19/12/05 23:10:56 INFO SecurityManager: Changing modify acls groups to:
19/12/05 23:10:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/12/05 23:10:57 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
19/12/05 23:10:57 INFO SparkEnv: Registering MapOutputTracker
19/12/05 23:10:57 INFO SparkEnv: Registering BlockManagerMaster
19/12/05 23:10:57 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/05 23:10:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/05 23:10:57 INFO DiskBlockManager: Created local directory at /var/data/spark-11402627-cc4b-4aad-8700-905419c63f8b/blockmgr-1ef55970-57a0-4241-a836-77668a1b54c1
19/12/05 23:10:57 INFO MemoryStore: MemoryStore started with capacity 117.0 MB
19/12/05 23:10:57 INFO SparkEnv: Registering OutputCommitCoordinator
19/12/05 23:10:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/12/05 23:10:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-pi-1575587448751-driver-svc.default.svc:4040
19/12/05 23:10:57 INFO SparkContext: Added JAR file:///opt/spark/examples/jars/spark-examples_2.11-2.4.5-SNAPSHOT.jar at spark://spark-pi-1575587448751-driver-svc.default.svc:7078/jars/spark-examples_2.11-2.4.5-SNAPSHOT.jar with timestamp 1575587457844
19/12/05 23:11:00 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
19/12/05 23:11:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
19/12/05 23:11:00 INFO NettyBlockTransferService: Server created on spark-pi-1575587448751-driver-svc.default.svc:7079
19/12/05 23:11:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/12/05 23:11:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-pi-1575587448751-driver-svc.default.svc, 7079, None)
19/12/05 23:11:00 INFO BlockManagerMasterEndpoint: Registering block manager spark-pi-1575587448751-driver-svc.default.svc:7079 with 117.0 MB RAM, BlockManagerId(driver, spark-pi-1575587448751-driver-svc.default.svc, 7079, None)
19/12/05 23:11:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-pi-1575587448751-driver-svc.default.svc, 7079, None)
19/12/05 23:11:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-pi-1575587448751-driver-svc.default.svc, 7079, None)
19/12/05 23:11:05 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.1.1.32:38306) with ID 1
19/12/05 23:11:06 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
19/12/05 23:11:06 INFO BlockManagerMasterEndpoint: Registering block manager 10.1.1.32:32921 with 117.0 MB RAM, BlockManagerId(1, 10.1.1.32, 32921, None)
19/12/05 23:11:06 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
19/12/05 23:11:06 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
19/12/05 23:11:06 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
19/12/05 23:11:06 INFO DAGScheduler: Parents of final stage: List()
19/12/05 23:11:06 INFO DAGScheduler: Missing parents: List()
19/12/05 23:11:06 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
19/12/05 23:11:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 117.0 MB)
19/12/05 23:11:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1381.0 B, free 117.0 MB)
19/12/05 23:11:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-pi-1575587448751-driver-svc.default.svc:7079 (size: 1381.0 B, free: 117.0 MB)
19/12/05 23:11:06 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
19/12/05 23:11:06 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
19/12/05 23:11:06 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/12/05 23:11:06 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.1.1.32, executor 1, partition 0, PROCESS_LOCAL, 7885 bytes)
19/12/05 23:11:07 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.1.1.32:32921 (size: 1381.0 B, free: 117.0 MB)
19/12/05 23:11:07 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.1.1.32, executor 1, partition 1, PROCESS_LOCAL, 7885 bytes)
19/12/05 23:11:07 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 695 ms on 10.1.1.32 (executor 1) (1/2)
19/12/05 23:11:07 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 96 ms on 10.1.1.32 (executor 1) (2/2)
19/12/05 23:11:07 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/12/05 23:11:07 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.087 s
19/12/05 23:11:07 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.257914 s
Pi is roughly 3.1415557077785388
19/12/05 23:11:07 INFO SparkUI: Stopped Spark web UI at http://spark-pi-1575587448751-driver-svc.default.svc:4040
19/12/05 23:11:07 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/12/05 23:11:07 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/12/05 23:11:07 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/12/05 23:11:07 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/12/05 23:11:07 INFO MemoryStore: MemoryStore cleared
19/12/05 23:11:07 INFO BlockManager: BlockManager stopped
19/12/05 23:11:07 INFO BlockManagerMaster: BlockManagerMaster stopped
19/12/05 23:11:07 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/12/05 23:11:07 INFO SparkContext: Successfully stopped SparkContext
19/12/05 23:11:07 INFO ShutdownHookManager: Shutdown hook called
19/12/05 23:11:07 INFO ShutdownHookManager: Deleting directory /var/data/spark-11402627-cc4b-4aad-8700-905419c63f8b/spark-c44c60a7-9d44-43f7-8564-10c5c6d54645
19/12/05 23:11:07 INFO ShutdownHookManager: Deleting directory /tmp/spark-8a572b48-2c92-489c-b2e4-1ec1734dbedc
++ kubectl get -n default pod/spark-pi-driver '-o=jsonpath={.status.containerStatuses[*].state.terminated.exitCode}'
+ exitCode=0
exitCode is 0
+ kubectl get -n default sparkapplications spark-pi -o 'jsonpath={"ApplicationState:"}{.status.applicationState.state}{"\nExecutorState:"}{.status.executorState.*}{"\n"}'
ApplicationState:COMPLETED
ExecutorState:FAILED
++ kubectl get -n default sparkapplications spark-pi -o 'jsonpath={.status.applicationState.state}'
+ statusCode=COMPLETED
statusCode is COMPLETED
Does a statusCode of COMPLETED imply success in the same way that an exitCode of 0 does?
Why is the ExecutorState for the executors FAILED?
Shouldn't the spark.stop() call cause the executor to exit cleanly?
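Rather than inferring the outcome from the log stream, the SparkApplication status itself can be polled to a terminal state (a sketch; the jsonpath matches the queries above, and COMPLETED/FAILED are the operator's terminal application states):

    # Poll the application state until the operator reports a terminal value.
    while true; do
      state=$(kubectl get -n default sparkapplications spark-pi \
        -o 'jsonpath={.status.applicationState.state}' 2>/dev/null)
      case "$state" in
        COMPLETED) echo 'spark-pi succeeded'; break ;;
        FAILED)    echo 'spark-pi failed' >&2; exit 1 ;;
        *)         sleep 5 ;;
      esac
    done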
+ helm list
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
spark-test 1 Thu Dec 5 15:10:34 2019 DEPLOYED sparkoperator-0.4.7 v1beta2-1.0.1-2.4.4 spark-operator
+ kubectl delete sparkapplication -n default spark-pi
sparkapplication.sparkoperator.k8s.io "spark-pi" deleted
+ sleep 15
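Assuming a kubectl new enough to support --for=delete, this sleep can become an explicit wait for the object to disappear (sketch):

    kubectl wait -n default --for=delete sparkapplication/spark-pi --timeout=120s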
+ helm list
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
spark-test 1 Thu Dec 5 15:10:34 2019 DEPLOYED sparkoperator-0.4.7 v1beta2-1.0.1-2.4.4 spark-operator
Note that helm delete of spark-test does not remove the spark-operator namespace, so the teardown is not idempotent.
This is understandable, since the namespace might have pre-existed and might be in use elsewhere.
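An idempotent variant of the teardown that follows would tolerate resources that are already gone (a sketch; --ignore-not-found makes the namespace delete a no-op on reruns):

    helm delete --purge spark-test 2>/dev/null || true
    kubectl delete ns spark-operator --ignore-not-found=true
    kubectl wait --for=delete ns/spark-operator --timeout=120s 2>/dev/null || true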
+ helm delete --purge spark-test
release "spark-test" deleted
+ sleep 15
+ helm list
+ '[' default '!=' default ']'
+ helm reset
Tiller (the Helm server-side component) has been uninstalled from your Kubernetes Cluster.
+ sleep 20
+ kubectl delete ns spark-operator
namespace "spark-operator" deleted
+ sleep 10
+ exit 0
The output from running this script:
on this branch:
associated with issue: