@smurching
Last active June 14, 2022 01:36
Databricks run and await job: shell scripts

This gist contains some example bash scripts for triggering and awaiting a one-time job run using existing Databricks CLI APIs.

Rough edges include:

  1. Parameter substitution into job JSON (need to implement this ourselves)
  2. Writing logic to trigger and await job status
  3. Updatability of the shell-script logic. Customers who rely on this script would need to update it themselves, whereas changes (e.g. to the default job-polling interval) could be pushed centrally if they were built into a `databricks runs submit --wait` option. That said, since the script makes all API requests through the Databricks CLI, security/auth patches can still be picked up by bumping the CLI version the script uses.

#2 can be addressed by adding a --wait option to databricks runs submit. #1 requires implementing parameter substitution, so it may be more work, but it is also less complex: there is no branching logic to test, only that parameters are passed through correctly.

Notably, DBX already supports #1 (see docs).

#!/usr/bin/env bash
# integration-test-job-spec.sh: emits the JSON spec for a one-time job run.
# GIT_URL, GIT_PROVIDER, and GIT_COMMIT must be set in the environment.
cat << EOF
{
  "notebook_task": {
    "notebook_path": "notebooks/Train",
    "base_parameters": {
      "env": "staging"
    }
  },
  "new_cluster": {
    "spark_version": "10.5.x-cpu-ml-scala2.12",
    "node_type_id": "Standard_D3_v2",
    "num_workers": 3
  },
  "git_source": {
    "git_url": "$GIT_URL",
    "git_provider": "$GIT_PROVIDER",
    "git_commit": "$GIT_COMMIT"
  }
}
EOF
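The spec script's output can be sanity-checked locally before wiring it into CI by parsing it as JSON. In the snippet below a stand-in spec script is generated so the check is self-contained; in the gist, `./integration-test-job-spec.sh` plays that role, and the `GIT_URL` value is a placeholder:

```shell
#!/usr/bin/env bash
set -e
# Generate a minimal stand-in spec script (in the gist this would be
# ./integration-test-job-spec.sh) and verify that its output parses as JSON.
SPEC=$(mktemp)
cat > "$SPEC" << 'SCRIPT'
#!/usr/bin/env bash
cat << EOF
{"git_source": {"git_url": "$GIT_URL"}}
EOF
SCRIPT
SPEC_OK=no
# python3 -m json.tool is used as the parser to avoid assuming jq is installed.
if GIT_URL="https://github.com/example-org/example-repo" bash "$SPEC" | python3 -m json.tool > /dev/null; then
  echo "job spec is valid JSON"
  SPEC_OK=yes
fi
rm -f "$SPEC"
```

Running the same check against the real spec script catches quoting mistakes (e.g. an unescaped `"` in a substituted variable) before anything is submitted to the Jobs API.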
#!/usr/bin/env bash
set -e
JOB_JSON=$(./integration-test-job-spec.sh)
echo "$JOB_JSON"
# Submit the one-time job run
RUNS_SUBMIT_JSON=$(databricks runs submit --json "$JOB_JSON")
JOB_RUN_ID=$(echo "$RUNS_SUBMIT_JSON" | jq -r .run_id)
JOB_JSON=$(databricks runs get --run-id "$JOB_RUN_ID")
echo "Launched job. View results at $(echo "$JOB_JSON" | jq -r .run_page_url)"
# Poll until the run reaches a terminal lifecycle state.
# Per the Jobs API, the terminal values of life_cycle_state are TERMINATED,
# SKIPPED, and INTERNAL_ERROR; CANCELLED is a result_state, not a lifecycle
# state, so checking for it here would never match.
while true; do
  JOB_JSON=$(databricks runs get --run-id "$JOB_RUN_ID")
  JOB_STATE=$(echo "$JOB_JSON" | jq .state)
  JOB_LIFECYCLE_STATE=$(echo "$JOB_STATE" | jq -r .life_cycle_state)
  JOB_RESULT_STATE=$(echo "$JOB_STATE" | jq -r .result_state)
  JOB_STATE_MESSAGE=$(echo "$JOB_STATE" | jq -r .state_message)
  if [ "$JOB_LIFECYCLE_STATE" = "TERMINATED" ] || [ "$JOB_LIFECYCLE_STATE" = "SKIPPED" ] || [ "$JOB_LIFECYCLE_STATE" = "INTERNAL_ERROR" ]; then
    if [ "$JOB_RESULT_STATE" = "SUCCESS" ]; then
      echo "Job completed successfully"
      exit 0
    fi
    echo "Job failed with state $JOB_RESULT_STATE and state message: $JOB_STATE_MESSAGE"
    exit 1
  fi
  echo "Job still running, lifecycle state: $JOB_LIFECYCLE_STATE"
  sleep 10
done
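The polling loop above runs unbounded; a stuck run would block CI forever. If a hard timeout is wanted, the wait logic can be factored into a generic bounded-poll helper. This is a sketch, not part of the gist; in real use the polled command would be a small wrapper around `databricks runs get`:

```shell
#!/usr/bin/env bash
# poll_until TIMEOUT INTERVAL CMD...: run CMD every INTERVAL seconds until it
# exits 0, or give up (return 1) once TIMEOUT seconds have elapsed.
poll_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  while ! "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Illustrative usage: `job_is_done` would be a function that fetches the run
# state and succeeds only on a terminal lifecycle state.
#   poll_until 1200 10 job_is_done
```

Keeping the timeout and interval as arguments is what a hypothetical `databricks runs submit --wait` could expose as flags, per rough edge #3.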