@pracucci
Created May 23, 2018 17:16
Installation script for Spark JobServer on EMR
#!/bin/bash
#
# Bootstrap action that installs Spark JobServer on the EMR master node.
#
set -e
# Config
SJS_VERSION=0.8.0
EMR_VERSION=5.13.0
# Run only on the master node. The first grep must print the matching line
# (no -q) so that the second grep has input to test.
if grep isMaster /mnt/var/lib/info/instance.json | grep -q false; then
  echo "Skipping Spark JobServer bootstrap because the node is NOT a master"
  exit 0
fi
# Download and extract the Spark JobServer custom package
mkdir -p /mnt/lib/spark-jobserver /mnt/var/log/spark-jobserver /mnt/tmp/spark-jobserver
sudo rm -fr /mnt/lib/spark-jobserver/*
aws s3 cp s3://BUCKET/spark-job-server-${SJS_VERSION}-emr-${EMR_VERSION}.tar.gz /mnt/lib/spark-jobserver/
cd /mnt/lib/spark-jobserver && tar -zxf spark-job-server-${SJS_VERSION}-emr-${EMR_VERSION}.tar.gz
# Create a shutdown action to stop Spark JobServer on node shutdown
# See: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html
sudo mkdir -p /mnt/var/lib/instance-controller/public/shutdown-actions/
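# The file is written through "sudo tee" because a shell redirection is not covered by sudo
echo -e '#!/bin/bash\nexec /mnt/lib/spark-jobserver/server_stop.sh' | sudo tee /mnt/var/lib/instance-controller/public/shutdown-actions/spark-job-server-shutdown.sh > /dev/null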
sudo chmod +x /mnt/var/lib/instance-controller/public/shutdown-actions/spark-job-server-shutdown.sh
# IMPORTANT: bootstrap actions run BEFORE Amazon EMR installs the applications,
# so Spark JobServer cannot be started here: its dependencies are not yet
# available. Instead, Spark JobServer is started by a custom step that uses
# command-runner.jar.
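#
# A minimal sketch of such a custom step, submitted with the AWS CLI from a
# machine that has access to the cluster. It is left commented out because it
# must NOT run as part of this bootstrap action.
# Assumptions (not part of this script): the cluster ID j-XXXXXXXX and the step
# name StartSparkJobServer are placeholders, and the extracted package is
# assumed to provide server_start.sh next to server_stop.sh.
#
#   aws emr add-steps --cluster-id j-XXXXXXXX \
#     --steps Type=CUSTOM_JAR,Name=StartSparkJobServer,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[/mnt/lib/spark-jobserver/server_start.sh]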