@pablosjv
Last active August 27, 2021 16:10
Large Scale Pytorch Inference Pipeline: Spark vs Dask - Code Examples
#!/bin/sh
# Submit a PySpark job to YARN with both the executors and the application
# master running inside the same Docker image. The client config on HDFS holds
# the registry credentials. YOUR_DOCKER_IMAGE must be set in the environment.
spark-submit \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG="hdfs:///user/hadoop/config.json" \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE="${YOUR_DOCKER_IMAGE}" \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG="hdfs:///user/hadoop/config.json" \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE="${YOUR_DOCKER_IMAGE}" \
s3://your-bucket/path/to/your/script.py
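
The six settings above are the same three values repeated under two prefixes. A minimal sketch of generating them in a loop follows. The helper name docker_runtime_confs is hypothetical (it is not part of the original gist), and the sketch assumes the image name and config path contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not in the original gist): prints one
# "--conf key=value" line per setting, for both the executor and the
# application master environments.
docker_runtime_confs() {
    image="$1"          # Docker image to run the containers from
    client_config="$2"  # location of the Docker client config.json
    for prefix in spark.executorEnv spark.yarn.appMasterEnv; do
        printf '%s\n' "--conf ${prefix}.YARN_CONTAINER_RUNTIME_TYPE=docker"
        printf '%s\n' "--conf ${prefix}.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=${client_config}"
        printf '%s\n' "--conf ${prefix}.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=${image}"
    done
}

# Possible usage. The command substitution is left unquoted on purpose so the
# shell splits the output into separate arguments:
#   spark-submit \
#     $(docker_runtime_confs "${YOUR_DOCKER_IMAGE}" "hdfs:///user/hadoop/config.json") \
#     s3://your-bucket/path/to/your/script.py
```

Keeping the flags in one place means a change to the image or the config path cannot drift between the executor and application master settings.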
#!/bin/sh
# -----------------------------------------------------------------------------
# 1. Make sure Docker is installed and running.
# -----------------------------------------------------------------------------
sudo yum install -y docker
sudo systemctl start docker
# -----------------------------------------------------------------------------
# 2. Check if running on the master node. If not, we are done.
# -----------------------------------------------------------------------------
grep -q '"isMaster": true' /mnt/var/lib/info/instance.json ||
{ echo "Not running on master node, nothing to do" && exit 0; }
# -----------------------------------------------------------------------------
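# 3. Log in to the registry and copy the Docker client config to HDFS
# -----------------------------------------------------------------------------
# Note: `aws ecr get-login` exists only in AWS CLI v1. {{ aws.region }} is a
# template placeholder that must be rendered before this script runs.
sudo $(aws ecr get-login --region {{ aws.region }} --no-include-email)
sudo hadoop fs -put /root/.docker/config.json /user/hadoop/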
echo ">>> DONE!!!"
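
The master-node check in step 2 can also be wrapped in a small function so it can be exercised off-cluster against a sample file. This is a sketch only: the function name is_master_node and its optional path argument are assumptions, and it relies on the same '"isMaster": true' text match as the script above.

```shell
#!/bin/sh
# Hypothetical helper (not in the original gist): succeeds when the instance
# info file marks this node as the master. The path defaults to the location
# used by the script above and can be overridden for testing.
is_master_node() {
    info_file="${1:-/mnt/var/lib/info/instance.json}"
    grep -q '"isMaster": true' "$info_file"
}

# Possible usage inside the setup script:
#   is_master_node || { echo "Not running on master node, nothing to do"; exit 0; }
```

Because the match is a plain text search, it depends on the exact spacing in the file, just as the original grep does.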