@mbiemann
Last active June 10, 2024 18:23

Glue PySpark Job

Connect to AWS using saml2aws

saml2aws login --skip-prompt --disable-keychain

# Replace xxx with your account ID and role name
role=arn:aws:iam::xxx:role/xxx
# Assume the role through the "saml" profile written by saml2aws login
credentials=$(aws sts assume-role --role-arn "$role" --role-session-name tmp --profile saml)
# Extract each credential field from the JSON response and export it
export AWS_ACCESS_KEY_ID=$(echo "$credentials" | grep -o '"AccessKeyId": "[^"]*' | cut -d '"' -f 4)
export AWS_SECRET_ACCESS_KEY=$(echo "$credentials" | grep -o '"SecretAccessKey": "[^"]*' | cut -d '"' -f 4)
export AWS_SESSION_TOKEN=$(echo "$credentials" | grep -o '"SessionToken": "[^"]*' | cut -d '"' -f 4)
export AWS_REGION=xxx
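
The grep/cut extraction above depends on the exact spacing of the CLI's JSON output. As a rough alternative, the same step could be done with a JSON parser. The sketch below assumes the usual `assume-role` response shape (a top-level `Credentials` object); the function name `credentials_to_exports` and the sample values are made up for illustration.

```python
import json
import shlex

# Maps keys in the "Credentials" object of the assume-role response
# to the environment variables exported in the shell snippet.
ENV_BY_KEY = {
    "AccessKeyId": "AWS_ACCESS_KEY_ID",
    "SecretAccessKey": "AWS_SECRET_ACCESS_KEY",
    "SessionToken": "AWS_SESSION_TOKEN",
}


def credentials_to_exports(response_json: str) -> str:
    """Turn `aws sts assume-role` JSON output into shell `export` lines."""
    creds = json.loads(response_json)["Credentials"]
    lines = [
        # shlex.quote keeps the line safe to eval even if a value
        # contains characters the shell would otherwise interpret.
        f"export {env_name}={shlex.quote(creds[key])}"
        for key, env_name in ENV_BY_KEY.items()
    ]
    return "\n".join(lines)


# Fake response used only to demonstrate the function.
sample = json.dumps({
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "secret/example",
        "SessionToken": "token+example==",
    }
})
print(credentials_to_exports(sample))
```

A missing field raises `KeyError` right away, which is easier to notice than an empty environment variable.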

Install and Start Colima

Use Colima instead of Docker Desktop as the container runtime.

brew install colima docker
colima start

Launch Glue Libs Docker

# Open using Env Vars
# "$(pwd)" is used instead of "." because older Docker versions reject relative paths in -v
docker run -it --rm --name glue_job \
  -v "$(pwd)":/home/glue_user/workspace/ \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  -e AWS_SESSION_TOKEN="$AWS_SESSION_TOKEN" -e AWS_REGION="$AWS_REGION" -e DISABLE_SSL=true \
  -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888 \
  amazon/aws-glue-libs:glue_libs_4.0.0_image_01

# Open using AWS_PROFILE
docker run -it --rm --name glue_job \
  -v ~/.aws:/home/glue_user/.aws -v "$(pwd)":/home/glue_user/workspace/ \
  -e AWS_PROFILE="$AWS_PROFILE" -e DISABLE_SSL=true \
  -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888 \
  amazon/aws-glue-libs:glue_libs_4.0.0_image_01
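
The two launch commands differ only in how credentials reach the container. A small helper can make that difference explicit. This is a sketch, not part of the original gist: `build_docker_command` and its parameters are hypothetical names, and the example paths and values are placeholders. The ports are commonly described as Spark UI (4040), Spark history server (18080), Livy (8998), and Jupyter (8888); treat those labels as an assumption.

```python
import os
import shlex

IMAGE = "amazon/aws-glue-libs:glue_libs_4.0.0_image_01"
PORTS = (4040, 18080, 8998, 8888)
CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_REGION",
)


def build_docker_command(workspace, env, use_profile=False):
    """Return the `docker run` argument list for either credential mode."""
    cmd = ["docker", "run", "-it", "--rm", "--name", "glue_job"]
    # Mount the project directory as the container workspace.
    cmd += ["-v", f"{workspace}:/home/glue_user/workspace/"]
    if use_profile:
        # Profile mode: share ~/.aws and pass only the profile name.
        aws_dir = os.path.join(env["HOME"], ".aws")
        cmd += ["-v", f"{aws_dir}:/home/glue_user/.aws"]
        cmd += ["-e", f"AWS_PROFILE={env['AWS_PROFILE']}"]
    else:
        # Env-var mode: pass the temporary credentials directly.
        for name in CREDENTIAL_VARS:
            cmd += ["-e", f"{name}={env[name]}"]
    cmd += ["-e", "DISABLE_SSL=true"]
    for port in PORTS:
        cmd += ["-p", f"{port}:{port}"]
    cmd.append(IMAGE)
    return cmd


# Placeholder environment used only to show the resulting command line.
example_env = {"HOME": "/home/me", "AWS_PROFILE": "saml"}
print(shlex.join(build_docker_command("/home/me/project", example_env, use_profile=True)))
```

Passing `os.environ` as `env` makes an unset variable fail with `KeyError` before Docker starts, rather than launching a container with empty credentials.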

Run

# Submit Job
spark-submit src/myjob.py --JOB_NAME myjob

# PyTest
python3 -m pytest

# Open PySpark Console
pyspark
# Then, at the PySpark prompt:
spark.sql("show databases").show()

# Open Jupyter
mkdir jupyter_workspace
/home/glue_user/jupyter/jupyter_start.sh
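
For the Submit Job step above, the gist does not show `src/myjob.py` itself. A minimal sketch of what such a script might look like follows, using the standard Glue job boilerplate (`getResolvedOptions`, `GlueContext`, `Job`). The workload is a placeholder that reuses the `show databases` query from the console example, and the `--JOB_NAME` check in the last lines is a choice made for this sketch, not something the gist prescribes.

```python
import sys


def main(argv):
    # Glue and Spark imports live inside main() so the module can be
    # imported (for example by pytest) without the Glue libraries installed.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Reads "--JOB_NAME myjob" from the spark-submit command line.
    args = getResolvedOptions(argv, ["JOB_NAME"])

    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Placeholder workload: list the databases visible to the session.
    spark.sql("show databases").show()

    job.commit()


# Only start the job when launched with --JOB_NAME,
# as in the spark-submit command.
if __name__ == "__main__" and "--JOB_NAME" in sys.argv:
    main(sys.argv)
```

Keeping the Glue imports inside `main()` means the file can be imported from tests without pulling in Spark.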