@belenaj
Last active August 14, 2020 14:09
spark-app-docker-emr-6.0.0
# https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-docker.html
# https://aws.amazon.com/blogs/big-data/run-spark-applications-with-docker-using-amazon-emr-6-0-0-beta/
FROM amazoncorretto:8
RUN yum -y update
RUN yum -y install yum-utils
RUN yum -y groupinstall development
RUN yum list 'python3*'
# On yum-based images the Python headers package is python3-devel, not python3-dev
RUN yum -y install python3 python3-devel python3-pip python3-virtualenv
RUN python -V
RUN python3 -V
ENV PYSPARK_DRIVER_PYTHON=python3
ENV PYSPARK_PYTHON=python3
RUN pip3 install --upgrade pip
# "panda" is an unrelated PyPI package; pandas is assumed to be the intended library
RUN pip3 install numpy pandas
RUN python3 -c "import numpy as np"