Created August 26, 2019 22:50 by mcmoe (gist 1eded14f54a1dfcc9d92cb03b6f455ba)
How to set up PySpark and Jupyter on an AWS EC2 instance
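```bash
# Originally based on https://raw.githubusercontent.com/pzfreo/ox-clo/master/code/flintrock-jupyter.sh
# Compilers for building native extensions that Jupyter's dependencies may need
sudo yum install gcc gcc-c++ -y
# Alternative via the Amazon Linux package: sudo yum install python27-pip -y && sudo pip-2.7 install jupyter
# Python 2.7 needs the legacy bootstrap script; the generic get-pip.py no longer supports it
curl https://bootstrap.pypa.io/pip/2.7/get-pip.py -o get-pip.py
sudo python get-pip.py
sudo pip2.7 install jupyter
# Make pyspark start a Jupyter notebook server as its driver instead of the plain REPL
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser'
# Run on the master node; if 0.0.0.0 does not connect, use the spark:// URL shown in the master web UI (port 8080).
# hadoop-aws adds s3a:// support; its version should match the cluster's Hadoop version.
# In Spark 2.x, spark-submit lists --num-executors as YARN-only, so a standalone master ignores it.
pyspark --master spark://0.0.0.0:7077 \
  --packages org.apache.hadoop:hadoop-aws:2.7.4 \
  --num-executors 3 --driver-memory 4g --executor-memory 4g
```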
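Because `--no-browser` stops Jupyter from opening a browser, the server instead prints a login URL that contains an access token. If that output has scrolled away, running `jupyter notebook list` on the instance lists the running servers. The helper below is a minimal sketch that pulls the token out of that listing; the function name `notebook_token` is hypothetical, and it assumes lines shaped like `http://localhost:8888/?token=<token> :: <dir>`, which may differ between Jupyter versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `jupyter notebook list` output on stdin and print the
# first token found. Assumes lines like "http://localhost:8888/?token=<token> :: <dir>";
# other Jupyter versions may format this differently.
notebook_token() {
  # Keep only the text after "?token=" or "&token=", up to the next space or "&"
  sed -n 's/.*[?&]token=\([^ &]*\).*/\1/p' | head -n 1
}
```

Usage on the instance: `jupyter notebook list | notebook_token`. If nothing is printed, the listing format probably differs from the assumed one and the URL should be read directly from `jupyter notebook list`.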
To open the notebook from your local machine, set up an SSH tunnel to the instance.
Note: if you launched the cluster with Flintrock, `flintrock describe` prints the cluster's details, including the master's domain name.
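A minimal sketch of the tunnel using OpenSSH local port forwarding follows. The function name `jupyter_tunnel`, the key path, the `ec2-user` login (the usual default on Amazon Linux and in Flintrock's config) and port 8888 (Jupyter's default) are assumptions; adjust them to your setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: forward a local port to the Jupyter server on the instance.
# Assumptions: "ec2-user" login and Jupyter listening on port 8888 on the instance.
jupyter_tunnel() {
  local key="$1" host="$2" port="${3:-8888}"
  # -N: run no remote command, only forward ports
  # -L: local port -> localhost:port as seen from the instance
  ssh -i "$key" -N -L "${port}:localhost:${port}" "ec2-user@${host}"
}
```

Usage: `jupyter_tunnel ~/.ssh/my-key.pem <master-dns-name>`, then browse to http://localhost:8888 locally. Because traffic goes through SSH, the instance's security group only needs port 22 open rather than exposing the notebook port publicly.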