@titipata
Last active August 29, 2015 14:17

Run PySpark from Amazon EC2

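Here is a suggested way to start PySpark inside an IPython Notebook on an Amazon EC2 instance. The command below runs Spark in local mode with 4 worker threads (local[4]). In local mode everything runs inside the driver process, so --driver-memory is the memory setting that matters. --ip=* makes the notebook server listen on all network interfaces so it can be reached from outside the instance, and --no-browser stops it from trying to open a browser on the server: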

IPYTHON_OPTS="notebook --ip=* --no-browser" ~/spark-1.2.0-bin-hadoop1/bin/pyspark --master local[4] --driver-memory 4g --executor-memory 4g
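If you launch this often, the same command can be assembled from a small Python helper script. This is a minimal sketch, not part of Spark. It assumes Spark is unpacked at ~/spark-1.2.0-bin-hadoop1 as in the command above, and the function name pyspark_notebook_command is hypothetical. It only builds the environment and argument list. Nothing is executed until you hand them to subprocess.

```python
import os
import shlex

# Assumption: Spark 1.2.0 (Hadoop 1 build) is unpacked in the home directory,
# matching the path used in the shell command above.
SPARK_HOME = os.path.expanduser("~/spark-1.2.0-bin-hadoop1")


def pyspark_notebook_command(master="local[4]", driver_memory="4g",
                             executor_memory="4g", conf=None):
    """Return (env, argv) for starting PySpark inside an IPython Notebook.

    Hypothetical helper. `conf` is an optional dict of extra Spark
    properties, each emitted as a separate --conf PROP=VALUE pair.
    """
    env = dict(os.environ)  # copy, so the current process is not modified
    # IPYTHON_OPTS makes the Spark 1.x pyspark script start IPython Notebook.
    env["IPYTHON_OPTS"] = "notebook --ip=* --no-browser"
    argv = [
        os.path.join(SPARK_HOME, "bin", "pyspark"),
        "--master", master,
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
    ]
    for prop, value in (conf or {}).items():
        argv += ["--conf", "{}={}".format(prop, value)]
    return env, argv


if __name__ == "__main__":
    env, argv = pyspark_notebook_command()
    # Print an equivalent, shell-quoted command line instead of running it.
    print("IPYTHON_OPTS=" + shlex.quote(env["IPYTHON_OPTS"]),
          " ".join(shlex.quote(a) for a in argv))
```

To actually start the notebook, pass both values to subprocess.call(argv, env=env). Passing conf={"spark.driver.maxResultSize": "4g"} appends --conf spark.driver.maxResultSize=4g to the argument list.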

To list every option the pyspark script accepts, run:

~/spark-1.2.0-bin-hadoop1/bin/pyspark --help

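Any Spark configuration property can also be set on the command line with:

--conf PROP=VALUE

For example, to let actions such as collect() return larger results to the driver, raise spark.driver.maxResultSize (default 1g). Size values take a unit suffix, so write 4g for 4 GB. A bare number such as 4096 is read as bytes:

IPYTHON_OPTS="notebook --ip=* --no-browser" ~/spark-1.2.0-bin-hadoop1/bin/pyspark --master local[4] --driver-memory 4g --executor-memory 4g --conf spark.driver.maxResultSize=4g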
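To see how size values such as 4g, 4096m or a bare 4096 translate into bytes, here is a simplified re-implementation written for illustration. It is not Spark's own parser. It assumes binary (1024-based) multipliers, an optional k/m/g/t suffix (optionally followed by b), and bytes when no suffix is given. The name size_to_bytes is hypothetical.

```python
import re

# Binary multipliers for the unit suffixes accepted on size values.
# An empty suffix (bare number) means bytes.
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
    "t": 1024 ** 4, "tb": 1024 ** 4,
}


def size_to_bytes(value):
    """Convert a size string like '4g', '512m' or '4096' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError("unrecognized size: {!r}".format(value))
    return int(match.group(1)) * _UNITS[match.group(2)]


if __name__ == "__main__":
    for text in ("4g", "4096m", "4096"):
        print(text, "->", size_to_bytes(text), "bytes")
```

Under these assumptions, 4g and 4096m both mean 4,294,967,296 bytes, while a bare 4096 means only 4,096 bytes. That is why the unit suffix matters when setting spark.driver.maxResultSize.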