titipata/run_spark.md

Last active August 29, 2015 14:17


Run PySpark from Amazon EC2

Here is a suggestion on how to run PySpark on Amazon EC2, with the IPython notebook as the front end:

IPYTHON_OPTS="notebook --ip=* --no-browser" ~/spark-1.2.0-bin-hadoop1/bin/pyspark --master local[4] --driver-memory 4g --executor-memory 4g

To list the available options, run:

spark-1.2.0-bin-hadoop1/bin/pyspark --help

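For example, to allow larger results to be returned to the driver, set spark.driver.maxResultSize with the --conf PROP=VALUE option. Give the size with a unit suffix (4g here); a bare number such as 4096 is read as bytes, not megabytes:

IPYTHON_OPTS="notebook --ip=* --no-browser" ~/spark-1.2.0-bin-hadoop1/bin/pyspark --master local[4] --driver-memory 4g --executor-memory 4g --conf spark.driver.maxResultSize=4g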
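Since the command gets long, it may be convenient to keep it in a small wrapper script. The following is only a sketch: the variable names (SPARK_HOME, CORES, DRIVER_MEM, EXECUTOR_MEM, MAX_RESULT, DRY_RUN) and their defaults are assumptions chosen for illustration, and the result-size limit is assumed to be given with a unit suffix (4g). By default it only prints the command; set DRY_RUN=0 to actually launch the notebook.

```shell
#!/bin/sh
# Hypothetical wrapper around the pyspark command above.
# All variable names and defaults here are illustrative assumptions.
SPARK_HOME="${SPARK_HOME:-$HOME/spark-1.2.0-bin-hadoop1}"
CORES="${CORES:-4}"
DRIVER_MEM="${DRIVER_MEM:-4g}"
EXECUTOR_MEM="${EXECUTOR_MEM:-4g}"
MAX_RESULT="${MAX_RESULT:-4g}"

# Store the command as positional parameters so every argument stays one word.
set -- "${SPARK_HOME}/bin/pyspark" \
  --master "local[${CORES}]" \
  --driver-memory "${DRIVER_MEM}" \
  --executor-memory "${EXECUTOR_MEM}" \
  --conf "spark.driver.maxResultSize=${MAX_RESULT}"

# Keep a printable copy of the command for inspection.
PYSPARK_CMD="$*"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Dry run (the default): show what would be executed.
  echo "${PYSPARK_CMD}"
else
  # Launch PySpark with the IPython notebook as the front end.
  IPYTHON_OPTS="notebook --ip=* --no-browser" "$@"
fi
```

For example, `CORES=2 MAX_RESULT=2g sh run_pyspark.sh` (the script name is hypothetical) prints the command with `local[2]` and a 2g result limit without starting anything.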
