@danish-rehman
Last active July 12, 2016 19:20
Spark: Standalone setup on a development box

Follow this official link

Download - link

Choose the package pre-built for Hadoop 2.6, unzip it, and keep it in a folder.

Set up environment

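# Apache Spark: run the ssh commands used by the sbin/ scripts in the
# foreground, so a password can be typed if password-less ssh is not set up
export SPARK_SSH_FOREGROUND="yes"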

# Add the Spark home directory
export SPARK_HOME="/Users/drehman/Apps/spark-1.6.2-bin-hadoop2.6"

# Add the PySpark classes to the Python path:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH

# PySpark depends on the py4j bundled with Spark. Do not pip install it.
export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.9-src.zip:$PYTHONPATH"
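
The py4j zip name changes between Spark releases, so a hardcoded py4j-0.9-src.zip can stop matching after an upgrade. A minimal sketch of looking the file up instead, assuming the layout used above (python/ and python/lib/py4j-*-src.zip under SPARK_HOME). The helper name pyspark_paths is hypothetical and not part of Spark:

```python
import glob
import os


def pyspark_paths(spark_home):
    """Return the entries to put on the Python path for a Spark directory.

    Assumes <spark_home>/python and <spark_home>/python/lib/py4j-*-src.zip
    exist, as in the exports above. The function name is illustrative only.
    """
    python_dir = os.path.join(spark_home, "python")
    # Match whichever py4j version ships with this Spark build.
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    py4j_zips = sorted(glob.glob(pattern))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip under " + python_dir)
    return [py4j_zips[-1], python_dir]
```

Prepending the result to sys.path (for example sys.path[:0] = pyspark_paths(os.environ["SPARK_HOME"])) before importing pyspark should have the same effect as the two PYTHONPATH exports.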

PySpark shell

cd $SPARK_HOME
# bin/pyspark is a shell script, so run it directly rather than through python
./bin/pyspark

Run master

cd $SPARK_HOME
./sbin/start-master.sh

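Check master UI - http://localhost:8080/ (the spark://HOST:7077 URL shown at the top of the page is what workers connect to; SANM-MBP01L.local below is this machine's hostname, so replace it with your own)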

Run first worker

cd $SPARK_HOME
./sbin/start-slave.sh spark://SANM-MBP01L.local:7077
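
The hostname in the master URL is specific to the machine, so copying the command above verbatim will not work elsewhere. A small sketch of building the URL from the local hostname, assuming the default master port 7077 used above. The helper name master_url is hypothetical, and socket.gethostname() may not match exactly what the master reports (for example a missing .local suffix), so the URL on the master UI remains the authoritative value:

```python
import socket


def master_url(host=None, port=7077):
    """Build a spark://host:port URL, defaulting to this machine's hostname.

    Illustrative helper, not part of Spark. The port default matches the
    spark://...:7077 URL used in this setup.
    """
    if host is None:
        host = socket.gethostname()
    return "spark://{}:{}".format(host, port)
```

For example, master_url("SANM-MBP01L.local") gives spark://SANM-MBP01L.local:7077.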

Check master UI again - http://localhost:8080/
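
Instead of eyeballing the UI, the worker count can be checked from a script. The sketch below assumes the master serves a JSON status document at http://localhost:8080/json and that it contains a "workers" list whose entries carry "state", "cores" and "memory" (in MB). Those key names are assumptions to verify against your Spark version, and both function names are hypothetical. Python 3:

```python
import json
from urllib.request import urlopen


def fetch_master_status(ui_url="http://localhost:8080"):
    """Fetch the master's JSON status (assumed to live at <ui_url>/json)."""
    with urlopen(ui_url.rstrip("/") + "/json", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_workers(status):
    """Return (alive_count, total_cores, total_memory_mb) for ALIVE workers.

    The keys "workers", "state", "cores" and "memory" are assumptions about
    the payload; adjust them if your Spark version reports something else.
    """
    alive = [w for w in status.get("workers", []) if w.get("state") == "ALIVE"]
    cores = sum(w.get("cores", 0) for w in alive)
    memory_mb = sum(w.get("memory", 0) for w in alive)
    return len(alive), cores, memory_mb
```

With the master running, summarize_workers(fetch_master_status()) should report one alive worker after the step above.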

Run first example

cd $SPARK_HOME/examples/src/main/python
python pi.py 1
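
In the bundled example the argument is the number of partitions, and each partition contributes 100000 random samples. The sketch below shows the same Monte Carlo idea in plain Python without Spark, only to illustrate what the job computes. It is not the Spark example itself; the function name and the seed parameter are hypothetical:

```python
import random


def estimate_pi(partitions=1, samples_per_partition=100000, seed=0):
    """Estimate pi by sampling points in the square [-1, 1] x [-1, 1].

    The fraction of points that land inside the unit circle approaches
    pi / 4. Plain-Python stand-in for the Spark job: the "partitions" are
    processed one after another here instead of in parallel.
    """
    rng = random.Random(seed)

    def count_hits(n):
        hits = 0
        for _ in range(n):
            x = rng.random() * 2 - 1
            y = rng.random() * 2 - 1
            if x * x + y * y <= 1:
                hits += 1
        return hits

    total = partitions * samples_per_partition
    hits = sum(count_hits(samples_per_partition) for _ in range(partitions))
    return 4.0 * hits / total
```

Raising the partition count increases the number of samples, which is why the estimate gets closer to pi with a larger argument.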

Stop the worker

cd $SPARK_HOME
./sbin/stop-slave.sh

Check master UI again - http://localhost:8080/

Start a worker with 2 CPU cores and 4 GB of RAM

cd $SPARK_HOME
./sbin/start-slave.sh spark://SANM-MBP01L.local:7077 --cores 2 --memory 4g
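
When workers are started from a script, it can help to build the command in one place so the resource limits are explicit. A minimal sketch using only the script path and the --cores and --memory flags shown above; the helper name start_worker_command is hypothetical:

```python
import os


def start_worker_command(spark_home, master, cores=None, memory=None):
    """Return the argument list for sbin/start-slave.sh.

    cores and memory map to the --cores and --memory flags used above and
    are omitted when left as None, so the worker takes its defaults.
    Illustrative helper, not part of Spark.
    """
    cmd = [os.path.join(spark_home, "sbin", "start-slave.sh"), master]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    if memory is not None:
        cmd += ["--memory", str(memory)]
    return cmd
```

The returned list can be passed to subprocess.check_call, for example subprocess.check_call(start_worker_command(os.environ["SPARK_HOME"], "spark://SANM-MBP01L.local:7077", cores=2, memory="4g")).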

Check master UI again - http://localhost:8080/

Stop all

Master and workers

cd $SPARK_HOME
./sbin/stop-all.sh

Note

See the Mastering Apache Spark book.
