Tested with
Python 2.7, OS X 10.11.3 El Capitan, Apache Spark 2.1.0 & Hadoop 2.7
Download Apache Spark and either build it yourself or download the pre-built version. I suggest downloading the pre-built version with Hadoop 2.7.
mkdir ~/opt
cd ~/opt
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar xvzf spark-2.1.0-bin-hadoop2.7.tgz
rm -f spark-2.1.0-bin-hadoop2.7.tgz
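Optionally, you can already confirm the download works by launching the PySpark shell that ships with the distribution (exit it with Ctrl-D):
~/opt/spark-2.1.0-bin-hadoop2.7/bin/pyspark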
Download and install Anaconda.
Once you have installed Anaconda, open your terminal and type
conda install jupyter
conda update jupyter
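To verify the installation, you can check the installed version:
jupyter --version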
Open a terminal and type
echo "export PYTHONPATH=~/opt/spark-2.1.0-bin-hadoop2.7/python:~/opt/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip" >> ~/.profile
Now source it to make the changes available in this terminal
source ~/.profile
or quit your terminal with Cmd+Q and reopen it.
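As a quick sanity check before starting Jupyter, you can confirm that Python now finds PySpark; if this returns without an ImportError, the PYTHONPATH is set correctly:
python -c "import pyspark"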
jupyter notebook --ip=0.0.0.0 --NotebookApp.token=''
Now the Jupyter notebook should open in your browser. Note that the empty token disables authentication, so only use this on a trusted machine.
To check whether Spark is correctly linked, create a new Python 2 notebook inside Jupyter and run the code below.
You should see something like this
In [1]: import pyspark
        from pyspark.sql import SQLContext
        sc = pyspark.SparkContext('local[*]')
        sqlContext = SQLContext(sc)
        sc
Out[1]: <pyspark.context.SparkContext at 0x1049bdf90>