Tested with
Python 2.7, Ubuntu 16.04 LTS, Apache Spark 2.1.0 & Hadoop 2.7
Download Apache Spark and build it yourself, or download a pre-built version. I suggest downloading the pre-built version for Hadoop 2.7. Note that writing to /opt usually requires root privileges.
cd /opt
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar xvzf spark-2.1.0-bin-hadoop2.7.tgz
rm -f spark-2.1.0-bin-hadoop2.7.tgz
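As a quick sanity check before configuring anything, you can ask the bundled spark-submit for its version (the path assumes the extraction step above):
/opt/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --version
It should print a banner that includes version 2.1.0.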
Download and install Anaconda.
Once you have installed Anaconda, open your terminal and type:
conda install jupyter
conda update jupyter
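To make sure the conda-installed Jupyter is the one that will be run, you can check which binary is on your PATH before moving on:
which jupyter
jupyter --version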
Next, open a terminal and add Spark to your environment by appending the following to ~/.bashrc:
echo "export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7" >> ~/.bashrc
echo "export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip" >> ~/.bashrc
Now source it to make the changes available in the current terminal:
source ~/.bashrc
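Before starting the notebook, it is worth verifying that both variables landed correctly; this assumes you sourced ~/.bashrc as above:
echo $SPARK_HOME
python -c "import pyspark"
If the import exits silently without an error, PYTHONPATH is set up correctly.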
jupyter notebook --ip=0.0.0.0 --NotebookApp.token=''
Since the token is empty, Jupyter will not prompt you to authenticate; the notebook should now open in your browser.
To check whether Spark is linked correctly, create a new Python 2 notebook inside Jupyter and run the code below. You should see something like this:
In [1]: import pyspark
from pyspark.sql import SQLContext

sc = pyspark.SparkContext('local[*]')
sqlContext = SQLContext(sc)
sc
Out[1]: <pyspark.context.SparkContext at 0x1049bdf90>
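To confirm the context can actually execute a job, and not just instantiate, you can run a small computation; summing the integers 0 through 999 is a minimal example:
In [2]: sc.parallelize(range(1000)).sum()
Out[2]: 499500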