For those like me who wish to continue learning about ML using the scientific Python stack, check out this video workshop by Jake VanderPlas.
Here is the code: https://github.com/jakevdp/sklearn_pycon2015/
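As a quick taste of what the workshop covers, here is a minimal scikit-learn example; the dataset and classifier choice are my own illustration, not taken from the workshop itself:

```python
# Minimal scikit-learn workflow: load a dataset, fit a model, predict.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
model = KNeighborsClassifier(n_neighbors=3)
model.fit(iris.data, iris.target)

# Predict the classes of the first three samples
# (all setosa, i.e. class 0, in this dataset)
print(model.predict(iris.data[:3]))
```

The same fit/predict pattern applies across the library's estimators, which is the main idea the workshop notebooks build on.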
Here are the steps I took to set up a correctly working PySpark with Anaconda (and its roughly 200 bundled libraries) on the course's Vagrant VM:
- Install Anaconda or Miniconda (some familiarity with the Linux shell is assumed). The Vagrant Spark VM is 32-bit Ubuntu with Python 2.7, since PySpark for Python 3 has not been released yet. Get the download URL from http://continuum.io/downloads#all. If you only want selected packages, get Miniconda instead.
vagrant ssh
curl -L https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda-2.3.0-Linux-x86.sh | bash
Wait until the download and install complete. Anaconda is installed to /home/vagrant/anaconda/.
- Now tweak the notebook upstart job config and modify the PATH env var so it launches the Anaconda distribution from your home directory:
sudo nano /etc/init/notebook.conf
Change env PATH to:
env PATH=/home/vagrant/anaconda/bin/:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/local/bin/spark-1.3.1-bin-hadoop2.6/bin
Save the config and exit, then reload the upstart configuration:
sudo initctl reload notebook
Optionally, if you wish to change the IPython notebooks directory:
echo "c.NotebookApp.notebook_dir = u'/vagrant'" >> ~/.ipython/profile_pyspark/ipython_notebook_config.py
Then restart the job:
sudo restart notebook
Check that your installed libraries are on the import path: https://github.com/jakevdp/sklearn_pycon2015/blob/master/notebooks/01-Preliminaries.ipynb
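To double-check from a notebook cell that the kernel really is the Anaconda python and that the scientific stack is importable, something like this sketch works (the package list is just an example; adjust it to what you need):

```python
# Sanity check: confirm which interpreter the notebook kernel runs,
# and that the core scientific libraries import cleanly.
import importlib
import sys

def check_libraries(names):
    """Map each package name to its version string, or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions

# On the VM this should point into /home/vagrant/anaconda/
print(sys.executable)
print(check_libraries(["numpy", "scipy", "sklearn", "matplotlib"]))
```

If any entry comes back as None, the notebook job is probably still picking up the system python rather than the Anaconda one, so recheck the PATH in the upstart config.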