- Start the Ubuntu cluster
- Run the following commands to install Java:
sudo apt-get update
sudo apt-get install openjdk-7-jdk
- Download Anaconda
- Download PhantomJS
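For example, on the instance (installer URLs and versions below are illustrative for this era; check the Anaconda and PhantomJS download pages for current ones):

```shell
# Download and run the Anaconda installer (version/URL illustrative)
wget http://repo.continuum.io/archive/Anaconda-2.1.0-Linux-x86_64.sh
bash Anaconda-2.1.0-Linux-x86_64.sh

# Download and unpack a PhantomJS binary build (version/URL illustrative)
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.8-linux-x86_64.tar.bz2
tar xjf phantomjs-1.9.8-linux-x86_64.tar.bz2
```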
- Move spark-1.1.1 to the cluster (we'll use 1.2.0 later if it's ready)
scp -i <key_pair> -r spark-1.1.1-bin-hadoop1 ubuntu@<amazon_ip>:~/.
- Copy SpookyStuff
scp -i <key_pair> -r spookystuff ubuntu@<amazon_ip>:~/.
- Then adjust
.bash_profile
(add JAVA_HOME, the PhantomJS path, Anaconda, and SPARK_HOME) and test SpookyStuff
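The .bash_profile additions might look like the following sketch; the installation paths are assumptions and should be adjusted to wherever Java, PhantomJS, Anaconda, and Spark actually live on the instance:

```shell
# ~/.bash_profile additions (paths are illustrative -- adjust to your install locations)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export SPARK_HOME=$HOME/spark-1.1.1-bin-hadoop1

# Put the PhantomJS binary and Anaconda's python first on the PATH
export PATH=$HOME/phantomjs/bin:$HOME/anaconda/bin:$JAVA_HOME/bin:$PATH
```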
###After testing SpookyStuff, we will then try ISpark and ISpooky
- Move ISpark, ISpooky to cluster
scp -i <key_pair> -r ISpark ubuntu@<amazon_ip>:~/.
scp -i <key_pair> -r ISpooky ubuntu@<amazon_ip>:~/.
- Create an IPython profile named spooky:
ipython profile create spooky
- In ipython_config.py (under ~/.ipython/profile_spooky/), customize as follows:
# Configuration file for ipython.
import os
c = get_config()

SPARK_HOME = os.environ['SPARK_HOME']
# the above line can be replaced with: SPARK_HOME = '${INSERT_INSTALLATION_DIR_OF_SPARK}'
MASTER = 'local[4]'
c.KernelManager.kernel_cmd = [SPARK_HOME + "/bin/spark-submit",
    "--master", MASTER,
    "--class", "org.tribbloid.ispooky.SpookyMain",
    "--executor-memory", "2G",
    "--jars", "/home/ubuntu/spookystuff/shell/target/scala-2.10/spookystuff-shell-assembly-0.3.0-SNAPSHOT.jar",
    "/home/ubuntu/ISpooky/target/ispooky-assembly-0.1.0-SNAPSHOT.jar",  # application jar; later args are passed to it
    "--profile", "{connection_file}",
    "--interp", "Spooky",
    "--parent"]
c.NotebookApp.ip = '*'  # only add this line if you want the IPython notebook to be open to the public
c.NotebookApp.open_browser = False
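With the profile configured, the notebook server can then be started with the standard IPython invocation (it listens on the default port 8888 unless configured otherwise):

```shell
# Start the notebook server using the spooky profile
ipython notebook --profile=spooky
```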
###NOTE for clusters
- Check
.bashrc
and .bash_profile on all nodes
- Copy
phantomjs
to all nodes - or save the binary on HDFS so that every node can access it
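A minimal sketch of the HDFS approach, assuming HDFS is already running and phantomjs was unpacked under ~/phantomjs (both paths are illustrative):

```shell
# Put the phantomjs binary on HDFS so every node can fetch it
hadoop fs -mkdir /tools
hadoop fs -put ~/phantomjs/bin/phantomjs /tools/phantomjs

# On each worker node, pull it down and make it executable
hadoop fs -get /tools/phantomjs /usr/local/bin/phantomjs
chmod +x /usr/local/bin/phantomjs
```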