
@titipata
Last active August 29, 2015 14:13
Notes on how I migrate Spookystuff to other clusters

Spooky Stuff Migration

  • Start the Ubuntu clusters
  • Run the following commands:
sudo apt-get update
sudo apt-get install openjdk-7-jdk  # installs Java
  • Download Anaconda
  • Download PhantomJS
  • Copy spark-1.1.1 to the cluster (we'll switch to 1.2.0 later if it's ready)
scp -i <key_pair> -r spark-1.1.1-bin-hadoop1 ubuntu@<amazon_ip>:~/.
  • Copy Spookystuff to the cluster
scp -i <key_pair> -r spookystuff ubuntu@<amazon_ip>:~/.
  • Then adjust .bash_profile (add JAVA_HOME, the PhantomJS path, Anaconda, and SPARK_HOME) and test Spookystuff
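Before testing Spookystuff, it can help to confirm that the .bash_profile changes actually took effect. The following is a minimal sketch, assuming Python 3 (for `shutil.which`) and assuming that `phantomjs` and Anaconda's `python` are expected on PATH; the function name `check_environment` is hypothetical and not part of Spookystuff.

```python
import os
import shutil


def check_environment(env=None, path_lookup=shutil.which):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    # JAVA_HOME and SPARK_HOME should be set and point at existing directories.
    for var in ("JAVA_HOME", "SPARK_HOME"):
        value = env.get(var)
        if not value:
            problems.append("{} is not set".format(var))
        elif not os.path.isdir(value):
            problems.append("{} points to a missing directory: {}".format(var, value))

    # phantomjs and python (assumed to come from Anaconda) should be on PATH.
    for tool in ("phantomjs", "python"):
        if path_lookup(tool) is None:
            problems.append("{} not found on PATH".format(tool))

    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```

Run it in a fresh login shell on the cluster so that .bash_profile is sourced first.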

### After testing Spookystuff, we will then try ISpark and ISpooky

  • Copy ISpark and ISpooky to the cluster
scp -i <key_pair> -r ISpark ubuntu@<amazon_ip>:~/.
scp -i <key_pair> -r ISpooky ubuntu@<amazon_ip>:~/.
  • Create an IPython profile named spooky: ipython profile create spooky
  • In that profile's ipython_config.py, customize as follows:
# Configuration file for ipython.                                                                                                    
import os
c = get_config()

SPARK_HOME = os.environ['SPARK_HOME']
# the above line can be replaced with: SPARK_HOME = '${INSERT_INSTALLATION_DIR_OF_SPARK}'
MASTER = 'local[4]'

c.KernelManager.kernel_cmd = [SPARK_HOME+"/bin/spark-submit",
 "--master", MASTER,
 "--class", "org.tribbloid.ispooky.SpookyMain",
 "--executor-memory", "2G",
 # --jars takes the extra dependency jar; the path that follows it is the application jar
 "--jars", "/home/ubuntu/spookystuff/shell/target/scala-2.10/spookystuff-shell-assembly-0.3.0-SNAPSHOT.jar",
 "/home/ubuntu/ISpooky/target/ispooky-assembly-0.1.0-SNAPSHOT.jar",
 "--profile", "{connection_file}",
 "--interp", "Spooky",
 "--parent"]

c.NotebookApp.ip = '*'  # add this line only if the IPython notebook should be open to the public
c.NotebookApp.open_browser = False
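The order of arguments in kernel_cmd matters: spark-submit reads `--jars` as a single comma-separated value, treats the first positional argument as the application jar, and passes everything after it to the main class. The sketch below makes that structure explicit. The helper `build_kernel_cmd` is a hypothetical name introduced for illustration, and the paths in the usage lines are the ones from the configuration above.

```python
def build_kernel_cmd(spark_home, master, dep_jars, app_jar, executor_memory="2G"):
    """Assemble the spark-submit argument list used as the IPython kernel command."""
    return [
        spark_home + "/bin/spark-submit",
        "--master", master,
        "--class", "org.tribbloid.ispooky.SpookyMain",
        "--executor-memory", executor_memory,
        # --jars takes one comma-separated value listing extra dependency jars.
        "--jars", ",".join(dep_jars),
        # The first positional argument is the application jar.
        app_jar,
        # Everything after the application jar is passed to the main class.
        "--profile", "{connection_file}",
        "--interp", "Spooky",
        "--parent",
    ]


kernel_cmd = build_kernel_cmd(
    "/home/ubuntu/spark-1.1.1-bin-hadoop1",  # assumed SPARK_HOME after the scp step
    "local[4]",
    ["/home/ubuntu/spookystuff/shell/target/scala-2.10/spookystuff-shell-assembly-0.3.0-SNAPSHOT.jar"],
    "/home/ubuntu/ISpooky/target/ispooky-assembly-0.1.0-SNAPSHOT.jar",
)
# Inside ipython_config.py this list would be assigned to c.KernelManager.kernel_cmd.
```

If more dependency jars are needed later, appending them to `dep_jars` keeps them inside the single `--jars` value instead of accidentally shifting the application jar.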

### Notes for clusters

  • Check .bashrc and .bash_profile
  • Copy PhantomJS to all nodes
  • Use Hadoop HDFS to save files (i.e. so that all nodes can access them)
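Copying PhantomJS to every node can reuse the same scp pattern as the earlier steps. The following is a rough sketch, assuming the `ubuntu` user and the `~/.` destination used above; the host list, key file name, and function names are hypothetical placeholders. By default it only prints the commands so they can be reviewed before anything is copied.

```python
import subprocess


def build_scp_commands(key_pair, source_dir, hosts, user="ubuntu"):
    """Build one recursive scp command per host, mirroring the manual scp steps."""
    return [
        ["scp", "-i", key_pair, "-r", source_dir, "{}@{}:~/.".format(user, host)]
        for host in hosts
    ]


def copy_to_nodes(key_pair, source_dir, hosts, dry_run=True):
    """Print the scp commands, or run them one by one when dry_run is False."""
    for cmd in build_scp_commands(key_pair, source_dir, hosts):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check_call raises CalledProcessError if a copy fails.
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # Placeholder key file and node addresses; replace with the real ones.
    copy_to_nodes("my_key.pem", "phantomjs", ["10.0.0.1", "10.0.0.2"])
```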