Tested with
Python 2.7, Ubuntu 16.04 LTS, Apache Spark 2.1.0 & Hadoop 2.7
Download Apache Spark and build it or download the pre-built version.
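The pre-built download step can be sketched as follows; the archive URL and version pins are assumptions matching the versions listed above (check spark.apache.org/downloads.html for current mirrors):

```shell
# Fetch the pre-built Spark 2.1.0 / Hadoop 2.7 package from the Apache archive
SPARK_PKG="spark-2.1.0-bin-hadoop2.7"
wget "https://archive.apache.org/dist/spark/spark-2.1.0/${SPARK_PKG}.tgz"
tar xzf "${SPARK_PKG}.tgz"
# Put spark-shell, spark-submit, pyspark on the PATH for this session
export SPARK_HOME="$PWD/${SPARK_PKG}"
export PATH="$SPARK_HOME/bin:$PATH"
```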
[zookeeper]
hostname=slave1.example.com
hostname=slave2.example.com
hostname=slave3.example.com
port=2181
timeout=6
lock-path=/burrow/notifier

[kafka "XX-prod"]
broker=slave1.example.com
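The `[kafka]` section above is cut short; a fuller sketch of a Burrow legacy INI cluster section might look like the following. Key names are from the pre-1.0 Burrow config format and should be checked against the Burrow wiki; all values are placeholders:

```ini
[kafka "XX-prod"]
broker=slave1.example.com
broker-port=9092
zookeeper=slave1.example.com
zookeeper-port=2181
zookeeper-path=/kafka
offsets-topic=__consumer_offsets
```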
import org.elasticsearch.spark._
import org.apache.spark.sql._

// spark-shell already provides sqlContext; in a standalone app create one:
// val sqlContext = new SQLContext(sc)
val options = Map("pushdown" -> "true", "es.nodes" -> "host_ip_here", "es.port" -> "9200",
  "es.nodes.wan.only" -> "true")

// Read the index as a DataFrame and dump it to JSON files
sqlContext.read.format("es").options(options).load("index_name")
  .write.mode(SaveMode.Overwrite).json("path_to_output")

// Or read it as an RDD of (documentId, document) pairs
sc.esRDD("index_name", options)
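For these imports to resolve, the elasticsearch-hadoop connector has to be on the shell's classpath. One way to do that is shown below; the artifact coordinates are an assumption (Spark 2.x / Scala 2.11 build), so match the version to your Elasticsearch cluster:

```shell
# Pull the elasticsearch-spark connector from Maven Central at startup
spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.2.0
```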
sudo su
mkdir -p /etc/elasticsearch/analysis
cd /etc/elasticsearch/analysis
wget http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz
tar xvzf WNprolog-3.0.tar.gz
mv prolog/wn_s.pl .
rm -rf prolog
rm -f WNprolog-3.0.tar.gz
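The downloaded wn_s.pl file can then be referenced from a WordNet-format synonym token filter in the index settings. A sketch (the filter and analyzer names here are made up; the `"format": "wordnet"` option is what tells Elasticsearch to parse the Prolog file):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "wn_synonyms": {
          "type": "synonym",
          "format": "wordnet",
          "synonyms_path": "analysis/wn_s.pl"
        }
      },
      "analyzer": {
        "synonyms": {
          "tokenizer": "standard",
          "filter": ["lowercase", "wn_synonyms"]
        }
      }
    }
  }
}
```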
Elasticsearch missing filter with nested objects
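A missing/exists check on a nested field has to be wrapped in a nested query so it runs against the nested documents rather than the root document. A sketch (field names are placeholders; note that parents with no nested objects at all will also match):

```json
{
  "query": {
    "bool": {
      "must_not": {
        "nested": {
          "path": "comments",
          "query": { "exists": { "field": "comments.author" } }
        }
      }
    }
  }
}
```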
Tested with
Python 2.7, OS X 10.11.3 El Capitan, Apache Spark 2.1.0 & Hadoop 2.7
Download Apache Spark and build it or download the pre-built version.
hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dfs.s3a.awsAccessKeyId=XXX -Dfs.s3a.awsSecretAccessKey=XXXX -m 250 hdfs:///data/* s3a://api-v3-data-sources/output/
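Note the credential property names: newer Hadoop releases renamed the s3a keys to the dotted form, while older s3a builds used the `awsAccessKeyId`-style names shown above. On a newer release the equivalent command would be (same placeholders; check the hadoop-aws docs for your version):

```shell
# distcp HDFS data to S3 using the renamed s3a credential properties
hadoop distcp \
  -Dmapreduce.map.memory.mb=4096 \
  -Dfs.s3a.access.key=XXX \
  -Dfs.s3a.secret.key=XXXX \
  -m 250 \
  hdfs:///data/* s3a://api-v3-data-sources/output/
```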
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-MacOSX-x86_64.sh
bash Anaconda2-4.3.1-MacOSX-x86_64.sh
/Users/zeta/anaconda/bin/pip install matlab_kernel
/Users/zeta/anaconda/bin/python -m matlab_kernel install   # register the kernel with Jupyter (per the matlab_kernel docs)
pip install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .
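These steps put the packages at the root of the archive, which is the layout `spark-submit --py-files` (via Python's zipimport) expects. A quick local sanity check of that layout, using a stand-in package (`mypkg` is hypothetical):

```shell
# Build a fake dependency, zip it the same way, and confirm it imports
# straight from the archive root
mkdir -p dependencies/mypkg
printf 'VALUE = 42\n' > dependencies/mypkg/__init__.py
(cd dependencies && zip -qr ../dependencies.zip .)
PYTHONPATH=dependencies.zip python3 -c "import mypkg; print(mypkg.VALUE)"  # prints 42
```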