This allows us to refer to other nodes using names instead of IP addresses.
sudo nano /etc/hosts
- Example on master:
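A sketch of what this might look like, assuming the master sits at 192.168.1.4 (the address used later for the web UI); the worker addresses and the second worker slave02 are made up for illustration:
192.168.1.4 master
192.168.1.5 slave01
192.168.1.6 slave02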
- Control+X, then Y to save.
So Spark can communicate with the other servers without prompting for a password each time.
ssh-keygen
- Hit Enter to accept the default file location.
- Press Enter twice more to skip the passphrase.
- Copy key to all other VMs
ssh-copy-id youruser@slave01
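Repeat for every worker, then confirm that a passwordless login works; for example (slave02 is a hypothetical second worker):
ssh-copy-id youruser@slave02
ssh youruser@slave01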
Note: if the master cannot connect to a worker, check the permissions on the home and SSH directories: /home/ufo should be 700, /home/ufo/.ssh should be 700, and /home/ufo/.ssh/authorized_keys should be 600.
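If those permissions are off, something along these lines on the worker should correct them (using the ufo user from the note above):
chmod 700 /home/ufo
chmod 700 /home/ufo/.ssh
chmod 600 /home/ufo/.ssh/authorized_keys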
- Review current firewall rules:
sudo iptables --line-numbers -vnL
- Spark requires ports 4040, 6066, 7077, 8080, and 8081 to be open. The rules allowing traffic on these ports need to sit above the REJECT rule, which in our case is at line 5 (see the example after this list).
- Repeat for each port on all machines
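For example, to insert an ACCEPT rule for port 7077 above a REJECT rule sitting at line 5 of the INPUT chain (adjust the line number to match your own listing):
sudo iptables -I INPUT 5 -p tcp --dport 7077 -j ACCEPT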
Default language for Spark
wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz
tar xvf scala-2.11.7.tgz
sudo mv scala-2.11.7 /usr/lib
sudo ln -s /usr/lib/scala-2.11.7 /usr/lib/scala
export PATH=$PATH:/usr/lib/scala/bin
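Note that this export only lasts for the current shell session; one way to make it permanent, assuming bash is your login shell, is to append it to ~/.bashrc:
echo 'export PATH=$PATH:/usr/lib/scala/bin' >> ~/.bashrc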
- Verify installation
scala -version
The star of the party
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
tar xvf spark-1.6.0-bin-hadoop2.6.tgz
export SPARK_HOME=$HOME/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
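As with Scala, these exports disappear when the session ends; appending them to ~/.bashrc (again assuming bash) keeps them across logins:
echo 'export SPARK_HOME=$HOME/spark-1.6.0-bin-hadoop2.6' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bashrc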
So the primary node can start workers remotely
cd spark-1.6.0-bin-hadoop2.6/conf
touch slaves
sudo nano slaves
- Enter the following:
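A sketch of the file, with one worker hostname per line (slave02 is a hypothetical second worker; add the master's hostname as well if you also want a worker running there):
slave01
slave02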
- Control+X, then Y to save.
Spark was having trouble communicating with the master node by name, so this step was added to work around that.
- Skip this step if you are already in the conf directory:
cd spark-1.6.0-bin-hadoop2.6/conf
- touch spark-env.sh
- sudo nano spark-env.sh
- Enter the following (replace master IP address with actual):
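A minimal sketch, assuming the master's address is 192.168.1.4; for Spark 1.6 the variable to set is SPARK_MASTER_IP:
export SPARK_MASTER_IP=192.168.1.4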
- Control+X, then Y to save.
- From the Spark directory, run:
sbin/start-all.sh
- Navigate to http://192.168.1.4:8080 (replace with your master node's IP address) to access the Spark administration page.
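If the workers registered correctly they will be listed on that page. As a further check, you can open a shell against the cluster (assuming the default master port of 7077):
spark-shell --master spark://192.168.1.4:7077
To stop the cluster when you are done, from the Spark directory:
sbin/stop-all.sh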