This allows us to refer to other nodes using names instead of IP addresses.
sudo nano /etc/hosts
- Example on master (a sample follows this list):
- Control+X, then Y to save.
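A hosts file on the master might look like the following; the master address 192.168.1.4 and the hostname slave01 appear elsewhere in this guide, while the other addresses and slave02 are placeholders for your own machines:

192.168.1.4   master
192.168.1.5   slave01
192.168.1.6   slave02

Add the same entries on every node so each machine can resolve the others by name.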
So Spark can communicate with other servers without continually entering passwords
ssh-keygen
- Hit enter. Do not specify a file name.
- Press enter again twice to skip passphrase.
- Copy key to all other VMs
ssh-copy-id youruser@slave01
Note: if the master cannot connect to a worker, check the permissions on the home and SSH directories:
- /home/ufo permissions are 700
- /home/ufo/.ssh permissions are 700
- /home/ufo/.ssh/authorized_keys permissions are 600
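If those modes are off, a quick way to reset them (assuming the same /home/ufo home directory from the note above; substitute your own user) is:

# run on the worker that refuses the connection
chmod 700 /home/ufo
chmod 700 /home/ufo/.ssh
chmod 600 /home/ufo/.ssh/authorized_keys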
- Review current firewall rules:
sudo iptables --line -vnL
- Spark requires ports 4040, 6066, 7077, 8080, and 8081 to be open. The rules allowing traffic on these ports need to go above the REJECT rule, in our case usually line 5 (an example rule follows this list).
- Repeat for each port on all machines
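One way to add such a rule is to insert it above the REJECT entry; the line number 5 comes from the listing above and port 7077 is just one of the ports in the list, so adjust both for your setup:

sudo iptables -I INPUT 5 -p tcp --dport 7077 -j ACCEPT

Repeat the command for each remaining port, then save the rules with whatever mechanism your distribution uses so they survive a reboot.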
Default language for Spark
wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz
tar xvf scala-2.11.7.tgz
sudo mv scala-2.11.7 /usr/lib
sudo ln -s /usr/lib/scala-2.11.7 /usr/lib/scala
export PATH=$PATH:/usr/lib/scala/bin
- Verify installation
scala -version
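Note that the export above only lasts for the current shell session; assuming bash, one way to make it permanent is to append it to your profile:

echo 'export PATH=$PATH:/usr/lib/scala/bin' >> ~/.bashrc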
The star of the party
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
tar xvf spark-1.6.0-bin-hadoop2.6.tgz
export SPARK_HOME=$HOME/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
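To sanity-check the download before wiring up the cluster, you can run one of the examples bundled with the distribution in local mode, for instance SparkPi:

$SPARK_HOME/bin/run-example SparkPi 10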
So the primary node can start workers remotely
cd spark-1.6.0-bin-hadoop2.6/conf
touch slaves
sudo nano slaves
- Enter the following (a sample follows this list):
- Control+X, then Y to save.
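The slaves file simply lists the worker hostnames, one per line; slave01 matches the host used earlier, and slave02 is a placeholder for any additional workers:

slave01
slave02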
Spark was having trouble communicating with the master node by name, so the author added this step
- Skip this step if you are already in the conf directory:
cd spark-1.6.0-bin-hadoop2.6/conf
- touch spark-env.sh
- sudo nano spark-env.sh
- Enter the following, replacing the master IP address with your own (a sample follows this list):
- Control+X, then Y to save.
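The original contents are not shown here; a minimal sketch that pins the master to an address rather than a name, using the Spark 1.6 standalone variable SPARK_MASTER_IP and the example address from the next step, would be:

# assumed content: bind the standalone master to an explicit IP
export SPARK_MASTER_IP=192.168.1.4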
- (From the Spark directory)
sbin/start-all.sh
- Navigate to http://192.168.1.4:8080 (replace with your master node's IP address) to access the Spark administration page.
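When you want to bring the master and workers back down, Spark ships a matching shutdown script, run from the same directory:

sbin/stop-all.sh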