This allows us to refer to other nodes using names instead of IP addresses.
sudo nano /etc/hosts
- Example on master:
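A sketch of what this might look like, assuming the master sits at 192.168.1.4 (the address used later for the web UI); the worker addresses and the second worker slave02 are made up for illustration:
192.168.1.4 master
192.168.1.5 slave01
192.168.1.6 slave02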
- Control+X, then Y to save.
So Spark can communicate with the other servers without prompting for a password each time.
ssh-keygen
- Hit Enter to accept the default file location.
- Press Enter twice more to skip the passphrase.
- Copy key to all other VMs
ssh-copy-id youruser@slave01
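Repeat for every worker, then confirm that a passwordless login works; for example (slave02 is a hypothetical second worker):
ssh-copy-id youruser@slave02
ssh youruser@slave01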
Note: if the master cannot connect to a worker, check the permissions on the home and SSH directories: /home/ufo should be 700, /home/ufo/.ssh should be 700, and /home/ufo/.ssh/authorized_keys should be 600.
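If those permissions are off, something along these lines on the worker should correct them (using the ufo user from the note above):
chmod 700 /home/ufo
chmod 700 /home/ufo/.ssh
chmod 600 /home/ufo/.ssh/authorized_keys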
- Review current firewall rules:
sudo iptables --line-numbers -vnL
- Spark requires ports 4040, 6066, 7077, 8080, and 8081 to be open. The rules allowing traffic on these ports need to sit above the REJECT rule, which in our case is at line 5 (see the example after this list).
- Repeat for each port on all machines
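For example, to insert an ACCEPT rule for port 7077 above a REJECT rule sitting at line 5 of the INPUT chain (adjust the line number to match your own listing):
sudo iptables -I INPUT 5 -p tcp --dport 7077 -j ACCEPT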
Default language for Spark
wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz
tar xvf scala-2.11.7.tgz
sudo mv scala-2.11.7 /usr/lib
sudo ln -s /usr/lib/scala-2.11.7 /usr/lib/scala
export PATH=$PATH:/usr/lib/scala/bin
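Note that this export only lasts for the current shell session; one way to make it permanent, assuming bash is your login shell, is to append it to ~/.bashrc:
echo 'export PATH=$PATH:/usr/lib/scala/bin' >> ~/.bashrc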
- Verify installation
scala -version
The star of the party
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
tar xvf spark-1.6.0-bin-hadoop2.6.tgz
export SPARK_HOME=$HOME/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
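As with Scala, these exports disappear when the session ends; appending them to ~/.bashrc (again assuming bash) keeps them across logins:
echo 'export SPARK_HOME=$HOME/spark-1.6.0-bin-hadoop2.6' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bashrc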
So the primary node can start workers remotely
cd spark-1.6.0-bin-hadoop2.6/conf
touch slaves
sudo nano slaves
- Enter the following:
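A sketch of the file, with one worker hostname per line (slave02 is a hypothetical second worker; add the master's hostname as well if you also want a worker running there):
slave01
slave02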
- Control+X, then Y to save.
Spark was having trouble communicating with the master node by name, so this step was added to work around that.
- Skip this step if you are already in the conf directory:
cd spark-1.6.0-bin-hadoop2.6/conf
- touch spark-env.sh
- sudo nano spark-env.sh
- Enter the following (replace master IP address with actual):
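A minimal sketch, assuming the master's address is 192.168.1.4; for Spark 1.6 the variable to set is SPARK_MASTER_IP:
export SPARK_MASTER_IP=192.168.1.4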
- Control+X, then Y to save.
- From the Spark directory, run:
sbin/start-all.sh
- Navigate to http://192.168.1.4:8080 (replace with your master node's IP address) to access the Spark administration page.
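If the workers registered correctly they will be listed on that page. As a further check, you can open a shell against the cluster (assuming the default master port of 7077):
spark-shell --master spark://192.168.1.4:7077
To stop the cluster when you are done, from the Spark directory:
sbin/stop-all.sh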