- Download Spark 1.4 to your local machine (laptop or PC)
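If you prefer to fetch it from the command line, the 1.4.0 release prebuilt against Hadoop 2.6 is available from the Apache archive (any mirror works equally well):

```shell
# Download the Spark 1.4.0 release prebuilt for Hadoop 2.6
wget http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
```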
- Open your router's admin page (e.g. 192.168.1.1) to find the local IPs of newly connected RPis
ssh [email protected]
(default password forpi
user israspberry
)
- Enter config:
sudo raspi-config
- Choose expand filesystem (this allows the OS to take up the full size of the SD card)
- Change the hostname of the device to something like rpi007 (under advanced options)
- When exiting the config, choose to reboot so that changes take effect
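The same two changes can be scripted, which is handy once you are imaging several SD cards. This sketch assumes a Raspbian image whose raspi-config supports non-interactive (`nonint`) mode:

```shell
# Non-interactive equivalents of the raspi-config steps above
# (assumes raspi-config with "nonint" mode support).
sudo raspi-config nonint do_expand_rootfs    # expand filesystem to fill the SD card
sudo raspi-config nonint do_hostname rpi007  # set the hostname
sudo reboot                                  # reboot so both changes take effect
```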
A Spark cluster will need ssh access between nodes using the same username, so let's configure a spark
user for this node.
- add new user:
sudo adduser spark
(for simplicity, use the same password on all RPis)
- Add the spark user to the sudo group:
sudo adduser spark sudo
CTRL+D
to log out of the SSH session (we'll log back in as the spark user)
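Since the cluster will need passwordless SSH between the spark users on each node, it is worth setting up key-based authentication once the spark account exists. The hostname below is an example from this guide; repeat the `ssh-copy-id` for each node:

```shell
# Run as the spark user on the node you'll drive the cluster from.
# Generate a passphrase-less key, then copy it to each other node.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id spark@rpi007.home
```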
We downloaded Spark 1.4 to our local machine earlier; now it's time to transfer the file to the new RPi with scp, which copies files securely over SSH. Run the following command from your local machine.
scp spark-1.4.0-bin-hadoop2.6.tgz [email protected]:spark-1.4.0-bin-hadoop2.6.tgz
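If you are setting up several RPis at once, the same scp can be looped over a list of hostnames (the list here is illustrative; adjust it to your nodes):

```shell
# Copy the Spark tarball to every node in one go
for host in rpi001 rpi002 rpi007; do
  scp spark-1.4.0-bin-hadoop2.6.tgz "spark@${host}.home:"
done
```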
With the file transferred to the new RPi, let's log into the spark user we created earlier to set up spark.
ssh [email protected]
- Extract spark:
tar xvfz spark-1.4.0-bin-hadoop2.6.tgz
Note that Spark produces a large amount of log output by default (we will turn this down later).
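For reference, quieting the logs in Spark 1.x is a one-file change, made from inside the Spark directory; the WARN level here is just a suggestion:

```shell
# Create a log4j config from the bundled template and raise the
# threshold so the shells only print warnings and errors.
cp conf/log4j.properties.template conf/log4j.properties
sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' conf/log4j.properties
```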
- Go to the new folder:
cd spark-1.4.0-bin-hadoop2.6
- Run one of the bundled examples to check that everything works:
bin/run-example SparkPi 10
(estimates Pi, splitting the work across 10 partitions)
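SparkPi estimates Pi with a Monte Carlo method, spreading the random samples over the given number of partitions. The single-threaded sketch below (plain awk, no Spark needed) shows the idea:

```shell
# Monte Carlo Pi: throw random points at the unit square and count
# how many land inside the quarter circle. SparkPi does the same,
# but splits the samples across partitions.
awk 'BEGIN {
  srand(42); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y < 1) inside++
  }
  printf "Pi is roughly %f\n", 4 * inside / n
}'
```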
- Start the interactive Scala shell with 4 local worker threads:
bin/spark-shell --master local[4]
scala> sc.textFile("README.md").count
- To see what Spark is doing, open the web UI at http://rpi007.home:4040/ (substitute your node's hostname)
ctrl+D
quits the shell
- The Python shell works the same way:
bin/pyspark --master local[4]
>>> sc.textFile("README.md").count()
CTRL+D
quits the shell
That concludes everything required to set up an individual Spark node. In the next section we'll cover how to get our individual nodes to act as a cluster.
- Download Hadoop 2.6
wget http://apache.mirror.digitalpacific.com.au/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
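Once the download finishes, a quick sanity check is worthwhile before moving on; the extraction mirrors what we did for Spark:

```shell
# Unpack Hadoop next to Spark and confirm the version runs
tar xvfz hadoop-2.6.0.tar.gz
hadoop-2.6.0/bin/hadoop version
```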