Deploying multinode Hadoop 2.0 cluster using Apache Ambari
1. Create the servers.
To actually create the servers, I will use a slightly modified version of the bulk servers create script: one server for Apache Ambari and a number of servers for the Hadoop cluster itself. I will then use Ambari to install Hadoop onto the cluster servers.
So, basically, I have created the following servers (a sketch of the creation commands appears after the list):
myhadoop-Ambari
myhadoop1
myhadoop2
myhadoop3
myhadoop4
myhadoop5
and I have recorded the hostname, public/private IP addresses, and root password for each.
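For reference, the bulk creation boils down to a loop like the following. This is only a sketch: it assumes an OpenStack-compatible cloud with the nova client installed and credentials already exported, and <image-id>/<flavor-id> are placeholders, not values from this article.
#!/bin/bash
# Sketch only: boot one Ambari server and five Hadoop nodes.
# Replace <image-id> and <flavor-id> with real values from your provider.
for name in myhadoop-Ambari myhadoop1 myhadoop2 myhadoop3 myhadoop4 myhadoop5; do
    nova boot --image <image-id> --flavor <flavor-id> "$name"
done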
2. Prepare the servers.
SSH into the newly created Ambari server, e.g. myhadoop-Ambari. Update its /etc/hosts file with an entry for each of the servers above; an example of the entries follows the listing below.
Also create a hosts.txt file with the hostnames of the servers from above.
root@myhadoop-Ambari$ cat hosts.txt
myhadoop1
myhadoop2
myhadoop3
myhadoop4
myhadoop5
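The /etc/hosts entries themselves look like this; the addresses below are made-up placeholders, so substitute the private IPs you recorded earlier:
# /etc/hosts (example; the addresses are hypothetical)
10.0.0.10   myhadoop-Ambari
10.0.0.11   myhadoop1
10.0.0.12   myhadoop2
10.0.0.13   myhadoop3
10.0.0.14   myhadoop4
10.0.0.15   myhadoop5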
At this point, from the same Ambari server, run the following script, which will SSH into each of the servers listed in the hosts.txt file and set it up.
Specifically, the script will set up passwordless SSH between the servers and also disable iptables, among other things.
prepare-cluster.sh
#!/bin/bash
set -x
# Generate SSH keys
ssh-keygen -t rsa
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
cd ~
# Distribute SSH keys
for host in $(cat hosts.txt); do
    cat ~/.ssh/id_rsa.pub | ssh root@$host "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"
    cat ~/.ssh/id_rsa | ssh root@$host "cat > ~/.ssh/id_rsa; chmod 400 ~/.ssh/id_rsa"
    cat ~/.ssh/id_rsa.pub | ssh root@$host "cat > ~/.ssh/id_rsa.pub"
done
# Distribute hosts file
for host in $(cat hosts.txt); do
    scp /etc/hosts root@$host:/etc/hosts
done
# Prepare other basic things
for host in $(cat hosts.txt); do
    ssh root@$host "sed -i s/SELINUX=enforcing/SELINUX=disabled/g /etc/selinux/config"
    ssh root@$host "chkconfig iptables off"
    ssh root@$host "/etc/init.d/iptables stop"
    echo "enabled=0" | ssh root@$host "cat > /etc/yum/pluginconf.d/refresh-packagekit.conf"
done
Note that this step will ask for the root password of each server before setting it up for passwordless access.
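Once the script finishes, you can quickly confirm that passwordless SSH is actually in place before moving on. This check uses only standard OpenSSH options:
# BatchMode=yes makes ssh fail instead of prompting, so any host that
# still wants a password shows up as an error rather than a hang.
for host in $(cat hosts.txt); do
    ssh -o BatchMode=yes root@$host hostname
done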
3. Install Ambari.
While still on the Ambari server, run the following script, which will install Apache Ambari.
install-ambari-server.sh
#!/bin/bash
set -x
if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root"
    exit 1
fi
# Install Ambari server
cd ~
wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/GA/ambari.repo
cp ambari.repo /etc/yum.repos.d/
yum install -y epel-release
yum repolist
yum install -y ambari-server
# Setup Ambari server (-s runs the setup silently with default options)
ambari-server setup -s
# Start Ambari server
ambari-server start
ps -ef | grep Ambari
Once the installation completes, you should be able to point a browser at the Ambari server and access its web interface:
http://myhadoop-Ambari:8080
The default username and password are admin/admin.
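You can also verify from the command line that Ambari is up, since it exposes a REST API on the same port. This check assumes the default admin/admin credentials have not been changed yet:
# Should return a JSON document (an empty cluster list on a fresh install)
curl -u admin:admin http://myhadoop-Ambari:8080/api/v1/clusters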
4. Install Hadoop.
Once logged into the Ambari web portal, it is pretty intuitive to create a Hadoop cluster through its wizard.
It will ask for the hostnames and the SSH private key, both of which you can get from the Ambari server:
root@myhadoop-Ambari$ cat hosts.txt
root@myhadoop-Ambari$ cat ~/.ssh/id_rsa
You should be able to just follow the wizard and complete the Hadoop 2.0 installation at this point. The process to install Hadoop 1.x is almost exactly the same, although some of the services, like YARN, don't exist there.
Apache Ambari will let you install a plethora of services, including HDFS, YARN, MapReduce2, HBase, Hive, Oozie, Ganglia, Nagios, ZooKeeper, and the Hive and Pig clients. As you go through the installation wizard, you can choose which service goes on which server.
5. Validate Hadoop.
SSH to myhadoop1 and run the following script to do a wordcount on all the works of Shakespeare.
wordcount2.sh
#!/bin/bash
set -x
# Remove output from any previous run; the error is harmless if the
# directory does not exist yet
su - hdfs -c "hadoop fs -rm -r /shakespeare"
cd /tmp
wget http://homepages.ihug.co.nz/~leonov/shakespeare.tar.bz2
tar xjvf shakespeare.tar.bz2
now=$(date +"%y%m%d-%H%M")
# -p creates /shakespeare and the timestamped subdirectory in one go
su - hdfs -c "hadoop fs -mkdir -p /shakespeare/$now"
su - hdfs -c "hadoop fs -put /tmp/Shakespeare /shakespeare/$now/input"
su - hdfs -c "hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-76.jar wordcount /shakespeare/$now/input /shakespeare/$now/output"
su - hdfs -c "hadoop fs -cat /shakespeare/$now/output/part-r-* | sort -nk2"
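To double-check the job itself, you can list the output directory; MapReduce writes a zero-length _SUCCESS marker there when a job completes cleanly. Replace <timestamp> with the value of $now from the run:
# Expect part-r-* files (the word counts) and a _SUCCESS marker
su - hdfs -c "hadoop fs -ls /shakespeare/<timestamp>/output"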
So you have your first Hadoop 2.0 cluster running and validated. Feel free to look into the scripts; they are mostly instructions from