Deploying multinode Hadoop 2.0 cluster using Apache Ambari
1. Create the servers.
To actually create the servers, I will use a slightly modified version of the bulk servers create script: one server for Apache Ambari and a number of servers for the Hadoop cluster itself. I will then use Ambari to install Hadoop onto the cluster servers.
So, basically, I have created the following servers (a sketch of the creation commands appears after the list):
myhadoop-Ambari
myhadoop1
myhadoop2
myhadoop3
myhadoop4
myhadoop5
and I have recorded the hostname, public/private IP addresses, and root password for each.
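For reference, the bulk creation boils down to a loop like the following. This is only a sketch: it assumes an OpenStack-compatible cloud with the nova client installed and credentials already exported, and <image-id>/<flavor-id> are placeholders, not values from this article.
#!/bin/bash
# Sketch only: boot one Ambari server and five Hadoop nodes.
# Replace <image-id> and <flavor-id> with real values from your provider.
for name in myhadoop-Ambari myhadoop1 myhadoop2 myhadoop3 myhadoop4 myhadoop5; do
    nova boot --image <image-id> --flavor <flavor-id> "$name"
done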
2. Prepare the servers.
SSH into the newly created Ambari server, e.g. myhadoop-Ambari. Update its /etc/hosts file with an entry for each of the servers above; an example of the entries follows the listing below.
Also create a hosts.txt file with the hostnames of the servers from above.
root@myhadoop-Ambari$ cat hosts.txt
myhadoop1
myhadoop2
myhadoop3
myhadoop4
myhadoop5
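The /etc/hosts entries themselves look like this; the addresses below are made-up placeholders, so substitute the private IPs you recorded earlier:
# /etc/hosts (example; the addresses are hypothetical)
10.0.0.10   myhadoop-Ambari
10.0.0.11   myhadoop1
10.0.0.12   myhadoop2
10.0.0.13   myhadoop3
10.0.0.14   myhadoop4
10.0.0.15   myhadoop5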
At this point, from the same Ambari server, run the following script, which will SSH into each of the servers listed in the hosts.txt file and set it up.
Specifically, the script will set up passwordless SSH between the servers and also disable iptables, among other things.
prepare-cluster.sh
#!/bin/bash
set -x
# Generate SSH keys
ssh-keygen -t rsa
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
cd ~
# Distribute SSH keys
for host in $(cat hosts.txt); do
    cat ~/.ssh/id_rsa.pub | ssh root@$host "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"
    cat ~/.ssh/id_rsa | ssh root@$host "cat > ~/.ssh/id_rsa; chmod 400 ~/.ssh/id_rsa"
    cat ~/.ssh/id_rsa.pub | ssh root@$host "cat > ~/.ssh/id_rsa.pub"
done
# Distribute hosts file
for host in $(cat hosts.txt); do
    scp /etc/hosts root@$host:/etc/hosts
done
# Prepare other basic things
for host in $(cat hosts.txt); do
    ssh root@$host "sed -i s/SELINUX=enforcing/SELINUX=disabled/g /etc/selinux/config"
    ssh root@$host "chkconfig iptables off"
    ssh root@$host "/etc/init.d/iptables stop"
    echo "enabled=0" | ssh root@$host "cat > /etc/yum/pluginconf.d/refresh-packagekit.conf"
done
Note that this step will ask for the root password of each server before setting it up for passwordless access.
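Once the script finishes, you can quickly confirm that passwordless SSH is actually in place before moving on. This check uses only standard OpenSSH options:
# BatchMode=yes makes ssh fail instead of prompting, so any host that
# still wants a password shows up as an error rather than a hang.
for host in $(cat hosts.txt); do
    ssh -o BatchMode=yes root@$host hostname
done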
3. Install Ambari.
While still on the Ambari server, run the following script, which will install Apache Ambari.
install-ambari-server.sh
#!/bin/bash
set -x
if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root"
    exit 1
fi
# Install Ambari server
cd ~
wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/GA/ambari.repo
cp ambari.repo /etc/yum.repos.d/
yum install -y epel-release
yum repolist
yum install -y ambari-server
# Setup Ambari server (-s runs the setup silently with default options)
ambari-server setup -s
# Start Ambari server
ambari-server start
ps -ef | grep Ambari
Once the installation completes, you should be able to point a browser at the Ambari server and access its web interface:
http://myhadoop-Ambari:8080
The default username and password are admin/admin.
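You can also verify from the command line that Ambari is up, since it exposes a REST API on the same port. This check assumes the default admin/admin credentials have not been changed yet:
# Should return a JSON document (an empty cluster list on a fresh install)
curl -u admin:admin http://myhadoop-Ambari:8080/api/v1/clusters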
4. Install Hadoop.
Once logged into the Ambari web portal, it is pretty intuitive to create a Hadoop cluster through its wizard.
It will ask for the hostnames and the SSH private key, both of which you can get from the Ambari server:
root@myhadoop-Ambari$ cat hosts.txt
root@myhadoop-Ambari$ cat ~/.ssh/id_rsa
You should be able to just follow the wizard and complete the Hadoop 2.0 installation at this point. The process to install Hadoop 1.x is almost exactly the same, although some of the services, like YARN, don't exist there.
Apache Ambari will let you install a plethora of services, including HDFS, YARN, MapReduce2, HBase, Hive, Oozie, Ganglia, Nagios, ZooKeeper, and the Hive and Pig clients. As you go through the installation wizard, you can choose which service goes on which server.
5. Validate Hadoop.
SSH to myhadoop1 and run the following script to do a wordcount on all the works of Shakespeare.
wordcount2.sh
#!/bin/bash
set -x
# Remove output from any previous run; the error is harmless if the
# directory does not exist yet
su - hdfs -c "hadoop fs -rm -r /shakespeare"
cd /tmp
wget http://homepages.ihug.co.nz/~leonov/shakespeare.tar.bz2
tar xjvf shakespeare.tar.bz2
now=$(date +"%y%m%d-%H%M")
# -p creates /shakespeare and the timestamped subdirectory in one go
su - hdfs -c "hadoop fs -mkdir -p /shakespeare/$now"
su - hdfs -c "hadoop fs -put /tmp/Shakespeare /shakespeare/$now/input"
su - hdfs -c "hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-76.jar wordcount /shakespeare/$now/input /shakespeare/$now/output"
su - hdfs -c "hadoop fs -cat /shakespeare/$now/output/part-r-* | sort -nk2"
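To double-check the job itself, you can list the output directory; MapReduce writes a zero-length _SUCCESS marker there when a job completes cleanly. Replace <timestamp> with the value of $now from the run:
# Expect part-r-* files (the word counts) and a _SUCCESS marker
su - hdfs -c "hadoop fs -ls /shakespeare/<timestamp>/output"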
So you have your first Hadoop 2.0 cluster running and validated. Feel free to look into the scripts; they are mostly instructions from