@gopal1996
Last active October 30, 2020 21:32

Install Hadoop on Ubuntu

  • Install Ubuntu
  • Log in to Ubuntu
  • Switch to root

Create user

# adduser hadoop

Install Oracle JDK

Download the JDK from the link below and extract it into the /opt directory. Oracle JDK

# cd /opt
# tar -xzvf jdk-8u271-linux-x64.tar.gz

Set JDK 1.8.0_271 as the default JVM:

# update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_271/bin/java 100
# update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_271/bin/javac 100

After installation, verify that Java has been configured successfully:

# update-alternatives --display java
# update-alternatives --display javac

Configure passwordless SSH

Install the OpenSSH server and client:

# apt-get install openssh-server openssh-client

Generate a public/private key pair with the following command. The terminal will prompt for a file name; press ENTER to accept the default. Then append the public key from id_rsa.pub to authorized_keys.

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
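The prompts can also be avoided entirely. As a self-contained sketch (the /tmp/ssh-demo path is illustrative; the real setup uses the default ~/.ssh/id_rsa), `-N ""` sets an empty passphrase and `-f` fixes the output path so no prompt appears:

```shell
# Demo key pair in a throwaway directory (illustrative path, not ~/.ssh).
KEYDIR=/tmp/ssh-demo
rm -rf "$KEYDIR" && mkdir -p "$KEYDIR"
# -q: quiet, -N "": empty passphrase, -f: output file (no prompts).
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"
# id_rsa.pub is the public half that gets appended to authorized_keys.
ls "$KEYDIR"
```

Once the public key is in authorized_keys, `ssh localhost` should log in without a password prompt.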

Install Hadoop

Download and extract Hadoop from the official Apache archive:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar -xzvf hadoop-2.8.5.tar.gz

Setting up the environment variables

nano ~/.bashrc

Edit .bashrc for the hadoop user and add the following Hadoop environment variables:

export HADOOP_HOME=/home/hadoop/hadoop-2.8.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Source .bashrc in the current login session:

source ~/.bashrc
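A quick sanity check that the variables took effect (a sketch with the values repeated inline so it works even before .bashrc is edited):

```shell
# Same values as in ~/.bashrc, repeated inline for a self-contained check.
export HADOOP_HOME=/home/hadoop/hadoop-2.8.5
export PATH="$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"
# Both the sbin and bin directories should now appear on PATH.
echo "HADOOP_HOME=$HADOOP_HOME"
echo "$PATH" | tr ':' '\n' | grep 'hadoop-2.8.5'
```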

Configure Hadoop

hadoop-env.sh

cd hadoop-2.8.5
nano etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_271
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/hadoop-2.8.5/etc/hadoop"}

core-site.xml

nano etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadooptmpdata</value>
  </property>
</configuration>

In addition, create the directory under the hadoop user's home folder:

cd
mkdir hadooptmpdata

hdfs-site.xml

Create the NameNode and DataNode storage directories under the home folder, then edit the file:

mkdir -p hdfs/namenode
mkdir -p hdfs/datanode
cd hadoop-2.8.5
nano etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
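Each setting needs its own <property> element containing exactly one <name>/<value> pair, and the file must be well-formed XML or Hadoop will fail to load it. As an optional sanity check, the file can be run through an XML parser; this sketch writes a sample copy to /tmp and parses it with Python's standard-library parser (point it at the real etc/hadoop/hdfs-site.xml on your machine):

```shell
# Write a sample hdfs-site.xml to /tmp (a stand-in for the real file).
cat > /tmp/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
# Parse it; a malformed file raises an error instead of printing OK.
python3 -c "import xml.etree.ElementTree as ET; ET.parse('/tmp/hdfs-site.xml'); print('OK: well-formed XML')"
```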

mapred-site.xml

Copy mapred-site.xml.template to mapred-site.xml using the cp command, then edit the mapred-site.xml in etc/hadoop under the Hadoop installation directory with the following changes:

cd
cd hadoop-2.8.5/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml
nano mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml

nano yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Start Hadoop Cluster

cd
hdfs namenode -format
start-dfs.sh
start-yarn.sh

To verify that all the Hadoop services/daemons started successfully, use the jps command.

/opt/jdk1.8.0_271/bin/jps

20035 SecondaryNameNode
19782 DataNode
21671 Jps
20343 NodeManager
19625 NameNode
20187 ResourceManager

Hadoop version

hadoop version

Access the Namenode and YARN from Browser

NameNode Web UI: http://IP_address:50070
ResourceManager Web UI: http://IP_address:8088


Running a WordCount MapReduce Job

hdfs dfs -mkdir /input
hdfs dfs -ls /
cd
nano file.txt
Hello
World
Hello
India
hdfs dfs -put /home/hadoop/file.txt  /input
hadoop jar hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input/file.txt /output
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000
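The contents of part-r-00000 can be approximated locally without Hadoop: wordcount splits input on whitespace and counts each word, and since file.txt has one word per line, the same tallies come from sort and uniq (formatted here as word, a tab, then the count):

```shell
# Rebuild the sample input (one word per line, matching file.txt above).
printf 'Hello\nWorld\nHello\nIndia\n' > /tmp/file.txt
# Count occurrences of each word: uniq -c needs sorted input;
# awk reorders the columns to "word<TAB>count", like the MapReduce output.
sort /tmp/file.txt | uniq -c | awk '{print $2 "\t" $1}'
```

This yields Hello 2, India 1, World 1 — the counts the Hadoop job should write to /output/part-r-00000.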