- Install the Ubuntu OS
- Log in to Ubuntu
- Switch to the root user
- Create a dedicated hadoop user:
# adduser hadoop
Download the Oracle JDK (jdk-8u271-linux-x64.tar.gz) from the Oracle website and extract it into the /opt directory:
# cd /opt
# tar -xzvf jdk-8u271-linux-x64.tar.gz
Set JDK 1.8.0_271 as the default JVM:
# update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_271/bin/java 100
# update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_271/bin/javac 100
After installation, verify that Java has been configured successfully:
# update-alternatives --display java
# update-alternatives --display javac
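As an additional check, you can ask the JVM for its version directly:
# java -version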
Install the OpenSSH server and client:
# apt-get install openssh-server openssh-client
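The remaining steps assume you are working as the hadoop user created above, since Hadoop and its data directories will live under /home/hadoop. Switch to that account before continuing:
# su - hadoop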
Generate a public/private RSA key pair with the following command. When the terminal prompts for a file name, press ENTER to accept the default. Then append the public key from id_rsa.pub to authorized_keys.
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
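If passwordless login fails, the usual culprit is file permissions: with its default StrictModes setting, sshd rejects an authorized_keys file that is group- or world-writable. Tighten the permissions and confirm that SSH to localhost no longer prompts for a password (assuming the default ~/.ssh location):
chmod 600 ~/.ssh/authorized_keys
ssh localhost
exit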
Download and extract Hadoop from the official Apache archive:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar -xzvf hadoop-2.8.5.tar.gz
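Optionally, verify the integrity of the download before extracting it; the Apache archive publishes a checksum file alongside each release tarball that you can compare this digest against:
sha256sum hadoop-2.8.5.tar.gz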
Edit the .bashrc of the hadoop user and set the following Hadoop environment variables:
nano ~/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-2.8.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Source the .bashrc to apply the changes to the current login session.
source ~/.bashrc
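A quick sanity check that the variables took effect in the current shell:
echo $HADOOP_HOME
which hadoop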
Next, point Hadoop at the JDK and its configuration directory in hadoop-env.sh:
cd hadoop-2.8.5
nano etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_271
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/hadoop-2.8.5/etc/hadoop"}
Set the default filesystem URI and the temporary data directory in core-site.xml:
nano etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadooptmpdata</value>
  </property>
</configuration>
In addition, create the temporary data directory and the HDFS namenode and datanode directories under the hadoop user's home folder:
cd
mkdir hadooptmpdata
mkdir -p hdfs/namenode
mkdir -p hdfs/datanode
cd hadoop-2.8.5
Set the replication factor and the namenode and datanode storage paths in hdfs-site.xml:
nano etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
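Note that dfs.name.dir and dfs.data.dir are the older property names; Hadoop 2.x still honors them but logs a deprecation warning at startup. If you prefer the current names, the equivalent settings are:
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>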
Copy mapred-site.xml.template to mapred-site.xml with the cp command, then edit the new mapred-site.xml in etc/hadoop under the Hadoop installation directory with the following changes:
cd
cd hadoop-2.8.5/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml
nano mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml to enable the MapReduce shuffle auxiliary service:
nano yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
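Some guides also pin the shuffle handler implementation explicitly. In Hadoop 2.8 the stock defaults already map mapreduce_shuffle to ShuffleHandler, so this extra property is optional:
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>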
Format the namenode before first use, then start the HDFS and YARN daemons:
cd
hdfs namenode -format
start-dfs.sh
start-yarn.sh
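Optionally, start the MapReduce JobHistory server as well so that finished jobs stay browsable; this daemon ships with the stock Hadoop 2.x distribution (its web UI listens on port 19888 by default) but is not required for the walkthrough below:
mr-jobhistory-daemon.sh start historyserver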
To verify that all the Hadoop services/daemons started successfully, use the jps command:
/opt/jdk1.8.0_271/bin/jps
20035 SecondaryNameNode
19782 DataNode
21671 Jps
20343 NodeManager
19625 NameNode
20187 ResourceManager
Check the installed Hadoop version:
hadoop version
NameNode Web UI: http://IP_address:50070
ResourceManager Web UI: http://IP_address:8088
Create an input directory in HDFS and confirm it exists:
hdfs dfs -mkdir /input
hdfs dfs -ls /
Create a sample text file to use as wordcount input:
cd
nano file.txt
Hello
World
Hello
India
Copy the file from the local filesystem into the HDFS input directory:
hdfs dfs -put /home/hadoop/file.txt /input
Run the bundled wordcount example on the input file:
hadoop jar hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input/file.txt /output
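Note that MapReduce refuses to start a job whose output directory already exists, so remove the old output before re-running the example:
hdfs dfs -rm -r /output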
List and view the job output:
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000
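With the four-line sample file above, the counts in part-r-00000 should come out as tab-separated key/count pairs, sorted by key:
Hello	2
India	1
World	1
When you are finished, stop the daemons:
stop-yarn.sh
stop-dfs.sh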