@rahulkumar-aws
Last active June 21, 2018 17:13
Hadoop Installation: Single Node

Set JAVA_HOME

$ vim ~/.bash_profile
export JAVA_HOME=$(/usr/libexec/java_home)   # macOS; on Linux, point JAVA_HOME at your JDK directory
$ source ~/.bash_profile
$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/1.7.0.jdk/Contents/Home
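The export above is appended by hand; a small sketch of doing it idempotently, so re-running the setup does not duplicate the line (the temp file stands in for ~/.bash_profile, which is an assumption for illustration):

```shell
# Append the JAVA_HOME export only if it is not already present.
# PROFILE is a temp file here; substitute ~/.bash_profile for real use.
PROFILE=$(mktemp)
LINE='export JAVA_HOME=$(/usr/libexec/java_home)'
grep -qxF "$LINE" "$PROFILE" || echo "$LINE" >> "$PROFILE"
grep -qxF "$LINE" "$PROFILE" || echo "$LINE" >> "$PROFILE"   # second run is a no-op
grep -cxF "$LINE" "$PROFILE"   # prints 1, not 2
rm -f "$PROFILE"
```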

Set up passwordless SSH

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys
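sshd refuses keys whose files are too permissive, which is why the chmod matters. A sketch of checking the resulting mode portably (the temp directory is a stand-in for ~/.ssh; the GNU/BSD stat fallback is an assumption to cover both Linux and macOS):

```shell
# Verify authorized_keys ends up with mode 600 after the chmod.
dir=$(mktemp -d)          # stand-in for ~/.ssh
touch "$dir/authorized_keys"
chmod 0600 "$dir/authorized_keys"
# GNU stat (-c) first, BSD/macOS stat (-f) as fallback.
perm=$(stat -c '%a' "$dir/authorized_keys" 2>/dev/null || stat -f '%Lp' "$dir/authorized_keys")
echo "$perm"              # prints 600
rm -rf "$dir"
```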

Pseudo-Distributed Operation

Set JAVA_HOME in etc/hadoop/hadoop-env.sh

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
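By default HDFS keeps its data under hadoop.tmp.dir (/tmp/hadoop-${user.name}), which the OS may wipe on reboot. If you want the metadata and blocks to survive, you can optionally add explicit storage directories to hdfs-site.xml — the property names are standard Hadoop 2.x, but the paths below are placeholders to adapt:

```xml
<!-- Optional additions to etc/hadoop/hdfs-site.xml; paths are examples. -->
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///var/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///var/hadoop/dfs/data</value>
</property>
```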

YARN on a Single Node

Configure parameters as follows.

  1. etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
  2. etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Commands

Format the filesystem, then start the HDFS and YARN daemons:

$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

NameNode - http://localhost:50070/
ResourceManager - http://localhost:8088/

Make the HDFS directories required to execute MapReduce jobs:

  $ bin/hdfs dfs -mkdir /user
  $ bin/hdfs dfs -mkdir /user/<username>

Copy the input files into the distributed filesystem:

  $ bin/hdfs dfs -put etc/hadoop input

Run one of the provided MapReduce examples:

  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep input output 'dfs[a-z.]+'

Examine the output, either by copying it to the local filesystem or by viewing it directly on HDFS:

  $ bin/hdfs dfs -get output output
  $ cat output/*
  $ bin/hdfs dfs -cat output/*
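The example job counts lines matching the Java regex 'dfs[a-z.]+' across the copied config files. For this simple pattern, plain grep -E matches the same strings, so you can preview locally what the job will pick out (the sample input lines are illustrative):

```shell
# Preview what 'dfs[a-z.]+' matches: "dfs" followed by lowercase letters/dots.
printf 'dfs.replication\nfs.defaultFS\ndfs.permissions.enabled\n' \
  | grep -Eo 'dfs[a-z.]+'
# prints:
#   dfs.replication
#   dfs.permissions.enabled
```

Note that fs.defaultFS is not matched: it contains no "dfs" substring followed by lowercase characters.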

When you’re done, stop the daemons with:

$ sbin/stop-dfs.sh
$ sbin/stop-yarn.sh