@mmiliaus
Last active December 17, 2015 16:19
Hadoop Installation Guide

Mac OSX

Hadoop

  • brew install hadoop
  • Check if Hadoop has been successfully installed: hadoop version
  • Add the following three lines to your ~/.bash_profile:
export HADOOP_PREFIX=/usr/local/Cellar/hadoop/1.1.2/
export JAVA_HOME=$(/usr/libexec/java_home)
export PATH=$PATH:$HADOOP_PREFIX/bin
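  • Add the following to $HADOOP_PREFIX/libexec/conf/hadoop-env.sh (it works around the "Unable to load realm info from SCDynamicStore" message that Java prints on Mac OSX):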
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
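
Once ~/.bash_profile has been reloaded, the variables set above can be sanity-checked. The following is a minimal sketch, not part of the original guide: `check_hadoop_env` is a hypothetical helper name, and it only confirms that each variable is set and points at an existing directory.

```shell
# check_hadoop_env: hypothetical helper that inspects the current shell
# environment for the variables exported in the steps above.
check_hadoop_env() {
  status=0
  for name in HADOOP_PREFIX JAVA_HOME; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    elif [ ! -d "$value" ]; then
      echo "not a directory: $name=$value"
      status=1
    else
      echo "ok: $name=$value"
    fi
  done
  return $status
}
```

Calling `check_hadoop_env` in a fresh terminal prints one line per variable and returns a non-zero status if anything is missing.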

SSH

  • Make sure that you have SSH private (~/.ssh/id_rsa) and public (~/.ssh/id_rsa.pub) keys already set up. If not:
  `ssh-keygen -t rsa`
  • Make sure that "Remote login" is enabled in your system preferences. For this, go to "System Preferences" -> "Sharing". "Remote login" should be checked.
  • Add your public key to the list of authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • Check if SSH setup was successful by logging in to localhost:
$ ssh localhost
Last login: Thu May 23 21:36:20 2013
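
If `ssh localhost` still asks for a password, one common cause is that the SSH server ignores an authorized_keys file whose permissions are too open. The sketch below is an addition to the guide under that assumption; `fix_ssh_perms` is a hypothetical helper name and the directory defaults to ~/.ssh.

```shell
# fix_ssh_perms: hypothetical helper that tightens permissions on an SSH
# directory and the authorized_keys file inside it.
fix_ssh_perms() {
  dir="${1:-$HOME/.ssh}"
  # Only the owner may read, write, or enter the directory.
  chmod 700 "$dir" || return 1
  if [ -f "$dir/authorized_keys" ]; then
    # Only the owner may read or write the key list.
    chmod 600 "$dir/authorized_keys"
  fi
}
```

After running `fix_ssh_perms`, `ssh -o BatchMode=yes localhost true` should exit with status 0 without prompting, since BatchMode disables password prompts.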

Ubuntu 12.04 LTS

JDK

  • Make sure you have JDK installed:
$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.5) (6b27-1.12.5-0ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Otherwise:

sudo apt-get install openjdk-6-jdk

SSH

  • If you skipped the JDK step above, install at least a Java runtime: sudo apt-get install openjdk-6-jre-headless
  • Make sure that you have SSH private (~/.ssh/id_rsa) and public (~/.ssh/id_rsa.pub) keys already set up. If not:
  `ssh-keygen -t rsa`
  • Add your public key to the list of authorized keys:
  `cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys`
  • Install SSH server:
sudo apt-get install openssh-server
  • Check if you can SSH to localhost:
$ ssh localhost

You may see this warning:

The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 99:3f:f0:6a:8f:3d:7f:a7:2f:c0:75:07:47:98:3c:bd.
Are you sure you want to continue connecting (yes/no)?

Don't worry, this is expected the first time you connect to a host. The fingerprint is generated per machine, so yours will differ from the one shown here. Type "yes" to continue.

If you try to connect again, you should be greeted with the following message:

$ ssh localhost
Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.5.0-23-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

Last login: Thu May 23 21:37:46 2013 from localhost
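
Running the `cat ... >> ~/.ssh/authorized_keys` step more than once appends the same key repeatedly. That is harmless, but a guarded version keeps the file tidy. This is a minimal sketch added to the guide; `add_key_once` is a hypothetical helper name.

```shell
# add_key_once: hypothetical helper that appends a public key file to an
# authorized_keys file only if that exact line is not already present.
add_key_once() {
  pub="$1"
  auth="$2"
  # Make sure the target file exists so grep has something to read.
  touch "$auth"
  # -F: fixed strings, -x: whole-line match, -q: quiet, -f: patterns from file
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
}
```

Example usage: `add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys`.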

Hadoop 1.1.2

  • Download the Hadoop 1.1.2 archive and extract its contents:
wget http://mirrors.enquira.co.uk/apache/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
tar xzf hadoop-1.1.2.tar.gz
sudo mv hadoop-1.1.2 /usr/local/hadoop
  • Add the following three lines to the end of ~/.bashrc:
export HADOOP_PREFIX=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-amd64
export PATH=$PATH:$HADOOP_PREFIX/bin
  • Restart Bash or source ~/.bashrc
  • Check if hadoop responds:
$ hadoop version
Hadoop 1.1.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782
Compiled by hortonfo on Thu Jan 31 02:03:24 UTC 2013
From source with checksum c720ddcf4b926991de7467d253a79b8b
  • Set JAVA_HOME in Hadoop's environment file. Open it in an editor: vim /usr/local/hadoop/conf/hadoop-env.sh

Change:

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

To:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-amd64
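
The same change can be made without opening an editor. The following is a sketch rather than part of the original guide: `set_java_home` is a hypothetical helper name, and it assumes the JDK path contains no `|` or `&` characters, which would confuse sed. It writes to a temporary file instead of using `sed -i`, because that flag behaves differently on Linux and Mac OSX.

```shell
# set_java_home: hypothetical helper that replaces the (possibly commented-out)
# "export JAVA_HOME=..." line in a hadoop-env.sh file with the given path.
set_java_home() {
  env_file="$1"
  java_home="$2"
  tmp="$env_file.tmp"
  # Match an optional run of '#' and spaces before "export JAVA_HOME=".
  sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$tmp" \
    && mv "$tmp" "$env_file"
}
```

Example usage: `set_java_home /usr/local/hadoop/conf/hadoop-env.sh /usr/lib/jvm/java-1.6.0-openjdk-amd64`.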

Windows

So far the most comprehensive guide on setting up Hadoop on Windows can be found here: http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html

Running WordCount Example

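Linux and Mac OSX:

wget http://www.gutenberg.org/files/42778/42778-0.txt

On Mac OSX (Homebrew keeps the examples jar under libexec):

hadoop jar $HADOOP_PREFIX/libexec/hadoop-examples-*.jar wordcount 42778-0.txt output

On Ubuntu (the examples jar sits at the top level of the extracted archive):

hadoop jar $HADOOP_PREFIX/hadoop-examples-*.jar wordcount 42778-0.txt output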

Check the output:

less output/part-r-00000
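
WordCount writes one `word<TAB>count` pair per line, ordered by word, so the most frequent words are not at the top. The sketch below, an addition to the guide, sorts the result by count; `top_words` is a hypothetical helper name.

```shell
# top_words: hypothetical helper that prints the N most frequent words from a
# WordCount result file, where each line is "word<TAB>count".
top_words() {
  file="$1"
  n="${2:-10}"
  tab="$(printf '\t')"
  # Sort on the second tab-separated field, numerically, largest first.
  sort -t "$tab" -k2,2nr "$file" | head -n "$n"
}
```

Example usage: `top_words output/part-r-00000 20`. Note that Hadoop refuses to write into an existing output directory, so remove `output` before running the job again.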