- Maven (2 or 3): http://maven.apache.org
- Java (Sun JDK recommended): http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html. Make sure that you export the JAVA_HOME environment variable.
- Gcc and friends (or the XCode package for Mac OS X).
- git (only if building from source)
- python > 2.6
- lzo (only needed for LZO compression inside Hadoop and HBase)
- A Java IDE, or VIM :)
  - IntelliJ: you can get a free edition of IntelliJ IDEA here
  - Eclipse: you need Eclipse and m2eclipse
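Before going further, it can save time to confirm the tools above are actually on your PATH. A small sketch (the `check` helper is my naming, not part of these instructions):

```shell
# Hypothetical helper that reports which prerequisite tools are on the PATH.
check() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}

check mvn java gcc git python
```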
For Mac OS X, make sure that you have the Java package that is available on the Apple Developer Connection:
- Go to https://developer.apple.com/
- Create an account / log in
- Go to Member Center (sign in if necessary)
- Go to the Mac Dev Center
- Go to View all downloads
- Look in the page for something like "Java for Mac OS X Developer Preview ..."
- Click; you can select from more than one download. Pick the one for your operating system (10.6, 10.7).
- Open the DMG and install the package.
You should now have a new Java installation inside /Library/Java/JavaVirtualMachines/VERSION (the version changes over time; adjust the path accordingly). This packages the Java VM in a consistent way. Export the JAVA_HOME variable (adjust VERSION):
$ export JAVA_HOME=/Library/Java/JavaVirtualMachines/VERSION/Contents/Home/
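A quick sanity check that the variable really points at a Java installation can look like this (the `check_java_home` function is my sketch, not part of the original instructions):

```shell
# Sanity check for JAVA_HOME: it must be set and contain bin/java.
check_java_home() {
  if [ -z "$JAVA_HOME" ]; then
    echo "JAVA_HOME is not set"
  elif [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no java binary under $JAVA_HOME"
  else
    echo "JAVA_HOME looks good: $JAVA_HOME"
  fi
}

check_java_home
```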
Pick a folder where you will install all the packages. We will call this folder ROOT from now on.
If you want to build or run the entire stack, not only develop jobs, you also need:
- On Mac OS X, the ssh daemon can be started via System Preferences > Sharing > Remote Login
- Generate an SSH key
$ ssh-keygen -t rsa -P ""
- Authorize the key, so that you can connect without a password
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
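Note that running the cat command a second time appends the key again. An idempotent variant can be sketched like this (the `add_key` helper name is mine):

```shell
# Idempotent append: only add the public key if it is not already authorized.
add_key() {
  pub=$1
  auth=$2
  touch "$auth"
  if ! grep -qxF "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
}

# Usage: add_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```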
- You can test the connection by running ssh localhost. It should connect without a password.
- Create the data folders, and make sure that you can write to them:
sudo mkdir -p /var/{,log/hbase,log/hadoop,log/zookeeper,hadoop_datastore,zookeeper_datastore}
sudo chown `id -un`:`id -gn` /var/{log/hbase,log/hadoop,log/zookeeper,hadoop_datastore,zookeeper_datastore}
- Get the Hadoop source and build it:
git clone https://github.com/apache/hadoop-common.git
cd hadoop-common
ant
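The build alone does not configure HDFS; before formatting the namenode, a minimal conf/core-site.xml is needed. A sketch for a single-node setup (the port and the data path are assumptions; adjust them to your layout):

```xml
<!-- conf/core-site.xml: minimal single-node sketch; values are assumptions -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop_datastore</value>
  </property>
</configuration>
```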
- Format the namenode (one time only):
DO NOT EXECUTE THIS STEP MORE THAN ONCE. If you do, you will delete all the data in your Hadoop installation!
bin/hadoop namenode -format
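To make the one-time constraint harder to violate, a guard can check for the `current` directory that a formatted namenode leaves in its storage directory. A sketch, with the helper name and datastore path as assumptions:

```shell
# Refuse to format if the namenode storage already holds data.
# A formatted HDFS namenode keeps its metadata under <datastore>/current.
safe_format() {
  datastore=$1
  if [ -d "$datastore/current" ]; then
    echo "refusing to format: $datastore already contains namenode data"
    return 1
  fi
  echo "ok to format $datastore"
}

# Usage: safe_format /var/hadoop_datastore && bin/hadoop namenode -format
```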
- Now, you can start the service:
ROOT/hadoop/bin/start-all.sh
**You can check to see if it works by opening the Hadoop Map/Reduce and Hadoop DFS status pages in the browser.**
If it does not work, the first place to look is the log files, in ROOT/hadoop/logs.
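A small helper for that first look at the logs (the name and the grep pattern are my choices):

```shell
# Print FATAL/ERROR lines from a log directory, or a note if none are found.
scan_logs() {
  grep -h -E "FATAL|ERROR" "$1"/*.log 2>/dev/null || echo "no errors found in $1"
}

# Usage: scan_logs ROOT/hadoop/logs
```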
You can find more details about Hadoop installation at Apache here.
- Get the ZooKeeper source, change the branch, and build:
git clone https://github.com/apache/zookeeper.git
cd zookeeper
git checkout -b 3.4.3 remotes/origin/3.4.3
ant
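zkServer.sh expects a conf/zoo.cfg, which the source tree does not ship (it only includes a zoo_sample.cfg to copy from). A minimal standalone sketch, with the data path an assumption matching the folders created earlier:

```
# conf/zoo.cfg: minimal standalone configuration (values are assumptions)
tickTime=2000
dataDir=/var/zookeeper_datastore
clientPort=2181
```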
- Start the service:
ROOT/zookeeper/bin/zkServer.sh start
- Get the HBase source, change the branch, and build:
git clone https://github.com/apache/hbase.git
cd hbase
git checkout -b 0.92 remotes/origin/0.92
./build.sh
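For HBase to store its data in HDFS rather than the local filesystem, conf/hbase-site.xml needs at least hbase.rootdir set. A sketch, assuming HDFS listens on hdfs://localhost:9000 (adjust to your configuration):

```xml
<!-- conf/hbase-site.xml: minimal sketch; the HDFS URL is an assumption -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
```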
- Start the service (this step needs to run after installing hadoop-lzo-compression)
bin/start-hbase.sh
If it does not work, the first place to look is the log files, in ROOT/hbase/logs.
More details about HBase can be found here, in the Apache HBase Book.