- Maven (2 or 3): http://maven.apache.org
- Java (Sun JDK recommended): http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html. Make sure that you export the JAVA_HOME environment variable.
- Gcc and friends (or the XCode package for Mac OS X).
- git (only if building from source)
- python > 2.6
- lzo (only needed for LZO compression inside Hadoop and HBase)
- A Java IDE, or VIM :)
  - IntelliJ: you can get a free edition of IntelliJ IDEA here
  - Eclipse: you need Eclipse and m2eclipse
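Before going further, it can save time to confirm the tools above are actually on your PATH. A small sketch (the `check` helper is my naming, not part of these instructions):

```shell
# Hypothetical helper that reports which prerequisite tools are on the PATH.
check() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}

check mvn java gcc git python
```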
For Mac OS X, make sure that you have the Java package that is available on the Apple Developer Connection:
- Go to https://developer.apple.com/
- Create an account / log in
- Go to Member Center (sign in if necessary)
- Go to the Mac Dev Center
- Go to View all downloads
- Look in the page for something like "Java for Mac OS X Developer Preview ..."
- Click; you can select from more than one download. Pick the one for your operating system (10.6, 10.7).
- Open the DMG and install the package.
You should now have a new Java installation inside /Library/Java/JavaVirtualMachines/VERSION (the version changes over time; adjust the path accordingly). This packages the Java VM in a consistent way. Export the JAVA_HOME variable (adjust VERSION):
$ export JAVA_HOME=/Library/Java/JavaVirtualMachines/VERSION/Contents/Home/
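A quick sanity check that the variable really points at a Java installation can look like this (the `check_java_home` function is my sketch, not part of the original instructions):

```shell
# Sanity check for JAVA_HOME: it must be set and contain bin/java.
check_java_home() {
  if [ -z "$JAVA_HOME" ]; then
    echo "JAVA_HOME is not set"
  elif [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no java binary under $JAVA_HOME"
  else
    echo "JAVA_HOME looks good: $JAVA_HOME"
  fi
}

check_java_home
```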
Pick a folder where you will install all the packages. We will call this folder ROOT from now on.
If you want to build or run the entire stack, not only develop jobs, you also need:
- On Mac OS X, the ssh daemon can be started via System Preferences > Sharing > Remote Login
- Generate an SSH key
$ ssh-keygen -t rsa -P ""
- Authorize the key, so that you can connect without a password
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
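Note that running the cat command a second time appends the key again. An idempotent variant can be sketched like this (the `add_key` helper name is mine):

```shell
# Idempotent append: only add the public key if it is not already authorized.
add_key() {
  pub=$1
  auth=$2
  touch "$auth"
  if ! grep -qxF "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
}

# Usage: add_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```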
- You can test the connection by running ssh localhost. It should connect without a password.
- Create the data folders, and make sure that you can write to them:
sudo mkdir -p /var/{,log/hbase,log/hadoop,log/zookeeper,hadoop_datastore,zookeeper_datastore}
sudo chown `id -un`:`id -gn` /var/{log/hbase,log/hadoop,log/zookeeper,hadoop_datastore,zookeeper_datastore}
- Get the Hadoop source and build it:
git clone https://github.com/apache/hadoop-common.git
cd hadoop-common
ant
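The build alone does not configure HDFS; before formatting the namenode, a minimal conf/core-site.xml is needed. A sketch for a single-node setup (the port and the data path are assumptions; adjust them to your layout):

```xml
<!-- conf/core-site.xml: minimal single-node sketch; values are assumptions -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop_datastore</value>
  </property>
</configuration>
```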
- Format the namenode (one time only):
DO NOT EXECUTE THIS STEP MORE THAN ONCE. If you do, you will delete all the data in your Hadoop installation!
bin/hadoop namenode -format
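To make the one-time constraint harder to violate, a guard can check for the `current` directory that a formatted namenode leaves in its storage directory. A sketch, with the helper name and datastore path as assumptions:

```shell
# Refuse to format if the namenode storage already holds data.
# A formatted HDFS namenode keeps its metadata under <datastore>/current.
safe_format() {
  datastore=$1
  if [ -d "$datastore/current" ]; then
    echo "refusing to format: $datastore already contains namenode data"
    return 1
  fi
  echo "ok to format $datastore"
}

# Usage: safe_format /var/hadoop_datastore && bin/hadoop namenode -format
```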
- Now, you can start the service:
ROOT/hadoop/bin/start-all.sh
**You can check to see if it works by opening the Hadoop Map/Reduce and Hadoop DFS status pages in the browser.**
If it does not work, the first place to look is the log files, in ROOT/hadoop/logs.
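A small helper for that first look at the logs (the name and the grep pattern are my choices):

```shell
# Print FATAL/ERROR lines from a log directory, or a note if none are found.
scan_logs() {
  grep -h -E "FATAL|ERROR" "$1"/*.log 2>/dev/null || echo "no errors found in $1"
}

# Usage: scan_logs ROOT/hadoop/logs
```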
You can find more details about Hadoop installation at Apache here.
- Get the ZooKeeper source, change the branch, and build:
git clone https://github.com/apache/zookeeper.git
cd zookeeper
git checkout -b 3.4.3 remotes/origin/3.4.3
ant
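zkServer.sh expects a conf/zoo.cfg, which the source tree does not ship (it only includes a zoo_sample.cfg to copy from). A minimal standalone sketch, with the data path an assumption matching the folders created earlier:

```
# conf/zoo.cfg: minimal standalone configuration (values are assumptions)
tickTime=2000
dataDir=/var/zookeeper_datastore
clientPort=2181
```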
- Start the service:
ROOT/zookeeper/bin/zkServer.sh start
- Get the HBase source, change the branch, and build:
git clone https://github.com/apache/hbase.git
cd hbase
git checkout -b 0.92 remotes/origin/0.92
./build.sh
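For HBase to store its data in HDFS rather than the local filesystem, conf/hbase-site.xml needs at least hbase.rootdir set. A sketch, assuming HDFS listens on hdfs://localhost:9000 (adjust to your configuration):

```xml
<!-- conf/hbase-site.xml: minimal sketch; the HDFS URL is an assumption -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
```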
- Start the service (this step needs to run after installing hadoop-lzo-compression)
bin/start-hbase.sh
If it does not work, the first place to look is the log files, in ROOT/hbase/logs.
More details about HBase can be found here, in the Apache HBase Book.