@mrmichalis
Created May 14, 2013 20:12
This script downloads all of Shakespeare’s works from Project Gutenberg, uploads them to HDFS, and runs a MapReduce word count against the text.
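#!/bin/bash
# curl -L "https://raw.github.com/sacharya/random-scripts/master/knife-rackspace-hadoop/wordcount.sh" | bash
set -x  # echo each command as it runs
# Remove results from any previous run (reports an error if the directory does not exist yet)
hadoop fs -rmr /shakespeare
cd /tmp || exit 1
# Download and unpack the Shakespeare corpus
wget http://homepages.ihug.co.nz/~leonov/shakespeare.tar.bz2
tar xjvf shakespeare.tar.bz2
# Timestamp (YYMMDD-HHMM) so each run gets its own HDFS directory
now=$(date +"%y%m%d-%H%M")
# Upload the extracted texts to HDFS
hadoop fs -put /tmp/Shakespeare "/shakespeare/$now/input"
# Run the word count job from the bundled Hadoop examples jar
hadoop jar /usr/lib/hadoop/hadoop-examples-1.0.3.15.jar wordcount "/shakespeare/$now/input" "/shakespeare/$now/output"
# Print the word/count pairs, sorted numerically by count
hadoop fs -cat "/shakespeare/$now/output/part-r-*" | sort -nk2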
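
For a quick sanity check without a cluster, the same kind of count can be approximated locally with standard Unix tools. This is only a sketch: `local_wordcount` is a hypothetical helper that is not part of the original script, and it assumes the example job splits words on whitespace only, so punctuation and case are kept as-is.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): count
# whitespace-separated tokens read from stdin and print
# "word<TAB>count" lines, sorted by ascending count, then by word.
local_wordcount() {
  tr -s '[:space:]' '\n' |          # put one token on each line
    awk 'NF' |                      # drop any empty lines
    LC_ALL=C sort |                 # group identical tokens together
    uniq -c |                       # prefix each token with its count
    awk '{ print $2 "\t" $1 }' |    # reorder to word<TAB>count
    LC_ALL=C sort -t "$(printf '\t')" -k2,2n -k1,1
}

# Usage on a small sample
printf 'to be or not to be\n' | local_wordcount
```

On the extracted corpus, something like `cat /tmp/Shakespeare/*/* | local_wordcount | tail` would show the most frequent tokens, assuming the archive unpacks into one level of subdirectories; adjust the glob to match the actual layout.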