@tovbinm
Created April 21, 2011 09:48
Hadoop Tweaks
On Ubuntu:
1. sudo apt-get install lzop liblzo2-dev
2. Download and build: https://github.com/kevinweil/hadoop-lzo
3. Copy the resulting jar to <yourhadoop>/lib/, typically /usr/lib/hadoop/lib/
4. Download: http://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/
5. cp ./hadoop-gpl-compression-0.1.0/lib/native/Linux-<your_arch_type>/*.* /usr/lib/hadoop/lib/native/Linux-<your_arch_type>/
6. Add the following properties to core-site.xml:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
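A quick sanity check of the steps above can be sketched as two small shell helpers. This is only a sketch: the function names are made up for illustration, and the mapping from `uname -m` output to the native directory name (Linux-amd64-64, Linux-i386-32) is an assumption based on how the hadoop-gpl-compression archive is usually laid out, so compare it with the directories actually present in your download.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Hadoop or hadoop-lzo.

# Print the native library directory name for a machine type as
# reported by `uname -m`. The mapping is an assumption; check it
# against the directories under lib/native/ in your download.
native_dir_for() {
    case "$1" in
        x86_64|amd64) echo "Linux-amd64-64" ;;
        i?86)         echo "Linux-i386-32" ;;
        *)            return 1 ;;   # unknown machine type
    esac
}

# Succeed if the given core-site.xml mentions the LZO codec class.
has_lzo_codec() {
    grep -q 'com\.hadoop\.compression\.lzo\.LzoCodec' "$1"
}
```

For example, `native_dir_for "$(uname -m)"` prints the directory name to substitute in step 5, and `has_lzo_codec /etc/hadoop/conf/core-site.xml` (the path is an assumption, adjust it to your installation) returns success once step 6 has been applied.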
Testing:
echo "hello world" > test.log
lzop test.log
hadoop fs -copyFromLocal test.log.lzo /tmp
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /tmp/test.log.lzo
hadoop fs -libjars /app/hadoop/resources/conduit_data_types.jar,/app/hadoop/resources/json-rpc-1.0.jar -text /user/mapred/<compressedfile> > out.out
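The test commands above can be wrapped into a round-trip check that compares the decompressed output with the original file. This is a rough sketch under a few assumptions: lzop and hadoop are on the PATH, /tmp on HDFS is writable, and `hadoop fs -text` picks the LZO codec from the .lzo extension once the configuration above is in place. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a round-trip test; nothing runs until
# roundtrip is called explicitly.

# Succeed if the two files are byte-identical.
same_content() {
    cmp -s "$1" "$2"
}

# Compress a local file, copy it to HDFS /tmp, read it back through
# `hadoop fs -text`, and compare the result with the original.
roundtrip() {
    src=$1
    lzop -f "$src" || return 1
    hadoop fs -copyFromLocal "$src.lzo" /tmp || return 1
    hadoop fs -text "/tmp/$(basename "$src").lzo" > "$src.out" || return 1
    same_content "$src" "$src.out"
}
```

Calling `roundtrip test.log && echo OK` after creating test.log as above should print OK only if the file survives compression, upload, and decoding unchanged. If test.log.lzo already exists in HDFS /tmp from an earlier run, the copy step fails, so remove it first.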