@timrobertson100
Created September 25, 2014 15:31
# Detailed steps to distribute a new Hadoop codec for use with Hive tab-delimited files.
##
# 1: Copy the compress jar to every node in the cluster
##
$ scp hadoop-compress-1.0-SNAPSHOT.jar [email protected]:/usr/local/lib
$ scp hadoop-compress-1.0-SNAPSHOT.jar [email protected]:/usr/local/lib
$ scp hadoop-compress-1.0-SNAPSHOT.jar [email protected]:/usr/local/lib
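With more than a handful of nodes, the copy can be scripted. This is a minimal sketch only: the host names and the remote user below are hypothetical placeholders, not the real cluster values, and the function defaults to a dry run that just prints the scp commands.

```shell
#!/bin/sh
# Hypothetical values: replace with the real cluster nodes and login user.
HOSTS="node1.example.org node2.example.org node3.example.org"
REMOTE_USER="hadoop"
JAR="hadoop-compress-1.0-SNAPSHOT.jar"
DEST="/usr/local/lib"

# Copy the jar to each host. With DRY_RUN=1 (the default) the scp
# commands are printed instead of executed.
distribute_jar() {
  jar="$1"; dest="$2"; shift 2
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "scp $jar $REMOTE_USER@$host:$dest"
    else
      scp "$jar" "$REMOTE_USER@$host:$dest" || return 1
    fi
  done
}

# Word splitting of $HOSTS is intentional here.
distribute_jar "$JAR" "$DEST" $HOSTS
```

Setting DRY_RUN=0 in the environment performs the copies instead of printing them.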
##
# 2: Put the jar in HDFS, for use as a Hive aux jar
##
$ hadoop dfs -put hadoop-compress-1.0-SNAPSHOT.jar /user/hive/auxjars/hadoop-compress-1.0-SNAPSHOT.jar
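Re-running the upload is likely to fail if the file is already in HDFS, so it can help to check first. A hedged sketch, assuming the `hadoop` client is on the PATH; `hdfs_target` is a hypothetical helper introduced here only for illustration.

```shell
#!/bin/sh
# Build the HDFS target path for a local jar: <dir>/<file name>.
hdfs_target() {
  printf '%s/%s\n' "${2%/}" "$(basename "$1")"
}

JAR="hadoop-compress-1.0-SNAPSHOT.jar"
TARGET="$(hdfs_target "$JAR" /user/hive/auxjars)"

# Only talk to HDFS when the hadoop client and the jar are both present.
if command -v hadoop >/dev/null 2>&1 && [ -f "$JAR" ]; then
  if hadoop fs -test -e "$TARGET"; then
    echo "already in HDFS: $TARGET"
  else
    hadoop fs -put "$JAR" "$TARGET" && echo "uploaded: $TARGET"
  fi
fi
```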
##
# 3: Set up the environment using Cloudera Manager
##
i. Search for "MapReduce Service Environment Safety Valve"
ii. Add the following:
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/lib/*
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/local/lib/*
iii. Search for "io.compression.codecs" and add the codec class to the gateway:
org.gbif.hadoop.compress.d2.D2Codec
iv. Search for "hive aux" and add the jar to the classpath, e.g.:
<property>
<name>hive.aux.jars.path</name>
<value>hdfs://c1n1.gbif.org:8020/user/hive/auxjars/guava-11.0.2.jar,hdfs://c1n1.gbif.org:8020/user/hive/auxjars/hbase-0.94.15-cdh4.6.0.jar,hdfs://c1n1.gbif.org:8020/user/hive/auxjars/hive-hbase-handler-0.10.0-cdh4.6.0.jar,hdfs://c1n1.gbif.org:8020/user/hive/auxjars/zookeeper-3.4.5-cdh4.6.0.jar,hdfs://c1n1.gbif.org:8020/user/hive/auxjars/hadoop-compress-1.0-SNAPSHOT.jar</value>
</property>
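Since the settings above refer to the codec by class name, it may be worth confirming that the jar really contains that class. A minimal sketch, assuming `unzip` is installed (a jar is a zip archive); `class_to_entry` and `jar_has_class` are hypothetical helper names.

```shell
#!/bin/sh
# Convert a fully qualified class name to its entry path inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeeds if the jar listing contains the entry for the given class.
# -F makes grep treat the entry as a fixed string, not a regex.
jar_has_class() {
  unzip -l "$1" 2>/dev/null | grep -qF "$(class_to_entry "$2")"
}

JAR="/usr/local/lib/hadoop-compress-1.0-SNAPSHOT.jar"
CODEC="org.gbif.hadoop.compress.d2.D2Codec"

# Skip the check entirely when the jar is not on this machine.
if [ -f "$JAR" ]; then
  if jar_has_class "$JAR" "$CODEC"; then
    echo "found $CODEC in $JAR"
  else
    echo "missing $CODEC in $JAR (or unzip is unavailable)" >&2
  fi
fi
```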
##
# 4: Test it
##
SET hive.exec.compress.output = true;
SET io.seqfile.compression.type = BLOCK;
SET mapred.output.compression.codec = org.gbif.hadoop.compress.d2.D2Codec;
CREATE TABLE tim.comp_test1
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
AS SELECT * FROM tim.small_csv;
The data files are written correctly, and the data is compressed.
However, previewing a sample of the data in Hue does not work. I suspect the codec is not on the Hive classpath, but I am not yet sure how to get it there.
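One way to start narrowing this down is to check whether the compress jar is listed in the `hive.aux.jars.path` value that a Hive client actually sees. The sketch below assumes the `hive` command-line client is on the PATH and that `SET hive.aux.jars.path;` prints a `name=value` line; `aux_path_has_jar` is a hypothetical helper. It only inspects the command-line client's configuration, so it does not by itself show what the Hive server behind Hue has loaded.

```shell
#!/bin/sh
# Succeeds if the comma-separated aux jars value contains an entry
# ending in /<jar file name>.
aux_path_has_jar() {
  case ",$1," in
    *"/$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

JAR="hadoop-compress-1.0-SNAPSHOT.jar"

if command -v hive >/dev/null 2>&1; then
  # -S runs in silent mode; keep only the last output line.
  line="$(hive -S -e 'SET hive.aux.jars.path;' 2>/dev/null | tail -n 1)"
  value="${line#hive.aux.jars.path=}"
  if aux_path_has_jar "$value" "$JAR"; then
    echo "$JAR is listed in hive.aux.jars.path"
  else
    echo "$JAR is NOT listed in hive.aux.jars.path" >&2
  fi
fi
```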