Created
September 25, 2014 15:31
# Detailed steps to distribute a new Hadoop codec for use with Hive tab-delimited files.
##
# 1: Copy the compress jar to every node in the cluster
##
$ scp hadoop-compress-1.0-SNAPSHOT.jar [email protected]:/usr/local/lib
$ scp hadoop-compress-1.0-SNAPSHOT.jar [email protected]:/usr/local/lib
$ scp hadoop-compress-1.0-SNAPSHOT.jar [email protected]:/usr/local/lib
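With more than a handful of nodes, the copy can be looped. This is a minimal sketch, not the procedure actually used here: `REMOTE_USER`, `NODES`, and the `distribute_jar` helper are hypothetical names, and the example host names are placeholders because the real ones are not shown above. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical values: replace with the real login user and cluster nodes.
REMOTE_USER="${REMOTE_USER:-user}"
NODES="${NODES:-node1.example.org node2.example.org node3.example.org}"
JAR="hadoop-compress-1.0-SNAPSHOT.jar"
DEST="/usr/local/lib"

# Copy the jar to each node in NODES.
# With DRY_RUN=1, only print the scp commands that would be run.
distribute_jar() {
  for node in $NODES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scp $JAR $REMOTE_USER@$node:$DEST"
    else
      scp "$JAR" "$REMOTE_USER@$node:$DEST" || return 1
    fi
  done
}
```

Call `distribute_jar` with `DRY_RUN=1` set first to review the commands, then again without it to perform the copy.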
##
# 2: Put the jar in HDFS, for use as a Hive aux jar
##
$ hadoop dfs -put hadoop-compress-1.0-SNAPSHOT.jar /user/hive/auxjars/hadoop-compress-1.0-SNAPSHOT.jar
##
# 3: Set up the environment using CDH Manager
##
i. Search for "MapReduce Service Environment Safety Valve".
ii. Add the following:
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/lib/*
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/local/lib/*
iii. Search for "io.compression.codecs" and add the codec class to the gateway:
org.gbif.hadoop.compress.d2.D2Codec
iv. Search for "hive aux" and add the jar to the classpath, e.g. as:
<property>
  <name>hive.aux.jars.path</name>
  <value>hdfs://c1n1.gbif.org:8020/user/hive/auxjars/guava-11.0.2.jar,hdfs://c1n1.gbif.org:8020/user/hive/auxjars/hbase-0.94.15-cdh4.6.0.jar,hdfs://c1n1.gbif.org:8020/user/hive/auxjars/hive-hbase-handler-0.10.0-cdh4.6.0.jar,hdfs://c1n1.gbif.org:8020/user/hive/auxjars/zookeeper-3.4.5-cdh4.6.0.jar,hdfs://c1n1.gbif.org:8020/user/hive/auxjars/hadoop-compress-1.0-SNAPSHOT.jar</value>
</property>
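The `hive.aux.jars.path` value is a single comma-separated list of jar URIs, which is easy to mistype by hand. A small sketch that assembles it from jar file names; the `aux_jars_path` helper and the `AUX_BASE` variable are illustrative names, with the base URI taken from the property above.

```shell
#!/bin/sh
# Base URI copied from the property above; adjust if the namenode differs.
AUX_BASE="hdfs://c1n1.gbif.org:8020/user/hive/auxjars"

# Join the given jar file names into one comma-separated list of HDFS URIs.
aux_jars_path() {
  result=""
  for jar in "$@"; do
    if [ -z "$result" ]; then
      result="$AUX_BASE/$jar"
    else
      result="$result,$AUX_BASE/$jar"
    fi
  done
  printf '%s\n' "$result"
}
```

For example, `aux_jars_path guava-11.0.2.jar hadoop-compress-1.0-SNAPSHOT.jar` prints a value that can be pasted into the `<value>` element.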
##
# 4: Test it
##
SET hive.exec.compress.output = true;
SET io.seqfile.compression.type = BLOCK;
SET mapred.output.compression.codec = org.gbif.hadoop.compress.d2.D2Codec;
CREATE TABLE tim.comp_test1
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
AS SELECT * FROM tim.small_csv;
The data files are written correctly and the data is compressed.
But previewing a sample of the data in Hue does not work. I suspect the codec is not on the Hive classpath, but I am not yet sure how to get it there.
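One way to narrow this down is to first confirm that the jar deployed on a node really contains the codec class. The sketch below is only a diagnostic aid, not a fix: `class_to_entry` and `jar_has_class` are hypothetical helper names, and it assumes `unzip` is installed on the node.

```shell
#!/bin/sh
# Convert a fully qualified class name into its entry path inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed if the jar ($1) lists an entry for the class ($2).
# Uses `unzip -l` to list the archive and a fixed-string grep to match.
jar_has_class() {
  unzip -l "$1" | grep -qF "$(class_to_entry "$2")"
}
```

For example, `jar_has_class /usr/local/lib/hadoop-compress-1.0-SNAPSHOT.jar org.gbif.hadoop.compress.d2.D2Codec && echo "codec class found"`. If the class is present, a possible next check is to run `ADD JAR` with the jar path in a Hive CLI session before querying the table; if the query then works, that would point to the classpath of the Hive service behind Hue rather than to the jar itself.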