@MicTech
Last active February 11, 2016 08:13
Avro to Csv in Pig
#Based on Petr's code - https://github.com/PetrVales
#You need the PiggyBank library (it provides CSVExcelStorage, used below)
#info: https://cwiki.apache.org/confluence/display/PIG/PiggyBank
#jar: http://search.maven.org/#search%7Cga%7C1%7Cpiggybank
#start pig in local mode
pig -x local
#register PiggyBank library
grunt> register /<local_path>/piggybank-0.12.0.jar
#load avro to pig
grunt> result = LOAD '/<path_to_avro>/filename.avro' USING org.apache.pig.builtin.AvroStorage();
#store avro as csv
grunt> STORE result INTO '<destination_filename>' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE', 'UNIX') PARALLEL 1;
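#if Pig saved more than one file, force a reduce step with GROUP BY, then STORE result2 instead of result
#(id is assumed to be a field in your data; FLATTEN turns the grouped bags back into the original rows)
grunt> grouped = GROUP result BY id PARALLEL 1;
grunt> result2 = FOREACH grouped GENERATE FLATTEN(result);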
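#alternative sketch, not part of the original gist: STORE writes a directory of part-* files, so you can also concatenate them in the shell instead of regrouping
#merge_parts and the example paths are hypothetical names; this assumes the output is on the local filesystem and was written without header rows

```shell
# merge_parts OUT_DIR TARGET_CSV
# Concatenate Pig's part files from OUT_DIR into one CSV, in file-name order.
merge_parts() {
  out_dir=$1
  target=$2
  # Pig names data files part-m-NNNNN (map-only job) or part-r-NNNNN (after a reduce).
  # The part-* glob skips the _SUCCESS marker and the hidden .crc checksum files.
  cat "$out_dir"/part-* > "$target"
}

# Example call (paths are placeholders, adjust them to your STORE destination):
# merge_parts ./destination_filename ./result.csv
```

#if the output lives on HDFS rather than on local disk, `hadoop fs -getmerge <src_dir> <local_file>` merges the part files in a similar way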