@sanjeevtripurari
Forked from tomgullo/wordcount.pig
Created May 22, 2016 17:38
Word count using Hadoop Pig
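-- Load each line; the default loader (PigStorage) splits fields on tabs, so $0 is the text before the first tab.
A = load '/tmp/alice.txt';
-- Split each line into tokens, one word per record.
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
-- Keep only tokens made up entirely of word characters (letters, digits, underscore).
C = filter B by word matches '\\w+';
-- Group identical words together and count each group.
D = group C by word;
E = foreach D generate COUNT(C), group;
-- Write (count, word) pairs to the output directory.
store E into '/tmp/alice_wordcount';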
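
To make the data flow of the Pig script above easier to follow, here is a rough in-memory equivalent in Python. It is only a sketch and not part of the original gist. The function name `word_count` is hypothetical, and the separator set is an assumption based on the documented default delimiters of Pig's TOKENIZE (space, double quote, comma, parentheses, asterisk). Real Pig output is also written as (count, word) records to HDFS rather than returned as a mapping.

```python
import re
from collections import Counter

# Assumption: approximates TOKENIZE's default separators
# (space, double quote, comma, parentheses, asterisk).
_SEPARATORS = re.compile(r'[ ",()*]+')
# re.ASCII mirrors Java's default meaning of \w: [A-Za-z0-9_].
_WORD = re.compile(r"\w+", re.ASCII)


def word_count(lines):
    """Count words roughly the way the Pig script does (illustrative only)."""
    counts = Counter()
    for line in lines:
        # Step A: with tab-delimited loading, $0 is the text before the first tab.
        first_field = line.rstrip("\n").split("\t", 1)[0]
        # Step B: split the field into tokens.
        for token in _SEPARATORS.split(first_field):
            # Step C: keep a token only if the whole token is word characters.
            if _WORD.fullmatch(token):
                # Steps D and E: group by word and count.
                counts[token] += 1
    return counts
```

Because the filter requires the whole token to match, a token with trailing punctuation such as `hat.` is dropped instead of being counted as `hat`. Stripping punctuation before counting would be a change to the script, not something it does today.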