@natbusa
Last active August 29, 2015 13:57
Word count in Pig
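-- Load the input file, one record per line of text.
A = load 'wordcount-input/lorem.txt' as (line:chararray);
-- Split each line into tokens and flatten the bag: one record per token.
B = foreach A generate FLATTEN(TOKENIZE(line)) as word;
-- Strip non-word characters (regex \W+) and lowercase what remains.
C = foreach B generate LOWER(REPLACE(word,'\\W+','')) as word;
-- Group identical words together and count the records in each group.
D = group C by word;
E = foreach D generate group, COUNT(C);
-- Write (word, count) pairs to the output directory.
store E into 'wordcount-pig-output';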
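
A possible refinement, sketched under a few assumptions: a token made up only of punctuation becomes an empty string after the REPLACE step and would be counted as a word by the script above. The variant below drops such tokens and sorts the result by frequency. The relation names, the `occurrences` field name, and the output path `wordcount-pig-sorted-output` are illustrative and not part of the original gist.

```pig
-- Same pipeline as above, with descriptive relation names (illustrative only).
lines   = LOAD 'wordcount-input/lorem.txt' AS (line:chararray);
tokens  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
cleaned = FOREACH tokens GENERATE LOWER(REPLACE(word, '\\W+', '')) AS word;
-- Assumption: empty strings left over from punctuation-only tokens are not words.
words   = FILTER cleaned BY SIZE(word) > 0;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS occurrences;
-- Most frequent words first.
sorted  = ORDER counts BY occurrences DESC;
-- Hypothetical output path, chosen so the original output is not overwritten.
STORE sorted INTO 'wordcount-pig-sorted-output';
```

Whether to filter empty tokens depends on the input; for text with little stand-alone punctuation the two scripts should give the same counts.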