@benjaminhawkeslewis
Created October 6, 2011 14:48
My first Pig script - filter input by UUID, then sort by score
-- Filter rows in $inputFile by UUIDs present in $filterByFile, then sort by score
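-- Load the (uuid, score) rows and the list of wanted UUIDs; 'input' is a reserved keyword in Pig Latin, hence 'scores'.
scores = LOAD '$inputFile' USING PigStorage('\t') AS (uuid:chararray, score:double);
filter_by = LOAD '$filterByFile' USING PigStorage('\t') AS (uuid:chararray);
-- Inner join keeps only rows whose uuid appears in filter_by (a uuid listed twice there yields the row twice).
filtered = JOIN scores BY uuid, filter_by BY uuid;
-- Keep just the two columns that came from the scores side of the join.
wanted_fields = FOREACH filtered GENERATE scores::uuid AS uuid, scores::score AS score;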
ordered = ORDER wanted_fields BY score DESC;
STORE ordered INTO '$outputDir' USING PigStorage('\t');
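
The script reads its paths from the `$inputFile`, `$filterByFile` and `$outputDir` parameters, so they have to be supplied when it is run. A minimal sketch of one way to do that with Pig's `-param` option in local mode follows; the script name `filter_and_sort.pig`, the wrapper function `run_filter_sort` and the example paths are assumptions for illustration, not part of the original gist.

```shell
# Hypothetical wrapper: assumes the script above is saved as filter_and_sort.pig
# in the current directory and that the `pig` launcher is on PATH.
run_filter_sort() {
  # $1 = tab-separated (uuid, score) file
  # $2 = file with one wanted uuid per line
  # $3 = output directory (must not already exist)
  pig -x local \
    -param inputFile="$1" \
    -param filterByFile="$2" \
    -param outputDir="$3" \
    filter_and_sort.pig
}

# Example call (placeholder paths):
# run_filter_sort scores.tsv wanted_uuids.tsv out
```

Dropping `-x local` should run the same script against the configured cluster instead; the sorted rows end up in part files under the output directory.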