@ottomata
Created October 10, 2012 19:56
Group By referrer, filter on BannerController
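-- Count BannerController requests per referer, most frequent first.
-- $input and $output are Pig parameters, e.g. supplied with -param input=... -param output=...
-- Load space-delimited log lines; user_agent has no declared type, so it defaults to bytearray.
LOG_FIELDS = LOAD '$input' USING PigStorage(' ') AS (hostname:chararray, udplog_sequence:chararray, timestamp:chararray, request_time:chararray, remote_addr:chararray, http_status:chararray, bytes_sent:chararray, request_method:chararray, uri:chararray, proxy_host:chararray, content_type:chararray, referer:chararray, x_forwarded_for:chararray, user_agent);
-- Keep only requests whose URI contains "BannerController".
LOG_FIELDS = FILTER LOG_FIELDS BY (uri matches '.*BannerController.*');
REFERER = FOREACH LOG_FIELDS GENERATE referer;
-- Group by referer with 7 reducers; in the GENERATE, $0 is the group key and $1 is the bag of grouped rows.
COUNT = FOREACH (GROUP REFERER BY $0 PARALLEL 7) GENERATE $0, COUNT($1) AS num;
COUNT_SORTED = ORDER COUNT BY num DESC;
-- Print the sorted counts to the console.
DUMP COUNT_SORTED;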
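-- Store the sorted counts (the alias defined above is COUNT_SORTED; URI_COUNT_SORTED was never defined).
STORE COUNT_SORTED INTO '$output';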