@ottomata
Created October 3, 2012 20:35
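-- Load space-delimited request logs; every field is read as a string.
LOG_FIELDS = LOAD '$input' USING PigStorage(' ') AS (hostname:chararray, udplog_sequence:chararray, timestamp:chararray, request_time:chararray, remote_addr:chararray, http_status:chararray, bytes_sent:chararray, request_method:chararray, uri:chararray, proxy_host:chararray, content_type:chararray, referer:chararray, x_forwarded_for:chararray, user_agent:chararray);
-- Keep only the HTTP status field.
STATUS = FOREACH LOG_FIELDS GENERATE http_status;
-- Keep rows whose status contains 404, 200, or 302 anywhere in the field.
FILTERED_STATUS = FILTER STATUS BY (http_status matches '.*(404|200|302).*');
-- Count rows per distinct status value, using 28 reducers.
STATUS_COUNT = FOREACH (GROUP FILTERED_STATUS BY http_status PARALLEL 28) GENERATE group AS http_status, COUNT(FILTERED_STATUS) AS num;
-- Sort by count, most frequent first, then write the result.
STATUS_COUNT_SORTED = ORDER STATUS_COUNT BY num DESC;
STORE STATUS_COUNT_SORTED INTO '$output';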