@ottomata
Created October 12, 2012 21:12
-- Count requests per (mobile|desktop, HTTP status) pair; $input and $output are Pig parameters.
DEFINE EXTRACT org.apache.pig.builtin.REGEX_EXTRACT_ALL();
LOG_FIELDS = LOAD '$input' USING PigStorage(' ') AS (hostname:chararray, udplog_sequence:chararray, timestamp:chararray, request_time:chararray, remote_addr:chararray, http_status:chararray, bytes_sent:chararray, request_method:chararray, uri:chararray, proxy_host:chararray, content_type:chararray, referer:chararray, x_forwarded_for:chararray, user_agent:chararray);
-- MATCHES and REGEX_EXTRACT_ALL must match the whole string, hence the surrounding .* in both patterns.
CANONICAL_STATUS = FOREACH LOG_FIELDS GENERATE
    (uri MATCHES '.*\\.m\\..*' ? 'mobile' : 'desktop') AS canonical:chararray,
    FLATTEN(EXTRACT(http_status, '.*(\\d\\d\\d)')) AS status:chararray;
-- Group by (canonical, status) and count rows; the relation is named COUNTS to avoid confusion with the builtin COUNT.
COUNTS = FOREACH (GROUP CANONICAL_STATUS BY (canonical, status) PARALLEL 7) GENERATE FLATTEN(group), COUNT($1) AS num;
SORTED = ORDER COUNTS BY $0, $1;
STORE SORTED INTO '$output';
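To sanity-check the two regular expressions and the grouping logic without a Hadoop cluster, the pipeline can be mirrored in plain Python. This is a sketch, not part of the original gist: it assumes well-formed lines split on single spaces into the 14 fields of the LOAD schema, uses `re.fullmatch` to stand in for the full-string matching that Pig's `MATCHES` and `REGEX_EXTRACT_ALL` perform, and the function names (`canonical`, `status`, `count_by_canonical_status`) are hypothetical.

```python
import re
from collections import Counter

# 0-based field positions taken from the LOAD ... AS (...) schema.
HTTP_STATUS = 5
URI = 8

# Pig's MATCHES and REGEX_EXTRACT_ALL must consume the whole string;
# re.fullmatch is the closest Python equivalent.
MOBILE_RE = re.compile(r'.*\.m\..*')
STATUS_RE = re.compile(r'.*(\d\d\d)')


def canonical(uri):
    """Classify a request URI as 'mobile' or 'desktop' (hypothetical helper)."""
    return 'mobile' if MOBILE_RE.fullmatch(uri) else 'desktop'


def status(http_status):
    """Return the last three digits of the status field, or None if absent."""
    m = STATUS_RE.fullmatch(http_status)
    return m.group(1) if m else None


def count_by_canonical_status(lines):
    """Count lines per (canonical, status) pair, sorted like ORDER BY $0, $1."""
    counts = Counter()
    for line in lines:
        fields = line.split(' ')
        counts[(canonical(fields[URI]), status(fields[HTTP_STATUS]))] += 1
    # Rows whose status did not match are kept with status None and sorted
    # first within their group, so the comparison never mixes None and str.
    return sorted(
        ((c, s, n) for (c, s), n in counts.items()),
        key=lambda row: (row[0], '' if row[1] is None else row[1]),
    )
```

The full-match requirement is why the mobile pattern needs the leading and trailing `.*`: `re.fullmatch(r'\.m\.', 'http://en.m.wikipedia.org/wiki/Pig')` returns `None`, so a bare `\.m\.` pattern would label every request as desktop.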