@ottomata
Created October 4, 2012 14:20
HTTP response status counts for 200, 404 and 302 in 1:1 bannerImpression logs from 2012-09-30 and 2012-10-01
-- Piggybank supplies the RegexExtract UDF used below.
REGISTER 'piggybank.jar';
DEFINE RegexExtract org.apache.pig.piggybank.evaluation.string.RegexExtract();
-- Load the space-delimited log lines; $input and $output are script parameters.
LOG_FIELDS = LOAD '$input' USING PigStorage(' ') AS (hostname:chararray, udplog_sequence:chararray, timestamp:chararray, request_time:chararray, remote_addr:chararray, http_status:chararray, bytes_sent:chararray, request_method:chararray, uri:chararray, proxy_host:chararray, content_type:chararray, referer:chararray, x_forwarded_for:chararray, user_agent);
-- Extract the 404, 200 or 302 code from the http_status field.
STATUS = FOREACH LOG_FIELDS GENERATE FLATTEN (RegexExtract(http_status, '.*(404|200|302).*', 1)) as status:chararray;
-- Group by the extracted status (3 reducers) and count the records in each group.
STATUS_COUNT = FOREACH (GROUP STATUS BY $0 PARALLEL 3) GENERATE $0, COUNT($1) as num;
STORE STATUS_COUNT into '$output';
# HTTP response status counts for 200, 404 and
# 302 in 1:1 bannerImpression logs from 2012-09-30 and 2012-10-01.
200 655438604
404 137
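
For checking the extraction and counting logic on a handful of lines without a cluster, the Pig script above can be approximated locally. This is a minimal Python sketch, not part of the original gist: the function name `count_statuses` and the sample lines are hypothetical, and the `hit/200` style of status value is an assumption about the log format. Lines whose status does not match the pattern are simply skipped here, which is a simplification.

```python
import re
from collections import Counter

# Same pattern the Pig script passes to RegexExtract.
STATUS_RE = re.compile(r'.*(404|200|302).*')


def count_statuses(lines, status_field=5):
    """Count 200/404/302 statuses in space-delimited log lines.

    status_field=5 is the position of http_status in the LOAD schema
    (hostname, udplog_sequence, timestamp, request_time, remote_addr,
    http_status, ...).
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip('\n').split(' ')
        if len(fields) <= status_field:
            continue  # too few fields to contain a status; skip the line
        match = STATUS_RE.fullmatch(fields[status_field])
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, not taken from the real logs.
sample = [
    'host1 1 2012-09-30T00:00:00 0.001 10.0.0.1 hit/200 120 GET http://example.org/a',
    'host1 2 2012-09-30T00:00:01 0.001 10.0.0.2 miss/404 80 GET http://example.org/b',
    'host2 3 2012-09-30T00:00:02 0.002 10.0.0.3 hit/200 120 GET http://example.org/c',
]

if __name__ == '__main__':
    for status, num in sorted(count_statuses(sample).items()):
        print(status, num)
```

Run against a small slice of a real log, the counts from this sketch can be compared with the `STATUS_COUNT` output for the same slice.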