@commuterjoy
Created August 21, 2012 13:23
pig - pages per minute
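-- Job to count the number of logged requests per minute.
-- Load the processed log; PigStorage() splits each line on tabs by default.
record = load '12_ip-10-226-85-155.processed/part-r-00000' using PigStorage();

-- The greedy (.*) captures everything in field $1 before its last ':',
-- which presumably truncates an HH:MM:SS timestamp to the minute.
projected = foreach record generate
    REGEX_EXTRACT((chararray)$1, '(.*)\\:(.*)', 1) as time;

-- Group by minute and count the requests in each group.
-- COUNT skips records whose time is null (i.e. where the regex did not match).
group_by_minute = group projected by time;
ppm = foreach group_by_minute generate
    group as minute,
    COUNT(projected.time) as count;

store ppm into '12_ip-10-226-85-155.a' using PigStorage();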