@commuterjoy
Created August 16, 2012 22:13
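-- Count requests per URL per second.
-- Reads with the PigStorage default (tab) delimiter; field $1 is used as the timestamp and $3 as the URL.
lines = load '12_ip-10-226-85-155.processed/part-r-00000' using PigStorage();
-- Truncate the timestamp to whole seconds by keeping only the text before the last '.'.
projected = foreach lines generate
    REGEX_EXTRACT((chararray)$1, '(.*)\\.(.*)', 1) as time,
    (chararray)$3 as url
;
-- One group per distinct (url, second) pair.
group_by_url = group projected by (url, time);
url_counts = foreach group_by_url generate group, COUNT(projected.url) as count;
store url_counts into '12_ip-10-226-85-155.2' using PigStorage();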
2012-08-13T12:03:26 93
2012-08-13T12:03:32 93
2012-08-13T12:03:33 93
2012-08-13T12:03:04 94
2012-08-13T12:03:40 94
2012-08-13T12:04:00 94
2012-08-13T12:00:40 95
2012-08-13T12:01:03 95
2012-08-13T12:01:43 95
2012-08-13T12:02:27 95
2012-08-13T12:03:52 95
2012-08-13T12:00:18 96
2012-08-13T12:02:20 96
2012-08-13T12:02:48 96
2012-08-13T12:03:00 96
2012-08-13T12:03:28 96
2012-08-13T12:03:38 96
2012-08-13T12:02:19 97
2012-08-13T12:03:34 97
2012-08-13T12:01:10 98
2012-08-13T12:03:22 98
2012-08-13T12:03:54 98
2012-08-13T12:01:20 99
2012-08-13T12:03:36 99
2012-08-13T12:02:50 100
2012-08-13T12:03:48 100
2012-08-13T12:03:43 103
2012-08-13T12:02:57 106
2012-08-13T12:03:51 106
2012-08-13T12:03:29 108
(http://www.guardian.co.uk/,2012-08-13T12:02:05) 22
(http://www.guardian.co.uk/,2012-08-13T12:02:44) 22
(http://www.guardian.co.uk/,2012-08-13T12:02:50) 22
(http://www.guardian.co.uk/,2012-08-13T12:02:54) 22
(http://www.guardian.co.uk/,2012-08-13T12:03:11) 22
(http://www.guardian.co.uk/,2012-08-13T12:03:21) 22
(http://www.guardian.co.uk/,2012-08-13T12:03:26) 22
(http://www.guardian.co.uk/,2012-08-13T12:00:36) 23
(http://www.guardian.co.uk/,2012-08-13T12:01:43) 23
(http://www.guardian.co.uk/,2012-08-13T12:02:46) 23
(http://www.guardian.co.uk/,2012-08-13T12:03:50) 23
(http://www.guardian.co.uk/,2012-08-13T12:03:58) 23
(http://www.guardian.co.uk/,2012-08-13T12:02:27) 24
(http://www.guardian.co.uk/,2012-08-13T12:02:28) 24
(http://www.guardian.co.uk/,2012-08-13T12:03:04) 24
(http://www.guardian.co.uk/,2012-08-13T12:03:35) 24
(http://www.guardian.co.uk/,2012-08-13T12:00:40) 25
(http://www.guardian.co.uk/,2012-08-13T12:00:51) 25
(http://www.guardian.co.uk/,2012-08-13T12:01:45) 25
(http://www.guardian.co.uk/,2012-08-13T12:02:48) 25
(http://www.guardian.co.uk/,2012-08-13T12:03:07) 25
(http://www.guardian.co.uk/,2012-08-13T12:03:57) 25
(http://www.guardian.co.uk/,2012-08-13T12:01:12) 26
(http://www.guardian.co.uk/,2012-08-13T12:02:47) 26
(http://www.guardian.co.uk/,2012-08-13T12:02:57) 26
(http://www.guardian.co.uk/,2012-08-13T12:03:49) 26
(http://www.guardian.co.uk/,2012-08-13T12:03:54) 26
(http://www.guardian.co.uk/,2012-08-13T12:04:00) 26
(http://www.guardian.co.uk/,2012-08-13T12:01:10) 28
(http://www.guardian.co.uk/,2012-08-13T12:03:10) 30
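The grouping done by the Pig job can be cross-checked on a small sample with plain Python. This is a rough sketch, not part of the original job: the function name `count_per_url_second`, the placeholder fields `f0` and `f2`, and the sample rows are all hypothetical. It only assumes what the script itself implies, namely tab-separated rows with the timestamp in field 1 and the URL in field 3.

```python
import re
from collections import Counter

# Same pattern as the Pig job: the greedy first group keeps everything
# before the last '.', which drops the fractional seconds.
TIME_RE = re.compile(r'(.*)\.(.*)')


def count_per_url_second(rows):
    """Count requests per (url, second) from tab-separated log rows.

    Hypothetical helper: field 1 is taken as the timestamp and field 3
    as the URL, mirroring $1 and $3 in the Pig script.
    """
    counts = Counter()
    for row in rows:
        fields = row.rstrip('\n').split('\t')
        match = TIME_RE.fullmatch(fields[1])
        if match is None:
            # Simplification: rows without a '.' in the timestamp are
            # skipped here rather than grouped under an empty time.
            continue
        counts[(fields[3], match.group(1))] += 1
    return counts


# Made-up sample rows; f0 and f2 stand in for fields the job ignores.
sample = [
    "f0\t2012-08-13T12:02:05.123\tf2\thttp://www.guardian.co.uk/",
    "f0\t2012-08-13T12:02:05.900\tf2\thttp://www.guardian.co.uk/",
    "f0\t2012-08-13T12:02:06.001\tf2\thttp://www.guardian.co.uk/",
]

result = count_per_url_second(sample)
for (url, second), hits in sorted(result.items()):
    print(f"({url},{second})\t{hits}")
```

The printed lines follow the same `(url,time)` then count layout as the stored output above, so a few rows from the real input can be compared by eye.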