@ottomata
Created October 3, 2012 18:47
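-- Mozilla's akela Pig UDFs plus the MaxMind GeoIP Java API used by the lookup
REGISTER 'akela-0.5-SNAPSHOT.jar';
REGISTER 'maxmind-geoip-1.2.5.jar';
-- Disable the combiner for this job
set pig.exec.nocombiner true;
-- GeoIpLookup resolves an IP address against the MaxMind city database in GeoIPCity.dat
DEFINE GeoIpLookup com.mozilla.pig.eval.geoip.GeoIpLookup('GeoIPCity.dat');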
-- Space-delimited request log. Because user agents contain spaces, only the first token of the user agent lands in user_agent (unused below).
LOG_FIELDS = LOAD '/user/otto/banner0/bannerImpressions-2011-11-16-05PM--30.log' USING PigStorage(' ') AS (hostname:chararray, udplog_sequence:chararray, timestamp:chararray, request_time:chararray, remote_addr:chararray, http_status:chararray, bytes_sent:chararray, request_method:chararray, uri:chararray, proxy_host:chararray, content_type:chararray, referer:chararray, x_forwarded_for:chararray, user_agent:chararray);
GEO_DATA = FOREACH LOG_FIELDS GENERATE FLATTEN (GeoIpLookup(remote_addr)) AS (country:chararray, country_code:chararray, region:chararray, city:chararray, postal_code:chararray, metro_code:int);
COUNTRY_CODE = FOREACH GEO_DATA GENERATE country_code;
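-- Group by country code across 28 reducers and count the rows in each group.
-- COUNT skips tuples whose first field is null, so a null country_code group reports 0; COUNT_STAR would count those rows.
COUNTRY_COUNT = FOREACH (GROUP COUNTRY_CODE BY country_code PARALLEL 28) GENERATE group AS country_code, COUNT(COUNTRY_CODE) AS num;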
COUNTRY_COUNT_SORTED = ORDER COUNTRY_COUNT BY num DESC;
DUMP COUNTRY_COUNT_SORTED;
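Running Pig Latin needs a Pig installation, so the following is a small plain-Python model (an illustration, not part of the original script) of what the GROUP, COUNT, and ORDER steps compute from a list of country codes. The function name country_counts and its count_nulls switch are hypothetical; count_nulls=True stands in for swapping COUNT for COUNT_STAR. It reflects two Pig behaviors: GROUP collects all null keys into a single group, and COUNT ignores tuples whose first field is null, so that group reports 0.

```python
from collections import defaultdict
from typing import Optional


def country_counts(
    codes: list[Optional[str]], count_nulls: bool = False
) -> list[tuple[Optional[str], int]]:
    """Hypothetical model of GROUP BY country_code, COUNT, ORDER BY num DESC."""
    # GROUP: every distinct code (including None) gets one bag of rows.
    groups: dict[Optional[str], list[Optional[str]]] = defaultdict(list)
    for code in codes:
        groups[code].append(code)

    counted = []
    for key, bag in groups.items():
        if count_nulls:
            # Like COUNT_STAR: every row in the bag counts.
            num = len(bag)
        else:
            # Like COUNT: rows whose (first) field is null are skipped.
            num = sum(1 for value in bag if value is not None)
        counted.append((key, num))

    # ORDER ... BY num DESC. Python's sort is stable; Pig makes no
    # promise about the relative order of tied counts.
    return sorted(counted, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = ["US", "DE", "US", None, "US", "DE"]
    print(country_counts(sample))
    print(country_counts(sample, count_nulls=True))
```

With the sample above, the first call yields [('US', 3), ('DE', 2), (None, 0)] and the COUNT_STAR-style call yields [('US', 3), ('DE', 2), (None, 1)].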