@neilkod
Created October 9, 2010 15:09
-- load all of the parsed tweets as (id, timestamp, screenname, tweet)
raw = LOAD 'parsed' USING PigStorage('\t') AS (id:chararray,timestamp:chararray,screenname:chararray,tweet:chararray);
-- keep only tweets whose screenname contains "bieber", case-insensitive.
-- matches must cover the whole string, hence the leading and trailing .*
fltr = FILTER raw BY screenname matches '(?i).*bieber.*';
-- group by screenname to count the tweets
grpd = GROUP fltr BY screenname;
-- count the tweets per screenname. note that this counts every tweet from
-- each matching screenname, not tweets that mention bieber :D
cntd = FOREACH grpd GENERATE group AS screenname, COUNT(fltr) AS cnt;
-- order the list of screennames by count, ascending (Pig's default order).
srtd = ORDER cntd BY cnt;
-- write the output
STORE srtd INTO 'usernameswithbieber2';
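-- optional sketch, not part of the original script: to see the busiest
-- matching screennames first, sort descending and keep only the top rows.
-- the relation names (top_first, top10), the cutoff of 10, and the output
-- path below are all assumptions made for illustration.
top_first = ORDER cntd BY cnt DESC;
top10 = LIMIT top_first 10;
STORE top10 INTO 'usernameswithbieber2_top10';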