Created
October 9, 2010 15:09
-- load all of the parsed tweets as (id, timestamp, screenname, tweet)
raw = LOAD 'parsed' USING PigStorage('\t') AS (id:chararray, timestamp:chararray, screenname:chararray, tweet:chararray);
-- limit the twitter data to where screenname matches bieber, case-insensitive
fltr = FILTER raw BY screenname matches '.*?[Bb][Ii][Ee][Bb][Ee][Rr].*?';
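-- possible alternative (an assumption, not part of the original script): Pig's
-- matches operator takes a Java regular expression, so the inline (?i) flag
-- should give the same case-insensitive match more compactly:
-- fltr = FILTER raw BY screenname matches '(?i).*bieber.*';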
-- group by screenname to count the tweets
grpd = GROUP fltr BY screenname;
-- count the tweets per screenname. note that these are total tweets, not
-- bieber tweets :D
cntd = FOREACH grpd GENERATE $0, COUNT($1) AS cnt;
-- order the list of screennames by count.
srtd = ORDER cntd BY cnt;
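-- optional variant (an assumption, not part of the original script): ORDER
-- sorts ascending by default; to list the most active screennames first:
-- srtd = ORDER cntd BY cnt DESC;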
-- write the output file
STORE srtd INTO 'usernameswithbieber2';