Created
August 29, 2010 13:48
canabalt scores
-- load one day's worth of tweets and find canabalt scores
register piggybank-0.3-amzn.jar;
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
-- input is tab-separated: tweet id, timestamp, screen name, tweet text
raw = LOAD '20100624.txt' USING PigStorage('\t') AS (id:chararray,timestamp:chararray,screenname:chararray,tweet:chararray);
-- keep only tweets in the canabalt score format ('matches' must match the whole tweet)
fltr = FILTER raw BY tweet matches 'I ran \\d*?m before.* on my.*http://www.canabalt.com/';
-- pull out score, method of death, and device from each matching tweet
thedata = FOREACH fltr GENERATE screenname,timestamp,FLATTEN(EXTRACT(tweet,'I ran (\\d*?)m before (.*) .*on my (i.*)\\. .*')),id;
dump thedata;
sample tweet: http://twitter.com/sfbayrealtor/status/16905379203
data format (screen name, timestamp, score, method of death, device, tweet id)
sample output from a day's worth of tweets
(sfbayrealtor,Thu Jun 24 04:59:15 +0000 2010,3220,hitting a wall and tumbling to my death,iPhone,16905379203)
(MrMacbook97,Thu Jun 24 06:15:22 +0000 2010,14366,hitting a wall and tumbling to my death,iPad,16909181913)
(DJFriar,Thu Jun 24 06:24:29 +0000 2010,1601, turning into a fine mist,iPad,16909591508)
(gusgus_,Thu Jun 24 06:38:27 +0000 2010,2502,hitting a wall and tumbling to my death,iPhone,16910215309)
(imnotstu,Thu Jun 24 08:13:22 +0000 2010,2032,somehow hitting the edge of a crane,iPhone,16914225900)
(thezez,Thu Jun 24 08:54:22 +0000 2010,7478,hitting a wall and tumbling to my death,iPad,16915857511)
(KawiKami,Thu Jun 24 09:53:33 +0000 2010,1268,hitting a wall and tumbling to my death,iPhone,16918218311)
(jagreda,Thu Jun 24 11:32:42 +0000 2010,6336, turning into a fine mist,iPhone,16922509009)
(applenerd106,Thu Jun 24 11:39:09 +0000 2010,5756,hitting a wall and tumbling to my death,iPhone,16922816903)
(agence84,Thu Jun 24 12:01:42 +0000 2010,4012, missing another window,iPad,16923949713)
(MinamiKokutou,Thu Jun 24 13:30:16 +0000 2010,2042, turning into a fine mist,iPhone,16929163105)
(cliffpro,Thu Jun 24 14:39:33 +0000 2010,4291,hitting a wall and tumbling to my death,iPhone,16933847610)
(RMM1982,Thu Jun 24 17:49:59 +0000 2010,7499,hitting a wall and tumbling to my death,iPad,16947666501)
(bradyman10000,Thu Jun 24 19:49:11 +0000 2010,3097, missing another window,iPhone,16956032203)
(ITNet1,Thu Jun 24 20:04:48 +0000 2010,2299,hitting a wall and tumbling to my death,iPad,16957068205)
(kevinglew,Thu Jun 24 20:51:05 +0000 2010,7060, missing another window,iPad,16960616000)
(niklaswick,Thu Jun 24 21:32:21 +0000 2010,4526,colliding with some enormous obstacle,iPad,16962952806)
(crazyqbygrl,Thu Jun 24 21:45:45 +0000 2010,2425, turning into a fine mist,iPhone,16963691305)
(kiyo_mori,Thu Jun 24 21:59:05 +0000 2010,1468,hitting a wall and tumbling to my death,iPad,16964421803)
(TerryBWhite,Thu Jun 24 22:05:12 +0000 2010,681,hitting a wall and tumbling to my death,iPad,16964773813)
(Soundtrooper,Thu Jun 24 22:44:58 +0000 2010,5486, missing another window,iPad,16967018412)
(fluffymcduff,Thu Jun 24 23:14:33 +0000 2010,3038, missing another window,iPad,16968736912)
(_NoSpaces_,Fri Jun 25 00:08:13 +0000 2010,2905, missing another window,iPhone,16971939712)
(chemapop,Fri Jun 25 00:18:12 +0000 2010,5379, missing another window,iPhone,16972534014)
(marcelocx,Fri Jun 25 01:44:22 +0000 2010,2260,hitting a wall and tumbling to my death,iPad,16977839702)
(GiLLyfRESH,Fri Jun 25 02:34:45 +0000 2010,3004,hitting a wall and tumbling to my death,iPod touch,16981057111)
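
To try the two regular expressions without a Pig installation, the filter and extract steps can be approximated in plain Python. This is only a sketch under stated assumptions: the tweet text in `sample` is hypothetical (reconstructed from the first output row, since the original tweet text is not included here), `parse_tweet` is an illustrative helper rather than anything from the script, and the extract step is assumed to match against the whole tweet the way Pig's `matches` operator does.

```python
import re

# Python versions of the two patterns from the Pig script. Pig's doubled
# backslashes become single backslashes inside a Python raw string.
FILTER_RE = re.compile(r"I ran \d*?m before.* on my.*http://www.canabalt.com/")
EXTRACT_RE = re.compile(r"I ran (\d*?)m before (.*) .*on my (i.*)\. .*")


def parse_tweet(tweet):
    """Return (score, method_of_death, device) or None for non-score tweets.

    Hypothetical helper for illustration only.
    """
    # Pig's 'matches' must match the entire string, so fullmatch is used
    # here instead of search.
    if not FILTER_RE.fullmatch(tweet):
        return None
    m = EXTRACT_RE.fullmatch(tweet)
    return m.groups() if m else None


# Hypothetical tweet text; the real tweet is not shown in this file.
sample = ("I ran 3220m before hitting a wall and tumbling to my death "
          "on my iPhone. http://www.canabalt.com/")

print(parse_tweet(sample))
print(parse_tweet("just a regular tweet"))
```

Because the second capture group starts right after the single space following `before`, a tweet with two spaces at that point would produce a method of death with a leading space, which would be consistent with rows such as ` turning into a fine mist` in the sample output.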