Created
June 29, 2010 17:08
neil-kodners-MacBook-Pro:parsed nkodner$ ls -ltrh 20100617.txt
-rw-r--r-- 1 nkodner staff 697M Jun 28 20:17 20100617.txt
neil-kodners-MacBook-Pro:parsed nkodner$ wc -l 20100617.txt
5194723 20100617.txt
neil-kodners-MacBook-Pro:parsed nkodner$ time cat 20100617.txt | awk -F'\t' '{print $3}' | sort | uniq -c | sort -rg > srtd.out
real 5m13.857s
user 2m53.029s
sys 0m4.201s
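The pipeline above sorts all 5,194,723 user values before `uniq -c` can count them. One possible variant, sketched here and not timed on this data, counts in an awk associative array and sorts only the per-user totals. The function name `count_by_user` is hypothetical and the sample rows are made up for illustration; only the column layout (id, time, user, tweet) comes from the Pig `LOAD` statement.

```shell
#!/bin/sh
# Hypothetical helper: count rows per value of the third tab-separated
# field (the user column), then sort the totals in descending order.
count_by_user() {
    awk -F'\t' '{ counts[$3]++ } END { for (u in counts) print counts[u] "\t" u }' "$1" |
        sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Made-up sample in the same layout as 20100617.txt: id, time, user, tweet.
sample=$(mktemp)
printf '1\t10:00\talice\thello\n2\t10:01\tbob\thi\n3\t10:02\talice\tagain\n' > "$sample"
count_by_user "$sample"
rm -f "$sample"
```

On the sample rows this prints `2` and `alice` on the first line and `1` and `bob` on the second, separated by tabs. The trade-off is memory: the awk array holds one entry per distinct user, which is usually far smaller than the full input but is not bounded.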
pig script:
raw = LOAD '20100617.txt' USING PigStorage('\t') AS (id,time,user,tweet);
usrs = GROUP raw BY user;
cntd = FOREACH usrs GENERATE $0 AS user,COUNT(raw) AS cnt;
srtd = ORDER cntd BY cnt;
STORE srtd INTO 'output';
neil-kodners-MacBook-Pro:parsed nkodner$ time pig -x local count_by_username.pig
real 4m19.268s
user 2m36.597s
sys 0m14.996s
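Converting the two `real` values to seconds makes the runs easier to compare. This is a small sketch of that arithmetic; `to_seconds` is a hypothetical helper, and the two inputs are the wall-clock times reported above.

```shell
#!/bin/sh
# Hypothetical helper: convert a `time`-style value such as 5m13.857s
# into seconds by splitting on the "m" and "s" markers.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

shell_real=$(to_seconds 5m13.857s)   # cat | awk | sort | uniq -c | sort
pig_real=$(to_seconds 4m19.268s)     # pig -x local

# Print both wall-clock times and how much less time the Pig run took.
awk -v s="$shell_real" -v p="$pig_real" 'BEGIN {
    printf "shell: %.3fs  pig: %.3fs  difference: %.3fs (%.1f%% less)\n", s, p, s - p, (s - p) / s * 100
}'
```

The shell pipeline took 313.857 s and the local Pig run 259.268 s, so Pig finished 54.589 s sooner, about 17.4% less wall-clock time on this single run.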