
@neilkod
Created June 29, 2010 17:08
neil-kodners-MacBook-Pro:parsed nkodner$ ls -ltrh 20100617.txt
-rw-r--r-- 1 nkodner staff 697M Jun 28 20:17 20100617.txt
neil-kodners-MacBook-Pro:parsed nkodner$ wc -l 20100617.txt
5194723 20100617.txt
neil-kodners-MacBook-Pro:parsed nkodner$ time cat 20100617.txt | awk -F'\t' '{print $3}' | sort | uniq -c | sort -rg > srtd.out
real 5m13.857s
user 2m53.029s
sys 0m4.201s
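The `sort | uniq -c` pipeline above sorts every one of the 5,194,723 usernames before counting them. A possible alternative, not timed here, is to count in a single awk pass with an associative array and then sort only the distinct usernames. This is a minimal sketch assuming the same tab-separated layout with the username in the third field; the function name `count_by_user` is hypothetical.

```shell
# count_by_user (hypothetical helper): reads tab-separated lines on stdin,
# counts how often each value of field 3 (the username) appears in one awk
# pass, then sorts only the distinct users by count, highest first.
count_by_user() {
  awk -F'\t' '{ counts[$3]++ } END { for (u in counts) print counts[u] "\t" u }' |
    sort -t "$(printf '\t')" -k1,1nr
}
```

Usage: `count_by_user < 20100617.txt > srtd.out`. Each output line is `count<TAB>user` rather than the space-padded format of `uniq -c`, and memory use grows with the number of distinct usernames rather than the total number of lines.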
Pig script (count_by_username.pig):
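-- count tweets per username in a tab-separated file (id, time, user, tweet)
raw = LOAD '20100617.txt' USING PigStorage('\t') AS (id, time, user, tweet);
usrs = GROUP raw BY user;
cntd = FOREACH usrs GENERATE group AS user, COUNT(raw) AS cnt;
-- DESC puts the most active users first, matching the awk pipeline's sort -rg
srtd = ORDER cntd BY cnt DESC;
STORE srtd INTO 'output';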
neil-kodners-MacBook-Pro:parsed nkodner$ time pig -x local count_by_username.pig
real 4m19.268s
user 2m36.597s
sys 0m14.996s
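Converting the two `real` times to seconds makes the comparison concrete: 5m13.857s is 313.857 s and 4m19.268s is 259.268 s, so the Pig run finished about 54.6 s sooner, roughly 17% less wall-clock time. Only one run of each is shown, so treat the gap as indicative. The helpers below are a small sketch for repeating this arithmetic on other `time` output of the form `XmY.ZZZs`; the names `to_seconds` and `compare_times` are hypothetical.

```shell
# to_seconds (hypothetical helper): converts a `time` value such as
# "5m13.857s" into seconds, printed with three decimals (313.857).
to_seconds() {
  printf '%s\n' "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# compare_times (hypothetical helper): prints the seconds saved by the second
# run and that saving as a percentage of the first run's time.
compare_times() {
  first=$(to_seconds "$1")
  second=$(to_seconds "$2")
  awk -v a="$first" -v b="$second" 'BEGIN { printf "%.3f %.1f\n", a - b, (a - b) / a * 100 }'
}
```

Usage with the two `real` lines above: `compare_times 5m13.857s 4m19.268s` prints `54.589 17.4`.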