Josh Wills jwills

jwills / gist:2047314
Created March 15, 2012 22:13
PageRank in Pig
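#!/usr/bin/python
# Jython script for Pig's embedded scripting API; run it through the pig launcher.
from org.apache.pig.scripting import *
# INIT builds the starting relation: one row per source URL, with rank 1 and its outgoing links.
INIT = Pig.compile("""
-- Each input line is split on spaces into (source url, predicate, target link).
A = LOAD 'page_links_en.nt.bz2' using PigStorage(' ') as (url:chararray, p:chararray, link:chararray);
B = GROUP A by url;
-- Every URL starts with a pagerank of 1; links is the bag of pages it points to.
C = foreach B generate group as url, 1 as pagerank, A.link as links;
-- $docs_in is a parameter that must be bound before the script is run.
STORE C into '$docs_in';
""")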