@neilkod
Created August 3, 2010 21:45
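#!/usr/bin/env python3
# reducer.py
# Sums the counts for each item pair and prints only the pairs whose
# total reaches the threshold.
#
# Each input line is a key and a value separated by a tab, e.g.
# (1,2) 1
# (2,3) 1
# (1,3) 1
# (1,3) 1
# Need to count each group (pair) and sum the totals.
import sys
from collections import defaultdict

threshold = 8  # minimum total count for a pair to be printed

cnts = defaultdict(int)
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    k, v = line.split('\t')
    # The key looks like "(1,2)" or "(1, 2)": drop the parentheses first,
    # otherwise int("(1") raises ValueError.
    item1, item2 = k.strip('()').split(',')
    # Add the value rather than 1, so pre-summed (combined) counts stay correct.
    cnts[(int(item1), int(item2))] += int(v)

for key, val in cnts.items():
    if val >= threshold:
        print('%s\t%s' % (key, val))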
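The reducer above keeps every pair's count in a dictionary until the input ends. If it runs as a Hadoop Streaming reducer (as the tab-separated key/value input suggests), the framework sorts map output by key before the reducer sees it, so identical keys arrive next to each other and each run can be summed and discarded as it goes. A minimal sketch of that constant-memory alternative using `itertools.groupby`, assuming sorted input and the same key format; `parse`, `reduce_pairs`, and `main` are hypothetical names, not part of the original gist:

```python
# Streaming variant of reducer.py (sketch). ASSUMES input lines are sorted by
# key, as Hadoop Streaming's shuffle provides, so only one group is in memory.
import sys
from itertools import groupby

THRESHOLD = 8  # same cutoff as reducer.py


def parse(line):
    """Split 'key<TAB>value' into ((int, int), int); key may be '(1,2)' or '(1, 2)'."""
    k, v = line.strip().split('\t')
    item1, item2 = k.strip('()').split(',')
    return (int(item1), int(item2)), int(v)


def reduce_pairs(lines, threshold=THRESHOLD):
    """Yield 'pair<TAB>total' for each run of identical keys whose total >= threshold."""
    records = (parse(line) for line in lines if line.strip())
    # groupby only merges *adjacent* equal keys, hence the sorted-input assumption.
    for key, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(v for _, v in group)
        if total >= threshold:
            yield '%s\t%s' % (key, total)


def main():
    """Read sorted key/value lines from stdin and print the qualifying pairs."""
    for out in reduce_pairs(sys.stdin):
        print(out)
```

To use it as a script, call `main()` under an `if __name__ == '__main__':` guard. The trade-off: the dictionary version also works on unsorted input, which is convenient for quick local tests, while this version needs a `sort` step in any local pipeline to mimic the shuffle. For example, `reduce_pairs(['(1,3)\t1', '(1,3)\t1', '(2,3)\t1'], threshold=2)` yields only `'(1, 3)\t2'`.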