@cablehead
Created February 1, 2013 14:21
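import os

# Map each blog_id to the set of unique_ids seen for it.
uniques = {}
n = 0
# abspath keeps the path valid when the script is run from its own directory.
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'part-r-00000')
with open(path) as f:
    for line in f:
        # Each line holds two whitespace-separated integers: blog_id unique_id.
        blog_id, unique_id = [int(x) for x in line.split()]
        uniques.setdefault(blog_id, set()).add(unique_id)
        n += 1
        # Report progress every 10,000 lines.
        if n % 10000 == 0:
            print(n)
# Number of distinct blog_ids.
print(len(uniques))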