@codebynumbers
Created June 29, 2013 00:31
Split data into multiple files
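import sys

path = sys.argv[1]
ext = 'txt'
out_dir = 'output'  # output directory; must already exist
limit = 10000       # maximum number of keywords written per country
limits = {}         # country code -> number of keywords written so far

with open(path) as file:
    for line in file:
        # Each line should hold three tab-separated fields; skip anything else.
        try:
            (country, keyword, count) = line.strip().split("\t")
        except ValueError:
            continue
        # Only accept two-character country codes.
        if len(country) != 2:
            continue
        if limits.get(country, 0) < limit:
            # Append the keyword to that country's file, e.g. output/us.txt.
            with open("%s/%s.%s" % (out_dir, country, ext), 'a') as output:
                output.write(keyword)
                output.write("\n")
            limits[country] = limits.get(country, 0) + 1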
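
The script reopens the output file for every line it writes. Below is a minimal sketch of one alternative, assuming the same three-column tab-separated input: it keeps one open handle per country and closes them all at the end. The function name `split_by_country` and its parameters are hypothetical and not part of the original gist. Unlike the script, it also creates the output directory if it is missing.

```python
import os
from contextlib import ExitStack


def split_by_country(lines, out_dir, limit=10000, ext="txt"):
    """Write up to `limit` keywords per two-character country code.

    Hypothetical helper; returns a dict of country -> keywords written.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}   # country -> number of keywords written so far
    handles = {}  # country -> open output file
    with ExitStack() as stack:  # closes every cached handle on exit
        for line in lines:
            fields = line.strip().split("\t")
            if len(fields) != 3:
                continue  # skip malformed lines
            country, keyword, _count = fields
            if len(country) != 2 or counts.get(country, 0) >= limit:
                continue
            if country not in handles:
                target = os.path.join(out_dir, "%s.%s" % (country, ext))
                handles[country] = stack.enter_context(open(target, "a"))
            handles[country].write(keyword + "\n")
            counts[country] = counts.get(country, 0) + 1
    return counts
```

It could be called as `split_by_country(open(sys.argv[1]), "output")`. Every distinct two-character code keeps a file open until the function returns, so input with very many distinct codes may hit the operating system's open-file limit.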