@fulmicoton
Last active December 17, 2015 06:09
Computes the 10 most frequent tokens from stdin or from one or more files.
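import re
import fileinput
from collections import Counter

# Tokens are separated by runs of non-word characters.
token_ptn = re.compile(r"[^\w]+")
word_count = Counter()
# fileinput reads the files named on the command line, or stdin if none are given.
for line in fileinput.input():
    for token in token_ptn.split(line):
        if token != "":
            word_count[token] += 1
# Print the 10 most frequent tokens with their counts.
for (k, v) in word_count.most_common(10):
    print(k, v)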
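
The same counting can be tried on in-memory lines without touching stdin. The sketch below is a minimal variant, not the gist's own code: the function name `top_tokens`, the `WORD_PTN` constant, and the sample input are all hypothetical, and it matches word runs with `re.findall` instead of splitting on non-word characters, which should yield the same tokens.

```python
import re
from collections import Counter

# Hypothetical helper pattern: matches runs of word characters directly,
# so no empty strings need to be filtered out afterwards.
WORD_PTN = re.compile(r"\w+")


def top_tokens(lines, n=10):
    """Return the n most frequent tokens in an iterable of strings."""
    counts = Counter()
    for line in lines:
        # Counter.update adds one count per token found on the line.
        counts.update(WORD_PTN.findall(line))
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["the cat and the hat", "the hat"]  # made-up sample input
    print(top_tokens(sample, 2))  # [('the', 3), ('hat', 2)]
```

Passing `fileinput.input()` as `lines` would give the same stdin-or-files behaviour as the script above.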