@tdhopper
Created October 14, 2013 21:01
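import re, string, pandas
# Stop words: the comma-separated list from the file, plus every single letter
stops = set(open("../stop_words.txt").read().split(",") + list(string.ascii_lowercase))
# Split the text on runs of non-letters, lowercase each word, drop empties and stop words
words = [x.lower() for x in re.split("[^a-zA-Z]+", open("../pride-and-prejudice.txt").read()) if len(x) > 0 and x.lower() not in stops]
# Print the 25 most frequent words with their counts
print(pandas.Series(words).value_counts().head(25))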