@bmaland
Created November 18, 2010 19:05
## Assuming that nltk is available, and that the text is in the current
## directory, named 'twain-tomsawyer.txt'
import nltk
from nltk.corpus.reader import PlaintextCorpusReader

reader = PlaintextCorpusReader('.', 'twain-tomsawyer.txt')
words = reader.words('twain-tomsawyer.txt')  # All the tokens in the text (words and punctuation)

def group(i):
    """Map a raw frequency to a bucket label: exact below 11, ranges above."""
    if i < 11:
        return str(i)
    elif i < 51:
        return "11-50"
    elif i < 101:
        return "51-100"
    else:
        return ">100"

# Frequency of each case-folded word
fdist = nltk.FreqDist(w.lower() for w in words)
# Number of distinct words that fall into each frequency bucket
fdist_freq = nltk.FreqDist(group(freq) for freq in fdist.values())
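
## A minimal sketch of the same bucketing idea using only the standard
## library, for trying it out without nltk or the text file. The names
## sample_words, word_counts and bucket_counts are hypothetical and not part
## of the original gist; collections.Counter stands in for nltk.FreqDist.

```python
from collections import Counter

def group(i):
    """Map a raw frequency to a bucket label: exact below 11, ranges above."""
    if i < 11:
        return str(i)
    elif i < 51:
        return "11-50"
    elif i < 101:
        return "51-100"
    return ">100"

# Hypothetical sample standing in for reader.words(...)
sample_words = (["the"] * 120 + ["Tom"] * 30 + ["tom"] * 30
                + ["fence"] * 12 + ["whitewash", "Aunt"])

# Frequency of each case-folded word
word_counts = Counter(w.lower() for w in sample_words)
# Number of distinct words that fall into each frequency bucket
bucket_counts = Counter(group(c) for c in word_counts.values())

for bucket, n in sorted(bucket_counts.items()):
    print(bucket, n)
```

## Lowercasing merges "Tom" and "tom" into a single entry before the
## frequencies are bucketed, which is why the sample yields one word in the
## "51-100" bucket rather than two in "11-50".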