Created October 21, 2010 13:20
Concordance for a few terms in Zoolander. Just for fun.
#!/usr/bin/env python3
import string

import nltk

# read the whole file into one string
with open('zoolander.txt', 'r') as fh:
    f = fh.read()

# favorite way to strip punctuation, found on stackoverflow
# http://stackoverflow.com/questions/265960
# it helped the concordance results a little bit but this might
# not be the best approach.
# concordance is sensitive to punctuation in tokens, i don't want it
# for my sample output.
# i'd like feedback on this
f = f.translate(str.maketrans('', '', string.punctuation))

# create an nltk text object
foo = nltk.Text(f.split())

# terms to run concordance against. tried a few funny
# terms from the movie
terms = ['mugatu', 'hot', 'good', 'model',
         'stupid', 'freak', 'coal', 'work', 'derek',
         'hansel', 'read', 'kill', 'underwear']

# loop through the terms, print header row, the output, and
# a blank line
for term in terms:
    print("concordance for %s....." % term)
    foo.concordance(term)
    print()
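
On the request for feedback about punctuation stripping: deleting every punctuation character also glues together words that were separated only by punctuation and drops apostrophes from contractions. One possible alternative is to pull word tokens out with a regular expression instead. The sketch below is not part of the original script; `tokenize` is a hypothetical helper name and the sample sentence is made up for illustration.

```python
import re
import string

# Hypothetical helper: runs of word characters, optionally joined by
# single apostrophes so contractions such as "he's" stay in one piece.
_TOKEN_RE = re.compile(r"\w+(?:'\w+)*")


def tokenize(text):
    """Return word tokens from text, treating punctuation as a separator."""
    return _TOKEN_RE.findall(text)


# Made-up sample line, not taken from zoolander.txt.
sample = "Mugatu is so hot right now--really, he's hot!"

# The delete-all-punctuation approach merges "now--really," into one token.
stripped = sample.translate(str.maketrans('', '', string.punctuation)).split()
print(stripped)

# The regex approach splits on the punctuation instead.
print(tokenize(sample))
```

The resulting list can be passed to `nltk.Text(...)` in place of `f.split()`. Whether this gives nicer concordance output depends on the text, so treat it as one option to try rather than a definitive fix.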