Last active
November 18, 2019 05:46
NLTK n-gram model
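from nltk import bigrams
from nltk.corpus import brown
from nltk.lm import MLE
from nltk.lm.preprocessing import pad_both_ends, padded_everygram_pipeline


def get_model(order=2):
    # Train a maximum-likelihood n-gram model on the Brown corpus
    # (requires nltk.download('brown') beforehand).
    lm = MLE(order)
    train, vocab = padded_everygram_pipeline(order, brown.sents())
    lm.fit(train, vocab)
    return lm


model = get_model()
# perplexity() expects an iterable of n-gram tuples, not a raw string.
tokens = 'prints the perplexity of this sentence'.split()
test_bigrams = list(bigrams(pad_both_ends(tokens, n=2)))
# MLE assigns zero probability to unseen bigrams, so this may print inf.
print(model.perplexity(test_bigrams))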
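
To make the perplexity figure easier to interpret, here is a minimal pure-Python sketch of what a bigram MLE perplexity amounts to: 2 raised to the negative mean log2 probability of the padded bigrams. It is an illustration under assumptions, not NLTK's implementation. The helper names `train_bigram_mle` and `bigram_perplexity` and the toy corpus are hypothetical, and the sketch skips NLTK's vocabulary handling (for example, mapping unknown words to an unknown-word token).

```python
import math
from collections import Counter


def train_bigram_mle(sentences):
    """Count padded bigrams and their left-hand contexts (hypothetical helper)."""
    bigram_counts = Counter()
    context_counts = Counter()
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            bigram_counts[(prev, word)] += 1
            context_counts[prev] += 1
    return bigram_counts, context_counts


def bigram_perplexity(sentence, bigram_counts, context_counts):
    """Return 2 ** (-mean log2 P(word | prev)) over the padded bigrams."""
    padded = ["<s>"] + list(sentence) + ["</s>"]
    pairs = list(zip(padded, padded[1:]))
    log_prob = 0.0
    for prev, word in pairs:
        count = bigram_counts[(prev, word)]
        if count == 0:
            # An unseen bigram has MLE probability 0, so perplexity is infinite.
            return math.inf
        log_prob += math.log2(count / context_counts[prev])
    return 2 ** (-log_prob / len(pairs))


# Toy corpus, invented for illustration only.
toy_corpus = [["a", "b"], ["a", "c"]]
counts = train_bigram_mle(toy_corpus)
seen = bigram_perplexity(["a", "b"], *counts)    # every bigram was observed
unseen = bigram_perplexity(["b", "a"], *counts)  # ("<s>", "b") was never observed
```

The second call shows why an unsmoothed MLE model so often reports infinite perplexity on held-out text: a single unseen bigram zeroes out the whole sentence probability. A smoothed model such as `nltk.lm.Laplace` avoids this.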