@language-engineering
Created October 24, 2012 11:47
from nltk.util import bigrams, trigrams
example_tagged_words = [('The', 'DT'), ('little', 'JJ'), ('badgers', 'NNS'), ('ate', 'VBP'), ('some', 'DT'), ('jam', 'NN')]
# Recent NLTK versions return generators here; list() makes the results reusable
bi_grams = list(bigrams(example_tagged_words))
tri_grams = list(trigrams(example_tagged_words))
# "extract_by_pos" and "untag_sequence" also work on bigrams and trigrams.
# Neither helper is imported in this snippet, so both must already be defined.
bigram_regex = [("J+", "N+")]  # Pattern: an adjective tag (J...) followed by a noun tag (N...)
print(untag_sequence(extract_by_pos(bi_grams, bigram_regex)))
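
The snippet above relies on two helpers it does not define. As a rough, self-contained illustration of what such a filter might do, the sketch below uses hypothetical stand-ins named `extract_by_pos_sketch` and `untag_sequence_sketch`; they are not the original helpers. It assumes each pattern is a tuple of regular expressions matched against the start of each tag with `re.match`, and it builds the bigrams with plain `zip` so that it runs without NLTK.

```python
import re

def extract_by_pos_sketch(ngrams, pos_patterns):
    """Hypothetical stand-in: keep n-grams whose tags match a pattern tuple.

    Assumption: each regex is tested against the start of the tag (re.match),
    so "N+" accepts both 'NN' and 'NNS'.
    """
    kept = []
    for ngram in ngrams:
        tags = [tag for _word, tag in ngram]
        for pattern in pos_patterns:
            same_length = len(pattern) == len(tags)
            if same_length and all(re.match(p, t) for p, t in zip(pattern, tags)):
                kept.append(ngram)
                break  # one matching pattern is enough for this n-gram
    return kept

def untag_sequence_sketch(ngrams):
    """Hypothetical stand-in: drop the tags and keep only the words."""
    return [tuple(word for word, _tag in ngram) for ngram in ngrams]

tagged = [('The', 'DT'), ('little', 'JJ'), ('badgers', 'NNS'),
          ('ate', 'VBP'), ('some', 'DT'), ('jam', 'NN')]

# Adjacent pairs of (word, tag) tuples, built without NLTK
pairs = list(zip(tagged, tagged[1:]))

adj_noun = untag_sequence_sketch(extract_by_pos_sketch(pairs, [("J+", "N+")]))
print(adj_noun)  # [('little', 'badgers')]
```

Only the pair "little badgers" survives, because it is the only bigram in which a tag starting with J is immediately followed by a tag starting with N.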