@technocrat
Created October 9, 2015 01:45
Chunk tagged text to find a specific pattern of part-of-speech tags, then collect the matches that contain a specific keyword in a list
import nltk

# Requires the Brown corpus: run nltk.download('brown') once if it is missing.
# Chunk rule: singular noun, base-form verb, preposition, singular noun.
cp = nltk.RegexpParser('CHUNK: {<NN> <VB> <IN> <NN>}')
bucket = []
brown = nltk.corpus.brown
for sent in brown.tagged_sents():
    tree = cp.parse(sent)
    for subtree in tree.subtrees():
        if subtree.label() == 'CHUNK':
            # Leaves are (word, tag) pairs; join the words into one phrase.
            phrase = ' '.join(word for word, tag in subtree)
            if 'sciatica' in phrase:
                bucket.append(phrase)
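
To see what the chunk rule and keyword filter select without downloading the Brown corpus, here is a minimal sketch in plain Python. It is an approximation of the rule `{<NN> <VB> <IN> <NN>}`, not NLTK's implementation: it scans left to right for four consecutive tokens with exactly those tags and skips past each match so that chunks do not overlap. The names `find_chunks` and `filter_by_keyword` and the hand-tagged sentence are hypothetical, made up for illustration.

```python
# Plain-Python approximation of the rule {<NN> <VB> <IN> <NN>}; not NLTK code.
PATTERN = ("NN", "VB", "IN", "NN")


def find_chunks(tagged_sent, pattern=PATTERN):
    """Return non-overlapping phrases whose tag sequence equals `pattern`."""
    phrases = []
    n = len(pattern)
    i = 0
    while i + n <= len(tagged_sent):
        window = tagged_sent[i:i + n]
        if tuple(tag for _, tag in window) == pattern:
            phrases.append(" ".join(word for word, _ in window))
            i += n  # skip past the match so chunks never overlap
        else:
            i += 1
    return phrases


def filter_by_keyword(phrases, keyword):
    """Keep only the phrases that contain `keyword` as a substring."""
    return [p for p in phrases if keyword in p]


# Hand-tagged sentence invented for illustration (not taken from Brown).
sentence = [("sciatica", "NN"), ("spread", "VB"), ("down", "IN"),
            ("leg", "NN"), ("quickly", "RB")]
chunks = find_chunks(sentence)
bucket = filter_by_keyword(chunks, "sciatica")
print(bucket)  # ['sciatica spread down leg']
```

As in the script above, the keyword test is a substring check on the joined phrase, so a keyword such as 'leg' would also match a chunk containing 'legal'.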