NLTK API to Stanford NLP Tools compiled on 2015-12-09

Stanford NER

With NLTK version 3.1 and the Stanford NER tool compiled on 2015-12-09, it is possible to hack StanfordNERTagger._stanford_jar to include the other .jar files that the new tagger needs.
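In outline, the hack is only a few lines. Here is a minimal sketch of the pattern (it assumes the environment variables described below are already set; the concrete, tested steps for each tool follow):

>>> from nltk.internals import find_jars_within_path
>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
>>> # Rebuild the classpath from every .jar found under the tool's directory
>>> st._stanford_jar = ':'.join(find_jars_within_path(st._stanford_jar.rpartition('/')[0]))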

First, set up the environment variables as instructed at https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software

alvas@ubi:~$ cd
alvas@ubi:~$ wget http://nlp.stanford.edu/software/stanford-ner-2015-12-09.zip
alvas@ubi:~$ unzip stanford-ner-2015-12-09.zip
alvas@ubi:~$ ls stanford-ner-2015-12-09/
build.xml    LICENSE.txt   ner-gui.bat      ner.sh                 sample.ner.txt     stanford-ner-3.6.0.jar          stanford-ner.jar
classifiers  ner.bat       ner-gui.command  README.txt             sample.txt         stanford-ner-3.6.0-javadoc.jar
lib          NERDemo.java  ner-gui.sh       sample-conll-file.txt  sample-w-time.txt  stanford-ner-3.6.0-sources.jar
alvas@ubi:~$ export STANFORDTOOLSDIR=$HOME
alvas@ubi:~$ export CLASSPATH=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/stanford-ner.jar
alvas@ubi:~$ export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/classifiers
alvas@ubi:~$ python
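If you'd rather not rely on shell exports, the same variables can be set from inside Python before constructing the tagger, since NLTK reads them from os.environ when it looks up the jar and model files. A sketch, assuming the tools were unzipped into your home directory as above:

>>> import os
>>> stanford_tools_dir = os.path.expanduser('~')
>>> # Equivalent to the shell exports above
>>> os.environ['CLASSPATH'] = os.path.join(stanford_tools_dir, 'stanford-ner-2015-12-09', 'stanford-ner.jar')
>>> os.environ['STANFORD_MODELS'] = os.path.join(stanford_tools_dir, 'stanford-ner-2015-12-09', 'classifiers')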

After setting the environment variables correctly, in Python, with the previous Stanford NER tagger release (stanford-ner-2015-04-20) you could just do this:

>>> import nltk
>>> nltk.__version__
'3.1'
>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

But with stanford-ner-2015-12-09, the Python lines above will raise an error:

CRFClassifier invoked on Mon Dec 28 16:23:41 CET 2015 with arguments:
   -loadClassifier /home/alvas/stanford-ner-2015-12-09/classifiers/english.all.3class.distsim.crf.ser.gz -textFile /tmp/tmpdMUhrq -outputFormat slashTags -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer -tokenizerOptions "tokenizeNLs=false" -encoding utf8
tokenizerFactory=edu.stanford.nlp.process.WhitespaceTokenizer
tokenizerOptions="tokenizeNLs=false"
loadClassifier=/home/alvas/stanford-ner-2015-12-09/classifiers/english.all.3class.distsim.crf.ser.gz
encoding=utf8
textFile=/tmp/tmpdMUhrq
outputFormat=slashTags
Loading classifier from /home/alvas/stanford-ner-2015-12-09/classifiers/english.all.3class.distsim.crf.ser.gz ... done [2.2 sec].
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
	at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
	at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1117)
	at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1076)
	at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1057)
	at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3088)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 5 more

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 66, in tag
    return sum(self.tag_sents([tokens]), []) 
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 89, in tag_sents
    stdout=PIPE, stderr=PIPE)
  File "/usr/local/lib/python2.7/dist-packages/nltk/internals.py", line 134, in java
    raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : [u'/usr/bin/java', '-mx1000m', '-cp', '/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', '/home/alvas/stanford-ner-2015-12-09/classifiers/english.all.3class.distsim.crf.ser.gz', '-textFile', '/tmp/tmpdMUhrq', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']

To resolve this, you could either wait another 1-2 months for the fixes to this issue to be released: nltk/nltk#1237

Or try out this hack, which automatically adds the other .jar files to the Java -cp:

>>> import nltk
>>> nltk.__version__
'3.1'
>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
>>> print st._stanford_jar
/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar
>>> stanford_dir = st._stanford_jar.rpartition('/')[0]
>>> from nltk.internals import find_jars_within_path
>>> stanford_jars = find_jars_within_path(stanford_dir)
>>> print ":".join(stanford_jars)
/home/alvas/stanford-ner-2015-12-09/stanford-ner-3.6.0-javadoc.jar:/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar:/home/alvas/stanford-ner-2015-12-09/stanford-ner-3.6.0.jar:/home/alvas/stanford-ner-2015-12-09/stanford-ner-3.6.0-sources.jar:/home/alvas/stanford-ner-2015-12-09/lib/slf4j-simple.jar:/home/alvas/stanford-ner-2015-12-09/lib/joda-time.jar:/home/alvas/stanford-ner-2015-12-09/lib/slf4j-api.jar:/home/alvas/stanford-ner-2015-12-09/lib/jollyday-0.4.7.jar:/home/alvas/stanford-ner-2015-12-09/lib/stanford-ner-resources.jar
>>> st._stanford_jar = ':'.join(stanford_jars)
>>> print st._stanford_jar
/home/alvas/stanford-ner-2015-12-09/stanford-ner-3.6.0-javadoc.jar:/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar:/home/alvas/stanford-ner-2015-12-09/stanford-ner-3.6.0.jar:/home/alvas/stanford-ner-2015-12-09/stanford-ner-3.6.0-sources.jar:/home/alvas/stanford-ner-2015-12-09/lib/slf4j-simple.jar:/home/alvas/stanford-ner-2015-12-09/lib/joda-time.jar:/home/alvas/stanford-ner-2015-12-09/lib/slf4j-api.jar:/home/alvas/stanford-ner-2015-12-09/lib/jollyday-0.4.7.jar:/home/alvas/stanford-ner-2015-12-09/lib/stanford-ner-resources.jar
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]
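If you need the tagger in more than one script, the hack is worth wrapping up. A small sketch; make_ner_tagger is a hypothetical helper name, not part of the NLTK API:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.tag import StanfordNERTagger
>>> def make_ner_tagger(model='english.all.3class.distsim.crf.ser.gz'):
...     """Return a StanfordNERTagger whose classpath includes every jar in the tool directory."""
...     st = StanfordNERTagger(model)
...     stanford_dir = st._stanford_jar.rpartition('/')[0]
...     st._stanford_jar = ':'.join(find_jars_within_path(stanford_dir))
...     return st
...
>>> st = make_ner_tagger()
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())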

Stanford POS

Since the Stanford POS tagger uses a similar API to the NER tool, you can apply the same ._stanford_jar hack.

First, make sure you've exported the environment variables correctly:

alvas@ubi:~$ cd 
alvas@ubi:~$ wget http://nlp.stanford.edu/software/stanford-postagger-full-2015-12-09.zip
alvas@ubi:~$ unzip stanford-postagger-full-2015-12-09.zip
alvas@ubi:~$ ls stanford-postagger-full-2015-12-09
build.xml  LICENSE.txt  sample-input.txt              stanford-postagger-3.6.0-javadoc.jar  stanford-postagger-gui.bat  stanford-postagger.sh
data       models       sample-output.txt             stanford-postagger-3.6.0-sources.jar  stanford-postagger-gui.sh   TaggerDemo2.java
lib        README.txt   stanford-postagger-3.6.0.jar  stanford-postagger.bat                stanford-postagger.jar      TaggerDemo.java
alvas@ubi:~$ export STANFORDTOOLSDIR=$HOME
alvas@ubi:~$ export CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-12-09/stanford-postagger.jar
alvas@ubi:~$ export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-12-09/models
alvas@ubi:~$ python

With the previous Stanford POS tagger release (stanford-postagger-full-2015-04-20) you could just do this:

>>> from nltk.tag import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())

But with stanford-postagger-full-2015-12-09, you will see the following error:

Loading default properties from tagger /home/alvas/stanford-postagger-full-2015-12-09/models/english-bidirectional-distsim.tagger
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
	at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
	at edu.stanford.nlp.tagger.maxent.TaggerConfig.<init>(TaggerConfig.java:146)
	at edu.stanford.nlp.tagger.maxent.TaggerConfig.<init>(TaggerConfig.java:128)
	at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1836)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 4 more

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 66, in tag
    return sum(self.tag_sents([tokens]), []) 
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 89, in tag_sents
    stdout=PIPE, stderr=PIPE)
  File "/usr/local/lib/python2.7/dist-packages/nltk/internals.py", line 134, in java
    raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : [u'/usr/bin/java', '-mx1000m', '-cp', '/home/alvas/stanford-postagger-full-2015-12-09/stanford-postagger.jar', 'edu.stanford.nlp.tagger.maxent.MaxentTagger', '-model', '/home/alvas/stanford-postagger-full-2015-12-09/models/english-bidirectional-distsim.tagger', '-textFile', '/tmp/tmpXIgwjP', '-tokenize', 'false', '-outputFormatOptions', 'keepEmptySentences', '-encoding', 'utf8']

Use the same hack as for the NER tagger, and the Stanford POS tagger should work:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.tag import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> print st._stanford_jar
/home/alvas/stanford-postagger-full-2015-12-09/stanford-postagger.jar
>>> stanford_dir = st._stanford_jar.rpartition('/')[0]
>>> stanford_jars = find_jars_within_path(stanford_dir)
>>> st._stanford_jar = ':'.join(stanford_jars)
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]
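Note that each call to tag() spawns a fresh JVM, so when tagging many sentences it is much cheaper to batch them through tag_sents() (which tag() itself delegates to, as the traceback above shows):

>>> sentences = ['What is the airspeed of an unladen swallow ?'.split(),
...              'Rami Eid is studying at Stony Brook University in NY .'.split()]
>>> tagged_sents = st.tag_sents(sentences)  # one JVM start for the whole batch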

Stanford Parser

Similarly, export the environment variables correctly first:

alvas@ubi:~$ cd
alvas@ubi:~$ wget http://nlp.stanford.edu/software/stanford-parser-full-2015-12-09.zip
alvas@ubi:~$ unzip stanford-parser-full-2015-12-09.zip
alvas@ubi:~$ ls stanford-parser-full-2015-12-09
bin                        ejml-0.23.jar          lexparser-gui.sh              LICENSE.txt       README_dependencies.txt  StanfordDependenciesManual.pdf
build.xml                  ejml-0.23-src.zip      lexparser_lang.def            Makefile          README.txt               stanford-parser-3.6.0-javadoc.jar
conf                       lexparser.bat          lexparser-lang.sh             ParserDemo2.java  ShiftReduceDemo.java     stanford-parser-3.6.0-models.jar
data                       lexparser-gui.bat      lexparser-lang-train-test.sh  ParserDemo.java   slf4j-api.jar            stanford-parser-3.6.0-sources.jar
DependencyParserDemo.java  lexparser-gui.command  lexparser.sh                  pom.xml           slf4j-simple.jar         stanford-parser.jar
alvas@ubi:~$ export STANFORDTOOLSDIR=$HOME
alvas@ubi:~$ export CLASSPATH=$STANFORDTOOLSDIR/stanford-parser-full-2015-12-09/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar

Then in Python, with the previous release (stanford-parser-full-2015-04-20), the following code would work:

>>> from nltk.parse.stanford import StanfordParser
>>> parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))

But with the stanford-parser-full-2015-12-09, you will see this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
	at edu.stanford.nlp.parser.common.ParserGrammar.<clinit>(ParserGrammar.java:46)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 1 more

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/stanford.py", line 128, in raw_parse
    return next(self.raw_parse_sents([sentence], verbose))
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/stanford.py", line 146, in raw_parse_sents
    return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/stanford.py", line 212, in _execute
    stdout=PIPE, stderr=PIPE)
  File "/usr/local/lib/python2.7/dist-packages/nltk/internals.py", line 134, in java
    raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : [u'/usr/bin/java', u'-mx1000m', '-cp', '/home/alvas/stanford-parser-full-2015-12-09/stanford-parser.jar:/home/alvas/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar', u'edu.stanford.nlp.parser.lexparser.LexicalizedParser', u'-model', 'edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz', u'-sentences', u'newline', u'-outputFormat', u'penn', u'-encoding', u'utf8', '/tmp/tmpi_MZus']

To resolve this, we can apply the same classpath hack, but since this API is written differently, the classpath is stored on the StanfordParser object under a different attribute, _classpath:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordParser
>>> parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> parser._classpath
('/home/alvas/stanford-parser-full-2015-12-09/stanford-parser.jar', '/home/alvas/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar')
>>> stanford_dir = parser._classpath[0].rpartition('/')[0]
>>> stanford_dir
'/home/alvas/stanford-parser-full-2015-12-09'
>>> parser._classpath = tuple(find_jars_within_path(stanford_dir))
>>> parser._classpath
('/home/alvas/stanford-parser-full-2015-12-09/ejml-0.23.jar', '/home/alvas/stanford-parser-full-2015-12-09/slf4j-simple.jar', '/home/alvas/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar', '/home/alvas/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-javadoc.jar', '/home/alvas/stanford-parser-full-2015-12-09/stanford-parser.jar', '/home/alvas/stanford-parser-full-2015-12-09/slf4j-api.jar', '/home/alvas/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-sources.jar')
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]
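The same batching advice applies to the parser: raw_parse_sents() starts the JVM once for a whole list of sentences, where repeated raw_parse() calls pay the startup cost each time:

>>> sentences = ["the quick brown fox jumps over the lazy dog",
...              "the lazy dog sleeps"]
>>> # Each element is an iterator over the parse trees for one sentence
>>> all_trees = [list(trees) for trees in parser.raw_parse_sents(sentences)]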

Similarly, with the StanfordDependencyParser API in NLTK:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordDependencyParser
>>> dep_parser=StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> stanford_dir = dep_parser._classpath[0].rpartition('/')[0]
>>> dep_parser._classpath = tuple(find_jars_within_path(stanford_dir))

>>> print next(dep_parser.raw_parse("The quick brown fox jumps over the lazy dog."))
defaultdict(<function <lambda> at 0x7fa92233d9b0>, {0: {u'ctag': u'TOP', u'head': None, u'word': None, u'deps': defaultdict(<type 'list'>, {u'root': [5]}), u'lemma': None, u'tag': u'TOP', u'rel': None, u'address': 0, u'feats': None}, 1: {u'ctag': u'DT', u'head': 4, u'deps': defaultdict(<type 'list'>, {}), u'tag': u'DT', u'address': 1, u'word': u'The', u'lemma': u'_', u'rel': u'det', u'feats': u'_'}, 2: {u'ctag': u'JJ', u'head': 4, u'deps': defaultdict(<type 'list'>, {}), u'tag': u'JJ', u'address': 2, u'word': u'quick', u'lemma': u'_', u'rel': u'amod', u'feats': u'_'}, 3: {u'ctag': u'JJ', u'head': 4, u'deps': defaultdict(<type 'list'>, {}), u'tag': u'JJ', u'address': 3, u'word': u'brown', u'lemma': u'_', u'rel': u'amod', u'feats': u'_'}, 4: {u'ctag': u'NN', u'head': 5, u'deps': defaultdict(<type 'list'>, {u'det': [1], u'amod': [2, 3]}), u'tag': u'NN', u'address': 4, u'word': u'fox', u'lemma': u'_', u'rel': u'nsubj', u'feats': u'_'}, 5: {u'ctag': u'VBZ', u'head': 0, u'deps': defaultdict(<type 'list'>, {u'nmod': [9], u'nsubj': [4]}), u'tag': u'VBZ', u'address': 5, u'word': u'jumps', u'lemma': u'_', u'rel': u'root', u'feats': u'_'}, 6: {u'ctag': u'IN', u'head': 9, u'deps': defaultdict(<type 'list'>, {}), u'tag': u'IN', u'address': 6, u'word': u'over', u'lemma': u'_', u'rel': u'case', u'feats': u'_'}, 7: {u'ctag': u'DT', u'head': 9, u'deps': defaultdict(<type 'list'>, {}), u'tag': u'DT', u'address': 7, u'word': u'the', u'lemma': u'_', u'rel': u'det', u'feats': u'_'}, 8: {u'ctag': u'JJ', u'head': 9, u'deps': defaultdict(<type 'list'>, {}), u'tag': u'JJ', u'address': 8, u'word': u'lazy', u'lemma': u'_', u'rel': u'amod', u'feats': u'_'}, 9: {u'ctag': u'NN', u'head': 5, u'deps': defaultdict(<type 'list'>, {u'case': [6], u'det': [7], u'amod': [8]}), u'tag': u'NN', u'address': 9, u'word': u'dog', u'lemma': u'_', u'rel': u'nmod', u'feats': u'_'}})

>>> print [parse.tree() for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")]
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy'])])]
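Besides the tree view, the DependencyGraph objects returned by the dependency parser expose the individual relations through triples(), which yields (head, relation, dependent) tuples:

>>> parse = next(dep_parser.raw_parse("The quick brown fox jumps over the lazy dog."))
>>> for governor, rel, dependent in parse.triples():
...     print governor, rel, dependent  # e.g. (u'jumps', u'VBZ') nsubj (u'fox', u'NN')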

The neural dependency parser works a little differently in the NLTK API.

First, get Stanford CoreNLP:

alvas@ubi:~$ cd
alvas@ubi:~$ wget http://nlp.stanford.edu/software/stanford-corenlp-full-2015-12-09.zip
alvas@ubi:~$ unzip stanford-corenlp-full-2015-12-09.zip
alvas@ubi:~$ ls stanford-corenlp-full-2015-12-09/
build.xml            input.txt.xml                   LIBRARY-LICENSES  SemgrexDemo.java                    stanford-corenlp-3.6.0-sources.jar  tokensregex
corenlp.sh           javax.json-api-1.0-sources.jar  LICENSE.txt       ShiftReduceDemo.java                StanfordCoreNlpDemo.java            xom-1.2.10-src.jar
CoreNLP-to-HTML.xsl  javax.json.jar                  Makefile          slf4j-api.jar                       StanfordDependenciesManual.pdf      xom.jar
ejml-0.23.jar        joda-time-2.9-sources.jar       patterns          slf4j-simple.jar                    sutime
ejml-0.23-src.zip    joda-time.jar                   pom.xml           stanford-corenlp-3.6.0.jar          test.txt
input.txt            jollyday-0.4.7-sources.jar      protobuf.jar      stanford-corenlp-3.6.0-javadoc.jar  test.txt.json
input.txt.out        jollyday.jar                    README.txt        stanford-corenlp-3.6.0-models.jar   test.txt.out

Then export the environment variables:

alvas@ubi:~$ export STANFORDTOOLSDIR=$HOME
alvas@ubi:~$ export CLASSPATH=$STANFORDTOOLSDIR/stanford-corenlp-full-2015-12-09/stanford-corenlp-3.6.0.jar:$STANFORDTOOLSDIR/stanford-corenlp-full-2015-12-09/stanford-corenlp-3.6.0-models.jar
alvas@ubi:~$ python

And in Python:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordNeuralDependencyParser
>>> parser = StanfordNeuralDependencyParser(model_path="edu/stanford/nlp/models/parser/nndep/english_UD.gz")
>>> stanford_dir = parser._classpath[0].rpartition('/')[0]
>>> slf4j_jar = stanford_dir + '/slf4j-api.jar'
>>> parser._classpath = list(parser._classpath) + [slf4j_jar]
>>> parser.java_options = '-mx5000m' # To increase the amount of RAM it can use.
>>> [parse.tree() for parse in parser.raw_parse("The quick brown fox jumps over the lazy dog.")]
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy']), '.'])]

Since CoreNLP can take a while to load all the models before parsing, it's best to use raw_parse_sents instead of raw_parse when parsing more than one sentence:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordNeuralDependencyParser
>>> parser = StanfordNeuralDependencyParser(model_path="edu/stanford/nlp/models/parser/nndep/english_UD.gz")
>>> stanford_dir = parser._classpath[0].rpartition('/')[0]
>>> slf4j_jar = stanford_dir + '/slf4j-api.jar'
>>> parser._classpath = list(parser._classpath) + [slf4j_jar]
>>> parser.java_options = '-mx5000m' # To increase the amount of RAM it can use.
>>> sentences = ("The quick brown fox jumps over the lazy dog.", "The quick grey wolf jumps over the lazy fox.")
>>> parsed_sents = sum([[parse.tree() for parse in dep_graphs] for dep_graphs in parser.raw_parse_sents(sentences)], [])
>>> print parsed_sents
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy']), '.']), Tree('jumps', [Tree('wolf', ['The', 'quick', 'grey']), Tree('fox', ['over', 'the', 'lazy']), '.'])]