Created
January 3, 2020 23:21
| ============================================================================== | |
| file: /mnt/Vancouver/apps/CoreNLP/_victoria/gist_for_SO44910934.txt | |
| title: "CoreNLP" {Java | Python} Gist for StackOverflow #44910934 | |
| author: Victoria A. Stuart | |
| created: 2020-01-03 | |
| version: 01 | |
| last modified: 2020-01-03 | |
| Versions: | |
| * v01 : this | |
| ============================================================================== | |
| To accompany code described in https://stackoverflow.com/a/59549039/1904943 | |
| ============================================================================== | |
| JAVA | |
| ============================================================================== | |
| [victoria@victoria _victoria]$ cd /mnt/Vancouver/apps/CoreNLP/src-local/stanford-corenlp-full-2018-10-05/ | |
| [victoria@victoria stanford-corenlp-full-2018-10-05]$ date; pwd; echo; ls -l | |
| Fri 03 Jan 2020 02:42:29 PM PST | |
| /mnt/Vancouver/apps/CoreNLP/src-local/stanford-corenlp-full-2018-10-05 | |
| total 1400680 | |
| -rw-r--r-- 1 victoria victoria 3340 Dec 31 14:15 BasicPipelineExample.class | |
| -rw-r--r-- 1 victoria victoria 4666 Dec 31 13:33 BasicPipelineExample.java | |
| -rw-r--r-- 1 victoria victoria 6103 Oct 8 2018 build.xml | |
| ... | |
| -rw-r--r-- 1 victoria victoria 8146873 Oct 8 2018 stanford-corenlp-3.9.2.jar | |
| -rw-r--r-- 1 victoria victoria 9687426 Oct 8 2018 stanford-corenlp-3.9.2-javadoc.jar | |
| -rw-r--r-- 1 victoria victoria 362565193 Oct 8 2018 stanford-corenlp-3.9.2-models.jar | |
| -rw-r--r-- 1 victoria victoria 5370905 Oct 8 2018 stanford-corenlp-3.9.2-sources.jar | |
| -rw-r--r-- 1 victoria victoria 7240 Oct 8 2018 StanfordCoreNlpDemo.java | |
| -rw-r--r-- 1 victoria victoria 199885 Oct 8 2018 StanfordDependenciesManual.pdf | |
| -rw-r--r-- 1 victoria victoria 1038970602 Dec 31 14:07 stanford-english-corenlp-2018-10-05-models.jar | |
| ... | |
| [victoria@victoria stanford-corenlp-full-2018-10-05]$ time java -cp .:* BasicPipelineExample | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos | |
| [main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec]. | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner | |
| [main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [0.9 sec]. | |
| [main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.5 sec]. | |
| [main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.4 sec]. | |
| [main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1. | |
| [main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt | |
| [main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580704 unique entries out of 581863 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns. | |
| [main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns. | |
| [main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585573 unique entries from 2 files | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse | |
| [main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec]. | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse | |
| [main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... | |
| [main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 7.547 (s) | |
| [main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [11.4 sec]. | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref | |
| [main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref model edu/stanford/nlp/models/coref/neural/english-model-default.ser.gz ... done [0.4 sec]. | |
| [main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref embeddings edu/stanford/nlp/models/coref/neural/english-embeddings.ser.gz ... done [0.4 sec]. | |
| [main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: rule | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator kbp | |
| [main] INFO edu.stanford.nlp.pipeline.KBPAnnotator - Loading KBP classifier from: edu/stanford/nlp/models/kbp/english/tac-re-lr.ser.gz | |
| [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator quote | |
| [main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... | |
| [main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 8.286 (s) | |
| [main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [9.3 sec]. | |
| [main] INFO edu.stanford.nlp.pipeline.QuoteAnnotator - Setting quotes. | |
| Example: token | |
| he-4 | |
| Example: sentence | |
| Joe Smith was born in California. | |
| Example: pos tags | |
| [IN, CD, ,, PRP, VBD, TO, NNP, ,, NNP, IN, DT, NN, .] | |
| Example: ner tags | |
| [O, DATE, O, O, O, O, CITY, O, COUNTRY, O, O, DATE, O] | |
| Example: constituency parse | |
| (ROOT (S (PP (IN In) (NP (CD 2017))) (, ,) (NP (PRP he)) (VP (VBD went) (PP (TO to) (NP (NNP Paris) (, ,) (NNP France))) (PP (IN in) (NP (DT the) (NN summer)))) (. .))) | |
| Example: dependency parse | |
| -> went/VBD (root) | |
| -> 2017/CD (nmod:in) | |
| -> In/IN (case) | |
| -> ,/, (punct) | |
| -> he/PRP (nsubj) | |
| -> Paris/NNP (nmod:to) | |
| -> to/TO (case) | |
| -> ,/, (punct) | |
| -> France/NNP (appos) | |
| -> summer/NN (nmod:in) | |
| -> in/IN (case) | |
| -> the/DT (det) | |
| -> ./. (punct) | |
| Example: relation | |
| 1.0 Jane Smith per:siblings Joe Smith | |
| Example: entity mentions | |
| [2017, Paris, France, summer, he] | |
| Example: original entity mention | |
| Joe | |
| Example: canonical entity mention | |
| Joe Smith | |
| Example: coref chains for document | |
| {23=CHAIN23-["Joe Smith" in sentence 1, "he" in sentence 2, "His" in sentence 3, "Joe" in sentence 4, "He" in sentence 5, "his" in sentence 5, "Joe 's" in sentence 6], 26=CHAIN26-["his sister Jane Smith" in sentence 5, "Jane" in sentence 6, "she" in sentence 6], 12=CHAIN12-["2017" in sentence 2, "2017" in sentence 3]} | |
| Example: quote | |
| "That was delicious!" | |
| Example: original speaker of quote | |
| Joe | |
| Example: canonical speaker of quote | |
| Joe Smith | |
| 0:47.68 | |
| [victoria@victoria stanford-corenlp-full-2018-10-05]$ | |
| ============================================================================== | |
| PYTHON | |
| ============================================================================== | |
| [victoria@victoria ~]$ p37 | |
| [Python 3.7 venv (source ~/venv/py3.7/bin/activate)] | |
| (py3.7) [victoria@victoria ~]$ env | grep -i virtual | |
| VIRTUAL_ENV=/home/victoria/venv/py3.7 | |
| (py3.7) [victoria@victoria ~]$ python --version | |
| Python 3.7.4 | |
| (py3.7) [victoria@victoria ~]$ date | |
| Fri 03 Jan 2020 02:49:42 PM PST | |
| (py3.7) [victoria@victoria ~]$ python | |
| Python 3.7.4 (default, Nov 20 2019, 11:36:53) | |
| [GCC 9.2.0] on linux | |
| Type "help", "copyright", "credits" or "license" for more information. | |
| >>> import stanfordnlp | |
| >>> from stanfordnlp.server import CoreNLPClient | |
| >>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G') | |
| >>> text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.' | |
| >>> ann = client.annotate(text) | |
| Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-25ebbde9a1ad4065.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref | |
| >>> sentence = ann.sentence[0] | |
| Traceback (most recent call last): | |
| File "<console>", line 1, in <module> | |
| AttributeError: 'str' object has no attribute 'sentence' | |
| >>> client.server.terminate() | |
| >>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G') | |
| >>> ann = client.annotate(text) | |
| Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-9043ef7d7a744b78.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref | |
| >>> sentence = ann.sentence[0] | |
| Traceback (most recent call last): | |
| File "<console>", line 1, in <module> | |
| AttributeError: 'str' object has no attribute 'sentence' | |
| >>> [Ctrl-D] | |
| now exiting EditableBufferInteractiveConsole... | |
| (py3.7) [victoria@victoria ~]$ psgrep -l corenlp | |
| UID PID PPID C STIME TTY TIME CMD | |
| victoria 321300 296292 0 Jan02 pts/2 00:02:09 java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-55bcad5a4c00431e.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref | |
| (py3.7) [victoria@victoria ~]$ pgrep -l -f corenlp | |
| 321300 java | |
| (py3.7) [victoria@victoria ~]$ kill -9 321300 | |
| (py3.7) [victoria@victoria ~]$ python | |
| Python 3.7.4 (default, Nov 20 2019, 11:36:53) | |
| [GCC 9.2.0] on linux | |
| Type "help", "copyright", "credits" or "license" for more information. | |
| >>> import stanfordnlp | |
| >>> from stanfordnlp.server import CoreNLPClient | |
| >>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G') | |
| >>> text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.' | |
| >>> ann = client.annotate(text) | |
| Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-ba065446f2fa404d.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref | |
| >>> ## [server took about 20 seconds to start] | |
| >>> sentence = ann.sentence[0] | |
| Traceback (most recent call last): | |
| File "<console>", line 1, in <module> | |
| AttributeError: 'str' object has no attribute 'sentence' | |
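The `AttributeError` above indicates that `annotate()` handed back a plain `str` when the client was created with `output_format='text'`. What follows is a minimal sketch, not part of the original session, of a guard that reports this more clearly. The helper name `first_sentence` and the `SimpleNamespace` stand-in for the annotated document are assumptions made for illustration; a real document would come from `client.annotate(text)`.

```python
from types import SimpleNamespace


def first_sentence(ann):
    """Return the first sentence of an annotation, or fail with a clear message."""
    # A str result has no .sentence attribute, which is the error seen in the session.
    if isinstance(ann, str):
        raise TypeError(
            "annotate() returned a str; create the client without "
            "output_format='text' to get an object with a .sentence list"
        )
    return ann.sentence[0]


# Hypothetical stand-in for the annotated document (illustration only).
fake_doc = SimpleNamespace(sentence=["<sentence 0>", "<sentence 1>"])
print(first_sentence(fake_doc))

try:
    first_sentence("plain text output")
except TypeError as err:
    print("TypeError:", err)
```

The guard only checks the return type; it does not start or stop the CoreNLP server.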
| >>> ## retry without the `output_format='text'` argument, which made annotate() return a str: | |
| >>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', timeout=30000, memory='16G') | |
| >>> ann = client.annotate(text) | |
| Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-423b84293ffe47f3.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref | |
| >>> sentence = ann.sentence[0] | |
| >>> print(sentence) | |
| token { | |
| word: "Breast" | |
| pos: "NN" | |
| value: "Breast" | |
| before: "" | |
| after: " " | |
| originalText: "Breast" | |
| ner: "CAUSE_OF_DEATH" | |
| lemma: "breast" | |
| beginChar: 0 | |
| endChar: 6 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 0 | |
| endIndex: 1 | |
| tokenBeginIndex: 0 | |
| tokenEndIndex: 1 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "CAUSE_OF_DEATH" | |
| corefMentionIndex: 0 | |
| corefMentionIndex: 3 | |
| entityMentionIndex: 0 | |
| } | |
| token { | |
| word: "cancer" | |
| pos: "NN" | |
| value: "cancer" | |
| before: " " | |
| after: " " | |
| originalText: "cancer" | |
| ner: "CAUSE_OF_DEATH" | |
| lemma: "cancer" | |
| beginChar: 7 | |
| endChar: 13 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 1 | |
| endIndex: 2 | |
| tokenBeginIndex: 1 | |
| tokenEndIndex: 2 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "CAUSE_OF_DEATH" | |
| corefMentionIndex: 0 | |
| corefMentionIndex: 3 | |
| entityMentionIndex: 0 | |
| } | |
| token { | |
| word: "susceptibility" | |
| pos: "NN" | |
| value: "susceptibility" | |
| before: " " | |
| after: " " | |
| originalText: "susceptibility" | |
| ner: "O" | |
| lemma: "susceptibility" | |
| beginChar: 14 | |
| endChar: 28 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 2 | |
| endIndex: 3 | |
| tokenBeginIndex: 2 | |
| tokenEndIndex: 3 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| corefMentionIndex: 3 | |
| } | |
| token { | |
| word: "gene" | |
| pos: "NN" | |
| value: "gene" | |
| before: " " | |
| after: " " | |
| originalText: "gene" | |
| ner: "O" | |
| lemma: "gene" | |
| beginChar: 29 | |
| endChar: 33 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 3 | |
| endIndex: 4 | |
| tokenBeginIndex: 3 | |
| tokenEndIndex: 4 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| corefMentionIndex: 3 | |
| } | |
| token { | |
| word: "1" | |
| pos: "CD" | |
| value: "1" | |
| before: " " | |
| after: " " | |
| originalText: "1" | |
| ner: "NUMBER" | |
| normalizedNER: "1.0" | |
| lemma: "1" | |
| beginChar: 34 | |
| endChar: 35 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 4 | |
| endIndex: 5 | |
| tokenBeginIndex: 4 | |
| tokenEndIndex: 5 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "NUMBER" | |
| fineGrainedNER: "NUMBER" | |
| corefMentionIndex: 1 | |
| corefMentionIndex: 3 | |
| entityMentionIndex: 1 | |
| } | |
| token { | |
| word: "-LRB-" | |
| pos: "-LRB-" | |
| value: "-LRB-" | |
| before: " " | |
| after: "" | |
| originalText: "(" | |
| ner: "O" | |
| lemma: "-lrb-" | |
| beginChar: 36 | |
| endChar: 37 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 5 | |
| endIndex: 6 | |
| tokenBeginIndex: 5 | |
| tokenEndIndex: 6 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| corefMentionIndex: 3 | |
| } | |
| token { | |
| word: "BRCA1" | |
| pos: "NN" | |
| value: "BRCA1" | |
| before: "" | |
| after: "" | |
| originalText: "BRCA1" | |
| ner: "O" | |
| lemma: "brca1" | |
| beginChar: 37 | |
| endChar: 42 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 6 | |
| endIndex: 7 | |
| tokenBeginIndex: 6 | |
| tokenEndIndex: 7 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| corefMentionIndex: 3 | |
| corefMentionIndex: 4 | |
| } | |
| token { | |
| word: "-RRB-" | |
| pos: "-RRB-" | |
| value: "-RRB-" | |
| before: "" | |
| after: " " | |
| originalText: ")" | |
| ner: "O" | |
| lemma: "-rrb-" | |
| beginChar: 42 | |
| endChar: 43 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 7 | |
| endIndex: 8 | |
| tokenBeginIndex: 7 | |
| tokenEndIndex: 8 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| corefMentionIndex: 3 | |
| } | |
| token { | |
| word: "is" | |
| pos: "VBZ" | |
| value: "is" | |
| before: " " | |
| after: " " | |
| originalText: "is" | |
| ner: "O" | |
| lemma: "be" | |
| beginChar: 44 | |
| endChar: 46 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 8 | |
| endIndex: 9 | |
| tokenBeginIndex: 8 | |
| tokenEndIndex: 9 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| } | |
| token { | |
| word: "a" | |
| pos: "DT" | |
| value: "a" | |
| before: " " | |
| after: " " | |
| originalText: "a" | |
| ner: "O" | |
| lemma: "a" | |
| beginChar: 47 | |
| endChar: 48 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 9 | |
| endIndex: 10 | |
| tokenBeginIndex: 9 | |
| tokenEndIndex: 10 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| corefMentionIndex: 5 | |
| } | |
| token { | |
| word: "tumor" | |
| pos: "NN" | |
| value: "tumor" | |
| before: " " | |
| after: " " | |
| originalText: "tumor" | |
| ner: "CAUSE_OF_DEATH" | |
| lemma: "tumor" | |
| beginChar: 49 | |
| endChar: 54 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 10 | |
| endIndex: 11 | |
| tokenBeginIndex: 10 | |
| tokenEndIndex: 11 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "CAUSE_OF_DEATH" | |
| corefMentionIndex: 2 | |
| corefMentionIndex: 5 | |
| entityMentionIndex: 2 | |
| } | |
| token { | |
| word: "suppressor" | |
| pos: "NN" | |
| value: "suppressor" | |
| before: " " | |
| after: " " | |
| originalText: "suppressor" | |
| ner: "O" | |
| lemma: "suppressor" | |
| beginChar: 55 | |
| endChar: 65 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 11 | |
| endIndex: 12 | |
| tokenBeginIndex: 11 | |
| tokenEndIndex: 12 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| corefMentionIndex: 5 | |
| } | |
| token { | |
| word: "protein" | |
| pos: "NN" | |
| value: "protein" | |
| before: " " | |
| after: "" | |
| originalText: "protein" | |
| ner: "O" | |
| lemma: "protein" | |
| beginChar: 66 | |
| endChar: 73 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 12 | |
| endIndex: 13 | |
| tokenBeginIndex: 12 | |
| tokenEndIndex: 13 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| corefMentionIndex: 5 | |
| } | |
| token { | |
| word: "." | |
| pos: "." | |
| value: "." | |
| before: "" | |
| after: "" | |
| originalText: "." | |
| ner: "O" | |
| lemma: "." | |
| beginChar: 73 | |
| endChar: 74 | |
| utterance: 0 | |
| speaker: "PER0" | |
| beginIndex: 13 | |
| endIndex: 14 | |
| tokenBeginIndex: 13 | |
| tokenEndIndex: 14 | |
| hasXmlContext: false | |
| isNewline: false | |
| coarseNER: "O" | |
| fineGrainedNER: "O" | |
| } | |
| tokenOffsetBegin: 0 | |
| tokenOffsetEnd: 14 | |
| sentenceIndex: 0 | |
| characterOffsetBegin: 0 | |
| characterOffsetEnd: 74 | |
| parseTree { | |
| child { | |
| child { | |
| child { | |
| child { | |
| child { | |
| child { | |
| value: "Breast" | |
| } | |
| value: "NN" | |
| score: -13.085748672485352 | |
| } | |
| child { | |
| child { | |
| value: "cancer" | |
| } | |
| value: "NN" | |
| score: -7.361298084259033 | |
| } | |
| child { | |
| child { | |
| value: "susceptibility" | |
| } | |
| value: "NN" | |
| score: -12.832098960876465 | |
| } | |
| value: "NP" | |
| score: -39.81563186645508 | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "gene" | |
| } | |
| value: "NN" | |
| score: -7.761730194091797 | |
| } | |
| child { | |
| child { | |
| value: "1" | |
| } | |
| value: "CD" | |
| score: -4.178682804107666 | |
| } | |
| value: "NP" | |
| score: -19.19379997253418 | |
| } | |
| value: "NP" | |
| score: -62.36488342285156 | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "-LRB-" | |
| } | |
| value: "-LRB-" | |
| score: -0.06566064804792404 | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "BRCA1" | |
| } | |
| value: "NN" | |
| score: -13.365689277648926 | |
| } | |
| value: "NP" | |
| score: -16.57198715209961 | |
| } | |
| child { | |
| child { | |
| value: "-RRB-" | |
| } | |
| value: "-RRB-" | |
| score: -0.06669137626886368 | |
| } | |
| value: "PRN" | |
| score: -17.963926315307617 | |
| } | |
| value: "NP" | |
| score: -86.23522186279297 | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "is" | |
| } | |
| value: "VBZ" | |
| score: -0.14657023549079895 | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "a" | |
| } | |
| value: "DT" | |
| score: -1.4235451221466064 | |
| } | |
| child { | |
| child { | |
| value: "tumor" | |
| } | |
| value: "NN" | |
| score: -9.49818229675293 | |
| } | |
| child { | |
| child { | |
| value: "suppressor" | |
| } | |
| value: "NN" | |
| score: -10.207574844360352 | |
| } | |
| child { | |
| child { | |
| value: "protein" | |
| } | |
| value: "NN" | |
| score: -9.312461853027344 | |
| } | |
| value: "NP" | |
| score: -36.75123977661133 | |
| } | |
| value: "VP" | |
| score: -42.08717727661133 | |
| } | |
| child { | |
| child { | |
| value: "." | |
| } | |
| value: "." | |
| score: -0.003481106134131551 | |
| } | |
| value: "S" | |
| score: -131.2326202392578 | |
| } | |
| value: "ROOT" | |
| score: -131.38381958007812 | |
| } | |
| basicDependencies { | |
| node { | |
| sentenceIndex: 0 | |
| index: 1 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 2 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 3 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 4 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 5 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 6 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 7 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 8 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 9 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 10 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 11 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 12 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 13 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 14 | |
| } | |
| edge { | |
| source: 4 | |
| target: 1 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 2 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 3 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 5 | |
| dep: "nummod" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 7 | |
| dep: "appos" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 6 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 8 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 4 | |
| dep: "nsubj" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 9 | |
| dep: "cop" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 10 | |
| dep: "det" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 11 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 12 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 14 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| root: 13 | |
| } | |
| collapsedDependencies { | |
| node { | |
| sentenceIndex: 0 | |
| index: 1 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 2 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 3 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 4 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 5 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 6 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 7 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 8 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 9 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 10 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 11 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 12 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 13 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 14 | |
| } | |
| edge { | |
| source: 4 | |
| target: 1 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 2 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 3 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 5 | |
| dep: "nummod" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 7 | |
| dep: "appos" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 6 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 8 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 4 | |
| dep: "nsubj" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 9 | |
| dep: "cop" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 10 | |
| dep: "det" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 11 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 12 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 14 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| root: 13 | |
| } | |
| collapsedCCProcessedDependencies { | |
| node { | |
| sentenceIndex: 0 | |
| index: 1 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 2 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 3 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 4 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 5 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 6 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 7 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 8 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 9 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 10 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 11 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 12 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 13 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 14 | |
| } | |
| edge { | |
| source: 4 | |
| target: 1 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 2 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 3 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 5 | |
| dep: "nummod" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 7 | |
| dep: "appos" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 6 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 8 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 4 | |
| dep: "nsubj" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 9 | |
| dep: "cop" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 10 | |
| dep: "det" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 11 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 12 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 14 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| root: 13 | |
| } | |
| paragraph: 1 | |
| enhancedDependencies { | |
| node { | |
| sentenceIndex: 0 | |
| index: 1 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 2 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 3 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 4 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 5 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 6 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 7 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 8 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 9 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 10 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 11 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 12 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 13 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 14 | |
| } | |
| edge { | |
| source: 4 | |
| target: 1 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 2 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 3 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 5 | |
| dep: "nummod" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 7 | |
| dep: "appos" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 6 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 8 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 4 | |
| dep: "nsubj" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 9 | |
| dep: "cop" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 10 | |
| dep: "det" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 11 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 12 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 14 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| root: 13 | |
| } | |
| enhancedPlusPlusDependencies { | |
| node { | |
| sentenceIndex: 0 | |
| index: 1 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 2 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 3 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 4 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 5 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 6 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 7 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 8 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 9 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 10 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 11 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 12 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 13 | |
| } | |
| node { | |
| sentenceIndex: 0 | |
| index: 14 | |
| } | |
| edge { | |
| source: 4 | |
| target: 1 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 2 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 3 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 5 | |
| dep: "nummod" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 4 | |
| target: 7 | |
| dep: "appos" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 6 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 7 | |
| target: 8 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 4 | |
| dep: "nsubj" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 9 | |
| dep: "cop" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 10 | |
| dep: "det" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 11 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 12 | |
| dep: "compound" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| edge { | |
| source: 13 | |
| target: 14 | |
| dep: "punct" | |
| isExtra: false | |
| sourceCopy: 0 | |
| targetCopy: 0 | |
| language: UniversalEnglish | |
| } | |
| root: 13 | |
| } | |
| binarizedParseTree { | |
| child { | |
| child { | |
| child { | |
| child { | |
| child { | |
| child { | |
| value: "Breast" | |
| } | |
| value: "NN" | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "cancer" | |
| } | |
| value: "NN" | |
| } | |
| child { | |
| child { | |
| value: "susceptibility" | |
| } | |
| value: "NN" | |
| } | |
| value: "@NP" | |
| } | |
| value: "NP" | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "gene" | |
| } | |
| value: "NN" | |
| } | |
| child { | |
| child { | |
| value: "1" | |
| } | |
| value: "CD" | |
| } | |
| value: "NP" | |
| } | |
| value: "NP" | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "-LRB-" | |
| } | |
| value: "-LRB-" | |
| } | |
| child { | |
| child { | |
| child { | |
| child { | |
| value: "BRCA1" | |
| } | |
| value: "NN" | |
| } | |
| value: "NP" | |
| } | |
| child { | |
| child { | |
| value: "-RRB-" | |
| } | |
| value: "-RRB-" | |
| } | |
| value: "@PRN" | |
| } | |
| value: "PRN" | |
| } | |
| value: "NP" | |
| } | |
| child { | |
| child { | |
| child { | |
| child { | |
| value: "is" | |
| } | |
| value: "VBZ" | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "a" | |
| } | |
| value: "DT" | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "tumor" | |
| } | |
| value: "NN" | |
| } | |
| child { | |
| child { | |
| child { | |
| value: "suppressor" | |
| } | |
| value: "NN" | |
| } | |
| child { | |
| child { | |
| value: "protein" | |
| } | |
| value: "NN" | |
| } | |
| value: "@NP" | |
| } | |
| value: "@NP" | |
| } | |
| value: "NP" | |
| } | |
| value: "VP" | |
| } | |
| child { | |
| child { | |
| value: "." | |
| } | |
| value: "." | |
| } | |
| value: "@S" | |
| } | |
| value: "S" | |
| } | |
| value: "ROOT" | |
| } | |
| hasRelationAnnotations: false | |
| hasNumerizedTokensAnnotation: true | |
| mentions { | |
| sentenceIndex: 0 | |
| tokenStartInSentenceInclusive: 0 | |
| tokenEndInSentenceExclusive: 2 | |
| ner: "CAUSE_OF_DEATH" | |
| entityType: "CAUSE_OF_DEATH" | |
| entityMentionIndex: 0 | |
| canonicalEntityMentionIndex: 0 | |
| entityMentionText: "Breast cancer" | |
| } | |
| mentions { | |
| sentenceIndex: 0 | |
| tokenStartInSentenceInclusive: 4 | |
| tokenEndInSentenceExclusive: 5 | |
| ner: "NUMBER" | |
| normalizedNER: "1.0" | |
| entityType: "NUMBER" | |
| entityMentionIndex: 1 | |
| canonicalEntityMentionIndex: 1 | |
| entityMentionText: "1" | |
| } | |
| mentions { | |
| sentenceIndex: 0 | |
| tokenStartInSentenceInclusive: 10 | |
| tokenEndInSentenceExclusive: 11 | |
| ner: "CAUSE_OF_DEATH" | |
| entityType: "CAUSE_OF_DEATH" | |
| entityMentionIndex: 2 | |
| canonicalEntityMentionIndex: 2 | |
| entityMentionText: "tumor" | |
| } | |
| mentionsForCoref { | |
| mentionID: 0 | |
| mentionType: "NOMINAL" | |
| number: "SINGULAR" | |
| gender: "NEUTRAL" | |
| animacy: "INANIMATE" | |
| person: "UNKNOWN" | |
| startIndex: 0 | |
| endIndex: 2 | |
| headIndex: 1 | |
| headString: "cancer" | |
| nerString: "O" | |
| originalRef: 4294967295 | |
| goldCorefClusterID: -1 | |
| corefClusterID: 0 | |
| mentionNum: 1 | |
| sentNum: 0 | |
| utter: 0 | |
| paragraph: 1 | |
| isSubject: false | |
| isDirectObject: false | |
| isIndirectObject: false | |
| isPrepositionObject: false | |
| hasTwin: false | |
| generic: false | |
| isSingleton: false | |
| hasBasicDependency: true | |
| hasEnhancedDepenedncy: true | |
| hasContextParseTree: true | |
| headIndexedWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| copyCount: 0 | |
| } | |
| dependingVerb { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4294967295 | |
| } | |
| headWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 0 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 2 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 5 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 7 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 8 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 9 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 11 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 13 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 0 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| } | |
| mentionsForCoref { | |
| mentionID: 1 | |
| mentionType: "PROPER" | |
| number: "SINGULAR" | |
| gender: "UNKNOWN" | |
| animacy: "INANIMATE" | |
| person: "UNKNOWN" | |
| startIndex: 4 | |
| endIndex: 5 | |
| headIndex: 4 | |
| headString: "1" | |
| nerString: "NUMBER" | |
| originalRef: 4294967295 | |
| goldCorefClusterID: -1 | |
| corefClusterID: 1 | |
| mentionNum: 2 | |
| sentNum: 0 | |
| utter: 0 | |
| paragraph: 1 | |
| isSubject: false | |
| isDirectObject: false | |
| isIndirectObject: false | |
| isPrepositionObject: false | |
| hasTwin: false | |
| generic: false | |
| isSingleton: false | |
| hasBasicDependency: true | |
| hasEnhancedDepenedncy: true | |
| hasContextParseTree: true | |
| headIndexedWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| copyCount: 0 | |
| } | |
| dependingVerb { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4294967295 | |
| } | |
| headWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 0 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 2 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 5 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 7 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 8 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 9 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 11 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 13 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| } | |
| mentionsForCoref { | |
| mentionID: 2 | |
| mentionType: "NOMINAL" | |
| number: "SINGULAR" | |
| gender: "NEUTRAL" | |
| animacy: "INANIMATE" | |
| person: "UNKNOWN" | |
| startIndex: 10 | |
| endIndex: 11 | |
| headIndex: 10 | |
| headString: "tumor" | |
| nerString: "O" | |
| originalRef: 4294967295 | |
| goldCorefClusterID: -1 | |
| corefClusterID: 2 | |
| mentionNum: 5 | |
| sentNum: 0 | |
| utter: 0 | |
| paragraph: 1 | |
| isSubject: false | |
| isDirectObject: false | |
| isIndirectObject: false | |
| isPrepositionObject: false | |
| hasTwin: false | |
| generic: false | |
| isSingleton: false | |
| hasBasicDependency: true | |
| hasEnhancedDepenedncy: true | |
| hasContextParseTree: true | |
| headIndexedWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| copyCount: 0 | |
| } | |
| dependingVerb { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4294967295 | |
| } | |
| headWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 0 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 2 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 5 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 7 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 8 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 9 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 11 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 13 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| } | |
| mentionsForCoref { | |
| mentionID: 3 | |
| mentionType: "NOMINAL" | |
| number: "SINGULAR" | |
| gender: "UNKNOWN" | |
| animacy: "INANIMATE" | |
| person: "UNKNOWN" | |
| startIndex: 0 | |
| endIndex: 8 | |
| headIndex: 3 | |
| headString: "gene" | |
| nerString: "O" | |
| originalRef: 4294967295 | |
| goldCorefClusterID: -1 | |
| corefClusterID: 3 | |
| mentionNum: 0 | |
| sentNum: 0 | |
| utter: 0 | |
| paragraph: 1 | |
| isSubject: false | |
| isDirectObject: false | |
| isIndirectObject: false | |
| isPrepositionObject: false | |
| hasTwin: false | |
| generic: false | |
| isSingleton: false | |
| hasBasicDependency: true | |
| hasEnhancedDepenedncy: true | |
| hasContextParseTree: true | |
| headIndexedWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| copyCount: 0 | |
| } | |
| dependingVerb { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4294967295 | |
| } | |
| headWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 0 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 2 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 5 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 7 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 8 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 9 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 11 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 13 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 0 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 2 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 5 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 7 | |
| } | |
| } | |
| mentionsForCoref { | |
| mentionID: 4 | |
| mentionType: "NOMINAL" | |
| number: "SINGULAR" | |
| gender: "UNKNOWN" | |
| animacy: "UNKNOWN" | |
| person: "UNKNOWN" | |
| startIndex: 6 | |
| endIndex: 7 | |
| headIndex: 6 | |
| headString: "brca1" | |
| nerString: "O" | |
| originalRef: 4294967295 | |
| goldCorefClusterID: -1 | |
| corefClusterID: 4 | |
| mentionNum: 3 | |
| sentNum: 0 | |
| utter: 0 | |
| paragraph: 1 | |
| isSubject: false | |
| isDirectObject: false | |
| isIndirectObject: false | |
| isPrepositionObject: false | |
| hasTwin: false | |
| generic: false | |
| isSingleton: false | |
| hasBasicDependency: true | |
| hasEnhancedDepenedncy: true | |
| hasContextParseTree: true | |
| headIndexedWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| copyCount: 0 | |
| } | |
| dependingVerb { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4294967295 | |
| } | |
| headWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 0 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 2 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 5 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 7 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 8 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 9 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 11 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 13 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| appositions: 3 | |
| } | |
| mentionsForCoref { | |
| mentionID: 5 | |
| mentionType: "NOMINAL" | |
| number: "SINGULAR" | |
| gender: "NEUTRAL" | |
| animacy: "INANIMATE" | |
| person: "UNKNOWN" | |
| startIndex: 9 | |
| endIndex: 13 | |
| headIndex: 12 | |
| headString: "protein" | |
| nerString: "O" | |
| originalRef: 4294967295 | |
| goldCorefClusterID: -1 | |
| corefClusterID: 5 | |
| mentionNum: 4 | |
| sentNum: 0 | |
| utter: 0 | |
| paragraph: 1 | |
| isSubject: false | |
| isDirectObject: false | |
| isIndirectObject: false | |
| isPrepositionObject: false | |
| hasTwin: false | |
| generic: false | |
| isSingleton: false | |
| hasBasicDependency: true | |
| hasEnhancedDepenedncy: true | |
| hasContextParseTree: true | |
| headIndexedWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| copyCount: 0 | |
| } | |
| dependingVerb { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4294967295 | |
| } | |
| headWord { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 0 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 1 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 2 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 3 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 4 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 5 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 6 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 7 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 8 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 9 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 11 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| } | |
| sentenceWords { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 13 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 9 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 10 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 11 | |
| } | |
| originalSpan { | |
| sentenceNum: 4294967295 | |
| tokenIndex: 12 | |
| } | |
| predicateNominatives: 3 | |
| } | |
| hasCorefMentionsAnnotation: true | |
| hasEntityMentionsAnnotation: true | |
| >>> ## ALL OF THAT (ABOVE) WAS FOR ONE SENTENCE! :-O | |
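The `mentions` entries in the dump above give each entity as a token span, with a 0-based inclusive start (`tokenStartInSentenceInclusive`) and an exclusive end (`tokenEndInSentenceExclusive`). That convention lines up with Python slicing. The following is a minimal sketch, not part of the original session: the token list and spans are copied by hand from the output above, and `mention_text` is a hypothetical helper that joins tokens with single spaces (a simplification, since the real text is rebuilt from each token's `before`/`after` whitespace).

```python
# Tokens of the example sentence, in order (0-based positions).
tokens = ["Breast", "cancer", "susceptibility", "gene", "1", "-LRB-", "BRCA1",
          "-RRB-", "is", "a", "tumor", "suppressor", "protein", "."]

# (tokenStartInSentenceInclusive, tokenEndInSentenceExclusive, ner),
# copied by hand from the three `mentions` entries in the protobuf dump.
mentions = [
    (0, 2, "CAUSE_OF_DEATH"),
    (4, 5, "NUMBER"),
    (10, 11, "CAUSE_OF_DEATH"),
]


def mention_text(tokens, start, end):
    """Hypothetical helper: join the tokens in the half-open span [start, end)."""
    return " ".join(tokens[start:end])


mention_rows = [(mention_text(tokens, start, end), ner)
                for start, end, ner in mentions]

for text, ner in mention_rows:
    print(text, ner)
```

The recovered strings agree with the `entityMentionText` values in the dump.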
| >>> ## SAME OUTPUT: | |
| >>> print(ann) | |
| text: "Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein." | |
| sentence { | |
| token { | |
| word: "Breast" | |
| pos: "NN" | |
| value: "Breast" | |
| before: "" | |
| after: " " | |
| [ ... snip ... ] | |
| >>> ## **MUCH** MORE COMPACT: | |
| >>> client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G') | |
| >>> ann = client.annotate(text) | |
| Starting server with command: java -Xmx16G -cp /mnt/Vancouver/apps/CoreNLP/stanford-corenlp-full/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-163b9ecb6a9947a8.props -preload tokenize, ssplit, pos, lemma, ner, parse, depparse, coref | |
| >>> print(ann) | |
| Sentence #1 (14 tokens): | |
| Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein. | |
| Tokens: | |
| [Text=Breast CharacterOffsetBegin=0 CharacterOffsetEnd=6 PartOfSpeech=NN Lemma=breast NamedEntityTag=CAUSE_OF_DEATH] | |
| [Text=cancer CharacterOffsetBegin=7 CharacterOffsetEnd=13 PartOfSpeech=NN Lemma=cancer NamedEntityTag=CAUSE_OF_DEATH] | |
| [Text=susceptibility CharacterOffsetBegin=14 CharacterOffsetEnd=28 PartOfSpeech=NN Lemma=susceptibility NamedEntityTag=O] | |
| [Text=gene CharacterOffsetBegin=29 CharacterOffsetEnd=33 PartOfSpeech=NN Lemma=gene NamedEntityTag=O] | |
| [Text=1 CharacterOffsetBegin=34 CharacterOffsetEnd=35 PartOfSpeech=CD Lemma=1 NamedEntityTag=NUMBER NormalizedNamedEntityTag=1.0] | |
| [Text=-LRB- CharacterOffsetBegin=36 CharacterOffsetEnd=37 PartOfSpeech=-LRB- Lemma=-lrb- NamedEntityTag=O] | |
| [Text=BRCA1 CharacterOffsetBegin=37 CharacterOffsetEnd=42 PartOfSpeech=NN Lemma=brca1 NamedEntityTag=O] | |
| [Text=-RRB- CharacterOffsetBegin=42 CharacterOffsetEnd=43 PartOfSpeech=-RRB- Lemma=-rrb- NamedEntityTag=O] | |
| [Text=is CharacterOffsetBegin=44 CharacterOffsetEnd=46 PartOfSpeech=VBZ Lemma=be NamedEntityTag=O] | |
| [Text=a CharacterOffsetBegin=47 CharacterOffsetEnd=48 PartOfSpeech=DT Lemma=a NamedEntityTag=O] | |
| [Text=tumor CharacterOffsetBegin=49 CharacterOffsetEnd=54 PartOfSpeech=NN Lemma=tumor NamedEntityTag=CAUSE_OF_DEATH] | |
| [Text=suppressor CharacterOffsetBegin=55 CharacterOffsetEnd=65 PartOfSpeech=NN Lemma=suppressor NamedEntityTag=O] | |
| [Text=protein CharacterOffsetBegin=66 CharacterOffsetEnd=73 PartOfSpeech=NN Lemma=protein NamedEntityTag=O] | |
| [Text=. CharacterOffsetBegin=73 CharacterOffsetEnd=74 PartOfSpeech=. Lemma=. NamedEntityTag=O] | |
| Constituency parse: | |
| (ROOT | |
| (S | |
| (NP | |
| (NP | |
| (NP (NN Breast) (NN cancer) (NN susceptibility)) | |
| (NP (NN gene) (CD 1))) | |
| (PRN (-LRB- -LRB-) | |
| (NP (NN BRCA1)) | |
| (-RRB- -RRB-))) | |
| (VP (VBZ is) | |
| (NP (DT a) (NN tumor) (NN suppressor) (NN protein))) | |
| (. .))) | |
| Dependency Parse (enhanced plus plus dependencies): | |
| root(ROOT-0, protein-13) | |
| compound(gene-4, Breast-1) | |
| compound(gene-4, cancer-2) | |
| compound(gene-4, susceptibility-3) | |
| nsubj(protein-13, gene-4) | |
| nummod(gene-4, 1-5) | |
| punct(BRCA1-7, -LRB--6) | |
| appos(gene-4, BRCA1-7) | |
| punct(BRCA1-7, -RRB--8) | |
| cop(protein-13, is-9) | |
| det(protein-13, a-10) | |
| compound(protein-13, tumor-11) | |
| compound(protein-13, suppressor-12) | |
| punct(protein-13, .-14) | |
| Extracted the following NER entity mentions: | |
| Breast cancer CAUSE_OF_DEATH | |
| 1 NUMBER | |
| tumor CAUSE_OF_DEATH | |
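The compact "Dependency Parse" lines above carry the same information as the protobuf `edge` entries (`source`, `target`, `dep`) and the `root` field printed earlier. The sketch below, which is not part of the original session, rebuilds those lines from hand-copied edges. It assumes the indices are 1-based token positions and that the text format lists relations in order of the dependent's index, which matches the output shown here; `label` and `format_dependencies` are hypothetical helper names.

```python
# Tokens of the example sentence; CoreNLP indexes them from 1.
tokens = ["Breast", "cancer", "susceptibility", "gene", "1", "-LRB-", "BRCA1",
          "-RRB-", "is", "a", "tumor", "suppressor", "protein", "."]

# (source, target, dep) triples copied by hand from the
# enhancedPlusPlusDependencies edges in the protobuf dump.
edges = [
    (4, 1, "compound"), (4, 2, "compound"), (4, 3, "compound"),
    (4, 5, "nummod"), (4, 7, "appos"), (7, 6, "punct"), (7, 8, "punct"),
    (13, 4, "nsubj"), (13, 9, "cop"), (13, 10, "det"),
    (13, 11, "compound"), (13, 12, "compound"), (13, 14, "punct"),
]
root = 13  # value of the `root` field


def label(index):
    """Render a 1-based token index as 'word-index', e.g. 'gene-4'."""
    return f"{tokens[index - 1]}-{index}"


def format_dependencies(edges, root):
    """Return 'dep(governor, dependent)' lines, ordered by dependent index."""
    lines = [f"root(ROOT-0, {label(root)})"]
    for source, target, dep in sorted(edges, key=lambda edge: edge[1]):
        lines.append(f"{dep}({label(source)}, {label(target)})")
    return lines


dependency_lines = format_dependencies(edges, root)
print("\n".join(dependency_lines))
```

Here `source` is the governor and `target` the dependent, so the edge `13 -> 4 nsubj` becomes `nsubj(protein-13, gene-4)`.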
| # ============================================================================ | |
| >>> import stanfordnlp | |
| >>> stanfordnlp.download('en') | |
| Using the default treebank "en_ewt" for language "en". | |
| Would you like to download the models for: en_ewt now? (Y/n) Y | |
| Default download directory: /home/victoria/stanfordnlp_resources | |
| Hit enter to continue or type an alternate directory. | |
| Downloading models for: en_ewt | |
| Download location: /home/victoria/stanfordnlp_resources/en_ewt_models.zip | |
| 100%|█████████████████████████████████████| 235M/235M [01:15<00:00, 3.09MB/s] | |
| Download complete. Models saved to: /home/victoria/stanfordnlp_resources/en_ewt_models.zip | |
| Extracting models file for: en_ewt | |
| Cleaning up...Done. | |
| >>> nlp = stanfordnlp.Pipeline() | |
| Use device: cpu | |
| --- | |
| Loading: tokenize | |
| With settings: | |
| {'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
| --- | |
| Loading: pos | |
| With settings: | |
| {'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
| --- | |
| Loading: lemma | |
| With settings: | |
| {'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
| Building an attentional Seq2Seq model... | |
| Using a Bi-LSTM encoder | |
| Using soft attention for LSTM. | |
| Finetune all embeddings. | |
| [Running seq2seq lemmatizer with edit classifier] | |
| --- | |
| Loading: depparse | |
| With settings: | |
| {'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
| Done loading processors! | |
| --- | |
| >>> text = 'Bananas are an excellent source of potassium.' | |
| >>> text_nlp = nlp(text) | |
| >>> text_nlp.sentences[0].print_dependencies() | |
| ('Bananas', '5', 'nsubj') | |
| ('are', '5', 'cop') | |
| ('an', '5', 'det') | |
| ('excellent', '5', 'amod') | |
| ('source', '0', 'root') | |
| ('of', '7', 'case') | |
| ('potassium', '5', 'nmod') | |
| ('.', '5', 'punct') | |
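Each tuple printed by `print_dependencies()` above reads as (word, index of the head word, relation), where the head index counts words from 1 and `0` marks the root, as in CoNLL-U. As a rough illustration that is not part of the original session, the sketch below resolves each head index to its word. The triples are copied by hand from the output above, and `resolve_heads` is a hypothetical helper rather than a stanfordnlp function.

```python
# (word, head index as a string, relation), copied from the output above.
deps = [
    ("Bananas", "5", "nsubj"),
    ("are", "5", "cop"),
    ("an", "5", "det"),
    ("excellent", "5", "amod"),
    ("source", "0", "root"),
    ("of", "7", "case"),
    ("potassium", "5", "nmod"),
    (".", "5", "punct"),
]


def resolve_heads(deps):
    """Hypothetical helper: replace each 1-based head index with the head word."""
    words = [word for word, _, _ in deps]
    resolved = []
    for word, head, relation in deps:
        head_index = int(head)
        # Index 0 is the artificial root, not a real word in the sentence.
        head_word = "ROOT" if head_index == 0 else words[head_index - 1]
        resolved.append((word, relation, head_word))
    return resolved


resolved = resolve_heads(deps)
for word, relation, head_word in resolved:
    print(f"{relation}({head_word}, {word})")
```

Read this way, "source" is the root of the sentence, and "of" attaches to "potassium" (word 7) rather than to "source".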
| # ============================================================================ | |
| >>> import stanfordnlp | |
| >>> from spacy_stanfordnlp import StanfordNLPLanguage | |
| >>> snlp = stanfordnlp.Pipeline(lang="en") | |
| Use device: cpu | |
| --- | |
| Loading: tokenize | |
| With settings: | |
| {'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
| --- | |
| Loading: pos | |
| With settings: | |
| {'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
| --- | |
| Loading: lemma | |
| With settings: | |
| {'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
| Building an attentional Seq2Seq model... | |
| Using a Bi-LSTM encoder | |
| Using soft attention for LSTM. | |
| Finetune all embeddings. | |
| [Running seq2seq lemmatizer with edit classifier] | |
| --- | |
| Loading: depparse | |
| With settings: | |
| {'model_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/home/victoria/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'} | |
| Done loading processors! | |
| --- | |
| >>> nlp = StanfordNLPLanguage(snlp) | |
| >>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.") | |
| >>> for token in doc: | |
| ... print(token.text, token.lemma_, token.pos_, token.dep_) | |
| ... | |
| Barack Barack PROPN nsubj:pass | |
| Obama Obama PROPN flat | |
| was be AUX aux:pass | |
| born bear VERB root | |
| in in ADP case | |
| Hawaii Hawaii PROPN obl | |
| . . PUNCT punct | |
| He he PRON nsubj:pass | |
| was be AUX aux:pass | |
| elected elect VERB root | |
| president president PROPN xcomp | |
| in in ADP case | |
| 2008 2008 NUM obl | |
| . . PUNCT punct | |
| >>> | |
| ============================================================================== | |
| ============================================================================== | |
| END OF FILE | |
| ============================================================================== | |
| ============================================================================== |