Last active January 19, 2018 13:03
Example of named-entity recognition (NER) with Go, using https://github.com/sbl/ner/
This document is about Cal Ripken and Edgar Allen Poe. It's
being written from a house in Carroll County, specifically Union Bridge.
The author is Craig Wickesser.
# $ cd ~
# $ source .env/bin/activate
# $ python
>>> import spacy, io
>>> data = io.open('go/src/github.com/sbl/ner/_example/11231.txt', mode='r', encoding='utf-8').read()
>>> nlp = spacy.load('en')
>>> doc = nlp(data)
>>> for ent in doc.ents:
...     print(ent.text, ent.start_char, ent.end_char, ent.label_)
...
(u'Cal Ripken', 23, 33, u'PERSON')
(u'Edgar Allen Poe', 38, 53, u'PERSON')
(u'\n', 59, 60, u'GPE')
(u'Carroll County', 90, 104, u'GPE')
(u'Union Bridge', 119, 131, u'ORG')
(u'Craig Wickesser', 148, 163, u'PERSON')
(u'\n', 164, 165, u'GPE')
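The spaCy output above tags two bare newline characters as GPE entities, which are clearly false positives. A minimal sketch of one way to drop them after extraction follows. The helper name `drop_whitespace_entities` is hypothetical, and the tuples are copied from the session output rather than produced by spaCy, so the snippet runs without a model installed.

```python
# Entity tuples copied from the session output above:
# (text, start_char, end_char, label).
ents = [
    ("Cal Ripken", 23, 33, "PERSON"),
    ("Edgar Allen Poe", 38, 53, "PERSON"),
    ("\n", 59, 60, "GPE"),
    ("Carroll County", 90, 104, "GPE"),
    ("Union Bridge", 119, 131, "ORG"),
    ("Craig Wickesser", 148, 163, "PERSON"),
    ("\n", 164, 165, "GPE"),
]


def drop_whitespace_entities(entities):
    """Keep only entities whose text has at least one non-whitespace character.

    Hypothetical helper for illustration; not part of spaCy.
    """
    return [e for e in entities if e[0].strip()]


cleaned = drop_whitespace_entities(ents)
for text, start, end, label in cleaned:
    print(text, start, end, label)
```

In a live session the same filter could presumably be applied to `doc.ents` by testing `ent.text.strip()` inside the loop.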
2018/01/12 18:07:14 available tags: [PERSON LOCATION ORGANIZATION MISC]
2018/01/12 18:07:14 {Score:1.053429733529049 Tag:0 TagString: Name:Cal Ripken Range:{Start:4 End:6}}
2018/01/12 18:07:14 {Score:0.9948352240424915 Tag:0 TagString: Name:Edgar Allen Poe Range:{Start:7 End:10}}
2018/01/12 18:07:14 {Score:1.0153387737219781 Tag:1 TagString: Name:Carroll County Range:{Start:19 End:21}}
2018/01/12 18:07:14 {Score:0.6087442249127257 Tag:2 TagString: Name:Union Bridge Range:{Start:23 End:25}}
2018/01/12 18:07:14 {Score:1.0389223589760979 Tag:0 TagString: Name:Craig Wickesser Range:{Start:29 End:31}}
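In the log above, `TagString` is empty, so each entity's label has to be looked up by its `Tag` index in the "available tags" list, and `Range` appears to be a half-open span of token indices rather than character offsets. The sketch below illustrates that reading using values copied from the log. It is written in Python to match the session earlier in this gist and does not call the Go library. The `tokenize` function is a hypothetical regex stand-in that happens to reproduce the token indices for this sample text; the tokenizer the library actually uses may behave differently on other input.

```python
import re

# Tag names in the order printed on the "available tags" log line.
TAGS = ["PERSON", "LOCATION", "ORGANIZATION", "MISC"]

# (tag index, name, token start, token end) copied from the log lines above.
ENTITIES = [
    (0, "Cal Ripken", 4, 6),
    (0, "Edgar Allen Poe", 7, 10),
    (1, "Carroll County", 19, 21),
    (2, "Union Bridge", 23, 25),
    (0, "Craig Wickesser", 29, 31),
]

# Sample text from the top of this gist; whitespace details do not matter here.
TEXT = (
    "This document is about Cal Ripken and Edgar Allen Poe. It's\n"
    "being written from a house in Carroll County, specifically Union Bridge.\n"
    "The author is Craig Wickesser.\n"
)


def tokenize(text):
    """Hypothetical stand-in tokenizer: split off "'s" and single punctuation marks."""
    return re.findall(r"'s|\w+|[^\w\s]", text)


def resolve(entities, tokens, tags):
    """Pair each entity with its tag name and the tokens in [start, end)."""
    return [
        (tags[tag], name, " ".join(tokens[start:end]))
        for tag, name, start, end in entities
    ]


tokens = tokenize(TEXT)
resolved = resolve(ENTITIES, tokens, TAGS)
for tag_name, name, span in resolved:
    print(tag_name, name, span)
```

Under these assumptions every token span joins back to the logged `Name`, which supports reading `End` as exclusive.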
This document is about
Cal Ripken PERSON and
Edgar Allen Poe PERSON .
It 's being written from a house in
Carroll County LOCATION , specifically
Union Bridge LOCATION .
The author is
Craig Wickesser PERSON .