This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# A python script to turn annotated data in standoff format (brat annotation tool) to the formats expected by Stanford NER and Relation Extractor models | |
# - NER format based on: http://nlp.stanford.edu/software/crf-faq.html#a | |
# - RE format based on: http://nlp.stanford.edu/software/relationExtractor.html#training | |
# Usage: | |
# 1) Install the pycorenlp package | |
# 2) Run CoreNLP server (change CORENLP_SERVER_ADDRESS if needed) | |
# 3) Place .ann and .txt files from brat in the location specified in DATA_DIRECTORY | |
# 4) Run this script |