Last active
May 10, 2016 16:59
attempt at using epic NER on plaintext file
import epic.models.{NerSelector, ParserSelector}
import epic.parser.ParserAnnotator
import epic.preprocess
import epic.preprocess.{TreebankTokenizer, MLSentenceSegmenter}
import epic.sequences.{SemiCRF, Segmenter}
import epic.slab.{EntityMention, Token, Sentence}
import epic.trees.{AnnotatedLabel, Tree}
import epic.util.SafeLogging

// Read the whole plaintext file into a single string.
val text = io.Source.fromFile("data/email.txt").mkString

// Load the bundled sentence segmenter, a tokenizer, and the English NER model.
val sentenceSplitter = MLSentenceSegmenter.bundled().get
val tokenizer = new epic.preprocess.TreebankTokenizer()
val tagger = epic.models.NerSelector.loadNer("en").get

// Split the text into sentences, then tokenize each sentence.
val sentences: IndexedSeq[IndexedSeq[String]] = sentenceSplitter(text).map(tokenizer).toIndexedSeq

// Tag each tokenized sentence and print the labeled segmentation.
for (sentence <- sentences) {
  val segments = tagger.bestSequence(sentence)
  println(segments.render)
}
I am a student in Paris and I am starting to use Epic, but I have a problem with a command line like this:
pic/parser/models/fr/span/model.ser.gz french-onesent.txt
Could not tag Vector(les, parisiens, ne, sourient, pas, tout, le, temps, par, contre, !!), because epic.preprocess.MLSentenceSegmenter cannot be cast to epic.parser.Parser... epic.parser.ParseText$.annotate(ParseText.scala:11);epic.util.ProcessTextMain$$anonfun$main$1$$anonfun$2.apply(ProcessTextMain.scala:78)
My command is:
java -Xmx4g -cp epic-assembly-0.4-SNAPSHOT.jar epic.parser.ParseText --model /epic/parser/models/fr/span/model.ser.gz french-onesent.txt
where the file french-onesent.txt contains:
les, parisiens, ne, sourient, pas, tout, le, temps, par, contre, !!
Thanks.