List of annotators offered by Spark NLP
Annotator | Description | Availability | Annotator Approach | Annotator Model
---|---|---|---|---
Tokenizer* | Identifies tokens using open tokenization standards | Open source | - | +
Normalizer* | Removes all dirty characters from text | Open source | - | +
Stemmer* | Returns hard stems of words, with the objective of retrieving the meaningful part of each word | Open source | + | -
Lemmatizer* | Retrieves lemmas of words, with the objective of returning a base dictionary word | Open source | - | +
RegexMatcher* | Uses a reference file to match a set of regular expressions and labels each match with a provided key | Open source | + | +
TextMatcher* | Matches entire phrases (by token) provided in a file against a document | Open source | + | +
Chunker* | Matches a pattern of part-of-speech tags in order to return meaningful phrases from a document | Open source | + | -
DateMatcher* | Reads different forms of date and time expressions and converts them to a provided date format | Open source | + | -
SentenceDetector* | Finds sentence boundaries in raw text by applying rules from Pragmatic Segmenter | Open source | + | -
DeepSentenceDetector* | Finds sentence boundaries in raw text by applying a named entity recognition (NER) deep learning model | Open source | + | -
POSTagger | Assigns a part-of-speech tag to each word within a sentence | Open source | + | +
ViveknSentimentDetector | Scores a sentence for sentiment | Open source | + | +
SentimentDetector* | Scores a sentence for sentiment | Open source | + | +
WordEmbeddings* | Word embeddings lookup annotator that maps tokens to vectors | Open source | + | +
BertEmbeddings* | BERT embeddings that map tokens to vectors in a bidirectional way | Open source | + | -
NerCrf | Named entity recognition annotator that allows a generic model to be trained using a CRF machine learning algorithm | Open source | + | +
NerDL | Named entity recognition annotator that trains a generic NER model based on neural networks, using a Char CNNs-BiLSTM-CRF architecture that achieves state-of-the-art results on most datasets | Open source | + | +
NorvigSweeting | Retrieves tokens and automatically corrects them if they are not found in an English dictionary | Open source | + | +
SymmetricDelete | Spell checker inspired by the Symmetric Delete algorithm | Open source | + | +
ContextSpellChecker | Uses TensorFlow to perform context-based spell checking | Open source | + | +
DependencyParser | Unlabeled parser that finds a grammatical relation between two words in a sentence | Open source | + | +
TypedDependencyParser | Labeled parser that finds a grammatical relation between two words in a sentence | Open source | + | +
AssertionLogReg | Classifies each clinically relevant named entity into its assertion type (“present”, “absent”, “hypothetical”, etc.) using logistic regression | Licensed | + | +
AssertionDL | Classifies each clinically relevant named entity into its assertion type (“present”, “absent”, “hypothetical”, etc.) using a deep learning model | Licensed | + | +
EntityResolver | Assigns an ICD-10 (International Classification of Diseases, 10th revision) code to chunks identified as “PROBLEMS” by the clinical NER model | Licensed | + | +
DeIdentification | Identifies potential pieces of content containing personal information about patients and removes them by replacing them with semantic tags | Licensed | + | +
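
The Chunker row in the table above describes matching a pattern of part-of-speech tags to pull phrases out of a document. The following is a minimal pure-Python sketch of that idea, not Spark NLP's implementation. It assumes the tokens are already POS-tagged and that patterns use a `<TAG>` notation such as `<DT>?<JJ>*<NN>`. The `chunk` function is a hypothetical helper.

```python
import re
from typing import List


def chunk(tokens: List[str], tags: List[str], pattern: str) -> List[str]:
    """Return the phrases whose POS-tag sequence matches `pattern`.

    `pattern` uses an assumed "<TAG>" notation, e.g. "<DT>?<JJ>*<NN>+".
    Tags are matched literally (no wildcards inside the angle brackets).
    """
    # Wrap each "<TAG>" in a non-capturing group so a following ?, * or +
    # applies to the whole tag rather than to its last character.
    regex = re.sub(
        r"<([^<>]+)>",
        lambda m: "(?:<" + re.escape(m.group(1)) + ">)",
        pattern,
    )
    # Encode the tag sequence as one string, e.g. "<DT><JJ><NN>".
    encoded = "".join(f"<{tag}>" for tag in tags)
    phrases = []
    for match in re.finditer(regex, encoded):
        if match.start() == match.end():
            continue  # skip empty matches produced by all-optional patterns
        # Every tag contributes exactly one "<", so counting "<" converts
        # character offsets back into token positions.
        start = encoded.count("<", 0, match.start())
        length = match.group(0).count("<")
        phrases.append(" ".join(tokens[start:start + length]))
    return phrases


tokens = "The quick brown fox jumped over the lazy dog".split()
tags = ["DT", "JJ", "JJ", "NN", "VBD", "IN", "DT", "JJ", "NN"]
print(chunk(tokens, tags, "<DT>?<JJ>*<NN>"))
# ['The quick brown fox', 'the lazy dog']
```

Encoding the tags as a single string lets Python's `re` engine handle the quantifiers. Because tags are matched exactly, `<NN>` does not match `NNS`.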
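
The TextMatcher row in the table above says phrases are matched "by token" rather than as raw substrings. This sketch shows what that distinction means. It assumes simple whitespace tokenization for both the text and the phrase list, which Spark NLP does not necessarily use. `match_phrases` is a hypothetical helper and is not the library's API.

```python
from typing import List, Tuple


def match_phrases(tokens: List[str], phrases: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, phrase) for every token-level occurrence of each phrase.

    `end` is exclusive. Phrases are split on whitespace, assuming the same
    simple tokenization was used to produce `tokens`.
    """
    phrase_tokens = [phrase.split() for phrase in phrases]
    matches = []
    for start in range(len(tokens)):
        for words in phrase_tokens:
            end = start + len(words)
            # Compare whole tokens, so "York" does not match inside "Yorkshire".
            if words and tokens[start:end] == words:
                matches.append((start, end, " ".join(words)))
    return matches


tokens = "She moved from Yorkshire to New York".split()
print(match_phrases(tokens, ["New York", "York"]))
# [(5, 7, 'New York'), (6, 7, 'York')]
```

The scan is a brute-force O(tokens × phrases) loop, which keeps the sketch easy to follow. A large phrase list would call for a trie or hash-based lookup instead.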
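
The NorvigSweeting row in the table above describes correcting tokens that are not found in an English dictionary. As a rough illustration, here is the classic spelling-correction approach popularized by Peter Norvig. It generates candidates within edit distance 1 (then 2), keeps those found in a word-frequency dictionary, and picks the most frequent one. This shows the general technique only and is not Spark NLP's NorvigSweeting code. The function names and the tiny corpus are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Set

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """All strings one delete, transpose, replace or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str, counts: Counter) -> str:
    """Return the most frequent known word within edit distance 2 of `word`."""
    if word in counts:
        return word
    # `or` short-circuits: distance-2 candidates are only built if needed.
    candidates = (
        {w for w in edits1(word) if w in counts}
        or {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in counts}
        or {word}  # nothing close enough: leave the word unchanged
    )
    return max(candidates, key=lambda w: counts[w])


corpus = "spelling is hard but spelling correction helps correct words"
counts = Counter(re.findall(r"[a-z]+", corpus.lower()))
print(correct("speling", counts))  # spelling
print(correct("corect", counts))   # correct
```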
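
The SymmetricDelete row in the table above only says the spell checker is "inspired by the Symmetric Delete algorithm". The sketch below shows the general symmetric-delete idea, as used by SymSpell. Dictionary words and the query are both expanded using deletions only. Any dictionary word that shares a delete-variant with the query becomes a candidate, and the true edit distance is checked afterwards. This is not Spark NLP's code. All function names and the sample dictionary are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def deletes(word: str, max_distance: int) -> Set[str]:
    """`word` plus every string reachable by deleting up to `max_distance` characters."""
    variants = {word}
    for n in range(1, min(max_distance, len(word)) + 1):
        for positions in combinations(range(len(word)), n):
            variants.add("".join(c for i, c in enumerate(word) if i not in positions))
    return variants


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build_index(counts: Dict[str, int], max_distance: int) -> Dict[str, Set[str]]:
    """Precompute a map from each delete-variant to the dictionary words producing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in counts:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def lookup(word: str, index: Dict[str, Set[str]], counts: Dict[str, int],
           max_distance: int) -> List[str]:
    """Suggestions sorted by edit distance, then by descending frequency."""
    candidates: Set[str] = set()
    # The query is also expanded with deletes only: the "symmetric" part.
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    # Shared delete-variants can over-approximate, so verify the real distance.
    scored = sorted((levenshtein(word, c), -counts[c], c) for c in candidates)
    return [c for dist, _, c in scored if dist <= max_distance]


counts = {"hello": 10, "help": 5, "world": 8}
index = build_index(counts, max_distance=2)
print(lookup("helo", index, counts, max_distance=2))  # ['hello', 'help']
```

Precomputing deletes moves most of the work to index-build time. At query time, only the deletions of the input word need to be generated, rather than every possible edit.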