@ahmedbesbes
Created August 21, 2022 17:55
import spacy

# Load the scispaCy JNLPBA model, which tags DNA, RNA and protein
# mentions in biomedical documents. The components listed in `disable`
# are skipped, leaving named-entity recognition.
model = spacy.load(
    "en_ner_jnlpba_md",
    disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"],
)
# Build a list of patterns to inject into the entity ruler. These
# patterns cover entities that the model does not capture on its own.
# Knowledge bases or ontologies can be used to construct them.
# NOTE: build_patterns_from_knowledge_base is not defined in this gist;
# it stands for whatever helper produces the list of dicts shown below.
patterns = build_patterns_from_knowledge_base()
print(patterns[:3])
# [{'label': 'PROTEIN', 'pattern': 'tetraspanin-5'},
#  {'label': 'PROTEIN', 'pattern': 'estradiol 17-beta-dehydrogenase akr1b15'},
#  {'label': 'PROTEIN', 'pattern': 'moz, ybf2/sas3, sas2 and tip60 protein 4'}]
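# Add an entity ruler after the statistical NER component. With the
# default overwrite_ents=False, the ruler keeps the model's entities
# and only adds matches that do not overlap them.
entity_ruler = model.add_pipe("entity_ruler", after="ner")
# Add the patterns to the entity ruler. String patterns match exact
# token text by default, so these lowercase patterns only match lowercase
# text unless the ruler uses config={"phrase_matcher_attr": "LOWER"}.
entity_ruler.add_patterns(patterns)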
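
The gist calls build_patterns_from_knowledge_base() without showing it. Below is a minimal sketch of what such a helper might look like, under the assumption that the knowledge base is available as a plain dict mapping an entity label to a list of names. The signature (it takes the dict as an argument, unlike the call above), the toy_kb data and the normalization rules are all illustrative assumptions, not the author's implementation.

```python
from typing import Dict, Iterable, List


def build_patterns_from_knowledge_base(
    knowledge_base: Dict[str, Iterable[str]],
) -> List[Dict[str, str]]:
    """Turn a {label: [names]} mapping into entity-ruler phrase patterns.

    Hypothetical helper: the input shape is an assumption made for this sketch.
    """
    patterns: List[Dict[str, str]] = []
    seen = set()
    for label, names in knowledge_base.items():
        for name in names:
            # Collapse whitespace and lowercase so near-duplicates merge.
            normalized = " ".join(name.split()).lower()
            if not normalized or (label, normalized) in seen:
                continue
            seen.add((label, normalized))
            patterns.append({"label": label, "pattern": normalized})
    return patterns


# Toy knowledge base, made up for illustration only.
toy_kb = {
    "PROTEIN": [
        "Tetraspanin-5",
        "tetraspanin-5 ",  # duplicate after normalization
        "Estradiol 17-beta-dehydrogenase AKR1B15",
    ]
}
patterns = build_patterns_from_knowledge_base(toy_kb)
print(patterns)
# [{'label': 'PROTEIN', 'pattern': 'tetraspanin-5'},
#  {'label': 'PROTEIN', 'pattern': 'estradiol 17-beta-dehydrogenase akr1b15'}]
```

The returned list has the same shape as the patterns printed in the gist, so it can be passed to entity_ruler.add_patterns as-is. Deduplicating on (label, pattern) keeps the ruler from storing the same phrase twice when a knowledge base lists a name under several spellings.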