Last active: September 7, 2023 18:10
stemming and lemmatisation
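# Stemming with NLTK: rule-based suffix stripping, so the result may not be a real word
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem("running"))  # Output: run
print(stemmer.stem("flies"))  # Output: fli

# Lemmatisation with spaCy: maps each token to its dictionary form
# Requires the model: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("flies running ran")
lemmas = [token.lemma_ for token in doc]
print(lemmas)
# Output: ['fly', 'run', 'run'] (may vary with the spaCy model version)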
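
The contrast between the two outputs above ('fli' from the stemmer, 'fly' from the lemmatiser) can be illustrated with a toy sketch in plain Python. This is not the Porter algorithm and not spaCy's lemmatiser: the names `SUFFIX_RULES`, `LEMMA_TABLE`, `toy_stem`, and `toy_lemma` are hypothetical, and the rules and lookup table are made up for these three words only. It is meant only to show why suffix stripping can produce non-words and miss irregular forms, while a dictionary lookup does not.

```python
# Toy illustration only: hypothetical rules, not Porter's algorithm or spaCy's lemmatiser.

# Suffix rewrite rules, tried in order (assumed for this example).
SUFFIX_RULES = [("ies", "i"), ("ing", ""), ("s", "")]

# Tiny hand-written lookup table standing in for a real lexicon (assumed).
LEMMA_TABLE = {"flies": "fly", "running": "run", "ran": "run"}


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)] + replacement
            # Collapse a doubled final consonant, e.g. "runn" -> "run".
            if len(stem) > 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def toy_lemma(word):
    """Look the word up in the table; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)


words = ["flies", "running", "ran"]
print([toy_stem(w) for w in words])   # ['fli', 'run', 'ran']
print([toy_lemma(w) for w in words])  # ['fly', 'run', 'run']
```

Note that the toy stemmer leaves the irregular form "ran" unchanged because no suffix rule applies, whereas the lookup maps it to "run". Real lemmatisers generally also use part-of-speech information, which this sketch ignores.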