@prrao87, created May 24, 2020 06:03
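import spacy
# Assumption: the gist does not define `nlp` or `stopwords`; this loads a small
# English model (parser/NER off for speed) and uses spaCy's English stop words.
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
stopwords = nlp.Defaults.stop_words

def lemmatize_pipe(doc):
    # Lowercased lemmas of alphabetic tokens whose text is not a stopword
    lemma_list = [tok.lemma_.lower() for tok in doc
                  if tok.is_alpha and tok.text.lower() not in stopwords]
    return lemma_list

def preprocess_pipe(texts):
    # Stream texts through nlp.pipe in batches of 20, lemmatizing each Doc
    preproc_pipe = []
    for doc in nlp.pipe(texts, batch_size=20):
        preproc_pipe.append(lemmatize_pipe(doc))
    return preproc_pipe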
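To see what the filter in `lemmatize_pipe` keeps and drops without loading a spaCy model, here is a minimal sketch that swaps in hand-built tokens. `FakeToken` and the small `stopwords` set are hypothetical stand-ins (not spaCy objects); they carry only the three attributes the function reads: `text`, `lemma_` and `is_alpha`.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy Token, exposing only the attributes
# that lemmatize_pipe uses.
FakeToken = namedtuple("FakeToken", ["text", "lemma_", "is_alpha"])

# Hypothetical stop-word set for this demo (not spaCy's built-in list).
stopwords = {"the", "were", "on"}


def lemmatize_pipe(doc):
    # Same filter as the gist: keep lowercased lemmas of alphabetic
    # tokens whose surface text is not a stopword.
    return [tok.lemma_.lower() for tok in doc
            if tok.is_alpha and tok.text.lower() not in stopwords]


# A hand-built "Doc" for the sentence "The cats were sitting on 2 mats!"
doc = [
    FakeToken("The", "the", True),
    FakeToken("cats", "cat", True),
    FakeToken("were", "be", True),
    FakeToken("sitting", "sit", True),
    FakeToken("on", "on", True),
    FakeToken("2", "2", False),
    FakeToken("mats", "mat", True),
    FakeToken("!", "!", False),
]

print(lemmatize_pipe(doc))  # ['cat', 'sit', 'mat']
```

Note that stopwords are matched against each token's text, not its lemma: "were" is dropped because "were" itself is in the set. If only "be" were listed, "were" would pass the filter and come out as "be".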