@eileen-code4fun
Created January 17, 2022 20:49
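from nltk.corpus import stopwords

# Requires the NLTK stopword corpus: run nltk.download('stopwords') once beforehand.
# lower_remove_punctuation_lemmatize, preprocess, train_dataset and test_dataset
# are not defined in this snippet and are assumed to exist already.
def lower_remove_punctuation_lemmatize_remove_stopwords(txt):
    # Normalize first so the tokens match the lowercase stopword list.
    txt = lower_remove_punctuation_lemmatize(txt)
    stop_words = set(stopwords.words('english'))
    # Keep only the tokens that are not English stopwords.
    words = [w for w in txt.split() if w not in stop_words]
    return ' '.join(words)

train = preprocess(train_dataset, lower_remove_punctuation_lemmatize_remove_stopwords)
test = preprocess(test_dataset, lower_remove_punctuation_lemmatize_remove_stopwords)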
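
The stopword filtering step can be tried on its own, without NLTK or the helpers above. This is a minimal sketch: `remove_stopwords` and `SAMPLE_STOP_WORDS` are hypothetical names, and the small hardcoded set only stands in for `stopwords.words('english')`, which is much longer. The input is assumed to be already lowercased and stripped of punctuation.

```python
# Hypothetical stand-in for stopwords.words('english'); the real NLTK list is much longer.
SAMPLE_STOP_WORDS = {"the", "is", "a", "of", "and", "on"}

def remove_stopwords(txt, stop_words=SAMPLE_STOP_WORDS):
    """Drop every whitespace-separated token that appears in stop_words."""
    return ' '.join(w for w in txt.split() if w not in stop_words)

filtered = remove_stopwords("the cat is on the mat")
print(filtered)  # prints "cat mat"
```

Passing the stopword set as a parameter makes it easy to swap in the full NLTK list or a domain-specific one.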