lisanka93/be8157dc1ecb499d90c7b1050d2c16b8
Created July 13, 2020 15:37
removing stopwords with NLTK
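from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Requires NLTK data: nltk.download('stopwords') and nltk.download('punkt')
# (recent NLTK releases use 'punkt_tab' instead of 'punkt').
example_sent = "This is a sample sentence, showing off the stop words filtration."

stop_words = set(stopwords.words('english'))  # a set makes membership tests fast
word_tokens = word_tokenize(example_sent)

# Compare in lowercase: the stop word list is lowercase, so "This" would otherwise slip through.
filtered_sentence = [w for w in word_tokens if w.lower() not in stop_words]
print(filtered_sentence)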
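
The snippet above depends on NLTK and its downloadable data. As a rough, dependency-free sketch of the same filtering idea, the version below uses only the standard library. `SAMPLE_STOP_WORDS` and `remove_stop_words` are hypothetical names introduced for illustration. The stop word set is a tiny hand-picked subset, not NLTK's English list, and the regex is only a crude stand-in for `word_tokenize`.

```python
import re

# Illustrative subset only (an assumption); NLTK's English list is much longer.
SAMPLE_STOP_WORDS = {"this", "is", "a", "off", "the"}


def remove_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    """Return the words of `text` that are not in `stop_words`, ignoring case."""
    # \w+ keeps runs of letters, digits and underscores and drops punctuation,
    # a rough stand-in for a real tokenizer such as word_tokenize.
    tokens = re.findall(r"\w+", text)
    # Lowercase only for the lookup so the original casing is preserved in the output.
    return [t for t in tokens if t.lower() not in stop_words]


if __name__ == "__main__":
    example_sent = "This is a sample sentence, showing off the stop words filtration."
    print(remove_stop_words(example_sent))
```

Unlike the NLTK version, this sketch also discards punctuation tokens, which may or may not be what a given pipeline wants.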