@andreaschandra
Created June 27, 2019 14:05
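
import string
from nltk.tokenize import word_tokenize

def cleansing(text):
    # Tokenize the raw text (requires NLTK's "punkt" tokenizer data).
    word_list = word_tokenize(text)
    # Keep alphanumeric tokens longer than two characters.
    word_list = [word for word in word_list if len(word) > 2 and word.isalnum()]
    # Drop tokens that contain any punctuation character.
    word_list = [word for word in word_list if not any(ch in string.punctuation for ch in word)]
    return ' '.join(word_list)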
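To see what the filters keep without installing NLTK, here is a minimal sketch. It assumes a simple regex split (the hypothetical `simple_tokenize`) as a stand-in for `word_tokenize`; the two tokenizers are not identical, so treat this only as an illustration of the length and alphanumeric filtering applied by `cleansing`. `cleansing_sketch` is likewise a hypothetical name.

```python
import re
import string

# Assumption: a simple regex tokenizer stands in for nltk's word_tokenize,
# so this sketch runs without NLTK or its "punkt" data.
def simple_tokenize(text):
    # Runs of word characters, or single characters that are neither word nor space.
    return re.findall(r"\w+|[^\w\s]", text)

def cleansing_sketch(text):
    tokens = simple_tokenize(text)
    # Keep alphanumeric tokens longer than two characters.
    tokens = [t for t in tokens if len(t) > 2 and t.isalnum()]
    # Drop tokens containing punctuation (isalnum already excludes them; kept for parity).
    tokens = [t for t in tokens if not any(ch in string.punctuation for ch in t)]
    return " ".join(tokens)

print(cleansing_sketch("Hello, world! It's a nice day in NYC."))
# prints "Hello world nice day NYC"
```

Short words such as "a" and "in", the contraction pieces, and all punctuation tokens are discarded, while capitalization is left untouched.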