Created
October 29, 2017 13:14
spam
import string

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize  # needs NLTK's Punkt tokenizer data installed

print("removing punctuation: " + string.punctuation)
stemmer = PorterStemmer()

def tokenize(message):
    """Remove punctuation, tokenize the message, and lowercase each token.

    Stemming is disabled for now because iOS has no Porter stemmer.
    """
    msg = "".join(ch for ch in message if ch not in string.punctuation)  # strip punctuation
    tokens = word_tokenize(msg)
    # stems = [stemmer.stem(x).lower() for x in tokens]  # the correct way, with stemming
    stems = [x.lower() for x in tokens]  # no stemming: iOS does not have PorterStemmer
    return stems

# `messages` is assumed to be a pandas DataFrame with a `message` column, loaded earlier.
messages.message.head().apply(tokenize)
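Since stemming is already skipped to match what iOS can do, the same preprocessing can be approximated without NLTK at all. The following is only a minimal sketch under stated assumptions: `tokenize_simple` is a hypothetical helper that is not part of the gist, and it swaps `word_tokenize` for a plain whitespace split, which may give different tokens on some inputs.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def tokenize_simple(message):
    """Hypothetical stdlib-only variant: strip punctuation, lowercase, split on whitespace."""
    return message.translate(_PUNCT_TABLE).lower().split()

sample = "Free entry!!! Text WIN to 80085, now."
print(tokenize_simple(sample))
# → ['free', 'entry', 'text', 'win', 'to', '80085', 'now']
```

A whitespace split is easy to reproduce identically on the device side, which is the main reason to consider it here. Whether it is close enough to `word_tokenize` for the classifier would need to be checked against the real data.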