@darcwader
Created October 29, 2017 13:14
spam
import string

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

print("removing punctuation: " + string.punctuation)
stemmer = PorterStemmer()  # only used by the stemming variant commented out below

def tokenize(message):
    """Remove punctuation, tokenize the message, and lowercase each token (no stemming)."""
    msg = "".join(ch for ch in message if ch not in string.punctuation)  # strip punctuation
    tokens = word_tokenize(msg)
    # stems = [stemmer.stem(x).lower() for x in tokens]  # the correct way: stem each token
    stems = [x.lower() for x in tokens]  # iOS has no Porter stemmer, so skip stemming for now
    return stems

# `messages` is assumed to be a pandas DataFrame with a `message` column, loaded earlier
messages.message.head().apply(tokenize)
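
To see what the preprocessing above produces without installing NLTK, here is a minimal sketch that uses only the standard library. The name `tokenize_simple` and the sample message are hypothetical, and `str.split` is swapped in for `word_tokenize`, so results may differ from NLTK's tokenizer on some inputs.

```python
import string

# Hypothetical helper: mirrors tokenize() above, but swaps NLTK's word_tokenize
# for str.split so it runs with the standard library only.
def tokenize_simple(message):
    # Drop every character listed in string.punctuation
    msg = "".join(ch for ch in message if ch not in string.punctuation)
    # Split on whitespace and lowercase each token (no stemming)
    return [token.lower() for token in msg.split()]

# Made-up sample message, not taken from the dataset
print(tokenize_simple("WINNER!! Claim your FREE prize now."))
# prints ['winner', 'claim', 'your', 'free', 'prize', 'now']
```

Because punctuation is deleted rather than replaced with a space, a contraction such as "don't" collapses into the single token "dont".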