@beauvais
Created April 8, 2013 10:37
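from nltk import Text, word_tokenize, wordpunct_tokenize
from nltk.corpus import stopwords

filename = 'tweets.txt'

def txt_to_nltk(filename):
    # Read the whole file. 'rU' mode was removed in Python 3.11; plain text
    # mode already uses universal newlines. UTF-8 encoding is assumed.
    with open(filename, encoding='utf-8') as f:
        raw = f.read()
    tokens = word_tokenize(raw)
    words = [w.lower() for w in tokens]
    vocab = sorted(set(words))
    # wordpunct_tokenize also splits off punctuation runs such as ':-)'
    cleaner_tokens = wordpunct_tokenize(raw)
    # Build the stopword set once; calling stopwords.words() inside the
    # comprehension reloads the list for every token. Compare lowercased
    # tokens so capitalized stopwords ("The") are removed too.
    stop = set(stopwords.words('english'))
    filtered_words = [w for w in cleaner_tokens if w.lower() not in stop]
    tweets = Text(tokens)
    mining_tweets = Text(filtered_words)
    # Return the results; otherwise they are discarded when the function ends.
    return tweets, mining_tweets, vocab

tweets, mining_tweets, vocab = txt_to_nltk(filename)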
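The function tokenizes the file twice: `word_tokenize` for the `tweets` text and `wordpunct_tokenize` for the stopword-filtered `mining_tweets`. NLTK's `wordpunct_tokenize` splits text with the regular expression `\w+|[^\w\s]+`, so the filtering step can be sketched with the standard library alone. This is a minimal sketch, not NLTK itself: the small `STOPWORDS` set is a hypothetical stand-in for `stopwords.words('english')`, the sample tweet is made up, and `collections.Counter` stands in for NLTK's `FreqDist`.

```python
import re
from collections import Counter

# Same pattern NLTK's WordPunctTokenizer uses: runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")

# Hypothetical stand-in for stopwords.words('english'); the real list is much longer.
STOPWORDS = {"i", "the", "a", "an", "is", "to", "and", "of", "in", "it", "for"}


def wordpunct_tokens(raw):
    """Split raw text into word tokens and punctuation-run tokens."""
    return TOKEN_RE.findall(raw)


def filter_stopwords(tokens, stopwords=STOPWORDS):
    """Drop stopwords, comparing lowercased tokens against a set (O(1) lookups)."""
    return [t for t in tokens if t.lower() not in stopwords]


if __name__ == "__main__":
    raw = "I love NLTK! NLTK is great for tweets :-)"
    tokens = wordpunct_tokens(raw)
    kept = filter_stopwords(tokens)
    print(tokens)
    print(kept)
    # Count lowercased tokens, similar in spirit to FreqDist(kept).most_common()
    print(Counter(t.lower() for t in kept).most_common(1))
```

Note that the emoticon `:-)` survives as a single token, which is one reason to prefer a punctuation-aware tokenizer for tweets.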