@mjbommar
Created February 16, 2011 14:57
Comparison of NLTK and tm.
#@author Michael J Bommarito II
#@date Feb 16, 2011
library(tm)
# Load the tweets
tweets <- unique(read.table('data/tweets_25bahman.csv', sep="\t", quote="", comment.char="", header=FALSE, nrows=100000, stringsAsFactors=FALSE))
names(tweets) <- c("id", "date", "user", "text")
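The read.table() call above depends on two non-default settings. By default, read.table() treats `#` as the start of a comment and both `"` and `'` as quote characters. In tweet text those defaults would cut off hashtags and mangle apostrophes. The sketch below runs the same call on a hypothetical inline sample, passed through read.table()'s `text` argument in place of the CSV file. It shows that both characters survive and that unique() drops the duplicated row. The sample rows are invented for illustration only.

```r
# Hypothetical sample in the same four-column, tab-separated layout
# (id, date, user, text); the last row duplicates the second one.
sample_tsv <- paste(
  "1\t2011-02-14\talice\tdon't miss #25bahman",
  "2\t2011-02-14\tbob\tsee you there",
  "2\t2011-02-14\tbob\tsee you there",
  sep = "\n"
)

# Same options as the gist: no quote characters, no comment character,
# so apostrophes and hashtags are kept as ordinary text.
tweets <- unique(read.table(text = sample_tsv, sep = "\t", quote = "",
                            comment.char = "", header = FALSE,
                            stringsAsFactors = FALSE))
names(tweets) <- c("id", "date", "user", "text")
print(tweets$text)
```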
# Build the corpus and then apply the tm pre-processing methods
corpus <- Corpus(VectorSource(tweets$text))
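corpus <- tm_map(corpus, stripWhitespace)               # collapse runs of whitespace into a single space
corpus <- tm_map(corpus, content_transformer(tolower))  # lower-case; tm >= 0.6 requires base functions to be wrapped in content_transformer()
corpus <- tm_map(corpus, stemDocument)                  # Snowball stemming; needs the SnowballC package installed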
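For comparison with an NLTK pipeline, the first two preprocessing steps can be approximated on a plain character vector with base R alone. This is a rough sketch, not tm's own code path. It assumes stripWhitespace behaves like collapsing every run of whitespace to one space without trimming the ends, and it leaves out stemming because that step needs SnowballC. The function name `normalize_tweet` and the sample strings are hypothetical.

```r
# Hypothetical base-R approximation of the stripWhitespace + tolower steps.
normalize_tweet <- function(x) {
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of spaces/tabs/newlines to one space
  tolower(x)                         # lower-case every character
}

texts <- c("Hello   World\tAgain", "NLTK  and TM")
cleaned <- normalize_tweet(texts)
print(cleaned)
```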