@trengrj
Created November 11, 2015 12:06
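# Build a word-frequency table from the titles in `places`
# (a data frame with a `title` column, assumed to exist already).
library(tm)

docs <- Corpus(VectorSource(places$title))
docs <- tm_map(docs, removePunctuation)
# Wrap base functions such as tolower in content_transformer() (tm >= 0.6) so the
# result stays a valid corpus; no PlainTextDocument conversion is needed afterwards.
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeWords, c("bad", "words"))  # placeholder list of custom words to drop
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, stripWhitespace)

dtm <- DocumentTermMatrix(docs)
# Total count of each term across all documents, most frequent first.
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
head(freq, 14)
wf <- data.frame(word = names(freq), freq = freq)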