@sergiolucero
Created November 9, 2017 19:00
word cloud generator
library(tm)
library(wordcloud)
library(memoise)

# The list of valid books (display name = base name of the .txt.gz file)
books <<- list("A Mid Summer Night's Dream" = "summer",
               "Glamorama" = "Glamorama1")

# Using "memoise" to automatically cache the results for each book
getTermMatrix <- memoise(function(book) {
  # Only accept known file names, so arbitrary paths cannot be read
  if (!(book %in% books)) stop("Unknown book")
  text <- readLines(sprintf("./%s.txt.gz", book), encoding = "UTF-8")
  myCorpus <- Corpus(VectorSource(text))
  myCorpus <- tm_map(myCorpus, content_transformer(tolower))
  myCorpus <- tm_map(myCorpus, removePunctuation)
  myCorpus <- tm_map(myCorpus, removeNumbers)
  myCorpus <- tm_map(myCorpus, removeWords,
                     c(stopwords("SMART"), "thy", "thou", "thee", "the", "and", "but"))
  # Newer tm releases use wordLengths in place of the old minWordLength option
  myDTM <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
  m <- as.matrix(myDTM)
  # Named vector of term frequencies, most frequent first
  sort(rowSums(m), decreasing = TRUE)
})
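
getTermMatrix() only returns the sorted term frequencies; drawing the cloud is a separate step. A minimal sketch of that step, assuming the result is passed straight to wordcloud(). The `freqs` vector and its counts are hypothetical stand-ins for the output of getTermMatrix("summer"), so the snippet runs without the book files, and the scale, seed, and palette values are illustrative choices rather than anything taken from the gist.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical stand-in for getTermMatrix("summer"):
# a named numeric vector of term counts, most frequent first.
freqs <- sort(c(love = 12, night = 9, moon = 7, fairy = 5, dream = 4),
              decreasing = TRUE)

set.seed(1234)  # the layout is randomised; fix the seed for a repeatable plot
wordcloud(names(freqs), freqs,
          scale = c(4, 0.5),               # largest and smallest font size
          min.freq = 1,                    # keep every term in this tiny example
          max.words = 100,                 # upper bound on plotted terms
          colors = brewer.pal(8, "Dark2")) # palette from RColorBrewer
```

With the real data, replacing `freqs` with `getTermMatrix("summer")` should be enough, provided summer.txt.gz sits in the working directory.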
@sergiolucero
Idea: hook up a video recorder -> Watson text recognition -> this word cloud generator
