Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save Amice13/a71478fc2d6deb472c3db0e407b1b859 to your computer and use it in GitHub Desktop.
Save Amice13/a71478fc2d6deb472c3db0e407b1b859 to your computer and use it in GitHub Desktop.
Convert the output of a topicmodels Latent Dirichlet Allocation model to JSON for use with LDAvis
#' Convert the output of a topicmodels Latent Dirichlet Allocation to JSON
#' for use with LDAvis
#'
#' @param fitted Output from a topicmodels \code{LDA} model.
#' @param corpus Corpus object used to create the document term
#' matrix for the \code{LDA} model. This should have been create with
#' the tm package's \code{Corpus} function.
#' @param doc_term The document term matrix used in the \code{LDA}
#' model. This should have been created with the tm package's
#' \code{DocumentTermMatrix} function.
#'
#' @seealso \link{LDAvis}.
#' @export
topicmodels_json_ldavis <- function(fitted, corpus, doc_term){
# Required packages
library(topicmodels)
library(dplyr)
library(stringi)
library(tm)
library(LDAvis)
# Find required quantities
phi <- posterior(fitted)$terms %>% as.matrix
theta <- posterior(fitted)$topics %>% as.matrix
vocab <- colnames(phi)
doc_length <- vector()
for (i in 1:length(corpus)) {
temp <- paste(corpus[[i]]$content, collapse = ' ')
doc_length <- c(doc_length, stri_count(temp, regex = '\\S+'))
}
temp_frequency <- inspect(doc_term)
freq_matrix <- data.frame(ST = colnames(temp_frequency),
Freq = colSums(temp_frequency))
rm(temp_frequency)
# Convert to json
json_lda <- LDAvis::createJSON(phi = phi, theta = theta,
vocab = vocab,
doc.length = doc_length,
term.frequency = freq_matrix$Freq)
return(json_lda)
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment