@christophergandrud
Last active March 13, 2021 19:00
Convert the output of a topicmodels Latent Dirichlet Allocation model to JSON for use with LDAvis
#' Convert the output of a topicmodels Latent Dirichlet Allocation to JSON
#' for use with LDAvis
#'
#' @param fitted Output from a topicmodels \code{LDA} model.
#' @param corpus Corpus object used to create the document term
#' matrix for the \code{LDA} model. This should have been created with
#' the tm package's \code{Corpus} function.
#' @param doc_term The document term matrix used in the \code{LDA}
#' model. This should have been created with the tm package's
#' \code{DocumentTermMatrix} function.
#'
#' @seealso \link{LDAvis}.
#' @export
topicmodels_json_ldavis <- function(fitted, corpus, doc_term){
    # Required packages
    library(topicmodels)
    library(dplyr)
    library(stringi)
    library(tm)
    library(LDAvis)

    # Find required quantities
    phi <- posterior(fitted)$terms %>% as.matrix    # topic-term distributions
    theta <- posterior(fitted)$topics %>% as.matrix # document-topic distributions
    vocab <- colnames(phi)

    # Document lengths: count whitespace-delimited tokens in each document
    doc_length <- vector()
    for (i in seq_along(corpus)) {
        temp <- paste(corpus[[i]]$content, collapse = ' ')
        doc_length <- c(doc_length, stri_count(temp, regex = '\\S+'))
    }

    # Term frequencies over the whole corpus. as.matrix() is used because
    # inspect() returns only a sample of the matrix in newer tm versions.
    temp_frequency <- as.matrix(doc_term)
    freq_matrix <- data.frame(ST = colnames(temp_frequency),
                              Freq = colSums(temp_frequency))
    rm(temp_frequency)

    # Convert to json
    json_lda <- LDAvis::createJSON(phi = phi, theta = theta,
                                   vocab = vocab,
                                   doc.length = doc_length,
                                   term.frequency = freq_matrix$Freq)
    return(json_lda)
}
@scoavoux

Great script! It seems that inspect() now returns only the first few rows and columns of the dtm; I had to replace it with as.matrix() because createJSON() threw an error otherwise.
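
The error is most likely a size mismatch: if the term frequencies come from a truncated matrix, there are fewer of them than there are terms in the vocabulary. Below is a minimal base R sketch of checking the sizes before calling createJSON(). The helper name `check_ldavis_inputs` and the toy data are made up for illustration; neither is part of the gist or of LDAvis.

```r
# Hypothetical helper (not part of the gist or of LDAvis): reports which of
# the quantities destined for createJSON() disagree in size.
check_ldavis_inputs <- function(phi, theta, vocab, doc_length, term_frequency) {
    problems <- character()
    if (ncol(phi) != length(vocab))
        problems <- c(problems, "phi columns vs. vocab length")
    if (length(term_frequency) != length(vocab))
        problems <- c(problems, "term_frequency length vs. vocab length")
    if (nrow(theta) != length(doc_length))
        problems <- c(problems, "theta rows vs. doc_length length")
    if (ncol(theta) != nrow(phi))
        problems <- c(problems, "theta columns vs. phi rows")
    problems
}

# Toy data, made up for illustration: 3 topics, 12 terms, 4 documents
vocab <- paste0("term", 1:12)
phi <- matrix(1 / 12, nrow = 3, ncol = 12, dimnames = list(NULL, vocab))
theta <- matrix(1 / 3, nrow = 4, ncol = 3)
doc_length <- c(10, 12, 8, 15)

# One count per term, as colSums(as.matrix(doc_term)) would give
full_freq <- rep(5, 12)
# Counts for only the first 10 terms, as a truncated matrix would give
truncated_freq <- full_freq[1:10]

ok <- check_ldavis_inputs(phi, theta, vocab, doc_length, full_freq)
bad <- check_ldavis_inputs(phi, theta, vocab, doc_length, truncated_freq)
print(ok)   # no problems reported
print(bad)  # the term_frequency mismatch is reported
```

Running a check like this on the real `phi`, `theta`, `vocab`, `doc_length` and `freq_matrix$Freq` should show whether the inspect() truncation is what trips createJSON().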

@titaniumtroop

I also had to use as.matrix() instead. I'm using LDA to suggest topics for new documents, so it was important for me to set the reorder.topics = F flag in the createJSON() call.
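
For reference, a rough sketch of what that call can look like. The wrapper name `create_json_fixed_order` is made up for illustration and is not part of the gist; it assumes an LDAvis version whose createJSON() accepts `reorder.topics`. With the flag set to FALSE, topic k in the visualisation should stay topic k of the fitted model instead of being renumbered by decreasing topic proportion.

```r
# Hypothetical wrapper (name made up for illustration): forwards its
# arguments to LDAvis::createJSON() with topic reordering switched off.
# Assumes the installed LDAvis version supports reorder.topics.
create_json_fixed_order <- function(phi, theta, vocab, doc_length,
                                    term_frequency) {
    LDAvis::createJSON(phi = phi, theta = theta,
                       vocab = vocab,
                       doc.length = doc_length,
                       term.frequency = term_frequency,
                       reorder.topics = FALSE)
}
```

Inside topicmodels_json_ldavis() the same effect comes from adding `reorder.topics = FALSE` to the existing createJSON() call.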