@ululh
Last active February 1, 2023 09:32
LDA (Latent Dirichlet Allocation) prediction with Python scikit-learn
# derived from http://scikit-learn.org/stable/auto_examples/applications/topics_extraction_with_nmf_lda.html
# explanations are here: https://www.linkedin.com/pulse/dissociating-training-predicting-latent-dirichlet-lucien-tardres
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import pickle

# create a blank model
lda = LatentDirichletAllocation()

# load the fitted parameters from file
with open('outfile', 'rb') as fd:
    (features, lda.components_, lda.exp_dirichlet_component_,
     lda.doc_topic_prior_) = pickle.load(fd)

# the dataset to predict on (the first two samples were also in the
# training set, so the results can be compared)
data_samples = [
    "I like to eat broccoli and bananas.",
    "I ate a banana and spinach smoothie for breakfast.",
    "kittens and dogs are boring",
]

# vectorize the new documents using the saved vocabulary of the model
tf_vectorizer = CountVectorizer(vocabulary=features)
tf = tf_vectorizer.fit_transform(data_samples)

# transform() returns a matrix with one row per document and one
# column per topic, holding the topic weights
predict = lda.transform(tf)
print(predict)
@nikbpetrov

I came across this gist while looking for input on my own LDiA work, so apologies if it seems odd to get comments on this from a random person four years after you wrote it!

For what it's worth, I found your reply quite helpful in my project!
