LDA (Latent Dirichlet Allocation) prediction with Python scikit-learn
# derived from http://scikit-learn.org/stable/auto_examples/applications/topics_extraction_with_nmf_lda.html
# explanations are located there: https://www.linkedin.com/pulse/dissociating-training-predicting-latent-dirichlet-lucien-tardres
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import pickle

# create a blank model
lda = LatentDirichletAllocation()

# load the vocabulary and the fitted parameters from file
with open('outfile', 'rb') as fd:
    (features, lda.components_, lda.exp_dirichlet_component_, lda.doc_topic_prior_) = pickle.load(fd)

# the dataset to predict on (the first two samples were also in the training set, so one can compare)
data_samples = ["I like to eat broccoli and bananas.",
                "I ate a banana and spinach smoothie for breakfast.",
                "kittens and dogs are boring"
                ]

# vectorize the new documents using the model's features as the vocabulary
tf_vectorizer = CountVectorizer(vocabulary=features)
tf = tf_vectorizer.fit_transform(data_samples)

# transform returns a matrix with one row per document and one column per topic,
# holding each document's topic weights
predict = lda.transform(tf)
print(predict)
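For context, here is a minimal sketch (not part of the original gist) of the training side that could produce 'outfile': it fits an LDA model on a corpus and pickles the same tuple the prediction script unpacks. The file name 'outfile', the toy corpus, and n_components=2 are illustrative assumptions, not values taken from the gist.

# training-side sketch: fit an LDA model and save the attributes the
# prediction script reloads (vocabulary, topic-word matrices, prior)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import pickle

# illustrative training corpus (assumption, not the gist's actual data)
train_samples = ["I like to eat broccoli and bananas.",
                 "I ate a banana and spinach smoothie for breakfast.",
                 "Chinchillas and kittens are cute."
                 ]

# build the vocabulary from the training corpus
tf_vectorizer = CountVectorizer()
tf = tf_vectorizer.fit_transform(train_samples)

# fit the model; n_components=2 is an arbitrary choice for this sketch
lda = LatentDirichletAllocation(n_components=2)
lda.fit(tf)

# in scikit-learn < 1.0 this method was called get_feature_names()
features = tf_vectorizer.get_feature_names_out()

# pickle exactly the tuple the prediction script expects, in the same order
with open('outfile', 'wb') as fd:
    pickle.dump((features, lda.components_, lda.exp_dirichlet_component_, lda.doc_topic_prior_), fd)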
I came across this gist while looking for inputs on my own LDiA work, so if it seems odd to get comments from a random person 4 years after you wrote this, my apologies!
For what it's worth, I found your reply quite helpful in my project!