# derived from http://scikit-learn.org/stable/auto_examples/applications/topics_extraction_with_nmf_lda.html
# explanations are located there: https://www.linkedin.com/pulse/dissociating-training-predicting-latent-dirichlet-lucien-tardres
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import pickle

# create a blank model
lda = LatentDirichletAllocation()

# load parameters from file
with open('outfile', 'rb') as fd:
    (features, lda.components_, lda.exp_dirichlet_component_, lda.doc_topic_prior_) = pickle.load(fd)

# the dataset to predict on (the first two samples were also in the training set, so one can compare)
data_samples = ["I like to eat broccoli and bananas.",
                "I ate a banana and spinach smoothie for breakfast.",
                "kittens and dogs are boring"
                ]

# vectorize the new samples using the model's features as the vocabulary
tf_vectorizer = CountVectorizer(vocabulary=features)
tf = tf_vectorizer.fit_transform(data_samples)

# transform returns a matrix with one row per document, columns being topic weights
predict = lda.transform(tf)
print(predict)
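To make the shape of that output concrete, here is a minimal self-contained sketch (with an assumed toy corpus, not the gist's pickled model): transform returns one row per document, and each row is that document's topic distribution, so its columns sum to approximately 1.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# toy corpus, fitted in place just to illustrate the shape of transform's output
docs = ["broccoli bananas fruit", "banana spinach smoothie", "kittens dogs pets"]
tf = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(tf)

weights = lda.transform(tf)
print(weights.shape)        # (3, 2): one row per document, one column per topic
print(weights.sum(axis=1))  # each row sums to ~1.0
```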
I see now that you have another module where you do the fitting, and this module is meant for predicting alone. Rather than instantiating a blank LatentDirichletAllocation and assigning the unpickled attributes to it one by one, you can pickle the fitted model object itself and load it whole from the file. You don't need to recreate the estimator in this script at all, only load the fitted model.
here's an example:

import pickle

# load the pickled fitted model
with open('model', 'rb') as model:
    ldia_model = pickle.load(model)

# assuming you pickled the vectorizer as well
with open('vectorizer', 'rb') as vectorizer:
    count_vec = pickle.load(vectorizer)

data_samples = [...]

# use transform (not fit_transform) so the fitted vocabulary is reused
vec_data = count_vec.transform(data_samples)
predict = ldia_model.transform(vec_data)
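For completeness, a training-side script that would produce those two files might look like the sketch below. The file names 'model' and 'vectorizer' match the example above; the two-document corpus and n_components value are stand-ins, not anything from the gist.

```python
import pickle
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# stand-in training corpus
train_docs = ["I like to eat broccoli and bananas.",
              "I ate a banana and spinach smoothie for breakfast."]

count_vec = CountVectorizer()
tf = count_vec.fit_transform(train_docs)
ldia_model = LatentDirichletAllocation(n_components=2, random_state=0).fit(tf)

# pickle the fitted objects whole, instead of individual attributes
with open('model', 'wb') as model:
    pickle.dump(ldia_model, model)
with open('vectorizer', 'wb') as vectorizer:
    pickle.dump(count_vec, vectorizer)
```

Loading both objects back then only requires pickle, not a re-import and re-assembly of the estimator's internals.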
I came across this gist while looking for inputs on my own LDiA work, so if it seems odd to get comments on this from a random person four years after you wrote it, my apologies!
For what it's worth, I found your reply quite helpful in my project!
Something that stands out is that where you create the blank model, you aren't setting a random state. That is likely the top reason each successive run gives different outcomes when you fit the model. You have lda = LatentDirichletAllocation(); try lda = LatentDirichletAllocation(random_state=0) instead (any fixed integer works). With the random state set, you will get deterministic outcomes from scikit-learn's probabilistic models.
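A quick sketch of the effect (assumed toy corpus): two fits with the same random_state on the same data produce identical topic-word matrices, whereas unseeded fits may differ from run to run.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["broccoli bananas fruit", "banana spinach smoothie", "kittens dogs pets"]
tf = CountVectorizer().fit_transform(docs)

# same seed, same data -> identical fitted parameters
lda_a = LatentDirichletAllocation(n_components=2, random_state=0).fit(tf)
lda_b = LatentDirichletAllocation(n_components=2, random_state=0).fit(tf)
print(np.allclose(lda_a.components_, lda_b.components_))  # True
```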