Last active February 27, 2023 22:19
Guided LDA using gensim
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Guided LDA using gensim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<em>Aleem Juma</em> \n",
"<em>March 6, 2019</em>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The gensim package for Python is a well-known library of text processing routines. One of the language model frameworks included in the package is a Latent Dirichlet Allocation (LDA) topic modeling framework. LDA can be used as an unsupervised learning method in which topics are identified based on word co-occurrence probabilities; however, the gensim implementation of LDA also lets us seed terms with topic probabilities. This turns a fully unsupervised training method into a semi-supervised one. Semi-supervised because we are not tagging all terms or documents with topic probabilities, just a few, but it turns out that's enough to push the model in a certain direction."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this write-up I will show how to build an LDA model in gensim with seed words, and plot the resulting topic probability distribution that has been assigned to words. I will then train further models with seed probabilities and explore how this leads the model to different topic probability distributions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Importing libraries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will of course need `gensim`, and we will be working with matrix manipulation, so we will need `numpy`. `nltk` provides support functions for language processing, and we will be visualizing results with `matplotlib`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings(action='ignore', category=UserWarning)\n",
"import gensim\n",
"import numpy as np\n",
"import nltk\n",
"from nltk.corpus import stopwords\n",
"from nltk.stem import WordNetLemmatizer\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are a few additional data packages that `nltk` requires to use the POS tagging feature, the WordNet model and the stop word list. We have to make sure those are downloaded (uncomment the lines below if they are not already installed)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# nltk.download('averaged_perceptron_tagger')\n",
"# nltk.download('wordnet')\n",
"# nltk.download('stopwords')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's set up a simple corpus of text strings to work with. The first two seem to relate to food, the next two to animals, and the last one a bit of both."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"txt = [\n",
"    'I like to eat broccoli and bananas.',\n",
"    'I munched a banana and spinach smoothie for breakfast.',\n",
"    'Chinchillas and kittens are cute.',\n",
"    'My sister adopted a kitten yesterday.',\n",
"    'Look at this cute hamster munching on a piece of broccoli.'\n",
"]\n",
"len(txt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first step in most language processing analytics is preprocessing the text strings. We will be running the following on the text: \n",
"* `gensim.utils.simple_preprocess` - does a bunch of pre-processing steps such as tokenizing, removing punctuation and converting to lower case \n",
"* lemmatization - turning a word into its base form, e.g. 'shaving' -> 'shave' (note this is not the same thing as stemming, which chops off suffixes by rule and can produce non-words; a crude stemmer could turn 'shaving' into 'shav')"
]
},
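{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the tokenizing step, the cell below is a simplified stand-in for `gensim.utils.simple_preprocess`. It is a sketch, not gensim's implementation: the function name `rough_preprocess` and its regular expression are assumptions, chosen to mimic the documented defaults (lower-casing, splitting on anything that is not a letter, optional accent stripping, and keeping tokens of 2 to 15 characters). It does not remove stop words or lemmatize; those steps are handled separately in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"import unicodedata\n",
"\n",
"def rough_preprocess(text, deacc=False, min_len=2, max_len=15):\n",
"    # hypothetical approximation of gensim.utils.simple_preprocess, for illustration only\n",
"    if deacc:\n",
"        # strip accents: decompose characters, then drop the combining marks\n",
"        text = ''.join(ch for ch in unicodedata.normalize('NFD', text)\n",
"                       if unicodedata.category(ch) != 'Mn')\n",
"    # lower-case and keep runs of letters; punctuation, digits and spaces act as separators\n",
"    tokens = re.findall(r'[^\\W\\d_]+', text.lower())\n",
"    # drop tokens that are too short or too long\n",
"    return [tok for tok in tokens if min_len <= len(tok) <= max_len]\n",
"\n",
"rough_preprocess('I munched a banana and spinach smoothie for breakfast.')"
]
},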
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to perform the lemmatization, we use the `WordNet` lexical database provided through the `nltk` package. The `WordNetLemmatizer.lemmatize()` function takes a part-of-speech tag telling it whether we're passing a noun, verb, adjective or adverb. This makes a difference: 'shaving' can be used as a verb, but it can also be used as a noun, as in \"wood shaving\". If we pass 'shaving' and indicate that it's a noun, the lemmatizer leaves it alone, since it's already in its base form; as a verb, however, it becomes 'shave'. \n",
"Identifying the part of speech of any particular word is not easy, but `nltk` again comes to the rescue, providing access to a part-of-speech tagger that returns a suitable tag for each word in a given text."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The twist is that the `nltk.pos_tag` function returns the Penn Treebank tag for each word, but we only need to know whether the word is a noun, verb, adjective or adverb. We need a short simplification routine to translate from the Penn tag to a simpler tag."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# simplify Penn tags to n (NOUN), v (VERB), a (ADJECTIVE) or r (ADVERB)\n",
"def simplify(penn_tag):\n",
"    pre = penn_tag[0]\n",
"    if pre == 'J':\n",
"        return 'a'\n",
"    elif pre == 'R':\n",
"        return 'r'\n",
"    elif pre == 'V':\n",
"        return 'v'\n",
"    else:\n",
"        return 'n'  # nouns and all other tags default to noun"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we're ready to perform some simple preprocessing on the text."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def preprocess(text):\n",
"    stop_words = set(stopwords.words('english'))  # a set makes the membership test fast\n",
"    toks = gensim.utils.simple_preprocess(str(text), deacc=True)  # tokenize, lower-case, strip accents\n",
"    wn = WordNetLemmatizer()\n",
"    # tag the full token list (the tagger uses context), then drop stop words and lemmatize the rest\n",
"    return [wn.lemmatize(tok, simplify(pos)) for tok, pos in nltk.pos_tag(toks) if tok not in stop_words]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[['like', 'eat', 'broccoli', 'banana'],\n",
" ['munch', 'banana', 'spinach', 'smoothie', 'breakfast'],\n",
" ['chinchilla', 'kitten', 'cute'],\n",
" ['sister', 'adopt', 'kitten', 'yesterday'],\n",
" ['look', 'cute', 'hamster', 'munch', 'piece', 'broccoli']]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corp = [preprocess(line) for line in txt]\n",
"corp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, words have been translated to their base form and converted to lower case. Another step we performed is omitting 'stop words' like 'this', 'on' and 'are', which don't contribute much to the topic probability."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"17"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dictionary = gensim.corpora.Dictionary(corp)\n",
"len(dictionary)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have a vocabulary of 17 words after removing stop words."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The LDA implementation in `gensim` reads documents in 'bag of words' format. This structure lists each distinct word in a document once, by its dictionary id, along with the number of times it occurs in that document. The gensim dictionary conveniently provides the `doc2bow` function, which converts a tokenized document into this 'bow' format."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[(0, 1), (1, 1), (2, 1), (3, 1)],\n",
" [(0, 1), (4, 1), (5, 1), (6, 1), (7, 1)],\n",
" [(8, 1), (9, 1), (10, 1)],\n",
" [(10, 1), (11, 1), (12, 1), (13, 1)],\n",
" [(1, 1), (5, 1), (9, 1), (14, 1), (15, 1), (16, 1)]]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bow = [dictionary.doc2bow(line) for line in corp]\n",
"bow"
]
},
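{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the `(id, count)` pairs above less opaque, here is a minimal pure-Python sketch of the same conversion. It is not gensim's code: `toy_dictionary` and `toy_doc2bow` are hypothetical helpers, and the tokenized corpus printed earlier is copied in as `sample_corp` so the cell runs on its own. Giving unseen words the next free ids in alphabetical order within each document is an assumption about the ordering, but it reproduces the ids printed above for this corpus."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"def toy_dictionary(docs):\n",
"    # hypothetical helper: map each distinct token to an integer id\n",
"    token2id = {}\n",
"    for doc in docs:\n",
"        # unseen tokens get the next free ids, alphabetically within each document\n",
"        for tok in sorted(set(doc).difference(token2id)):\n",
"            token2id[tok] = len(token2id)\n",
"    return token2id\n",
"\n",
"def toy_doc2bow(doc, token2id):\n",
"    # count occurrences of known tokens and return (id, count) pairs sorted by id\n",
"    counts = Counter(token2id[tok] for tok in doc if tok in token2id)\n",
"    return sorted(counts.items())\n",
"\n",
"sample_corp = [\n",
"    ['like', 'eat', 'broccoli', 'banana'],\n",
"    ['munch', 'banana', 'spinach', 'smoothie', 'breakfast'],\n",
"    ['chinchilla', 'kitten', 'cute'],\n",
"    ['sister', 'adopt', 'kitten', 'yesterday'],\n",
"    ['look', 'cute', 'hamster', 'munch', 'piece', 'broccoli'],\n",
"]\n",
"toy_ids = toy_dictionary(sample_corp)\n",
"[toy_doc2bow(doc, toy_ids) for doc in sample_corp]"
]
},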
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model parameter that holds the prior on how words are allocated to topics is called `eta` in the gensim implementation. When it is not provided, gensim uses a symmetric prior, i.e. every term is equally likely under every topic; with the keyword `'auto'`, gensim instead learns an asymmetric prior from the corpus. The question we need to ask is: if we provide a non-uniform matrix as the `eta` parameter, does that affect the topic distribution assigned to terms and documents?"
]
},
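{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the shape concrete, the sketch below writes out `eta` as a `(num_topics, num_terms)` matrix for this notebook's 2 topics and 17 terms. Filling it with `1/num_topics` is our reading of gensim's symmetric default, expressed in matrix form; the skewed copy boosts term id 10 ('kitten' in the dictionary above) in topic 1, using an arbitrary illustrative value of 100. A matrix like `skewed_eta` is the kind of non-uniform input the question above is about."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"num_topics, num_terms = 2, 17  # sizes of the toy model in this notebook\n",
"# symmetric prior: every term gets the same value under every topic\n",
"symmetric_eta = np.full((num_topics, num_terms), 1.0 / num_topics)\n",
"\n",
"# non-uniform prior: strongly favour topic 1 for term id 10 ('kitten')\n",
"skewed_eta = symmetric_eta.copy()\n",
"skewed_eta[1, 10] = 100.0\n",
"print(symmetric_eta.shape, skewed_eta[:, 10])"
]
},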
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To find out, we'll first train a topic model on the corpus of sentences we set up, using the `'auto'` keyword. We will then train a model using a prior distribution skewed in the same direction as the auto-generated model, just for fun, and then train another model using a prior distribution that tries to push the model in the opposite direction."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll set up a function that displays the probability distribution calculated by the algorithm so that we can see how the topics have been allocated across terms."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def viz_model(model, modeldict):\n",
"    ntopics = model.num_topics\n",
"    # label each topic with its single most probable word\n",
"    topics = ['Topic {}: {}'.format(t, modeldict[w]) for t in range(ntopics) for w, p in model.get_topic_terms(t, topn=1)]\n",
"    term_ids = list(modeldict.keys())  # column j of get_topics() holds term id j\n",
"    terms = [modeldict[w] for w in term_ids]\n",
"    fig, ax = plt.subplots()\n",
"    ax.imshow(model.get_topics())  # plot the (ntopics, nterms) topic-term matrix\n",
"    ax.set_xticks(term_ids)  # set up the x-axis (pass a list, not a dict view)\n",
"    ax.set_xticklabels(terms, rotation=90)\n",
"    ax.set_yticks(np.arange(ntopics))  # set up the y-axis\n",
"    ax.set_yticklabels(topics)\n",
"    plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will run the following function for each test. It trains a model with our prior distribution (or `'auto'`), plots the model's topic-term matrix, prints the value returned by `log_perplexity` and the top terms for each topic, and shows the topic allocation for each sentence in our corpus."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def test_eta(eta, dictionary, ntopics, print_topics=True, print_dist=True):\n",
"    np.random.seed(42)  # set the random seed for repeatability\n",
"    bow = [dictionary.doc2bow(line) for line in corp]  # get the bow-format lines with the set dictionary\n",
"    with np.errstate(divide='ignore'):  # ignore divide-by-zero warnings\n",
"        model = gensim.models.ldamodel.LdaModel(\n",
"            corpus=bow, id2word=dictionary, num_topics=ntopics,\n",
"            random_state=42, chunksize=100, eta=eta,\n",
"            eval_every=-1, update_every=1,\n",
"            passes=150, alpha='auto', per_word_topics=True)\n",
"    # visualize the model term topics\n",
"    viz_model(model, dictionary)\n",
"    # note: log_perplexity returns the per-word likelihood bound, not the perplexity itself\n",
"    print('Perplexity: {:.2f}'.format(model.log_perplexity(bow)))\n",
"    if print_topics:\n",
"        # display the top terms for each topic\n",
"        for topic in range(ntopics):\n",
"            print('Topic {}: {}'.format(topic, [dictionary[w] for w, p in model.get_topic_terms(topic, topn=3)]))\n",
"    if print_dist:\n",
"        # display the topic probabilities for each document\n",
"        for line, bag in zip(txt, bow):\n",
"            doc_topics = ['({}, {:.1%})'.format(topic, prob) for topic, prob in model.get_document_topics(bag)]\n",
"            print('{} {}'.format(line, doc_topics))\n",
"    return model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we try a custom prior distribution, let's see how the model does with gensim's built-in `'auto'` setting."
]
},
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbcAAAByCAYAAADQxZ9YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztnXe4JFW1vt9vCDKEIUiSjIRBMhIkGS4iCoIEYQBBooCKCMYfKioiKmJEUJQgSlAvoICIEuQiCghDZog/0kWCICDCkGH47h9715yenj5nTu+uw6npWe/z9NOndld9tfp0da/ae6+9lmwTBEEQBP3EmNE2IAiCIAjqJpxbEARB0HeEcwuCIAj6jnBuQRAEQd8Rzi0IgiDoO8K5BUEQBH1HOLcgCIKg7wjnFgRBEPQd4dyCIAiCviOcWxAEQdB3zD7aBsyqLLzQbF5u6Tlq03uF12rTuv/+RWvTGhFqTBm3/Jsfr00L4P77FqlVr270/Eu1aU1Zodk/H7Pd++pomxCMAM+89uQTtmf4RWv21dnHLLf0HEy8aOna9P415bnatD704QNr0xoJNKU+53bGr46rTQtgt92b/b+b/dq7atN65rjFa9MaCcbt8OhomxCMABc/d+oDw9kvhiWDIAiCviOcWxAEQdB3hHMLgiAI+o5wbkEQBEHfMaRzk/RGSTflx6OSHm7ZnrObE0k6RdL4LvY/TNI9ku6UtPkw9r9C0trd2BQEQRD0J0NGS9p+ElgbQNLhwLO2v1tyItt7D3dfSWsCOwCrAksDF0oab7u+ePcgCIKgbykelpT0eUm35sdBuW1FSbdJOk3SJElnShqbX5vas5L0fkk3SLpZ0sUd5LcFfm37Zdv3Av8A1h2GWXtJ+ns+93r5XBvmthslXSlppdz+EUlnS7pI0t2SvtXy3k6QdF1+L19paX9I0uFZ6xZJKw91jiAIgmB0KHJukjYAdgM2ADYCPp57W5B6Wz+2vQbwInBA27GLA8cD29teC9ilwymWBB5s2X4ot5Gd0WCrjN9geyPgYOCk3HYHsKntdYCvA0e27L8WsCOwJrC7pCVy+6G218uvv0fSqi3HPJa1TgI+PYxzBEEQBK8zpYu43w781vbzAJLOBTYFLgbut3113u90YH/ghy3HbgRcZvsBANv/7qCvDm3O+793CLt+nff5H0mLSpoXWAA4VdIKHfb/s+3J+T3cCSwDPALsKmlf0v9nCZLDvj0f87v8fD2wVf57qHMMvClpf9L/g2WWjPXzQRAEI0XpsGQn51PRnj6ifVsd2tp5iDTXVrEUyenMiE7n/gZwke3Vge2AuVpeb81FNAWYPQ8pHgxsZntN4MJBjpnCwM3BUOcYMMY+wfZ6ttdb5I2zDePtBEEQBCWUOre/AttLGpt7R9sCf8uvLS9p/fz3rsAVbcdeCWwmaVkASQt10P89qfc0Z+4NLUvqKc2InbPmu0jDh88B8wMP59f3GobGOGAy8IykNwFD9RQruj1HEARBMIIUOTfbE0lDgNcCVwPH256UX74N2E/SLcA8wAltxz4GfAw4T9LNwBkd9G8GziXNZf0R+HgVKTmDObdnJF0FHAvsl9u+DXxH0pXDfHs3kIYgbwVOJDnjGdHtOYIgCIIRZNgTP7YPb9s+Gji6w65TbO/f4fhNW/6+ALhgBuc7AjiiQ3vHnlSrflv7FcDKLU2H5faT2vZ7X8vmhwfRWqrl76uBzYc6RxAEQTA6RIaSIAiCoO+oNWTP9j3kRd9BEARBMFpEzy0IgiDoO8K5BUEQBH1HrCQeJW57biFW+/tutekdv850QafFzHb5jbVpjQiurxL3na/MU5sWwJiG/++8/hq1aZ20ys9q0xoJPrXaATPeKZj5mDi83aLnFgRBEPQd4dyCIAiCviOcWxAEQdB3hHMLgiAI+o5wbkEQBEHf0ZNzk/RGSTflx6OSHm7ZnrNLrVMkjR/mvotK+o
uk5yT9cMZHgKTTJW032HkljZF0aEv7QpI+Ovx3EARBEDSFnpYC2H6SnJFE0uHAs7a/W6i1dxe7Pw98CVgHWLHkfO3nlTQ7cChwVH5pIeCjwE970Q+CIAhef0ZsWFLS5yXdmh8H5bYVJd0m6TRJkySdKWlsfu0KSZWjfL+kGyTdLOnidm3bz9q+klTpu8S2b0k6OffWqvMeBcyXe52n5u3xefuofNyhkiZKukXSV1re061Z7zZJf5LUsZ5bEARB8PowIs5N0gbAbsAGpMrbH5e0Zn55VeDHttcgOacD2o5dHDge2N72WsAuXZ77lMpJDvL690k12z5SldHJHApMtr227T3y9l15+1BJW5Eqdb+N1FvdWNLG+djxwA9trwa8QCpYGgRBEIwSI9VzezvwW9vP255Mqs1WlaS5P5eLATi9pb1iI+Ay2w8A2P53Nye2vbftmwZ5+WvAWNsH2l2nudgC2BK4kVTzbUUGytzc01LP7npguU4CkvaXdJ2k61595vkuTx8EQRAMl5FKv6UhXmt3Ku3b6tBWFxOB9SQtaPupLo8VcKTtk6dplFYEXmppmsIg/1fbJ5CLt45dcYmReo9BEASzPCPVc/srsL2ksZLmBbYF/pZfW17S+vnvXYEr2o69EthM0rKQohZrtOsC4HvAH7JdU7H9aj5f5ZgmA/O17HIRsK+kefJ+S0lauEbbgiAIgpoYEedmeyLwa+Ba4Grg+JZhu9uA/STdAsxD7sm0HPsY8DHgPEk3Ax0zAkt6iFQJfF9JD1XLCGY052b7N8Avsn574MfJwC2STs12XJcDX46y/UfgbOBqSZOAM4F5CYIgCBpHbcOStg9v2z6a5HzamWJ7/w7Hb9ry9wWkXtZQ51tqkPaOSwps797y94nAiXmz9byfAT7Tsr1zm8b3ge93kF+7ZZ+jOrweBEEQvI5EhpIgCIKg73hd67nZvoeWXk4QBEEQjATRcwuCIAj6jnBuQRAEQd8Rzi0IgiDoO9R9oo6gDiQ9DjwwjF0XBp6o6bR1as1qek22rW69JttWt16Tbatbr8m2daO3rO1FZrRTOLeGI+k62+s1TWtW02uybXXrNdm2uvWabFvdek22bST0YlgyCIIg6DvCuQVBEAR9Rzi35nPCjHcZFa1ZTa/JttWt12Tb6tZrsm116zXZttr1Ys4tCIIg6Dui5xYEQRD0HeHcgiAIgr4jnFvQN0h6w3DagiDof8K5NRRJ75f0eUlfqR5dHr9Kfn5rp0cPdl06nLYC3Xl61QD+Psy2YSNpY0kfkrRH9ehBS5J2rz5LSctI2qAX+5qMpGUlbZ7/Hitpvhkd83ogaafhtA1TazZJn+rdqql6W0uq5Xe5btuy5nId2taffs+uNEfkOgnn1kAk/RTYGTgIELATsGyXMlVduu91eHy3wKa5clX0hSUtKGmh/FgOWKJbvRbdjSXdDtyRt9eS9JMuNRaXtC4wVtI6LU78XcDcPdh2Gul/tSmwfn70ssj0J8BGpAr0kKq9/7jQtk0kXSLp/0u6T9L9ku4rNUzSSpLOlnR71ruvR739SMV9f5ablgLO7UFvB0l3S3pa0jOSJkt6plDuC8NsmyG2pwDbFtrRiV2AuyUdLektvQiNgG0Av5O0ZLUh6Z3Az0vF6r5OptGOaMnmIekW22u2PM8L/M72FqNo08HAISRH9jDJ6QI8A5xo+7hC3WuAHYHf214nt91qe/UuNPYE9iI5nmtbbJsM/ML27wptuwNY1TV9SSTdYPutkm5sea83216rQOtO4FPA9cCUqt32k4W2XQF8FfgBsA2wN+n34auFejcBGwDXtLzXSbbXKNS7B9jG9h0lx2eNLYGtgAnAf7e8NI70ORf1oiV9A5g/az5Xtdu+oVBvHOkGaG/AwCnAr21PboBt65Nu0rYB3gp8k/S5PFioV+t10srrWs8tGDYv5OfnJS0BPAks342ApB2Ger3bH3zbxwDHSDrI9rHdHDsM7QcltTZNGWzfQY7/Jf
BLSR+0/dsaTbsVWBz4Z016r0iajfSDhaRFgNcKtZ62/aea7AIYa/tSSbL9AHC4pL+RHF4JL9l+ufpcJc1Oft+FPNaLY8s8AlwHfIB0U1AxmXSjUMrG+fmIljYDm5WI2X5G0m+BsaQbyu2Bz0n6UcF3r27brpX0SeBi4EXgPbYfL9HK1H2dTCWcWzP5g6QFgO8AN5A+7JO61NhmiNcMFPVmbB8raXVgVWCulvZTS/SAByVtDFjSnMAnyUOUBSyV73onAyeS7iwPtX1xNyKSzif9j+YDbpc0EXipet32Bwrt+xFwDrBovqPeETisS9uq+dLLJH2H9Dm22lZ0Rw68mOd67pb0CVLvfNFCLYDLJX2RNFT8HuDjwPk96F0n6b9JQ1at73fY17Htm4GbJf2K1LtfhfQ532X75VLDbP9X6bHtSNoG2AdYATgN2MD2vyTNTfpedOXc6rKt5TtRMTfwNHCypF6+E3VfJ1OJYcmGoxTtN5ftp0fbFgBJXwXeRXJufwS2BK6wvWOh3sLAMcDmpB+ci4GDS4bXqiE+Se8FDgS+DJxiu6sAmjyPMCi2L+/WthbtVYB3k97rpd32RiRdNrRpLrojz8NNdwALAF8nDWUdbfvqQr0xwL7AFqT3epHtE0u0st4pHZpte58Cra1Iczz3ZtuWBw4o7QlLWow0PLeE7S0lrQpsZPvkAq1TgZNs/7XDa++23VXwVl22jdR3otN1Qnr/PTumcG4NJfdmlqOld13SO5I0P2lo6R256XLgiFJnKWkSsBZwY3Yki5EuxqF6ikPpLWT7321ty9u+v0CrmqM8BviL7XNa57cK9JYH/mn7xbw9FljM9v92qTMuDzUt1On19vffD0g6OA9lD9k2GuT5yq1t35O3VwAusL1Kod6fSPNiX8rfidlJ34+e5416ZSRsy9/5KkJyou1/9aA1D/BiDn4hD9u/wfbzpZoVES3ZQFRvlN7PScN0E/LjGdLFXsoLtl8DXs1DgP8C3tyD3vlZB4AcIVY6LHG9pItJQQMXKYUUl85pAZzVdvyU3NYtv6rsI835tD93jaRv5qHrantBSUcW6Jwv6feDPUpsy+zZoW2vUjFJK0u6VNKteXtNSV0N6bbwr8qxZe4jXcelLGz7TPK1YvtVupw3rpC0oaRrJT0r6WVJU1QeFVqrbdm+CcBEUgT3BOAaSUWjNplLSXOLFWOBP/egN5WYc2sm61FflN4Ktj/Ysv21HKFUynX5R/UE0o/zs8A1Peh9k+Tg3g+MB04FdivU2hdYG7jP9vOS3kiKOCtl9ta5mDzxPWe3Ira3zs9dBQXNgC1tf7HlHE/l4bZuf/C7XhYyFJJ2BT4ELN/mHOcjBUaVciLwOXLIuO1b8txZ1w4duE3SH4EzSfNIOwHXKgdhFUTXPpevtSpQaEPSfFQJx5GWA5xF+h3YA1ixUKtu2wC+BKxf9dZyUNSfSeH8Jcxl+9lqw/azeX6xZ8K5NZM6o/RekLSp7SsgrY9iIBqzhE+QfrwWA94DLEOKmirC9gWS5iDNtc0HbGf77kKt1yTdD6wsaa4ZHjBjHpf0Adu/B5C0LT1WHlZaI7Qs0w43Tze/Mgxmk/QG2y9l3bFA19lYepk/HISrSNftwqQ1lRWTgVt60J3b9kRNG1X7aqHWXMBjQDWP9DiwECkIqyTY6tPA74EVJF0JLEJymEXYvkfSbHmo7hRJV5Vq1W0bMKZtGPJJehsBfE7SW6tAKKX1qr38Pk0lnFszWZj6ovQ+RgqTnz9vP0UPw0OkRcevAZvZPkLS0yTH1FWWAknHMm301TjS8NBBOfrqk90aJukjwMGkhaA3ARuSMpQUBVkAHwXOkHQcabL7QdKddBGSvk1anH87A0NDBkqc2+nApTnQwqQIu18W2DSJIUKvba/ZjV5eRvAAabF6nTyR58aqHsiOFN782e6lN9+J20iOcjzpOrmL8h/85/PowE2Sjia9x16y99RpG8CFki4Cfp23dyYFlpVyCHCWpE
fy9puyZs9EQEkDGSwyqccovXFZo5fx+9oWIistvB6UvHatW9smkZzs1bbXVopM/Jrtnr4sSovo5YJFtG06dwFrVr2tXlFalFxFXl5s+6ICjSEz32RnVWLbZAac5pzAHMBztscNftSQem8mDYVvTLpBux/YrcQ+SUuRQuo3yTZeQYrQfajQthvaI3I7tQ1Ta1nS/N8cpLV38wM/aZsjHBXbWo7fgRQPIOCvts8p1cp6czDgfO+0/UovehXRc2sgdQ4VSfomKaT7P3l7QeAztksn42tZiFzivIbBi7ZflEQesrtT0vhSMaVlGB8kR61WQ2K2jxjisKG4j/SjVYtzy6HrPS3kLnVew9CdJj+gpO1ImSh6kPTmObpujO3JStGsJZxCCvKphud2z23v6UZE0uLAkuS0bwxkxhlHYdq3ls/jBeBrJRojZVsLVwKvkH4DJvYilOfXPg0sa3s/pTRw423/oUcbo+fWRPKk77HAW0h3vbNReNerDqHwvdy5SdqNNGzwVtIw2I7AYba7iiKUdKbtCYMNi3U7HJY1zyEFkBxCGop8CpjD9lbdamW9C0mT7+0prr436EGddaoh2CVJyyguZdrh5pIh2FquEUlX2N60racF6cfQpT2tQc51te0NC4/t1AO53va6BVo32V57Rm3D0Kkt7Vvdw8N12tamO4GUXOIvWfPtwOdsFwWUKC3Mvx7Yw/bqee74791+Fp2Inlsz6RQxtVKhVi2BBxW2z5B0PQPDYdu5LC3Swfl561JbKpTXxdnePjcdrrTYeX7gwh6kl7L9vl7tYyDc/3rS5H4rpXeXtUTV2d40P9easV/Tpn8bQ7Kx6/eah5ZXA+Zv0xxHS4acLnlC0u4MzBvtSkEkp+tN+1Z9Dw7Mz6fl592Artd81WxbK3VHS65ge+ccZYvtF9QWNVRKOLeGUmPEVC2BB2223Qnc2aPGP/NzHcNiZwPrSrrU9ruzbh1Du1dJWsP2pF5EqiFYDbKwuQfdOqPqqgW0izFtJOc/CuVaF/W/CvwvKadjt4wn/fAv0KY5Gdiv0LZ9SDcHPyB9J67KbaX0nPat+h5I2sT2Ji0vHZqjHEuHwmtJSddC3dGSL+cb7mqaYwVqGrYP59ZMaouYsn20pFsYSG/19ZLAg7rpMAw29SW6Hw4bo5QWbGVJn25/0fb3C83cFNhLaXnBSy22dT1kmtmTlGqslb06tA2HWqPqJB1EymTzGANzqAZK3+sYUpBG61zv9+jSidg+DzhP0jval0woLWvpmuywS3MhdmIf28copX1blDQ0fgopirhb5tG0S3c2prdoyTptg/qjJb9KGl1ZWtIZpCCfvXrQm0o4t2byYdKPwydIEVNLkwIbuiZPuv/F9oV5e6yk5dxlCqm6qXkYbBdgO9L1XKfulnWIaPCFzeMoX9hc2zWSORgY78KSOR1Ys3JsMHWReVEatMwPSb2OVo7t0DYomn75yTSUzH1W0vl5K1Iu05t7GFrbh9QLn59k69P01qus0zZsf07SB0lOSMAJvURL2r5E0g2kZTsi3RD1tJa0IpxbA2kZqnuRHiKmMmcxUPYCBlJI9VQ9t0nYvgv4tlIdqGnuIjVIPsfhSvdm2VRqX9hc8zUCaQ1fncm5x0ha0PZTMPVz6Pr3RtJGpOt3kbZe+ThSEE03VHOfm5ASf1c13XZi2hI43VKlfVse+IIK074pJRFe0SkH5DhSwF+vn0kttrWS5/B6mseTtEqOZq5uTqo1i8tIWsbl1S0GzhHRks0jD7cczvSZLLrO4ThIZFhRgcymI+kCYFun/HlVOPQFJRF1+fgqgk2k4IXlSeVRVuvBxlqSztZ1jbQ4jNVI81sXMG0kZ9GQrqQ9SNWtzyb9DycA37B92pAHTq/zTlIVio8CP215aTJwvguy2eRgoy2q9VTKGXJcWB4mO6Uq7dt/lNJdLWm76xsXSX+1/Y4Z7/n62lbzNAKSTrC9f/4sOkXpliZemEr03JrJyXSoslxI7SmkGs
y5wNl52GRpUmTiZ0vF3JY5Pd9lHlCqJ2knUi7Hv5C+xMdKKg2jrusaqYZx/5Efc+ZHT9g+VdJ1pCUZAnawfXuBzuWkml+/qCn4CFI1+fmAqhrDvLmtK6reB8l5ALy5hkC/SyR9lukrZxdVjnBKSfcq8A6ligAVXTm3uqNpbe+f/9yKVMNtU5KT+xtwfB3niJ5bA5F0je231aS1AnAGaY0VpOGnD9u+tw79piHpQOB9pIXXB9juKYKwg34vawRvJlUuniaMuqQXXec10mQk/dD2IZq+WCZQlpJO0t6kXm9VG++dwOHuMrFAh96HWm0s6X3k4KV2XDJqk/V+TgoKuo2WQCEX1MEbCSSdSapUckZu2hVYwPaEnrXDuTUPSUeR5hPqqrJcWwqpJtI2FyNSsMUk4EboaWitVXcMsC6wkO33FupNau0N5iGjm9t7iMPUqvUakXQJsFNbdONvSt9rXUha1/b1uTdzbdvL42wXlUeStATpOrmDlLHjkfZozC60JgAXOtXs+zIpyOXrdcwb9Yqk222vOtp2DEanKZK6pk1iWLKZVHfkrTXcTEECYLUVK5XUU7HShtI+ZHLOIO0lutXd36ukOnO9TKTXGUZd2zWSWaRDdOOihVq1YbsK9PgQqZr3JJgagXoIBbX/VH+C7cNsnylpU1IKr++Rhta67llrIB3VMrlXuBIpirU0HdXfJa1aMiT8OnGjpA2dK75LehspvVfPRM+tz5H0W1IJnWrI5cPAWrZ3GPyoAEDS+sAXmbYiei/r3GpPOlsXSllnts9rwKoEvueUDsHWjVLi5LNJGTs2JWVk2brkJk01J9hWTnEn6VvAJNu/UmEFeNWcjkrSO0g3AI9Sz1rNWpF0BymQqUoWsAypN/0aPdoZPbeGolS8czVaUgy5LGFv3cVKG0uew/o80//fSu/ITycFpNxKj+HTLfSUdFbS7rZPV4fF6tDTgvUvAVfknj2knv7+Q+z/umL7Pkm7kIKGHiRFO5bW/ao1wTbwsKSfkRIlfFsp4XZp1o6601H9nIFh+rqu4TqpI71dR8K5NRBJPyXNA/wXcBIpOXFp9u26i5U2mTNIUWZbk0LH9yQVoizl8dI5nU5o+qSzJdGSVbaKuqPXLszRoNVi2k/VtZi2FzR9QuGFSHON1yjV/Su5s39IqZr8uaToxKeAR2ZwzFBMIP1IfzeH27+JVDW8hLrTUf2jipRuIjVGwE5HDEs2EEm32F6z5Xle4He2tyjQWgs4lZREGFKm/D1L1uA0HeUs8dX/Lbddbrtjfbxh6L2bFL3VnsW/NKN6bdGSI4HqqxJeGxqhenMt+u8kJ9i2/XIvWnUgaQtSL3pVUoqsTYC9bV825IGD6/2ElJfzfGq4hmcmoufWTKqe1fM5qutJ0gLirsjReOM9kPGg52KlDacqcvjPPKz7CClooJS9gVVINdha8y2W/jDUlnQ2z0EdQ+ppmRQQ8Snb9xXqVVXCpwkZp6xKeG2M5J191q+tdmId2L44z3/WlY5qLMmptd4Y93INzzSEc2smf8jDJkczkBbopG5F8gLOTwBn9rlTqzgyR4d+hpR3cBxpoXMpa5WE6Q9Bp2jJ0mKjvwJ+DFRlfnbJuqVr37Yj3QjVkpE9KEMDlS0u6NDWNbb3rs24mYwYlmwgecz9Y6RCgFNX7dt+sUDry6SeYC0ZD2YlJJ0I/KDOMGpNm3S2OFqy0yJu9VYM9E+kdW7Plhwf9IakuUjz7JeR0o21Vs7+k+239KC7L9MHWTViEfdIEs6tgeRV+5NJ0XrQw6r9nPGgU2aHoowHTUbSyqT1RYvlMOo1gQ/YPrJQ7w5gBaCukjeV7jimndfq+kYjL+L+D/Ab0ue7M6kI7Y9LNPOSkVqqhAfdo1TX7xBSGrCHGXBuzwAn2j6uUPcsUu3FD5Fqwu0G3GG7uI7gzEI4twZS56r93Atsz9320x7CqBtLDmP/HPCzao2RpFttr16o1zGYoXQeSNIBpB+YF0jzWpWzLE
mI3ZqmqfoSVz+IXWtK2rNTu7tMSRX0hqSDbB9bo161Bq8KTpuDtBi+58TETSfm3JpJnav2f0m6+/tR3t41t/Wcu62BzG17YtuyoFdLxUYgmOGzwGo1hdj/P2pM+RROrDE8Kmk+25MlHUb6XI8s/VwZCLL6j6TVSYu5l6vBzsYTzq1BtKzpmQPYQ9I/8vayQOm8z/i2Ht9lOSS9H3kirwuq1gjtyECdqCZwL/B8TVq1pHySdKbtCR3WkwGUriMLyvmy7bPy5/peUhWJolRemROU8oQeRqqSMS/w5VosbTjh3JrF1iOgOWK52xrIgcAJwCqSHibNle02uiZNwxeAqyRdQ+/zWlWZm/eThpnPk3R4gU419zIS117QPa2f6/E9fK4Vp5EqtC/HQAq+xXrQm2kI59Yg6hwGG6FeYGPJa/rWs725pHlIa8qaVgHhZ8D/UE8qpFpSPtn+Z34e0fVkwbCpM5UXwHmkCuvX01umk5mOCCjpU0Y6s0MTUc1VjOtG0lW2N65Ja25SyqdJtu/OKZ/WsH1xod4OwLeBRUmBKUUVloPeGIHPtTigamYnnFvQNzR9TZ+kbwAPMH0qpFG3T9I9wDa27xhtW2Z18nzbSrZPySna5rXdqYjpcLROAI51LhU0KxHOLegbmr6mb5Dw/UbYJ+lK25uMth2zOpK+SqrRN972yjn93lndfjYt0xKzAysB99HAkjcjScy5Bf3EqnRY0zeqFk1Lx/D90TQoD0cCXKdUS+xcZrEEuw1je2Ad4AYA249IKqkAMcsHCIVzC/qJpq/pq61ic41sk59NWqYwyyXYbRgv27akajnLPDM6oBP9OKfeLeHcgn6i6Wv66grfr40qsa6kX5Iy0P8nby9Icr7B68uZOVpyAUn7AfsAJ46yTTMl4dyCfqLpa/rqDvOukzUrxwZg+ylJ64ymQbMoLwF/Jo1AjAe+YvuS0TVp5iScWzDTMxOt6auzYnPdjJG0oO2nACQtRPw+jAaLkRbW3wD8nOToggIiWjKY6ZkV1/TVjaQ9SBlUzibdGEwAvmH7tFE1bBZEKTnqFqRiuesBZwIn2753VA2byYg7s2CmJ5xX79g+VdJ1wGakcPEd6qxjFwyfHFDyKCnJ8avAgsDZki4BgEtZAAAAc0lEQVSx/fnRtW7mIXpuQRAEDUHSJ4E9gSeAk4Bzbb+S08vdbXuFUTVwJiJ6bkEQBM1hYVKveZrRCNuvSZrl1651Q/TcgiAIgr6jKWHIQRAEQVAb4dyCIAiCviOcWxAEQdB3hHMLgiAI+o5wbkEQBEHf8X+su3Kv7JwXXwAAAABJRU5ErkJggg==\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Perplexity: -2.98\n",
"Topic 0: ['banana', 'broccoli', 'munch']\n",
"Topic 1: ['kitten', 'cute', 'broccoli']\n",
"I like to eat broccoli and bananas. ['(0, 99.1%)']\n",
"I munched a banana and spinach smoothie for breakfast. ['(0, 99.2%)']\n",
"Chinchillas and kittens are cute. ['(1, 99.1%)']\n",
"My sister adopted a kitten yesterday. ['(1, 99.4%)']\n",
"Look at this cute hamster munching on a piece of broccoli. ['(1, 99.6%)']\n"
]
},
{
"data": {
"text/plain": [
"<gensim.models.ldamodel.LdaModel at 0x2b8c796eb70>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_eta('auto', dictionary, ntopics=2)"
]
},
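{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the 'Perplexity' line above: it prints the raw return value of `log_perplexity`, which gensim documents as a per-word likelihood bound, hence the negative number. gensim's own INFO-level log reports the perplexity as `2^(-bound)`. The helper below (a hypothetical name, not a gensim function) applies that conversion to the rounded value printed above; a lower perplexity means a better fit."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def bound_to_perplexity(per_word_bound):\n",
"    # gensim logs perplexity as 2**(-bound), where bound is the per-word likelihood bound\n",
"    return 2 ** (-per_word_bound)\n",
"\n",
"# -2.98 is the rounded bound printed by the 'auto' run above\n",
"print('Perplexity: {:.2f}'.format(bound_to_perplexity(-2.98)))"
]
},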
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not bad. The model trained with the `'auto'` prior correctly separates the first two sentences and the next two into different topics, but it assigns the last sentence almost entirely (99.6%) to the animal topic instead of recognizing that it contains elements of both topics."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To define a prior distribution, we need to create a numpy matrix with one row per topic and one column per term. We then populate that matrix with our prior distribution: we pre-populate all the matrix elements with 1, set the elements that correspond to our 'guided' term-topic pairs to a very large number (1e7), and finally normalize each column so that it sums to 1 over the topics."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def create_eta(priors, etadict, ntopics):\n",
"    eta = np.full(shape=(ntopics, len(etadict)), fill_value=1.0)  # create a (ntopics, nterms) float matrix filled with 1\n",
"    for word, topic in priors.items():  # for each (word, topic) pair in the priors\n",
"        keyindex = [index for index, term in etadict.items() if term == word]  # look up the word in the dictionary\n",
"        if len(keyindex) > 0:  # if it's in the dictionary\n",
"            eta[topic, keyindex[0]] = 1e7  # put a large number in there\n",
"    eta = np.divide(eta, eta.sum(axis=0))  # normalize so that each term's values sum to 1 over all topics\n",
"    return eta"
]
},
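{
"cell_type": "markdown",
"metadata": {},
"source": [
"`create_eta` scans the whole dictionary once per seed word. Since a gensim `Dictionary` also exposes a `token2id` mapping, a direct lookup is an alternative; the sketch below accepts any plain `{token: id}` dict so it can be tried without gensim. The name `create_eta_from_ids`, the toy mapping and the seed words are illustrative assumptions, and the default seeding weight keeps the 1e7 used above. With the notebook's dictionary it would be called as `create_eta_from_ids(priors, dictionary.token2id, ntopics)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def create_eta_from_ids(priors, token2id, ntopics, weight=1e7):\n",
"    # hypothetical variant of create_eta; priors maps word -> topic index\n",
"    # assumes term ids run from 0 to len(token2id) - 1, as in a freshly built Dictionary\n",
"    eta = np.ones((ntopics, len(token2id)))  # uniform starting values\n",
"    for word, topic in priors.items():\n",
"        term_id = token2id.get(word)  # direct lookup instead of a linear scan\n",
"        if term_id is not None:\n",
"            eta[topic, term_id] = weight  # strongly favour this topic for the seed word\n",
"    # normalize each column (term) so its values sum to 1 over the topics\n",
"    return eta / eta.sum(axis=0)\n",
"\n",
"toy_token2id = {'banana': 0, 'broccoli': 1, 'kitten': 2}\n",
"toy_eta = create_eta_from_ids({'banana': 0, 'kitten': 1, 'unknown': 0}, toy_token2id, ntopics=2)\n",
"toy_eta.round(6)"
]
},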
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start with a list of seed words that matches the topics the `'auto'` model found."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbcAAAByCAYAAADQxZ9YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztnXe4JFW1t98fSYYwBEkCEkQYBCRIkDCGi4hKkCBRlOQFDCAYPwNeEMP9xIiIKCBcQQwDShIlyEUUkJyG5EcSBUQFCUMSgd/3x941p6enz5nTu+rMqelZ7/P006d3V61afbq6Vu21V5BtgiAIgmCQmGe8FQiCIAiCpgnjFgRBEAwcYdyCIAiCgSOMWxAEQTBwhHELgiAIBo4wbkEQBMHAEcYtCIIgGDjCuAVBEAQDRxi3IAiCYOAI4xYEQRAMHPONtwJzK4svOa+XW7Gd//4/PbrMeKsw21jl5X9vVF7b/3cLPP5ic8Keea45WWPBQguOtwazj3nnnnnKtGkPPmJ76Vlt186r61zAcivOx8nnrjDeavRknx8fPN4qzDZOfvd3GpXX9v/dqmc92ZgsX39bY7LGAq219nirMNt4YdGXjbcKs41LL/3M/aPZbu4x90EQBMFcQxi3IAiCYOAI4xYEQRAMHGHcgiAIgoFjROMm6eWSbsqPhyU92PF6gX4OJOkUSZP62P5wSXdLulPSVqPY/nJJ6/ejUxAEQTCYjBgtaftRYH0ASUcCT9n+WsmBbO832m0lrQvsDKwFvBK4QNIk2y+VHDsIgiCYuyh2S0r6pKRb8+OQPPZqSbdJOk3SVElTJE3I702fWUnaVtINkm6WdFEP8TsAP7H9vO17gD8DG45CrX0l/SEfe6N8rE3z2I2SrpC0eh7/T0lnSrpQ0l2S/rvjs50g6br8Wf6rY/wBSUdmWbdIWmOkYwRBEATjQ5Fxk7QJsBewCbAZ8ME824I02zrO9muB54CDuvZdDjge2Mn2esAePQ6xAvCXjtcP5DGyMRouU/ZltjcDDgVOymN3AJNtbwB8Afhix/brAbsA6wLvkbR8Hv+U7Y3y+2+VtFbHPn/Lsk4CPjqKYwRBEASzmdIk7jcAP7f9DICks4HJwEXAfbavytv9CDgQ+FbHvpsBl9q+H8D2P3vIV48x5+3fNoJeP8nb/K+kZSQtAiwOnCpptR7b/8b2tPwZ7gRWAh4C9pT0PtL/Z3mSwb497/OL/Hw9sE3+e6RjDH0o6UDS/4Nll593pE2DIAiCGpS6JXsZnwrP4rV6jHXzAGmtrWJFktGZFb2O/SXgQtvrADsCnTV5/tXx94vAfNmleCiwpe11gQuG2edFhm4ORjrGkDL2CbY3sr3R4i8P4xYEQTBWlBq33wE7SZqQZ0c7AL/P760qaeP8957A5V37XgFsKWllAElL9pB/Lmn2tECeDa1MminNit2zzDeT3IdPA4sBD+b39x2FjInANOBJSa8ARpopVvR7jCAIgmAMKTJutq8huQCvBa4Cjrc9Nb99G3CApFuAhYETuvb9G/AB4BxJNwOn95B/M3A2aS3rV8AHq0jJWay5PSnpSuBY4IA89hXgq5KuGOXHu4HkgrwVOJFkjGdFv8cIgiAIxpBRr7nZPrLr9dHA0T02fdH2gT32n9zx9/nA+bM43lHAUT3Ge86kOuV3jV8OrNExdHgeP6lru7d3vHzvMLJW7Pj7KmCrkY4RBEEQjA9RoSQIgiAYOBpteWP7bnLSdxAEQRCMFzFzC4IgCAaOMG5BEATBwBGduMeJh6YuzBGvGk1FsdnPKps+Pd4qzDaOOLzZ76Dt/7u79lq0MVn3nHdTY7LGgtV+uul4qzDbmO+ZkVKPB4xLR7dZzNyCIAiCgSOMWxAEQTBwhHELgiAIBo4wbkEQBMHAEcYtCIIgGDjGzLhJermkm/LjYUkPdrxeoE9Zp0iaNMptl5H0W0lPS/rWrPfoH0n757
50QRAEQQsZs1QA24+Sq5VIOhJ4yvbXCmXt18fmzwCfBTYAXl1yvFGwP6nA8sNjJD8IgiCowbi4JSV9UtKt+XFIHnu1pNsknSZpqqQpkibk9y6XVBnKbSXdIOlmSRd1y7b9lO0rSF3AR6vPcpLOkXRLlvv6rM9NHdt8StLhknYnGe2fVbNQSRtLukzS9ZJ+LWnZmv+iIAiCoAaz3bhJ2gTYC9iE1JX7g5LWzW+vBRxn+7Uk43RQ177LAccDO9leD9ijz2OfUhnJLo4DLs7NSTcktdrpie2fATcBu9ten9R89RjgXbY3JHUf/0I/egVBEATNMh4VSt4A/Nz2MwCSzgYmAxcB9+VWMpCMxIFA57rZZsCltu8HsP3Pfg48gnvzzWRDafsFUl+44XrGdfMaYG3gN5IA5iV1Ep8JSQeSPhMLstCo9Q6CIAj6YzyM20h1YjyL1+ox1hTdcl9gxpntgnmsGwG32H7DLA9gn0Bu3jpRS47V5wiCIJjrGY81t98BO0maIGkRYAfg9/m9VSVtnP/eE7i8a98rgC0lrQwgacmGdLoUeH+WOa+kiaRgkeUlLSFpQWDbju2nAVWRvtuBFbK7lbwGt3ZDegVBEAQFzHbjZvsa4CfAtcBVwPG2p+a3bwMOkHQLsDB5ltOx79+ADwDnSLoZOL3XMSQ9QOoS/j5JD1RpBCOsuR0MvE3SVOA6YE3bzwFfznqeSzJiFacAJ+WAEwO7AN/IOt0IvL6f/0kQBEHQLLPFLWn7yK7XR5OMTzcv2j6wx/6TO/4+Hzh/FsdbcZjxnmtuth8Gtu8x/g3gGz3GpwBTOoZuIK0bBkEQBC0gKpQEQRAEA0dr+rnZvpuc9B0EQRAEdYiZWxAEQTBwhHELgiAIBo4wbkEQBMHAITtyiccDSf8A7h/FpksBjzR02CZlzW3y2qxb0/LarFvT8tqsW9Py2qxbP/JWtr30rDYK49ZyJF1ne6O2yZrb5LVZt6bltVm3puW1Wbem5bVZt7GQF27JIAiCYOAI4xYEQRAMHGHc2s8Js95kXGTNbfLarFvT8tqsW9Py2qxb0/LarFvj8mLNLQiCIBg4YuYWBEEQDBxh3IIgCIKBI4xbMDBIetloxoIgGHzCuLUUSdtK+qSk/6oefe6/Zn5+Xa9HDb0uGc1YgdyF68oA/jDKsVEjaXNJ75a0d/WoIUuS3lN9l5JWqprcDiKSVpa0Vf57gqRFZ7XP7EDSrqMZG6WseSV9pL5W0+VtJ6mR63LTumWZq/QY23jmLfuSOSbnSRi3FiLpe8DuwCGAgF2BlfsU87H8/PUej68V6LRg7ny+VO5OvmR+rAIs36+8DrmbS7oduCO/Xk/Sd/uUsZykDYEJkjboMOJvBhaqodtppP/VZGDj/KiTZPpdYDNSl3lIHd2PK9RtC0kXS/p/ku6VdJ+ke0sVk7S6pDMl3Z7l3VtT3gHAmcD389CKwNk15O0s6S5JT0h6UtI0SU8Wivv0KMdmie0XgR0K9ejFHsBdko6W9Jo6gsZAN4BfSFqheiHpTcDJpcKaPk9mkB3Rku1D0i221+14XgT4he2tx1GnQ4HDSIbsQZLRBXgSONH2dwrlXk3qZH6u7Q3y2K221+lDxj7AviTDc22HbtOA/7H9i0Ld7gDWckM/Ekk32H6dpBs7PuvNttcrkHUn8BHgeuDFatz2o4W6XQ4cAXyT1Lh3P9L14YhCeTcBmwBXd3zWqbZfWyjvbmB723eU7J9lvAPYBtgN+FnHWxNJ33PRLFrSl4DFssynq3HbNxTKm0i6AdoPMHAK8BPb01qg28akm7TtgdcBXyZ9L38plNfoedJJa/q5BTPwbH5+RtLywKPAqv0IkLTzSO/3e8G3fQxwjKRDbB/bz76jkP0XSZ1DLw637TD7/xD4oaR32f55g6rdCiwH/LUhef+WNC/pgoWkpYGXCmU9YfvXDekFMMH2JZJk+37gSEm/Jxm8Ev5l+/nqe5U0H/lzF/K3OoYt8xBwHfBO0k
1BxTTSjUIpm+fnozrGDGxZIsz2k5J+Dkwg3VDuBHxC0rcLfntN63atpA8DFwHPAW+1/Y8SWZmmz5PphHFrJ7+UtDjwVeAG0pd9Up8yth/hPQNFsxnbx0paB1gLWLBj/NQSecBfJG0OWNICwIfJLsoCVsx3vdOAE0l3lp+yfVE/QiSdR/ofLQrcLuka4F/V+7bfWajft4GzgGXyHfUuwOF96latl14q6auk77FTt6I7cuC5vNZzl6SDSbPzZQplAVwm6TMkV/FbgQ8C59WQd52kn5FcVp2fd9Tnse2bgZsl/Zg0u1+T9D3/0fbzpYrZ/o/SfbuRtD2wP7AacBqwie2/S1qI9Lvoy7g1pVvHb6JiIeAJ4AeS6vwmmj5PphNuyZajFO23oO0nxlsXAElHAG8mGbdfAe8ALre9S6G8pYBjgK1IF5yLgENL3GuVi0/S24APAZ8DTrHdVwBNXkcYFtuX9atbh+w1gbeQPusl/c5GJF06smouuiPP7qY7gMWBL5BcWUfbvqpQ3jzA+4CtSZ/1QtsnlsjK8k7pMWzb+xfI2oa0xnNP1m1V4KDSmbCkZUnuueVtv0PSWsBmtn9QIOtU4CTbv+vx3lts9xW81ZRuY/Wb6HWekD5/bcMUxq2l5NnMKnTMrktmR5IWI7mW3piHLgOOKjWWkqYC6wE3ZkOyLOlkHGmmOJK8JW3/s2tsVdv3Fciq1iiPAX5r+6zO9a0CeasCf7X9XH49AVjW9p/6lDMxu5qW7PV+9+cfBCQdml3ZI46NB3m9cjvbd+fXqwHn216zUN6vSetin82/iflIv4/a60Z1GQvd8m++ipC8xvbfa8haGHguB7+Q3fYvs/1MqcyKiJZsIWo2Su9kkptut/x4knSyl/Ks7ZeAF7IL8O/Aq2rIOy/LASBHiJW6Ja6XdBEpaOBCpZDi0jUtgDO69n8xj/XLjyv9SGs+3c99I+nL2XVdvV5C0hcL5Jwn6dzhHiW6ZfbpMbZvqTBJa0i6RNKt+fW6kvpy6Xbw98qwZe4lncelLGV7Cvlcsf0Cfa4bV0jaVNK1kp6S9LykF1UeFdqoblm/3YBrSBHcuwFXSyry2mQuIa0tVkwAflND3nRiza2dbERzUXqr2X5Xx+vP5wilUq7LF9UTSBfnp4Cra8j7MsnAbQtMAk4F9iqU9T5gfeBe289Iejkp4qyU+TrXYvLC9wL9CrG9XX7uKyhoFrzD9mc6jvFYdrf1e8HvOy1kJCTtCbwbWLXLOC5KCowq5UTgE+SQcdu35LWzvg06cJukXwFTSOtIuwLXKgdhFUTXPp3PtSpQaFPSelQJ3yGlA5xBug7sDby6UFbTugF8Fti4mq3loKjfkML5S1jQ9lPVC9tP5fXF2oRxaydNRuk9K2my7csh5UcxFI1ZwsGki9eywFuBlUhRU0XYPl/S/KS1tkWBHW3fVSjrJUn3AWtIWnCWO8yaf0h6p+1zASTtQM3Ow0o5Qiszo7t5pvWVUTCvpJfZ/leWOwHouxpLnfXDYbiSdN4uRcqprJgG3FJD7kK2r9GMUbUvFMpaEPgbUK0j/QNYkhSEVRJs9VHgXGA1SVcAS5MMZhG275Y0b3bVnSLpylJZTesGzNPlhnyUeh7ApyW9rgqEUspXrXN9mk4Yt3ayFM1F6X2AFCa/WH79GDXcQ6Sk45eALW0fJekJkmHqq0qBpGOZMfpqIsk9dEiOvvpwv4pJ+k/gUFIi6E3ApqQKJUVBFsD7gdMlfYe02P0X0p10EZK+QkrOv50h15CBEuP2I+CSHGhhUoTdDwt0msoIode21+1HXk4juJ+UrN4kj+S1sWoGsguFN3+268zme3EbyVBOIp0nf6T8gv9M9g7cJOlo0mesU72nSd0ALpB0IfCT/Hp3UmBZKYcBZ0h6KL9+RZZZmwgoaSHDRSbVjNKbmGXU8d83loislHg9LDl3rV/dppKM7FW211eKTPy87Vo/FqUkerkgibZLzh+BdavZVl2UkpKryMuLbF
9YIGPEyjfZWJXoNo0ho7kAMD/wtO2Jw+81orxXkVzhm5Nu0O4D9irRT9KKpJD6LbKOl5MidB8o1O2G7ojcXmOjlLUyaf1vflLu3WLAd7vWCMdFt479dybFAwj4ne2zSmVlefMzZHzvtP3vOvIqYubWQpp0FUn6Mimk+/H8egngY7ZLF+MbSUQuMV6j4Dnbz0kiu+zulDSpVJhSGsa7yFGrlUvM9lEj7DYS95IuWo0Ytxy6XiuRu9R4jULuDPUBJe1IqkRRQ6S3ytF189iephTNWsIppCCfyj33njz21n6ESFoOWIFc9o2hyjgTKSz71vF9PAt8vkTGWOnWwRXAv0nXgGvqCMrrax8FVrZ9gFIZuEm2f1lTx5i5tZG86Hss8BrSXe+8FN71qkcofJ07N0l7kdwGryO5wXYBDrfdVxShpCm2dxvOLdavOyzLPIsUQHIYyRX5GDC/7W36lZXlXUBafO8ucfX1YXfqLadywa5ASqO4hBndzSUu2EbOEUmX257cNdOCdDF06UxrmGNdZXvTwn17zUCut71hgaybbK8/q7FRyGms7FvT7uEmdeuSuxupuMRvs8w3AJ+wXRRQopSYfz2wt+118trxH/r9LnoRM7d20itiavVCWY0EHlTYPl3S9Qy5w3Z0WVmkQ/PzdqW6VCjnxdneKQ8dqZTsvBhwQQ3RK9p+e139GAr3v560uN9J6d1lI1F1tifn50Yr9mvG8m/zkHTs+7Nm1/LawGJdMifSUSGnTx6R9B6G1o32pCCS082Wfat+Bx/Kz6fl572AvnO+Gtatk6ajJVezvXuOssX2s+qKGioljFtLaTBiqpHAgy7d7gTurCnjr/m5CbfYmcCGki6x/ZYstwnX7pWSXmt7ah0hlQtWwyQ215DbZFRdlUC7LDNGcv65UFxnUv8LwJ9INR37ZRLpwr94l8xpwAGFuu1Pujn4Juk3cWUeK6V22bfqdyBpC9tbdLz1qRzlWOoKb6QkXQdNR0s+n2+4q2WO1WjIbR/GrZ00FjFl+2hJtzBU3uoLJYEHTdPDDTb9Lfp3h82jVBZsDUkf7X7T9jcK1ZwM7KuUXvCvDt36dplm9iGVGutk3x5jo6HRqDpJh5Aq2fyNoTVUA6WfdR5SkEbnWu/X6dOI2D4HOEfSG7tTJpTSWvomG+zSWoi92N/2MUpl35YhucZPIUUR98vCmjF1Z3PqRUs2qRs0Hy15BMm78kpJp5OCfPatIW86YdzayXtJF4eDSRFTryQFNvRNXnT/re0L8usJklZxnyWkmqZhN9gewI6k87lJue9oQoiGT2yeSHlic2PnSOZQYJILW+b0YN3KsMH0JPOiMmiZb5FmHZ0c22NsWDRz+skMlKx9VqLz8zakWqY313Ct7U+ahS9G0vUJ6s0qm9QN25+Q9C6SERJwQp1oSdsXS7qBlLYj0g1RrVzSijBuLaTDVfccNSKmMmcw1PYChkpI1eqe2yZs/xH4ilIfqBnuIjVMPcfRiq6n2XQaT2xu+ByBlMPXZHHueSQtYfsxmP499H29kbQZ6fxdumtWPpEURNMP1drnFqTC31VPt12ZsQVOv1Rl31YFPq3Csm9KRYRf7VQDciIp4K/ud9KIbp3kNbxa63iS1szRzNXNSZWzuJKklVze3WLoGBEt2T6yu+VIZq5k0XcNx2Eiw4oaZLYdSecDOzjVz6vCoc8viajL+1cRbCIFL6xKao+ydg0dGyk629Q50mEw1iatb53PjJGcRS5dSXuTulufSfof7gZ8yfZpI+44s5w3kbpQvB/4Xsdb04DzXFDNJgcbbV3lUylXyHFhe5hslKqyb48rlbtawXbfNy6Sfmf7jbPecvbq1vAyApJOsH1g/i56RemWFl6YTszc2skP6NFluZDGS0i1mLOBM7Pb5JWkyMSPlwpzV+X0fJd5UKk8SbuSajn+lvQjPlZSaRh1U+dI5cb9c34skB+1sH2qpOtIKRkCdrZ9e4Gcy0g9v/6noeAjSN3kFwWqbg
yL5LG+qGYfJOMB8KoGAv0ulvRxZu6cXdQ5wqkk3QvAG5U6AlT0Zdyajqa1fWD+cxtSD7fJJCP3e+D4Jo4RM7cWIulq269vSNZqwOmkHCtI7qf32r6nCfltQ9KHgLeTEq8Psl0rgrCH/Do5gjeTOhfPEEZdMotu8hxpM5K+ZfswzdwsEygrSSdpP9Kst+qN9ybgSPdZWKDH7EOdOpbMPnLwUjcu8dpkeSeTgoJuoyNQyAV98MYCSVNInUpOz0N7Aovb3q227DBu7UPS/yWtJzTVZbmxElJtpGstRqRgi6nAjVDLtdYpdx5gQ2BJ228rlDe1czaYXUY3d88QRymr0XNE0sXArl3RjT8t/axNIWlD29fn2cy1XW9PtF3UHknS8qTz5A5SxY6HuqMx+5C1G3CBU8++z5GCXL7QxLpRXSTdbnut8dZjOHotkTS1bBJuyXZS3ZF39nAzBQWA1dWsVFKtZqUtpdtlctYw4yVyq7u/F0h95uospDcZRt3YOZJZukd04zKFshrDdhXo8W5SN++pMD0C9TAKev+p+QLbh9ueImkyqYTX10mutb5n1hoqR7VSnhWuTopiLS1H9QdJa5W4hGcTN0ra1Lnju6TXk8p71SZmbgOOpJ+TWuhULpf3AuvZ3nn4vQIASRsDn2HGjuh18twaLzrbFEpVZ3bKOWBVAd+zSl2wTaNUOPlMUsWOyaSKLNuV3KSp4QLbyiXuJP03MNX2j1XYAV4Nl6OS9EbSDcDDNJOr2SiS7iAFMlXFAlYizaZfoqaeMXNrKUrNO9emo8SQywr2Nt2stLXkNaxPMvP/rfSO/EekgJRbqRk+3UGtorOS3mP7R+qRrA61EtY/C1yeZ/aQZvoHjrD9bMX2vZL2IAUN/YUU7Vja96vRAtvAg5K+TyqU8BWlgtulVTuaLkd1MkNu+qbO4SZporxdT8K4tRBJ3yOtA/wHcBKpOHFp9e2mm5W2mdNJUWbbkULH9yE1oizlH6VrOr3QzEVnS6Ilq2oVTUevXZCjQatk2o80lUxbB81cUHhJ0lrj1Up9/0ru7B9Q6iZ/Nik68THgoVnsMxK7kS7SX8vh9q8gdQ0voelyVH+uIqXbSIMRsDMRbskWIukW2+t2PC8C/ML21gWy1gNOJRURhlQpf5+SHJy2o1wlvvq/5bHLbPfsjzcKeW8hRW91V/EvrajeWLTkWKDmuoQ3hsao31yH/DeRC2zbfr6OrCaQtDVpFr0WqUTWFsB+ti8dccfh5X2XVJfzPBo4h+ckYubWTqqZ1TM5qutRUgJxX+RovEkeqnhQu1lpy6maHP41u3UfIgUNlLIfsCapB1tnvcXSC0NjRWfzGtQxpJmWSQERH7F9b6G8qkv4DCHjlHUJb4yxvLPP8hvrndgEti/K659NlaOaQDJqnTfGdc7hOYYwbu3kl9ltcjRDZYFO6ldITuA8GJgy4Eat4os5OvRjpLqDE0mJzqWsVxKmPwK9oiVLm43+GDgOqNr87JHllua+7Ui6EWqkIntQhoY6W5zfY6xvbO/XmHJzGOGWbCHZ5/4BUiPA6Vn7tp8rkPU50kywkYoHcxOSTgS+2WQYtWYsOlscLdkriVv1moH+mpTn9lTJ/kE9JC1IWme/lFRurLNz9q9tv6aG3Pcxc5BVK5K4x5Iwbi0kZ+1PI0XrQY2s/VzxoFdlh6KKB21G0hqk/KJlcxj1usA7bX+xUN4dwGpAUy1vKrkTmXFdq+8bjZzE/TjwU9L3uzupCe1xJTJzykgjXcKD/lHq63cYqQzYgwwZtyeBE21/p1DuGaTei+8m9YTbC7jDdnEfwTmFMG4tpMms/TwL7K7d9r0aYdStJYexfwL4fpVjJOlW2+sUyusZzFC6DiTpINIF5lnSulZlLEsKYneWaap+xNUFsW+ZkvbpNe4+S1IF9ZB0iO1jG5RX5eBVwWnzk5Lhaxcmbjux5tZOmsza/yHp7u/b+fWeeax27bYWspDta7rSgl4oFTYGwQwfB9ZuKM
T+/9BgyacwYq3hYUmL2p4m6XDS9/rF0u+VoSCrxyWtQ0rmXqUBPVtPGLcW0ZHTMz+wt6Q/59crA6XrPpO6ZnyX5pD0QeSRnBdU5QjtwlCfqDZwD/BMQ7IaKfkkaYrt3XrkkwGU5pEF5XzO9hn5e30bqYtEUSmvzAlKdUIPJ3XJWAT4XCOatpwwbu1iuzGQOWa121rIh4ATgDUlPUhaK9trfFWagU8DV0q6mvrrWlWbm21JbuZzJB1ZIKdaexmLcy/on87v9fga32vFaaQO7aswVIJv2Rry5hjCuLWIJt1gYzQLbC05p28j21tJWpiUU9a2DgjfB/6XZkohNVLyyfZf8/OY5pMFo6bJUl4A55A6rF9PvUoncxwRUDKgjHVlhzaihrsYN42kK21v3pCshUgln6baviuXfHqt7YsK5e0MfAVYhhSYUtRhOajHGHyvxQFVczph3IKBoe05fZK+BNzPzKWQxl0/SXcD29u+Y7x1mdvJ622r2z4ll2hbxHavJqajkXUCcKxzq6C5iTBuwcDQ9py+YcL3W6GfpCtsbzHeesztSDqC1KNvku01cvm9M/r9bjqWJeYDVgfupYUtb8aSWHMLBom16JHTN64azUjP8P3xVCi7IwGuU+oldjZzWYHdlrETsAFwA4DthySVdICY6wOEwrgFg0Tbc/oa69jcINvnZ5PSFOa6Arst43nbllSlsyw8qx16MYhr6v0Sxi0YJNqe09dU+H5jVIV1Jf2QVIH+8fx6CZLxDWYvU3K05OKSDgD2B04cZ53mSMK4BYNE23P6mg7zbpJ1K8MGYPsxSRuMp0JzKf8CfkPyQEwC/sv2xeOr0pxJGLdgjmcOyulrsmNz08wjaQnbjwFIWpK4PowHy5IS628ATiYZuqCAiJYM5njmxpy+ppG0N6mCypmkG4PdgC/ZPm1cFZsLUSqOujWpWe5GwBTgB7bvGVfF5jDiziyY4wnjVR/bp0q6DtiSFC6+c5N97ILRkwNKHiYVOX4BWAI4U9LFtj85vtrNOcTMLQiCoCVI+jCwD/AIcBJwtu1/5/Jyd9lebVwVnIOImVsQBEF7WIo0a57BG2H7JUlzfe5aP8TMLQiCIBg42hKGHARBEASNEcYtCIIgGDjCuAU8RJhUAAAAGklEQVRBEAQDRxi3IAiCYOAI4xYEQRAMHP8fr+4s/WBSmCsAAAAASUVORK5CYII=\n", | |
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Perplexity: -1.30\n",
"Topic 0: ['broccoli', 'banana', 'munch']\n",
"Topic 1: ['kitten', 'cute', 'sister']\n",
"I like to eat broccoli and bananas. ['(0, 96.3%)', '(1, 3.7%)']\n",
"I munched a banana and spinach smoothie for breakfast. ['(0, 97.0%)', '(1, 3.0%)']\n",
"Chinchillas and kittens are cute. ['(0, 4.8%)', '(1, 95.2%)']\n",
"My sister adopted a kitten yesterday. ['(0, 3.7%)', '(1, 96.3%)']\n",
"Look at this cute hamster munching on a piece of broccoli. ['(0, 40.4%)', '(1, 59.6%)']\n"
]
},
{
"data": {
"text/plain": [
"<gensim.models.ldamodel.LdaModel at 0x2b8c7c99710>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"apriori_original = {\n",
" 'banana':0, 'broccoli':0, 'munch':0,\n",
" 'cute':1, 'kitten':1 # we'll leave out broccoli from this one!\n",
"}\n",
"eta = create_eta(apriori_original, dictionary, 2)\n",
"test_eta(eta, dictionary, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we guide the model by seeding a few words towards the 'foody' topic and a few others towards the 'animaly' topic, the topic allocation keeps the same direction but becomes more pronounced: each of the four single-theme sentences now scores above 95% for its topic. The last sentence, which mentions both a cute hamster and broccoli, also gets a more even share of both topics (40.4% and 59.6%)."
]
},
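Below is a minimal sketch of how a seed dictionary such as `apriori_original` could be turned into a prior for gensim's `eta` parameter. That parameter takes one row per topic and one column per dictionary term. This is not the notebook's `create_eta`. The helper name `seed_eta`, the `boost` weight and the column normalization are assumptions made for illustration, and a plain `{id: word}` mapping stands in for the gensim `Dictionary`.

```python
def seed_eta(seeds, id2word, num_topics, boost=1000.0):
    """Hypothetical helper: build a num_topics x num_terms prior.

    Every term starts with equal weight on every topic. A seeded term gets
    `boost` times more weight on its assigned topic. Each column is then
    normalized so that a term's weights sum to 1 across topics.
    """
    num_terms = len(id2word)
    eta = [[1.0] * num_terms for _ in range(num_topics)]
    word2id = {word: idx for idx, word in id2word.items()}
    for word, topic in seeds.items():
        if word in word2id:  # seeds missing from the vocabulary are skipped
            eta[topic][word2id[word]] = boost
    for col in range(num_terms):
        total = sum(eta[t][col] for t in range(num_topics))
        for t in range(num_topics):
            eta[t][col] /= total
    return eta


# Vocabulary in the order printed by the notebook's dictionary.
words = ['banana', 'broccoli', 'eat', 'like', 'breakfast', 'munch',
         'smoothie', 'spinach', 'chinchilla', 'cute', 'kitten', 'adopt',
         'sister', 'yesterday', 'hamster', 'look', 'piece']
id2word = dict(enumerate(words))
apriori_original = {'banana': 0, 'broccoli': 0, 'munch': 0,
                    'cute': 1, 'kitten': 1}
eta = seed_eta(apriori_original, id2word, num_topics=2)
print(eta[0][1], eta[1][1])  # 'broccoli' column: almost all weight on topic 0
print(eta[0][2], eta[1][2])  # 'eat' is unseeded: split evenly
```

`numpy.asarray(eta)` converts the result into a matrix of shape (num_topics, num_terms), which is one of the forms gensim's `LdaModel` accepts for `eta`. Normalizing each column is only one reasonable choice. Leaving the weights unnormalized would push the model in the same direction.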
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbcAAAByCAYAAADQxZ9YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztnXe4XGW1/z9feigBIkVRKVKCIL1IU6+CIAoKCBFEQVDARlHEn/WK9adcvYpYMAhIE2+IVFGKXEABQ0ggIVRBEGnSpIQu8L1/vO/OmZzMOTnzzj45O5P1eZ555ux3z16z5syevfa73lVkmyAIgiDoJRYaaQWCIAiCoG7CuAVBEAQ9Rxi3IAiCoOcI4xYEQRD0HGHcgiAIgp4jjFsQBEHQc4RxC4IgCHqOMG5BEARBzxHGLQiCIOg5wrgFQRAEPcciI63AgspiWtxLsNRIq9GepUaNtAbzjmeeq1dew/93Lyxf3/3sBmMeqU3WcDDjXyuOtArzjMWfeGWkVZhnzHz6gUdtz/XLDeM2QizBUrxZ24+0Gm3xRhuNtArzDP1leq3ymv6/u2vP+ozv5H2Or03WcLDWmR8faRXmGWuc/8JIqzDP+N8rvnzPUF4XbskgCIKg5wjjFgRBEPQcYdyCIAiCniOMWxAEQdBzdG3cJL1K0rT8+Kek+1u2F+tQ1smSxnbw+q9IulPSbZJ2GMLrr5K0cZvxiyUtI2mMpI+3jL9B0t5D/wRBEARBE+g6WtL2Y8DGAJKOBp62/f1CWQcM9bWSNgT2ANYDXg9cJGms7Y5jYm3vlGWuBXwcqMLA3gDsDfymU5lBEATByDGsbklJn5d0U34cmsfWknSzpNMkzZA0QdKovG/WzErSeyRdL2m6pEvaiH8fcKbtF23/DfgHsNkQ9VpY0unZGCPpPknLAd8FxuZZ53fz9tvz9mGSFpH035ImS7pR0sfy8TtIukzS2ZJul3RqV/+4IAiCoCuGLc9N0pbAvsCWwMLAZElXAs+SZlsftT0pG4JDgB+1HPtq4OfAW2zfI2lMm7d4LXBFy/Z9eew6SRcDH7b9cJvjFgF+DVxv+3v99n0BWMt2ZWB3AD5te7e8/UngYdtbSlocmNRieDfNn+vhPL6V7Ulz/08FQRAEdTOcM7e3AL+1/aztmcC5wHZ5390tF/7TW8YrtgYut30PgO1/tZGvNmPOr99pAMMGcCLtDdtQ2BE4QNI04FpgOWDtvG+S7QdtvwxMA1afQ2HpYElTJE35NwtO0mUQBMG8ZjiNWzvjU+G5bKvNWH/uI621VbwOeGAIel0NbJ9nXp0i4JO2N86PNWxflve1WquXaTMrtj3e9ua2N1+UkrcPgiAIhsJwGrc/AbtLGiVpadIa2Z/zvjUkbZH/3ge4qt+xVwPvkLQawABuyfOBfSQtJmlNYDVg6hD0Gg/8EfiNpP4GaCawzCDbFwOfrI6TNLZaLwyCIAiaw7AZN9uTgTOB64BJwM9tz8i7bwYOknQjsBTJ4LQe+xDwCeA8SdOBM9rIn05ydd4K/J40o3oFZoX2rzSIbscAtwC/krRQy/hDwJQc6PJd4AZg4RzUchjwC+AOYJqkm0jrglGfMwiCoGHInpv3r+Y3TOH2E6ugjQWV0RrjxhZO3rrZxX/rpPbCyQ3/39VZOPnOKJzcGBawwslTbW8+t9dFhZIgCIKg55jnLjXbd5KTvoMgCIJgOIiZWxAEQdBzhHELgiAIeo6I9BshtM6iLDx+lZFWoy0PTlxqpFWYZ7zmO/V+B03/361z4qO1ydrpyGavLqyzXrvaD73JQ2951UirMO+4Ymgvi5lbEARB0HOEcQuCIAh6jjBuQRAEQc8Rxi0IgiDoOcK4BUEQBD3HoMZN0qtyo85pkv4p6f6W7cU6eSNJJ0saO8TXriTpCknPSPrR3I+A3Hx0t050CoIgCHqTQVMBbD9GriaSu1Y/bfv7JW9k+4AOXv
4s8GVgE2CtkvcLgiAIFlyK3ZKSPi/ppvw4NI+tJelmSaflyvoTqpYwkq6SVBnK90i6Plfbv6S/bNtP274aeL5DtXaS9GdJf5W0c36vNfPYDZKmSnpzHt9B0mWSzpZ0e+4IXn22r0u6Ln+24yWp5TN8V9LkfMw2g71HEARBMDIUGTdJWwL7AluSumZ/UtKGefd6wE9tb0AyTof0O/bVpFYxu9veCNi7w/c+uTKSbXg98DZgV2B8bkj6IPBO25tknX/c8vpNgU9lnd8oaas8fqztLYANgGWBd7WqYHtL4CjgP/PYYO8RBEEQzGNKZ25vAX5r+1nbM0l91bbL++62PSn/fXrLeMXWwOW27wGw3VEZAdsH2J42wO4Jtl+xfTtwL7A2sDhwYu6/9huSIauYZPtB2y8D04DV8/j2kiYD00nGcv2WY87Oz1NbXj/Ye8xC0sGSpkia8uKTzw35MwdBEASdUWrcNMi+/g3i+m+rzVhdtHvvI0mGbgPSTHPxlv2tTZBeBhaRtCTwE9LMckPgJGCJNse8TN+a5WDv0aeMPd725rY3X2zZaOAdBEEwXJQatz8Bu0saJWlp4H3An/O+NSRtkf/eB7iq37FXA++QtBqApDGFOrRjLyXWIbko7yC5FR906sq6P4MbZoBRwCvAo5KWAd4/hPft9D2CIAiCYaTIuNmeDJwJXAdMAn5ue0befTNwkKQbgaWA8f2OfQj4BHCepOnAGe3eQ9J9wDHARyXdV6URzGXN7U6S4b0AONj2i6RZ2MckTQJWY/bZWrvP9hhwCnATcA5w7WCvz3T0HkEQBMHwojTZqEmYtBYw0Xazy4U3gGXHruytx3cUSzPPeHDi6iOtwjzjNXv+vVZ5Tf/fveby+roCvHzLX2uTNRwsvN46I63CPGNB6gow7fgjp9refG6viwolQRAEQc9Raz8323eSk76DIAiCYKSImVsQBEHQc4RxC4IgCHqOMG5BEARBz1FrtGQwdCQ9AtwzhJeuANQX4lavvCbrVre8JutWt7wm69Z0eU3WrW55I6XbarZXnNuLwrg1HElThhL2OhLymqxb3fKarFvd8pqsW9PlNVm3uuU1WTcIt2QQBEHQg4RxC4IgCHqOMG7NZ/zcXzJi8pqsW93ymqxb3fKarFvT5TVZt7rlNVm3WHMLgiAIeo+YuQVBEAQ9Rxi3IAiCoOcI4xb0FJLmaBTbbiwIgt4mjFtDkfQeSZ+X9J/Vo8Pj183Pm7Z7dKHXZUMZK5C7VLcyMn8Z4tiQkLSNpA9K2q96dCFLkj5UfZeSVpW0Zam8piNpNUk75L9H5ea/I46kvYYyNkRZC0v6TPdazZK3i6Rarst165Zlrt5mbIs5X9mRzGE5T8K4NRBJxwMfAA4ldfXei9QEtROOzM8/aPP4foFOS+Su6StIWl7SmPxYHVilU3ktcreRdAtwa97eSNLPCuS8WtJmwChJm7QY8v8AlizU7TTS/2o7YIv86CbJ9GfA1qQO9QAzgZ8W6ratpEsl/VXSXZLulnRXqWKS1pY0UdItWd5dXco7CJgI/CIPvQ44twt5e0i6Q9KTkp6SNFPSU4XivjjEsbli+2XgfYV6tGNv4A5Jx0h6YzeChkE3gLMlvbbakPQ24KRSYXWfJ7PJjmjJ5iHpRtsbtjwvDZxte8cR1Olw4AiSIbufZHQBngJOsP2TQrnXAnsC59veJI/dZPtNHcrZH/gIyfhc16LfTOBXts8u0O1WYD3X9CORdL3tTSXd0PJZp9veqEDWbcBngKnAy9V47iRfottVwNeAHwK7AgeQrg9fK5Q3DdgSuLbls86wvUGhvDuBXW3fWnJ8lrEz8G5gHPA/LbtGk77nolm0pG8Dy2aZz1Tjtq8vlDeadAN0AGDgZOBM2zMboNsWpJu0XYFNge+Qvpd7C+XVep60Ums/t6A2nsvPz0paBXgMWKMTAZL2GGx/pxd728cCx0o61PZxnRw7BNn3Smodenmg1w4i4xTgFEnvt/3bml
S7CXg18GBN8v4taWHSBQtJKwKvFMp60vYfatILYJTtyyTJ9j3A0ZL+TDJ4Jbxg+8Xqe5W0CPlzF/JQN4Yt8wAwBXgv6aagYibpRqGUbfLzN1rGDLyjRJjtpyT9FhhFuqHcHThK0o8Lfnt163adpMOAS4DngXfafqREVqbu82QWYdyaye8kLQf8F3A96cv+ZYcydh1kn4GOZzIAto+T9CZgPWCJlvFTS+QB90raBrCkxYDDyC7KQl6X73xnAieQ7i6/YPuSoQqQdAHpf7QMcIukycAL1X7b7y3U7cfAOcBK+Y56T+ArnQhoWS+9XNJ/kb7HVt2K7siB5/Nazx2SPk2ana9UKAvgSklfIrmJ3wl8ErigC3lTJP0PyWXV+nmHfB7bng5Ml/Rr0sx+XdL3fLvtF0sVs/320mP7I2lX4EBgTeA0YEvbD0takvS76Mi41aVby2+iYkngSeBESd38Juo+T2YRbsmGoxTpt4TtJ0daFwBJXwP+g2Tcfg/sDFxle89CeSsAxwI7kC44lwCHd+Fem257I0k7AZ8CvgqcbHvIQTR5HWFAbF9ZoluWvS6wPemzXtbpbETS5YOr5qI78uxuuhVYDvgmyZV1jO1JhfIWAj4K7Ej6rBfbPqFEVpZ3cpth2z6wQNa7SWs8f8u6rQEcUjoTlrQyyT23iu2dJa0HbG37xAJZpwK/tP2nNvu2t91R8FZdug3Xb6LdeUL6/F0bpjBuDSXPZlanZXZdMjuStCzJtfTWPHQl8I1SYylpBrARcEM2IiuTTsbBZoqDyRtj+1/9xtawfXehvGqd8ljgCtvntK5xdShrDeBB28/n7VHAyrb/3qGc0dnVNKbd/v6fvxeQdHh2ZQ86NhLk9cpdbN+Zt9cELrS9bqG8P5DWxb6cfxOLkH4fXa8bdctw6JZ/81WE5GTbD3chayng+Rz8QnbbL2772VKZFREt2UBUb5TeSSQX3bj8eIp0spfynO1XgJey++9h4A1dyLsgywEgR4h145aYKukSUuDAxUphxaXrWmf1O/blPNYpv650I6359H/uGEnfya7rant5Sd8qkHOBpPMHepToltm/zdhHSoVJWkfSZZJuytsbSurIpdvCw5Vhy9xFOo9LWcH2BPK5YvslCtaNASRtJek6SU9LelHSyyqPCq1Vt6zfOGAyKYJ7HHCtpCKvTeYy0tpixSjgj13Im0WsuTWTzakvSm9N2+9v2f56jlAqZUq+qI4nXZyfBq7tQt53SAbuPcBY4FRg3y7kfRTYGLjL9rOSXkWKOithkda1mLzwvVinQmzvkp87CgqaCzvb/lLLezye3W2dXvA7TgsZDEn7AB8E1uhnHJchBUaVcgJwFDlk3PaNee2sY4MO3Czp98AE0jrSXsB1ykFYBZG1z+TzrAoU2oq0HlXCT0jpAGeRrgP7AWsVyqpbN4AvA1tUs7UcFPVHUjh/CUvYfrrasP10Xl/smjBuzaTOKL3nJG1n+ypI+VH0RWOW8GnSxWtl4J3AqqSoqSJsXyhpUdJa2zLAbrbv6ELeK5LuBtaRtMRcDxicRyS91/b5AJLeR5edh5VyhFZjdnfzHOsrQ2BhSYvbfiHLHQV0XImlm/XDAbiGdN6uQMqprJgJ3NiF3CVtT9bsUbUvFcpaAngIqNaRHgHGkIKwSoKtPgucD6wp6WpgRZLBLML2nZIWzq66kyVdUyqrbt2Ahfq5IR+jOw/gM5I2rQKhlHJVu7k+zSKMWzNZgfqi9D5BCpFfNm8/ThfuIVLS8SvAO2x/Q9KTJMPUUZUCSccxe/TVaJJ76NAcfXVYiXKSPgYcTkoGnQZsRapQUhJo8XHgDEk/IS1230u6ky5C0vdIyfm30OcaMlBi3E4HLsuBFiZF2J1SoNMMBgm9tr1hJ/JyGsE9pGT1Onk0r41VM5A9Kbz5s106kx+Im0mGcizpPLmd8gv+s9k7ME3SMaTP2E31njp1A7hI0sXAmXn7A6TAslKOAM6S9EDefk2W2TURUN
JABopM6jJKb3SW0Y3/vrZEZKWk6wHJeWsl+s0gGdpJtjdWik78uu3iH4xSEr1ckETbT87twIbVbKtblJKSq8jLS2xfXCBj0Mo32ViV6DaTPqO5GLAo8Izt0QMfNai8N5Bc4duQbtDuBvYt0U/S60gh9dtmHa8iRejeV6jb9f2jcduNDVHWaqT1v0VJuXfLAj/rt0Y4Irq1HL8HKR5AwJ9sn1MqK8tblD7je5vtf3cjryJmbg2kTleRpO+QQrqfyNvLA0faLl2MryURudR4DYHnbT8viey2u03S2BJBSmkY7ydHrVYuMdvfGOSwwbiLdNGqxbjl0PWuErlLjdcQ5M5WH1DSbqRKFF2I9A45um4h2zOVollLOJkU5FO55z6Ux97ZiRBJrwZeSy75Rl9VnNEUlnxr+T6eA75eImO4dGvhauDfpGvA5G4E5fW1zwKr2T5IqQzcWNu/61LHmLk1kbzoexzwRtJd78IU3vWqTRh8N3dukvYluQ02JbnB9gS+YrujKEJJE2yPG8gt1qk7rEXuOaQAkiNIrsjHgUVtv7tA1kWkxff+Ja5+MOBB7eVULtjXktIoLmN2d3PHLti6zhFJV9nert9MC9LF0KUzrQHea5LtrQqPbTcDmWp7swJZ02xvPLexIcipreRb3e7hOnXrJ3ccqbjEFVnmW4CjbBcFlCgl5k8F9rP9prx2/JdOv4t2xMytmbSLmFq7UFYtgQcVts+QNJU+d9huLiuLdHh+3qVUl1aUc+Ns756HjlZKeF4WuKhQ7Otsv6sG9apw/6mkxf1WSu8ua4mqs71dfq61Yr9mL/+2EEnHjj9rdiuvDyzbT+ZoWirkdMijkj5E37rRPhREcrrekm/V7+BT+fm0/Lwv0HHOV826tVJ3tOSatj+Qo2yx/Zz6RQ2VEsatodQYMVVL4EE/3W4DbutSxoP5uS632ERgM0mX2d4+y+7WvXuNpA1sz+hGSOWC1QCJzV3IrTOqrkqgXZnZIzn/USiuNan/JeDvpJqOnTKWdOFfrp/MmcBBhbodSLo5+CHpN3FNHiul65Jv1e9A0ra2t23Z9YUc5VjqCu9at37UHS35Yr7hrpY51qQmt30Yt2ZSW8SU7WMk3UhfeatvlgQe1E0bN9isXZS5wxZSKg22jqTP9t9p+78L1NwO+IhSasELLboVuUxJic39K3R8pM3YUKg1qk7SoaRKNg/Rt4ZqoPSzLkQK0mhd6/0BHRoR2+cB50l6a/+UCaW0lo7JBru0FmI7DrR9rFLJt5VIbvGTSVHEnbKUZk/d2YbuoiXr1A3qj5b8Gsmz8npJZ5CCfD7ShbxZhHFrJh8mXRw+TYqYej0psKFj8qL7FbYvytujJK3uDktI1U3dbjCSi2430jldl+yd6xCigRObR1Oe2FzbOZI5HBjrwpqebdiwMmwwK8m84xJoLfyINOto5bg2YwOiOdNPZqM0/YS+9ax3k+qYTu/CtXYgaRa+LEnXJ+luVlmnbtg+StL7SUZIwPhuoiVtXyrpelLKjkg3RF3lklaEcWsgLa665+kiYipzFn1tL6CvhFRX3XObhu3bge8p9YKa7U5SA9R0HIrY7jUDhiGxueZzBFIOX53FuReStLztx2HWd9Dx9UbS1qTzd8V+M/LRpCCaTqjWPrclFf6uerrtxewtcDqlKvm2BvBFFZZ8UyoivJZTDcjRpIC/br+TWnRrJa/hdbWOJ2ndHMlc3ZxUOYurSlrV5d0t+t4joiWbR3a3HM2clSw6ruE4QGRYUYPM+QFJFwLvc6qhV4VEX1gYVVdFsIkUvLAGqT3K+l3oV0vR2brOkRaDsT5pfetCZo/kLHHnImk/UnfriaT/4Tjg27ZPG/TAOeW8jdSF4uPA8S27ZgIXuKCaTQ402rHKp1KukOPC9jDZKFUl355QKnf1Wtsd37hI+pPtt879lfNWt7qXESSNt31w/i7aRekWdbdoJWZuzeRE2nRZLqT2ElIN51xgYnadvJ4Unfi5EkHuVz
k932UeUqqYpL1ItRyvIP2Ij5NUGkZd1zlSuXD/kR+L5UdX2D5V0hRSOoaAPWzfUiDnSlLPr1/VGHy0CulzV90Yls5jHVHNPkjGA+ANNQT6XSrpc8zZObuoc4RTObqXgLcqdQSo6Mi41b2MYPvg/Oe7ST3ctiMZuT8DP6/jPWLm1kAkXWv7zTXJWhM4g5RjBcn99GHbf6tDfhOR9CngXaTk60NsdxVF2E92NzmC00mdi2cLoy6ZRdd5jjQZST+yfYTmbJYJlJWkk3QAadZb9cZ7G3C0Oyws0Gb2oVYdS2YfOXipPy7x2mR5J5GCgm6mJVDIBX3whgNJE0idSs7IQ/sAy9ke17XsMG7NQ9J3SesJdXVZrq2EVFPptx4jUsDFDOAGKHOv9ZO5ELAZMMb2ToU6zmidDWaX0fT+M8Qhyqr1HJF0KbBXv+jG35R+1rqQtJntqXk2c12/3aNtF7VHkrQK6Ry5lVSx44H+0ZgdyBoHXOTUs++rpCCXb9axbtQtkm6xvd5I6zEQ7ZZI6lo2CbdkM6nuyFt7uJmC4r/q16xUUlfNShtMf7fJOQOMdyqzuvt7idRnrpuF9DrDqGs7RzIrtoluXKlQVm3YrgI9Pkjq5j0DZkWgHkFB7z/VW1wbUoWeCZK2I5Xw+gHJtdbxzFp95ahWzbPCtUlRrKXlqP4iab0Sl/A84gZJWzl3fJf0ZlJ5r66JmVuPI+m3pBY6lcvlw8BGtvcY+KgAQNIWwJeYvSN6N3lutRedrQulqjO75xywqoDvOaUu2LpRKpw8kVSxYztSRZZdSm7SVHNxbeUSd5L+PzDD9q9V3v291nJUkt5KugH4J/XkataKpFtJgUxVsYBVSbPpV+hSz5i5NRSl5p3r01JiyGUFe+tuVtpo8jrW55nzf1dyV346KRjlJroMn26hq6Kzkj5k+3S1SVSH8uhGUlmlq/LMHtJM/+BBXj9PsX2XpL1JAUP3kqIdS/t+1VZcO3O/pF+QCiV8T6ngdmnVjrrLUZ1En4u+rnO4Tuoob9eWMG4NRNLxpHWAtwO/JBUnLq2+XXez0qZzBinSbBdS+Pj+pGaUJTxSuqbTDs1ZdLYkWrKqVlF39NpFORq0Sqb9TF3JtN2gOQsKjyGtNV6r1Pev5M7+PqVu8ueSohMfBx6YyzGDMY50kf5+Drd/DalreAl1l6P6RxUp3URqjICdg3BLNhBJN9resOV5aeBs2zsWyNoIOJVUQBhSlfz9S3Jw5geUK8VX/7s8dqXttj3y5iJre1L0Vv8q/qUV1WuLlhwOVF+X8NrQMPWba5H/NnJxbdsvdiOrDiTtSJpFr0cqkbUtcIDtywc9cGB5PyPV5byAGs7h+YmYuTWTamb1bI7qeoyUQNwRORpvrPsqHnTdrHQ+oGp0+GB27T5AChwo4QBgXVIPttZ6i6UXhtqKzuY1qGNJMy2TAiI+Y/uuQnlVl/DZQsYp6xJeG8N5Z5/l19Y7sQ5sX5LXP+sqRzWKZNRab4y7OYfnG8K4NZPfZbfJMfSVBfplp0JyAuengQkLgFGr+FaOED2SVHtwNCnZuYSNSsL0B6FdtGRps9FfAz8FqhY/e2e5pblvu5FuhGqpyB6Uob6uFhe2GesY2wfUptx8RrglG0j2uX+C1AhwVta+7ecLZH2VNBOspeLBgoSkE4Af1hlGrdmLzhZHS7ZL4lZ3zUD/QMpze7rk+KA7JC1BWme/nFRurLVz9h9sv7ELuR9lzgCrRiRxDydh3BpIztqfSYrWgy6y9nPFg3aVHYoqHjQdSeuQcoxWzqHUGwLvtf2tAlm3AmsCdbW8qeSOZvZ1rY5vNHIS9xPAb0jf7wdITWh/WiIzp4zU0iU86Bylvn5HkMqA3U+fcXsKOMH2TwrlnkXqvfhBUk+4fYFbbRf3EZxfCOPWQOrM2s+zwP61247vIoy60eRQ9qOAX1R5RpJusv2mAlltgxlK14EkHUK6wDxHWteqjGVJQezWMk3Vj7i6IHYsU9L+7c
bdYUmqoDskHWr7uBrlVTl4VXDaoqRk+K4LEzedWHNrJnVm7Z9Cuvv7cd7eJ491XbutoSxpe3K/1KCXSgQNQzDD54D1awqx/3/UWPIpjFhj+KekZWzPlPQV0vf6rdLvlb4AqyckvYmUzL16DXo2njBuDaIlp2dRYD9J/8jbqwGl6z5j+834Ls8h6b3Kozk3qMoT2pO+XlEjzd+AZ2uSVUvJJ0kTbI9rk08GUJpHFpTzVdtn5e91J1IXiaJSXpnxSnVCv0LqkLE08NVaNG04YdyaxS7DIHPYarc1lE8B44F1Jd1PWi/bd2RVmsUXgWskXUv361pVm5v3kNzM50k6ukBOtfYyHOde0Dmt3+vPu/heK04jdWhfnb4SfCt3IW++IYxbg6jTDTZMs8BGk/P6Nre9g6SlSHllTeqC8Avgf6mnFFItJZ9sP5ifhzWfLBgydZbyAjiP1GF9Kt1VOpnviICSHmW4Kzs0FdXcybhOJF1je5uaZC1JKvk0w/YdueTTBrYvKZS3B/A9YCVSYEpRh+WgO4bhey0KpuoFwrgFPUWT8/okfRu4hzlLITVBtzuBXW3fOtK6LOjk9ba1bZ+cS7QtbbtdE9OhyBoPHOfcKmhBIoxb0FM0Oa9vgPD9puh2te1tR1qPBR1JXyP16Btre51cfu+sTr+blmWJRYC1gbtoYMub4STW3IJeYz3a5PWNqEZ9tA3fH0mFsjsSYIpSL7FzWcAK7DaM3YFNgOsBbD8gqaQDxAIfIBTGLeg1mpzXV1vH5hrZNT+blKawwBXYbRgv2rakKpVlqbkd0I5eXVPvhDBuQa/R5Ly+usL3a6MqrCvpFFIF+ify9vIk4xvMWybkaMnlJB0EHAicMMI6zZeEcQt6jSbn9dUd5l0nG1aGDcD245I2GUmFFlBeAP5I8j6MBf7T9qUjq9L8SRi3oCeYT/L66uzYXDcLSVre9uMAksYQ14eRYGVSYv31wEkkQxcUENGSQU+woOb11YWk/UgVVCaSbgrGAd+2fdqIKrYAolQYdUdSs9zNgQnAibb/NqKKzWfEnVnQE4Tx6g7bp0qaAryDFC6+R5197IKhkwNK/kkqcvwSsDwwUdKltj8/strNP8TMLQiCoCFIOgzYH3gU+CW7fRWcAAAAXElEQVRwru1/59Jyd9hec0QVnI+ImVsQBEFzWIE0a57NE2H7FUkLfO5aJ8TMLQiCIOg5mhKGHARBEAS1EcYtCIIg6DnCuAVBEAQ9Rxi3IAiCoOcI4xYEQRD0HP8HvfdQ5I85VVoAAAAASUVORK5CYII=\n", | |
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Perplexity: -1.46\n",
"Topic 0: ['kitten', 'cute', 'chinchilla']\n",
"Topic 1: ['broccoli', 'banana', 'munch']\n",
"I like to eat broccoli and bananas. ['(0, 8.3%)', '(1, 91.7%)']\n",
"I munched a banana and spinach smoothie for breakfast. ['(0, 6.9%)', '(1, 93.1%)']\n",
"Chinchillas and kittens are cute. ['(0, 86.8%)', '(1, 13.2%)']\n",
"My sister adopted a kitten yesterday. ['(0, 89.3%)', '(1, 10.7%)']\n",
"Look at this cute hamster munching on a piece of broccoli. ['(0, 22.9%)', '(1, 77.1%)']\n"
]
},
{
"data": {
"text/plain": [
"<gensim.models.ldamodel.LdaModel at 0x2b8c7c4aef0>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"apriori_opposite = {\n",
" 'cute':0, 'kitten':0,\n",
" 'banana':1, 'broccoli':1, 'munch':1\n",
"}\n",
"eta = create_eta(apriori_opposite, dictionary, 2)\n",
"test_eta(eta, dictionary, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Success! We've pushed the model in the opposite direction: terms that were previously assigned to topic 0 are now assigned to topic 1, and vice versa. However, the model seems to have struggled a bit with this. The split is not as clear-cut: the single-theme sentences now score between 86.8% and 93.1% for their topic, compared with 95.2% to 97.0% in the previous run. What if we push a little harder?"
]
},
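Reversing the seed assignments, as `apriori_opposite` does, only relabels the two topics in the prior. The sketch below makes this concrete. It uses a hypothetical column-normalized prior as a stand-in for `create_eta` (the helper names and the `boost` weight are assumptions). When the same words are seeded with swapped labels, the two rows of the prior simply trade places. So any difference in how cleanly the topics separate presumably comes from the training process (for example the random initialization), unless other settings such as an asymmetric `alpha` treat the topics differently.

```python
def column_normalized_prior(seeds, words, num_topics, boost=1000.0):
    """Hypothetical stand-in for create_eta: seeded terms get `boost` weight
    on their topic, then each term's column is normalized to sum to 1."""
    eta = [[1.0] * len(words) for _ in range(num_topics)]
    for col, word in enumerate(words):
        if word in seeds:
            eta[seeds[word]][col] = boost
    for col in range(len(words)):
        total = sum(row[col] for row in eta)
        for row in eta:
            row[col] /= total
    return eta


def flip_seeds(seeds):
    """Swap the topic labels 0 and 1 of a two-topic seed dictionary."""
    return {word: 1 - topic for word, topic in seeds.items()}


words = ['banana', 'broccoli', 'eat', 'munch', 'cute', 'kitten']
apriori_original = {'banana': 0, 'broccoli': 0, 'munch': 0,
                    'cute': 1, 'kitten': 1}
apriori_opposite = flip_seeds(apriori_original)

eta_original = column_normalized_prior(apriori_original, words, 2)
eta_opposite = column_normalized_prior(apriori_opposite, words, 2)
# Same seeded words with reversed labels: the two topic rows trade places.
print(eta_opposite == eta_original[::-1])
```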
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbQAAAByCAYAAAA78iRbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztnXe4XGW59n83zYQSBCkqSDFAkC5NmngEREFQQYggCAIfYEOwHo+iYsHvk6OfIioaEA5NNKA0UaQcRAEpSSSE5kc7SBEVpIQucH9/vO9KZk9m7+x519qZ2TvP77rmmpk1s+55ZmbNPOt936fINkEQBEEw2lmk1wYEQRAEQROEQwuCIAjGBOHQgiAIgjFBOLQgCIJgTBAOLQiCIBgThEMLgiAIxgTh0IIgCIIxQTi0IAiCYEwQDi0IgiAYE4RDC4IgCMYEi/XagIWJJfQKj2OpXpvRmaXG99qCoXn62V5bMDh9/tm99NqXG9N6w/jHGtMaCWb9c8Vem7DAWGz8i702YVAWvfP5RvVm89gjtuf75YZDW4CMYynepB17bUZnNtqo1xYMzXU399qCwenzz+6JLz3TmNZ1m5zbmNZIMPFnH+q1CQuMFdd9pNcmDMqyu97VqN7lPve+4TwvphyDIAiCMUE4tCAIgmBMEA4tCIIgGBOEQwuCIAjGBI07NEmvknRTvjws6cGW+0t0qXWqpEldPP9oSXdJukPSTt1bP1/9T0oa17RuEARBUJ/GoxxtPwpsAiDpGOAp298q1DpouM+VtBGwJ7Ae8DrgEkmTbDcXswyfBE4BnmtQMwiCIGiABTrlKOmzkm7JlyPytrUk3SrpDEmzJE2VND4/drWkyjm+U9IMSTMlXdpB/t3A2bZfsH038Bdgs/nYs46k/86aMyStIWknSee3POdHkvaX9AlgJeAPki7Pj+0i6Y95359L6tMksyAIgrHPAnNokrYE9gO2BLYGPpJHVZBGVT+wvSFp9HN4276vBk4E9rC9MbBPh5dYBbi/5f4DeRuSfitppQ77nA18J2tuA/x9MPttfyc//mbbO2W9zwE72t4UuBk4coiPIAiCIBhBFuQI7c3AL2w/Y3s2cD6wXX7sXtvX5dtntmyv2Bq40vZ9ALb/2UFfHbY5P//ttgc4K0nLASvYvig/5znb3WSgbkNyxNdKuonkrNeYxyjpMEnTJE37F81mzwdBEARzWZCVQjo5nArP5746bGvnAdLaWcWqwEPz2aeT5osMdPSDBYEIuMT2B4Z8AXsKMAVggpaf33sIgiAIClmQI7TfA3tIGi9padKa1x/yY2tK2iLf3he4um3fa4AdJK0OIGn5DvoXAvtKWkLSRGB1YPpgxth+DHhE0u5Zc5ykJYH7gPWzznLADi27zQaWybevBd4i6fV5/6UkrT3/jyEIgiAYCRaYQ7N9A2nN6kbgOuBE27Pyw7cCh0q6GViKPKJp2fdvwIeBCyTNBM7qoD+TNI15O/Br4CNVhOMQa2j7AZ/Kr3s1sKLte7POLOB0YEbL86cAl0u6PNt0CPDzbNO1wDpdfixBEARBQ8ju7SyYpLWAc21v0lNDFgATtLz7tjjxVv1dYLevixP3+WcXxYnHJgtZceLptjef3/OiUkgQBEEwJuh5+xjbd5ETsYMgCIKglBihBUEQBGOCcGhBEATBmKDnU44LE6/Z8Bm+cOFNvTajI4eetXWvTRgSH7lxY1onv+m0xrSg/z+7NXZtLqDm7X2+OrD2ZrN7bcICw9ObDbxoknvPbu73CsA+wwtGihFaEARBMCYIhxYEQRCMCcKhBUEQBGOCcGhBEATBmCAcWhAEQTAmGNKhSXqVpJvy5WFJD7bcX6KbF5J0qqRJw3zuSpJ+J+lpSd/t5nUWFK3NR4MgCILeM2TYvu1HyVU8JB0DPGX7WyUvZPugLp7+DPAF4I3AWiWvFwRBECxcFE85SvqspFvy5Yi8bS1Jt0
o6Q9IsSVMljc+PzRnRSHqnpBmSZkq6tF3b9lO2ryF1rx6uPWdK+oGkKyXdLWl7SadJukPST/JzFpP0eMs++0g6uWX/4yVdK+keSXu0PO/z+f3MlHRsy8vuI+kGSX+WtE13n2AQBEHQJEWJ1ZK2JLVe2RJYFLhB0lWkkdV6wCG2r5N0OnA48N2WfV8NnAi82fZ9g/Q2G+q1TwWOt90pQ3lZ22+V9F7gIlKn6zuAGZI2yLeHYiVgW2BDYCpwXu6Xtguwpe1n2+yV7S0lvQv4EvCObt5LEARB0BylI7Q3A7+w/Yzt2aT+Ydvlx+61fV2+fWbL9oqtgStt3wdg+5/dvLDtgwZxZpCcGKReZg/Zvi33RLsNWGMY8uc7cTOwSt62E3CK7Wc72PvLfD19MH1Jh0maJmnaE4++NAwTgiAIghJKHZqGeKy9wVr7fXXY1hTP5+uXW25X9xfL1622jxtkf1qeN5S91fNfYpDRru0ptje3vfmyr1p0aOuDIAiCYkod2u+BPSSNl7Q08G7gD/mxNSVtkW/vS+oE3co1wA6SVgfodsqxDnm09piktSUtAuwxv32AS4FDWtYCF5i9QRAEwfApcmi2bwDOBm4ErgNOtD0rP3wrcKikm4GlgClt+/4N+DBwgaSZwFmdXkPSA8BxJGfyQBXyn8P/64TL/ztwCXAF8MD8nmz7V/n50yTdBHyixmsHQRAEI8Swg0JsH9N2/ziSw2nnJduHddh/u5bbFwMXz+f1Vh1ke8fwf9v7t9we0DS07bGfAz8fav98f+mW28cCx7Y93vp+HibSC4IgCHpKVAoJgiAIxgSN9kNrHxkFQRAEwYIiRmhBEATBmCAcWhAEQTAmCIcWBEEQjAlkj1SOc9COpH8A9w3jqSsAjzT0sk1qLWx6/Wxb03r9bFvTev1sW9N6/WxbN3qr215xfk8Kh9aHSJpme/N+01rY9PrZtqb1+tm2pvX62bam9frZtpHQiynHIAiCYEwQDi0IgiAYE4RD60+mzP8pPdFa2PT62bam9frZtqb1+tm2pvX62bbG9WINLQiCIBgTxAgtCIIgGBOEQwuCIAjGBOHQglGNpFcMZ1sQBGOfcGh9hKR3SvqspC9Vly73Xzdfb9rpUsOuK4azrUB3qboawB+HuW3YSNpG0vslHVBdamhJ0v7VdylpNUlb1rGvn5G0uqSd8u3xkpbptU0AkvYezrZhai0qqbG+iJJ2yw2Hm9Bq1LasuUaHbVvM+8yuNEfkOAmH1idI+hHwPuAIQMDewOpdynwqX3+7w+VbBTaNyx26V5C0nKTl82UN4LXd6rXobiPpNuD2fH9jST/sUuPVkjYDxkt6Y4vj/jdgyRq2nUH6rLYDtsiXOomfPwS2JnVvB5gN/KDQtm0lXSbp/0m6R9K9ku4pNSx3bj9X0m1Z756aeocC5wI/zptWBc6vobenpDslPSHpSUmzJT1ZKPcfw9w2X2y/BLy70I5O7APcKek4SW+oIzQCtgH8UtIq1R1JbwFOKRVr+jgZoB1Rjv2BpJttb9RyvTTwS9s799CmI4GjSM7rQZKjBXgSOMn29wt1rwf2Ai60/ca87RbbG3ShcSDwQZKzubHFttnAf9n+ZaFttwPruaEfhqQZtjeV9KeW9zrT9sYFWneQOqZPB16qttt+tNC2q4EvA98BdgcOIv0nfLlQ7yZgS+D6lvc6y/aGhXp3Abvbvr1k/6yxC7ArMJmBjX0nkL7notGypGOBZbPm09V22zMK9SaQTnoOAgycCpxte3Yf2LYF6cRsd2BT4Buk7+X+Qr1Gj5NWGu2HFtTi2Xz9jKTXAo8Ca3YjIGnPoR7v9k/e9vHA8ZKOsH1CN/sOQ/t+Sa2bXhrsuYPsfxpwmqT32v5Fg6bdArwa+GtDev+StCjpTwpJKwIvF2o9Yfs3DdkFMN72FZJk+z7gGEl/IDm5Ep63/UL1vUpajPy+C/lbHWeWeQiYBryLdCJQMZt0clDKNvn6qy3bDOxQImb7SU
m/AMaTTiL3AD4j6XsFv72mbbtR0seBS4HngLfZ/keJVqbp42QO4dD6h19JeiXwn8AM0hd8cpcauw/xmIGiUYvtEyRtAKwHjGvZfnqJHnC/pG0AS1oC+Dh5+rGAVfPZ7WzgJNIZ5OdsX9qNiKSLSJ/RMsBtkm4Anq8et/2uQvu+B5wHrJTPnPcCju7Stmr980pJ/0n6HlttKzrzBp7Lazd3SvoYaRS+UqEWwFWSPk+aBn4b8BHgohp60yT9nDQd1fp+h30c254JzJT0U9Iofl3S9/xn2y+UGmb7raX7tiNpd+BgYCJwBrCl7b9LWpL0u+jKoTVlW8tvomJJ4AngJ5Lq/CaaPk7mEFOOfYhSlN4420/02hYASV8G/o3k0H4N7AJcbXuvQr0VgOOBnUh/MpcCR5ZMnVXTd5LeDnwU+CJwqu2ugmDyusCg2L6qW9tatNcFdiS91yu6HXVIunJo01x05p2nkm4HXgl8jTRNdZzt6wr1FgEOAXYmvdff2j6pRCvrndphs20fXKC1K2nN5u5s25rA4aUjXkkrk6beXmt7F0nrAVvb/kmB1unAybZ/3+GxHW13FYDVlG0j9ZvodJyQ3n9tZxQOrY/Io5Y1aBk5l4yCJC1LmjbaPm+6CvhqqYOUNAvYGPhTdh4rkw7AoUaEQ+ktb/ufbdvWtH1vgVa15ng88Dvb57WuVxXorQn81fZz+f54YGXb/9OlzoQ8jbR8p8fb3/9YQNKReZp6yG29IK8/7mb7rnx/InCx7XUL9X5DWuf6Qv5NLEb6fdReB6rLSNiWf/NVZOMNtv9eQ2sp4LkcwEKekn+F7WdKNSsiyrFPULPRdaeQpuAm58uTpAO8lGdtvwy8mKf3/g68vobeRVkHgBzZVTrlMF3SpaSF/98qhf+WrlEBnNO2/0t5W7f8tLKPtIbTft01kr6Rp6Wr+8tJ+nqBzkWSLhzsUmJb5sAO2z5YKiZpHUlXSLol399IUlfTtS38vXJmmXtIx3EpK9ieSj5WbL9Il+vAFZK2knSjpKckvSDpJZVHczZqW7ZvMnADKfJ6MnC9pKLZmcwVpLXCivHA5TX05hBraP3D5jQXXTfR9ntb7n8lRxaVMi3/kU4h/SE/BVxfQ+8bJKf2TmAScDqwX6HWIcAmwD22n5H0KlKkWCmLta6t5MXrJboVsb1bvu4qsGc+7GL78y2v8VieSuv2T77rFI6hkLQv8H5gzTaHuAwpuKmUk4DPkMO7bd+c18K6duLArZJ+DUwlrQvtDdyoHEhVEBX7dD7WqmCfrUjrSyV8nxS6fw7pf+AAYK1CraZtA/gCsEU1KsuBTZeTQu9LGGf7qeqO7afyemFtwqH1D01G1z0raTvbV0PKX2JuFGUJHyP9Ya0MvA1YjRTtVITtiyUtTlo7WwZ4j+07C7VelnQvsI6kcfPdYf78Q9K7bF8IIOnd1OzQq5TDszoDp5LnWS8ZBotKeoXt57PueKDrqih11gMH4VrScbsCKeexYjZwcw3dJW3foIHRsC8Wao0D/gZU60L/AJYnBVKVBEx9ErgQmCjpGmBFkpMswvZdkhbN03CnSrq2VKtp24BF2qYYH6Xe7N7TkjatgpmU8knr/D/NIRxa/7ACzUXXfZgU0r5svv8YNaZ+SInALwM72P6qpCdIzqiragGSTmBg1NQE0tTPETlq6uPdGibpfwFHkpIzbwK2IlUKKQqUAD4EnCXp+6QF6/tJZ8xFSPomKWH+NuZO+xgocWhnAlfkYAmTIuNOK7BpFkOESdveqBu9HPJ/HymBvEkeyWtd1UhjLwpP+GzXGbV34laSc5xEOk7+TPmf/DN5FuAmSceR3mOdKjpN2gZwiaTfAmfn++8jBYeVchRwjqSH8v3XZM3aRFBInzBYRFHN6LoJWaPOfHxjycFKydCDknPLurVtFsmxXmd7E6WIwq/YrvUDUUpslwsSW9t0/gxsVI2q6qKUKFxFTF5q+7cFGkNWoMkOqsS22cx1lEsAiwNP25
4w+F5D6r2eNM29Demk7F5gvxL7JK1KCn/fNtt4NSmy9oFC22a0R9J22jZMrdVJ63mLk3LjlgV+2Lbm1xPbWvbfk7S+L+D3ts8r1cp6izPX4d5h+1919CpihNYnNDkNJOkbpPDrx/P95YBP2S5dUG8kObjEYQ2D52w/J4k8HXeHpEmlYkopE+8lR5tW0122vzrEbkNxD+mPqhGHlsPMayVXlzqsYegOqMcn6T2kihA1JL1TjopbxPZspSjUEk4lBepUU2/7521v60ZE0quBVcgl15hboWYChSXXWr6PZ4GvlGiMlG0tXAP8i/QfcEMdobxe9klgdduHKpVgm2T7VzVtjBFav5AXbk8A3kA6u12UwrNbdQhbr3OGJmk/0pTApqQprr2Ao213Ff0naartyYNNeXU71ZU1zyMFgRxFmmZ8DFjc9q7damW9S0gL6O3lpb496E6ddarp1VVIKQ9XMHAquWR6tZFjRNLVtrdrG1FB+gN06YhqkNe6zvZWhft2GmlMt71ZgdZNtjeZ37Zh6DRWcq3pqd8mbWvTnUwq+PC7rPlm4DO2i4JClJLlpwMH2N4grwX/sdvvohMxQusfOkU6rV2o1UjwQIXtsyRNZ+5U13tcVpLoyHy9W6ktFcp5a7b3yJuOUUpAXha4pIb0qrbfUdc+5obmTyct0LdSehbZSDSc7e3ydaOV8DWw9NoiJBu7fq952nh9YNk2zQm0VKrpkkck7c/cdaB9KYjAdLMl16rfwUfz9Rn5ej+g65yshm1rpekox4m235ejY7H9rNoif0oJh9ZHNBjp1EjwQJttdwB31NT4a75uYsrrXGAzSVfY3jHrNjFte62kDW3PqiNSTa9qkGTjGrpNRsNVSa0rMzAC8y+Fcq2J9i8C/0Oqodgtk0h/9q9s05wNHFpo28GkE4LvkH4T1+ZtpdQuuVb9DiRta3vbloc+l6MTS6e5GykH10LTUY4v5JPsagljIg1NyYdD6x8ai3SyfZykm5lbWuprJcEDTdNhimvOQ3Q/1bWIUkmudSR9sv1B2/+30MztgA8qpQI832Jb19OhmQNJZb5a+WCHbcOh0Wg4SUeQKsr8jblrogZK3+sipECL1rXbb9Ol47B9AXCBpO3b0xuUUlC6Jjvp0tqDnTjY9vFKJddWIk17n0qK/u2WpTQwzWYb6kU5NmkbNB/l+GXSLMrrJJ1FCtT5YA29OYRD6x8+QPpD+Bgp0ul1pOCErskL57+zfUm+P17SGu6yfFPTNDzFtQ/wHtIx3KTuLk2IaPBk4wmUJxs3doxkjgQmubD9TAc2qpwZzEn8LipBlvkuaXTRygkdtg2K5k0VGUDJWmYlna93JdUOnVlj2uxg0mh7WZKtT1Bv9Nikbdj+jKT3khyPgCl1ohxtXyZpBinFRqSToFq5nhXh0PqElmm456gR6ZQ5h7ktJGBu+aZaXWb7Cdt/Br6p1EdpwNmiBqmfOFzpepbNofFk44aPEUg5dk0WwF5E0nK2H4M530PX/zGStiYdvyu2jb4nkAJhuqFay9yWVFy76om2NwPbyXRLVXJtTeA/VFhyTalQ71pONRcnkAL16n4njdjWSl6Tq7UuJ2ndHIVcnZBUOYWrSVrN5V0j5r5GRDn2B3kq5RjmrSjRdc3EQSK6ippK9juSLgbe7VSvrgpdvrgkEi7vX0WeiRSAsCap1cj6NWxspLBrU8dIi5NYn7RedTEDIzCLpmslHUDqAn0u6TOcDBxr+4whd5xX5y2k7g4fAn7U8tBs4CIXVJXJAUM7V/lOypVqXNhqJTuiquTa40qlplax3fXJiqTf295+/s9csLY1vESApCm2D8vfRafo2tJiCHOIEVr/8BM6dCMupPHyTX3M+cC5eUrkdaSIwk+XirmtInk+mzy8VE/S3qTaib8j/XBPkFQa8tzUMVJN0f4lX5bIl1rYPl3SNFL6hIA9bd9WoHMVqWfWfzUUQASp6/oyQNXlYOm8rSuqUQbJYQC8voEAvcskfZ
p5O0wXdWRwKgf3IrC9UqX9iq4cWtNRsLYPyzd3JfVA247k2P4AnNjEa8QIrU+QdL3tNzWkNRE4i5QDBWlq6QO2725Cv9+Q9FHgHaRk6MNt14r866BfJ4dvJqnD74CQ55LRcpPHSD8j6bu2j9K8DSaBsnJwkg4ijW6r3nJvAY5xl8n+HUYZarWxZJSRA5DaccnsTNY7hRTYcystwT4u6CM3EkiaSuoAclbetC/wStuTa2uHQ+sPJP0f0vpAU92IGyvf1I+0ra2IFDAxC/gT1Jo2a9VdBNgMWN722wv1ZrWO+vJ00Mz2keAwtRo9RiRdBuzdFpX4s9L32hSSNrM9PY9abmx7eILtolZDkl5LOk5uJ1XOeKg9irILrcnAJU49775IClT5WhPrQHWRdJvt9Xptx2B0Wv5oakkkphz7h+rMu7UHmikosqu2Bp+SajX47FPap0POG2R7iW51lvciqU9bncXwJkOeGztGMit2iEpcqVCrMWxXwRrvJ3W9ngVzIkePoqB3npovYn207amStiOVz/o2adqs6xG05paCWi2P/tYmRZ+WloL6o6T1SqZ7FxB/krSVc2d0SW8ildaqTYzQxiCSfkFqR1NNp3wA2Nj2noPvFQBI2gL4PAM7h9fJQ2u8sGtTKFV/2SPnaFVFcs8rnV5tGqXixOeSKmdsR6qMslvJiZkaLmKtXF5O0v8GZtn+qQo7pavhUlCStic5/YdpJpeyUSTdTgpGqhL4VyONml+mpp0xQusjlBperk9LeR+XFcVtusFn35LXpD7LvJ9b6Zn3maSgkluoGercQq3CrpL2t32mOiSQQ60k8i8AV+cRPKQR/WFDPH+BYvseSfuQAn/uJ0UplvbNarSINfCgpB+Tihd8U6modWn1jKZLQZ3C3Cn4po7hJmmitFxHwqH1CZJ+RJrXfytwMqkAcGlV66YbfPYzZ5Giw3YjhXkfSGreWMo/StdoOqF5C7uWRDlWVSOajjq7JEdxVgmun2gqwbUOmrdo7/KktcPrlfrmlZzBP6DUdf18UlThY8BD89lnKCaT/pi/lUPjX0Pqrl1C06Wg/lJFOPcjDUauzkNMOfYJkm62vVHL9dLAL23vXKC1MXA6qVAvpAr0B5bkyPQ7ytXXq88tb7vKdsf+csPQ25EUddVeHb+0UnljUY4jgZrrpt0YGqF+bS36byEXsbb9Qh2tJpC0M2m0vB6pPNW2wEG2rxxyx8H1fkiqg3kRDRzDo4kYofUP1QjqmRyN9SgpqbcrchTdJM+tPFC7wWefUzUG/Guesn2ItPBfykHAuqQeZq31DUv/DBor7JrXlI4njahMCmr4hO17CvWqbtoDwrsp66bdGCN5Bp/1G+s92AS2L83rmU2VghpPcmStJ8N1juFRQzi0/uFXeUrkOOaW5Dm5W5GcVPkxYOoYd2QVX89RnZ8i1fmbQEo+LmXjkpD6IegU5VjaoPOnwA+AqmXOPlm3NDftPaSTn0YqnQdlaG7HiIs7bOsa2wc1ZtwoI6Yc+4Q8h/5hUvO8Odnztp8r0PoiacTXSOWBhQlJJwHfaTLkWQMLuxZHOXZKrFa9Bpq/IeWhPVWyf1APSeNI6+ZXkkp9tXaY/o3tN9TQPYR5A6X6IrF6JAmH1ifk7PnZpCg7qJE9nysPdKqwUFR5oJ+RtA4p/2flHPK8EfAu218v1LsdmAg01T6m0p3AwHWqrk8ucmL148DPSN/v+0iNW39QopnTOxrpph10j1JfvKNIJbgeZK5DexI4yfb3C3XPIfUufD+pp9p+wO22i/vwjRbCofUJTWbP59Fee620H9UIee5bcsj5Z4AfVzlAkm6xvUGhXseAhNJ1HUmHk/5UniWtU1UOsqTodGuJpOqHW/0Jdq0p6cBO291lOaigHpKOsH1Cg3pVjlwVYLY4KUG9dvHffifW0PqHJrPnTyOd5X0v3983b6tdK60PWdL2DW1pOy+Wio1AQMKngfUbCof/dxostxSOq294WNIytmdLOp
r0vX699HtlbqDU45I2ICVYr9GAnX1POLQe05JzszhwgKS/5PurA6XrOJPaRnZX5vDxscgjOW+nyuHZi7l9lvqBu4FnGtJqpNySpKm2J3fI9wIozfMKyvmi7XPy9/p2UneGojJamSlKdTmPJnWfWBr4YiOW9jnh0HrPbiOgOWK10vqQjwJTgHUlPUha+9qvtyYN4D+AayVdT/11qqplzDtJU8gXSDqmQKdaSxmJYy/ontbv9cQa32vFGaRO5mswt/zdyjX0Rg3h0HpMk1NcIzTa61tyzt3mtneStBQp56vfOgv8GPhvmilD1Ei5Jdt/zdcjmu8VDJsmy2gBXEDqRD6dehVHRh0RFDKGGOkKC/2IGu722zSSrrW9TUNaS5LKLc2yfWcut7Sh7UsL9fYEvgmsRAouKepEHNRjBL7X4qCo0U44tGBU0+85d5KOBe5j3jJEPbdP0l3A7rZv77UtCzt5/Wxt26fm8mhL2+7U+HM4WlOAE5zb7ixMhEMLRjX9nnM3SKh9X9gn6Rrb2/bajoUdSV8m9bibZHudXPrunG6/m5Ylh8WAtYF76MP2MSNJrKEFo5316JBz11OLBtIx1L6XBuWpRoBpSr24zmchK2LbZ+wBvBGYAWD7IUklnRUW+iCfcGjBaKffc+4a62zcILvna5NSCha6IrZ9xgu2LalKPVlqfjt0YiyukXdLOLRgtNPvOXdNhdo3RlW8VtJppMruj+f7y5EcbrBgmZqjHF8p6VDgYOCkHts0KgmHFox2+j3nrumQ7CbZqHJmALYfk/TGXhq0kPI8cDlppmES8CXbl/XWpNFJOLRgVDKKcu6a7GzcNItIWs72YwCSlif+E3rByqRk9xnAKSTnFhQQUY7BqGRhzLlrGkkHkCqZnEs6GZgMHGv7jJ4athCiVIx0Z1KD2c2BqcBPbN/dU8NGGXE2FoxKwmHVx/bpkqYBO5BCu/dssg9cMHxyUMjDpELCLwLLAedKusz2Z3tr3eghRmhBEAQ9RNLHgQOBR0hd6s+3/a9c2u1O2xN7auAoIkZoQRAEvWUF0uh4wKyD7ZclLfS5Zd0QI7QgCIJgTNAv4cNBEARBUItwaEEQBMGYIBxaEARBMCYIhxYEQRCMCcKhBUEQBGNd2GEOAAAAB0lEQVSC/w/x5UM6vOC6HwAAAABJRU5ErkJggg==\n", | |
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Perplexity: -1.17\n",
"Topic 0: ['kitten', 'cute', 'look']\n",
"Topic 1: ['munch', 'banana', 'broccoli']\n",
"I like to eat broccoli and bananas. ['(0, 4.1%)', '(1, 95.9%)']\n",
"I munched a banana and spinach smoothie for breakfast. ['(0, 3.3%)', '(1, 96.7%)']\n",
"Chinchillas and kittens are cute. ['(0, 94.6%)', '(1, 5.4%)']\n",
"My sister adopted a kitten yesterday. ['(0, 95.8%)', '(1, 4.2%)']\n",
"Look at this cute hamster munching on a piece of broccoli. ['(0, 50.0%)', '(1, 50.0%)']\n"
]
},
{
"data": {
"text/plain": [
"<gensim.models.ldamodel.LdaModel at 0x2b8c7cefda0>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"apriori_harder = {\n",
" 'cute':0, 'kitten':0, 'hamster':0, 'chinchilla':0, 'look':0,\n",
" 'banana':1, 'broccoli':1, 'piece':1, 'breakfast':1, 'munch':1\n",
"}\n",
"eta = create_eta(apriori_harder, dictionary, 2)\n",
"test_eta(eta, dictionary, 2)"
]
},
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
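    "Clearly, marking additional terms as associated with particular topics has made the model more polarized: the first four sentences are now each assigned to a single topic with around 95% probability, while the hamster sentence, which contains three seed words from each topic (`cute`, `hamster`, `look` and `munch`, `piece`, `broccoli`), is split evenly."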
] | |
}, | |
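  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to quantify this polarization is the Shannon entropy of each document's topic distribution. An even split between two topics has an entropy of 1 bit, and a confident assignment is close to 0 bits. The sketch below is plain Python. `topic_entropy` is a hypothetical helper, and the probabilities are copied by hand from the printed output of the `apriori_harder` model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "def topic_entropy(probs):\n",
    "    # Shannon entropy (in bits) of one document's topic distribution\n",
    "    return -sum(p * math.log2(p) for p in probs if p > 0)\n",
    "\n",
    "# Document-topic probabilities as printed for the apriori_harder model\n",
    "doc_topics = {\n",
    "    'I like to eat broccoli and bananas.': [0.041, 0.959],\n",
    "    'I munched a banana and spinach smoothie for breakfast.': [0.033, 0.967],\n",
    "    'Chinchillas and kittens are cute.': [0.946, 0.054],\n",
    "    'My sister adopted a kitten yesterday.': [0.958, 0.042],\n",
    "    'Look at this cute hamster munching on a piece of broccoli.': [0.5, 0.5],\n",
    "}\n",
    "\n",
    "for doc, probs in doc_topics.items():\n",
    "    print(f'{topic_entropy(probs):.2f} bits  {doc}')\n",
    "\n",
    "mean_entropy = sum(topic_entropy(p) for p in doc_topics.values()) / len(doc_topics)\n",
    "print(f'Mean entropy: {mean_entropy:.2f} bits')"
   ]
  },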
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
    "We have seen that by providing a prior term-topic distribution (`eta`) to the model, we can guide LDA towards a useful topic model."
] | |
}, | |
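  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the whole recipe can be condensed into a single self-contained sketch. It assumes that the five sentences have already been lemmatized into the token lists shown. It also builds the prior with NumPy directly. Every term starts with equal weight on both topics, seed words get a large weight on their topic (1e7 here, an arbitrary choice), and each column is then normalized to sum to 1. This is one plausible way to construct `eta` and is not necessarily identical to the `create_eta` helper used above. The `passes` and `random_state` values are also illustrative, and the `demo_` names are there only to avoid overwriting the notebook's own variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from gensim.corpora import Dictionary\n",
    "from gensim.models import LdaModel\n",
    "\n",
    "# Lemmatized tokens for the five example sentences (assumed preprocessing)\n",
    "demo_docs = [\n",
    "    ['like', 'eat', 'broccoli', 'banana'],\n",
    "    ['munch', 'banana', 'spinach', 'smoothie', 'breakfast'],\n",
    "    ['chinchilla', 'cute', 'kitten'],\n",
    "    ['sister', 'adopt', 'kitten', 'yesterday'],\n",
    "    ['look', 'cute', 'hamster', 'munch', 'piece', 'broccoli'],\n",
    "]\n",
    "demo_seeds = {'cute': 0, 'kitten': 0, 'hamster': 0, 'chinchilla': 0, 'look': 0,\n",
    "              'banana': 1, 'broccoli': 1, 'piece': 1, 'breakfast': 1, 'munch': 1}\n",
    "demo_num_topics = 2\n",
    "\n",
    "demo_dict = Dictionary(demo_docs)\n",
    "demo_corpus = [demo_dict.doc2bow(doc) for doc in demo_docs]\n",
    "\n",
    "# eta: one row per topic, one column per dictionary ID\n",
    "demo_eta = np.ones((demo_num_topics, len(demo_dict)))\n",
    "for word, topic in demo_seeds.items():\n",
    "    if word in demo_dict.token2id:\n",
    "        demo_eta[topic, demo_dict.token2id[word]] = 1e7  # strong pull towards the seeded topic\n",
    "demo_eta /= demo_eta.sum(axis=0)  # each word's prior sums to 1 across topics\n",
    "\n",
    "demo_model = LdaModel(corpus=demo_corpus, id2word=demo_dict, num_topics=demo_num_topics,\n",
    "                      eta=demo_eta, passes=50, random_state=42)\n",
    "for topic_id in range(demo_num_topics):\n",
    "    print(topic_id, demo_model.show_topic(topic_id, topn=3))"
   ]
  },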
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
    "However, I have one last question. Since the LDA training algorithm is iterative, could the order of the words in the dictionary affect the result? Let's find out. First, let's look at the dictionary we have right now."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['banana', 'broccoli', 'eat', 'like', 'breakfast', 'munch', 'smoothie', 'spinach', 'chinchilla', 'cute', 'kitten', 'adopt', 'sister', 'yesterday', 'hamster', 'look', 'piece']\n" | |
] | |
} | |
], | |
"source": [ | |
"print([dictionary[w] for w in dictionary.keys()])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's create a new one with a different word ordering." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['adopt', 'banana', 'breakfast', 'broccoli', 'chinchilla', 'cute', 'eat', 'hamster', 'kitten', 'like', 'look', 'munch', 'piece', 'sister', 'smoothie', 'spinach', 'yesterday']\n" | |
] | |
} | |
], | |
"source": [ | |
"dictionary2 = gensim.corpora.Dictionary(\n", | |
" [['banana', 'broccoli', 'eat', 'like', 'breakfast', 'munch', 'smoothie', 'spinach', 'chinchilla',\n", | |
" 'cute', 'kitten', 'adopt', 'sister', 'yesterday', 'hamster', 'look', 'piece']]\n", | |
")\n", | |
"print([dictionary2[w] for w in dictionary2.keys()])" | |
] | |
}, | |
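  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The alphabetical order of `dictionary2` comes from how gensim's `Dictionary` assigns IDs. When a document is added, any previously unseen tokens receive new IDs in sorted order. That is why a single document containing every word gives an alphabetical dictionary. The original dictionary, by contrast, follows the order in which words first appear across the five sentences, sorted within each sentence (`banana`, `broccoli`, `eat`, `like` all come from the first one). A quick check on a made-up two-document corpus:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gensim.corpora import Dictionary\n",
    "\n",
    "# Made-up corpus: the second document introduces 'banana' and 'mango'\n",
    "toy_dict = Dictionary([['zebra', 'apple'], ['mango', 'apple', 'banana']])\n",
    "print(sorted(toy_dict.token2id.items(), key=lambda item: item[1]))\n",
    "# [('apple', 0), ('zebra', 1), ('banana', 2), ('mango', 3)]"
   ]
  },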
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbQAAAByCAYAAAA78iRbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztnXe4XGW59n83zYQSBCkqSDFAkC5NmngEREFQQYggCAIfYEOwHo+iYsHvk6OfIioaEA5NNKA0UaQcRAEpSSSE5kc7SBEVpIQucH9/vO9KZk9m7+x519qZ2TvP77rmmpk1s+55ZmbNPOt936fINkEQBEEw2lmk1wYEQRAEQROEQwuCIAjGBOHQgiAIgjFBOLQgCIJgTBAOLQiCIBgThEMLgiAIxgTh0IIgCIIxQTi0IAiCYEwQDi0IgiAYE4RDC4IgCMYEi/XagIWJJfQKj2OpXpvRmaXG99qCoXn62V5bMDh9/tm99NqXG9N6w/jHGtMaCWb9c8Vem7DAWGz8i702YVAWvfP5RvVm89gjtuf75YZDW4CMYynepB17bUZnNtqo1xYMzXU399qCwenzz+6JLz3TmNZ1m5zbmNZIMPFnH+q1CQuMFdd9pNcmDMqyu97VqN7lPve+4TwvphyDIAiCMUE4tCAIgmBMEA4tCIIgGBOEQwuCIAjGBI07NEmvknRTvjws6cGW+0t0qXWqpEldPP9oSXdJukPSTt1bP1/9T0oa17RuEARBUJ/GoxxtPwpsAiDpGOAp298q1DpouM+VtBGwJ7Ae8DrgEkmTbDcXswyfBE4BnmtQMwiCIGiABTrlKOmzkm7JlyPytrUk3SrpDEmzJE2VND4/drWkyjm+U9IMSTMlXdpB/t3A2bZfsH038Bdgs/nYs46k/86aMyStIWknSee3POdHkvaX9AlgJeAPki7Pj+0i6Y95359L6tMksyAIgrHPAnNokrYE9gO2BLYGPpJHVZBGVT+wvSFp9HN4276vBk4E9rC9MbBPh5dYBbi/5f4DeRuSfitppQ77nA18J2tuA/x9MPttfyc//mbbO2W9zwE72t4UuBk4coiPIAiCIBhBFuQI7c3AL2w/Y3s2cD6wXX7sXtvX5dtntmyv2Bq40vZ9ALb/2UFfHbY5P//ttgc4K0nLASvYvig/5znb3WSgbkNyxNdKuonkrNeYxyjpMEnTJE37F81mzwdBEARzWZCVQjo5nArP5746bGvnAdLaWcWqwEPz2aeT5osMdPSDBYEIuMT2B4Z8AXsKMAVggpaf33sIgiAIClmQI7TfA3tIGi9padKa1x/yY2tK2iLf3he4um3fa4AdJK0OIGn5DvoXAvtKWkLSRGB1YPpgxth+DHhE0u5Zc5ykJYH7gPWzznLADi27zQaWybevBd4i6fV5/6UkrT3/jyEIgiAYCRaYQ7N9A2nN6kbgOuBE27Pyw7cCh0q6GViKPKJp2fdvwIeBCyTNBM7qoD+TNI15O/Br4CNVhOMQa2j7AZ/Kr3s1sKLte7POLOB0YEbL86cAl0u6PNt0CPDzbNO1wDpdfixBEARBQ8ju7SyYpLWAc21v0lNDFgATtLz7tjjxVv1dYLevixP3+WcXxYnHJgtZceLptjef3/OiUkgQBEEwJuh5+xjbd5ETsYMgCIKglBihBUEQBGOCcGhBEATBmKDnU44LE6/Z8Bm+cOFNvTajI4eetXWvTRgSH7lxY1onv+m0xrSg/z+7NXZtLqDm7X2+OrD2ZrN7bcICw9ObDbxoknvPbu73CsA+wwtGihFaEARBMCYIhxYEQRCMCcKhBUEQBGOCcGhBEATBmCAcWhAEQTAmGNKhSXqVpJvy5WFJD7bcX6KbF5J0qqRJw3zuSpJ+J+lpSd/t5nUWFK3NR4MgCILeM2TYvu1HyVU8JB0DPGX7WyUvZPugLp7+DPAF4I3AWiWvFwRBECxcFE85SvqspFvy5Yi8bS1Jt0
o6Q9IsSVMljc+PzRnRSHqnpBmSZkq6tF3b9lO2ryF1rx6uPWdK+oGkKyXdLWl7SadJukPST/JzFpP0eMs++0g6uWX/4yVdK+keSXu0PO/z+f3MlHRsy8vuI+kGSX+WtE13n2AQBEHQJEWJ1ZK2JLVe2RJYFLhB0lWkkdV6wCG2r5N0OnA48N2WfV8NnAi82fZ9g/Q2G+q1TwWOt90pQ3lZ22+V9F7gIlKn6zuAGZI2yLeHYiVgW2BDYCpwXu6Xtguwpe1n2+yV7S0lvQv4EvCObt5LEARB0BylI7Q3A7+w/Yzt2aT+Ydvlx+61fV2+fWbL9oqtgStt3wdg+5/dvLDtgwZxZpCcGKReZg/Zvi33RLsNWGMY8uc7cTOwSt62E3CK7Wc72PvLfD19MH1Jh0maJmnaE4++NAwTgiAIghJKHZqGeKy9wVr7fXXY1hTP5+uXW25X9xfL1622jxtkf1qeN5S91fNfYpDRru0ptje3vfmyr1p0aOuDIAiCYkod2u+BPSSNl7Q08G7gD/mxNSVtkW/vS+oE3co1wA6SVgfodsqxDnm09piktSUtAuwxv32AS4FDWtYCF5i9QRAEwfApcmi2bwDOBm4ErgNOtD0rP3wrcKikm4GlgClt+/4N+DBwgaSZwFmdXkPSA8BxJGfyQBXyn8P/64TL/ztwCXAF8MD8nmz7V/n50yTdBHyixmsHQRAEI8Swg0JsH9N2/ziSw2nnJduHddh/u5bbFwMXz+f1Vh1ke8fwf9v7t9we0DS07bGfAz8fav98f+mW28cCx7Y93vp+HibSC4IgCHpKVAoJgiAIxgSN9kNrHxkFQRAEwYIiRmhBEATBmCAcWhAEQTAmCIcWBEEQjAlkj1SOc9COpH8A9w3jqSsAjzT0sk1qLWx6/Wxb03r9bFvTev1sW9N6/WxbN3qr215xfk8Kh9aHSJpme/N+01rY9PrZtqb1+tm2pvX62bam9frZtpHQiynHIAiCYEwQDi0IgiAYE4RD60+mzP8pPdFa2PT62bam9frZtqb1+tm2pvX62bbG9WINLQiCIBgTxAgtCIIgGBOEQwuCIAjGBOHQglGNpFcMZ1sQBGOfcGh9hKR3SvqspC9Vly73Xzdfb9rpUsOuK4azrUB3qboawB+HuW3YSNpG0vslHVBdamhJ0v7VdylpNUlb1rGvn5G0uqSd8u3xkpbptU0AkvYezrZhai0qqbG+iJJ2yw2Hm9Bq1LasuUaHbVvM+8yuNEfkOAmH1idI+hHwPuAIQMDewOpdynwqX3+7w+VbBTaNyx26V5C0nKTl82UN4LXd6rXobiPpNuD2fH9jST/sUuPVkjYDxkt6Y4vj/jdgyRq2nUH6rLYDtsiXOomfPwS2JnVvB5gN/KDQtm0lXSbp/0m6R9K9ku4pNSx3bj9X0m1Z756aeocC5wI/zptWBc6vobenpDslPSHpSUmzJT1ZKPcfw9w2X2y/BLy70I5O7APcKek4SW+oIzQCtgH8UtIq1R1JbwFOKRVr+jgZoB1Rjv2BpJttb9RyvTTwS9s799CmI4GjSM7rQZKjBXgSOMn29wt1rwf2Ai60/ca87RbbG3ShcSDwQZKzubHFttnAf9n+ZaFttwPruaEfhqQZtjeV9KeW9zrT9sYFWneQOqZPB16qttt+tNC2q4EvA98BdgcOIv0nfLlQ7yZgS+D6lvc6y/aGhXp3Abvbvr1k/6yxC7ArMJmBjX0nkL7notGypGOBZbPm09V22zMK9SaQTnoOAgycCpxte3Yf2LYF6cRsd2BT4Buk7+X+Qr1Gj5NWGu2HFtTi2Xz9jKTXAo8Ca3YjIGnPoR7v9k/e9vHA8ZKOsH1CN/sOQ/t+Sa2bXhrsuYPsfxpwmqT32v5Fg6bdArwa+GtDev+StCjpTwpJKwIvF2o9Yfs3DdkFMN72FZJk+z7gGEl/IDm5Ep63/UL1vUpajPy+C/lbHWeWeQiYBryLdCJQMZt0clDKNvn6qy3bDOxQImb7SU
m/AMaTTiL3AD4j6XsFv72mbbtR0seBS4HngLfZ/keJVqbp42QO4dD6h19JeiXwn8AM0hd8cpcauw/xmIGiUYvtEyRtAKwHjGvZfnqJHnC/pG0AS1oC+Dh5+rGAVfPZ7WzgJNIZ5OdsX9qNiKSLSJ/RMsBtkm4Anq8et/2uQvu+B5wHrJTPnPcCju7Stmr980pJ/0n6HlttKzrzBp7Lazd3SvoYaRS+UqEWwFWSPk+aBn4b8BHgohp60yT9nDQd1fp+h30c254JzJT0U9Iofl3S9/xn2y+UGmb7raX7tiNpd+BgYCJwBrCl7b9LWpL0u+jKoTVlW8tvomJJ4AngJ5Lq/CaaPk7mEFOOfYhSlN4420/02hYASV8G/o3k0H4N7AJcbXuvQr0VgOOBnUh/MpcCR5ZMnVXTd5LeDnwU+CJwqu2ugmDyusCg2L6qW9tatNcFdiS91yu6HXVIunJo01x05p2nkm4HXgl8jTRNdZzt6wr1FgEOAXYmvdff2j6pRCvrndphs20fXKC1K2nN5u5s25rA4aUjXkkrk6beXmt7F0nrAVvb/kmB1unAybZ/3+GxHW13FYDVlG0j9ZvodJyQ3n9tZxQOrY/Io5Y1aBk5l4yCJC1LmjbaPm+6CvhqqYOUNAvYGPhTdh4rkw7AoUaEQ+ktb/ufbdvWtH1vgVa15ng88Dvb57WuVxXorQn81fZz+f54YGXb/9OlzoQ8jbR8p8fb3/9YQNKReZp6yG29IK8/7mb7rnx/InCx7XUL9X5DWuf6Qv5NLEb6fdReB6rLSNiWf/NVZOMNtv9eQ2sp4LkcwEKekn+F7WdKNSsiyrFPULPRdaeQpuAm58uTpAO8lGdtvwy8mKf3/g68vobeRVkHgBzZVTrlMF3SpaSF/98qhf+WrlEBnNO2/0t5W7f8tLKPtIbTft01kr6Rp6Wr+8tJ+nqBzkWSLhzsUmJb5sAO2z5YKiZpHUlXSLol399IUlfTtS38vXJmmXtIx3EpK9ieSj5WbL9Il+vAFZK2knSjpKckvSDpJZVHczZqW7ZvMnADKfJ6MnC9pKLZmcwVpLXCivHA5TX05hBraP3D5jQXXTfR9ntb7n8lRxaVMi3/kU4h/SE/BVxfQ+8bJKf2TmAScDqwX6HWIcAmwD22n5H0KlKkWCmLta6t5MXrJboVsb1bvu4qsGc+7GL78y2v8VieSuv2T77rFI6hkLQv8H5gzTaHuAwpuKmUk4DPkMO7bd+c18K6duLArZJ+DUwlrQvtDdyoHEhVEBX7dD7WqmCfrUjrSyV8nxS6fw7pf+AAYK1CraZtA/gCsEU1KsuBTZeTQu9LGGf7qeqO7afyemFtwqH1D01G1z0raTvbV0PKX2JuFGUJHyP9Ya0MvA1YjRTtVITtiyUtTlo7WwZ4j+07C7VelnQvsI6kcfPdYf78Q9K7bF8IIOnd1OzQq5TDszoDp5LnWS8ZBotKeoXt57PueKDrqih11gMH4VrScbsCKeexYjZwcw3dJW3foIHRsC8Wao0D/gZU60L/AJYnBVKVBEx9ErgQmCjpGmBFkpMswvZdkhbN03CnSrq2VKtp24BF2qYYH6Xe7N7TkjatgpmU8knr/D/NIRxa/7ACzUXXfZgU0r5svv8YNaZ+SInALwM72P6qpCdIzqiragGSTmBg1NQE0tTPETlq6uPdGibpfwFHkpIzbwK2IlUKKQqUAD4EnCXp+6QF6/tJZ8xFSPomKWH+NuZO+xgocWhnAlfkYAmTIuNOK7BpFkOESdveqBu9HPJ/HymBvEkeyWtd1UhjLwpP+GzXGbV34laSc5xEOk7+TPmf/DN5FuAmSceR3mOdKjpN2gZwiaTfAmfn++8jBYeVchRwjqSH8v3XZM3aRFBInzBYRFHN6LoJWaPOfHxjycFKydCDknPLurVtFsmxXmd7E6WIwq/YrvUDUUpslwsSW9t0/gxsVI2q6qKUKFxFTF5q+7cFGkNWoMkOqsS22cx1lEsAiwNP25
4w+F5D6r2eNM29Demk7F5gvxL7JK1KCn/fNtt4NSmy9oFC22a0R9J22jZMrdVJ63mLk3LjlgV+2Lbm1xPbWvbfk7S+L+D3ts8r1cp6izPX4d5h+1919CpihNYnNDkNJOkbpPDrx/P95YBP2S5dUG8kObjEYQ2D52w/J4k8HXeHpEmlYkopE+8lR5tW0122vzrEbkNxD+mPqhGHlsPMayVXlzqsYegOqMcn6T2kihA1JL1TjopbxPZspSjUEk4lBepUU2/7521v60ZE0quBVcgl15hboWYChSXXWr6PZ4GvlGiMlG0tXAP8i/QfcEMdobxe9klgdduHKpVgm2T7VzVtjBFav5AXbk8A3kA6u12UwrNbdQhbr3OGJmk/0pTApqQprr2Ao213Ff0naartyYNNeXU71ZU1zyMFgRxFmmZ8DFjc9q7damW9S0gL6O3lpb496E6ddarp1VVIKQ9XMHAquWR6tZFjRNLVtrdrG1FB+gN06YhqkNe6zvZWhft2GmlMt71ZgdZNtjeZ37Zh6DRWcq3pqd8mbWvTnUwq+PC7rPlm4DO2i4JClJLlpwMH2N4grwX/sdvvohMxQusfOkU6rV2o1UjwQIXtsyRNZ+5U13tcVpLoyHy9W6ktFcp5a7b3yJuOUUpAXha4pIb0qrbfUdc+5obmTyct0LdSehbZSDSc7e3ydaOV8DWw9NoiJBu7fq952nh9YNk2zQm0VKrpkkck7c/cdaB9KYjAdLMl16rfwUfz9Rn5ej+g65yshm1rpekox4m235ejY7H9rNoif0oJh9ZHNBjp1EjwQJttdwB31NT4a75uYsrrXGAzSVfY3jHrNjFte62kDW3PqiNSTa9qkGTjGrpNRsNVSa0rMzAC8y+Fcq2J9i8C/0Oqodgtk0h/9q9s05wNHFpo28GkE4LvkH4T1+ZtpdQuuVb9DiRta3vbloc+l6MTS6e5GykH10LTUY4v5JPsagljIg1NyYdD6x8ai3SyfZykm5lbWuprJcEDTdNhimvOQ3Q/1bWIUkmudSR9sv1B2/+30MztgA8qpQI832Jb19OhmQNJZb5a+WCHbcOh0Wg4SUeQKsr8jblrogZK3+sipECL1rXbb9Ol47B9AXCBpO3b0xuUUlC6Jjvp0tqDnTjY9vFKJddWIk17n0qK/u2WpTQwzWYb6kU5NmkbNB/l+GXSLMrrJJ1FCtT5YA29OYRD6x8+QPpD+Bgp0ul1pOCErskL57+zfUm+P17SGu6yfFPTNDzFtQ/wHtIx3KTuLk2IaPBk4wmUJxs3doxkjgQmubD9TAc2qpwZzEn8LipBlvkuaXTRygkdtg2K5k0VGUDJWmYlna93JdUOnVlj2uxg0mh7WZKtT1Bv9Nikbdj+jKT3khyPgCl1ohxtXyZpBinFRqSToFq5nhXh0PqElmm456gR6ZQ5h7ktJGBu+aZaXWb7Cdt/Br6p1EdpwNmiBqmfOFzpepbNofFk44aPEUg5dk0WwF5E0nK2H4M530PX/zGStiYdvyu2jb4nkAJhuqFay9yWVFy76om2NwPbyXRLVXJtTeA/VFhyTalQ71pONRcnkAL16n4njdjWSl6Tq7UuJ2ndHIVcnZBUOYWrSVrN5V0j5r5GRDn2B3kq5RjmrSjRdc3EQSK6ippK9juSLgbe7VSvrgpdvrgkEi7vX0WeiRSAsCap1cj6NWxspLBrU8dIi5NYn7RedTEDIzCLpmslHUDqAn0u6TOcDBxr+4whd5xX5y2k7g4fAn7U8tBs4CIXVJXJAUM7V/lOypVqXNhqJTuiquTa40qlplax3fXJiqTf295+/s9csLY1vESApCm2D8vfRafo2tJiCHOIEVr/8BM6dCMupPHyTX3M+cC5eUrkdaSIwk+XirmtInk+mzy8VE/S3qTaib8j/XBPkFQa8tzUMVJN0f4lX5bIl1rYPl3SNFL6hIA9bd9WoHMVqWfWfzUUQASp6/oyQNXlYOm8rSuqUQbJYQC8voEAvcskfZ
p5O0wXdWRwKgf3IrC9UqX9iq4cWtNRsLYPyzd3JfVA247k2P4AnNjEa8QIrU+QdL3tNzWkNRE4i5QDBWlq6QO2725Cv9+Q9FHgHaRk6MNt14r866BfJ4dvJqnD74CQ55LRcpPHSD8j6bu2j9K8DSaBsnJwkg4ijW6r3nJvAY5xl8n+HUYZarWxZJSRA5DaccnsTNY7hRTYcystwT4u6CM3EkiaSuoAclbetC/wStuTa2uHQ+sPJP0f0vpAU92IGyvf1I+0ra2IFDAxC/gT1Jo2a9VdBNgMWN722wv1ZrWO+vJ00Mz2keAwtRo9RiRdBuzdFpX4s9L32hSSNrM9PY9abmx7eILtolZDkl5LOk5uJ1XOeKg9irILrcnAJU49775IClT5WhPrQHWRdJvt9Xptx2B0Wv5oakkkphz7h+rMu7UHmikosqu2Bp+SajX47FPap0POG2R7iW51lvciqU9bncXwJkOeGztGMit2iEpcqVCrMWxXwRrvJ3W9ngVzIkePoqB3npovYn207amStiOVz/o2adqs6xG05paCWi2P/tYmRZ+WloL6o6T1SqZ7FxB/krSVc2d0SW8ildaqTYzQxiCSfkFqR1NNp3wA2Nj2noPvFQBI2gL4PAM7h9fJQ2u8sGtTKFV/2SPnaFVFcs8rnV5tGqXixOeSKmdsR6qMslvJiZkaLmKtXF5O0v8GZtn+qQo7pavhUlCStic5/YdpJpeyUSTdTgpGqhL4VyONml+mpp0xQusjlBperk9LeR+XFcVtusFn35LXpD7LvJ9b6Zn3maSgkluoGercQq3CrpL2t32mOiSQQ60k8i8AV+cRPKQR/WFDPH+BYvseSfuQAn/uJ0UplvbNarSINfCgpB+Tihd8U6modWn1jKZLQZ3C3Cn4po7hJmmitFxHwqH1CZJ+RJrXfytwMqkAcGlV66YbfPYzZ5Giw3YjhXkfSGreWMo/StdoOqF5C7uWRDlWVSOajjq7JEdxVgmun2gqwbUOmrdo7/KktcPrlfrmlZzBP6DUdf18UlThY8BD89lnKCaT/pi/lUPjX0Pqrl1C06Wg/lJFOPcjDUauzkNMOfYJkm62vVHL9dLAL23vXKC1MXA6qVAvpAr0B5bkyPQ7ytXXq88tb7vKdsf+csPQ25EUddVeHb+0UnljUY4jgZrrpt0YGqF+bS36byEXsbb9Qh2tJpC0M2m0vB6pPNW2wEG2rxxyx8H1fkiqg3kRDRzDo4kYofUP1QjqmRyN9SgpqbcrchTdJM+tPFC7wWefUzUG/Guesn2ItPBfykHAuqQeZq31DUv/DBor7JrXlI4njahMCmr4hO17CvWqbtoDwrsp66bdGCN5Bp/1G+s92AS2L83rmU2VghpPcmStJ8N1juFRQzi0/uFXeUrkOOaW5Dm5W5GcVPkxYOoYd2QVX89RnZ8i1fmbQEo+LmXjkpD6IegU5VjaoPOnwA+AqmXOPlm3NDftPaSTn0YqnQdlaG7HiIs7bOsa2wc1ZtwoI6Yc+4Q8h/5hUvO8Odnztp8r0PoiacTXSOWBhQlJJwHfaTLkWQMLuxZHOXZKrFa9Bpq/IeWhPVWyf1APSeNI6+ZXkkp9tXaY/o3tN9TQPYR5A6X6IrF6JAmH1ifk7PnZpCg7qJE9nysPdKqwUFR5oJ+RtA4p/2flHPK8EfAu218v1LsdmAg01T6m0p3AwHWqrk8ucmL148DPSN/v+0iNW39QopnTOxrpph10j1JfvKNIJbgeZK5DexI4yfb3C3XPIfUufD+pp9p+wO22i/vwjRbCofUJTWbP59Fee620H9UIee5bcsj5Z4AfVzlAkm6xvUGhXseAhNJ1HUmHk/5UniWtU1UOsqTodGuJpOqHW/0Jdq0p6cBO291lOaigHpKOsH1Cg3pVjlwVYLY4KUG9dvHffifW0PqHJrPnTyOd5X0v3983b6tdK60PWdL2DW1pOy+Wio1AQMKngfUbCof/dxostxSOq294WNIytmdLOp
r0vX699HtlbqDU45I2ICVYr9GAnX1POLQe05JzszhwgKS/5PurA6XrOJPaRnZX5vDxscgjOW+nyuHZi7l9lvqBu4FnGtJqpNySpKm2J3fI9wIozfMKyvmi7XPy9/p2UneGojJamSlKdTmPJnWfWBr4YiOW9jnh0HrPbiOgOWK10vqQjwJTgHUlPUha+9qvtyYN4D+AayVdT/11qqplzDtJU8gXSDqmQKdaSxmJYy/ontbv9cQa32vFGaRO5mswt/zdyjX0Rg3h0HpMk1NcIzTa61tyzt3mtneStBQp56vfOgv8GPhvmilD1Ei5Jdt/zdcjmu8VDJsmy2gBXEDqRD6dehVHRh0RFDKGGOkKC/2IGu722zSSrrW9TUNaS5LKLc2yfWcut7Sh7UsL9fYEvgmsRAouKepEHNRjBL7X4qCo0U44tGBU0+85d5KOBe5j3jJEPbdP0l3A7rZv77UtCzt5/Wxt26fm8mhL2+7U+HM4WlOAE5zb7ixMhEMLRjX9nnM3SKh9X9gn6Rrb2/bajoUdSV8m9bibZHudXPrunG6/m5Ylh8WAtYF76MP2MSNJrKEFo5316JBz11OLBtIx1L6XBuWpRoBpSr24zmchK2LbZ+wBvBGYAWD7IUklnRUW+iCfcGjBaKffc+4a62zcILvna5NSCha6IrZ9xgu2LalKPVlqfjt0YiyukXdLOLRgtNPvOXdNhdo3RlW8VtJppMruj+f7y5EcbrBgmZqjHF8p6VDgYOCkHts0KgmHFox2+j3nrumQ7CbZqHJmALYfk/TGXhq0kPI8cDlppmES8CXbl/XWpNFJOLRgVDKKcu6a7GzcNItIWs72YwCSlif+E3rByqRk9xnAKSTnFhQQUY7BqGRhzLlrGkkHkCqZnEs6GZgMHGv7jJ4athCiVIx0Z1KD2c2BqcBPbN/dU8NGGXE2FoxKwmHVx/bpkqYBO5BCu/dssg9cMHxyUMjDpELCLwLLAedKusz2Z3tr3eghRmhBEAQ9RNLHgQOBR0hd6s+3/a9c2u1O2xN7auAoIkZoQRAEvWUF0uh4wKyD7ZclLfS5Zd0QI7QgCIJgTNAv4cNBEARBUItwaEEQBMGYIBxaEARBMCYIhxYEQRCMCcKhBUEQBGNd2GEOAAAAB0lEQVSC/w/x5UM6vOC6HwAAAABJRU5ErkJggg==\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Perplexity: -1.17\n", | |
"Topic 0: ['kitten', 'cute', 'look']\n", | |
"Topic 1: ['munch', 'banana', 'broccoli']\n", | |
"I like to eat broccoli and bananas. ['(0, 4.1%)', '(1, 95.9%)']\n", | |
"I munched a banana and spinach smoothie for breakfast. ['(0, 3.3%)', '(1, 96.7%)']\n", | |
"Chinchillas and kittens are cute. ['(0, 94.6%)', '(1, 5.4%)']\n", | |
"My sister adopted a kitten yesterday. ['(0, 95.8%)', '(1, 4.2%)']\n", | |
"Look at this cute hamster munching on a piece of broccoli. ['(0, 50.0%)', '(1, 50.0%)']\n" | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbcAAAByCAYAAADQxZ9YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztnXn8rVPZ/98f8zEckaFHhAxHCiEyNTylZCrEiYhUaBKl+jWoNP6ePPWTJDrEg1QORUnJ8DQYMh05jiERyVhIHHOOz++Ptbazv/vs77DXvb6+2z7X+/Xar7332vv+3Nfe+973da9rXetask0QBEEQDBILTLQBQRAEQVCbcG5BEATBwBHOLQiCIBg4wrkFQRAEA0c4tyAIgmDgCOcWBEEQDBzh3IIgCIKBI5xbEARBMHCEcwuCIAgGjnBuQRAEwcCx0EQbML+y4JJLeKFllq2mt+idj1bT6nfmrLVoNa2XTXqwmhbAn//ywqp6c1Z6pqpezc/b7591wZufrKpXkydXXqKq3qIP1v3uavLkMnX7UE/deef9tpcf7X3h3CaIhZZZlpUOObia3pofu6yaVr/z0LfXrKZ12SvPqKYFsM0ue1fVe+jzj1XVq/l5+/2zLr3dLVX1anLLIZtV1Vtzet3vria3TF28qt5fP/rx28fyvghLBkEQBANHOLcgCIJg4AjnFgRBEAwc4dyCIAiCgWNcnZukF0q6Jt/ulXRX2/NFetQ6UdKUHt5/qKRbJP1J0ta9Wz+q/sckLVZbNwiCIGjOuGZL2n4AeCWApMOAR2x/o1Br37G+V9L6wC7AusAqwLmSptiumS/7MeAE4ImKmkEQBEEFJiwsKemTkq7LtwNz25qSrpd0iqRZkqZLmpRfu1hSy1FuL+lqSTMlnddF/m3Aj2w/ZfsvwN+AjUexZ21J/5s1r5a0mqStJZ3V9p5jJe0l6aPACsBFki7Ir20r6Q9529Mk1Z3IEgRBEIyZCXFukjYF9gQ2BTYHPph7W5B6W0fbXo/UKzqgY9sXAccAO9veANi9yy5eDNzR9vzO3IakX0taocs2PwKOyJpbAP8Yzn7bR+TXX2N766z3KeCNtjcCrgUOGuErCIIgCMaRieq5vQb4ie3HbM8GzgK2yq/dZrs1I/kHbe0tNgd+Y/t2ANv/7KKvLm3O79/G9hDHJWkZYDnbZ+f3PGG7l1mRW5Cc8qWSriE57tXmMUraX9JVkq6a8+j8U1EkCILguWaiKpR0cz4tPMpzdWnr5E7SWFuLlYG7R9mmm+bTDL0AGC6BRMC5tt814g7sacA0gEVXWWW0zxAEQRAUMlE9t98DO0uaJGlJ0hjZRfm11SVtkh/vAVzcse0lwBskrQogqVuBxp8De0haRNIawKrAjOGMsf0gcL+kHbPmYpIWB24HXp51lgHe0LbZbGCp/PhS4HWSXpq3X0LSWqN/DUEQBMF4MCHOzfYVpDGuK4HLgGNsz8ovXw/sJ+laYAlyT6dt278DHwB+JmkmcGoX/ZmkUOeNwC+BD7YyJUcYc9sTOCTv92Jgedu3ZZ1ZwMnA1W3vnwZcIOmCbNN7gdOyTZcCa/f4tQRBEASVeM7CkrYP63h+OHB4l7fOsb1/l+23ant8DnDOKPv7EvClLu3bDPP+m4DXd2k/BDikS/sRwBFtz88Hzh/JpiAIguC5ISqUBEEQBANHXy15Y/sW8qTvIAiCICglem5BEATBwBHOLQiCIBg4+iosOT+x6D/nsNaps6vpfebWa6ppve/yfappARz/6pOq6n31pfW0tqkcBf/rl+tWXVttu2ur6tX8vP3+WT/bx/+Jtb5R778PcNtOk6vq1WStUx+uqvfXMb4vem5BEATBwBHOLQiCIBg4wrkFQRAEA0c4tyAIgmDgCOcWBEEQDBwjOjdJL5R0Tb7dK+mutueL9LIjSSdKmjLG964g6beSHpX0rTFu8wNJO/ViUxAEQTCYjD
gVwPYD5Iohkg4DHrH9jZId2d63h7c/BnwW2BBYs2R/QRAEwfxLcVhS0iclXZdvB+a2NSVdL+kUSbMkTZc0Kb92saSWo9xe0tWSZko6r1Pb9iO2LyGtxN0L20i6SNKfJW2b97VGbvujpBmSXp3bt5Z0oaSfSrpJ0sltn+2Lkq7Mn+1YSWr7DP8l6Yq8zRYj7SMIgiCYGIqcm6RNSUvEbEpaGfuDktbPL68LHG17PZJzOqBj2xcBxwA7294A2L3HfZ/YcpJdWAV4HbAjME3SosA9wJtsb5ht/nbb+zcCPpRtfpmkzXL7kbY3AdYDlgbe0m6C7U2BTwCfz20j7SMIgiB4jintub0G+Intx2zPJq151lqS5jbbl+XHP2hrb7E58BvbtwPY/mcvO7a9r+3hSg9Mt/1MXr7mDmAtYFHg+5KuA35McmQtLrN9j+05wDXAarn9jZKuAGaSnOXL27b5ab6f0fb+kfbxLJL2l3SVpKv+/fRjY/7MQRAEQW+UOjeN8JpHea4ubbXotu9DSI5uPVJPc9G2159sezwHWCivwP0dUs9yfeAEYLEu28xh7pjlSPuYa4w9zfarbL9q4YUW7/GjBUEQBGOl1Ln9HthZ0iRJSwJvAy7Kr60uaZP8eA/SqtbtXAK8QdKqAJKWLbShG7spsTYpRHkzKax4j20D+zCyYwaYBDwD3C9pKeDtY9hvr/sIgiAIxpEi52b7CuBHwJXAZcAxtmfll68H9pN0LbAEMK1j278DHwB+JmkmcGq3fUi6k7RS93sl3dmaRjDKmNstJMd7NrC/7adIvbD3SboMWJWhvbVun+0B4CTgOuBM4PKR3p/paR9BEATB+DLmVQFsH9bx/HCS8+lkju39u2y/Vdvjc4BzRtnfysO0d51SYHuvYdpvIoULWxya2y8ALmh73/vbHn8K+NQon+Fe8jSF4fYRBEEQTAxRoSQIgiAYOKqu52b7Fqi8QFYQBEEQ9Ej03IIgCIKBI5xbEARBMHCEcwuCIAgGDqWpWcFzjaT7gNvH8NblgPsr7rqmXj/bVluvn22rrdfPtvW7Xj/bVltvomxb1fbyo70pnFufI+kq26/qR71+tq22Xj/bVluvn23rd71+tq22Xj/bBhGWDIIgCAaQcG5BEATBwBHOrf+ZNvpbJkyvn22rrdfPttXW62fb+l2vn22rrdfPtsWYWxAEQTB4RM8tCIIgGDjCuQVBEAQDRzi3oCckzbMQa7e2IAiCiSScWx8iabextPWgt72kT0r6fOvWwLw/jLFtNJvWyfcbdbs1sK8qklaVtHV+PCkvYFuqdeFY2saotaCkj5ba8lwhaYlKOqt1adtk3neOWW8LSe+UtHfr1tC+KsdJ7d81L968V+s/L+klkjatpd8ESTtIGjcfFM6tP/n0GNtGRdKxwDuAA0krhO9GWlC1V50XSdoYmCRpwzZH9Hpg8QLTDsn33+xy+0aBHpLWknSGpBsk3dq6lWhlvf2AM4Dv5aaVgbMKdBbLK84vJ2kZScvm22rASiW22Z4DvK1k22Fs3EXSzZIekvSwpNmSHm6gt4WkG4Ab8/MNJH23gYk/lfTiNv3XAScU2nYK6RjbCtgk34onD9c6TqD+7wp8F9gc2CM/nw0cXSomaUtJ50v6c/5/3dbgP7Y7cLOkwyW9rNSm4YhsyT5C0rbAdsBU4LS2lyYD69ru+YpL0rW212+7XxL4qe0396izD/Bu0kngSpKjhPRn+R/bP+3VttpIuhj4AnAEsCOwL+kY/0Kh3jXApsDltjfMbbNsrzfylvPoHAQcTHJkdzH3u3sYOM72dwrt+yqwNOlYebTVbvvqAq1bgB1t31hiSxe9y4FdgZ+3fXfX2X5Fod4mpBP1jsBGwNeyvXcUaN1I+j9VOfnVOk7a9Gr+rlfb3kjSH9tsm2l7g0Lb/gR8FJgBzGmz7YFCvckkx7svYOBE4Ee2Z5fotVN1PbegMXcDVwFvJR08LWaTDqgSHs/3j0laCXgAWL1XEd
snASdJervtnxTa8iySdhllfyXOcpLtCyXJ9u3AYZIuIjm8Ep60/ZSUfJGkhUh/wJ6wfSRwpKQDbR9VaEs3tsj3X2rfHfCGAq2/13Jszxpi39H67jJzhnvvGLSulPQR4DzgCeBNtu8rlLsOeBFwT6k9HVQ5Ttqo+bv+W9KCLXskLQ8808C2h2z/qsH2Q7D9sKSfAJNIF4A7A5+Q9O2m/5Vwbn2E7ZnATEk/JF3dr0M6KG+y/VSh7C8kvQD4b+DqrHd8AzNXzldbs4HjSFfRn7J9Xo86O47wmoES5/ZEjuHfLOnDpF7SCgU6LX4n6TOkUOybgA8CZ5eK2T5K0iuAdYHF2tpPLtT7z1JbunCVpNNI4bQn2/ZR2iO/Q9IWgCUtAnyEHKLsBUlnM9RRLA48BHxfErbfWqC1FHCDpCsY+lnHrNVB7eOk5u/6beBMYIXcI9wVOLRXEc0dB/+NpP8m/T/bv7uSXuWOwHuANYBTgE1t/0PS4qRjpZFzi7BkHyJpO1L8/i8kJ7c6cEDTKyalrMbFbD/UQGOm7Q0kbQN8CPgccKLtCU8CyaGrG4EXAF8mhXYOt31Zod4CwHuBN5N+h1/bPq6BfV8AXk9ybr8EtgUutr1rod6KpPDcSra3lbQusLnt7xdondil2bbfU2jbcsCRwNak7+484KBew1d5bG1YbP9uIrQ6dOc5ToDjS8OeNX/XrLcO8MZs24UlPXRJvxnhZdvuuVcp6WTS9/T7Lq+90XZRstWzGuHc+o8c197B9i35+RrAObbXKdTbAliNtp56aW+hbezuSOC3ts9sj+cX6C1NChu+Njf9DvhSEwdcC0kH5ZDiiG096M0CNgD+mC8QViT9uUfqxY6k9yvSGMVns95CWbtorKcmkpa1/c+OttVt39ZAc0VS8gfAFbb/UaizOnCP7Sfy80nAirb/Wqi3BPBETgYhhwEXtf1YoV7j31XS5BzyW7bb652/zSAS2ZL9yT9aji1zK1D6R66aGQbMkHQeKfHl10opz01i+CeQQpxT8+1h0h97zEg6W9LPh7s1sG2fLm3vbqD3uO1ngKdzaPcfwEsb6C1nezr5+7f9NIXjWpLWlnShpOvy8/Ul9Ry+auPs/Blb+i+jQahO0lTgClK271TgcklFPV7gdIYes3NyWykXksaMWkwCLmigV+N3/WG+n0Eax++8L0LS1/IwR+v5MpK+Uqi1maQrJT0i6SlJc9QgQ7eTGHPrT66X9EtgOmmMYDfgylYSRo/jIK+iYmYYKfzySuBW249JeiEp06mUNWy/ve35F5Wyz3qhaOrAcEjaA3gnsHqHc1yKlJBTylX5xDCNdJJ5BLi8gd6j+ftvJQtsRhqPKuE44BPkdHbb1+ax36ITFymsdrak7YEpwMnAnoVaAJ8FNmn11nJixAWkFPxeWah9DDsngyzSwLbFbD/SpvdIHjcqpfHvanuHfN9z8tgobGv7M237eTAPo5RcCH2HNB3gdNJ5am9gzSpWEs6tX1kM+DvQGiO4D1iWlITRa7JF1cww289Iug1YW9Jio24wOo9L2sr2xZDm0TA3w3OsNhWNlYzApaTvaznSvLsWs4FrG+h+mOQ0VwTeBLyElPlXyseAnwNrSLoEWJ50IVTC4rav0NDsxqdLDbN9jqSFSWNtSwE72b65VA9YoCMM+QDlkaf7JL3V9s8BJL2NZitKPyppo1ZShdJ80J6O4Q5q/q4ozQ9claHDEvOMc42RBSUtavvJrD0JKK5QZPsWSQvmkO6Jki4t1eoknFsfYrtJT6iT5aiYGSbpfcBBpImq1wCbkSqUlKQpA3yANMVg6fz8QXoM/eWxrGF7prbX70UvTyO4nTT5tSZHk0JNb7D9JUkPkU7+pZU2riddAE0hJQvcRPkJ//48ttvqLexKwQWRpKMY+ltMJoXVD1TKbvxIoX3nSvo18KP8/B2kpJwS3g+cKuk7pO/tDlKvoZSDgdMl3Z2f/0e2r5Rqv6ukr2dbbmBuaNNAqXP7AX
BhTkAyKdvxpEKtx3KP+RpJh5OOtyoVbSASSvoSSSuT0mC3JB1AF5Myze4s0OqaIdYgM2wW6WR8me1X5kysL9pu8mduTebEds8xd0kjVlzJzqrEptnMPVEvAiwMPGp78vBbjahXe0Lt1Z1Zqt3axqj1UlK4dAvSBcZtwJ69fndKk/2HxWm+ZBE5LL8V6YT/e9tnlmplvSVJ58DGE4ZzL7XljP5k+98NtGr+rjcB67d6WjVQKjbRyr48z/avC3VWJY07L0yax7s08N2OfINioufWn5xIGhBuhSL2ym1v6lVoHEJ2T9h+QhI5PPEnSVNKxSR9jZSu/6/8fBngENtjjuGXOq8x6A6pDyhpJ1IlilKqTKiV9CLgxeRSaMyteDKZslJokNK5t86ZfwvYnp2zCnsVKXZeY+AS4N+k7++KUhGlKTFvJ2cQt0Kxtr80wmYj6S1OCiWuans/pTJwU2z/oked8fhdbyU5j2rOLU9JajyRu+1/+zjwxaZ6nUTPrQ+RdI3tV47WNkatzUi9wJeReh8L0qz3cSYpgeRgUijyQWBh29sV6s0zjaDXq1RJF9veqqOnBenk4NLPOsy+LrO9WeG2e5JCRBuRQjm7Aofa7ilTT+NQCm2Y3sIM2xv3qDPd9tThQsW9hojbdKeSChH8lvR5XwN8wnbPCSWSziUlaHSWkPrmsBuNrHda1trb9ivyONQfev2/1vxd28LDLyZNP7mQocMSReHhGueT2sMIwxE9t/7kfkl7MXd8YQ/Ks/S6ZSSt1auI8hwl2zvnpsOUJnYuDZxbaBtUGKC2vVW+L67Y3w0NLRG2AOn7K74atH2qpBnMDens5IIJta5YCi2HlV8OLN3xeSfTVkWlBw7K9zs0sasLNbMlV7b9loq2rWH7HTnLFtuPqyMzZyzU/F2Zm+4/g5ScMmRXDXRrZDi2jo0P5ftT8v2eQNHcwG6Ec+tP3kM6iI4gHYiX5rYiKmUknQFsLOlC22/MujVCnjUHqFsTaFdkaGbY3wrl2idXPw38lVT3sxjbfwL+1ESjjRql0KaQTjYvYOjnnQ3s16tBtu/J97VDxTWzJS+VtJ7tWRXsAngqX5S1ws1r0CwM2Ph3bYWHNUwhgga2NT6ftI4NSVva3rLtpU/l7NCi8HAn4dz6kHwybnQSbaNWRtICSuWj1pb0sc4Xbf+/EuNsHy7pWuaWafpygwHqA0nVTv7O3LEsA6VhjgVIiTzt44HfpMGFRmXeY/tIpVJoK5DCxSeSMjDHhO2fAT+T9NrO9HClaRk90SU0/OxLNAsR18yW3Ap4t9KUlifbbCs9Tr5Ail6sIulUUiLYuwu1oMLv2sY+pDJo7by7S9tYqZnhuISGTgPaooHWPIRz6yO6pFEPoTBO/i7SSfrDpIykVUiD6b2yO7AT6ZipFv7LSQu/tX1ufj5J0mouK4V0EDDFhctvdGH9lmODZyesFpUZGydaoa/tSPU9Z5aEwzLfIvUQ2jmqS9uI1A4Nt+l+QtLbSY5DwLQG2ZLb1rMMbJ8v6WrStBiRLoiazJtr/Ltq+EIEk2lWiKDW+QTSReKJStOATBoHrXbhGM6tv2jFybckFddtrem2G0OXwBkzbeGhJ2iQkWT7JuDrSutUDbli1jD168bI6cxd4gPmlkIqmft1B+UVOrqxgKRlbD8Iz37OfvrPtEqhrQ58WgWl0CRtTvr+l+/okU8mJQv0DXkcqvFySzQbc3oWSevkbOHWBUBrXuBLJL3EBZXyM41/V8apEEGt84lSsek1nWpnTiYlN1atJxvZkn1ITtR4c2uujHKlBxcshZFDS4cxb4WCopqGks4B3uZU766VvnxOr1l1bXrdMkN7mvvVdlJ+OWkM6RyGZoYVhUwl7U1aAf0M0glxKvBV26eMuOFzRD5BtEqh/UupZNOLbY/55KU0D/L1pInNx7a9NBs4282qijRmPMKcbdl6IiXNrE5aVurlPepMs71//r92y9ItKmxQ43ft0KtScDprVTufSP
q97deO/s4y+ukqNJjLSqTQX6ty95K5rYTv02Xl3AacBZyRQ0SrkDKxPt5Ar0YppFYo7G/5tki+NcL2yZKuIk15ELCL7Rua6tbCqRTa08BrlSrHtxjzSTAnBf1O0v+MQxJIY8YjzOmO6vq553VAgc7++eF2pDXctiI5uYuAY3rVa/UESY4N4KXlUeZnNXcj1V79LekYPkpS0RSKTM3zyfmSPs68K45XWbEgem59iKR9SVdHrTWUXgcc5oIJspIut/3qiuYh6UPAW0iTYA+wXVwPLmeWnUqajwMptPgu239pauegI+kEUrLM9bQl0LiHNdgkfcv2wZp3UdCWWK3Epr5GhRVA8rbTSatZnJqb9gBeYHtqjzqdPUHR9puU9AQlzSStWj5kCkUvkZEOvWrnk5zQ04lLo0rz6Idz608krUQavL2RVJ3g7s5stjHq/Bdp7KTRyrkd4zHKts0C/pj1ikJ/bfqNSyFJOh/YrSO78ce2t2liW78i6Qbb6zbU2Nj2jHwFfWXHy5NtFy9T0690HMsLABsDy5YeJ93C6L2G1ju2nQqc67Qe2+dIST1fLhnDy2Pk67U9XwCY2dl77UGvyvnkuSDCkn2I6hYnbl1lta/h5gKtzvDQmcO094Q6FiuV1GSx0uW7ZDeu0MS+PucPktZtEiq13UpUeidppfFZ8Gy23cE0WIOtj1mKuT2ip0mfsUmiyh8lbea84rukV5NKhZVyqO3pkrYildz7JinMWdJjqjmFAuqdT9rLlr0k91jXImU791S2bFj96Ln1Hxqn4sT9iKSfkJblaYVc3wVsYHuX4bcaVmsGsHOeJ9gqzHpmabip35H0WtKJ+V4aztdSKpx8BqlKxFakyhM71M5g6wckbQJ8hqGr0xfPc5N0IymRqVUs4CWkiMszJbrKJekk/V9glu0fqtlq91ULTtdClcqWDUf03PqT2sWJtydlEj5bTsnlRWKXBz7ZRa90yZsai5W2+Cxwce79QeoN7j/C+5/vnMDc8HCT1dCxfauk3UkJQ3eQsnWbrEnWz/yAlAR1HQ2/t0zNUl4Ad0n6HqmwwdeVCj2XVmOBCgWnJe1l+wfqUsABioclqpQtG45wbv3JnUorNp9Fyih6ELh7lG26IulY0pjdfwLHk4r1FldUJw2an0Yq2fR+UgWE+xroNV6stIXtc3PmW2sy7UcbTqbtd/7WyjItRfMWsV2WNKZyudL6a1WK2PYZ99UcSxyHLNOpJIf5jTwV4D9Iq6T3jOYtOF2aLdmqHFIze7V22bIhRFiyz8nzkJYmDTA/VbD9tbbXb7tfEvip7TcX2jPD9sYtvdz2O9td140bg94GwMmkzwhplYF9GszpqbnqcF8j6bukmpBnM3Rwv5fq8eOyFl4/I+mNpIzGzkr5Pa+m0O/UzpasiaQ3k6It65JKi20J7Gv7NyNuOEai59bnuHlx4lYv6LGcgfkAadJqKa1FGO/J4c67SYkvPZMzt6Z4bpWCosVK2/Raqw4PSY2nfNXhfmcS6eTcfqFiUibbmBhE5zUG9gXWIa1z1n6cDJxzo27B6dbY7JGk6IhJiW4ftX1rr1q2z8vj5LXKlg0hnNvg84sc4jycuSW8jm+g95Wc4XgIqfbgZNKkzp7Jk5A/DExv4tTa2InkLKuFNvoZ2/tOtA3PUzYoTYV/HtItW7LJQqM/BI4GWktf7Z61e87k1NwVRs7p0taYCEsOODmm/QHS4o7PVk+w/cSEGpbJ83gep0KVAkm/Is1ze6Sehf2LpMWA9zJvck+/rFrQl0g6DjiiyRSK5xMaWnC6UbZkt0nc6nEB33zcLk4qUvH6bBekC+Vf2X5ZqX1D9hPObbDJ1RNmkzLEoLB6Qpve2qQ5Nyvm9N31gbfa/kqh3m10r4xRUqvuJ1RcdbjfkXQ6aW24d5LWwNoTuNF2o/W6Bp2cur8GUGvJm74nh/3bx6GLSlzlSdz/An5M+t++g7S48NFj1VVaT+5gUknBu5jr3B4GjrP9nRLb5t
lPOLfBZhyqJ/yOlLn1vda8G0nX2X5Fod4k5q3Ld2xJGrqkfbq1u6Bs2fOBtvlQrWShhUkTsUunZcwXDJdEM4jjj5IOIF34PE4aX2w58tLC6e0ls1rOo+WcetKVdKDto0rsGAsx5jb41K6esLjtKzqmozzdQO8k0hXbt/PzPXJbzz3LQXViI9BK7vmXpFeQJnOvNnHmPD8YRCc2Ah8HXl4xUeP/UKk0GHCvpKVsz5Z0aNb6Sq1SXuHcBpS2+UsLA3tL+lt+virQZKzh/jwfpTU3ZVfmrmNVwpSOXuRvcvrymJE03fbULnO2AAZ1rhbANKX6mYeSVmdYEvjcxJoU9Bl/AR6rqFezNNjnbJ+etbYhrV5QqjUP4dwGlx3GSfdDwDRgHUl3kcYt9mygV6Nn2RpjGq/P3K+cQloFeTXmli9bccKsCfqRTwOXSrqcOuPQrWVuticNH/xM0mEVtI5pqDUP4dwGlPEIveR5aa+yvbWkJUhzaIqq+NfsWdq+J9/PT+EmgJ+RVh6fQcXKDsFA8T3gf6lQoi1TszRY7TJjQ4iEkqAnVGn13PGojJELxH4dWIE0yF28WvPzgSaJPMH8gaRLbW9RUW9xUmmwWbZvzqXB1rN93kRqddUP5xb0Qs15abWRdAuwo+0bJ9qW5wJJ04CjnJepCYJOJH0VuJ15S7RN+P8VII+3rWX7xFwabEnb3RYx7V07nFvQCzXnpdVG0iW2t5xoO8abtpDuQsBawK3MJ/O1gt4YJnW/X/6vXyCtCzfF9tq5PODptf7DMeYW9Mq6dJmXNpEG5XAkwFVKa0SdxWAXxJ3fEmeCcrqm7k+wTS12BjYErgawfbekaqsOhHMLeqXavLSK7JjvTUp7Li4k/HxgPkycCcqpmbpfm6dsW1JrWtESo23QC+Hcgl5pPC+tNq0CwpJOIlUW/1d+vgzpzxwE8ys1U/drMz1nS75A0n7Ae4DjaomHcwt6pXbFk5qs33JsALYflLThRBoUBBPMuKbbN+RJ4AJSJGgK8Hnb59cSD+cWjIlxrHhSkwUkLWP7QQBJyxLHeDB/U21V73FgRVIBhquBE0iOrhqRLRmMiefDis2S9iZVZDiD5HinAl+1fcqEGhYEQVeUitS+mbSA7Kv2qSfEAAAAoUlEQVSA6cD3bf+lqXZc1QZjoh+c12jYPlnSVcAbSGnxu8wva3YFwfORnFByL6no99PAMsAZks63/ckm2tFzC4IgCJ5zJH0E2Ae4HzgeOMv2v3OZv5ttr9FEP3puQRAEwUSwHCm6MiQqZPsZSY3nckbPLQiCIBg4+iUlNAiCIAiqEc4tCIIgGDjCuQVBEAQDRzi3IAiCYOAI5xYEQRAMHP8fPauCZCXZ/AEAAAAASUVORK5CYII=\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Perplexity: -1.17\n", | |
"Topic 0: ['cute', 'kitten', 'chinchilla']\n", | |
"Topic 1: ['munch', 'banana', 'broccoli']\n", | |
"I like to eat broccoli and bananas. ['(0, 3.9%)', '(1, 96.1%)']\n", | |
"I munched a banana and spinach smoothie for breakfast. ['(0, 3.2%)', '(1, 96.8%)']\n", | |
"Chinchillas and kittens are cute. ['(0, 94.8%)', '(1, 5.2%)']\n", | |
"My sister adopted a kitten yesterday. ['(0, 96.0%)', '(1, 4.0%)']\n", | |
"Look at this cute hamster munching on a piece of broccoli. ['(0, 50.0%)', '(1, 50.0%)']\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"<gensim.models.ldamodel.LdaModel at 0x2b8c796e898>" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"eta = create_eta(apriori_harder, dictionary, 2)\n", | |
"test_eta(eta, dictionary, 2)\n", | |
"\n", | |
"eta = create_eta(apriori_harder, dictionary2, 2)\n", | |
"test_eta(eta, dictionary2, 2)" | |
] | |
}, | |
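  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the prior had to be rebuilt with `dictionary2` rather than reused. The columns of `eta` are indexed by dictionary ID, so the same seed words land in different columns under a different word ordering. The sketch below uses a hypothetical helper `remap_eta` and a toy two-topic prior (`toy_eta`, also hypothetical, with arbitrary values). It shows that reordering the columns of an existing prior by token gives the same matrix as rebuilding the prior for the new dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from gensim.corpora import Dictionary\n",
    "\n",
    "def remap_eta(eta, old_dict, new_dict):\n",
    "    # Column j of the result holds the prior for the token with ID j in new_dict\n",
    "    cols = [old_dict.token2id[new_dict[j]] for j in range(len(new_dict))]\n",
    "    return eta[:, cols]\n",
    "\n",
    "def toy_eta(dictionary, seeds, num_topics=2):\n",
    "    # Uniform prior per word; each seed word gets most of its mass on its topic (toy values)\n",
    "    eta = np.full((num_topics, len(dictionary)), 1.0 / num_topics)\n",
    "    for word, topic in seeds.items():\n",
    "        eta[:, dictionary.token2id[word]] = 0.01\n",
    "        eta[topic, dictionary.token2id[word]] = 0.99\n",
    "    return eta\n",
    "\n",
    "toy_seeds = {'kitten': 0, 'banana': 1}\n",
    "old_dict = Dictionary([['like', 'eat', 'broccoli', 'banana'], ['cute', 'kitten']])\n",
    "new_dict = Dictionary([sorted(old_dict.token2id)])  # same words, alphabetical IDs\n",
    "\n",
    "remapped = remap_eta(toy_eta(old_dict, toy_seeds), old_dict, new_dict)\n",
    "print(np.array_equal(remapped, toy_eta(new_dict, toy_seeds)))  # True"
   ]
  },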
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
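    "Apart from small differences in the top terms of topic 0 and in the document-topic probabilities (at most 0.2 percentage points), the two models are almost identical. At least on this small corpus, the order of the words in the dictionary has little effect on the result."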
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.6" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |