
@docmarionum1
Last active December 18, 2017 23:48
Codenames Blog
**.bin
.ipynb_checkpoints
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use the Python library gensim: https://radimrehurek.com/gensim/"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import gensim"
]
},
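{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load a pretrained word2vec model built from Google News articles. Passing `limit=500000` reads only the first 500,000 word vectors from the file, which keeps memory use down.\n",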
"\n",
"Download from: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True, limit=500000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's an example Codenames board. `positive` holds one team's words, `negative` the other team's, and `assassin` is the assassin word."
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"board = {\n",
" 'positive': ['ambulance', 'hospital', 'spell', 'lock', 'charge', 'tail', 'link', 'cook', 'web'],\n",
" 'negative': ['cat', 'button', 'pipe', 'pants', 'mount', 'sleep', 'stick', 'file', 'worm'],\n",
" 'assassin': 'doctor'\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use gensim to find the 10 words most related to 'ambulance' in this word2vec model."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('paramedics', 0.7590752243995667),\n",
" ('ambulances', 0.7493595480918884),\n",
" ('Ambulance', 0.7236292362213135),\n",
" ('paramedic', 0.662133514881134),\n",
" ('Ambulance_paramedics', 0.6315338611602783),\n",
" ('Ambulances', 0.6211477518081665),\n",
" ('LifeFlight_helicopter', 0.6147335171699524),\n",
" ('hospital', 0.6099206209182739),\n",
" ('Paramedics', 0.6081751585006714),\n",
" ('Ambulance_Service', 0.6080097556114197)]"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.similar_by_word('ambulance', topn=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some of these words would be useful, \"paramedics\" for instance, but many are just other forms of the word \"ambulance.\"\n",
"\n",
"gensim can also directly find the words most similar to a whole group of words at once."
]
},
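{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what `most_similar` does with a list of `positive` words, using a hypothetical toy vocabulary of 3-dimensional vectors in place of the real 300-dimensional model: average the unit-normalized vectors of the query words, renormalize, and rank every other word by cosine similarity. The words, vectors and the `most_similar_group` helper are made up for illustration; gensim's own method also supports `negative` words, `restrict_vocab` and more.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical toy vocabulary; the real model uses 300-d vectors.\n",
"words = ['ambulance', 'hospital', 'web', 'link', 'paramedics', 'internet', 'cat']\n",
"vectors = np.array([\n",
"    [0.9, 0.1, 0.0],\n",
"    [0.8, 0.2, 0.1],\n",
"    [0.1, 0.9, 0.0],\n",
"    [0.0, 0.8, 0.2],\n",
"    [0.85, 0.05, 0.1],\n",
"    [0.1, 0.95, 0.1],\n",
"    [0.0, 0.1, 0.9],\n",
"])\n",
"\n",
"def unit(v):\n",
"    # Scale a vector (or each row of a matrix) to length 1.\n",
"    return v / np.linalg.norm(v, axis=-1, keepdims=True)\n",
"\n",
"def most_similar_group(positive, topn=3):\n",
"    normed = unit(vectors)\n",
"    index = {w: i for i, w in enumerate(words)}\n",
"    # Average the unit vectors of the query words, then renormalize.\n",
"    query = unit(normed[[index[w] for w in positive]].mean(axis=0))\n",
"    sims = normed @ query  # cosine similarity to every word\n",
"    ranked = [(words[i], float(sims[i])) for i in np.argsort(-sims)]\n",
"    # Like gensim, leave the query words themselves out of the results.\n",
"    return [(w, s) for w, s in ranked if w not in positive][:topn]\n",
"\n",
"print(most_similar_group(['ambulance', 'hospital']))\n",
"```"
]
},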
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('%_#F########_3v.jsn', 0.5153687000274658),\n",
" ('By_TBT_staff', 0.4811619818210602),\n",
" ('By_HARVEY_SIMPSON', 0.47336331009864807),\n",
" ('try_resubmitting', 0.46592575311660767),\n",
" ('By_Salvatore_Landolina', 0.4655460715293884),\n",
" ('By_Jason_Kaneshiro', 0.4612027108669281),\n",
" ('%_#F########_2v.jsn', 0.45537447929382324),\n",
" ('%_#F########_1v.jsn', 0.4508393406867981),\n",
" ('BY_VINCENT_MAO', 0.4498888850212097),\n",
" ('Visit_BBC_Webwise', 0.4431522786617279)]"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.most_similar(positive=board['positive'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, every result is a nonsense token (file names, bylines, boilerplate phrases). We can use `restrict_vocab` to limit results to only the top n most common words in the corpus."
]
},
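{
"cell_type": "markdown",
"metadata": {},
"source": [
"`restrict_vocab` helps because the vectors in this file are stored from most to least frequent word, so gensim only needs to search the first `restrict_vocab` entries. A minimal sketch of that filtering step, assuming a hypothetical frequency-ordered vocabulary with made-up similarity scores (`restricted_top` is an illustrative helper, not part of gensim):\n",
"\n",
"```python\n",
"# Hypothetical vocabulary in descending frequency order, as in the model file.\n",
"vocab = ['hospital', 'link', 'paramedics', 'web', 'emergency', 'By_TBT_staff']\n",
"# Made-up similarity of each token to the query, in the same order.\n",
"scores = [0.61, 0.40, 0.75, 0.38, 0.37, 0.48]\n",
"\n",
"def restricted_top(vocab, scores, restrict_vocab, topn=3):\n",
"    # Only the first `restrict_vocab` (most frequent) entries are candidates.\n",
"    candidates = list(zip(vocab[:restrict_vocab], scores[:restrict_vocab]))\n",
"    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:topn]\n",
"\n",
"print(restricted_top(vocab, scores, restrict_vocab=6))  # rare 'By_TBT_staff' makes the top 3\n",
"print(restricted_top(vocab, scores, restrict_vocab=4))  # it is no longer searched\n",
"```\n",
"\n",
"The trade-off is that a useful but rarer word outside the first `restrict_vocab` entries is dropped as well."
]
},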
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('For_Restrictions', 0.43488097190856934),\n",
" ('bed', 0.39588358998298645),\n",
" ('links', 0.38411831855773926),\n",
" ('hook', 0.38367366790771484),\n",
" ('paramedics', 0.38072746992111206),\n",
" ('emergency', 0.37950167059898376),\n",
" ('jail', 0.3759669065475464),\n",
" ('log', 0.37062549591064453),\n",
" ('intensive_care', 0.3661930561065674),\n",
" ('call', 0.36543411016464233),\n",
" ('webpage', 0.3649423122406006),\n",
" ('tow_truck', 0.3592333197593689),\n",
" ('click', 0.35906946659088135),\n",
" ('cooked', 0.3552851676940918),\n",
" ('care', 0.3537469208240509),\n",
" ('handcuff', 0.35027384757995605),\n",
" ('then', 0.34921103715896606),\n",
" ('stay', 0.3478427529335022),\n",
" ('turn', 0.34607696533203125),\n",
" ('bookmark', 0.3458564579486847)]"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.most_similar(\n",
" positive=board['positive'],\n",
" restrict_vocab=50000,\n",
" topn=20\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This looks much better and produces some decent clues:\n",
"* \"bed\", \"paramedics\" and \"emergency\" all relate to \"ambulance\" and \"hospital.\"\n",
"* \"jail\" could relate to \"lock\" and \"charge.\"\n",
"* \"click\" to \"web\" and \"link.\"\n",
"\n",
"But \"bed\" would also relate to the other team's word \"sleep,\" and \"click\" to \"button.\"\n",
"\n",
"We can also pass the other team's words as `negative` so that the results steer away from them."
]
},
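{
"cell_type": "markdown",
"metadata": {},
"source": [
"When `negative` words are given, gensim weights each of their unit vectors by -1 before averaging, so candidates close to the other team's words are pushed down. A minimal sketch of that combined query, using hypothetical 2-d vectors chosen only for illustration: \"bed\" beats \"paramedics\" as a clue for \"hospital\" alone, but falls below it once the other team's \"sleep\" is subtracted. The `query_vector` and `score` helpers are illustrative, not gensim functions.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def unit(v):\n",
"    # Scale a vector to length 1.\n",
"    return v / np.linalg.norm(v)\n",
"\n",
"# Hypothetical 2-d vectors chosen only for illustration.\n",
"vecs = {\n",
"    'hospital': np.array([1.0, 0.0]),\n",
"    'sleep': np.array([0.0, 1.0]),\n",
"    'bed': np.array([0.9, 0.4]),\n",
"    'paramedics': np.array([0.8, -0.5]),\n",
"}\n",
"\n",
"def query_vector(positive, negative=()):\n",
"    # Weight each positive unit vector by +1 and each negative one by -1,\n",
"    # average them, then renormalize.\n",
"    parts = [unit(vecs[w]) for w in positive] + [-unit(vecs[w]) for w in negative]\n",
"    return unit(np.mean(parts, axis=0))\n",
"\n",
"def score(word, positive, negative=()):\n",
"    # Cosine similarity between a candidate and the combined query.\n",
"    return float(unit(vecs[word]) @ query_vector(positive, negative))\n",
"\n",
"for cand in ['bed', 'paramedics']:\n",
"    print(cand, round(score(cand, ['hospital']), 3), round(score(cand, ['hospital'], ['sleep']), 3))\n",
"```"
]
},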
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('Hospital', 0.27265793085098267),\n",
" ('ambulances', 0.2605472207069397),\n",
" ('hospitals', 0.24624229967594147),\n",
" ('outpatient', 0.24339225888252258),\n",
" ('inpatient', 0.2404019981622696),\n",
" ('paramedics', 0.23482689261436462),\n",
" ('escort', 0.23161748051643372),\n",
" ('Partnerships', 0.23104971647262573),\n",
" ('Medical_Center', 0.2306305170059204),\n",
" ('telemedicine', 0.22638411819934845)]"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.most_similar(\n",
" positive=board['positive'],\n",
" negative=board['negative'],\n",
" restrict_vocab=50000\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I really like the clue \"telemedicine.\" It's non-obvious, but relates to four words: \"web,\" \"link,\" \"ambulance\" and \"hospital.\" This shows the potential for this method to produce novel clues.\n",
"\n",
"Suppose the clue were \"telemedicine,\" those four words were removed from the board, and the other team then took its turn. What might their clues be?"
]
},
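{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to check which board words a clue actually covers is to score it against each word individually. A minimal sketch, assuming a hypothetical helper `clue_breakdown` that accepts any pairwise similarity function; with the loaded model you could pass gensim's `model.similarity`. The 0.2 threshold and the toy scores in the demo are arbitrary and made up, not results from the model.\n",
"\n",
"```python\n",
"def clue_breakdown(similarity, clue, words, threshold=0.2):\n",
"    # Score the clue against each board word, highest first.\n",
"    scored = sorted(((w, float(similarity(clue, w))) for w in words),\n",
"                    key=lambda pair: pair[1], reverse=True)\n",
"    # Words at or above the (arbitrary) threshold are the ones the clue plausibly points to.\n",
"    hits = [w for w, s in scored if s >= threshold]\n",
"    return scored, hits\n",
"\n",
"# Toy stand-in for model.similarity, with made-up scores.\n",
"toy = {'web': 0.35, 'link': 0.25, 'ambulance': 0.30, 'hospital': 0.45, 'cook': 0.05}\n",
"scored, hits = clue_breakdown(lambda clue, w: toy[w], 'telemedicine', list(toy))\n",
"print(hits)\n",
"\n",
"# With the real model (not run here):\n",
"# clue_breakdown(model.similarity, 'telemedicine', board['positive'])\n",
"```"
]
},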
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('pillow', 0.43686941266059875),\n",
" ('bra', 0.3842337727546692),\n",
" ('couch', 0.38342970609664917),\n",
" ('tub', 0.37922778725624084),\n",
" ('closet', 0.36959999799728394),\n",
" ('sofa', 0.36713898181915283),\n",
" ('bathroom', 0.366258829832077),\n",
" ('bed', 0.36348700523376465),\n",
" ('crotch', 0.36245280504226685),\n",
" ('spoon', 0.36179912090301514)]"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"board = {\n",
" 'positive': ['spell', 'lock', 'charge', 'tail', 'link'],\n",
" 'negative': ['cat', 'button', 'pipe', 'pants', 'mount', 'sleep', 'stick', 'file', 'worm'],\n",
" 'assassin': 'doctor'\n",
"}\n",
"\n",
"model.most_similar(\n",
" positive=board['negative'],\n",
" negative=board['positive'],\n",
" restrict_vocab=50000\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This appears much less successful. The top words mostly relate to just a single word:\n",
"* pillow -> sleep\n",
"* bra -> pants\n",
"* couch -> sleep? cat?"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('Partnerships', 0.19860073924064636),\n",
" ('partnership', 0.1707054078578949),\n",
" ('Affiliates', 0.1595458686351776),\n",
" ('Partnership', 0.1545657068490982),\n",
" ('spells', 0.15078961849212646),\n",
" ('signing', 0.15013918280601501),\n",
" ('guiding', 0.14804501831531525),\n",
" ('reserve', 0.14592880010604858),\n",
" ('tutelage', 0.1441878080368042),\n",
" ('plundered', 0.14354334771633148)]"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.most_similar(\n",
" positive=board['positive'],\n",
" negative=board['negative'],\n",
" restrict_vocab=50000\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('pillow', 0.4124183654785156),\n",
" ('bra', 0.38460129499435425),\n",
" ('crotch', 0.3793488144874573),\n",
" ('buttons', 0.37311607599258423),\n",
" ('couch', 0.36672115325927734),\n",
" ('strap', 0.35612520575523376),\n",
" ('backpack', 0.3539729714393616),\n",
" ('mailbox', 0.351298451423645),\n",
" ('bug', 0.35095250606536865),\n",
" ('files', 0.34630095958709717)]"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.most_similar(\n",
" positive=board['negative'],\n",
" negative=board['positive'],\n",
" restrict_vocab=50000\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('Partnerships', 0.24295198917388916),\n",
" ('Ashford', 0.24151912331581116),\n",
" ('Procurement', 0.211196631193161),\n",
" ('Partnership', 0.20416252315044403),\n",
" ('booking', 0.199225515127182),\n",
" ('Affiliates', 0.19800953567028046),\n",
" ('Interchange', 0.19791358709335327),\n",
" ('service', 0.1973651796579361),\n",
" ('ambulances', 0.19639825820922852),\n",
" ('reserve', 0.19472390413284302)]"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Also steer away from the assassin word by adding it to the negative list\n",
"model.most_similar(\n",
" positive=board['positive'],\n",
" negative=board['negative'] + [board['assassin']],\n",
" restrict_vocab=50000\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('downward_spiral', 0.2583349347114563),\n",
" ('dip', 0.24365955591201782),\n",
" ('tumble', 0.23384986817836761),\n",
" ('drop', 0.2305331975221634),\n",
" ('byproduct', 0.22404420375823975),\n",
" ('Decline', 0.22345377504825592),\n",
" ('report', 0.21692954003810883),\n",
" ('paragraph', 0.21681174635887146),\n",
" ('Fall', 0.21671724319458008),\n",
" ('spring', 0.2162589132785797)]"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.most_similar(\n",
" positive=[mapping[w] for w in board['positive']], \n",
" negative=[mapping[w] for w in (board['negative'] + [board['assassin']])],\n",
" topn=10, restrict_vocab=50000\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Problems:\n",
"# Not capturing words that match well with multiple positive words. One good connection overpowers the rest\n",
"# word parts\n",
"# Obscure words\n",
"\n",
"# Idea:\n",
"# Run similarities between individual words and then combine the lists. Words that appear the most often are good candidates"
]
},
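{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea above can be sketched in plain Python, assuming each positive word already has a list of `(word, similarity)` neighbors such as the output of `similar_by_word`: count how many lists each candidate appears in, then rank by that count and break ties by mean similarity. The neighbor lists and the `combine_neighbors` helper below are hypothetical.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"\n",
"def combine_neighbors(neighbor_lists):\n",
"    # Collect every similarity score each candidate word receives.\n",
"    scores = defaultdict(list)\n",
"    for neighbors in neighbor_lists:\n",
"        for word, sim in neighbors:\n",
"            scores[word].append(sim)\n",
"    # Rank by how many board words it is near, then by mean similarity.\n",
"    ranked = [(w, len(s), sum(s) / len(s)) for w, s in scores.items()]\n",
"    return sorted(ranked, key=lambda t: (t[1], t[2]), reverse=True)\n",
"\n",
"# Hypothetical neighbor lists for three board words.\n",
"lists = [\n",
"    [('jail', 0.40), ('key', 0.55)],   # lock\n",
"    [('jail', 0.35), ('fee', 0.50)],   # charge\n",
"    [('url', 0.60), ('jail', 0.10)],   # link\n",
"]\n",
"print(combine_neighbors(lists)[0])\n",
"```"
]
},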
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>sum</th>\n",
" <th>std</th>\n",
" </tr>\n",
" <tr>\n",
" <th>word</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>flatten</th>\n",
" <td>4.0</td>\n",
" <td>0.236673</td>\n",
" <td>0.946692</td>\n",
" <td>0.079847</td>\n",
" </tr>\n",
" <tr>\n",
" <th>roll</th>\n",
" <td>4.0</td>\n",
" <td>0.228831</td>\n",
" <td>0.915322</td>\n",
" <td>0.047536</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Thursday</th>\n",
" <td>4.0</td>\n",
" <td>0.223386</td>\n",
" <td>0.893544</td>\n",
" <td>0.035845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wednesday</th>\n",
" <td>4.0</td>\n",
" <td>0.220163</td>\n",
" <td>0.880650</td>\n",
" <td>0.032037</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Monday</th>\n",
" <td>4.0</td>\n",
" <td>0.217271</td>\n",
" <td>0.869085</td>\n",
" <td>0.037886</td>\n",
" </tr>\n",
" <tr>\n",
" <th>downwards</th>\n",
" <td>4.0</td>\n",
" <td>0.209756</td>\n",
" <td>0.839022</td>\n",
" <td>0.014385</td>\n",
" </tr>\n",
" <tr>\n",
" <th>surge</th>\n",
" <td>3.0</td>\n",
" <td>0.302306</td>\n",
" <td>0.906918</td>\n",
" <td>0.076502</td>\n",
" </tr>\n",
" <tr>\n",
" <th>spiral</th>\n",
" <td>3.0</td>\n",
" <td>0.295132</td>\n",
" <td>0.885395</td>\n",
" <td>0.052286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>slumps</th>\n",
" <td>3.0</td>\n",
" <td>0.289296</td>\n",
" <td>0.867887</td>\n",
" <td>0.094156</td>\n",
" </tr>\n",
" <tr>\n",
" <th>just</th>\n",
" <td>3.0</td>\n",
" <td>0.281075</td>\n",
" <td>0.843225</td>\n",
" <td>0.127939</td>\n",
" </tr>\n",
" <tr>\n",
" <th>statement</th>\n",
" <td>3.0</td>\n",
" <td>0.280905</td>\n",
" <td>0.842714</td>\n",
" <td>0.086204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>push</th>\n",
" <td>3.0</td>\n",
" <td>0.279088</td>\n",
" <td>0.837264</td>\n",
" <td>0.076753</td>\n",
" </tr>\n",
" <tr>\n",
" <th>wave</th>\n",
" <td>3.0</td>\n",
" <td>0.255928</td>\n",
" <td>0.767784</td>\n",
" <td>0.067337</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tailspin</th>\n",
" <td>3.0</td>\n",
" <td>0.255911</td>\n",
" <td>0.767734</td>\n",
" <td>0.052425</td>\n",
" </tr>\n",
" <tr>\n",
" <th>NYT</th>\n",
" <td>3.0</td>\n",
" <td>0.254757</td>\n",
" <td>0.764271</td>\n",
" <td>0.090522</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freefall</th>\n",
" <td>3.0</td>\n",
" <td>0.254118</td>\n",
" <td>0.762353</td>\n",
" <td>0.091539</td>\n",
" </tr>\n",
" <tr>\n",
" <th>happens</th>\n",
" <td>3.0</td>\n",
" <td>0.253574</td>\n",
" <td>0.760722</td>\n",
" <td>0.027510</td>\n",
" </tr>\n",
" <tr>\n",
" <th>trend</th>\n",
" <td>3.0</td>\n",
" <td>0.252779</td>\n",
" <td>0.758338</td>\n",
" <td>0.071111</td>\n",
" </tr>\n",
" <tr>\n",
" <th>laughed</th>\n",
" <td>3.0</td>\n",
" <td>0.249634</td>\n",
" <td>0.748903</td>\n",
" <td>0.072787</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wall_Street_Journal</th>\n",
" <td>3.0</td>\n",
" <td>0.247608</td>\n",
" <td>0.742825</td>\n",
" <td>0.088031</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ascent</th>\n",
" <td>3.0</td>\n",
" <td>0.246717</td>\n",
" <td>0.740150</td>\n",
" <td>0.039043</td>\n",
" </tr>\n",
" <tr>\n",
" <th>graph</th>\n",
" <td>3.0</td>\n",
" <td>0.245881</td>\n",
" <td>0.737642</td>\n",
" <td>0.036799</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mentality</th>\n",
" <td>3.0</td>\n",
" <td>0.244612</td>\n",
" <td>0.733837</td>\n",
" <td>0.071997</td>\n",
" </tr>\n",
" <tr>\n",
" <th>day</th>\n",
" <td>3.0</td>\n",
" <td>0.242762</td>\n",
" <td>0.728287</td>\n",
" <td>0.036740</td>\n",
" </tr>\n",
" <tr>\n",
" <th>inertia</th>\n",
" <td>3.0</td>\n",
" <td>0.241835</td>\n",
" <td>0.725505</td>\n",
" <td>0.045367</td>\n",
" </tr>\n",
" <tr>\n",
" <th>catapult</th>\n",
" <td>3.0</td>\n",
" <td>0.240552</td>\n",
" <td>0.721657</td>\n",
" <td>0.024719</td>\n",
" </tr>\n",
" <tr>\n",
" <th>depress</th>\n",
" <td>3.0</td>\n",
" <td>0.240457</td>\n",
" <td>0.721371</td>\n",
" <td>0.065503</td>\n",
" </tr>\n",
" <tr>\n",
" <th>spear</th>\n",
" <td>3.0</td>\n",
" <td>0.239582</td>\n",
" <td>0.718747</td>\n",
" <td>0.009128</td>\n",
" </tr>\n",
" <tr>\n",
" <th>if</th>\n",
" <td>3.0</td>\n",
" <td>0.238999</td>\n",
" <td>0.716997</td>\n",
" <td>0.028044</td>\n",
" </tr>\n",
" <tr>\n",
" <th>equation</th>\n",
" <td>3.0</td>\n",
" <td>0.237353</td>\n",
" <td>0.712060</td>\n",
" <td>0.008637</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MacIntyre</th>\n",
" <td>1.0</td>\n",
" <td>0.171345</td>\n",
" <td>0.171345</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Rick_Carlisle</th>\n",
" <td>1.0</td>\n",
" <td>0.171174</td>\n",
" <td>0.171174</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Seifert</th>\n",
" <td>1.0</td>\n",
" <td>0.171157</td>\n",
" <td>0.171157</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>thumped</th>\n",
" <td>1.0</td>\n",
" <td>0.171149</td>\n",
" <td>0.171149</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>shoe</th>\n",
" <td>1.0</td>\n",
" <td>0.171143</td>\n",
" <td>0.171143</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>occupant</th>\n",
" <td>1.0</td>\n",
" <td>0.171142</td>\n",
" <td>0.171142</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>knight</th>\n",
" <td>1.0</td>\n",
" <td>0.171085</td>\n",
" <td>0.171085</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>medic</th>\n",
" <td>1.0</td>\n",
" <td>0.171074</td>\n",
" <td>0.171074</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>consults</th>\n",
" <td>1.0</td>\n",
" <td>0.170966</td>\n",
" <td>0.170966</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>champs</th>\n",
" <td>1.0</td>\n",
" <td>0.170955</td>\n",
" <td>0.170955</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lice</th>\n",
" <td>1.0</td>\n",
" <td>0.170927</td>\n",
" <td>0.170927</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>toothless</th>\n",
" <td>1.0</td>\n",
" <td>0.170917</td>\n",
" <td>0.170917</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>elects</th>\n",
" <td>1.0</td>\n",
" <td>0.170865</td>\n",
" <td>0.170865</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Matta</th>\n",
" <td>1.0</td>\n",
" <td>0.170856</td>\n",
" <td>0.170856</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ugly</th>\n",
" <td>1.0</td>\n",
" <td>0.170833</td>\n",
" <td>0.170833</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>knife</th>\n",
" <td>1.0</td>\n",
" <td>0.170772</td>\n",
" <td>0.170772</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>arbiter</th>\n",
" <td>1.0</td>\n",
" <td>0.170728</td>\n",
" <td>0.170728</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>motionless</th>\n",
" <td>1.0</td>\n",
" <td>0.170710</td>\n",
" <td>0.170710</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>pinning</th>\n",
" <td>1.0</td>\n",
" <td>0.170696</td>\n",
" <td>0.170696</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>decides</th>\n",
" <td>1.0</td>\n",
" <td>0.170694</td>\n",
" <td>0.170694</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Sergei</th>\n",
" <td>1.0</td>\n",
" <td>0.170564</td>\n",
" <td>0.170564</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>co_ordinate</th>\n",
" <td>1.0</td>\n",
" <td>0.170548</td>\n",
" <td>0.170548</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bangs</th>\n",
" <td>1.0</td>\n",
" <td>0.170530</td>\n",
" <td>0.170530</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>laurels</th>\n",
" <td>1.0</td>\n",
" <td>0.170458</td>\n",
" <td>0.170458</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>notches</th>\n",
" <td>1.0</td>\n",
" <td>0.170422</td>\n",
" <td>0.170422</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sprain</th>\n",
" <td>1.0</td>\n",
" <td>0.170372</td>\n",
" <td>0.170372</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>acumen</th>\n",
" <td>1.0</td>\n",
" <td>0.170307</td>\n",
" <td>0.170307</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>complacent</th>\n",
" <td>1.0</td>\n",
" <td>0.170185</td>\n",
" <td>0.170185</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>For_Restrictions</th>\n",
" <td>1.0</td>\n",
" <td>0.170162</td>\n",
" <td>0.170162</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>swords</th>\n",
" <td>1.0</td>\n",
" <td>0.170084</td>\n",
" <td>0.170084</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8125 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" count mean sum std\n",
"word \n",
"flatten 4.0 0.236673 0.946692 0.079847\n",
"roll 4.0 0.228831 0.915322 0.047536\n",
"Thursday 4.0 0.223386 0.893544 0.035845\n",
"Wednesday 4.0 0.220163 0.880650 0.032037\n",
"Monday 4.0 0.217271 0.869085 0.037886\n",
"downwards 4.0 0.209756 0.839022 0.014385\n",
"surge 3.0 0.302306 0.906918 0.076502\n",
"spiral 3.0 0.295132 0.885395 0.052286\n",
"slumps 3.0 0.289296 0.867887 0.094156\n",
"just 3.0 0.281075 0.843225 0.127939\n",
"statement 3.0 0.280905 0.842714 0.086204\n",
"push 3.0 0.279088 0.837264 0.076753\n",
"wave 3.0 0.255928 0.767784 0.067337\n",
"tailspin 3.0 0.255911 0.767734 0.052425\n",
"NYT 3.0 0.254757 0.764271 0.090522\n",
"freefall 3.0 0.254118 0.762353 0.091539\n",
"happens 3.0 0.253574 0.760722 0.027510\n",
"trend 3.0 0.252779 0.758338 0.071111\n",
"laughed 3.0 0.249634 0.748903 0.072787\n",
"Wall_Street_Journal 3.0 0.247608 0.742825 0.088031\n",
"ascent 3.0 0.246717 0.740150 0.039043\n",
"graph 3.0 0.245881 0.737642 0.036799\n",
"mentality 3.0 0.244612 0.733837 0.071997\n",
"day 3.0 0.242762 0.728287 0.036740\n",
"inertia 3.0 0.241835 0.725505 0.045367\n",
"catapult 3.0 0.240552 0.721657 0.024719\n",
"depress 3.0 0.240457 0.721371 0.065503\n",
"spear 3.0 0.239582 0.718747 0.009128\n",
"if 3.0 0.238999 0.716997 0.028044\n",
"equation 3.0 0.237353 0.712060 0.008637\n",
"... ... ... ... ...\n",
"MacIntyre 1.0 0.171345 0.171345 NaN\n",
"Rick_Carlisle 1.0 0.171174 0.171174 NaN\n",
"Seifert 1.0 0.171157 0.171157 NaN\n",
"thumped 1.0 0.171149 0.171149 NaN\n",
"shoe 1.0 0.171143 0.171143 NaN\n",
"occupant 1.0 0.171142 0.171142 NaN\n",
"knight 1.0 0.171085 0.171085 NaN\n",
"medic 1.0 0.171074 0.171074 NaN\n",
"consults 1.0 0.170966 0.170966 NaN\n",
"champs 1.0 0.170955 0.170955 NaN\n",
"lice 1.0 0.170927 0.170927 NaN\n",
"toothless 1.0 0.170917 0.170917 NaN\n",
"elects 1.0 0.170865 0.170865 NaN\n",
"Matta 1.0 0.170856 0.170856 NaN\n",
"ugly 1.0 0.170833 0.170833 NaN\n",
"knife 1.0 0.170772 0.170772 NaN\n",
"arbiter 1.0 0.170728 0.170728 NaN\n",
"motionless 1.0 0.170710 0.170710 NaN\n",
"pinning 1.0 0.170696 0.170696 NaN\n",
"decides 1.0 0.170694 0.170694 NaN\n",
"Sergei 1.0 0.170564 0.170564 NaN\n",
"co_ordinate 1.0 0.170548 0.170548 NaN\n",
"bangs 1.0 0.170530 0.170530 NaN\n",
"laurels 1.0 0.170458 0.170458 NaN\n",
"notches 1.0 0.170422 0.170422 NaN\n",
"sprain 1.0 0.170372 0.170372 NaN\n",
"acumen 1.0 0.170307 0.170307 NaN\n",
"complacent 1.0 0.170185 0.170185 NaN\n",
"For_Restrictions 1.0 0.170162 0.170162 NaN\n",
"swords 1.0 0.170084 0.170084 NaN\n",
"\n",
"[8125 rows x 4 columns]"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# Collect the 1,000 nearest neighbors of each positive word\n",
"dfs = []\n",
"for w in board['positive']:\n",
"    dfs.append(pd.DataFrame.from_records(\n",
"        model.similar_by_word(mapping[w], topn=1000, restrict_vocab=50000),\n",
"        columns=['word', 'similarity']))\n",
"\n",
"df = pd.concat(dfs)\n",
"\n",
"# Rank candidates by how many lists they appear in, then by mean similarity\n",
"df.groupby('word')['similarity'].agg(['count', 'mean', 'sum', 'std']).sort_values(['count', 'mean'], ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>sum</th>\n",
" <th>std</th>\n",
" </tr>\n",
" <tr>\n",
" <th>word</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>beam</th>\n",
" <td>4.0</td>\n",
" <td>0.259435</td>\n",
" <td>1.037739</td>\n",
" <td>0.055484</td>\n",
" </tr>\n",
" <tr>\n",
" <th>robotic</th>\n",
" <td>3.0</td>\n",
" <td>0.438389</td>\n",
" <td>1.315166</td>\n",
" <td>0.325974</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Great_Wall</th>\n",
" <td>3.0</td>\n",
" <td>0.308924</td>\n",
" <td>0.926773</td>\n",
" <td>0.109942</td>\n",
" </tr>\n",
" <tr>\n",
" <th>observatory</th>\n",
" <td>3.0</td>\n",
" <td>0.304838</td>\n",
" <td>0.914514</td>\n",
" <td>0.119003</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Kong</th>\n",
" <td>3.0</td>\n",
" <td>0.294825</td>\n",
" <td>0.884476</td>\n",
" <td>0.048162</td>\n",
" </tr>\n",
" <tr>\n",
" <th>skull</th>\n",
" <td>3.0</td>\n",
" <td>0.289561</td>\n",
" <td>0.868683</td>\n",
" <td>0.029377</td>\n",
" </tr>\n",
" <tr>\n",
" <th>rope</th>\n",
" <td>3.0</td>\n",
" <td>0.282066</td>\n",
" <td>0.846198</td>\n",
" <td>0.051396</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bat</th>\n",
" <td>3.0</td>\n",
" <td>0.279069</td>\n",
" <td>0.837208</td>\n",
" <td>0.082658</td>\n",
" </tr>\n",
" <tr>\n",
" <th>backboard</th>\n",
" <td>3.0</td>\n",
" <td>0.274719</td>\n",
" <td>0.824156</td>\n",
" <td>0.058485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>helmet</th>\n",
" <td>3.0</td>\n",
" <td>0.273247</td>\n",
" <td>0.819742</td>\n",
" <td>0.055086</td>\n",
" </tr>\n",
" <tr>\n",
" <th>beams</th>\n",
" <td>3.0</td>\n",
" <td>0.272344</td>\n",
" <td>0.817032</td>\n",
" <td>0.066079</td>\n",
" </tr>\n",
" <tr>\n",
" <th>spider</th>\n",
" <td>3.0</td>\n",
" <td>0.271149</td>\n",
" <td>0.813448</td>\n",
" <td>0.107322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>dummy</th>\n",
" <td>3.0</td>\n",
" <td>0.270908</td>\n",
" <td>0.812724</td>\n",
" <td>0.035488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>space</th>\n",
" <td>3.0</td>\n",
" <td>0.266982</td>\n",
" <td>0.800947</td>\n",
" <td>0.043191</td>\n",
" </tr>\n",
" <tr>\n",
" <th>laser</th>\n",
" <td>3.0</td>\n",
" <td>0.264970</td>\n",
" <td>0.794909</td>\n",
" <td>0.047325</td>\n",
" </tr>\n",
" <tr>\n",
" <th>device</th>\n",
" <td>3.0</td>\n",
" <td>0.264215</td>\n",
" <td>0.792644</td>\n",
" <td>0.115501</td>\n",
" </tr>\n",
" <tr>\n",
" <th>crane</th>\n",
" <td>3.0</td>\n",
" <td>0.257942</td>\n",
" <td>0.773826</td>\n",
" <td>0.091387</td>\n",
" </tr>\n",
" <tr>\n",
" <th>balcony</th>\n",
" <td>3.0</td>\n",
" <td>0.256093</td>\n",
" <td>0.768280</td>\n",
" <td>0.115100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cage</th>\n",
" <td>3.0</td>\n",
" <td>0.253862</td>\n",
" <td>0.761586</td>\n",
" <td>0.057666</td>\n",
" </tr>\n",
" <tr>\n",
" <th>canister</th>\n",
" <td>3.0</td>\n",
" <td>0.252995</td>\n",
" <td>0.758986</td>\n",
" <td>0.044794</td>\n",
" </tr>\n",
" <tr>\n",
" <th>penis</th>\n",
" <td>3.0</td>\n",
" <td>0.249482</td>\n",
" <td>0.748447</td>\n",
" <td>0.027952</td>\n",
" </tr>\n",
" <tr>\n",
" <th>elevator</th>\n",
" <td>3.0</td>\n",
" <td>0.249475</td>\n",
" <td>0.748424</td>\n",
" <td>0.079156</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Doctor_Who</th>\n",
" <td>3.0</td>\n",
" <td>0.244777</td>\n",
" <td>0.734330</td>\n",
" <td>0.055631</td>\n",
" </tr>\n",
" <tr>\n",
" <th>headset</th>\n",
" <td>3.0</td>\n",
" <td>0.241369</td>\n",
" <td>0.724106</td>\n",
" <td>0.065166</td>\n",
" </tr>\n",
" <tr>\n",
" <th>arrow</th>\n",
" <td>3.0</td>\n",
" <td>0.240591</td>\n",
" <td>0.721774</td>\n",
" <td>0.040477</td>\n",
" </tr>\n",
" <tr>\n",
" <th>steering_wheel</th>\n",
" <td>3.0</td>\n",
" <td>0.240482</td>\n",
" <td>0.721445</td>\n",
" <td>0.023518</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LCD_screen</th>\n",
" <td>3.0</td>\n",
" <td>0.240454</td>\n",
" <td>0.721361</td>\n",
" <td>0.039021</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dragon</th>\n",
" <td>3.0</td>\n",
" <td>0.240302</td>\n",
" <td>0.720905</td>\n",
" <td>0.010778</td>\n",
" </tr>\n",
" <tr>\n",
" <th>eyeball</th>\n",
" <td>3.0</td>\n",
" <td>0.240087</td>\n",
" <td>0.720261</td>\n",
" <td>0.008821</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hilltop</th>\n",
" <td>3.0</td>\n",
" <td>0.239593</td>\n",
" <td>0.718780</td>\n",
" <td>0.025651</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>wallet</th>\n",
" <td>1.0</td>\n",
" <td>0.177335</td>\n",
" <td>0.177335</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Capital</th>\n",
" <td>1.0</td>\n",
" <td>0.177247</td>\n",
" <td>0.177247</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>slant</th>\n",
" <td>1.0</td>\n",
" <td>0.177133</td>\n",
" <td>0.177133</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>soles</th>\n",
" <td>1.0</td>\n",
" <td>0.177132</td>\n",
" <td>0.177132</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>repeatedly</th>\n",
" <td>1.0</td>\n",
" <td>0.177131</td>\n",
" <td>0.177131</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>He'sa</th>\n",
" <td>1.0</td>\n",
" <td>0.177035</td>\n",
" <td>0.177035</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>little</th>\n",
" <td>1.0</td>\n",
" <td>0.176945</td>\n",
" <td>0.176945</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>independently</th>\n",
" <td>1.0</td>\n",
" <td>0.176882</td>\n",
" <td>0.176882</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>grasped</th>\n",
" <td>1.0</td>\n",
" <td>0.176845</td>\n",
" <td>0.176845</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>skills</th>\n",
" <td>1.0</td>\n",
" <td>0.176832</td>\n",
" <td>0.176832</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sphere</th>\n",
" <td>1.0</td>\n",
" <td>0.176820</td>\n",
" <td>0.176820</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cain</th>\n",
" <td>1.0</td>\n",
" <td>0.176809</td>\n",
" <td>0.176809</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>deflect</th>\n",
" <td>1.0</td>\n",
" <td>0.176771</td>\n",
" <td>0.176771</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bit</th>\n",
" <td>1.0</td>\n",
" <td>0.176725</td>\n",
" <td>0.176725</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>chunks</th>\n",
" <td>1.0</td>\n",
" <td>0.176717</td>\n",
" <td>0.176717</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>barbs</th>\n",
" <td>1.0</td>\n",
" <td>0.176678</td>\n",
" <td>0.176678</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gravely</th>\n",
" <td>1.0</td>\n",
" <td>0.176582</td>\n",
" <td>0.176582</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>perspective</th>\n",
" <td>1.0</td>\n",
" <td>0.176530</td>\n",
" <td>0.176530</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sleeping_bag</th>\n",
" <td>1.0</td>\n",
" <td>0.176454</td>\n",
" <td>0.176454</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>leaped</th>\n",
" <td>1.0</td>\n",
" <td>0.176303</td>\n",
" <td>0.176303</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>handcuffed</th>\n",
" <td>1.0</td>\n",
" <td>0.176248</td>\n",
" <td>0.176248</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>pockets</th>\n",
" <td>1.0</td>\n",
" <td>0.176243</td>\n",
" <td>0.176243</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>fragment</th>\n",
" <td>1.0</td>\n",
" <td>0.176235</td>\n",
" <td>0.176235</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>apex</th>\n",
" <td>1.0</td>\n",
" <td>0.176228</td>\n",
" <td>0.176228</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>looked</th>\n",
" <td>1.0</td>\n",
" <td>0.176170</td>\n",
" <td>0.176170</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gun</th>\n",
" <td>1.0</td>\n",
" <td>0.176114</td>\n",
" <td>0.176114</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lift</th>\n",
" <td>1.0</td>\n",
" <td>0.176109</td>\n",
" <td>0.176109</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Rhino</th>\n",
" <td>1.0</td>\n",
" <td>0.176108</td>\n",
" <td>0.176108</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>booster</th>\n",
" <td>1.0</td>\n",
" <td>0.176050</td>\n",
" <td>0.176050</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>luckily</th>\n",
" <td>1.0</td>\n",
" <td>0.175979</td>\n",
" <td>0.175979</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7189 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" count mean sum std\n",
"word \n",
"beam 4.0 0.259435 1.037739 0.055484\n",
"robotic 3.0 0.438389 1.315166 0.325974\n",
"Great_Wall 3.0 0.308924 0.926773 0.109942\n",
"observatory 3.0 0.304838 0.914514 0.119003\n",
"Kong 3.0 0.294825 0.884476 0.048162\n",
"skull 3.0 0.289561 0.868683 0.029377\n",
"rope 3.0 0.282066 0.846198 0.051396\n",
"bat 3.0 0.279069 0.837208 0.082658\n",
"backboard 3.0 0.274719 0.824156 0.058485\n",
"helmet 3.0 0.273247 0.819742 0.055086\n",
"beams 3.0 0.272344 0.817032 0.066079\n",
"spider 3.0 0.271149 0.813448 0.107322\n",
"dummy 3.0 0.270908 0.812724 0.035488\n",
"space 3.0 0.266982 0.800947 0.043191\n",
"laser 3.0 0.264970 0.794909 0.047325\n",
"device 3.0 0.264215 0.792644 0.115501\n",
"crane 3.0 0.257942 0.773826 0.091387\n",
"balcony 3.0 0.256093 0.768280 0.115100\n",
"cage 3.0 0.253862 0.761586 0.057666\n",
"canister 3.0 0.252995 0.758986 0.044794\n",
"penis 3.0 0.249482 0.748447 0.027952\n",
"elevator 3.0 0.249475 0.748424 0.079156\n",
"Doctor_Who 3.0 0.244777 0.734330 0.055631\n",
"headset 3.0 0.241369 0.724106 0.065166\n",
"arrow 3.0 0.240591 0.721774 0.040477\n",
"steering_wheel 3.0 0.240482 0.721445 0.023518\n",
"LCD_screen 3.0 0.240454 0.721361 0.039021\n",
"Dragon 3.0 0.240302 0.720905 0.010778\n",
"eyeball 3.0 0.240087 0.720261 0.008821\n",
"hilltop 3.0 0.239593 0.718780 0.025651\n",
"... ... ... ... ...\n",
"wallet 1.0 0.177335 0.177335 NaN\n",
"Capital 1.0 0.177247 0.177247 NaN\n",
"slant 1.0 0.177133 0.177133 NaN\n",
"soles 1.0 0.177132 0.177132 NaN\n",
"repeatedly 1.0 0.177131 0.177131 NaN\n",
"He'sa 1.0 0.177035 0.177035 NaN\n",
"little 1.0 0.176945 0.176945 NaN\n",
"independently 1.0 0.176882 0.176882 NaN\n",
"grasped 1.0 0.176845 0.176845 NaN\n",
"skills 1.0 0.176832 0.176832 NaN\n",
"sphere 1.0 0.176820 0.176820 NaN\n",
"Cain 1.0 0.176809 0.176809 NaN\n",
"deflect 1.0 0.176771 0.176771 NaN\n",
"bit 1.0 0.176725 0.176725 NaN\n",
"chunks 1.0 0.176717 0.176717 NaN\n",
"barbs 1.0 0.176678 0.176678 NaN\n",
"gravely 1.0 0.176582 0.176582 NaN\n",
"perspective 1.0 0.176530 0.176530 NaN\n",
"sleeping_bag 1.0 0.176454 0.176454 NaN\n",
"leaped 1.0 0.176303 0.176303 NaN\n",
"handcuffed 1.0 0.176248 0.176248 NaN\n",
"pockets 1.0 0.176243 0.176243 NaN\n",
"fragment 1.0 0.176235 0.176235 NaN\n",
"apex 1.0 0.176228 0.176228 NaN\n",
"looked 1.0 0.176170 0.176170 NaN\n",
"gun 1.0 0.176114 0.176114 NaN\n",
"lift 1.0 0.176109 0.176109 NaN\n",
"Rhino 1.0 0.176108 0.176108 NaN\n",
"booster 1.0 0.176050 0.176050 NaN\n",
"luckily 1.0 0.175979 0.175979 NaN\n",
"\n",
"[7189 rows x 4 columns]"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
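"import pandas as pd\n",
"\n",
"# For each of the other team's words, take its 1000 most similar words,\n",
"# searching only the first 50,000 vectors of the model's vocabulary.\n",
"dfs = []\n",
"for w in board['negative']:\n",
"    dfs.append(pd.DataFrame.from_records(model.similar_by_word(mapping[w], topn=1000, restrict_vocab=50000), columns=['word', 'similarity']))\n",
"\n",
"df = pd.concat(dfs)\n",
"\n",
"# For every candidate word: count = how many negative words list it as a neighbour,\n",
"# plus the mean, sum and std of those similarities; most widely shared words first.\n",
"# (.T.reset_index(level=0, drop=True).T drops the outer 'similarity' column level.)\n",
"df.groupby('word').agg(['count', 'mean', 'sum', 'std']).T.reset_index(level=0, drop=True).T.sort_values(['count', 'mean'], ascending=False)"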
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
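The final cell ranks candidate clue words by how many of the other team's words they appear near. In pandas, the `.T.reset_index(level=0, drop=True).T` step only removes the outer `similarity` column level, so selecting the column first, `df.groupby('word')['similarity'].agg(['count', 'mean', 'sum', 'std'])`, should give the same table. Below is a minimal plain-Python sketch of that aggregation (not the author's code) that runs without loading the word2vec model; `aggregate_neighbours` and the toy similarity values are hypothetical stand-ins for the `(word, similarity)` lists that `model.similar_by_word` returns.

```python
import math
import statistics
from collections import defaultdict


def aggregate_neighbours(neighbour_lists):
    """Merge several (word, similarity) lists into per-word statistics.

    Returns (word, count, mean, sum, std) tuples sorted by count and then
    mean, highest first. std is the sample standard deviation, or NaN when
    a word appears in only one list, matching pandas' default.
    """
    scores = defaultdict(list)
    for neighbours in neighbour_lists:
        for word, similarity in neighbours:
            scores[word].append(similarity)

    rows = []
    for word, sims in scores.items():
        std = statistics.stdev(sims) if len(sims) > 1 else math.nan
        rows.append((word, len(sims), statistics.mean(sims), sum(sims), std))

    # Equivalent of sort_values(['count', 'mean'], ascending=False).
    rows.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return rows


# Hypothetical neighbour lists in the shape similar_by_word() returns;
# the similarity values are made up for illustration.
toy_neighbours = [
    [('robotic', 0.6), ('beam', 0.3)],
    [('robotic', 0.4), ('beam', 0.2)],
    [('beam', 0.25), ('wallet', 0.18)],
]
rows = aggregate_neighbours(toy_neighbours)
for row in rows:
    print(row)
```

Sorting by count before mean puts words that sit near several of the other team's words at once ahead of words that are very close to just one of them.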
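import random


def get_board():
    """Draw a random board: 9 positive words, 8 negative words and 1 assassin."""
    cards = []
    with open('../codenames_cards.txt', 'r') as f:
        for r in f.readlines():
            # Normalise each card: lower case, trimmed, spaces -> underscores
            # (e.g. 'Ice Cream' -> 'ice_cream').
            cards.append(r.lower().strip().replace(' ', '_'))
    # Draw 18 cards: the first 9 are positive, the next 8 negative, the last the assassin.
    sample = random.sample(cards, 18)
    return {
        'positive': sample[:9],
        'negative': sample[9:-1],
        'assassin': sample[-1]
    }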
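`get_board` reads `../codenames_cards.txt` on every call and uses the global random state, which makes a particular board hard to reproduce. A hypothetical variant, `make_board`, that takes the card list as an argument and accepts an optional seeded `random.Random`, is sketched below under those assumptions. Like the original it draws 9 + 8 + 1 = 18 cards; a full Codenames grid also has 7 neutral bystander cards, which neither version models.

```python
import random


def make_board(cards, rng=None, n_positive=9, n_negative=8):
    """Draw a board from an in-memory list of card words.

    Hypothetical variant of get_board(): same 9/8/1 split by default, but the
    card list is passed in and an optional random.Random makes boards repeatable.
    """
    if rng is None:
        rng = random.Random()
    # Normalise like get_board() and drop blank lines and duplicate cards.
    # Sorting the set gives a fixed order, so a seeded rng always yields the same board.
    words = sorted({c.lower().strip().replace(' ', '_') for c in cards if c.strip()})
    sample = rng.sample(words, n_positive + n_negative + 1)
    return {
        'positive': sample[:n_positive],
        'negative': sample[n_positive:-1],
        'assassin': sample[-1],
    }


# Card words from the notebook's example board, plus one two-word card.
sample_cards = [
    'Ambulance', 'Hospital', 'Spell', 'Lock', 'Charge', 'Tail', 'Link', 'Cook',
    'Web', 'Cat', 'Button', 'Pipe', 'Pants', 'Mount', 'Sleep', 'Stick', 'File',
    'Worm', 'Doctor', 'Ice Cream',
]
board = make_board(sample_cards, rng=random.Random(42))
print(board)
```

To use the real card file, read it as `get_board` does and pass the lines in, e.g. call `make_board(f.readlines())` inside a `with open('../codenames_cards.txt') as f:` block.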