Last active
December 18, 2017 23:48
Codenames Blog
**.bin | |
.ipynb_checkpoints |
<> |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We'll use the python library gensim: https://radimrehurek.com/gensim/" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 69, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import gensim" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Load a premade word2vec model built on Google News articles.\n", | |
"\n", | |
"Download from: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 70, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True, limit=500000)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Here's an example Codenames board. `positive` is one team's words, `negative` the other and `assassin` is the assassin word." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 79, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"board = {\n", | |
" 'positive': ['ambulance', 'hospital', 'spell', 'lock', 'charge', 'tail', 'link', 'cook', 'web'],\n", | |
" 'negative': ['cat', 'button', 'pipe', 'pants', 'mount', 'sleep', 'stick', 'file', 'worm'],\n", | |
" 'assassin': 'doctor'\n", | |
"}" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can use gensim to find the 10 words most related to 'ambulance' in this word2vec model." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 55, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('paramedics', 0.7590752243995667),\n", | |
" ('ambulances', 0.7493595480918884),\n", | |
" ('Ambulance', 0.7236292362213135),\n", | |
" ('paramedic', 0.662133514881134),\n", | |
" ('Ambulance_paramedics', 0.6315338611602783),\n", | |
" ('Ambulances', 0.6211477518081665),\n", | |
" ('LifeFlight_helicopter', 0.6147335171699524),\n", | |
" ('hospital', 0.6099206209182739),\n", | |
" ('Paramedics', 0.6081751585006714),\n", | |
" ('Ambulance_Service', 0.6080097556114197)]" | |
] | |
}, | |
"execution_count": 55, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"model.similar_by_word('ambulance', topn=10)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Some of these words word be useful, \"parametics\" for instance, but many are just other forms of the word \"ambulance.\"\n", | |
"\n", | |
"gensim allows us to directly find words the most similar to a whole group of words at one time." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 73, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('%_#F########_3v.jsn', 0.5153687000274658),\n", | |
" ('By_TBT_staff', 0.4811619818210602),\n", | |
" ('By_HARVEY_SIMPSON', 0.47336331009864807),\n", | |
" ('try_resubmitting', 0.46592575311660767),\n", | |
" ('By_Salvatore_Landolina', 0.4655460715293884),\n", | |
" ('By_Jason_Kaneshiro', 0.4612027108669281),\n", | |
" ('%_#F########_2v.jsn', 0.45537447929382324),\n", | |
" ('%_#F########_1v.jsn', 0.4508393406867981),\n", | |
" ('BY_VINCENT_MAO', 0.4498888850212097),\n", | |
" ('Visit_BBC_Webwise', 0.4431522786617279)]" | |
] | |
}, | |
"execution_count": 73, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"model.most_similar(positive=board['positive'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As we can see, it produces a lot of nonsense words. We can use `restrict_vocab` to limit results to only the top n most common words in the corpus." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 74, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('For_Restrictions', 0.43488097190856934),\n", | |
" ('bed', 0.39588358998298645),\n", | |
" ('links', 0.38411831855773926),\n", | |
" ('hook', 0.38367366790771484),\n", | |
" ('paramedics', 0.38072746992111206),\n", | |
" ('emergency', 0.37950167059898376),\n", | |
" ('jail', 0.3759669065475464),\n", | |
" ('log', 0.37062549591064453),\n", | |
" ('intensive_care', 0.3661930561065674),\n", | |
" ('call', 0.36543411016464233),\n", | |
" ('webpage', 0.3649423122406006),\n", | |
" ('tow_truck', 0.3592333197593689),\n", | |
" ('click', 0.35906946659088135),\n", | |
" ('cooked', 0.3552851676940918),\n", | |
" ('care', 0.3537469208240509),\n", | |
" ('handcuff', 0.35027384757995605),\n", | |
" ('then', 0.34921103715896606),\n", | |
" ('stay', 0.3478427529335022),\n", | |
" ('turn', 0.34607696533203125),\n", | |
" ('bookmark', 0.3458564579486847)]" | |
] | |
}, | |
"execution_count": 74, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"model.most_similar(\n", | |
" positive=board['positive'],\n", | |
" restrict_vocab=50000,\n", | |
" topn=20\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This looks much better, and produces some decent clues. \n", | |
"* \"bed\", \"paramedics\", \"emergency\" all relate to \"ambulance\" and \"hospital.\" \n", | |
"* \"jail\" could relate to \"lock\" and \"charge.\" \n", | |
"* \"click\" to \"web\" and \"link.\"\n", | |
"\n", | |
"But \"bed\" would also relate to the other team's word \"sleep\" and \"click\" with \"button.\"\n", | |
"\n", | |
"We can also include `negative` words so that we'd avoid words that are the other team's." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 80, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('Hospital', 0.27265793085098267),\n", | |
" ('ambulances', 0.2605472207069397),\n", | |
" ('hospitals', 0.24624229967594147),\n", | |
" ('outpatient', 0.24339225888252258),\n", | |
" ('inpatient', 0.2404019981622696),\n", | |
" ('paramedics', 0.23482689261436462),\n", | |
" ('escort', 0.23161748051643372),\n", | |
" ('Partnerships', 0.23104971647262573),\n", | |
" ('Medical_Center', 0.2306305170059204),\n", | |
" ('telemedicine', 0.22638411819934845)]" | |
] | |
}, | |
"execution_count": 80, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"model.most_similar(\n", | |
" positive=board['positive'],\n", | |
" negative=board['negative'],\n", | |
" restrict_vocab=50000\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I really like the clue \"telemedicine.\" It's non-obvious, but relates to four words: \"web,\" \"link,\" \"ambulance\" and \"hospital.\" This shows the potential for this method to produce novel clues.\n", | |
"\n", | |
"Let's say that the clue were \"telemedicine\" and the four words were removed from the board, then the next team got a turn. What might their clues be?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 88, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('pillow', 0.43686941266059875),\n", | |
" ('bra', 0.3842337727546692),\n", | |
" ('couch', 0.38342970609664917),\n", | |
" ('tub', 0.37922778725624084),\n", | |
" ('closet', 0.36959999799728394),\n", | |
" ('sofa', 0.36713898181915283),\n", | |
" ('bathroom', 0.366258829832077),\n", | |
" ('bed', 0.36348700523376465),\n", | |
" ('crotch', 0.36245280504226685),\n", | |
" ('spoon', 0.36179912090301514)]" | |
] | |
}, | |
"execution_count": 88, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"board = {\n", | |
" 'positive': ['spell', 'lock', 'charge', 'tail', 'link'],\n", | |
" 'negative': ['cat', 'button', 'pipe', 'pants', 'mount', 'sleep', 'stick', 'file', 'worm'],\n", | |
" 'assassin': 'doctor'\n", | |
"}\n", | |
"\n", | |
"model.most_similar(\n", | |
" positive=board['negative'],\n", | |
" negative=board['positive'],\n", | |
" restrict_vocab=50000\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This appears much less successful. The top words mostly just seem to relate to a singe word:\n", | |
"* pillow -> sleep\n", | |
"* bra -> pants\n", | |
"* couch -> sleep? cat?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 89, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('Partnerships', 0.19860073924064636),\n", | |
" ('partnership', 0.1707054078578949),\n", | |
" ('Affiliates', 0.1595458686351776),\n", | |
" ('Partnership', 0.1545657068490982),\n", | |
" ('spells', 0.15078961849212646),\n", | |
" ('signing', 0.15013918280601501),\n", | |
" ('guiding', 0.14804501831531525),\n", | |
" ('reserve', 0.14592880010604858),\n", | |
" ('tutelage', 0.1441878080368042),\n", | |
" ('plundered', 0.14354334771633148)]" | |
] | |
}, | |
"execution_count": 89, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"model.most_similar(\n", | |
" positive=board['positive'],\n", | |
" negative=board['negative'],\n", | |
" restrict_vocab=50000\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 84, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('pillow', 0.4124183654785156),\n", | |
" ('bra', 0.38460129499435425),\n", | |
" ('crotch', 0.3793488144874573),\n", | |
" ('buttons', 0.37311607599258423),\n", | |
" ('couch', 0.36672115325927734),\n", | |
" ('strap', 0.35612520575523376),\n", | |
" ('backpack', 0.3539729714393616),\n", | |
" ('mailbox', 0.351298451423645),\n", | |
" ('bug', 0.35095250606536865),\n", | |
" ('files', 0.34630095958709717)]" | |
] | |
}, | |
"execution_count": 84, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"model.most_similar(\n", | |
" positive=board['negative'],\n", | |
" negative=board['positive'],\n", | |
" restrict_vocab=50000\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 81, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('Partnerships', 0.24295198917388916),\n", | |
" ('Ashford', 0.24151912331581116),\n", | |
" ('Procurement', 0.211196631193161),\n", | |
" ('Partnership', 0.20416252315044403),\n", | |
" ('booking', 0.199225515127182),\n", | |
" ('Affiliates', 0.19800953567028046),\n", | |
" ('Interchange', 0.19791358709335327),\n", | |
" ('service', 0.1973651796579361),\n", | |
" ('ambulances', 0.19639825820922852),\n", | |
" ('reserve', 0.19472390413284302)]" | |
] | |
}, | |
"execution_count": 81, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# We can use `restrict_vocab` to limit results to only the top n most common words\n", | |
"model.most_similar(\n", | |
" positive=board['positive'],\n", | |
" negative=board['negative'] + [board['assassin']],\n", | |
" restrict_vocab=50000\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('downward_spiral', 0.2583349347114563),\n", | |
" ('dip', 0.24365955591201782),\n", | |
" ('tumble', 0.23384986817836761),\n", | |
" ('drop', 0.2305331975221634),\n", | |
" ('byproduct', 0.22404420375823975),\n", | |
" ('Decline', 0.22345377504825592),\n", | |
" ('report', 0.21692954003810883),\n", | |
" ('paragraph', 0.21681174635887146),\n", | |
" ('Fall', 0.21671724319458008),\n", | |
" ('spring', 0.2162589132785797)]" | |
] | |
}, | |
"execution_count": 44, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"model.most_similar(\n", | |
" positive=[mapping[w] for w in board['positive']], \n", | |
" negative=[mapping[w] for w in (board['negative'] + [board['assassin']])],\n", | |
" topn=10, restrict_vocab=50000\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 46, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Problems:\n", | |
"# Not capturing words that match well with multiple positive words. One good connection overpowers the rest\n", | |
"# word parts\n", | |
"# Obscure words\n", | |
"\n", | |
"# Idea: \n", | |
"# Run similarites between individual words and then combine the lists. Words that appear the most often are good candidiates" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>count</th>\n", | |
" <th>mean</th>\n", | |
" <th>sum</th>\n", | |
" <th>std</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>word</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>flatten</th>\n", | |
" <td>4.0</td>\n", | |
" <td>0.236673</td>\n", | |
" <td>0.946692</td>\n", | |
" <td>0.079847</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>roll</th>\n", | |
" <td>4.0</td>\n", | |
" <td>0.228831</td>\n", | |
" <td>0.915322</td>\n", | |
" <td>0.047536</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Thursday</th>\n", | |
" <td>4.0</td>\n", | |
" <td>0.223386</td>\n", | |
" <td>0.893544</td>\n", | |
" <td>0.035845</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Wednesday</th>\n", | |
" <td>4.0</td>\n", | |
" <td>0.220163</td>\n", | |
" <td>0.880650</td>\n", | |
" <td>0.032037</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Monday</th>\n", | |
" <td>4.0</td>\n", | |
" <td>0.217271</td>\n", | |
" <td>0.869085</td>\n", | |
" <td>0.037886</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>downwards</th>\n", | |
" <td>4.0</td>\n", | |
" <td>0.209756</td>\n", | |
" <td>0.839022</td>\n", | |
" <td>0.014385</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>surge</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.302306</td>\n", | |
" <td>0.906918</td>\n", | |
" <td>0.076502</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>spiral</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.295132</td>\n", | |
" <td>0.885395</td>\n", | |
" <td>0.052286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>slumps</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.289296</td>\n", | |
" <td>0.867887</td>\n", | |
" <td>0.094156</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>just</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.281075</td>\n", | |
" <td>0.843225</td>\n", | |
" <td>0.127939</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>statement</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.280905</td>\n", | |
" <td>0.842714</td>\n", | |
" <td>0.086204</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>push</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.279088</td>\n", | |
" <td>0.837264</td>\n", | |
" <td>0.076753</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>wave</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.255928</td>\n", | |
" <td>0.767784</td>\n", | |
" <td>0.067337</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>tailspin</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.255911</td>\n", | |
" <td>0.767734</td>\n", | |
" <td>0.052425</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>NYT</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.254757</td>\n", | |
" <td>0.764271</td>\n", | |
" <td>0.090522</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>freefall</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.254118</td>\n", | |
" <td>0.762353</td>\n", | |
" <td>0.091539</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>happens</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.253574</td>\n", | |
" <td>0.760722</td>\n", | |
" <td>0.027510</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>trend</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.252779</td>\n", | |
" <td>0.758338</td>\n", | |
" <td>0.071111</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>laughed</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.249634</td>\n", | |
" <td>0.748903</td>\n", | |
" <td>0.072787</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Wall_Street_Journal</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.247608</td>\n", | |
" <td>0.742825</td>\n", | |
" <td>0.088031</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>ascent</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.246717</td>\n", | |
" <td>0.740150</td>\n", | |
" <td>0.039043</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>graph</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.245881</td>\n", | |
" <td>0.737642</td>\n", | |
" <td>0.036799</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>mentality</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.244612</td>\n", | |
" <td>0.733837</td>\n", | |
" <td>0.071997</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>day</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.242762</td>\n", | |
" <td>0.728287</td>\n", | |
" <td>0.036740</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>inertia</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.241835</td>\n", | |
" <td>0.725505</td>\n", | |
" <td>0.045367</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>catapult</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.240552</td>\n", | |
" <td>0.721657</td>\n", | |
" <td>0.024719</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>depress</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.240457</td>\n", | |
" <td>0.721371</td>\n", | |
" <td>0.065503</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>spear</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.239582</td>\n", | |
" <td>0.718747</td>\n", | |
" <td>0.009128</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>if</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.238999</td>\n", | |
" <td>0.716997</td>\n", | |
" <td>0.028044</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>equation</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.237353</td>\n", | |
" <td>0.712060</td>\n", | |
" <td>0.008637</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>MacIntyre</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.171345</td>\n", | |
" <td>0.171345</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Rick_Carlisle</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.171174</td>\n", | |
" <td>0.171174</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Seifert</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.171157</td>\n", | |
" <td>0.171157</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>thumped</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.171149</td>\n", | |
" <td>0.171149</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>shoe</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.171143</td>\n", | |
" <td>0.171143</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>occupant</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.171142</td>\n", | |
" <td>0.171142</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>knight</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.171085</td>\n", | |
" <td>0.171085</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>medic</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.171074</td>\n", | |
" <td>0.171074</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>consults</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170966</td>\n", | |
" <td>0.170966</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>champs</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170955</td>\n", | |
" <td>0.170955</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>lice</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170927</td>\n", | |
" <td>0.170927</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>toothless</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170917</td>\n", | |
" <td>0.170917</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>elects</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170865</td>\n", | |
" <td>0.170865</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Matta</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170856</td>\n", | |
" <td>0.170856</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>ugly</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170833</td>\n", | |
" <td>0.170833</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>knife</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170772</td>\n", | |
" <td>0.170772</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>arbiter</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170728</td>\n", | |
" <td>0.170728</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>motionless</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170710</td>\n", | |
" <td>0.170710</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>pinning</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170696</td>\n", | |
" <td>0.170696</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>decides</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170694</td>\n", | |
" <td>0.170694</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Sergei</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170564</td>\n", | |
" <td>0.170564</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>co_ordinate</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170548</td>\n", | |
" <td>0.170548</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>bangs</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170530</td>\n", | |
" <td>0.170530</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>laurels</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170458</td>\n", | |
" <td>0.170458</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>notches</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170422</td>\n", | |
" <td>0.170422</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>sprain</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170372</td>\n", | |
" <td>0.170372</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>acumen</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170307</td>\n", | |
" <td>0.170307</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>complacent</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170185</td>\n", | |
" <td>0.170185</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>For_Restrictions</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170162</td>\n", | |
" <td>0.170162</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>swords</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.170084</td>\n", | |
" <td>0.170084</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>8125 rows × 4 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" count mean sum std\n", | |
"word \n", | |
"flatten 4.0 0.236673 0.946692 0.079847\n", | |
"roll 4.0 0.228831 0.915322 0.047536\n", | |
"Thursday 4.0 0.223386 0.893544 0.035845\n", | |
"Wednesday 4.0 0.220163 0.880650 0.032037\n", | |
"Monday 4.0 0.217271 0.869085 0.037886\n", | |
"downwards 4.0 0.209756 0.839022 0.014385\n", | |
"surge 3.0 0.302306 0.906918 0.076502\n", | |
"spiral 3.0 0.295132 0.885395 0.052286\n", | |
"slumps 3.0 0.289296 0.867887 0.094156\n", | |
"just 3.0 0.281075 0.843225 0.127939\n", | |
"statement 3.0 0.280905 0.842714 0.086204\n", | |
"push 3.0 0.279088 0.837264 0.076753\n", | |
"wave 3.0 0.255928 0.767784 0.067337\n", | |
"tailspin 3.0 0.255911 0.767734 0.052425\n", | |
"NYT 3.0 0.254757 0.764271 0.090522\n", | |
"freefall 3.0 0.254118 0.762353 0.091539\n", | |
"happens 3.0 0.253574 0.760722 0.027510\n", | |
"trend 3.0 0.252779 0.758338 0.071111\n", | |
"laughed 3.0 0.249634 0.748903 0.072787\n", | |
"Wall_Street_Journal 3.0 0.247608 0.742825 0.088031\n", | |
"ascent 3.0 0.246717 0.740150 0.039043\n", | |
"graph 3.0 0.245881 0.737642 0.036799\n", | |
"mentality 3.0 0.244612 0.733837 0.071997\n", | |
"day 3.0 0.242762 0.728287 0.036740\n", | |
"inertia 3.0 0.241835 0.725505 0.045367\n", | |
"catapult 3.0 0.240552 0.721657 0.024719\n", | |
"depress 3.0 0.240457 0.721371 0.065503\n", | |
"spear 3.0 0.239582 0.718747 0.009128\n", | |
"if 3.0 0.238999 0.716997 0.028044\n", | |
"equation 3.0 0.237353 0.712060 0.008637\n", | |
"... ... ... ... ...\n", | |
"MacIntyre 1.0 0.171345 0.171345 NaN\n", | |
"Rick_Carlisle 1.0 0.171174 0.171174 NaN\n", | |
"Seifert 1.0 0.171157 0.171157 NaN\n", | |
"thumped 1.0 0.171149 0.171149 NaN\n", | |
"shoe 1.0 0.171143 0.171143 NaN\n", | |
"occupant 1.0 0.171142 0.171142 NaN\n", | |
"knight 1.0 0.171085 0.171085 NaN\n", | |
"medic 1.0 0.171074 0.171074 NaN\n", | |
"consults 1.0 0.170966 0.170966 NaN\n", | |
"champs 1.0 0.170955 0.170955 NaN\n", | |
"lice 1.0 0.170927 0.170927 NaN\n", | |
"toothless 1.0 0.170917 0.170917 NaN\n", | |
"elects 1.0 0.170865 0.170865 NaN\n", | |
"Matta 1.0 0.170856 0.170856 NaN\n", | |
"ugly 1.0 0.170833 0.170833 NaN\n", | |
"knife 1.0 0.170772 0.170772 NaN\n", | |
"arbiter 1.0 0.170728 0.170728 NaN\n", | |
"motionless 1.0 0.170710 0.170710 NaN\n", | |
"pinning 1.0 0.170696 0.170696 NaN\n", | |
"decides 1.0 0.170694 0.170694 NaN\n", | |
"Sergei 1.0 0.170564 0.170564 NaN\n", | |
"co_ordinate 1.0 0.170548 0.170548 NaN\n", | |
"bangs 1.0 0.170530 0.170530 NaN\n", | |
"laurels 1.0 0.170458 0.170458 NaN\n", | |
"notches 1.0 0.170422 0.170422 NaN\n", | |
"sprain 1.0 0.170372 0.170372 NaN\n", | |
"acumen 1.0 0.170307 0.170307 NaN\n", | |
"complacent 1.0 0.170185 0.170185 NaN\n", | |
"For_Restrictions 1.0 0.170162 0.170162 NaN\n", | |
"swords 1.0 0.170084 0.170084 NaN\n", | |
"\n", | |
"[8125 rows x 4 columns]" | |
] | |
}, | |
"execution_count": 72, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import pandas as pd\n", | |
"dfs = []\n", | |
"for w in board['positive']:\n", | |
" dfs.append(pd.DataFrame.from_records(model.similar_by_word(mapping[w], topn=1000, restrict_vocab=50000), columns=['word', 'similarity']))\n", | |
"#model.similar_by_word('head', topn=1000)\n", | |
"\n", | |
"df = pd.concat(dfs)\n", | |
"\n", | |
"df.groupby('word').agg(['count', 'mean', 'sum', 'std']).T.reset_index(level=0, drop=True).T.sort_values(['count', 'mean'], ascending=False)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 73, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>count</th>\n", | |
" <th>mean</th>\n", | |
" <th>sum</th>\n", | |
" <th>std</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>word</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>beam</th>\n", | |
" <td>4.0</td>\n", | |
" <td>0.259435</td>\n", | |
" <td>1.037739</td>\n", | |
" <td>0.055484</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>robotic</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.438389</td>\n", | |
" <td>1.315166</td>\n", | |
" <td>0.325974</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Great_Wall</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.308924</td>\n", | |
" <td>0.926773</td>\n", | |
" <td>0.109942</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>observatory</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.304838</td>\n", | |
" <td>0.914514</td>\n", | |
" <td>0.119003</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Kong</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.294825</td>\n", | |
" <td>0.884476</td>\n", | |
" <td>0.048162</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>skull</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.289561</td>\n", | |
" <td>0.868683</td>\n", | |
" <td>0.029377</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>rope</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.282066</td>\n", | |
" <td>0.846198</td>\n", | |
" <td>0.051396</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>bat</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.279069</td>\n", | |
" <td>0.837208</td>\n", | |
" <td>0.082658</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>backboard</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.274719</td>\n", | |
" <td>0.824156</td>\n", | |
" <td>0.058485</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>helmet</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.273247</td>\n", | |
" <td>0.819742</td>\n", | |
" <td>0.055086</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>beams</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.272344</td>\n", | |
" <td>0.817032</td>\n", | |
" <td>0.066079</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>spider</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.271149</td>\n", | |
" <td>0.813448</td>\n", | |
" <td>0.107322</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>dummy</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.270908</td>\n", | |
" <td>0.812724</td>\n", | |
" <td>0.035488</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>space</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.266982</td>\n", | |
" <td>0.800947</td>\n", | |
" <td>0.043191</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>laser</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.264970</td>\n", | |
" <td>0.794909</td>\n", | |
" <td>0.047325</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>device</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.264215</td>\n", | |
" <td>0.792644</td>\n", | |
" <td>0.115501</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>crane</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.257942</td>\n", | |
" <td>0.773826</td>\n", | |
" <td>0.091387</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>balcony</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.256093</td>\n", | |
" <td>0.768280</td>\n", | |
" <td>0.115100</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>cage</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.253862</td>\n", | |
" <td>0.761586</td>\n", | |
" <td>0.057666</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>canister</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.252995</td>\n", | |
" <td>0.758986</td>\n", | |
" <td>0.044794</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>penis</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.249482</td>\n", | |
" <td>0.748447</td>\n", | |
" <td>0.027952</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>elevator</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.249475</td>\n", | |
" <td>0.748424</td>\n", | |
" <td>0.079156</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Doctor_Who</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.244777</td>\n", | |
" <td>0.734330</td>\n", | |
" <td>0.055631</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>headset</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.241369</td>\n", | |
" <td>0.724106</td>\n", | |
" <td>0.065166</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>arrow</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.240591</td>\n", | |
" <td>0.721774</td>\n", | |
" <td>0.040477</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>steering_wheel</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.240482</td>\n", | |
" <td>0.721445</td>\n", | |
" <td>0.023518</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>LCD_screen</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.240454</td>\n", | |
" <td>0.721361</td>\n", | |
" <td>0.039021</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Dragon</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.240302</td>\n", | |
" <td>0.720905</td>\n", | |
" <td>0.010778</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>eyeball</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.240087</td>\n", | |
" <td>0.720261</td>\n", | |
" <td>0.008821</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>hilltop</th>\n", | |
" <td>3.0</td>\n", | |
" <td>0.239593</td>\n", | |
" <td>0.718780</td>\n", | |
" <td>0.025651</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>wallet</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.177335</td>\n", | |
" <td>0.177335</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Capital</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.177247</td>\n", | |
" <td>0.177247</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>slant</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.177133</td>\n", | |
" <td>0.177133</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>soles</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.177132</td>\n", | |
" <td>0.177132</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>repeatedly</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.177131</td>\n", | |
" <td>0.177131</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>He'sa</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.177035</td>\n", | |
" <td>0.177035</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>little</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176945</td>\n", | |
" <td>0.176945</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>independently</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176882</td>\n", | |
" <td>0.176882</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>grasped</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176845</td>\n", | |
" <td>0.176845</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>skills</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176832</td>\n", | |
" <td>0.176832</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>sphere</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176820</td>\n", | |
" <td>0.176820</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Cain</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176809</td>\n", | |
" <td>0.176809</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>deflect</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176771</td>\n", | |
" <td>0.176771</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>bit</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176725</td>\n", | |
" <td>0.176725</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>chunks</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176717</td>\n", | |
" <td>0.176717</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>barbs</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176678</td>\n", | |
" <td>0.176678</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>gravely</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176582</td>\n", | |
" <td>0.176582</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>perspective</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176530</td>\n", | |
" <td>0.176530</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>sleeping_bag</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176454</td>\n", | |
" <td>0.176454</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>leaped</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176303</td>\n", | |
" <td>0.176303</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>handcuffed</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176248</td>\n", | |
" <td>0.176248</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>pockets</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176243</td>\n", | |
" <td>0.176243</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>fragment</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176235</td>\n", | |
" <td>0.176235</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>apex</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176228</td>\n", | |
" <td>0.176228</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>looked</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176170</td>\n", | |
" <td>0.176170</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>gun</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176114</td>\n", | |
" <td>0.176114</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>lift</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176109</td>\n", | |
" <td>0.176109</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Rhino</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176108</td>\n", | |
" <td>0.176108</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>booster</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.176050</td>\n", | |
" <td>0.176050</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>luckily</th>\n", | |
" <td>1.0</td>\n", | |
" <td>0.175979</td>\n", | |
" <td>0.175979</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>7189 rows × 4 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" count mean sum std\n", | |
"word \n", | |
"beam 4.0 0.259435 1.037739 0.055484\n", | |
"robotic 3.0 0.438389 1.315166 0.325974\n", | |
"Great_Wall 3.0 0.308924 0.926773 0.109942\n", | |
"observatory 3.0 0.304838 0.914514 0.119003\n", | |
"Kong 3.0 0.294825 0.884476 0.048162\n", | |
"skull 3.0 0.289561 0.868683 0.029377\n", | |
"rope 3.0 0.282066 0.846198 0.051396\n", | |
"bat 3.0 0.279069 0.837208 0.082658\n", | |
"backboard 3.0 0.274719 0.824156 0.058485\n", | |
"helmet 3.0 0.273247 0.819742 0.055086\n", | |
"beams 3.0 0.272344 0.817032 0.066079\n", | |
"spider 3.0 0.271149 0.813448 0.107322\n", | |
"dummy 3.0 0.270908 0.812724 0.035488\n", | |
"space 3.0 0.266982 0.800947 0.043191\n", | |
"laser 3.0 0.264970 0.794909 0.047325\n", | |
"device 3.0 0.264215 0.792644 0.115501\n", | |
"crane 3.0 0.257942 0.773826 0.091387\n", | |
"balcony 3.0 0.256093 0.768280 0.115100\n", | |
"cage 3.0 0.253862 0.761586 0.057666\n", | |
"canister 3.0 0.252995 0.758986 0.044794\n", | |
"penis 3.0 0.249482 0.748447 0.027952\n", | |
"elevator 3.0 0.249475 0.748424 0.079156\n", | |
"Doctor_Who 3.0 0.244777 0.734330 0.055631\n", | |
"headset 3.0 0.241369 0.724106 0.065166\n", | |
"arrow 3.0 0.240591 0.721774 0.040477\n", | |
"steering_wheel 3.0 0.240482 0.721445 0.023518\n", | |
"LCD_screen 3.0 0.240454 0.721361 0.039021\n", | |
"Dragon 3.0 0.240302 0.720905 0.010778\n", | |
"eyeball 3.0 0.240087 0.720261 0.008821\n", | |
"hilltop 3.0 0.239593 0.718780 0.025651\n", | |
"... ... ... ... ...\n", | |
"wallet 1.0 0.177335 0.177335 NaN\n", | |
"Capital 1.0 0.177247 0.177247 NaN\n", | |
"slant 1.0 0.177133 0.177133 NaN\n", | |
"soles 1.0 0.177132 0.177132 NaN\n", | |
"repeatedly 1.0 0.177131 0.177131 NaN\n", | |
"He'sa 1.0 0.177035 0.177035 NaN\n", | |
"little 1.0 0.176945 0.176945 NaN\n", | |
"independently 1.0 0.176882 0.176882 NaN\n", | |
"grasped 1.0 0.176845 0.176845 NaN\n", | |
"skills 1.0 0.176832 0.176832 NaN\n", | |
"sphere 1.0 0.176820 0.176820 NaN\n", | |
"Cain 1.0 0.176809 0.176809 NaN\n", | |
"deflect 1.0 0.176771 0.176771 NaN\n", | |
"bit 1.0 0.176725 0.176725 NaN\n", | |
"chunks 1.0 0.176717 0.176717 NaN\n", | |
"barbs 1.0 0.176678 0.176678 NaN\n", | |
"gravely 1.0 0.176582 0.176582 NaN\n", | |
"perspective 1.0 0.176530 0.176530 NaN\n", | |
"sleeping_bag 1.0 0.176454 0.176454 NaN\n", | |
"leaped 1.0 0.176303 0.176303 NaN\n", | |
"handcuffed 1.0 0.176248 0.176248 NaN\n", | |
"pockets 1.0 0.176243 0.176243 NaN\n", | |
"fragment 1.0 0.176235 0.176235 NaN\n", | |
"apex 1.0 0.176228 0.176228 NaN\n", | |
"looked 1.0 0.176170 0.176170 NaN\n", | |
"gun 1.0 0.176114 0.176114 NaN\n", | |
"lift 1.0 0.176109 0.176109 NaN\n", | |
"Rhino 1.0 0.176108 0.176108 NaN\n", | |
"booster 1.0 0.176050 0.176050 NaN\n", | |
"luckily 1.0 0.175979 0.175979 NaN\n", | |
"\n", | |
"[7189 rows x 4 columns]" | |
] | |
}, | |
"execution_count": 73, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import pandas as pd\n", | |
"\n", | |
"# Collect the 1000 nearest neighbours of each negative word.\n", | |
"dfs = []\n", | |
"for w in board['negative']:\n", | |
"    neighbours = model.similar_by_word(mapping[w], topn=1000, restrict_vocab=50000)\n", | |
"    dfs.append(pd.DataFrame.from_records(neighbours, columns=['word', 'similarity']))\n", | |
"\n", | |
"df = pd.concat(dfs)\n", | |
"\n", | |
"# Aggregate per candidate word; the .T.reset_index(...).T round trip drops the\n", | |
"# outer 'similarity' column level. Sort by count, then mean, descending.\n", | |
"df.groupby('word').agg(['count', 'mean', 'sum', 'std']).T.reset_index(level=0, drop=True).T.sort_values(['count', 'mean'], ascending=False)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.4" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
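The last notebook cell pools the nearest neighbours of every `negative` word and ranks each candidate by how many board words it sits close to, then by mean similarity. A minimal standard-library sketch of that aggregation follows. The helper name `aggregate_neighbours` is hypothetical, and the words and scores are made up for illustration rather than taken from the word2vec model.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def aggregate_neighbours(neighbour_lists):
    """Pool (word, similarity) pairs and summarise them per word.

    Hypothetical helper mirroring the notebook's groupby/agg step.
    Returns rows of (word, count, mean, sum, std), sorted by count and
    then mean, both descending.
    """
    scores = defaultdict(list)
    for neighbours in neighbour_lists:
        for word, similarity in neighbours:
            scores[word].append(similarity)

    rows = []
    for word, values in scores.items():
        # Sample standard deviation is undefined for a single value;
        # pandas reports NaN in that case, so do the same here.
        spread = stdev(values) if len(values) > 1 else math.nan
        rows.append((word, len(values), mean(values), sum(values), spread))

    rows.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return rows


# Illustrative scores only; these are not model output.
toy_lists = [
    [("beam", 0.30), ("rope", 0.25)],
    [("beam", 0.20), ("gun", 0.18)],
    [("rope", 0.35), ("beam", 0.25)],
]
ranked = aggregate_neighbours(toy_lists)
for row in ranked:
    print(row)
```

Words that appear in only one neighbour list get `NaN` for the standard deviation, which is why the tail of the notebook's table shows `NaN` in the `std` column.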
import random


def get_board():
    """Draw 18 cards at random: 9 positive, 8 negative and 1 assassin."""
    cards = []
    with open('../codenames_cards.txt', 'r') as f:
        for r in f:
            # Lowercase each card and join multi-word cards with underscores.
            cards.append(r.lower().strip().replace(' ', '_'))
    sample = random.sample(cards, 18)
    return {
        'positive': sample[:9],
        'negative': sample[9:-1],
        'assassin': sample[-1]
    }
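
`get_board` reads the card list from disk, so it cannot be tried without that file. Here is a sketch of the same 9 / 8 / 1 split on an in-memory list. The `split_board` helper and its `rng` parameter are hypothetical, and the card names are placeholders.

```python
import random


def split_board(cards, rng=random):
    """Draw 18 distinct cards and split them 9 / 8 / 1.

    Hypothetical variant of get_board that takes the card list as an
    argument instead of reading '../codenames_cards.txt'.
    """
    sample = rng.sample(cards, 18)
    return {
        'positive': sample[:9],     # first team's nine words
        'negative': sample[9:-1],   # other team's eight words
        'assassin': sample[-1],     # the single assassin word
    }


# Placeholder words; the real list comes from the cards file.
cards = ['card_{}'.format(i) for i in range(25)]
board = split_board(cards, rng=random.Random(0))
print(len(board['positive']), len(board['negative']), board['assassin'])
```

Passing a seeded `random.Random` instance makes the draw repeatable, which helps when comparing clue candidates across runs.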