Codenames Blog
Gist docmarionum1/ce6c10c75c521cfec36c5fe86e2b6f64, last active December 18, 2017 23:48
| **.bin | |
| .ipynb_checkpoints |
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We'll use the Python library gensim: https://radimrehurek.com/gensim/" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 69, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import gensim" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
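| "Load a pretrained word2vec model built on Google News articles. `limit=500000` reads only the first 500,000 vectors in the file, which keeps memory use manageable.\n", | |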
| "\n", | |
| "Download from: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 70, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True, limit=500000)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Here's an example Codenames board. `positive` holds one team's words, `negative` the other team's, and `assassin` is the assassin word." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 79, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "board = {\n", | |
| " 'positive': ['ambulance', 'hospital', 'spell', 'lock', 'charge', 'tail', 'link', 'cook', 'web'],\n", | |
| " 'negative': ['cat', 'button', 'pipe', 'pants', 'mount', 'sleep', 'stick', 'file', 'worm'],\n", | |
| " 'assassin': 'doctor'\n", | |
| "}" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We can use gensim to find the 10 words most related to 'ambulance' in this word2vec model." | |
| ] | |
| }, | |
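To make "most related" concrete: gensim's `similar_by_word` ranks words by the cosine similarity between their vectors and the query word's vector. The sketch below reproduces that ranking in plain Python. `toy_vectors` is a hypothetical three-dimensional stand-in for the 300-dimensional Google News vectors, and `cosine` and `similar_by_word` here are illustrative helpers, not gensim's own code (which works on normalized NumPy arrays).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_by_word(vectors, word, topn=10):
    """Rank every other word in `vectors` by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical 3-dimensional vectors; real word2vec vectors here have 300 dimensions.
toy_vectors = {
    'ambulance': [0.9, 0.1, 0.0],
    'paramedics': [0.8, 0.2, 0.1],
    'hospital': [0.7, 0.3, 0.0],
    'web': [0.0, 0.1, 0.9],
}

print(similar_by_word(toy_vectors, 'ambulance', topn=2))
```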
| { | |
| "cell_type": "code", | |
| "execution_count": 55, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('paramedics', 0.7590752243995667),\n", | |
| " ('ambulances', 0.7493595480918884),\n", | |
| " ('Ambulance', 0.7236292362213135),\n", | |
| " ('paramedic', 0.662133514881134),\n", | |
| " ('Ambulance_paramedics', 0.6315338611602783),\n", | |
| " ('Ambulances', 0.6211477518081665),\n", | |
| " ('LifeFlight_helicopter', 0.6147335171699524),\n", | |
| " ('hospital', 0.6099206209182739),\n", | |
| " ('Paramedics', 0.6081751585006714),\n", | |
| " ('Ambulance_Service', 0.6080097556114197)]" | |
| ] | |
| }, | |
| "execution_count": 55, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.similar_by_word('ambulance', topn=10)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Some of these words would be useful, \"paramedics\" for instance, but many are just other forms of the word \"ambulance.\"\n", | |
| "\n", | |
| "gensim also lets us directly find the words most similar to a whole group of words at once." | |
| ] | |
| }, | |
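Since a clue that is just another form of a board word isn't allowed in Codenames, variants like "ambulances" or "Ambulance_paramedics" need to be filtered out of any candidate list. A minimal heuristic sketch, assuming a case-insensitive substring test is good enough; this is not a gensim feature, and the scores are the rounded values from the `similar_by_word('ambulance')` output above.

```python
def is_variant(candidate, board_words):
    """Heuristic: True if the candidate contains, or is contained in, a board word.

    Case is ignored, so 'Ambulance' and 'ambulances' both count as variants of
    'ambulance', and phrases such as 'Ambulance_paramedics' are caught too.
    """
    cand = candidate.lower()
    return any(word in cand or cand in word for word in (w.lower() for w in board_words))


def filter_clues(candidates, board_words):
    """Keep only (word, score) pairs whose word is not a variant of a board word."""
    return [(w, s) for w, s in candidates if not is_variant(w, board_words)]


# Rounded scores from the 'ambulance' neighbors; the board words are a subset of the team's.
neighbors = [
    ('paramedics', 0.759), ('ambulances', 0.749), ('Ambulance', 0.724),
    ('paramedic', 0.662), ('Ambulance_paramedics', 0.632), ('hospital', 0.610),
]
print(filter_clues(neighbors, ['ambulance', 'hospital']))
```

The substring test is deliberately blunt: it would also reject unrelated words such as "category" when "cat" is on the board, so a stemmer or lemmatizer would be a more precise choice.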
| { | |
| "cell_type": "code", | |
| "execution_count": 73, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('%_#F########_3v.jsn', 0.5153687000274658),\n", | |
| " ('By_TBT_staff', 0.4811619818210602),\n", | |
| " ('By_HARVEY_SIMPSON', 0.47336331009864807),\n", | |
| " ('try_resubmitting', 0.46592575311660767),\n", | |
| " ('By_Salvatore_Landolina', 0.4655460715293884),\n", | |
| " ('By_Jason_Kaneshiro', 0.4612027108669281),\n", | |
| " ('%_#F########_2v.jsn', 0.45537447929382324),\n", | |
| " ('%_#F########_1v.jsn', 0.4508393406867981),\n", | |
| " ('BY_VINCENT_MAO', 0.4498888850212097),\n", | |
| " ('Visit_BBC_Webwise', 0.4431522786617279)]" | |
| ] | |
| }, | |
| "execution_count": 73, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.most_similar(positive=board['positive'])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As we can see, the results are mostly nonsense tokens such as bylines and file names. We can use `restrict_vocab` to search only the first n words of the vocabulary, which in this model are the n most common words in the corpus." | |
| ] | |
| }, | |
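`restrict_vocab=n` tells gensim to search only the first n vectors in vocabulary order. The Google News file stores words from most to least frequent, so the cutoff keeps common words and drops rare junk like the tokens above. A plain-Python sketch of the same idea; the four-word vocabulary and its vectors are made up, with the junk token deliberately given the best score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar_restricted(vocab, vectors, query, topn=10, restrict_vocab=None):
    """Rank words by similarity to `query`, searching only the first `restrict_vocab` entries of `vocab`."""
    searchable = vocab if restrict_vocab is None else vocab[:restrict_vocab]
    scores = [(w, cosine(query, vectors[w])) for w in searchable]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical vocabulary, ordered from most to least frequent.
vocab = ['bed', 'click', 'emergency', '%_#F########_3v.jsn']
vectors = {
    'bed': [0.6, 0.4],
    'click': [0.2, 0.9],
    'emergency': [0.8, 0.3],
    '%_#F########_3v.jsn': [1.0, 0.5],  # rare junk token that happens to match best
}
query = [1.0, 0.5]

print(most_similar_restricted(vocab, vectors, query, topn=2))
print(most_similar_restricted(vocab, vectors, query, topn=2, restrict_vocab=3))
```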
| { | |
| "cell_type": "code", | |
| "execution_count": 74, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('For_Restrictions', 0.43488097190856934),\n", | |
| " ('bed', 0.39588358998298645),\n", | |
| " ('links', 0.38411831855773926),\n", | |
| " ('hook', 0.38367366790771484),\n", | |
| " ('paramedics', 0.38072746992111206),\n", | |
| " ('emergency', 0.37950167059898376),\n", | |
| " ('jail', 0.3759669065475464),\n", | |
| " ('log', 0.37062549591064453),\n", | |
| " ('intensive_care', 0.3661930561065674),\n", | |
| " ('call', 0.36543411016464233),\n", | |
| " ('webpage', 0.3649423122406006),\n", | |
| " ('tow_truck', 0.3592333197593689),\n", | |
| " ('click', 0.35906946659088135),\n", | |
| " ('cooked', 0.3552851676940918),\n", | |
| " ('care', 0.3537469208240509),\n", | |
| " ('handcuff', 0.35027384757995605),\n", | |
| " ('then', 0.34921103715896606),\n", | |
| " ('stay', 0.3478427529335022),\n", | |
| " ('turn', 0.34607696533203125),\n", | |
| " ('bookmark', 0.3458564579486847)]" | |
| ] | |
| }, | |
| "execution_count": 74, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.most_similar(\n", | |
| " positive=board['positive'],\n", | |
| " restrict_vocab=50000,\n", | |
| " topn=20\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This looks much better, and produces some decent clues. \n", | |
| "* \"bed\", \"paramedics\", \"emergency\" all relate to \"ambulance\" and \"hospital.\" \n", | |
| "* \"jail\" could relate to \"lock\" and \"charge.\" \n", | |
| "* \"click\" to \"web\" and \"link.\"\n", | |
| "\n", | |
| "But \"bed\" would also relate to the other team's word \"sleep,\" and \"click\" to \"button.\"\n", | |
| "\n", | |
| "We can also pass the other team's words as `negative` so that candidates related to them are pushed down the list." | |
| ] | |
| }, | |
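Under the hood, gensim's `most_similar` turns `positive` and `negative` into a single target vector: it averages the unit-length vectors of the positive words together with the negated unit vectors of the negative words, normalizes the result, and ranks the rest of the vocabulary by cosine similarity to it, skipping the query words themselves. A plain-Python sketch of that procedure; `vectors` and `most_similar` here are illustrative stand-ins with made-up three-dimensional vectors, not gensim's code.

```python
import math


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def most_similar(vectors, positive, negative=(), topn=10):
    """Average positive unit vectors and negated negative unit vectors, then rank the rest by cosine."""
    signed = [unit(vectors[w]) for w in positive]
    signed += [[-x for x in unit(vectors[w])] for w in negative]
    target = unit([sum(column) / len(signed) for column in zip(*signed)])
    excluded = set(positive) | set(negative)
    scores = [
        (w, sum(a * b for a, b in zip(target, unit(vec))))
        for w, vec in vectors.items()
        if w not in excluded
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical vectors, roughly (medical, online, sleep).
vectors = {
    'ambulance': [1.0, 0.0, 0.0],
    'web': [0.0, 1.0, 0.0],
    'sleep': [0.0, 0.0, 1.0],
    'telemedicine': [0.7, 0.7, 0.0],
    'bed': [0.6, 0.0, 0.8],
}

print(most_similar(vectors, ['ambulance', 'web']))
print(most_similar(vectors, ['ambulance', 'web'], negative=['sleep']))
```

Because every candidate is normalized too, ranking by similarity to this target is equivalent to ranking by the sum of a candidate's similarities to the positive words minus the sum of its similarities to the negative words. That is why "bed" drops once "sleep" is negative.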
| { | |
| "cell_type": "code", | |
| "execution_count": 80, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('Hospital', 0.27265793085098267),\n", | |
| " ('ambulances', 0.2605472207069397),\n", | |
| " ('hospitals', 0.24624229967594147),\n", | |
| " ('outpatient', 0.24339225888252258),\n", | |
| " ('inpatient', 0.2404019981622696),\n", | |
| " ('paramedics', 0.23482689261436462),\n", | |
| " ('escort', 0.23161748051643372),\n", | |
| " ('Partnerships', 0.23104971647262573),\n", | |
| " ('Medical_Center', 0.2306305170059204),\n", | |
| " ('telemedicine', 0.22638411819934845)]" | |
| ] | |
| }, | |
| "execution_count": 80, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.most_similar(\n", | |
| " positive=board['positive'],\n", | |
| " negative=board['negative'],\n", | |
| " restrict_vocab=50000\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "I really like the clue \"telemedicine.\" It's non-obvious, but relates to four words: \"web,\" \"link,\" \"ambulance\" and \"hospital.\" This shows the potential for this method to produce novel clues.\n", | |
| "\n", | |
| "Suppose the clue were \"telemedicine,\" those four words were removed from the board, and then the other team took its turn. What might their clues be?" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 88, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('pillow', 0.43686941266059875),\n", | |
| " ('bra', 0.3842337727546692),\n", | |
| " ('couch', 0.38342970609664917),\n", | |
| " ('tub', 0.37922778725624084),\n", | |
| " ('closet', 0.36959999799728394),\n", | |
| " ('sofa', 0.36713898181915283),\n", | |
| " ('bathroom', 0.366258829832077),\n", | |
| " ('bed', 0.36348700523376465),\n", | |
| " ('crotch', 0.36245280504226685),\n", | |
| " ('spoon', 0.36179912090301514)]" | |
| ] | |
| }, | |
| "execution_count": 88, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "board = {\n", | |
| " 'positive': ['spell', 'lock', 'charge', 'tail', 'link'],\n", | |
| " 'negative': ['cat', 'button', 'pipe', 'pants', 'mount', 'sleep', 'stick', 'file', 'worm'],\n", | |
| " 'assassin': 'doctor'\n", | |
| "}\n", | |
| "\n", | |
| "model.most_similar(\n", | |
| " positive=board['negative'],\n", | |
| " negative=board['positive'],\n", | |
| " restrict_vocab=50000\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This appears much less successful. Most of the top words seem to relate to just a single word:\n", | |
| "* pillow -> sleep\n", | |
| "* bra -> pants\n", | |
| "* couch -> sleep? cat?" | |
| ] | |
| }, | |
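One likely reason: ranking by similarity to the group's combined vector amounts to ranking by the sum of a candidate's per-word similarities (minus those to the negative words), so one very strong match such as pillow/sleep can beat a clue that is moderately close to several words. A possible alternative, which neither gensim nor this notebook provides, is to score each clue by its k-th best similarity to the team's words, so that a clue only scores well if it is close to at least k of them. A sketch with made-up vectors:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def kth_best_score(vectors, clue, team_words, k=2):
    """Score `clue` by its k-th highest similarity to `team_words` (k=1 is the single best match)."""
    sims = sorted((cosine(vectors[clue], vectors[w]) for w in team_words), reverse=True)
    return sims[k - 1]


# Hypothetical vectors, roughly (rest, clothing, animal).
vectors = {
    'sleep': [1.0, 0.0, 0.0],
    'pants': [0.0, 1.0, 0.0],
    'cat': [0.0, 0.0, 1.0],
    'pillow': [0.95, 0.1, 0.0],  # very close to 'sleep' only
    'couch': [0.7, 0.0, 0.6],    # moderately close to both 'sleep' and 'cat'
}
team = ['sleep', 'pants', 'cat']

for clue in ('pillow', 'couch'):
    best = kth_best_score(vectors, clue, team, k=1)
    second = kth_best_score(vectors, clue, team, k=2)
    print(clue, round(best, 3), round(second, 3))
```

With k=1, "pillow" wins on its single strong link to "sleep"; with k=2, "couch" wins because it connects two team words. Larger k demands broader clues at the cost of weaker individual links.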
| { | |
| "cell_type": "code", | |
| "execution_count": 89, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('Partnerships', 0.19860073924064636),\n", | |
| " ('partnership', 0.1707054078578949),\n", | |
| " ('Affiliates', 0.1595458686351776),\n", | |
| " ('Partnership', 0.1545657068490982),\n", | |
| " ('spells', 0.15078961849212646),\n", | |
| " ('signing', 0.15013918280601501),\n", | |
| " ('guiding', 0.14804501831531525),\n", | |
| " ('reserve', 0.14592880010604858),\n", | |
| " ('tutelage', 0.1441878080368042),\n", | |
| " ('plundered', 0.14354334771633148)]" | |
| ] | |
| }, | |
| "execution_count": 89, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.most_similar(\n", | |
| " positive=board['positive'],\n", | |
| " negative=board['negative'],\n", | |
| " restrict_vocab=50000\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 84, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('pillow', 0.4124183654785156),\n", | |
| " ('bra', 0.38460129499435425),\n", | |
| " ('crotch', 0.3793488144874573),\n", | |
| " ('buttons', 0.37311607599258423),\n", | |
| " ('couch', 0.36672115325927734),\n", | |
| " ('strap', 0.35612520575523376),\n", | |
| " ('backpack', 0.3539729714393616),\n", | |
| " ('mailbox', 0.351298451423645),\n", | |
| " ('bug', 0.35095250606536865),\n", | |
| " ('files', 0.34630095958709717)]" | |
| ] | |
| }, | |
| "execution_count": 84, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.most_similar(\n", | |
| " positive=board['negative'],\n", | |
| " negative=board['positive'],\n", | |
| " restrict_vocab=50000\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 81, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('Partnerships', 0.24295198917388916),\n", | |
| " ('Ashford', 0.24151912331581116),\n", | |
| " ('Procurement', 0.211196631193161),\n", | |
| " ('Partnership', 0.20416252315044403),\n", | |
| " ('booking', 0.199225515127182),\n", | |
| " ('Affiliates', 0.19800953567028046),\n", | |
| " ('Interchange', 0.19791358709335327),\n", | |
| " ('service', 0.1973651796579361),\n", | |
| " ('ambulances', 0.19639825820922852),\n", | |
| " ('reserve', 0.19472390413284302)]" | |
| ] | |
| }, | |
| "execution_count": 81, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# Also add the assassin word to the negative list so clues close to it are penalized\n", | |
| "model.most_similar(\n", | |
| " positive=board['positive'],\n", | |
| " negative=board['negative'] + [board['assassin']],\n", | |
| " restrict_vocab=50000\n", | |
| ")" | |
| ] | |
| }, | |
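Adding the assassin to `negative` weights it exactly like any other opposing word, so a clue close to "doctor" can still rank highly if it is close enough to the team's words. Since contacting the assassin loses the game outright, a stricter option is to drop any candidate whose similarity to the assassin reaches a cutoff. This veto is an assumption of the sketch, not something the notebook or gensim does, and the vectors and the 0.3 cutoff are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def veto_assassin(vectors, candidates, assassin, max_sim=0.3):
    """Keep only candidates whose similarity to the assassin word stays below `max_sim`."""
    return [c for c in candidates if cosine(vectors[c], vectors[assassin]) < max_sim]


# Hypothetical 2-dimensional vectors, roughly (medicine, logistics).
vectors = {
    'doctor': [1.0, 0.0],
    'Medical_Center': [0.8, 0.6],
    'escort': [0.1, 1.0],
    'booking': [0.2, 0.9],
}

print(veto_assassin(vectors, ['Medical_Center', 'escort', 'booking'], 'doctor'))
```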
| { | |
| "cell_type": "code", | |
| "execution_count": 44, | |
| "metadata": { | |
| "scrolled": true | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('downward_spiral', 0.2583349347114563),\n", | |
| " ('dip', 0.24365955591201782),\n", | |
| " ('tumble', 0.23384986817836761),\n", | |
| " ('drop', 0.2305331975221634),\n", | |
| " ('byproduct', 0.22404420375823975),\n", | |
| " ('Decline', 0.22345377504825592),\n", | |
| " ('report', 0.21692954003810883),\n", | |
| " ('paragraph', 0.21681174635887146),\n", | |
| " ('Fall', 0.21671724319458008),\n", | |
| " ('spring', 0.2162589132785797)]" | |
| ] | |
| }, | |
| "execution_count": 44, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.most_similar(\n", | |
| " positive=board['positive'],\n", | |
| " negative=board['negative'] + [board['assassin']],\n", | |
| " topn=10, restrict_vocab=50000\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 46, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Problems:\n", | |
| "# - Not capturing words that match well with multiple positive words: one strong connection overpowers the rest\n", | |
| "# - Clues that are just parts or forms of board words\n", | |
| "# - Obscure words\n", | |
| "\n", | |
| "# Idea:\n", | |
| "# Compute similarities for each positive word individually, then combine the lists. Words that appear most often are good candidates" | |
| ] | |
| }, | |
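The idea above can be sketched without pandas: take each positive word's neighbor list separately, then rank every word by how many lists it appears in and, among ties, by its mean similarity. The neighbor lists here are hypothetical stand-ins for per-word `model.similar_by_word(...)` results.

```python
from collections import defaultdict


def combine_neighbor_lists(neighbor_lists):
    """Merge per-word lists of (word, similarity) pairs.

    Returns (word, count, mean_similarity) tuples sorted by how many lists the
    word appears in, then by mean similarity, both descending.
    """
    hits = defaultdict(list)
    for neighbors in neighbor_lists:
        for word, sim in neighbors:
            hits[word].append(sim)
    rows = [(word, len(sims), sum(sims) / len(sims)) for word, sims in hits.items()]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Hypothetical neighbor lists for 'ambulance', 'hospital' and 'web'.
lists = [
    [('paramedics', 0.76), ('emergency', 0.55), ('bed', 0.40)],
    [('emergency', 0.50), ('bed', 0.45), ('nurse', 0.70)],
    [('click', 0.60), ('emergency', 0.20)],
]

for row in combine_neighbor_lists(lists):
    print(row)
```

Counting appearances rewards clues linked to several team words, which targets the first problem listed; on its own it does nothing about word variants or obscure words.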
| { | |
| "cell_type": "code", | |
| "execution_count": 72, | |
| "metadata": { | |
| "scrolled": true | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style>\n", | |
| " .dataframe thead tr:only-child th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: left;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>count</th>\n", | |
| " <th>mean</th>\n", | |
| " <th>sum</th>\n", | |
| " <th>std</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>word</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>flatten</th>\n", | |
| " <td>4.0</td>\n", | |
| " <td>0.236673</td>\n", | |
| " <td>0.946692</td>\n", | |
| " <td>0.079847</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>roll</th>\n", | |
| " <td>4.0</td>\n", | |
| " <td>0.228831</td>\n", | |
| " <td>0.915322</td>\n", | |
| " <td>0.047536</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Thursday</th>\n", | |
| " <td>4.0</td>\n", | |
| " <td>0.223386</td>\n", | |
| " <td>0.893544</td>\n", | |
| " <td>0.035845</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Wednesday</th>\n", | |
| " <td>4.0</td>\n", | |
| " <td>0.220163</td>\n", | |
| " <td>0.880650</td>\n", | |
| " <td>0.032037</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Monday</th>\n", | |
| " <td>4.0</td>\n", | |
| " <td>0.217271</td>\n", | |
| " <td>0.869085</td>\n", | |
| " <td>0.037886</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>downwards</th>\n", | |
| " <td>4.0</td>\n", | |
| " <td>0.209756</td>\n", | |
| " <td>0.839022</td>\n", | |
| " <td>0.014385</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>surge</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.302306</td>\n", | |
| " <td>0.906918</td>\n", | |
| " <td>0.076502</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>spiral</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.295132</td>\n", | |
| " <td>0.885395</td>\n", | |
| " <td>0.052286</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>slumps</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.289296</td>\n", | |
| " <td>0.867887</td>\n", | |
| " <td>0.094156</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>just</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.281075</td>\n", | |
| " <td>0.843225</td>\n", | |
| " <td>0.127939</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>statement</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.280905</td>\n", | |
| " <td>0.842714</td>\n", | |
| " <td>0.086204</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>push</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.279088</td>\n", | |
| " <td>0.837264</td>\n", | |
| " <td>0.076753</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>wave</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.255928</td>\n", | |
| " <td>0.767784</td>\n", | |
| " <td>0.067337</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>tailspin</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.255911</td>\n", | |
| " <td>0.767734</td>\n", | |
| " <td>0.052425</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>NYT</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.254757</td>\n", | |
| " <td>0.764271</td>\n", | |
| " <td>0.090522</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>freefall</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.254118</td>\n", | |
| " <td>0.762353</td>\n", | |
| " <td>0.091539</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>happens</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.253574</td>\n", | |
| " <td>0.760722</td>\n", | |
| " <td>0.027510</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>trend</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.252779</td>\n", | |
| " <td>0.758338</td>\n", | |
| " <td>0.071111</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>laughed</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.249634</td>\n", | |
| " <td>0.748903</td>\n", | |
| " <td>0.072787</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Wall_Street_Journal</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.247608</td>\n", | |
| " <td>0.742825</td>\n", | |
| " <td>0.088031</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>ascent</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.246717</td>\n", | |
| " <td>0.740150</td>\n", | |
| " <td>0.039043</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>graph</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.245881</td>\n", | |
| " <td>0.737642</td>\n", | |
| " <td>0.036799</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>mentality</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.244612</td>\n", | |
| " <td>0.733837</td>\n", | |
| " <td>0.071997</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>day</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.242762</td>\n", | |
| " <td>0.728287</td>\n", | |
| " <td>0.036740</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>inertia</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.241835</td>\n", | |
| " <td>0.725505</td>\n", | |
| " <td>0.045367</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>catapult</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.240552</td>\n", | |
| " <td>0.721657</td>\n", | |
| " <td>0.024719</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>depress</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.240457</td>\n", | |
| " <td>0.721371</td>\n", | |
| " <td>0.065503</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>spear</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.239582</td>\n", | |
| " <td>0.718747</td>\n", | |
| " <td>0.009128</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>if</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.238999</td>\n", | |
| " <td>0.716997</td>\n", | |
| " <td>0.028044</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>equation</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.237353</td>\n", | |
| " <td>0.712060</td>\n", | |
| " <td>0.008637</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>...</th>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>MacIntyre</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.171345</td>\n", | |
| " <td>0.171345</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Rick_Carlisle</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.171174</td>\n", | |
| " <td>0.171174</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Seifert</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.171157</td>\n", | |
| " <td>0.171157</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>thumped</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.171149</td>\n", | |
| " <td>0.171149</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>shoe</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.171143</td>\n", | |
| " <td>0.171143</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>occupant</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.171142</td>\n", | |
| " <td>0.171142</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>knight</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.171085</td>\n", | |
| " <td>0.171085</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>medic</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.171074</td>\n", | |
| " <td>0.171074</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>consults</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170966</td>\n", | |
| " <td>0.170966</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>champs</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170955</td>\n", | |
| " <td>0.170955</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>lice</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170927</td>\n", | |
| " <td>0.170927</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>toothless</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170917</td>\n", | |
| " <td>0.170917</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>elects</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170865</td>\n", | |
| " <td>0.170865</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Matta</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170856</td>\n", | |
| " <td>0.170856</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>ugly</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170833</td>\n", | |
| " <td>0.170833</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>knife</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170772</td>\n", | |
| " <td>0.170772</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>arbiter</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170728</td>\n", | |
| " <td>0.170728</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>motionless</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170710</td>\n", | |
| " <td>0.170710</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>pinning</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170696</td>\n", | |
| " <td>0.170696</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>decides</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170694</td>\n", | |
| " <td>0.170694</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Sergei</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170564</td>\n", | |
| " <td>0.170564</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>co_ordinate</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170548</td>\n", | |
| " <td>0.170548</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bangs</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170530</td>\n", | |
| " <td>0.170530</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>laurels</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170458</td>\n", | |
| " <td>0.170458</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>notches</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170422</td>\n", | |
| " <td>0.170422</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>sprain</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170372</td>\n", | |
| " <td>0.170372</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>acumen</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170307</td>\n", | |
| " <td>0.170307</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>complacent</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170185</td>\n", | |
| " <td>0.170185</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>For_Restrictions</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170162</td>\n", | |
| " <td>0.170162</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>swords</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.170084</td>\n", | |
| " <td>0.170084</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>8125 rows × 4 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " count mean sum std\n", | |
| "word \n", | |
| "flatten 4.0 0.236673 0.946692 0.079847\n", | |
| "roll 4.0 0.228831 0.915322 0.047536\n", | |
| "Thursday 4.0 0.223386 0.893544 0.035845\n", | |
| "Wednesday 4.0 0.220163 0.880650 0.032037\n", | |
| "Monday 4.0 0.217271 0.869085 0.037886\n", | |
| "downwards 4.0 0.209756 0.839022 0.014385\n", | |
| "surge 3.0 0.302306 0.906918 0.076502\n", | |
| "spiral 3.0 0.295132 0.885395 0.052286\n", | |
| "slumps 3.0 0.289296 0.867887 0.094156\n", | |
| "just 3.0 0.281075 0.843225 0.127939\n", | |
| "statement 3.0 0.280905 0.842714 0.086204\n", | |
| "push 3.0 0.279088 0.837264 0.076753\n", | |
| "wave 3.0 0.255928 0.767784 0.067337\n", | |
| "tailspin 3.0 0.255911 0.767734 0.052425\n", | |
| "NYT 3.0 0.254757 0.764271 0.090522\n", | |
| "freefall 3.0 0.254118 0.762353 0.091539\n", | |
| "happens 3.0 0.253574 0.760722 0.027510\n", | |
| "trend 3.0 0.252779 0.758338 0.071111\n", | |
| "laughed 3.0 0.249634 0.748903 0.072787\n", | |
| "Wall_Street_Journal 3.0 0.247608 0.742825 0.088031\n", | |
| "ascent 3.0 0.246717 0.740150 0.039043\n", | |
| "graph 3.0 0.245881 0.737642 0.036799\n", | |
| "mentality 3.0 0.244612 0.733837 0.071997\n", | |
| "day 3.0 0.242762 0.728287 0.036740\n", | |
| "inertia 3.0 0.241835 0.725505 0.045367\n", | |
| "catapult 3.0 0.240552 0.721657 0.024719\n", | |
| "depress 3.0 0.240457 0.721371 0.065503\n", | |
| "spear 3.0 0.239582 0.718747 0.009128\n", | |
| "if 3.0 0.238999 0.716997 0.028044\n", | |
| "equation 3.0 0.237353 0.712060 0.008637\n", | |
| "... ... ... ... ...\n", | |
| "MacIntyre 1.0 0.171345 0.171345 NaN\n", | |
| "Rick_Carlisle 1.0 0.171174 0.171174 NaN\n", | |
| "Seifert 1.0 0.171157 0.171157 NaN\n", | |
| "thumped 1.0 0.171149 0.171149 NaN\n", | |
| "shoe 1.0 0.171143 0.171143 NaN\n", | |
| "occupant 1.0 0.171142 0.171142 NaN\n", | |
| "knight 1.0 0.171085 0.171085 NaN\n", | |
| "medic 1.0 0.171074 0.171074 NaN\n", | |
| "consults 1.0 0.170966 0.170966 NaN\n", | |
| "champs 1.0 0.170955 0.170955 NaN\n", | |
| "lice 1.0 0.170927 0.170927 NaN\n", | |
| "toothless 1.0 0.170917 0.170917 NaN\n", | |
| "elects 1.0 0.170865 0.170865 NaN\n", | |
| "Matta 1.0 0.170856 0.170856 NaN\n", | |
| "ugly 1.0 0.170833 0.170833 NaN\n", | |
| "knife 1.0 0.170772 0.170772 NaN\n", | |
| "arbiter 1.0 0.170728 0.170728 NaN\n", | |
| "motionless 1.0 0.170710 0.170710 NaN\n", | |
| "pinning 1.0 0.170696 0.170696 NaN\n", | |
| "decides 1.0 0.170694 0.170694 NaN\n", | |
| "Sergei 1.0 0.170564 0.170564 NaN\n", | |
| "co_ordinate 1.0 0.170548 0.170548 NaN\n", | |
| "bangs 1.0 0.170530 0.170530 NaN\n", | |
| "laurels 1.0 0.170458 0.170458 NaN\n", | |
| "notches 1.0 0.170422 0.170422 NaN\n", | |
| "sprain 1.0 0.170372 0.170372 NaN\n", | |
| "acumen 1.0 0.170307 0.170307 NaN\n", | |
| "complacent 1.0 0.170185 0.170185 NaN\n", | |
| "For_Restrictions 1.0 0.170162 0.170162 NaN\n", | |
| "swords 1.0 0.170084 0.170084 NaN\n", | |
| "\n", | |
| "[8125 rows x 4 columns]" | |
| ] | |
| }, | |
| "execution_count": 72, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "import pandas as pd\n", | |
| "\n", | |
| "# For each positive word, collect its 1,000 nearest neighbors among the 50,000 most common words\n", | |
| "dfs = []\n", | |
| "for w in board['positive']:\n", | |
| "    neighbors = model.similar_by_word(w, topn=1000, restrict_vocab=50000)\n", | |
| "    dfs.append(pd.DataFrame.from_records(neighbors, columns=['word', 'similarity']))\n", | |
| "\n", | |
| "df = pd.concat(dfs)\n", | |
| "\n", | |
| "# Rank words by how many neighbor lists they appear in, then by mean similarity\n", | |
| "df.groupby('word')['similarity'].agg(['count', 'mean', 'sum', 'std']).sort_values(['count', 'mean'], ascending=False)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 73, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style>\n", | |
| " .dataframe thead tr:only-child th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: left;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>count</th>\n", | |
| " <th>mean</th>\n", | |
| " <th>sum</th>\n", | |
| " <th>std</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>word</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>beam</th>\n", | |
| " <td>4.0</td>\n", | |
| " <td>0.259435</td>\n", | |
| " <td>1.037739</td>\n", | |
| " <td>0.055484</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>robotic</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.438389</td>\n", | |
| " <td>1.315166</td>\n", | |
| " <td>0.325974</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Great_Wall</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.308924</td>\n", | |
| " <td>0.926773</td>\n", | |
| " <td>0.109942</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>observatory</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.304838</td>\n", | |
| " <td>0.914514</td>\n", | |
| " <td>0.119003</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Kong</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.294825</td>\n", | |
| " <td>0.884476</td>\n", | |
| " <td>0.048162</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>skull</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.289561</td>\n", | |
| " <td>0.868683</td>\n", | |
| " <td>0.029377</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>rope</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.282066</td>\n", | |
| " <td>0.846198</td>\n", | |
| " <td>0.051396</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bat</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.279069</td>\n", | |
| " <td>0.837208</td>\n", | |
| " <td>0.082658</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>backboard</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.274719</td>\n", | |
| " <td>0.824156</td>\n", | |
| " <td>0.058485</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>helmet</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.273247</td>\n", | |
| " <td>0.819742</td>\n", | |
| " <td>0.055086</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>beams</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.272344</td>\n", | |
| " <td>0.817032</td>\n", | |
| " <td>0.066079</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>spider</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.271149</td>\n", | |
| " <td>0.813448</td>\n", | |
| " <td>0.107322</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>dummy</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.270908</td>\n", | |
| " <td>0.812724</td>\n", | |
| " <td>0.035488</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>space</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.266982</td>\n", | |
| " <td>0.800947</td>\n", | |
| " <td>0.043191</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>laser</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.264970</td>\n", | |
| " <td>0.794909</td>\n", | |
| " <td>0.047325</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>device</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.264215</td>\n", | |
| " <td>0.792644</td>\n", | |
| " <td>0.115501</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>crane</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.257942</td>\n", | |
| " <td>0.773826</td>\n", | |
| " <td>0.091387</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>balcony</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.256093</td>\n", | |
| " <td>0.768280</td>\n", | |
| " <td>0.115100</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>cage</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.253862</td>\n", | |
| " <td>0.761586</td>\n", | |
| " <td>0.057666</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>canister</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.252995</td>\n", | |
| " <td>0.758986</td>\n", | |
| " <td>0.044794</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>penis</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.249482</td>\n", | |
| " <td>0.748447</td>\n", | |
| " <td>0.027952</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>elevator</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.249475</td>\n", | |
| " <td>0.748424</td>\n", | |
| " <td>0.079156</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Doctor_Who</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.244777</td>\n", | |
| " <td>0.734330</td>\n", | |
| " <td>0.055631</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>headset</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.241369</td>\n", | |
| " <td>0.724106</td>\n", | |
| " <td>0.065166</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>arrow</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.240591</td>\n", | |
| " <td>0.721774</td>\n", | |
| " <td>0.040477</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>steering_wheel</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.240482</td>\n", | |
| " <td>0.721445</td>\n", | |
| " <td>0.023518</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>LCD_screen</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.240454</td>\n", | |
| " <td>0.721361</td>\n", | |
| " <td>0.039021</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Dragon</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.240302</td>\n", | |
| " <td>0.720905</td>\n", | |
| " <td>0.010778</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>eyeball</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.240087</td>\n", | |
| " <td>0.720261</td>\n", | |
| " <td>0.008821</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>hilltop</th>\n", | |
| " <td>3.0</td>\n", | |
| " <td>0.239593</td>\n", | |
| " <td>0.718780</td>\n", | |
| " <td>0.025651</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>...</th>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>wallet</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.177335</td>\n", | |
| " <td>0.177335</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Capital</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.177247</td>\n", | |
| " <td>0.177247</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>slant</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.177133</td>\n", | |
| " <td>0.177133</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>soles</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.177132</td>\n", | |
| " <td>0.177132</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>repeatedly</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.177131</td>\n", | |
| " <td>0.177131</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>He'sa</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.177035</td>\n", | |
| " <td>0.177035</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>little</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176945</td>\n", | |
| " <td>0.176945</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>independently</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176882</td>\n", | |
| " <td>0.176882</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>grasped</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176845</td>\n", | |
| " <td>0.176845</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>skills</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176832</td>\n", | |
| " <td>0.176832</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>sphere</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176820</td>\n", | |
| " <td>0.176820</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Cain</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176809</td>\n", | |
| " <td>0.176809</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>deflect</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176771</td>\n", | |
| " <td>0.176771</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bit</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176725</td>\n", | |
| " <td>0.176725</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>chunks</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176717</td>\n", | |
| " <td>0.176717</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>barbs</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176678</td>\n", | |
| " <td>0.176678</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>gravely</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176582</td>\n", | |
| " <td>0.176582</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>perspective</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176530</td>\n", | |
| " <td>0.176530</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>sleeping_bag</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176454</td>\n", | |
| " <td>0.176454</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>leaped</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176303</td>\n", | |
| " <td>0.176303</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>handcuffed</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176248</td>\n", | |
| " <td>0.176248</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>pockets</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176243</td>\n", | |
| " <td>0.176243</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>fragment</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176235</td>\n", | |
| " <td>0.176235</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>apex</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176228</td>\n", | |
| " <td>0.176228</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>looked</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176170</td>\n", | |
| " <td>0.176170</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>gun</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176114</td>\n", | |
| " <td>0.176114</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>lift</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176109</td>\n", | |
| " <td>0.176109</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>Rhino</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176108</td>\n", | |
| " <td>0.176108</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>booster</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.176050</td>\n", | |
| " <td>0.176050</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>luckily</th>\n", | |
| " <td>1.0</td>\n", | |
| " <td>0.175979</td>\n", | |
| " <td>0.175979</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>7189 rows × 4 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " count mean sum std\n", | |
| "word \n", | |
| "beam 4.0 0.259435 1.037739 0.055484\n", | |
| "robotic 3.0 0.438389 1.315166 0.325974\n", | |
| "Great_Wall 3.0 0.308924 0.926773 0.109942\n", | |
| "observatory 3.0 0.304838 0.914514 0.119003\n", | |
| "Kong 3.0 0.294825 0.884476 0.048162\n", | |
| "skull 3.0 0.289561 0.868683 0.029377\n", | |
| "rope 3.0 0.282066 0.846198 0.051396\n", | |
| "bat 3.0 0.279069 0.837208 0.082658\n", | |
| "backboard 3.0 0.274719 0.824156 0.058485\n", | |
| "helmet 3.0 0.273247 0.819742 0.055086\n", | |
| "beams 3.0 0.272344 0.817032 0.066079\n", | |
| "spider 3.0 0.271149 0.813448 0.107322\n", | |
| "dummy 3.0 0.270908 0.812724 0.035488\n", | |
| "space 3.0 0.266982 0.800947 0.043191\n", | |
| "laser 3.0 0.264970 0.794909 0.047325\n", | |
| "device 3.0 0.264215 0.792644 0.115501\n", | |
| "crane 3.0 0.257942 0.773826 0.091387\n", | |
| "balcony 3.0 0.256093 0.768280 0.115100\n", | |
| "cage 3.0 0.253862 0.761586 0.057666\n", | |
| "canister 3.0 0.252995 0.758986 0.044794\n", | |
| "penis 3.0 0.249482 0.748447 0.027952\n", | |
| "elevator 3.0 0.249475 0.748424 0.079156\n", | |
| "Doctor_Who 3.0 0.244777 0.734330 0.055631\n", | |
| "headset 3.0 0.241369 0.724106 0.065166\n", | |
| "arrow 3.0 0.240591 0.721774 0.040477\n", | |
| "steering_wheel 3.0 0.240482 0.721445 0.023518\n", | |
| "LCD_screen 3.0 0.240454 0.721361 0.039021\n", | |
| "Dragon 3.0 0.240302 0.720905 0.010778\n", | |
| "eyeball 3.0 0.240087 0.720261 0.008821\n", | |
| "hilltop 3.0 0.239593 0.718780 0.025651\n", | |
| "... ... ... ... ...\n", | |
| "wallet 1.0 0.177335 0.177335 NaN\n", | |
| "Capital 1.0 0.177247 0.177247 NaN\n", | |
| "slant 1.0 0.177133 0.177133 NaN\n", | |
| "soles 1.0 0.177132 0.177132 NaN\n", | |
| "repeatedly 1.0 0.177131 0.177131 NaN\n", | |
| "He'sa 1.0 0.177035 0.177035 NaN\n", | |
| "little 1.0 0.176945 0.176945 NaN\n", | |
| "independently 1.0 0.176882 0.176882 NaN\n", | |
| "grasped 1.0 0.176845 0.176845 NaN\n", | |
| "skills 1.0 0.176832 0.176832 NaN\n", | |
| "sphere 1.0 0.176820 0.176820 NaN\n", | |
| "Cain 1.0 0.176809 0.176809 NaN\n", | |
| "deflect 1.0 0.176771 0.176771 NaN\n", | |
| "bit 1.0 0.176725 0.176725 NaN\n", | |
| "chunks 1.0 0.176717 0.176717 NaN\n", | |
| "barbs 1.0 0.176678 0.176678 NaN\n", | |
| "gravely 1.0 0.176582 0.176582 NaN\n", | |
| "perspective 1.0 0.176530 0.176530 NaN\n", | |
| "sleeping_bag 1.0 0.176454 0.176454 NaN\n", | |
| "leaped 1.0 0.176303 0.176303 NaN\n", | |
| "handcuffed 1.0 0.176248 0.176248 NaN\n", | |
| "pockets 1.0 0.176243 0.176243 NaN\n", | |
| "fragment 1.0 0.176235 0.176235 NaN\n", | |
| "apex 1.0 0.176228 0.176228 NaN\n", | |
| "looked 1.0 0.176170 0.176170 NaN\n", | |
| "gun 1.0 0.176114 0.176114 NaN\n", | |
| "lift 1.0 0.176109 0.176109 NaN\n", | |
| "Rhino 1.0 0.176108 0.176108 NaN\n", | |
| "booster 1.0 0.176050 0.176050 NaN\n", | |
| "luckily 1.0 0.175979 0.175979 NaN\n", | |
| "\n", | |
| "[7189 rows x 4 columns]" | |
| ] | |
| }, | |
| "execution_count": 73, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
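| "import pandas as pd\n", | |
| "\n", | |
| "# For each negative board word, collect its 1000 nearest neighbours, searching only\n", | |
| "# the first 50,000 vectors in the model's vocabulary order.\n", | |
| "dfs = []\n", | |
| "for w in board['negative']:\n", | |
| "    dfs.append(pd.DataFrame.from_records(model.similar_by_word(mapping[w], topn=1000, restrict_vocab=50000), columns=['word', 'similarity']))\n", | |
| "\n", | |
| "df = pd.concat(dfs)\n", | |
| "\n", | |
| "# Per neighbour: in how many negative words' lists it appears (count) and how similar it is (mean, sum, std).\n", | |
| "# The .T.reset_index(level=0, drop=True).T round trip drops the outer 'similarity' level\n", | |
| "# from the column index (and turns 'count' into a float).\n", | |
| "df.groupby('word').agg(['count', 'mean', 'sum', 'std']).T.reset_index(level=0, drop=True).T.sort_values(['count', 'mean'], ascending=False)" | |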
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.5.4" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |
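The last notebook cell counts, for every candidate word, how many of the negative board words list it among their nearest neighbours, then ranks candidates by that count and their mean similarity. A minimal pure-Python sketch of the same pipeline follows, using a hypothetical 3-dimensional toy vocabulary so that it runs without gensim, pandas or the GoogleNews vectors. The `similar_by_word` function here is an illustrative stand-in, assumed to approximate gensim's `KeyedVectors.similar_by_word` (cosine similarity, query word excluded, `restrict_vocab` keeping only the first N vocabulary entries). It is not the library's implementation, and the words and vectors are made up.

```python
import math
import statistics
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_by_word(vectors, word, topn=10, restrict_vocab=None):
    """Illustrative stand-in for KeyedVectors.similar_by_word: rank the first
    `restrict_vocab` entries of the ordered dict `vectors` by cosine similarity
    to `word`, leaving `word` itself out of the results."""
    candidates = list(vectors.items())[:restrict_vocab]
    scored = [(w, cosine(vectors[word], v)) for w, v in candidates if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def summarise(neighbour_lists):
    """For each neighbour: (word, count, mean, sum, std) over the lists it
    appears in. std is the sample standard deviation, NaN for a single value."""
    sims = defaultdict(list)
    for neighbours in neighbour_lists:
        for w, s in neighbours:
            sims[w].append(s)
    rows = []
    for w, values in sims.items():
        std = statistics.stdev(values) if len(values) > 1 else float('nan')
        rows.append((w, len(values), statistics.mean(values), sum(values), std))
    # Mirrors sort_values(['count', 'mean'], ascending=False)
    rows.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return rows


# Hypothetical 3-d toy vectors; the GoogleNews vectors have 300 dimensions.
vectors = {
    'moon': [0.9, 0.1, 0.0],
    'space': [0.8, 0.2, 0.1],
    'telescope': [0.7, 0.1, 0.3],
    'bat': [0.1, 0.9, 0.2],
    'cave': [0.2, 0.8, 0.1],
    'pilot': [0.3, 0.3, 0.9],
}
negative = ['moon', 'bat']  # stands in for board['negative']

neighbour_lists = [similar_by_word(vectors, w, topn=3) for w in negative]
rows = summarise(neighbour_lists)
for row in rows:
    print(row)
```

With these toy vectors, `space` and `cave` appear in both neighbour lists, so they sort ahead of `telescope` and `pilot`, which appear once each. That is the same count-then-mean ordering the notebook's table uses, and single-occurrence rows get a NaN standard deviation just as they do there.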
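| import random | |
| | |
| | |
| def get_board(): | |
|     """Deal a random Codenames board from the card list in ../codenames_cards.txt. | |
| | |
|     Returns 9 'positive' words, 8 'negative' words and 1 'assassin' word, | |
|     lower-cased with spaces replaced by underscores. | |
|     """ | |
|     cards = [] | |
|     with open('../codenames_cards.txt', 'r') as f: | |
|         for r in f: | |
|             card = r.lower().strip().replace(' ', '_') | |
|             if card:  # skip blank lines | |
|                 cards.append(card) | |
|     # Draw 18 distinct cards: 9 positive + 8 negative + 1 assassin | |
|     sample = random.sample(cards, 18) | |
|     return { | |
|         'positive': sample[:9], | |
|         'negative': sample[9:-1], | |
|         'assassin': sample[-1] | |
|     } |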
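Because `get_board()` opens `../codenames_cards.txt` itself and draws from the global random state, a particular board is hard to test or replay. A minimal sketch of one alternative follows, assuming the card file is read elsewhere: a hypothetical `deal_board()` that takes the card names and an optional `random.Random` instance, so a fixed seed reproduces the same deal. The card list below is made up for illustration.

```python
import random


def deal_board(cards, rng=random):
    """Hypothetical variant of get_board() that receives the card names and a
    random source instead of reading ../codenames_cards.txt itself."""
    # Same normalisation as get_board(); blank entries are dropped.
    names = [c.lower().strip().replace(' ', '_') for c in cards if c.strip()]
    sample = rng.sample(names, 18)  # 18 distinct cards: 9 + 8 + 1
    return {
        'positive': sample[:9],
        'negative': sample[9:-1],
        'assassin': sample[-1],
    }


# Made-up card list; the real one lives in ../codenames_cards.txt.
cards = ['Card %d' % i for i in range(24)] + ['Ice Cream', '']
board = deal_board(cards, random.Random(0))
print(len(board['positive']), len(board['negative']))  # prints: 9 8
```

Passing the random source in keeps the default behaviour (the module-level `random`) while letting a seeded `random.Random` replay the same board when comparing clue strategies.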