Created
March 26, 2017 19:07
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## NLP on Scraped FDA GOV Website\n", | |
"##### After removing foreign characters, repeated text, etc." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Credits:\n", | |
"See the [PyData DC 2016 presentation](http://pydata.org/dc2016/schedule/presentation/11/). To view the video of the presentation on YouTube, see [here](https://www.youtube.com/watch?v=6zm9NC9uRkk)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## The FDA GOV dataset" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" What We Do FDA-TRACK: sAgency-wide Program Performance A performance management system that \n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"import os\n", | |
"import codecs\n", | |
"\n", | |
"data_directory = os.path.join('d:', 'IBM_WEX_FDA_GOV')\n", | |
"\n", | |
"businesses_filepath = os.path.join(data_directory,\n", | |
"                                   'Text_Result.txt')\n", | |
"\n", | |
"with codecs.open(businesses_filepath, encoding='utf_8') as f:\n", | |
"    first_business_record = f.readline()\n", | |
"\n", | |
"print(first_business_record)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"intermediate_directory = os.path.join('d:', 'fda_gov_web_ibm_intermediate')\n", | |
"\n", | |
"desc_txt_filepath = os.path.join(intermediate_directory,\n", | |
"                                 'details_all.txt')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Text from 5,246 lines written to the new txt file.\n", | |
"Wall time: 34 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"if 1 == 1:\n", | |
"    row_count = 0\n", | |
"\n", | |
"    # create & open a new file in write mode\n", | |
"    with codecs.open(desc_txt_filepath, 'w', encoding='utf_8') as desc_txt_file:\n", | |
"\n", | |
"        # open the existing scraped text file\n", | |
"        with codecs.open(businesses_filepath, encoding='utf_8') as businesses_file:\n", | |
"\n", | |
"            for line_txt in businesses_file.readlines():\n", | |
"                desc_txt_file.write(line_txt.replace('\\n', '\\\\n') + '\\n')\n", | |
"                row_count += 1\n", | |
"\n", | |
"    print(\"Text from {:,} lines written to the new txt file.\".format(row_count))\n", | |
"\n", | |
"else:\n", | |
"\n", | |
"    with codecs.open(desc_txt_filepath, encoding='utf_8') as desc_txt_file:\n", | |
"        for row_count, line in enumerate(desc_txt_file):\n", | |
"            pass\n", | |
"\n", | |
"    print(\"Text from {:,} lines in the txt file.\".format(row_count + 1))" | |
] | |
}, | |
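{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The cell above keeps each record on one physical line by replacing newline characters with the literal two-character sequence `\\n`; the sample-loading step later reverses the substitution. A minimal sketch of that round trip follows. The helper names `escape_newlines` and `unescape_newlines` are hypothetical and not part of the original notebook, and the round trip is only exact when the text contains no literal backslash-n sequence to begin with." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def escape_newlines(text):\n", | |
"    # Replace each real newline with a literal backslash followed by 'n'\n", | |
"    return text.replace('\\n', '\\\\n')\n", | |
"\n", | |
"def unescape_newlines(text):\n", | |
"    # Reverse the substitution to recover the original line breaks\n", | |
"    return text.replace('\\\\n', '\\n')\n", | |
"\n", | |
"record = 'first line\\nsecond line'\n", | |
"stored = escape_newlines(record)\n", | |
"\n", | |
"# The stored form fits on a single physical line\n", | |
"assert '\\n' not in stored\n", | |
"# Unescaping restores the original text\n", | |
"assert unescape_newlines(stored) == record" | |
] | |
}, | |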
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## spaCy — Industrial-Strength NLP in Python" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import spacy\n", | |
"import pandas as pd\n", | |
"import itertools as it\n", | |
"\n", | |
"nlp = spacy.load('en')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's grab a sample passage from the scraped text to play with." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" Committees Print this page Home Advisory Committees About Advisory \n", | |
"\n", | |
" Committees The FDA uses 50 committees and panels to obtain \n", | |
"\n", | |
" independent expert advice on scientific, technical, and policy \n", | |
"\n", | |
" matters. About Advisory Committees Navigate the Advisory Committees \n", | |
"\n", | |
" Section About Advisory Committees How to become a member of an \n", | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"with codecs.open(desc_txt_filepath, encoding='utf_8') as f:\n", | |
"    sample_txt = list(it.islice(f, 3915, 3921))[0:5]\n", | |
"    sample_txt = ''.join(sample_txt).replace('\\\\n', '\\n')\n", | |
"\n", | |
"print(sample_txt)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Hand the sample text to spaCy, and be prepared to wait..." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 10.2 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"parsed_txt = nlp(sample_txt)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" Committees Print this page Home Advisory Committees About Advisory \n", | |
"\n", | |
" Committees The FDA uses 50 committees and panels to obtain \n", | |
"\n", | |
" independent expert advice on scientific, technical, and policy \n", | |
"\n", | |
" matters. About Advisory Committees Navigate the Advisory Committees \n", | |
"\n", | |
" Section About Advisory Committees How to become a member of an \n", | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(parsed_txt)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Looks the same! What happened under the hood?\n", | |
"\n", | |
"What about sentence detection and segmentation?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Sentence 1:\n", | |
" Committees Print this page Home Advisory Committees About Advisory \n", | |
"\n", | |
" Committees\n", | |
"\n", | |
"Sentence 2:\n", | |
"The FDA uses 50 committees and panels to obtain \n", | |
"\n", | |
" independent expert advice on scientific, technical, and policy \n", | |
"\n", | |
" matters.\n", | |
"\n", | |
"Sentence 3:\n", | |
"About Advisory Committees Navigate the Advisory Committees \n", | |
"\n", | |
" Section\n", | |
"\n", | |
"Sentence 4:\n", | |
"About Advisory Committees How to become a member of an \n", | |
"\n", | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"for num, sentence in enumerate(parsed_txt.sents):\n", | |
"    print('Sentence {}:'.format(num + 1))\n", | |
"    print(sentence)\n", | |
"    print('')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"What about named entity detection?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Entity 1: Committees Print - ORG\n", | |
"\n", | |
"Entity 2: Home Advisory Committees About Advisory - ORG\n", | |
"\n", | |
"Entity 3: FDA - ORG\n", | |
"\n", | |
"Entity 4: 50 - CARDINAL\n", | |
"\n", | |
"Entity 5: About Advisory Committees - ORG\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"for num, entity in enumerate(parsed_txt.ents):\n", | |
"    print('Entity {}:'.format(num + 1), entity, '-', entity.label_)\n", | |
"    print('')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"What about part of speech tagging?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>token_text</th>\n", | |
" <th>part_of_speech</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td></td>\n", | |
" <td>SPACE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Committees</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Print</td>\n", | |
" <td>VERB</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>this</td>\n", | |
" <td>DET</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>page</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>Home</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>Committees</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>About</td>\n", | |
" <td>ADP</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>SPACE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>Committees</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>The</td>\n", | |
" <td>DET</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>FDA</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>uses</td>\n", | |
" <td>VERB</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>50</td>\n", | |
" <td>NUM</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>committees</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>and</td>\n", | |
" <td>CONJ</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>panels</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>to</td>\n", | |
" <td>PART</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>obtain</td>\n", | |
" <td>VERB</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>SPACE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>independent</td>\n", | |
" <td>ADJ</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>expert</td>\n", | |
" <td>ADJ</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>advice</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25</th>\n", | |
" <td>on</td>\n", | |
" <td>ADP</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>26</th>\n", | |
" <td>scientific</td>\n", | |
" <td>ADJ</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>27</th>\n", | |
" <td>,</td>\n", | |
" <td>PUNCT</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28</th>\n", | |
" <td>technical</td>\n", | |
" <td>ADJ</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29</th>\n", | |
" <td>,</td>\n", | |
" <td>PUNCT</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>30</th>\n", | |
" <td>and</td>\n", | |
" <td>CONJ</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>31</th>\n", | |
" <td>policy</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>32</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>SPACE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>33</th>\n", | |
" <td>matters</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>34</th>\n", | |
" <td>.</td>\n", | |
" <td>PUNCT</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>35</th>\n", | |
" <td>About</td>\n", | |
" <td>ADP</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>36</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>37</th>\n", | |
" <td>Committees</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>38</th>\n", | |
" <td>Navigate</td>\n", | |
" <td>VERB</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>39</th>\n", | |
" <td>the</td>\n", | |
" <td>DET</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>40</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>41</th>\n", | |
" <td>Committees</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>42</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>SPACE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>43</th>\n", | |
" <td>Section</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>44</th>\n", | |
" <td>About</td>\n", | |
" <td>ADV</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>45</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>46</th>\n", | |
" <td>Committees</td>\n", | |
" <td>PROPN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>47</th>\n", | |
" <td>How</td>\n", | |
" <td>ADV</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>48</th>\n", | |
" <td>to</td>\n", | |
" <td>PART</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>49</th>\n", | |
" <td>become</td>\n", | |
" <td>VERB</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>50</th>\n", | |
" <td>a</td>\n", | |
" <td>DET</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>51</th>\n", | |
" <td>member</td>\n", | |
" <td>NOUN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>52</th>\n", | |
" <td>of</td>\n", | |
" <td>ADP</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>53</th>\n", | |
" <td>an</td>\n", | |
" <td>DET</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>54</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>SPACE</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" token_text part_of_speech\n", | |
"0 SPACE\n", | |
"1 Committees NOUN\n", | |
"2 Print VERB\n", | |
"3 this DET\n", | |
"4 page NOUN\n", | |
"5 Home PROPN\n", | |
"6 Advisory PROPN\n", | |
"7 Committees PROPN\n", | |
"8 About ADP\n", | |
"9 Advisory PROPN\n", | |
"10 \\n\\n SPACE\n", | |
"11 Committees PROPN\n", | |
"12 The DET\n", | |
"13 FDA PROPN\n", | |
"14 uses VERB\n", | |
"15 50 NUM\n", | |
"16 committees NOUN\n", | |
"17 and CONJ\n", | |
"18 panels NOUN\n", | |
"19 to PART\n", | |
"20 obtain VERB\n", | |
"21 \\n\\n SPACE\n", | |
"22 independent ADJ\n", | |
"23 expert ADJ\n", | |
"24 advice NOUN\n", | |
"25 on ADP\n", | |
"26 scientific ADJ\n", | |
"27 , PUNCT\n", | |
"28 technical ADJ\n", | |
"29 , PUNCT\n", | |
"30 and CONJ\n", | |
"31 policy NOUN\n", | |
"32 \\n\\n SPACE\n", | |
"33 matters NOUN\n", | |
"34 . PUNCT\n", | |
"35 About ADP\n", | |
"36 Advisory PROPN\n", | |
"37 Committees PROPN\n", | |
"38 Navigate VERB\n", | |
"39 the DET\n", | |
"40 Advisory PROPN\n", | |
"41 Committees PROPN\n", | |
"42 \\n\\n SPACE\n", | |
"43 Section NOUN\n", | |
"44 About ADV\n", | |
"45 Advisory PROPN\n", | |
"46 Committees PROPN\n", | |
"47 How ADV\n", | |
"48 to PART\n", | |
"49 become VERB\n", | |
"50 a DET\n", | |
"51 member NOUN\n", | |
"52 of ADP\n", | |
"53 an DET\n", | |
"54 \\n\\n SPACE" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"token_text = [token.orth_ for token in parsed_txt]\n", | |
"token_pos = [token.pos_ for token in parsed_txt]\n", | |
"\n", | |
"pd.DataFrame(list(zip(token_text, token_pos)),\n", | |
"             columns=['token_text', 'part_of_speech'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"What about text normalization, like stemming/lemmatization and shape analysis?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>token_text</th>\n", | |
" <th>token_lemma</th>\n", | |
" <th>token_shape</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Committees</td>\n", | |
" <td>committee</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Print</td>\n", | |
" <td>print</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>this</td>\n", | |
" <td>this</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>page</td>\n", | |
" <td>page</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>Home</td>\n", | |
" <td>home</td>\n", | |
" <td>Xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>advisory</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>Committees</td>\n", | |
" <td>committees</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>About</td>\n", | |
" <td>about</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>advisory</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>Committees</td>\n", | |
" <td>committees</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>The</td>\n", | |
" <td>the</td>\n", | |
" <td>Xxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>FDA</td>\n", | |
" <td>fda</td>\n", | |
" <td>XXX</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>uses</td>\n", | |
" <td>use</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>50</td>\n", | |
" <td>50</td>\n", | |
" <td>dd</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>committees</td>\n", | |
" <td>committee</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>and</td>\n", | |
" <td>and</td>\n", | |
" <td>xxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>panels</td>\n", | |
" <td>panel</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>to</td>\n", | |
" <td>to</td>\n", | |
" <td>xx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>obtain</td>\n", | |
" <td>obtain</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>independent</td>\n", | |
" <td>independent</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>expert</td>\n", | |
" <td>expert</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>advice</td>\n", | |
" <td>advice</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25</th>\n", | |
" <td>on</td>\n", | |
" <td>on</td>\n", | |
" <td>xx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>26</th>\n", | |
" <td>scientific</td>\n", | |
" <td>scientific</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>27</th>\n", | |
" <td>,</td>\n", | |
" <td>,</td>\n", | |
" <td>,</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28</th>\n", | |
" <td>technical</td>\n", | |
" <td>technical</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29</th>\n", | |
" <td>,</td>\n", | |
" <td>,</td>\n", | |
" <td>,</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>30</th>\n", | |
" <td>and</td>\n", | |
" <td>and</td>\n", | |
" <td>xxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>31</th>\n", | |
" <td>policy</td>\n", | |
" <td>policy</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>32</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>33</th>\n", | |
" <td>matters</td>\n", | |
" <td>matter</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>34</th>\n", | |
" <td>.</td>\n", | |
" <td>.</td>\n", | |
" <td>.</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>35</th>\n", | |
" <td>About</td>\n", | |
" <td>about</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>36</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>advisory</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>37</th>\n", | |
" <td>Committees</td>\n", | |
" <td>committees</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>38</th>\n", | |
" <td>Navigate</td>\n", | |
" <td>navigate</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>39</th>\n", | |
" <td>the</td>\n", | |
" <td>the</td>\n", | |
" <td>xxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>40</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>advisory</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>41</th>\n", | |
" <td>Committees</td>\n", | |
" <td>committees</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>42</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>43</th>\n", | |
" <td>Section</td>\n", | |
" <td>section</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>44</th>\n", | |
" <td>About</td>\n", | |
" <td>about</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>45</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>advisory</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>46</th>\n", | |
" <td>Committees</td>\n", | |
" <td>committees</td>\n", | |
" <td>Xxxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>47</th>\n", | |
" <td>How</td>\n", | |
" <td>how</td>\n", | |
" <td>Xxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>48</th>\n", | |
" <td>to</td>\n", | |
" <td>to</td>\n", | |
" <td>xx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>49</th>\n", | |
" <td>become</td>\n", | |
" <td>become</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>50</th>\n", | |
" <td>a</td>\n", | |
" <td>a</td>\n", | |
" <td>x</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>51</th>\n", | |
" <td>member</td>\n", | |
" <td>member</td>\n", | |
" <td>xxxx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>52</th>\n", | |
" <td>of</td>\n", | |
" <td>of</td>\n", | |
" <td>xx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>53</th>\n", | |
" <td>an</td>\n", | |
" <td>an</td>\n", | |
" <td>xx</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>54</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>\\n\\n</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" token_text token_lemma token_shape\n", | |
"0 \n", | |
"1 Committees committee Xxxxx\n", | |
"2 Print print Xxxxx\n", | |
"3 this this xxxx\n", | |
"4 page page xxxx\n", | |
"5 Home home Xxxx\n", | |
"6 Advisory advisory Xxxxx\n", | |
"7 Committees committees Xxxxx\n", | |
"8 About about Xxxxx\n", | |
"9 Advisory advisory Xxxxx\n", | |
"10 \\n\\n \\n\\n \\n\\n \n", | |
"11 Committees committees Xxxxx\n", | |
"12 The the Xxx\n", | |
"13 FDA fda XXX\n", | |
"14 uses use xxxx\n", | |
"15 50 50 dd\n", | |
"16 committees committee xxxx\n", | |
"17 and and xxx\n", | |
"18 panels panel xxxx\n", | |
"19 to to xx\n", | |
"20 obtain obtain xxxx\n", | |
"21 \\n\\n \\n\\n \\n\\n \n", | |
"22 independent independent xxxx\n", | |
"23 expert expert xxxx\n", | |
"24 advice advice xxxx\n", | |
"25 on on xx\n", | |
"26 scientific scientific xxxx\n", | |
"27 , , ,\n", | |
"28 technical technical xxxx\n", | |
"29 , , ,\n", | |
"30 and and xxx\n", | |
"31 policy policy xxxx\n", | |
"32 \\n\\n \\n\\n \\n\\n \n", | |
"33 matters matter xxxx\n", | |
"34 . . .\n", | |
"35 About about Xxxxx\n", | |
"36 Advisory advisory Xxxxx\n", | |
"37 Committees committees Xxxxx\n", | |
"38 Navigate navigate Xxxxx\n", | |
"39 the the xxx\n", | |
"40 Advisory advisory Xxxxx\n", | |
"41 Committees committees Xxxxx\n", | |
"42 \\n\\n \\n\\n \\n\\n \n", | |
"43 Section section Xxxxx\n", | |
"44 About about Xxxxx\n", | |
"45 Advisory advisory Xxxxx\n", | |
"46 Committees committees Xxxxx\n", | |
"47 How how Xxx\n", | |
"48 to to xx\n", | |
"49 become become xxxx\n", | |
"50 a a x\n", | |
"51 member member xxxx\n", | |
"52 of of xx\n", | |
"53 an an xx\n", | |
"54 \\n\\n \\n\\n \\n\\n" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"token_lemma = [token.lemma_ for token in parsed_txt]\n", | |
"token_shape = [token.shape_ for token in parsed_txt]\n", | |
"\n", | |
"pd.DataFrame(list(zip(token_text, token_lemma, token_shape)),\n", | |
"             columns=['token_text', 'token_lemma', 'token_shape'])" | |
] | |
}, | |
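{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `token_shape` column above abstracts each token's orthography. Judging from the output, uppercase letters appear as `X`, lowercase letters as `x`, digits as `d`, and a run of the same class stops at four characters. The sketch below is an approximate pure-Python re-implementation for illustration only; `word_shape` is a hypothetical helper, not spaCy's own code, and it may differ from spaCy on edge cases." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def word_shape(text):\n", | |
"    shape = []\n", | |
"    last = ''  # class of the previous character\n", | |
"    run = 0    # length of the current run of that class\n", | |
"    for ch in text:\n", | |
"        if ch.isalpha():\n", | |
"            code = 'X' if ch.isupper() else 'x'\n", | |
"        elif ch.isdigit():\n", | |
"            code = 'd'\n", | |
"        else:\n", | |
"            code = ch  # punctuation and whitespace are kept as-is\n", | |
"        if code == last:\n", | |
"            run += 1\n", | |
"        else:\n", | |
"            last = code\n", | |
"            run = 1\n", | |
"        # keep at most four characters of any run\n", | |
"        if run <= 4:\n", | |
"            shape.append(code)\n", | |
"    return ''.join(shape)\n", | |
"\n", | |
"# These expected shapes are taken from the table above\n", | |
"assert word_shape('Committees') == 'Xxxxx'\n", | |
"assert word_shape('FDA') == 'XXX'\n", | |
"assert word_shape('50') == 'dd'\n", | |
"assert word_shape('independent') == 'xxxx'" | |
] | |
}, | |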
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"What about token-level entity analysis?" | |
] | |
}, | |
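{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Token-level entity annotation commonly uses the IOB scheme: `B` marks the first token of an entity, `I` a token inside one, and `O` a token outside any entity. As a rough sketch, the hypothetical helper `group_iob` (not part of spaCy) rebuilds entity spans from (text, type, IOB) triples; the sample triples are copied from rows 12 to 16 of the output table that follows." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def group_iob(tagged_tokens):\n", | |
"    spans = []\n", | |
"    current = None  # (list of words, label) for the entity being built\n", | |
"    for text, label, iob in tagged_tokens:\n", | |
"        if iob == 'B':\n", | |
"            # a new entity starts; flush any entity in progress\n", | |
"            if current is not None:\n", | |
"                spans.append(current)\n", | |
"            current = ([text], label)\n", | |
"        elif iob == 'I' and current is not None:\n", | |
"            current[0].append(text)\n", | |
"        else:\n", | |
"            # 'O' (or a stray 'I') closes the entity in progress\n", | |
"            if current is not None:\n", | |
"                spans.append(current)\n", | |
"            current = None\n", | |
"    if current is not None:\n", | |
"        spans.append(current)\n", | |
"    return [(' '.join(words), label) for words, label in spans]\n", | |
"\n", | |
"sample = [('The', '', 'O'), ('FDA', 'ORG', 'B'), ('uses', '', 'O'),\n", | |
"          ('50', 'CARDINAL', 'B'), ('committees', '', 'O')]\n", | |
"\n", | |
"assert group_iob(sample) == [('FDA', 'ORG'), ('50', 'CARDINAL')]" | |
] | |
}, | |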
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>token_text</th>\n", | |
" <th>entity_type</th>\n", | |
" <th>inside_outside_begin</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td></td>\n", | |
" <td>ORG</td>\n", | |
" <td>B</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Committees</td>\n", | |
" <td>ORG</td>\n", | |
" <td>I</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Print</td>\n", | |
" <td>ORG</td>\n", | |
" <td>I</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>this</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>page</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>Home</td>\n", | |
" <td>ORG</td>\n", | |
" <td>B</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>ORG</td>\n", | |
" <td>I</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>Committees</td>\n", | |
" <td>ORG</td>\n", | |
" <td>I</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>About</td>\n", | |
" <td>ORG</td>\n", | |
" <td>I</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>ORG</td>\n", | |
" <td>I</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>Committees</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>The</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>FDA</td>\n", | |
" <td>ORG</td>\n", | |
" <td>B</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>uses</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>50</td>\n", | |
" <td>CARDINAL</td>\n", | |
" <td>B</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>committees</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>and</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>panels</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>to</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>obtain</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>independent</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>expert</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>advice</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25</th>\n", | |
" <td>on</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>26</th>\n", | |
" <td>scientific</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>27</th>\n", | |
" <td>,</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28</th>\n", | |
" <td>technical</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29</th>\n", | |
" <td>,</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>30</th>\n", | |
" <td>and</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>31</th>\n", | |
" <td>policy</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>32</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>33</th>\n", | |
" <td>matters</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>34</th>\n", | |
" <td>.</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>35</th>\n", | |
" <td>About</td>\n", | |
" <td>ORG</td>\n", | |
" <td>B</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>36</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>ORG</td>\n", | |
" <td>I</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>37</th>\n", | |
" <td>Committees</td>\n", | |
" <td>ORG</td>\n", | |
" <td>I</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>38</th>\n", | |
" <td>Navigate</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>39</th>\n", | |
" <td>the</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>40</th>\n", | |
" <td>Advisory</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>41</th>\n", | |
" <td>Committees</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>42</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>43</th>\n", | |
" <td>Section</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>44</th>\n", | |
" <td>About</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>45</th>\n", | |
" <td>Advisory</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>46</th>\n", | |
" <td>Committees</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>47</th>\n", | |
" <td>How</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>48</th>\n", | |
" <td>to</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>49</th>\n", | |
" <td>become</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>50</th>\n", | |
" <td>a</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>51</th>\n", | |
" <td>member</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>52</th>\n", | |
" <td>of</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>53</th>\n", | |
" <td>an</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>54</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td></td>\n", | |
" <td>O</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" token_text entity_type inside_outside_begin\n", | |
"0 ORG B\n", | |
"1 Committees ORG I\n", | |
"2 Print ORG I\n", | |
"3 this O\n", | |
"4 page O\n", | |
"5 Home ORG B\n", | |
"6 Advisory ORG I\n", | |
"7 Committees ORG I\n", | |
"8 About ORG I\n", | |
"9 Advisory ORG I\n", | |
"10 \\n\\n O\n", | |
"11 Committees O\n", | |
"12 The O\n", | |
"13 FDA ORG B\n", | |
"14 uses O\n", | |
"15 50 CARDINAL B\n", | |
"16 committees O\n", | |
"17 and O\n", | |
"18 panels O\n", | |
"19 to O\n", | |
"20 obtain O\n", | |
"21 \\n\\n O\n", | |
"22 independent O\n", | |
"23 expert O\n", | |
"24 advice O\n", | |
"25 on O\n", | |
"26 scientific O\n", | |
"27 , O\n", | |
"28 technical O\n", | |
"29 , O\n", | |
"30 and O\n", | |
"31 policy O\n", | |
"32 \\n\\n O\n", | |
"33 matters O\n", | |
"34 . O\n", | |
"35 About ORG B\n", | |
"36 Advisory ORG I\n", | |
"37 Committees ORG I\n", | |
"38 Navigate O\n", | |
"39 the O\n", | |
"40 Advisory O\n", | |
"41 Committees O\n", | |
"42 \\n\\n O\n", | |
"43 Section O\n", | |
"44 About O\n", | |
"45 Advisory O\n", | |
"46 Committees O\n", | |
"47 How O\n", | |
"48 to O\n", | |
"49 become O\n", | |
"50 a O\n", | |
"51 member O\n", | |
"52 of O\n", | |
"53 an O\n", | |
"54 \\n\\n O" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"token_entity_type = [token.ent_type_ for token in parsed_txt]\n", | |
"token_entity_iob = [token.ent_iob_ for token in parsed_txt]\n", | |
"\n", | |
"pd.DataFrame(list(zip(token_text, token_entity_type, token_entity_iob)),\n", | |
" columns=['token_text', 'entity_type', 'inside_outside_begin'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"What about a variety of other token-level attributes, such as the relative frequency of tokens, and whether or not a token matches any of these categories?\n", | |
"- stopword\n", | |
"- punctuation\n", | |
"- whitespace\n", | |
"- represents a number\n", | |
"- whether or not the token is included in spaCy's default vocabulary?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>text</th>\n", | |
" <th>log_probability</th>\n", | |
" <th>stop?</th>\n", | |
" <th>punctuation?</th>\n", | |
" <th>whitespace?</th>\n", | |
" <th>number?</th>\n", | |
" <th>out of vocab.?</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td></td>\n", | |
" <td>-11.173082</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Committees</td>\n", | |
" <td>-16.535004</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Print</td>\n", | |
" <td>-13.545116</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>this</td>\n", | |
" <td>-5.361816</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>page</td>\n", | |
" <td>-9.092671</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>Home</td>\n", | |
" <td>-11.417955</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>-15.521417</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>Committees</td>\n", | |
" <td>-16.535004</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>About</td>\n", | |
" <td>-10.633126</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>-15.521417</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>-15.172424</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>Committees</td>\n", | |
" <td>-16.535004</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>The</td>\n", | |
" <td>-5.958707</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>FDA</td>\n", | |
" <td>-12.629486</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>uses</td>\n", | |
" <td>-9.585516</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>50</td>\n", | |
" <td>-9.152124</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>committees</td>\n", | |
" <td>-14.081789</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>and</td>\n", | |
" <td>-4.113108</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>panels</td>\n", | |
" <td>-11.836880</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>to</td>\n", | |
" <td>-3.856022</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>obtain</td>\n", | |
" <td>-11.677555</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>-15.172424</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>independent</td>\n", | |
" <td>-10.750811</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>expert</td>\n", | |
" <td>-10.803404</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>advice</td>\n", | |
" <td>-9.038592</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25</th>\n", | |
" <td>on</td>\n", | |
" <td>-5.172736</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>26</th>\n", | |
" <td>scientific</td>\n", | |
" <td>-10.406111</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>27</th>\n", | |
" <td>,</td>\n", | |
" <td>-3.454960</td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28</th>\n", | |
" <td>technical</td>\n", | |
" <td>-10.738839</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29</th>\n", | |
" <td>,</td>\n", | |
" <td>-3.454960</td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>30</th>\n", | |
" <td>and</td>\n", | |
" <td>-4.113108</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>31</th>\n", | |
" <td>policy</td>\n", | |
" <td>-10.023180</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>32</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>-15.172424</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>33</th>\n", | |
" <td>matters</td>\n", | |
" <td>-10.199471</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>34</th>\n", | |
" <td>.</td>\n", | |
" <td>-3.067898</td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>35</th>\n", | |
" <td>About</td>\n", | |
" <td>-10.633126</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>36</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>-15.521417</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>37</th>\n", | |
" <td>Committees</td>\n", | |
" <td>-16.535004</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>38</th>\n", | |
" <td>Navigate</td>\n", | |
" <td>-15.630514</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>39</th>\n", | |
" <td>the</td>\n", | |
" <td>-3.528767</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>40</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>-15.521417</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>41</th>\n", | |
" <td>Committees</td>\n", | |
" <td>-16.535004</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>42</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>-15.172424</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>43</th>\n", | |
" <td>Section</td>\n", | |
" <td>-12.593850</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>44</th>\n", | |
" <td>About</td>\n", | |
" <td>-10.633126</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>45</th>\n", | |
" <td>Advisory</td>\n", | |
" <td>-15.521417</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>46</th>\n", | |
" <td>Committees</td>\n", | |
" <td>-16.535004</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>47</th>\n", | |
" <td>How</td>\n", | |
" <td>-7.879386</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>48</th>\n", | |
" <td>to</td>\n", | |
" <td>-3.856022</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>49</th>\n", | |
" <td>become</td>\n", | |
" <td>-8.776984</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>50</th>\n", | |
" <td>a</td>\n", | |
" <td>-3.929788</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>51</th>\n", | |
" <td>member</td>\n", | |
" <td>-10.177588</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>52</th>\n", | |
" <td>of</td>\n", | |
" <td>-4.275874</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>53</th>\n", | |
" <td>an</td>\n", | |
" <td>-6.014852</td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>54</th>\n", | |
" <td>\\n\\n</td>\n", | |
" <td>-4.606561</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" <td>Yes</td>\n", | |
" <td></td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" text log_probability stop? punctuation? whitespace? number? \\\n", | |
"0 -11.173082 Yes \n", | |
"1 Committees -16.535004 \n", | |
"2 Print -13.545116 \n", | |
"3 this -5.361816 Yes \n", | |
"4 page -9.092671 \n", | |
"5 Home -11.417955 \n", | |
"6 Advisory -15.521417 \n", | |
"7 Committees -16.535004 \n", | |
"8 About -10.633126 Yes \n", | |
"9 Advisory -15.521417 \n", | |
"10 \\n\\n -15.172424 Yes \n", | |
"11 Committees -16.535004 \n", | |
"12 The -5.958707 Yes \n", | |
"13 FDA -12.629486 \n", | |
"14 uses -9.585516 \n", | |
"15 50 -9.152124 Yes \n", | |
"16 committees -14.081789 \n", | |
"17 and -4.113108 Yes \n", | |
"18 panels -11.836880 \n", | |
"19 to -3.856022 Yes \n", | |
"20 obtain -11.677555 \n", | |
"21 \\n\\n -15.172424 Yes \n", | |
"22 independent -10.750811 \n", | |
"23 expert -10.803404 \n", | |
"24 advice -9.038592 \n", | |
"25 on -5.172736 Yes \n", | |
"26 scientific -10.406111 \n", | |
"27 , -3.454960 Yes \n", | |
"28 technical -10.738839 \n", | |
"29 , -3.454960 Yes \n", | |
"30 and -4.113108 Yes \n", | |
"31 policy -10.023180 \n", | |
"32 \\n\\n -15.172424 Yes \n", | |
"33 matters -10.199471 \n", | |
"34 . -3.067898 Yes \n", | |
"35 About -10.633126 Yes \n", | |
"36 Advisory -15.521417 \n", | |
"37 Committees -16.535004 \n", | |
"38 Navigate -15.630514 \n", | |
"39 the -3.528767 Yes \n", | |
"40 Advisory -15.521417 \n", | |
"41 Committees -16.535004 \n", | |
"42 \\n\\n -15.172424 Yes \n", | |
"43 Section -12.593850 \n", | |
"44 About -10.633126 Yes \n", | |
"45 Advisory -15.521417 \n", | |
"46 Committees -16.535004 \n", | |
"47 How -7.879386 Yes \n", | |
"48 to -3.856022 Yes \n", | |
"49 become -8.776984 Yes \n", | |
"50 a -3.929788 Yes \n", | |
"51 member -10.177588 \n", | |
"52 of -4.275874 Yes \n", | |
"53 an -6.014852 Yes \n", | |
"54 \\n\\n -4.606561 Yes \n", | |
"\n", | |
" out of vocab.? \n", | |
"0 \n", | |
"1 \n", | |
"2 \n", | |
"3 \n", | |
"4 \n", | |
"5 \n", | |
"6 \n", | |
"7 \n", | |
"8 \n", | |
"9 \n", | |
"10 \n", | |
"11 \n", | |
"12 \n", | |
"13 \n", | |
"14 \n", | |
"15 \n", | |
"16 \n", | |
"17 \n", | |
"18 \n", | |
"19 \n", | |
"20 \n", | |
"21 \n", | |
"22 \n", | |
"23 \n", | |
"24 \n", | |
"25 \n", | |
"26 \n", | |
"27 \n", | |
"28 \n", | |
"29 \n", | |
"30 \n", | |
"31 \n", | |
"32 \n", | |
"33 \n", | |
"34 \n", | |
"35 \n", | |
"36 \n", | |
"37 \n", | |
"38 \n", | |
"39 \n", | |
"40 \n", | |
"41 \n", | |
"42 \n", | |
"43 \n", | |
"44 \n", | |
"45 \n", | |
"46 \n", | |
"47 \n", | |
"48 \n", | |
"49 \n", | |
"50 \n", | |
"51 \n", | |
"52 \n", | |
"53 \n", | |
"54 " | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"token_attributes = [(token.orth_,\n", | |
" token.prob,\n", | |
" token.is_stop,\n", | |
" token.is_punct,\n", | |
" token.is_space,\n", | |
" token.like_num,\n", | |
" token.is_oov)\n", | |
" for token in parsed_txt]\n", | |
"\n", | |
"df = pd.DataFrame(token_attributes,\n", | |
" columns=['text',\n", | |
" 'log_probability',\n", | |
" 'stop?',\n", | |
" 'punctuation?',\n", | |
" 'whitespace?',\n", | |
" 'number?',\n", | |
" 'out of vocab.?'])\n", | |
"\n", | |
"df.loc[:, 'stop?':'out of vocab.?'] = (df.loc[:, 'stop?':'out of vocab.?']\n", | |
" .applymap(lambda x: 'Yes' if x else ''))\n", | |
" \n", | |
"df" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If the text you'd like to process is general-purpose English language text (i.e., not domain-specific, like medical literature), spaCy is ready to use out-of-the-box.\n", | |
"\n", | |
"I think it will eventually become a core part of the Python data science ecosystem — it will do for natural language computing what other great libraries have done for numerical computing." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Phrase Modeling" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"_Phrase modeling_ is another approach to learning combinations of tokens that together represent meaningful multi-word concepts. We can develop phrase models by looping over the words in our reviews and looking for words that _co-occur_ (i.e., appear one after another) much more frequently than you would expect them to by random chance. The formula our phrase models will use to determine whether two tokens $A$ and $B$ constitute a phrase is:\n", | |
"\n", | |
"$$\\frac{count(A\\ B) - count_{min}}{count(A) * count(B)} * N > threshold$$\n", | |
"\n", | |
"...where:\n", | |
"* $count(A)$ is the number of times token $A$ appears in the corpus\n", | |
"* $count(B)$ is the number of times token $B$ appears in the corpus\n", | |
"* $count(A\\ B)$ is the number of times the tokens $A\\ B$ appear in the corpus *in order*\n", | |
"* $N$ is the total size of the corpus vocabulary\n", | |
"* $count_{min}$ is a user-defined parameter to ensure that accepted phrases occur a minimum number of times\n", | |
"* $threshold$ is a user-defined parameter to control how strong of a relationship between two tokens the model requires before accepting them as a phrase\n", | |
"\n", | |
"Once our phrase model has been trained on our corpus, we can apply it to new text. When our model encounters two tokens in new text that it identifies as a phrase, it will merge the two into a single new token.\n", | |
"\n", | |
"Phrase modeling is superficially similar to named entity detection in that you would expect named entities to become phrases in the model (so _new york_ would become *new\\_york*). But you would also expect multi-word expressions that represent common concepts, but aren't specifically named entities (such as _happy hour_), to become phrases in the model.\n", | |
"\n", | |
"We turn to the indispensable [**gensim**](https://radimrehurek.com/gensim/index.html) library to help us with phrase modeling, in particular the [**Phrases**](https://radimrehurek.com/gensim/models/phrases.html) class." | |
] | |
}, | |
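To make the scoring formula concrete, here is a minimal pure-Python sketch that counts unigrams and adjacent pairs in a tiny made-up corpus and keeps the pairs whose score exceeds the threshold. The function name `phrase_scores`, the toy sentences, and the parameter values are assumptions chosen for illustration; this is not gensim's implementation, only the arithmetic described above.

```python
from collections import Counter


def phrase_scores(sentences, min_count=1, threshold=1.0):
    """Score adjacent token pairs with the formula above (illustrative sketch)."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        unigram_counts.update(tokens)
        # zip pairs each token with its successor: the in-order count(A B)
        bigram_counts.update(zip(tokens, tokens[1:]))

    vocab_size = len(unigram_counts)  # N in the formula
    accepted = {}
    for (a, b), count_ab in bigram_counts.items():
        score = ((count_ab - min_count) * vocab_size
                 / (unigram_counts[a] * unigram_counts[b]))
        if score > threshold:
            accepted[(a, b)] = score
    return accepted


# Hypothetical toy corpus, already tokenized and lemmatized.
toy_sentences = [
    ['fda', 'advisory', 'committee', 'meet'],
    ['the', 'advisory', 'committee', 'vote'],
    ['fda', 'approve', 'the', 'drug'],
    ['advisory', 'committee', 'review', 'the', 'drug'],
]

accepted = phrase_scores(toy_sentences, min_count=1, threshold=1.0)
for pair, score in sorted(accepted.items()):
    print(pair, round(score, 3))
```

With these toy values only pairs that repeat in the corpus can pass, because a pair seen exactly `min_count` times scores zero. Raising `threshold` or `min_count` makes the model more conservative about what it accepts as a phrase.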
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\aashis_tiwari\\AppData\\Local\\Continuum\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\gensim\\utils.py:855: UserWarning: detected Windows; aliasing chunkize to chunkize_serial\n", | |
" warnings.warn(\"detected Windows; aliasing chunkize to chunkize_serial\")\n" | |
] | |
} | |
], | |
"source": [ | |
"from gensim.models import Phrases\n", | |
"from gensim.models.word2vec import LineSentence" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As we're performing phrase modeling, we'll be doing some iterative data transformation at the same time. Our roadmap for data preparation includes:\n", | |
"\n", | |
"1. Segment text of complete reviews into sentences & normalize text\n", | |
"1. First-order phrase modeling $\\rightarrow$ _apply first-order phrase model to transform sentences_\n", | |
"1. Second-order phrase modeling $\\rightarrow$ _apply second-order phrase model to transform sentences_\n", | |
"1. Apply text normalization and second-order phrase model to text of complete reviews\n", | |
"\n", | |
"We'll use this transformed data as the input for some higher-level modeling approaches in the following sections." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"First, let's define a few helper functions that we'll use for text normalization. In particular, the `lemmatized_sentence_corpus` generator function will use spaCy to:\n", | |
"- Iterate over the records in the text file we created before (`desc_txt_filepath`)\n", | |
"- Segment the reviews into individual sentences\n", | |
"- Remove punctuation and excess whitespace\n", | |
"- Lemmatize the text\n", | |
"\n", | |
"... and do so efficiently in parallel, thanks to spaCy's `nlp.pipe()` function." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def punct_space(token):\n", | |
" \"\"\"\n", | |
" helper function to eliminate tokens\n", | |
" that are pure punctuation or whitespace\n", | |
" \"\"\"\n", | |
" \n", | |
" return token.is_punct or token.is_space\n", | |
"\n", | |
"def line_review(filename):\n", | |
" \"\"\"\n", | |
" generator function to read in reviews from the file\n", | |
" and un-escape the original line breaks in the text\n", | |
" \"\"\"\n", | |
" \n", | |
" with codecs.open(filename, encoding='utf_8') as f:\n", | |
" for review in f:\n", | |
" yield review.replace('\\\\n', '\\n')\n", | |
" \n", | |
"def lemmatized_sentence_corpus(filename):\n", | |
" \"\"\"\n", | |
" generator function to use spaCy to parse reviews,\n", | |
" lemmatize the text, and yield sentences\n", | |
" \"\"\"\n", | |
" \n", | |
" for parsed_txt in nlp.pipe(line_review(filename),\n", | |
" batch_size=10000, n_threads=4):\n", | |
" \n", | |
" for sent in parsed_txt.sents:\n", | |
" yield u' '.join([token.lemma_ for token in sent\n", | |
" if not punct_space(token)])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"unigram_sentences_filepath = os.path.join(intermediate_directory,\n", | |
" 'unigram_sentences_all.txt')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's use the `lemmatized_sentence_corpus` generator to loop over the original review text, segmenting the reviews into individual sentences and normalizing the text. We'll write this data back out to a new file (`unigram_sentences_all`), with one normalized sentence per line. We'll use this data for learning our phrase models." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 2.89 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to execute data prep yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
" with codecs.open(unigram_sentences_filepath, 'w', encoding='utf_8') as f:\n", | |
" for sentence in lemmatized_sentence_corpus(desc_txt_filepath):\n", | |
" f.write(sentence + '\\n')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If your data is organized like our `unigram_sentences_all` file now is — a large text file with one document/sentence per line — gensim's [**LineSentence**](https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.LineSentence) class provides a convenient iterator for working with other gensim components. It *streams* the documents/sentences from disk, so that you never have to hold the entire corpus in RAM at once. This allows you to scale your modeling pipeline up to potentially very large corpora." | |
] | |
}, | |
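The streaming idea can be sketched without gensim. The class below, `StreamingLines`, is a hypothetical stand-in written for illustration, not gensim's `LineSentence`. It re-opens the file on every pass and yields one token list per line, so only one line is held in memory at a time. Skipping blank lines is a choice made for this sketch.

```python
import os
import tempfile


class StreamingLines:
    """Hypothetical re-iterable corpus: one whitespace-tokenized sentence per line."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-opening the file on each pass lets a model iterate multiple times
        # without the corpus ever being loaded into RAM as a whole.
        with open(self.path, encoding='utf-8') as f:
            for line in f:
                tokens = line.split()
                if tokens:  # skip blank lines (assumption of this sketch)
                    yield tokens


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'sentences.txt')
    with open(demo_path, 'w', encoding='utf-8') as f:
        f.write('fda approve new drug\n\nadvisory committee meet\n')

    corpus = StreamingLines(demo_path)
    first_pass = list(corpus)
    second_pass = list(corpus)  # a plain generator would be exhausted here

print(first_pass)
```

Implementing `__iter__` on a class, rather than writing a single generator function, is what allows the same object to be iterated more than once, which matters for models that make several passes over the data.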
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"unigram_sentences = LineSentence(unigram_sentences_filepath)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's take a look at a few sample sentences in our new, transformed file." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"bookmark this page because -PRON- will be update it on a regular basis\n", | |
"\n", | |
"what ' new fda approves cinqair to treat severe asthma change\n", | |
"\n", | |
"course a new approach to opioid pain medication at fda fda permit\n", | |
"\n", | |
"marketing of device that sense optimal time to check patientâs\n", | |
"\n", | |
"eye pressure fda provide $ 2 million in new grant for natural\n", | |
"\n", | |
"history study in rare disease 2016\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"for unigram_sentence in it.islice(unigram_sentences, 3915, 3921):\n", | |
" print(' '.join(unigram_sentence))\n", | |
" print('')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Next, we'll learn a phrase model that will link individual words into two-word phrases. We'd expect words that together represent a specific concept, like \"`ice cream`\", to be linked together to form a new, single token: \"`ice_cream`\"." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"bigram_model_filepath = os.path.join(intermediate_directory, 'bigram_model_all')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 339 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to execute modeling yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
" bigram_model = Phrases(unigram_sentences)\n", | |
"\n", | |
" bigram_model.save(bigram_model_filepath)\n", | |
" \n", | |
"# load the finished model from disk\n", | |
"bigram_model = Phrases.load(bigram_model_filepath)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now that we have a trained phrase model for word pairs, let's apply it to the review sentences data and explore the results." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"bigram_sentences_filepath = os.path.join(intermediate_directory,\n", | |
" 'bigram_sentences_all.txt')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\aashis_tiwari\\AppData\\Local\\Continuum\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\gensim\\models\\phrases.py:274: UserWarning: For a faster implementation, use the gensim.models.phrases.Phraser class\n", | |
" warnings.warn(\"For a faster implementation, use the gensim.models.phrases.Phraser class\")\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 477 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to execute data prep yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
" with codecs.open(bigram_sentences_filepath, 'w', encoding='utf_8') as f:\n", | |
" \n", | |
" for unigram_sentence in unigram_sentences:\n", | |
" \n", | |
" bigram_sentence = ' '.join(bigram_model[unigram_sentence])\n", | |
" \n", | |
" f.write(bigram_sentence + '\\n')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"bigram_sentences = LineSentence(bigram_sentences_filepath)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"bookmark this_page because -PRON-_will be update it on a regular basis\n", | |
"\n", | |
"what_' new fda approves cinqair to treat severe asthma change\n", | |
"\n", | |
"course a new approach to opioid pain medication at fda fda permit\n", | |
"\n", | |
"marketing of device that sense optimal time to check patientâs\n", | |
"\n", | |
"eye pressure fda provide $ 2 million in new grant for natural\n", | |
"\n", | |
"history study in rare disease 2016\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"for bigram_sentence in it.islice(bigram_sentences,3915, 3921):\n", | |
" print(' '.join(bigram_sentence))\n", | |
" print('')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Looks like the phrase modeling worked! We now see two-word phrases, such as \"`this_page`\", linked together in the text as a single token, alongside noisier merges such as \"`what_'`\" and \"`-PRON-_will`\". Next, we'll train a _second-order_ phrase model. We'll apply the second-order phrase model on top of the already-transformed data, so that an incomplete word combination like \"`vanilla_ice cream`\" would become fully joined as \"`vanilla_ice_cream`\". No disrespect intended to [Vanilla Ice](https://www.youtube.com/watch?v=rog8ou-ZepE), of course." | |
] | |
}, | |
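To make the two-pass idea concrete, here is a minimal, pure-Python sketch of phrase joining applied twice. It is not gensim's implementation: `learn_phrases`, `apply_phrases`, the toy sentences, and the `min_count` and `threshold` values are all assumptions chosen for illustration, and the score is a simple co-occurrence ratio in the spirit of the scoring that `Phrases` uses.

```python
from collections import Counter

def learn_phrases(sentences, min_count=1, threshold=1.0):
    """Return adjacent token pairs whose co-occurrence score exceeds threshold."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    vocab_size = len(unigrams)
    phrases = set()
    for (a, b), count_ab in bigrams.items():
        # pairs that co-occur more often than their individual counts suggest score high
        score = (count_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.add((a, b))
    return phrases

def apply_phrases(sentence, phrases):
    """Greedily join adjacent tokens that form a learned phrase."""
    joined = []
    i = 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in phrases:
            joined.append(sentence[i] + '_' + sentence[i + 1])
            i += 2
        else:
            joined.append(sentence[i])
            i += 1
    return joined

# toy corpus (hypothetical, not taken from the FDA data)
toy_sentences = [
    ['fda', 'issue', 'medical', 'device', 'recall'],
    ['medical', 'device', 'recall', 'list'],
    ['fda', 'medical', 'device', 'recall', 'report'],
    ['fda', 'report', 'list'],
]

# first-order pass: learn and apply two-word phrases
first_order = learn_phrases(toy_sentences)
first_pass = [apply_phrases(s, first_order) for s in toy_sentences]

# second-order pass: learn phrases on the already-joined tokens
second_order = learn_phrases(first_pass)
second_pass = [apply_phrases(s, second_order) for s in first_pass]

print(first_pass[0])
print(second_pass[0])
```

In this toy corpus the first pass joins `medical_device`, and the second pass, which sees `medical_device` as a single token, can then join it with `recall`. Running the same model type over its own output is what lets three-token phrases emerge.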
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"trigram_model_filepath = os.path.join(intermediate_directory,\n", | |
" 'trigram_model_all')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 291 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to execute modeling yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
"    trigram_model = Phrases(bigram_sentences)\n", | |
"\n", | |
"    trigram_model.save(trigram_model_filepath)\n", | |
"    \n", | |
"# load the finished model from disk\n", | |
"trigram_model = Phrases.load(trigram_model_filepath)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We'll apply our trained second-order phrase model to our first-order transformed sentences, write the results out to a new file, and explore a few of the second-order transformed sentences." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"trigram_sentences_filepath = os.path.join(intermediate_directory,\n", | |
" 'trigram_sentences_all.txt')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\aashis_tiwari\\AppData\\Local\\Continuum\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\gensim\\models\\phrases.py:274: UserWarning: For a faster implementation, use the gensim.models.phrases.Phraser class\n", | |
" warnings.warn(\"For a faster implementation, use the gensim.models.phrases.Phraser class\")\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 449 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to execute data prep yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
"    with codecs.open(trigram_sentences_filepath, 'w', encoding='utf_8') as f:\n", | |
"        \n", | |
"        for bigram_sentence in bigram_sentences:\n", | |
"            \n", | |
"            trigram_sentence = ' '.join(trigram_model[bigram_sentence])\n", | |
"            \n", | |
"            f.write(trigram_sentence + '\\n')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"trigram_sentences = LineSentence(trigram_sentences_filepath)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"bookmark this_page because -PRON-_will be update it on a regular basis\n", | |
"\n", | |
"what_'_new fda approves cinqair to treat severe asthma change\n", | |
"\n", | |
"course a new approach to opioid pain medication at fda fda permit\n", | |
"\n", | |
"marketing of device that sense optimal time to check patientâs\n", | |
"\n", | |
"eye pressure fda provide $ 2 million in new grant for natural\n", | |
"\n", | |
"history study in rare disease 2016\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"for trigram_sentence in it.islice(trigram_sentences, 3915, 3921):\n", | |
"    print(' '.join(trigram_sentence))\n", | |
"    print('')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Looks like the second-order phrase model was successful. We're now seeing three-token phrases, such as \"`what_'_new`\"." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The final step of our text preparation process circles back to the complete scraped text (the variable names below call each line a review). We're going to run the complete text through a pipeline that applies our text normalization and phrase models.\n", | |
"\n", | |
"In addition, we'll remove stopwords at this point. _Stopwords_ are very common words, like _a_, _the_, _and_, and so on, that serve functional roles in natural language, but typically don't contribute to the overall meaning of text. Filtering stopwords is a common procedure that allows higher-level NLP modeling techniques to focus on the words that carry more semantic weight.\n", | |
"\n", | |
"Finally, we'll write the transformed text out to a new file, with one transformed line per line of input." | |
] | |
}, | |
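As a small illustration of the stopword step, the sketch below filters a token list against a stopword set. `TOY_STOP_WORDS` is a hypothetical, deliberately tiny list used only for this example; spaCy ships a much larger one, and `remove_stopwords` is an illustrative helper rather than a spaCy function.

```python
# hypothetical, tiny stopword list for illustration only
TOY_STOP_WORDS = {'a', 'an', 'the', 'and', 'of', 'to', 'for', 'at', 'on', 'in'}

def remove_stopwords(tokens, stop_words=TOY_STOP_WORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [token for token in tokens if token.lower() not in stop_words]

# tokens as they might look after phrase modeling (made-up example)
tokens = ['the', 'about_this_website', 'on', 'fda', 'effort',
          'to', 'support', 'innovation']

print(remove_stopwords(tokens))
```

Because the filter runs on whole tokens after phrase modeling, a joined phrase such as `about_this_website` is kept even though some of its parts would be stopwords on their own.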
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"trigram_reviews_filepath = os.path.join(intermediate_directory,\n", | |
" 'trigram_transformed_reviews_all.txt')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 37, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\aashis_tiwari\\AppData\\Local\\Continuum\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\gensim\\models\\phrases.py:274: UserWarning: For a faster implementation, use the gensim.models.phrases.Phraser class\n", | |
" warnings.warn(\"For a faster implementation, use the gensim.models.phrases.Phraser class\")\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 3.2 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"# %debug\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to execute data prep yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
"    with codecs.open(trigram_reviews_filepath, 'w', encoding='utf_8') as f:\n", | |
"        \n", | |
"        for parsed_desc in nlp.pipe(line_review(desc_txt_filepath),\n", | |
"                                    batch_size=10000, n_threads=4):\n", | |
"            \n", | |
"            # lemmatize the text, removing punctuation and whitespace\n", | |
"            unigram_review = [token.lemma_ for token in parsed_desc\n", | |
"                              if not punct_space(token)]\n", | |
"            \n", | |
"            # apply the first-order and second-order phrase models\n", | |
"            bigram_review = bigram_model[unigram_review]\n", | |
"            trigram_review = trigram_model[bigram_review]\n", | |
"            \n", | |
"            # remove any remaining stopwords\n", | |
"            trigram_review = [term for term in trigram_review\n", | |
"                              if term not in spacy.en.stop_words.STOP_WORDS]\n", | |
"            \n", | |
"            \n", | |
"            # write the transformed review as a line in the new file\n", | |
"            trigram_review = ' '.join(trigram_review)\n", | |
"            f.write(trigram_review + '\\n')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's preview the results. We'll grab a few lines from the file with the original, untransformed text, grab the same lines from the file with the normalized and transformed text, and compare the two." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 38, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Original:\n", | |
"\n", | |
" Reports, Manuals, & Forms A collection of various FDA reports, \n", | |
"\n", | |
"\n", | |
" guides, and forms. About This Website FDA web policies, \n", | |
"\n", | |
"\n", | |
" accessibility, and support for mobile devices. Innovation at FDA The \n", | |
"\n", | |
"\n", | |
" latest news on FDA efforts to support innovation. Plain Language \n", | |
"\n", | |
"\n", | |
"----\n", | |
"\n", | |
"Transformed:\n", | |
"\n", | |
"report manual amp forms collection fda report\n", | |
"\n", | |
"guide form about_this_website fda web policy\n", | |
"\n", | |
"accessibility support mobile device innovation fda\n", | |
"\n", | |
"late_news fda effort support innovation plain language\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print('Original:' + '\\n')\n", | |
"\n", | |
"for desc in it.islice(line_review(desc_txt_filepath), 23, 27):\n", | |
"    print(desc)\n", | |
"\n", | |
"print('----' + '\\n')\n", | |
"print('Transformed:' + '\\n')\n", | |
"\n", | |
"with codecs.open(trigram_reviews_filepath, encoding='utf_8') as f:\n", | |
"    for desc in it.islice(f, 23, 27):\n", | |
"        print(desc)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"You can see that most of the grammatical structure has been scrubbed from the text: capitalization, articles and conjunctions, punctuation, spacing, and so on. However, much of the general semantic *meaning* is still present. Also, multi-word concepts such as \"`about_this_website`\" and \"`late_news`\" have been joined into single tokens, as expected. The text is now ready for higher-level modeling." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Topic Modeling with Latent Dirichlet Allocation (_LDA_)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"*Topic modeling* is a family of techniques that can be used to describe and summarize the documents in a corpus according to a set of latent \"topics\". For this demo, we'll be using [*Latent Dirichlet Allocation*](http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf) or LDA, a popular approach to topic modeling.\n", | |
"\n", | |
"In many conventional NLP applications, documents are represented as a mixture of the individual tokens (words and phrases) they contain. In other words, a document is represented as a *vector* of token counts. There are two layers in this model, documents and tokens, and the size or dimensionality of the document vectors is the number of tokens in the corpus vocabulary. This approach has a number of disadvantages:\n", | |
"* Document vectors tend to be large (one dimension for each token $\\Rightarrow$ lots of dimensions)\n", | |
"* They also tend to be very sparse. Any given document only contains a small fraction of all tokens in the vocabulary, so most values in the document's token vector are 0.\n", | |
"* The dimensions are fully independent from each other: there's no sense of connection between related tokens, such as _knife_ and _fork_.\n", | |
"\n", | |
"LDA injects a third layer into this conceptual model. Documents are represented as a mixture of a pre-defined number of *topics*, and the *topics* are represented as a mixture of the individual tokens in the vocabulary. The number of topics is a model hyperparameter selected by the practitioner. LDA makes a prior assumption that the (document, topic) and (topic, token) mixtures follow [*Dirichlet*](https://en.wikipedia.org/wiki/Dirichlet_distribution) probability distributions. This assumption encourages documents to consist mostly of a handful of topics, and topics to consist mostly of a modest set of the tokens." | |
] | |
}, | |
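The three-layer picture described above can be sketched as a tiny generative simulation. This is only a toy under stated assumptions: the vocabulary, the number of topics, the `alpha` value, and the helper names `sample_dirichlet` and `generate_document` are all made up for illustration, and the code simulates the modeling assumption rather than fitting a model.

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_token_dists, vocab, alpha, length, rng):
    """Generate one document: pick a topic mixture, then pick each token via a topic."""
    num_topics = len(topic_token_dists)
    # (document, topic) mixture for this document
    doc_topic_mix = sample_dirichlet(alpha, num_topics, rng)
    tokens = []
    for _ in range(length):
        # choose a topic for this position, then a token from that topic
        topic = rng.choices(range(num_topics), weights=doc_topic_mix)[0]
        token = rng.choices(vocab, weights=topic_token_dists[topic])[0]
        tokens.append(token)
    return doc_topic_mix, tokens

rng = random.Random(0)

# hypothetical toy vocabulary
vocab = ['fda', 'recall', 'drug', 'device', 'food', 'safety']

# (topic, token) mixtures for three toy topics
topic_token_dists = [sample_dirichlet(0.5, len(vocab), rng) for _ in range(3)]

doc_topic_mix, tokens = generate_document(topic_token_dists, vocab,
                                          alpha=0.5, length=8, rng=rng)

print([round(p, 3) for p in doc_topic_mix])
print(tokens)
```

With a concentration parameter below 1, the sampled mixtures tend to put most of their weight on a few entries, which is the sparsity that the Dirichlet assumption encourages. Training LDA amounts to working backwards from observed tokens to mixtures like these.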
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"LDA is fully unsupervised. The topics are \"discovered\" automatically from the data by trying to maximize the likelihood of observing the documents in your corpus, given the modeling assumptions. They are expected to capture some latent structure and organization within the documents, and often have a meaningful human interpretation for people familiar with the subject material.\n", | |
"\n", | |
"We'll again turn to gensim to assist with data preparation and modeling. In particular, gensim offers a high-performance parallelized implementation of LDA with its [**LdaMulticore**](https://radimrehurek.com/gensim/models/ldamulticore.html) class." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 39, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from gensim.corpora import Dictionary, MmCorpus\n", | |
"from gensim.models.ldamulticore import LdaMulticore\n", | |
"\n", | |
"import pyLDAvis\n", | |
"import pyLDAvis.gensim\n", | |
"import warnings\n", | |
"import _pickle as pickle" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The first step to creating an LDA model is to learn the full vocabulary of the corpus to be modeled. We'll use gensim's [**Dictionary**](https://radimrehurek.com/gensim/corpora/dictionary.html) class for this." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 40, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"trigram_dictionary_filepath = os.path.join(intermediate_directory,\n", | |
" 'trigram_dict_all.dict')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 247 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to learn the dictionary yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
"    trigram_reviews = LineSentence(trigram_reviews_filepath)\n", | |
"\n", | |
"    # learn the dictionary by iterating over all of the reviews\n", | |
"    trigram_dictionary = Dictionary(trigram_reviews)\n", | |
"    \n", | |
"    # filter tokens that are very rare or too common from\n", | |
"    # the dictionary (filter_extremes) and reassign integer ids (compactify)\n", | |
"    trigram_dictionary.filter_extremes(no_below=10, no_above=0.4)\n", | |
"    trigram_dictionary.compactify()\n", | |
"\n", | |
"    trigram_dictionary.save(trigram_dictionary_filepath)\n", | |
"    \n", | |
"# load the finished dictionary from disk\n", | |
"trigram_dictionary = Dictionary.load(trigram_dictionary_filepath)" | |
] | |
}, | |
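The `filter_extremes` call above prunes the vocabulary by document frequency. The sketch below shows the rule in plain Python under simplifying assumptions: `filter_extremes_sketch` is a hypothetical helper, the toy documents are made up, and gensim's additional cap on vocabulary size (`keep_n`) is left out.

```python
from collections import Counter

def filter_extremes_sketch(documents, no_below, no_above):
    """
    Return the tokens that appear in at least `no_below` documents (absolute count)
    and in no more than a `no_above` fraction of all documents.
    """
    doc_freq = Counter()
    for doc in documents:
        # count each token once per document
        doc_freq.update(set(doc))
    num_docs = len(documents)
    return {token for token, df in doc_freq.items()
            if df >= no_below and df / num_docs <= no_above}

# hypothetical toy documents
toy_documents = [
    ['fda', 'recall', 'drug'],
    ['fda', 'device', 'recall'],
    ['fda', 'food'],
    ['fda', 'drug', 'safety'],
]

kept = filter_extremes_sketch(toy_documents, no_below=2, no_above=0.75)
print(sorted(kept))
```

Here `fda` is dropped because it appears in every document, and `device`, `food`, and `safety` are dropped because each appears in only one. Note that `no_below` is an absolute document count while `no_above` is a fraction of the corpus.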
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Like many NLP techniques, LDA uses a simplifying assumption known as the [*bag-of-words* model](https://en.wikipedia.org/wiki/Bag-of-words_model). In the bag-of-words model, a document is represented by the counts of distinct terms that occur within it. Additional information, such as word order, is discarded. \n", | |
"\n", | |
"We'll use the gensim Dictionary we learned to generate a bag-of-words representation for each review. The `trigram_bow_generator` function implements this. We'll save the resulting bag-of-words reviews as a matrix.\n", | |
"\n", | |
"In the following code, \"bag-of-words\" is abbreviated as `bow`." | |
] | |
}, | |
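For intuition about what a bag-of-words representation looks like, here is a minimal sketch that maps tokens to integer ids and counts them. `build_token_ids` and `to_bag_of_words` are hypothetical helpers written for this example; they only mimic the shape of the output, a sorted list of `(token_id, count)` pairs, and are not gensim code.

```python
from collections import Counter

def build_token_ids(documents):
    """Assign an integer id to each distinct token, in order of first appearance."""
    token_to_id = {}
    for doc in documents:
        for token in doc:
            if token not in token_to_id:
                token_to_id[token] = len(token_to_id)
    return token_to_id

def to_bag_of_words(tokens, token_to_id):
    """Count known tokens and return sorted (token_id, count) pairs; unknown tokens are skipped."""
    counts = Counter(token_to_id[t] for t in tokens if t in token_to_id)
    return sorted(counts.items())

# hypothetical toy documents
toy_documents = [
    ['fda', 'report', 'fda', 'safety'],
    ['safety', 'recall'],
]

token_to_id = build_token_ids(toy_documents)

first_bow = to_bag_of_words(toy_documents[0], token_to_id)
new_bow = to_bag_of_words(['recall', 'fda', 'recall', 'unknown_token'], token_to_id)

print(first_bow)
print(new_bow)
```

Word order is gone in the result, and a token that was never assigned an id simply does not appear. This sparse list-of-pairs form is why the corpus can be stored compactly as a matrix.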
{ | |
"cell_type": "code", | |
"execution_count": 42, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"trigram_bow_filepath = os.path.join(intermediate_directory,\n", | |
" 'trigram_bow_corpus_all.mm')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def trigram_bow_generator(filepath):\n", | |
"    \"\"\"\n", | |
"    generator function to read reviews from a file\n", | |
"    and yield a bag-of-words representation\n", | |
"    \"\"\"\n", | |
"    \n", | |
"    for reason in LineSentence(filepath):\n", | |
"        yield trigram_dictionary.doc2bow(reason)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 312 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to build the bag-of-words corpus yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
"    # generate bag-of-words representations for\n", | |
"    # all reviews and save them as a matrix\n", | |
"    MmCorpus.serialize(trigram_bow_filepath,\n", | |
"                       trigram_bow_generator(trigram_reviews_filepath))\n", | |
"    \n", | |
"# load the finished bag-of-words corpus from disk\n", | |
"trigram_bow_corpus = MmCorpus(trigram_bow_filepath)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"With the bag-of-words corpus, we're finally ready to learn our topic model from the text. We simply need to pass the bag-of-words matrix and Dictionary from our previous steps to `LdaMulticore` as inputs, along with the number of topics the model should learn. For this demo, we're asking for 10 topics." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 45, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"lda_model_filepath = os.path.join(intermediate_directory, 'lda_model_all')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 46, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 37.2 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to train the LDA model yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
"    with warnings.catch_warnings():\n", | |
"        warnings.simplefilter('ignore')\n", | |
"        \n", | |
"        # workers => sets the parallelism, and should be\n", | |
"        # set to your number of physical cores minus one\n", | |
"        lda = LdaMulticore(trigram_bow_corpus,\n", | |
"                           num_topics=10,\n", | |
"                           id2word=trigram_dictionary,\n", | |
"                           workers=1)\n", | |
"    \n", | |
"    lda.save(lda_model_filepath)\n", | |
"    \n", | |
"# load the finished LDA model from disk\n", | |
"lda = LdaMulticore.load(lda_model_filepath)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Our topic model is now trained and ready to use! Since each topic is represented as a mixture of tokens, you can manually inspect which tokens have been grouped together into which topics to try to understand the patterns the model has discovered in the data." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 47, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def explore_topic(topic_number, topn=20):\n", | |
"    \"\"\"\n", | |
"    accept a user-supplied topic number and\n", | |
"    print out a formatted list of the top terms\n", | |
"    \"\"\"\n", | |
"    \n", | |
"    print('{:20} {}'.format(u'term', u'frequency') + '\\n')\n", | |
"\n", | |
"    # pass the caller's topn through instead of a hard-coded 20\n", | |
"    for term, frequency in lda.show_topic(topic_number, topn=topn):\n", | |
"        print('{:20} {:.3f}'.format(term, round(frequency, 3)))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 51, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"term frequency\n", | |
"\n", | |
"fda 0.102\n", | |
"report 0.018\n", | |
"information 0.017\n", | |
"safety 0.015\n", | |
"if_you 0.013\n", | |
"national 0.013\n", | |
"2015 0.012\n", | |
"need 0.011\n", | |
"issue 0.011\n", | |
"food 0.010\n", | |
"medical_device 0.009\n", | |
"recall 0.009\n", | |
"use 0.008\n", | |
"health 0.008\n", | |
"product 0.008\n", | |
"drug 0.008\n", | |
"complaint 0.008\n", | |
"microsoft 0.008\n", | |
"list 0.008\n", | |
"program 0.007\n" | |
] | |
} | |
], | |
"source": [ | |
"explore_topic(topic_number=0)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The first topic has strong associations with words like *fda*, *report*, *information*, *safety*, *recall*, and *complaint*, as well as a handful of more general words. You might call this a **safety reporting and recalls** topic." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 52, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"topic_names = {\n", | |
" 0: 'UNK0',\n", | |
" 1: 'UNK1',\n", | |
" 2: 'UNK2',\n", | |
" 3: 'UNK3',\n", | |
" 4: 'UNK4',\n", | |
" 5: 'UNK5',\n", | |
" 6: 'UNK6',\n", | |
" 7: 'UNK7',\n", | |
" 8: 'UNK8',\n", | |
" 9: 'UNK9' \n", | |
"}" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 53, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"topic_names_filepath = os.path.join(intermediate_directory, 'topic_names.pkl')\n", | |
"\n", | |
"with open(topic_names_filepath, 'wb') as f:\n", | |
"    pickle.dump(topic_names, f)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 54, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"LDAvis_data_filepath = os.path.join(intermediate_directory, 'ldavis_prepared')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 55, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 18.9 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to execute data prep yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
"    LDAvis_prepared = pyLDAvis.gensim.prepare(lda, trigram_bow_corpus,\n", | |
"                                              trigram_dictionary)\n", | |
"\n", | |
"    with open(LDAvis_data_filepath, 'wb') as f:\n", | |
"        pickle.dump(LDAvis_prepared, f)\n", | |
"    \n", | |
"# load the pre-prepared pyLDAvis data from disk\n", | |
"with open(LDAvis_data_filepath, \"rb\") as f:\n", | |
"    LDAvis_prepared = pickle.load(f)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`pyLDAvis.display(...)` displays the topic model visualization in-line in the notebook." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"\n", | |
"<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.rawgit.com/bmabey/pyLDAvis/files/ldavis.v1.0.0.css\">\n", | |
"\n", | |
"\n", | |
"<div id=\"ldavis_el247220212014195049402964239\"></div>\n", | |
"<script type=\"text/javascript\">\n", | |
"\n", | |
"var ldavis_el247220212014195049402964239_data = {\"token.table\": {\"Term\": [\"'s\", \"'s\", \"'s\", \"'s\", \"'s\", \"'s\", \"'s\", \"'s\", \"0\", \"0\", \"0\", \"0\", \"0_0\", \"0_0\", \"0_0\", \"0_0\", \"0_0\", \"0_0\", \"1\", \"1\", \"1\", \"1\", \"1\", \"1\", \"13\", \"13\", \"13\", \"13\", \"1_888_463_6332\", \"1_888_463_6332\", \"1_888_463_6332\", \"1_888_463_6332\", \"2\", \"2\", \"2\", \"2\", \"2\", \"2\", \"20\", \"20\", \"20\", \"20\", \"20\", \"2007\", \"2007\", \"2007\", \"2007\", \"2012\", \"2012\", \"2012\", \"2012\", \"2012\", \"2012\", \"2013\", \"2013\", \"2013\", \"2013\", \"2013\", \"2014\", \"2014\", \"2014\", \"2014\", \"2014\", \"2014\", \"2014\", \"2014\", \"2014\", \"2015\", \"2015\", \"2015\", \"2015\", \"2015\", \"2015\", \"2015\", \"2015\", \"2016\", \"2016\", \"2016\", \"2016\", \"2016\", \"2016\", \"2016\", \"2016\", \"2016\", \"2017\", \"2017\", \"2017\", \"2017\", \"2017\", \"2017\", \"2017\", \"2017\", \"2017\", \"220\", \"220\", \"3\", \"3\", \"3\", \"3\", \"3\", \"3\", \"301_796\", \"301_796\", \"301_796\", \"301_796\", \"4\", \"4\", \"4\", \"4\", \"4\", \"4\", \"4\", \"4\", \"7\", \"7\", \"7\", \"7\", \"7\", \"7\", \"7\", \"7\", \"7\", \"9\", \"9\", \"9\", \"9\", \"9\", \"9\", \"accessibility\", \"accessibility\", \"accessibility\", \"accessibility\", \"accessibility\", \"accessibility\", \"accessibility\", \"accessibility\", \"accessible\", \"accessible\", \"accessible\", \"accessible\", \"accessible\", \"accessible\", \"accessible\", \"accessible\", \"act\", \"act\", \"act\", \"act\", \"act\", \"act\", \"act\", \"act\", \"act\", \"action\", \"action\", \"action\", \"action\", \"action\", \"action\", \"activity\", \"activity\", \"activity\", \"activity\", \"activity\", \"activity\", \"activity\", \"address\", \"address\", \"address\", \"address\", \"address\", \"address\", \"address\", \"administrative\", \"administrative\", \"administrative\", \"administrative\", \"administrative\", \"administrative\", \"administrative\", 
\"advanced\", \"advanced\", \"adverse\", \"adverse\", \"adverse\", \"adverse\", \"adverse\", \"adverse\", \"adverse_event\", \"adverse_event\", \"adverse_event\", \"adverse_event\", \"adverse_event\", \"advisory\", \"advisory\", \"advisory\", \"advisory\", \"advisory_committees\", \"advisory_committees\", \"advisory_committees\", \"advisory_committees\", \"advisory_committees\", \"affairs\", \"affairs\", \"affairs\", \"affairs\", \"affairs\", \"affairs\", \"agency\", \"agency\", \"agency\", \"agency\", \"agency\", \"agency\", \"agency\", \"agency\", \"agency\", \"alerts\", \"alerts\", \"alerts\", \"alerts\", \"american\", \"american\", \"american\", \"american\", \"amp\", \"amp\", \"amp\", \"amp\", \"amp\", \"amp\", \"amp\", \"amp\", \"amp\", \"amp\", \"analysis\", \"analysis\", \"analysis\", \"analysis\", \"and_drug_administration\", \"and_drug_administration\", \"and_drug_administration\", \"and_drug_administration\", \"and_drug_administration\", \"and_drug_administration\", \"and_drug_administration\", \"and_drug_administration\", \"and_human_services\", \"and_human_services\", \"and_human_services\", \"and_human_services\", \"and_human_services\", \"animal\", \"animal\", \"animal\", \"animal\", \"animal\", \"animal\", \"animal\", \"animal\", \"animal_drug\", \"animal_drug\", \"animal_drug\", \"animal_drug\", \"animal_drug\", \"animal_drug\", \"announcement\", \"announcement\", \"announcement\", \"announcement\", \"announcement\", \"announcement\", \"announcement\", \"answer\", \"answer\", \"answer\", \"answer\", \"answer\", \"answer\", \"answer\", \"application\", \"application\", \"application\", \"application\", \"application\", \"application\", \"apply\", \"apply\", \"apply\", \"apply\", \"apply\", \"apply\", \"appropriate\", \"appropriate\", \"appropriate\", \"approval\", \"approval\", \"approval\", \"approval\", \"approval\", \"approval\", \"approval\", \"approval\", \"approval\", \"approval\", \"approve\", \"approve\", \"approve\", \"approve\", 
\"approve\", \"approve\", \"approve\", \"approve\", \"approve\", \"approved\", \"approved\", \"approved\", \"approved\", \"archive\", \"archive\", \"archive\", \"archive\", \"archive\", \"archive\", \"attachment\", \"attachment\", \"attachment\", \"authority\", \"authority\", \"authority\", \"authority\", \"authority\", \"authority\", \"authority\", \"available\", \"available\", \"available\", \"available\", \"available\", \"available\", \"available\", \"available\", \"available\", \"avenue_silver_spring_md\", \"avenue_silver_spring_md\", \"back_to\", \"back_to\", \"back_to\", \"back_to\", \"back_to\", \"base\", \"base\", \"base\", \"base\", \"base\", \"base\", \"base\", \"begin\", \"begin\", \"biologic\", \"biologic\", \"biologic\", \"biological_product\", \"biological_product\", \"biological_product\", \"biologics\", \"biologics\", \"biologics\", \"biologics\", \"biologics\", \"biologics\", \"biologics\", \"biologics\", \"board\", \"board\", \"board\", \"board\", \"board\", \"board\", \"budget\", \"budget\", \"budget\", \"budget\", \"budget\", \"building\", \"building\", \"building\", \"building\", \"building\", \"business\", \"business\", \"business\", \"business\", \"business\", \"cber\", \"cber\", \"cder\", \"cder\", \"cder\", \"cder\", \"cder\", \"cell\", \"cell\", \"cell\", \"cell\", \"cell\", \"center\", \"center\", \"center\", \"center\", \"center\", \"center\", \"center\", \"center_for_biologics_evaluation\", \"center_for_biologics_evaluation\", \"center_for_biologics_evaluation\", \"center_for_biologics_evaluation\", \"center_for_biologics_evaluation\", \"center_for_tobacco_products\", \"center_for_tobacco_products\", \"center_for_tobacco_products\", \"center_for_tobacco_products\", \"center_for_tobacco_products\", \"certain\", \"certain\", \"certain\", \"certain\", \"certain\", \"certain\", \"certain\", \"change\", \"change\", \"change\", \"change\", \"change\", \"change\", \"change\", \"check\", \"check\", \"check\", \"check\", \"check\", \"cheese\", 
\"cheese\", \"cheese\", \"chemical\", \"chemical\", \"chemical\", \"chemical\", \"cholesterol\", \"cholesterol\", \"cholesterol\", \"cholesterol\", \"cholesterol\", \"clearance\", \"clearance\", \"clearance\", \"clearance\", \"clinical\", \"clinical\", \"clinical\", \"clinical\", \"clinical\", \"clinical\", \"clinical\", \"clinical_trial\", \"clinical_trial\", \"clinical_trial\", \"clinical_trial\", \"clinical_trial\", \"clinical_trial\", \"clinical_trial\", \"clinical_trial\", \"code\", \"code\", \"code\", \"code\", \"code\", \"code\", \"collect\", \"collect\", \"collect\", \"collect\", \"collect\", \"collect\", \"collect\", \"collection\", \"collection\", \"collection\", \"collection\", \"collection\", \"color_additive\", \"color_additive\", \"color_additive\", \"color_additive\", \"color_additive\", \"combination_product\", \"combination_product\", \"combination_product\", \"combination_product\", \"combination_product\", \"combination_product\", \"come\", \"come\", \"come\", \"comment\", \"comment\", \"comment\", \"comment\", \"comment\", \"comment\", \"comment\", \"comment\", \"comment\", \"comment\", \"comment_on\", \"comment_on\", \"comment_on\", \"comment_on\", \"comment_on\", \"committee_on\", \"committee_on\", \"committees\", \"committees\", \"committees\", \"committees\", \"committees\", \"communications\", \"communications\", \"communications\", \"communications\", \"communications\", \"complaint\", \"complaint\", \"complaint\", \"complaint\", \"complaint\", \"complaint\", \"complaint\", \"compliance\", \"compliance\", \"compliance\", \"compliance\", \"compliance\", \"compliance\", \"compliance\", \"condition\", \"condition\", \"condition\", \"condition\", \"consumer\", \"consumer\", \"consumer\", \"consumer\", \"consumer\", \"consumer\", \"consumer\", \"consumer\", \"consumer\", \"consumer\", \"contact\", \"contact\", \"contact\", \"contact\", \"contact\", \"contact\", \"contact\", \"contact\", \"contact_fda\", \"contact_fda\", \"contact_fda\", 
\"contact_fda\", \"contact_fda\", \"contact_fda\", \"contact_fda\", \"contact_fda\", \"contact_fda_browse_by\", \"contact_fda_browse_by\", \"contact_fda_browse_by\", \"contact_fda_browse_by\", \"contact_fda_browse_by\", \"contain\", \"contain\", \"contain\", \"contain\", \"contain\", \"contain\", \"control\", \"control\", \"control\", \"control\", \"control\", \"cookie\", \"cookie\", \"cookie\", \"cookie\", \"cookie\", \"coordinator\", \"coordinator\", \"coordinator\", \"coordinator\", \"cosmetic\", \"cosmetic\", \"cosmetic\", \"cosmetic\", \"cosmetic\", \"cosmetic\", \"cosmetic\", \"cosmetic\", \"cosmetic\", \"cosmetic\", \"course\", \"course\", \"course\", \"course\", \"course\", \"course\", \"current\", \"current\", \"current\", \"current\", \"current\", \"current\", \"currently\", \"currently\", \"currently\", \"currently\", \"currently\", \"date\", \"date\", \"date\", \"date\", \"date\", \"date\", \"datum\", \"datum\", \"datum\", \"datum\", \"datum\", \"datum\", \"development\", \"development\", \"development\", \"development\", \"development\", \"development\", \"development\", \"development\", \"development\", \"devices\", \"devices\", \"devices\", \"devices\", \"devices\", \"devices\", \"devices\", \"director\", \"director\", \"disability\", \"disability\", \"disability\", \"disability\", \"disability\", \"disability\", \"disease\", \"disease\", \"disease\", \"disease\", \"disease\", \"disease\", \"disease\", \"disease\", \"division_of\", \"division_of\", \"division_of\", \"division_of\", \"division_of\", \"do_not\", \"do_not\", \"do_not\", \"do_not\", \"do_not\", \"do_not\", \"document\", \"document\", \"document\", \"document\", \"document\", \"document\", \"document\", \"document\", \"document\", \"document\", \"download\", \"download\", \"download\", \"draft\", \"draft\", \"draft\", \"draft\", \"draft\", \"drug\", \"drug\", \"drug\", \"drug\", \"drug\", \"drug\", \"drug\", \"drug\", \"drug\", \"drug\", \"drug_evaluation\", \"drug_evaluation\", 
\"drug_evaluation\", \"e_mail\", \"e_mail\", \"e_mail\", \"early\", \"early\", \"early\", \"effective\", \"effective\", \"effective\", \"effective\", \"effective\", \"effective\", \"effort\", \"effort\", \"effort\", \"effort\", \"effort\", \"effort\", \"electronic\", \"electronic\", \"electronic\", \"electronic\", \"electronic\", \"electronic\", \"electronic\", \"electronic\", \"electronic\", \"electronic\", \"electronically\", \"electronically\", \"electronically\", \"electronically\", \"electronically\", \"electronically\", \"email_print\", \"email_print\", \"email_print\", \"email_print\", \"email_print\", \"email_print\", \"emergency\", \"emergency\", \"emergency\", \"emergency\", \"emergency\", \"emergency\", \"emergency\", \"emergency\", \"employment\", \"employment\", \"employment\", \"employment\", \"employment\", \"en\", \"en\", \"en\", \"enhance\", \"enhance\", \"enhance\", \"enhance\", \"establish\", \"establish\", \"establish\", \"evaluation\", \"evaluation\", \"evaluation\", \"evaluation\", \"evaluation\", \"evaluation\", \"event\", \"event\", \"event\", \"event\", \"event\", \"event\", \"event\", \"event\", \"event\", \"event\", \"expert\", \"expert\", \"expert\", \"expert\", \"expert\", \"export\", \"export\", \"export\", \"export\", \"export\", \"export\", \"facility\", \"facility\", \"facility\", \"fda\", \"fda\", \"fda\", \"fda\", \"fda\", \"fda\", \"fda\", \"fda\", \"fda\", \"fda\", \"fda_'s\", \"fda_'s\", \"fda_'s\", \"fda_'s\", \"fda_'s\", \"fda_'s\", \"fda_'s\", \"fda_'s\", \"fda_'s\", \"fda_archive_combination\", \"fda_archive_combination\", \"fda_on_facebook\", \"fda_on_facebook\", \"fda_on_facebook\", \"fda_on_facebook\", \"fda_on_facebook\", \"fda_photo_on\", \"fda_photo_on\", \"fda_photo_on\", \"fda_photo_on\", \"fda_voice\", \"fda_voice\", \"fda_voice\", \"fdasia\", \"fdasia\", \"fdasia\", \"fdasia\", \"february\", \"february\", \"february\", \"february\", \"february\", \"february\", \"federal\", \"federal\", \"federal\", \"federal\", 
\"federal\", \"federal\", \"federal\", \"federal\", \"federal\", \"federal\", \"federal_register\", \"federal_register\", \"federal_register\", \"federal_register\", \"federal_register\", \"federal_register\", \"federal_register\", \"field\", \"field\", \"field\", \"field\", \"field\", \"file\", \"file\", \"file\", \"file\", \"file\", \"file\", \"file\", \"file\", \"file\", \"final_rule\", \"final_rule\", \"final_rule\", \"final_rule\", \"final_rule\", \"final_rule\", \"final_rule\", \"final_rule\", \"find\", \"find\", \"find\", \"find\", \"find\", \"find\", \"find\", \"find\", \"find\", \"find\", \"flickr\", \"flickr\", \"flickr\", \"flickr\", \"food\", \"food\", \"food\", \"food\", \"food\", \"food\", \"food\", \"food\", \"food\", \"food\", \"food_safety_modernization\", \"food_safety_modernization\", \"food_safety_modernization\", \"food_safety_modernization\", \"form\", \"form\", \"form\", \"form\", \"form\", \"forms\", \"forms\", \"forms\", \"forms\", \"forms\", \"freedom_of_information\", \"freedom_of_information\", \"freedom_of_information\", \"freedom_of_information\", \"freedom_of_information\", \"function\", \"function\", \"function\", \"function\", \"general\", \"general\", \"general\", \"general\", \"general\", \"general\", \"good\", \"good\", \"good\", \"good\", \"good\", \"good\", \"good\", \"good\", \"government\", \"government\", \"government\", \"government\", \"government\", \"guidance\", \"guidance\", \"guidance\", \"guidance\", \"guidance\", \"guidance\", \"guidance\", \"guidance\", \"guidance\", \"guidance\", \"guidance_documents\", \"guidance_documents\", \"health\", \"health\", \"health\", \"health\", \"health\", \"health\", \"health\", \"health\", \"health\", \"health\", \"health_care\", \"health_care\", \"health_care\", \"health_professional\", \"health_professional\", \"health_professional\", \"health_professionals_science_amp\", \"health_professionals_science_amp\", \"health_professionals_science_amp\", \"healthcare\", \"healthcare\", 
\"healthcare\", \"healthcare\", \"heart\", \"heart\", \"heart\", \"heart\", \"help\", \"help\", \"help\", \"help\", \"help\", \"help\", \"help\", \"help\", \"help\", \"house\", \"house\", \"house\", \"human\", \"human\", \"human\", \"human\", \"human\", \"human\", \"human\", \"human\", \"human\", \"human\", \"icsrs\", \"icsrs\", \"icsrs\", \"icsrs\", \"identify\", \"identify\", \"identify\", \"identify\", \"if_you\", \"if_you\", \"if_you\", \"if_you\", \"if_you\", \"if_you\", \"if_you\", \"imaging\", \"imaging\", \"imaging\", \"imaging\", \"impact\", \"impact\", \"impact\", \"impact\", \"impact\", \"impact\", \"impact\", \"import\", \"import\", \"import\", \"import\", \"import\", \"import\", \"import\", \"important\", \"important\", \"important\", \"important\", \"important\", \"important\", \"include\", \"include\", \"include\", \"include\", \"include\", \"include\", \"include\", \"include\", \"include\", \"include\", \"indicate\", \"indicate\", \"indicate\", \"indicate\", \"indicate\", \"industry\", \"industry\", \"industry\", \"industry\", \"industry\", \"industry\", \"industry\", \"industry\", \"inform\", \"inform\", \"inform\", \"inform\", \"information\", \"information\", \"information\", \"information\", \"information\", \"information\", \"information\", \"information\", \"information\", \"information\", \"initiative\", \"initiative\", \"initiative\", \"initiative\", \"initiative\", \"initiative\", \"initiative\", \"initiative\", \"initiative\", \"inspection\", \"inspection\", \"inspection\", \"inspection\", \"inspection\", \"inspection\", \"instruction\", \"instruction\", \"instruction\", \"instruction\", \"instruction\", \"international_programs\", \"international_programs\", \"international_programs\", \"international_programs\", \"international_programs\", \"internet\", \"internet\", \"internet\", \"internet\", \"internet\", \"investigation\", \"investigation\", \"investigation\", \"investigation\", \"investigation\", \"investigation\", 
\"investigation\", \"investigator\", \"investigator\", \"investigator\", \"investigator\", \"investigator\", \"investigator\", \"issue\", \"issue\", \"issue\", \"issue\", \"issue\", \"issue\", \"issue\", \"issue\", \"issue\", \"it_more_sharing_option\", \"it_more_sharing_option\", \"it_more_sharing_option\", \"it_more_sharing_option\", \"it_more_sharing_option\", \"it_more_sharing_option\", \"it_more_sharing_option\", \"january\", \"january\", \"january\", \"january\", \"january\", \"january\", \"january\", \"january\", \"july\", \"july\", \"july\", \"july\", \"july\", \"july\", \"kb\", \"kb\", \"kb\", \"kb\", \"kb\", \"kb\", \"kb\", \"kb\", \"key\", \"key\", \"key\", \"know\", \"know\", \"know\", \"know\", \"know\", \"know\", \"know\", \"labeling\", \"labeling\", \"labeling\", \"labeling\", \"late\", \"late\", \"late\", \"late\", \"late\", \"late\", \"late\", \"late\", \"law\", \"law\", \"law\", \"law\", \"law\", \"law\", \"law\", \"law\", \"law\", \"leadership\", \"leadership\", \"leadership\", \"leadership\", \"leadership\", \"learn\", \"learn\", \"learn\", \"learn\", \"learn_more\", \"learn_more\", \"learn_more\", \"learn_more\", \"legislation\", \"legislation\", \"legislation\", \"legislation\", \"legislation\", \"letter\", \"letter\", \"letter\", \"letter\", \"linkedin_pin_it_email\", \"linkedin_pin_it_email\", \"linkedin_pin_it_email\", \"list\", \"list\", \"list\", \"list\", \"list\", \"list\", \"list\", \"listing_of\", \"listing_of\", \"listing_of\", \"listing_of\", \"listing_of\", \"local\", \"local\", \"local\", \"long\", \"long\", \"long\", \"long\", \"long\", \"long\", \"look\", \"look\", \"look\", \"look\", \"look\", \"m.d.\", \"m.d.\", \"m.d.\", \"m.d.\", \"m.d.\", \"m.d.\", \"m.d._before\", \"m.d._before\", \"m.d._before\", \"m.d._before\", \"mail\", \"mail\", \"mail\", \"mail\", \"maintain\", \"maintain\", \"maintain\", \"management\", \"management\", \"management\", \"management\", \"management\", \"management\", \"management\", \"march\", 
\"march\", \"march\", \"march\", \"march\", \"march\", \"march\", \"march\", \"market\", \"market\", \"market\", \"market\", \"market\", \"market\", \"market\", \"media\", \"media\", \"media\", \"media\", \"media\", \"media\", \"medical_device\", \"medical_device\", \"medical_device\", \"medical_device\", \"medical_device\", \"medical_device\", \"medical_devices\", \"medical_devices\", \"medical_devices\", \"medical_devices\", \"medical_devices\", \"medical_devices\", \"medical_devices\", \"medwatch\", \"medwatch\", \"medwatch\", \"medwatch\", \"medwatch\", \"medwatch\", \"medwatch_safety_alerts_news\", \"medwatch_safety_alerts_news\", \"medwatch_safety_alerts_news\", \"medwatch_safety_alerts_news\", \"meeting\", \"meeting\", \"meeting\", \"meeting\", \"meeting\", \"meeting\", \"meeting\", \"meeting\", \"meeting_conference\", \"meeting_conference\", \"meeting_conference\", \"meeting_conference\", \"meeting_conference\", \"microsoft\", \"microsoft\", \"microsoft\", \"microsoft\", \"milk\", \"milk\", \"more_in\", \"more_in\", \"more_in\", \"more_in\", \"more_in\", \"more_in\", \"more_in\", \"more_in\", \"more_sharing_option_linkedin\", \"more_sharing_option_linkedin\", \"more_sharing_option_linkedin\", \"national\", \"national\", \"national\", \"national\", \"national\", \"national\", \"national\", \"national\", \"national\", \"navigate_the\", \"navigate_the\", \"navigate_the\", \"navigate_the\", \"navigate_the\", \"need\", \"need\", \"need\", \"need\", \"need\", \"need\", \"need\", \"need\", \"new\", \"new\", \"new\", \"new\", \"new\", \"new\", \"new\", \"new\", \"new\", \"news\", \"news\", \"news\", \"news\", \"news\", \"news\", \"news\", \"news\", \"news\", \"news\", \"news_amp_event\", \"news_amp_event\", \"news_amp_event\", \"news_amp_event\", \"news_amp_event\", \"news_amp_event\", \"no_fear_act\", \"no_fear_act\", \"no_fear_act\", \"no_fear_act\", \"no_fear_act\", \"note\", \"note\", \"note\", \"note\", \"note\", \"note\", \"number_of\", \"number_of\", 
\"number_of\", \"number_of\", \"number_of\", \"number_of\", \"number_of\", \"nutrition\", \"nutrition\", \"nutrition\", \"nutrition\", \"office_of\", \"office_of\", \"office_of\", \"office_of\", \"office_of\", \"office_of\", \"office_of\", \"office_of\", \"office_of\", \"office_of\", \"official\", \"official\", \"official\", \"official\", \"official\", \"on_twitter_follow\", \"on_twitter_follow\", \"on_twitter_follow\", \"on_twitter_follow\", \"online\", \"online\", \"online\", \"online\", \"online\", \"online\", \"open\", \"open\", \"open\", \"open\", \"open\", \"operation\", \"operation\", \"operation\", \"option\", \"option\", \"option\", \"option\", \"option_linkedin_pin_it\", \"option_linkedin_pin_it\", \"option_linkedin_pin_it\", \"option_linkedin_pin_it\", \"option_linkedin_pin_it\", \"order\", \"order\", \"order\", \"order\", \"order\", \"order\", \"our_website\", \"our_website\", \"our_website\", \"our_website\", \"our_website\", \"our_website\", \"outbreak\", \"outbreak\", \"outbreak\", \"outbreak\", \"outbreak\", \"outbreaks\", \"outbreaks\", \"outreach\", \"outreach\", \"outreach\", \"outreach\", \"page\", \"page\", \"page\", \"page\", \"page\", \"page\", \"page\", \"page\", \"page\", \"page\", \"page_last_updated\", \"page_last_updated\", \"page_last_updated\", \"page_last_updated\", \"page_last_updated\", \"page_last_updated\", \"page_last_updated\", \"page_last_updated\", \"page_last_updated\", \"partner\", \"partner\", \"partner\", \"partner\", \"partner\", \"patient\", \"patient\", \"patient\", \"patient\", \"patient\", \"patient\", \"patient\", \"patient\", \"patient\", \"patient\", \"patients\", \"patients\", \"patients\", \"patients\", \"patients\", \"pdf\", \"pdf\", \"pdf\", \"pdf\", \"pdf\", \"pdf\", \"pdf\", \"pdf\", \"pediatric\", \"pediatric\", \"pediatric\", \"pediatric\", \"pediatric\", \"pediatric\", \"pediatric\", \"person\", \"person\", \"person\", \"person\", \"person\", \"person\", \"pet\", \"pet\", \"pet\", \"pet\", \"ph.d.\", 
\"ph.d.\", \"ph.d.\", \"plan\", \"plan\", \"plan\", \"plan\", \"player\", \"player\", \"player\", \"podcasts\", \"podcasts\", \"podcasts\", \"policy\", \"policy\", \"policy\", \"policy\", \"policy\", \"policy\", \"policy\", \"popular\", \"popular\", \"potential\", \"potential\", \"potential\", \"potential\", \"potential\", \"practice\", \"practice\", \"practice\", \"practice\", \"practice\", \"preparedness\", \"preparedness\", \"preparedness\", \"preparedness\", \"preparedness\", \"preparedness\", \"prescription_drug\", \"prescription_drug\", \"prescription_drug\", \"press\", \"press\", \"press\", \"press\", \"press\", \"prevention\", \"prevention\", \"prevention\", \"prevention\", \"prevention\", \"print_this_page_home\", \"print_this_page_home\", \"print_this_page_home\", \"print_this_page_home\", \"print_this_page_home\", \"privacy\", \"privacy\", \"privacy\", \"privacy\", \"privacy\", \"privacy\", \"privacy\", \"privacy\", \"problem\", \"problem\", \"problem\", \"problem\", \"problem\", \"problem\", \"problem\", \"problem_with\", \"problem_with\", \"problem_with\", \"problem_with\", \"process\", \"process\", \"process\", \"process\", \"process\", \"process\", \"process\", \"process\", \"produce\", \"produce\", \"produce\", \"produce\", \"produce\", \"product\", \"product\", \"product\", \"product\", \"product\", \"product\", \"product\", \"product\", \"product\", \"product\", \"products\", \"products\", \"products\", \"products\", \"products\", \"products\", \"products\", \"products\", \"products\", \"professional\", \"professional\", \"professional\", \"professional\", \"professional\", \"professional\", \"program\", \"program\", \"program\", \"program\", \"program\", \"program\", \"program\", \"program\", \"program\", \"program\", \"protect\", \"protect\", \"protect\", \"protect\", \"protect\", \"protect\", \"protect\", \"protect\", \"protect\", \"protection\", \"protection\", \"protection\", \"protection\", \"protection\", \"provide\", \"provide\", 
\"provide\", \"provide\", \"provide\", \"provide\", \"provide\", \"provide\", \"provide\", \"provide\", \"provision\", \"provision\", \"provision\", \"provision\", \"public\", \"public\", \"public\", \"public\", \"public\", \"public\", \"public\", \"public\", \"public\", \"public\", \"publications\", \"publications\", \"publications\", \"publications\", \"radiation\", \"radiation\", \"radiation\", \"radiation\", \"radiation\", \"radiation_emitting\", \"radiation_emitting\", \"radiation_emitting\", \"radiological_health\", \"radiological_health\", \"radiological_health\", \"radiological_health\", \"radiological_health\", \"reaction\", \"reaction\", \"recall\", \"recall\", \"recall\", \"recall\", \"recall\", \"recall\", \"recall\", \"recall\", \"recalls\", \"recalls\", \"recalls\", \"recalls_market_withdrawals_amp\", \"recalls_market_withdrawals_amp\", \"receive\", \"receive\", \"receive\", \"receive\", \"receive\", \"receive\", \"receive\", \"receive\", \"recently\", \"recently\", \"recently\", \"recently\", \"record\", \"record\", \"record\", \"record\", \"record\", \"reduce\", \"reduce\", \"reduce\", \"reduce\", \"register\", \"register\", \"register\", \"registration\", \"registration\", \"registration\", \"registration\", \"regulated_products\", \"regulated_products\", \"regulated_products\", \"regulated_products\", \"regulation\", \"regulation\", \"regulation\", \"regulation\", \"regulation\", \"regulation\", \"regulation\", \"regulation\", \"regulation\", \"regulations\", \"regulations\", \"regulations\", \"regulations\", \"regulations\", \"regulations\", \"regulations\", \"regulations\", \"regulatory\", \"regulatory\", \"regulatory\", \"regulatory\", \"regulatory\", \"regulatory\", \"regulatory\", \"regulatory\", \"regulatory\", \"regulatory\", \"report\", \"report\", \"report\", \"report\", \"report\", \"report\", \"report\", \"report\", \"report\", \"report\", \"reportable_food_registry\", \"reportable_food_registry\", \"reportable_food_registry\", 
\"reporting\", \"reporting\", \"reporting\", \"reporting\", \"request\", \"request\", \"request\", \"request\", \"request\", \"request\", \"request\", \"request\", \"request\", \"request\", \"require\", \"require\", \"require\", \"require\", \"require\", \"require\", \"require\", \"requirement\", \"requirement\", \"requirement\", \"requirement\", \"requirement\", \"requirement\", \"requirement\", \"research\", \"research\", \"research\", \"research\", \"research\", \"research\", \"research\", \"research\", \"resource\", \"resource\", \"resource\", \"resource\", \"resource\", \"resource\", \"resource\", \"resources\", \"resources\", \"resources\", \"resources\", \"resources\", \"response\", \"response\", \"response\", \"response\", \"response\", \"response\", \"response\", \"response\", \"response\", \"result\", \"result\", \"result\", \"result\", \"result\", \"rfr\", \"rfr\", \"rfr\", \"rfr\", \"right\", \"right\", \"right\", \"right\", \"robert\", \"robert\", \"robert\", \"robert\", \"robert\", \"robert\", \"safety\", \"safety\", \"safety\", \"safety\", \"safety\", \"safety\", \"safety\", \"safety\", \"safety\", \"safety\", \"safety_alerts\", \"safety_alerts\", \"safety_alerts\", \"safety_alerts\", \"safety_alerts\", \"science\", \"science\", \"science\", \"science\", \"science\", \"science\", \"science\", \"science\", \"science\", \"science\", \"science_amp_research\", \"science_amp_research\", \"science_amp_research\", \"science_amp_research\", \"science_amp_research\", \"screening\", \"screening\", \"screening\", \"screening\", \"screening\", \"screening\", \"search\", \"search\", \"search\", \"search\", \"search\", \"search\", \"search\", \"search\", \"section\", \"section\", \"section\", \"section\", \"section\", \"section\", \"section\", \"section\", \"section\", \"section_508\", \"section_508\", \"section_508\", \"section_508\", \"section_508\", \"section_508\", \"security\", \"security\", \"security\", \"security\", \"security\", \"sentinel\", 
\"sentinel\", \"sentinel\", \"sentinel\", \"sentinel\", \"service\", \"service\", \"service\", \"service\", \"service\", \"service\", \"service\", \"service\", \"session\", \"session\", \"session\", \"session\", \"share\", \"share\", \"share\", \"share\", \"share\", \"share\", \"share\", \"share_tweet\", \"share_tweet\", \"share_tweet\", \"share_tweet_linkedin_pin\", \"share_tweet_linkedin_pin\", \"share_tweet_linkedin_pin\", \"share_tweet_linkedin_pin\", \"share_tweet_linkedin_pin\", \"share_tweet_linkedin_pin\", \"share_tweet_linkedin_pin\", \"share_tweet_linkedin_pin\", \"share_tweet_linkedin_pin\", \"silver_spring_md_20993\", \"silver_spring_md_20993\", \"silver_spring_md_20993\", \"silver_spring_md_20993\", \"silver_spring_md_20993\", \"silver_spring_md_20993\", \"site\", \"site\", \"site\", \"site\", \"site\", \"site\", \"site\", \"site_map\", \"site_map\", \"site_map\", \"site_map\", \"site_map\", \"small\", \"small\", \"specific\", \"specific\", \"specific\", \"specific\", \"specific\", \"sponsor\", \"sponsor\", \"sponsor\", \"sponsor\", \"sponsor\", \"sponsor\", \"spotlight\", \"spotlight\", \"spotlight\", \"spotlight\", \"spotlight\", \"spotlight\", \"spotlight\", \"staff\", \"staff\", \"staff\", \"staff\", \"staff\", \"staff\", \"stakeholder\", \"stakeholder\", \"stakeholder\", \"stakeholder\", \"standard\", \"standard\", \"standard\", \"standard\", \"standard\", \"standard\", \"standard\", \"standards\", \"standards\", \"standards\", \"standards\", \"standards\", \"standards\", \"standards\", \"standards\", \"standards\", \"state\", \"state\", \"state\", \"state\", \"state\", \"state\", \"stay\", \"stay\", \"stay\", \"stay\", \"student\", \"student\", \"student\", \"student\", \"submissions\", \"submissions\", \"submissions\", \"submit\", \"submit\", \"submit\", \"submit\", \"submit\", \"submit\", \"submit\", \"submit\", \"submit\", \"submit\", \"subscribe\", \"subscribe\", \"subscribe\", \"subscribe\", \"subscribe\", \"subscribe_to\", \"subscribe_to\", 
\"subscribe_to\", \"subscribe_to\", \"subscribe_to\", \"supplement\", \"supplement\", \"supplement\", \"support\", \"support\", \"support\", \"support\", \"support\", \"support\", \"support\", \"system\", \"system\", \"system\", \"system\", \"system\", \"system\", \"system\", \"system\", \"technical\", \"technical\", \"technical\", \"technical\", \"technical\", \"technical\", \"technology\", \"technology\", \"technology\", \"technology\", \"technology\", \"technology\", \"technology\", \"technology\", \"test\", \"test\", \"test\", \"test\", \"test\", \"test\", \"test\", \"test\", \"testimony\", \"testimony\", \"testimony\", \"testimony\", \"testimony\", \"testimony\", \"testimony\", \"therapy\", \"therapy\", \"therapy\", \"therapy\", \"therapy\", \"therapy\", \"therapy\", \"this_page\", \"this_page\", \"this_page\", \"this_page\", \"time\", \"time\", \"time\", \"time\", \"time\", \"time\", \"time\", \"time\", \"tissue\", \"tissue\", \"tissue\", \"tissue\", \"tissue\", \"tobacco\", \"tobacco\", \"tobacco\", \"tobacco\", \"tobacco\", \"tobacco\", \"tobacco\", \"tobacco\", \"tobacco_product\", \"tobacco_product\", \"tobacco_product\", \"tobacco_product\", \"tobacco_products\", \"tobacco_products\", \"tobacco_products\", \"tobacco_products\", \"tobacco_products\", \"tobacco_products\", \"tobacco_products\", \"tobacco_products\", \"training\", \"training\", \"training\", \"training\", \"training\", \"training\", \"training\", \"training\", \"training\", \"treat\", \"treat\", \"treat\", \"treat\", \"treat\", \"treatment\", \"treatment\", \"treatment\", \"treatment\", \"treatment\", \"treatment\", \"treatment\", \"treatment\", \"treatment\", \"trial\", \"trial\", \"trial\", \"trial\", \"tribal\", \"tribal\", \"tribal\", \"tribal\", \"tribal\", \"type\", \"type\", \"type\", \"type\", \"type\", \"type\", \"u.s._department_of\", \"u.s._department_of\", \"u.s._department_of\", \"u.s._department_of\", \"u.s._department_of\", \"u.s._food\", \"u.s._food\", \"u.s._food\", 
\"u.s._food\", \"u.s._food\", \"u.s._food\", \"undeclared\", \"undeclared\", \"undeclared\", \"upcoming\", \"upcoming\", \"upcoming\", \"upcoming\", \"upcoming\", \"upcoming\", \"upcoming\", \"update\", \"update\", \"update\", \"update\", \"update\", \"update\", \"update\", \"update\", \"update\", \"update\", \"use\", \"use\", \"use\", \"use\", \"use\", \"use\", \"use\", \"use\", \"use\", \"use\", \"user\", \"user\", \"user\", \"user\", \"user\", \"user\", \"user\", \"user\", \"veterinary\", \"veterinary\", \"veterinary\", \"veterinary\", \"veterinary\", \"veterinary\", \"veterinary\", \"veterinary_cosmetics_tobacco_products\", \"veterinary_cosmetics_tobacco_products\", \"veterinary_cosmetics_tobacco_products\", \"veterinary_cosmetics_tobacco_products\", \"veterinary_cosmetics_tobacco_products\", \"veterinary_cosmetics_tobacco_products\", \"veterinary_cosmetics_tobacco_products\", \"veterinary_cosmetics_tobacco_products\", \"video\", \"video\", \"video\", \"video\", \"video\", \"video\", \"video\", \"video\", \"view\", \"view\", \"view\", \"view\", \"view\", \"view\", \"view\", \"view\", \"view\", \"violation\", \"violation\", \"violation\", \"visit\", \"visit\", \"visit\", \"visit\", \"visit\", \"visit\", \"web\", \"web\", \"web\", \"web\", \"web\", \"web\", \"web\", \"web\", \"web_site\", \"web_site\", \"web_site\", \"web_site\", \"web_site\", \"website\", \"website\", \"website\", \"website\", \"website\", \"website\", \"website\", \"website\", \"website\", \"website\", \"what_'_new\", \"what_'_new\", \"what_'_new\", \"what_'_new\", \"what_'_new\", \"what_'_new\", \"what_'_new\", \"wide\", \"wide\", \"wide\", \"wide\", \"wide\", \"windows\", \"windows\", \"windows\", \"windows\", \"windows\", \"work\", \"work\", \"work\", \"work\", \"work\", \"workshops\", \"workshops\", \"workshops\", \"workshops\", \"workshops\", \"you_should\", \"you_should\", \"you_should\", \"you_should\", \"|\", \"|\", \"|\", \"|\", \"\\u00e2\\u0080\\u0093\", \"\\u00e2\\u0080\\u0093\", 
\"\\u00e2\\u0080\\u0093\", \"\\u00e2\\u0080\\u0093\", \"\\u00e2\\u0080\\u0093\"], \"Freq\": [0.26980558648063885, 0.08301710353250426, 0.020754275883126064, 0.04150855176625213, 0.14527993118188245, 0.08301710353250426, 0.20754275883126064, 0.14527993118188245, 0.11895140558037816, 0.0713708433482269, 0.2141125300446807, 0.5709667467858152, 0.05732968905125745, 0.08599453357688618, 0.08599453357688618, 0.08599453357688618, 0.4586375124100596, 0.2293187562050298, 0.04550553232302052, 0.05688191540377566, 0.11376383080755131, 0.04550553232302052, 0.19339851237283723, 0.523313621714736, 0.21980292084494135, 0.21980292084494135, 0.1318817525069648, 0.3956452575208944, 0.4372904788436121, 0.18741020521869092, 0.1249401368124606, 0.18741020521869092, 0.09147244152940617, 0.04573622076470309, 0.015245406921567697, 0.06098162768627079, 0.350644359196057, 0.4268713938038955, 0.048607252599114484, 0.048607252599114484, 0.3402507681938014, 0.38885802079291587, 0.14582175779734347, 0.10454023910655205, 0.10454023910655205, 0.10454023910655205, 0.6272414346393123, 0.3095756008547509, 0.02381350775805776, 0.07144052327417327, 0.35720261637086637, 0.21432156982251982, 0.02381350775805776, 0.15056403801174637, 0.07528201900587318, 0.02509400633529106, 0.7026321773881498, 0.02509400633529106, 0.49484991758531127, 0.16082622321522616, 0.06185623969816391, 0.012371247939632781, 0.07422748763779669, 0.06185623969816391, 0.024742495879265563, 0.08659873557742948, 0.012371247939632781, 0.07000748376724895, 0.36403891558969453, 0.04200449026034937, 0.08400898052069874, 0.04200449026034937, 0.3220344253293452, 0.04200449026034937, 0.04200449026034937, 0.025546808777315553, 0.12773404388657775, 0.510936175546311, 0.051093617554631106, 0.012773404388657776, 0.11496063949791999, 0.025546808777315553, 0.08941383072060444, 0.038320213165973324, 0.05535002020267617, 0.15498005656749328, 0.011070004040535233, 0.2324700848512399, 0.1328400484864228, 0.1660500606080285, 0.011070004040535233, 
0.011070004040535233, 0.2324700848512399, 0.22977354900126312, 0.45954709800252624, 0.0685164821149491, 0.1370329642298982, 0.0228388273716497, 0.0685164821149491, 0.5481318569195928, 0.1370329642298982, 0.041529291890859715, 0.16611716756343886, 0.3737636270177374, 0.3737636270177374, 0.054017646859650964, 0.036011764573100645, 0.054017646859650964, 0.10803529371930193, 0.054017646859650964, 0.10803529371930193, 0.30609999887135547, 0.2520823520117045, 0.03761348071586006, 0.07522696143172013, 0.11284044214758018, 0.11284044214758018, 0.03761348071586006, 0.11284044214758018, 0.3385213264427406, 0.07522696143172013, 0.11284044214758018, 0.046738933034282594, 0.5141282633771085, 0.046738933034282594, 0.14021679910284776, 0.09347786606856519, 0.14021679910284776, 0.03141401178781198, 0.08377069810083194, 0.06282802357562396, 0.13612738441385192, 0.13612738441385192, 0.18848407072687187, 0.03141401178781198, 0.3350827924033278, 0.16656964333436278, 0.2220928577791504, 0.27761607222393797, 0.0277616072223938, 0.08328482166718139, 0.08328482166718139, 0.08328482166718139, 0.08328482166718139, 0.07686905856060416, 0.057651793920453116, 0.07686905856060416, 0.17295538176135936, 0.09608632320075519, 0.09608632320075519, 0.057651793920453116, 0.057651793920453116, 0.30747623424241666, 0.1163777764903839, 0.08728333236778793, 0.08728333236778793, 0.4364166618389396, 0.029094444122595976, 0.26184999710336376, 0.05422504368017265, 0.08133756552025898, 0.13556260920043162, 0.4338003494413812, 0.08133756552025898, 0.13556260920043162, 0.05422504368017265, 0.02839060542865754, 0.05678121085731508, 0.4826402922871782, 0.19873423800060278, 0.11356242171463016, 0.05678121085731508, 0.02839060542865754, 0.05968003609152361, 0.35808021654914163, 0.05968003609152361, 0.17904010827457081, 0.11936007218304721, 0.05968003609152361, 0.17904010827457081, 0.5516971593412942, 0.2758485796706471, 0.5128041429077282, 0.12820103572693206, 0.12820103572693206, 0.12820103572693206, 
0.12820103572693206, 0.12820103572693206, 0.07118304901035659, 0.14236609802071318, 0.5694643920828527, 0.07118304901035659, 0.07118304901035659, 0.31793022498709733, 0.15896511249354867, 0.15896511249354867, 0.31793022498709733, 0.0716950242995631, 0.0716950242995631, 0.0716950242995631, 0.0716950242995631, 0.645255218696068, 0.06494751388445454, 0.25979005553781814, 0.06494751388445454, 0.06494751388445454, 0.3896850833067272, 0.06494751388445454, 0.010000003905734668, 0.07000002734014268, 0.060000023434408016, 0.18000007030322404, 0.05000001952867335, 0.37000014451218277, 0.020000007811469336, 0.22000008592616271, 0.020000007811469336, 0.4978507697710623, 0.1659502565903541, 0.055316752196784696, 0.2765837609839235, 0.5286303940720501, 0.10572607881441003, 0.10572607881441003, 0.10572607881441003, 0.2736878766168947, 0.08049643429908668, 0.026832144766362226, 0.05366428953272445, 0.016099286859817336, 0.1770921554579907, 0.021465715813089783, 0.27905430557016714, 0.03219857371963467, 0.03756500267290712, 0.06643642684777734, 0.13287285369555468, 0.19930928054333202, 0.5314914147822187, 0.03256508261200078, 0.06513016522400156, 0.09769524783600234, 0.09769524783600234, 0.22795557828400545, 0.1628254130600039, 0.26052066089600623, 0.03256508261200078, 0.04909333547311873, 0.1963733418924749, 0.09818667094623745, 0.49093335473118727, 0.14728000641935618, 0.03239935512125656, 0.35639290633382215, 0.06479871024251312, 0.03239935512125656, 0.2267954858487959, 0.03239935512125656, 0.12959742048502623, 0.12959742048502623, 0.10225716975930256, 0.10225716975930256, 0.10225716975930256, 0.40902867903721024, 0.10225716975930256, 0.10225716975930256, 0.32573739171880023, 0.02714478264323335, 0.2985926090755669, 0.1085791305729334, 0.1085791305729334, 0.02714478264323335, 0.1085791305729334, 0.049026790046429077, 0.1961071601857163, 0.049026790046429077, 0.3922143203714326, 0.14708037013928724, 0.049026790046429077, 0.14708037013928724, 0.06629192066604395, 
0.1325838413320879, 0.4640434446623076, 0.06629192066604395, 0.06629192066604395, 0.06629192066604395, 0.05945476344326774, 0.17836429032980322, 0.17836429032980322, 0.05945476344326774, 0.35672858065960644, 0.17836429032980322, 0.10171466744050771, 0.20342933488101542, 0.6102880046430463, 0.12972981922636057, 0.07413132527220603, 0.1853283131805151, 0.2038611444985666, 0.1667954818624636, 0.01853283131805151, 0.055598493954154525, 0.055598493954154525, 0.07413132527220603, 0.055598493954154525, 0.11881472452373794, 0.15841963269831727, 0.09901227043644828, 0.05940736226186897, 0.1386171786110276, 0.05940736226186897, 0.03960490817457932, 0.03960490817457932, 0.31683926539663454, 0.11310524273685488, 0.3393157282105646, 0.3393157282105646, 0.11310524273685488, 0.04749413936369624, 0.09498827872739248, 0.5936767420462029, 0.11873534840924059, 0.07124120904554436, 0.07124120904554436, 0.21057900614339037, 0.21057900614339037, 0.42115801228678074, 0.077467576518688, 0.077467576518688, 0.077467576518688, 0.154935153037376, 0.077467576518688, 0.464805459112128, 0.077467576518688, 0.24534379894996156, 0.08178126631665386, 0.10222658289581732, 0.08178126631665386, 0.10222658289581732, 0.12267189947498078, 0.020445316579163465, 0.18400784921247118, 0.06133594973749039, 0.19733375160050554, 0.5920012548015167, 0.32597967265008126, 0.08149491816252032, 0.08149491816252032, 0.4074745908126016, 0.08149491816252032, 0.038706591828260604, 0.11611977548478182, 0.038706591828260604, 0.30965273462608484, 0.30965273462608484, 0.038706591828260604, 0.11611977548478182, 0.50675940823897, 0.33783960549264674, 0.1453969794088979, 0.1453969794088979, 0.4361909382266937, 0.18210117408078952, 0.18210117408078952, 0.36420234816157904, 0.03802159181474096, 0.07604318362948193, 0.03802159181474096, 0.03802159181474096, 0.15208636725896385, 0.24714034679581626, 0.34219432633266866, 0.057032387722111444, 0.38963936837182167, 0.06493989472863694, 0.06493989472863694, 0.19481968418591084, 
0.12987978945727388, 0.19481968418591084, 0.09128646019740075, 0.15975130534545132, 0.06846484514805057, 0.6390052213818053, 0.022821615049350187, 0.12456496086921579, 0.12456496086921579, 0.06228248043460789, 0.3736948826076474, 0.1868474413038237, 0.30325086741602386, 0.07581271685400597, 0.15162543370801193, 0.07581271685400597, 0.37906358427002984, 0.4764413327196864, 0.31762755514645763, 0.11896048165976139, 0.11896048165976139, 0.11896048165976139, 0.23792096331952278, 0.3568814449792842, 0.4001542970679595, 0.057164899581137074, 0.057164899581137074, 0.4001542970679595, 0.057164899581137074, 0.1386069172700575, 0.04620230575668584, 0.1386069172700575, 0.23101152878342918, 0.04620230575668584, 0.04620230575668584, 0.3234161402968009, 0.12972880945876794, 0.06486440472938397, 0.06486440472938397, 0.4540508331056878, 0.19459321418815193, 0.12428818895246833, 0.12428818895246833, 0.12428818895246833, 0.24857637790493667, 0.372864566857405, 0.2690000182666915, 0.03842857403809878, 0.15371429615239512, 0.38428574038098784, 0.03842857403809878, 0.07685714807619756, 0.03842857403809878, 0.14432573177000943, 0.19243430902667924, 0.04810857725666981, 0.33676004079668864, 0.14432573177000943, 0.04810857725666981, 0.04810857725666981, 0.11611517801922262, 0.4644607120768905, 0.11611517801922262, 0.11611517801922262, 0.11611517801922262, 0.09642576099668496, 0.6749803269767947, 0.09642576099668496, 0.13083866037546066, 0.392515981126382, 0.13083866037546066, 0.2616773207509213, 0.08645119407457019, 0.17290238814914038, 0.08645119407457019, 0.08645119407457019, 0.43225597037285096, 0.14752875186951958, 0.14752875186951958, 0.44258625560855874, 0.29505750373903916, 0.02797388096364424, 0.02797388096364424, 0.22379104770915392, 0.2797388096364424, 0.08392164289093272, 0.05594776192728848, 0.3077126906000866, 0.48502919234251, 0.048502919234251, 0.097005838468502, 0.097005838468502, 0.048502919234251, 0.048502919234251, 0.048502919234251, 0.048502919234251, 
0.06543914475860318, 0.19631743427580958, 0.06543914475860318, 0.06543914475860318, 0.19631743427580958, 0.39263486855161917, 0.09302221437524255, 0.24805923833398014, 0.031007404791747517, 0.3410814527092227, 0.1860444287504851, 0.09302221437524255, 0.031007404791747517, 0.48284645959360323, 0.24142322979680161, 0.0804744099322672, 0.0804744099322672, 0.0804744099322672, 0.1832651871982186, 0.04581629679955465, 0.04581629679955465, 0.6414281551937651, 0.04581629679955465, 0.1361481774545544, 0.1361481774545544, 0.0680740887272772, 0.2949877178182012, 0.2496049920000164, 0.113456814545462, 0.28004074395297296, 0.3500509299412162, 0.28004074395297296, 0.0949126669624161, 0.0949126669624161, 0.023728166740604024, 0.023728166740604024, 0.023728166740604024, 0.07118450022181208, 0.07118450022181208, 0.07118450022181208, 0.49829150155268453, 0.04745633348120805, 0.05976852307880382, 0.17930556923641147, 0.4781481846304306, 0.05976852307880382, 0.2390740923152153, 0.829250030031588, 0.0829250030031588, 0.19735851325428921, 0.2631446843390523, 0.06578617108476308, 0.06578617108476308, 0.39471702650857843, 0.12329676122977792, 0.24659352245955585, 0.3698902836893338, 0.12329676122977792, 0.12329676122977792, 0.021319736253450426, 0.3411157800552068, 0.06395920876035127, 0.12791841752070254, 0.14923815377415298, 0.10659868126725212, 0.1705578900276034, 0.3099462318505299, 0.11270772067291997, 0.2817693016822999, 0.16906158100937996, 0.028176930168229992, 0.08453079050468998, 0.028176930168229992, 0.13410486533863045, 0.13410486533863045, 0.5364194613545218, 0.13410486533863045, 0.050232147416768215, 0.10046429483353643, 0.08372024569461368, 0.11720834397245916, 0.03348809827784548, 0.11720834397245916, 0.06697619655569095, 0.18418454052815011, 0.20092858966707286, 0.050232147416768215, 0.06940970277946097, 0.17352425694865242, 0.38175336528703535, 0.1561718312537872, 0.10411455416919146, 0.017352425694865243, 0.06940970277946097, 0.017352425694865243, 0.09670350474905498, 
0.2578760126641466, 0.32234501583018327, 0.1289380063320733, 0.03223450158301833, 0.03223450158301833, 0.09670350474905498, 0.03223450158301833, 0.23272698970007563, 0.3490904845501135, 0.11636349485003782, 0.11636349485003782, 0.11636349485003782, 0.1721800014312923, 0.08609000071564615, 0.43045000357823077, 0.08609000071564615, 0.08609000071564615, 0.08609000071564615, 0.4451733482922455, 0.04451733482922454, 0.04451733482922454, 0.26710400897534725, 0.17806933931689817, 0.247565120817496, 0.08252170693916533, 0.13753617823194222, 0.247565120817496, 0.30257959211027285, 0.23328314036334125, 0.4665662807266825, 0.11664157018167062, 0.11664157018167062, 0.15587609344980202, 0.08907205339988687, 0.08907205339988687, 0.044536026699943436, 0.06680404004991515, 0.06680404004991515, 0.31175218689960404, 0.022268013349971718, 0.022268013349971718, 0.15587609344980202, 0.053825159176410585, 0.053825159176410585, 0.16147547752923175, 0.16147547752923175, 0.5382515917641059, 0.053825159176410585, 0.05874893157658649, 0.05874893157658649, 0.5874893157658649, 0.05874893157658649, 0.05874893157658649, 0.17624679472975946, 0.44531059542964235, 0.14843686514321414, 0.14843686514321414, 0.14843686514321414, 0.14843686514321414, 0.09677642776382832, 0.04838821388191416, 0.33871749717339916, 0.04838821388191416, 0.1451646416457425, 0.33871749717339916, 0.06361385999338473, 0.4452970199536931, 0.0954207899900771, 0.1590346499834618, 0.1908415799801542, 0.06361385999338473, 0.17418223973421718, 0.047504247200241045, 0.11084324346722911, 0.15834749066747014, 0.19001698880096418, 0.015834749066747016, 0.047504247200241045, 0.2058517378677112, 0.047504247200241045, 0.16325783406975586, 0.03265156681395117, 0.03265156681395117, 0.03265156681395117, 0.13060626725580468, 0.3265156681395117, 0.2938641013255605, 0.12901385951947691, 0.7740831571168615, 0.10305218186670465, 0.24045509102231086, 0.30915654560011396, 0.10305218186670465, 0.10305218186670465, 0.17175363644450775, 
0.05485464101735629, 0.13713660254339072, 0.027427320508678146, 0.21941856406942517, 0.05485464101735629, 0.3017005255954596, 0.13713660254339072, 0.05485464101735629, 0.14747011849483016, 0.07373505924741508, 0.5898804739793206, 0.07373505924741508, 0.07373505924741508, 0.09798056113498295, 0.32660187044994315, 0.09798056113498295, 0.09798056113498295, 0.1959611222699659, 0.13064074817997726, 0.1186450658129236, 0.1483063322661545, 0.16313696549276996, 0.0593225329064618, 0.1483063322661545, 0.07415316613307725, 0.13347569903953904, 0.10381443258630815, 0.04449189967984635, 0.01483063322661545, 0.1874889228924024, 0.3749778457848048, 0.3749778457848048, 0.15654146899302893, 0.3652634276504009, 0.20872195865737192, 0.05218048966434298, 0.15654146899302893, 0.026655131366686853, 0.07552287220561275, 0.07996539410006057, 0.5153325397559458, 0.004442521894447809, 0.004442521894447809, 0.08885043788895618, 0.08440791599450836, 0.017770087577791235, 0.10662052546674741, 0.2588231185074358, 0.1294115592537179, 0.5176462370148716, 0.1617631267653584, 0.48528938029607516, 0.1617631267653584, 0.16308985688254368, 0.32617971376508736, 0.4892695706476311, 0.09581968599471392, 0.09581968599471392, 0.19163937198942785, 0.09581968599471392, 0.09581968599471392, 0.3832787439788557, 0.280428604620759, 0.04673810077012651, 0.04673810077012651, 0.1402143023103795, 0.37390480616101207, 0.09347620154025302, 0.17138165783330606, 0.028563609638884345, 0.05712721927776869, 0.042845414458326514, 0.0999726337360952, 0.3998905349443808, 0.014281804819442172, 0.08569082891665303, 0.07140902409721087, 0.028563609638884345, 0.22254721794072305, 0.07418240598024102, 0.4450944358814461, 0.07418240598024102, 0.07418240598024102, 0.14836481196048204, 0.051384316959035316, 0.15415295087710595, 0.5652274865493885, 0.051384316959035316, 0.051384316959035316, 0.051384316959035316, 0.3371816059547894, 0.04816880085068421, 0.0722532012760263, 0.04816880085068421, 0.12042200212671052, 
0.04816880085068421, 0.12042200212671052, 0.1685908029773947, 0.0818604832845622, 0.1637209665691244, 0.24558144985368663, 0.40930241642281107, 0.0818604832845622, 0.3534398477820219, 0.3534398477820219, 0.11781328259400732, 0.08626619671394725, 0.08626619671394725, 0.6038633769976307, 0.08626619671394725, 0.23337817480027945, 0.11668908740013972, 0.5834454370006986, 0.07877861247697285, 0.1575572249539457, 0.23633583743091854, 0.3151144499078914, 0.07877861247697285, 0.07877861247697285, 0.02148076040091664, 0.12888456240549986, 0.08592304160366657, 0.17184608320733313, 0.02148076040091664, 0.06444228120274993, 0.12888456240549986, 0.06444228120274993, 0.19332684360824978, 0.12888456240549986, 0.09514150077821201, 0.19028300155642403, 0.19028300155642403, 0.38056600311284805, 0.09514150077821201, 0.1125555704851533, 0.3376667114554599, 0.1125555704851533, 0.1125555704851533, 0.1125555704851533, 0.2251111409703066, 0.16638522534567993, 0.33277045069135985, 0.16638522534567993, 0.08269110454844202, 0.2590177245414434, 0.18970312219936697, 0.08877133282406276, 0.09728365240993178, 0.054722054480586625, 0.043777643584469304, 0.025536958757607094, 0.10214783503042837, 0.05715414579083492, 0.026334718294153885, 0.1580083097649233, 0.13167359147076943, 0.039502077441230826, 0.14484095061784635, 0.05266943658830777, 0.1580083097649233, 0.013167359147076943, 0.27651454208861576, 0.4830705980135308, 0.3623029485101481, 0.12032211370502471, 0.36096634111507414, 0.12032211370502471, 0.24064422741004943, 0.12032211370502471, 0.124002016995701, 0.496008067982804, 0.248004033991402, 0.124002016995701, 0.16370154146083568, 0.2728359024347261, 0.4365374438955618, 0.07117777763202601, 0.28471111052810405, 0.5694222210562081, 0.07117777763202601, 0.3791745141162164, 0.0947936285290541, 0.13271107994067574, 0.05687617711743246, 0.05687617711743246, 0.2654221598813515, 0.012953016582730178, 0.03885904974819053, 0.1683892155754923, 0.10362413266184142, 0.10362413266184142, 
0.14248318241003194, 0.07771809949638106, 0.10362413266184142, 0.06476508291365089, 0.18134223215822248, 0.1562563157294745, 0.05208543857649149, 0.41668350861193193, 0.20834175430596596, 0.05208543857649149, 0.05208543857649149, 0.10417087715298298, 0.23184810494528535, 0.1656057892466324, 0.19872694709595887, 0.09936347354797943, 0.3312115784932648, 0.112128673040524, 0.084096504780393, 0.112128673040524, 0.0420482523901965, 0.0420482523901965, 0.084096504780393, 0.4625307762921615, 0.0700804206503275, 0.0140160841300655, 0.04152702013756294, 0.20763510068781468, 0.08305404027512588, 0.16610808055025175, 0.2906891409629406, 0.04152702013756294, 0.08305404027512588, 0.08305404027512588, 0.047684610048451495, 0.07152691507267725, 0.19073844019380598, 0.047684610048451495, 0.286107660290709, 0.023842305024225748, 0.07152691507267725, 0.1430538301453545, 0.07152691507267725, 0.023842305024225748, 0.27216492113066315, 0.3628865615075509, 0.18144328075377544, 0.18144328075377544, 0.07368103618755213, 0.09101775058462323, 0.0476759645919455, 0.16903296537144313, 0.19503803696704977, 0.0650126789890166, 0.13869371517656873, 0.095351929183891, 0.08234939338608768, 0.043341785992677725, 0.07512418248308579, 0.2253725474492574, 0.07512418248308579, 0.5258692773816005, 0.07757543054567852, 0.07757543054567852, 0.2133324340006159, 0.32969557981913367, 0.27151400690987476, 0.3685743785413317, 0.0982865009443551, 0.024571625236088777, 0.07371487570826633, 0.41771762901350923, 0.2563732668766797, 0.1709155112511198, 0.04272887781277995, 0.12818663343833986, 0.3418310225022396, 0.22682114959779173, 0.045364229919558345, 0.5443707590347002, 0.13609268975867506, 0.1028176894016775, 0.1028176894016775, 0.0342725631338925, 0.3084530682050325, 0.068545126267785, 0.3084530682050325, 0.26271729358689944, 0.04378621559781658, 0.39407594038034915, 0.04378621559781658, 0.04378621559781658, 0.04378621559781658, 0.04378621559781658, 0.13135864679344972, 0.37468269289466266, 
0.05352609898495181, 0.32115659390971085, 0.16057829695485543, 0.05352609898495181, 0.014955246980844935, 0.05982098792337974, 0.3290154335785886, 0.1794629637701392, 0.02991049396168987, 0.0897314818850696, 0.02991049396168987, 0.10468672886591454, 0.13459722282760442, 0.014955246980844935, 0.36839959155215735, 0.49119945540287646, 0.08336408717105344, 0.15005535690789618, 0.06669126973684275, 0.10837331332236946, 0.04168204358552672, 0.23341944407894963, 0.1333825394736855, 0.11670972203947481, 0.06669126973684275, 0.008336408717105344, 0.22464634391121707, 0.22464634391121707, 0.44929268782243414, 0.6854024214875435, 0.11423373691459059, 0.11423373691459059, 0.4247982228777799, 0.4247982228777799, 0.10619955571944498, 0.13033391468056085, 0.13033391468056085, 0.13033391468056085, 0.5213356587222434, 0.178801712912797, 0.357603425825594, 0.178801712912797, 0.178801712912797, 0.11965761131016722, 0.1396005465285284, 0.05982880565508361, 0.15954348174688962, 0.019942935218361203, 0.23931522262033444, 0.039885870436722405, 0.07977174087344481, 0.1396005465285284, 0.5849057804028229, 0.0835579686289747, 0.25067390588692406, 0.11117715061335237, 0.015882450087621767, 0.031764900175243534, 0.015882450087621767, 0.22235430122670474, 0.3176490017524354, 0.19058940105146122, 0.015882450087621767, 0.06352980035048707, 0.015882450087621767, 0.13805305622857894, 0.13805305622857894, 0.13805305622857894, 0.4141591686857368, 0.24320176394808102, 0.4053362732468017, 0.24320176394808102, 0.08106725464936033, 0.08527053272005183, 0.4434067701442695, 0.13643285235208294, 0.0511623196320311, 0.0511623196320311, 0.0511623196320311, 0.18759517198411402, 0.13247608937410096, 0.13247608937410096, 0.13247608937410096, 0.39742826812230286, 0.0483919270023261, 0.1451757810069783, 0.1451757810069783, 0.1451757810069783, 0.0483919270023261, 0.2903515620139566, 0.1451757810069783, 0.13632798827416123, 0.06816399413708062, 0.06816399413708062, 0.06816399413708062, 0.20449198241124183, 
0.06816399413708062, 0.40898396482248367, 0.049912388871522925, 0.09982477774304585, 0.4991238887152292, 0.049912388871522925, 0.2495619443576146, 0.049912388871522925, 0.3335041224439008, 0.0416880153054876, 0.0312660114791157, 0.0208440076527438, 0.104220038263719, 0.1771740650483223, 0.0312660114791157, 0.0208440076527438, 0.0937980344373471, 0.1354860497428347, 0.1673974446742855, 0.1673974446742855, 0.08369872233714275, 0.5021923340228565, 0.08369872233714275, 0.02171224828578707, 0.23883473114365777, 0.3908204691441673, 0.02171224828578707, 0.06513674485736122, 0.02171224828578707, 0.10856124142893536, 0.13027348971472244, 0.1210158566495684, 0.1210158566495684, 0.36304756994870524, 0.36304756994870524, 0.07876083483265088, 0.09188764063809268, 0.21265425404815735, 0.06563402902720906, 0.2441585879812177, 0.2179049763703341, 0.03938041741632544, 0.01050144464435345, 0.01050144464435345, 0.026253611610883625, 0.2626109076374274, 0.04774743775225952, 0.07162115662838928, 0.09549487550451904, 0.14324231325677855, 0.1909897510090381, 0.04774743775225952, 0.02387371887612976, 0.09549487550451904, 0.27643783268545996, 0.3109925617711425, 0.06910945817136499, 0.20732837451409497, 0.034554729085682495, 0.10366418725704749, 0.13883301120840022, 0.5553320448336009, 0.06941650560420011, 0.06941650560420011, 0.13883301120840022, 0.26125879082409276, 0.15675527449445567, 0.31351054898891134, 0.05225175816481855, 0.15675527449445567, 0.12685559794274262, 0.2114259965712377, 0.38056679382822783, 0.12685559794274262, 0.12685559794274262, 0.07199050317058847, 0.1079857547558827, 0.07199050317058847, 0.1079857547558827, 0.4319430190235308, 0.1079857547558827, 0.1079857547558827, 0.1109493276916255, 0.05547466384581275, 0.33284798307487645, 0.05547466384581275, 0.1109493276916255, 0.2773733192290637, 0.06987342041768435, 0.401772167401685, 0.052405065313263256, 0.1222784857309476, 0.1397468408353687, 0.08734177552210542, 0.017468355104421088, 0.052405065313263256, 
0.06987342041768435, 0.04045212908728813, 0.04045212908728813, 0.04045212908728813, 0.24271277452372877, 0.12135638726186439, 0.04045212908728813, 0.48542554904745755, 0.09575918235870631, 0.2154581603070892, 0.07181938676902973, 0.11969897794838288, 0.023939795589676578, 0.23939795589676577, 0.14363877353805946, 0.07181938676902973, 0.13342755446781973, 0.177903405957093, 0.0889517029785465, 0.31133096042491276, 0.13342755446781973, 0.13342755446781973, 0.03935404741618637, 0.531279640118516, 0.059031071124279556, 0.019677023708093187, 0.07870809483237275, 0.059031071124279556, 0.059031071124279556, 0.1377391659566523, 0.3013579638124816, 0.5273764366718429, 0.0753394909531204, 0.08900404766322254, 0.029668015887740845, 0.11867206355096338, 0.029668015887740845, 0.08900404766322254, 0.32634817476514927, 0.29668015887740845, 0.4418456121046194, 0.18936240518769404, 0.1262416034584627, 0.18936240518769404, 0.1494258348541165, 0.07471291742705825, 0.07471291742705825, 0.3735645871352913, 0.07471291742705825, 0.07471291742705825, 0.07471291742705825, 0.07471291742705825, 0.35352971504107683, 0.047137295338810246, 0.07070594300821537, 0.047137295338810246, 0.16498053368583585, 0.023568647669405123, 0.11784323834702562, 0.023568647669405123, 0.14141188601643073, 0.21950229050477815, 0.3658371508412969, 0.21950229050477815, 0.07316743016825938, 0.07316743016825938, 0.24577176317168, 0.12288588158584, 0.12288588158584, 0.49154352634336, 0.5105055803550023, 0.06381319754437528, 0.3190659877218764, 0.06381319754437528, 0.09739334212092664, 0.09739334212092664, 0.09739334212092664, 0.4869667106046332, 0.19478668424185328, 0.29869852768247124, 0.14934926384123562, 0.14934926384123562, 0.14934926384123562, 0.2531722766184612, 0.16878151774564082, 0.5063445532369224, 0.11735744565569486, 0.46942978262277946, 0.029339361413923716, 0.08801808424177116, 0.08801808424177116, 0.20537552989746602, 0.029339361413923716, 0.221991046690943, 0.07399701556364767, 0.07399701556364767, 
0.443982093381886, 0.07399701556364767, 0.5510240621873059, 0.2361531695088454, 0.15743544633923026, 0.33834449795378807, 0.4060133975445457, 0.06766889959075761, 0.06766889959075761, 0.06766889959075761, 0.06766889959075761, 0.13360405024466698, 0.20040607536700047, 0.46761417585633447, 0.06680202512233349, 0.06680202512233349, 0.043091127566230694, 0.1292733826986921, 0.43091127566230697, 0.3016378929636149, 0.043091127566230694, 0.043091127566230694, 0.7340711766200619, 0.10486731094572313, 0.10486731094572313, 0.10486731094572313, 0.08890745126408175, 0.5334447075844905, 0.08890745126408175, 0.355629805056327, 0.2993197107891702, 0.2993197107891702, 0.2993197107891702, 0.030519810798026556, 0.030519810798026556, 0.274678297182239, 0.18311886478815934, 0.09155943239407967, 0.09155943239407967, 0.30519810798026553, 0.08888861428494887, 0.017777722856989777, 0.24888811999785687, 0.035555445713979554, 0.3555544571397955, 0.10666633714193866, 0.017777722856989777, 0.10666633714193866, 0.10664422516296168, 0.2666105629074042, 0.05332211258148084, 0.05332211258148084, 0.10664422516296168, 0.05332211258148084, 0.31993267548888504, 0.17475808694341807, 0.17475808694341807, 0.08737904347170904, 0.04368952173585452, 0.13106856520756355, 0.34951617388683615, 0.16851677885552732, 0.4333288599142131, 0.02407382555078962, 0.1444429533047377, 0.09629530220315848, 0.09629530220315848, 0.10759181136877095, 0.5020951197209311, 0.03586393712292365, 0.10759181136877095, 0.0717278742458473, 0.03586393712292365, 0.10759181136877095, 0.2720998609980863, 0.10203744787428237, 0.10203744787428237, 0.4421622741218903, 0.03401248262476079, 0.03401248262476079, 0.11088850064952574, 0.22177700129905148, 0.11088850064952574, 0.33266550194857725, 0.04464380015941381, 0.4910818017535519, 0.022321900079706906, 0.022321900079706906, 0.20089710071736216, 0.08928760031882763, 0.11160950039853453, 0.04464380015941381, 0.15301589086732997, 0.051005296955776665, 0.051005296955776665, 
0.3570370786904366, 0.3570370786904366, 0.5050590032730201, 0.3156618770456376, 0.06313237540912751, 0.09469856311369126, 0.29546672946692953, 0.4432000942003943, 0.06832494300995648, 0.11387490501659414, 0.045549962006637655, 0.2505247910365071, 0.06832494300995648, 0.1594248670232318, 0.18219984802655062, 0.09109992401327531, 0.17630058167175727, 0.08815029083587864, 0.7052023266870291, 0.4678834526161059, 0.08997758704155884, 0.1079731044498706, 0.08997758704155884, 0.01799551740831177, 0.08997758704155884, 0.0539865522249353, 0.01799551740831177, 0.03599103481662354, 0.10234022972121282, 0.5117011486060641, 0.10234022972121282, 0.034113409907070937, 0.20468045944242563, 0.0909621137991472, 0.5230321543450964, 0.0682215853493604, 0.0682215853493604, 0.0909621137991472, 0.0682215853493604, 0.0227405284497868, 0.0682215853493604, 0.03282741238668267, 0.21337818051343738, 0.1313096495467307, 0.06565482477336534, 0.09848223716004802, 0.08206853096670669, 0.016413706193341336, 0.06565482477336534, 0.3118604176734854, 0.06459105147430035, 0.17762539155432597, 0.08073881434287544, 0.11303434008002561, 0.19377315442290105, 0.04844328860572526, 0.06459105147430035, 0.016147762868575086, 0.14532986581717577, 0.11303434008002561, 0.12298142747809393, 0.040993809159364646, 0.040993809159364646, 0.2049690457968232, 0.12298142747809393, 0.4099380915936464, 0.2835017665946126, 0.07087544164865316, 0.07087544164865316, 0.07087544164865316, 0.42525264989191897, 0.0865850758170548, 0.0865850758170548, 0.46178707102429223, 0.1731701516341096, 0.05772338387803653, 0.14430845969509132, 0.08426566726159693, 0.28088555753865646, 0.16853133452319385, 0.028088555753865645, 0.14044277876932823, 0.2527970017847908, 0.028088555753865645, 0.5141082042103204, 0.08568470070172007, 0.17136940140344015, 0.08568470070172007, 0.12474817551652205, 0.05197840646521752, 0.12474817551652205, 0.031187043879130513, 0.17672658198173957, 0.17672658198173957, 0.04158272517217401, 0.10395681293043504, 
0.05197840646521752, 0.12474817551652205, 0.2429709831441279, 0.3471014044916113, 0.03471014044916113, 0.06942028089832226, 0.27768112359328906, 0.10897901898397719, 0.544895094919886, 0.10897901898397719, 0.10897901898397719, 0.15222111203335256, 0.20296148271113673, 0.3551825947444893, 0.05074037067778418, 0.02537018533889209, 0.20296148271113673, 0.049683994553292005, 0.19873597821316802, 0.049683994553292005, 0.5465239400862121, 0.14905198365987601, 0.05751230472840729, 0.28756152364203647, 0.5751230472840729, 0.463863053003543, 0.30924203533569533, 0.07731050883392383, 0.07731050883392383, 0.15012097425276544, 0.22518146137914813, 0.45036292275829626, 0.07506048712638272, 0.07506048712638272, 0.06791973874633193, 0.06791973874633193, 0.06791973874633193, 0.20375921623899582, 0.13583947749266387, 0.40751843247799163, 0.1185246182928339, 0.0790164121952226, 0.1975410304880565, 0.1185246182928339, 0.3555738548785017, 0.1185246182928339, 0.12231655823583507, 0.48926623294334026, 0.12231655823583507, 0.12231655823583507, 0.12231655823583507, 0.7071526665637081, 0.1414305333127416, 0.3900534712252006, 0.23403208273512038, 0.07801069424504013, 0.31204277698016053, 0.040015019669960475, 0.040015019669960475, 0.250093872937253, 0.0500187745874506, 0.08003003933992095, 0.11004130409239131, 0.2100788532672925, 0.17006383359733204, 0.040015019669960475, 0.020007509834980237, 0.26368551337441015, 0.06592137834360254, 0.04394758556240169, 0.08789517112480338, 0.2417117205932093, 0.021973792781200846, 0.13184275668720508, 0.04394758556240169, 0.08789517112480338, 0.40102772156199623, 0.06683795359366604, 0.26735181437466415, 0.20051386078099812, 0.06683795359366604, 0.08046569801621699, 0.03218627920648679, 0.19311767523892076, 0.17702453563567735, 0.04827941880973019, 0.016093139603243396, 0.016093139603243396, 0.09655883761946038, 0.04827941880973019, 0.27358337325513776, 0.08586332825023847, 0.42931664125119234, 0.08586332825023847, 0.25758998475071543, 
0.17172665650047694, 0.06225040470305796, 0.43575283292140576, 0.1556260117576449, 0.10893820823035144, 0.06225040470305796, 0.04668780352729347, 0.04668780352729347, 0.07781300587882245, 0.1230410807204282, 0.5536848632419269, 0.0615205403602141, 0.0615205403602141, 0.0615205403602141, 0.0615205403602141, 0.0615205403602141, 0.11269025965517643, 0.11269025965517643, 0.11269025965517643, 0.11269025965517643, 0.3380707789655293, 0.11269025965517643, 0.11216437892615846, 0.3364931367784754, 0.3364931367784754, 0.22432875785231693, 0.11754178133712936, 0.05877089066856468, 0.7640215786913409, 0.2728236872000212, 0.5456473744000424, 0.0682059218000053, 0.0682059218000053, 0.18101684582361688, 0.04525421145590422, 0.7240673832944675, 0.6115209333264615, 0.13104019999852748, 0.21840033333087913, 0.011540520393533139, 0.45008029534779237, 0.13848624472239765, 0.10386468354179824, 0.13848624472239765, 0.05770260196766569, 0.09232416314826511, 0.09408628467414626, 0.8467765620673164, 0.4491267318614958, 0.1497089106204986, 0.0748544553102493, 0.2245633659307479, 0.0748544553102493, 0.3660029580729824, 0.43920354968757885, 0.07320059161459648, 0.07320059161459648, 0.07320059161459648, 0.1849866913687359, 0.09249334568436796, 0.09249334568436796, 0.09249334568436796, 0.3699733827374718, 0.1849866913687359, 0.11362271518358193, 0.22724543036716385, 0.5681135759179097, 0.2759058618466821, 0.35473610808859124, 0.11824536936286374, 0.11824536936286374, 0.11824536936286374, 0.5283228626566246, 0.10566457253132493, 0.10566457253132493, 0.10566457253132493, 0.10566457253132493, 0.15626342401352333, 0.05208780800450778, 0.10417561600901556, 0.31252684802704667, 0.31252684802704667, 0.07194271977234153, 0.03597135988617076, 0.39568495874787835, 0.10791407965851228, 0.10791407965851228, 0.10791407965851228, 0.07194271977234153, 0.10791407965851228, 0.025053608962182606, 0.10021443584873042, 0.10021443584873042, 0.4259113523571043, 0.10021443584873042, 0.10021443584873042, 
0.10021443584873042, 0.05906618043535553, 0.05906618043535553, 0.47252944348284426, 0.3543970826121332, 0.19825536665941992, 0.03965107333188399, 0.35685965998695585, 0.03965107333188399, 0.07930214666376798, 0.11895321999565196, 0.11895321999565196, 0.07930214666376798, 0.09727222116943306, 0.3890888846777322, 0.09727222116943306, 0.29181666350829916, 0.09727222116943306, 0.4527361496840912, 0.0740049475445149, 0.013059696625502631, 0.052238786502010526, 0.04788555429350965, 0.013059696625502631, 0.11753726962952368, 0.039179089876507894, 0.06965171533601404, 0.12189050183802456, 0.2706006933245202, 0.03183537568523767, 0.11142381489833185, 0.11142381489833185, 0.015917687842618836, 0.09550612705571301, 0.19101225411142603, 0.07958843921309418, 0.07958843921309418, 0.2763453004346801, 0.055269060086936016, 0.055269060086936016, 0.055269060086936016, 0.4974215407824242, 0.055269060086936016, 0.12442839941775571, 0.12442839941775571, 0.09332129956331678, 0.03110709985443893, 0.015553549927219464, 0.07776774963609732, 0.1944193740902433, 0.13998194934497518, 0.10109807452692651, 0.10109807452692651, 0.03448033432285754, 0.03448033432285754, 0.03448033432285754, 0.03448033432285754, 0.5861656834885782, 0.1724016716142877, 0.03448033432285754, 0.03448033432285754, 0.03448033432285754, 0.15765457868193647, 0.21020610490924863, 0.15765457868193647, 0.42041220981849725, 0.052551526227312156, 0.11635315137590081, 0.10342502344524516, 0.07756876758393387, 0.11635315137590081, 0.29734694240507986, 0.038784383791966935, 0.1422094072372121, 0.09049689551458952, 0.012928127930655646, 0.012928127930655646, 0.40010461654842727, 0.08002092330968545, 0.08002092330968545, 0.3200836932387418, 0.1450722190595032, 0.10362301361393086, 0.041449205445572346, 0.09326071225253778, 0.010362301361393086, 0.09326071225253778, 0.010362301361393086, 0.41449205445572346, 0.020724602722786173, 0.0725361095297516, 0.09069066501708517, 0.09069066501708517, 0.2720719950512555, 0.45345332508542585, 
0.25874824564276155, 0.06468706141069039, 0.19406118423207117, 0.38812236846414233, 0.06468706141069039, 0.2004466254235999, 0.2672621672314665, 0.46770879265506643, 0.4367015464848811, 0.10917538662122027, 0.1637630799318304, 0.054587693310610134, 0.1637630799318304, 0.22042892522431812, 0.6612867756729544, 0.39324023357085325, 0.3370630573464456, 0.018725725408135867, 0.018725725408135867, 0.018725725408135867, 0.14980580326508694, 0.037451450816271735, 0.018725725408135867, 0.5617071896525073, 0.08024388423607247, 0.24073165270821742, 0.302336890049899, 0.604673780099798, 0.38498799658574423, 0.029614461275826478, 0.1480723063791324, 0.17768676765495886, 0.20730122893078534, 0.029614461275826478, 0.029614461275826478, 0.029614461275826478, 0.11961886678298242, 0.3588566003489473, 0.11961886678298242, 0.3588566003489473, 0.5285269966473267, 0.058725221849702976, 0.17617566554910893, 0.11745044369940595, 0.058725221849702976, 0.08370865041766855, 0.5859605529236798, 0.08370865041766855, 0.1674173008353371, 0.1521439521043639, 0.45643185631309174, 0.1521439521043639, 0.6250385875105818, 0.17858245357445193, 0.08929122678722597, 0.08929122678722597, 0.5027987148467563, 0.3016792289080538, 0.10055974296935126, 0.10055974296935126, 0.11106242291288772, 0.01586606041612682, 0.0951963624967609, 0.01586606041612682, 0.04759818124838045, 0.31732120832253635, 0.01586606041612682, 0.23799090624190225, 0.14279454374514136, 0.07034197244330806, 0.3165388759948862, 0.03517098622165403, 0.10551295866496208, 0.14068394488661612, 0.03517098622165403, 0.03517098622165403, 0.2461969035515782, 0.3118385541265545, 0.05197309235442575, 0.09355156623796636, 0.0207892369417703, 0.01039461847088515, 0.0831569477670812, 0.2702600802430139, 0.13513004012150695, 0.01039461847088515, 0.01039461847088515, 0.05712592548874749, 0.2642074053854571, 0.04998518480265405, 0.09282962891921466, 0.17851851715233588, 0.007140740686093436, 0.06426666617484092, 0.03570370343046718, 0.10711111029140154, 
0.14281481372186872, 0.11519671347182897, 0.11519671347182897, 0.5759835673591448, 0.12835282033932585, 0.12835282033932585, 0.3850584610179776, 0.12835282033932585, 0.03757760628892769, 0.2630432440224938, 0.09394401572231921, 0.1315216220112469, 0.09394401572231921, 0.05636640943339153, 0.15031042515571075, 0.07515521257785537, 0.07515521257785537, 0.05636640943339153, 0.09208217305505874, 0.03069405768501958, 0.03069405768501958, 0.4604108652752937, 0.15347028842509788, 0.18416434611011748, 0.03069405768501958, 0.23109106906662003, 0.10270714180738667, 0.2824446399703134, 0.07703035635554001, 0.05135357090369334, 0.2567678545184667, 0.02567678545184667, 0.07565138598951188, 0.050434257326341256, 0.06304282165792657, 0.025217128663170628, 0.2773884152948769, 0.2647798509632916, 0.06304282165792657, 0.1765199006421944, 0.03089738653919824, 0.09269215961759472, 0.5561529577055683, 0.09269215961759472, 0.09269215961759472, 0.03089738653919824, 0.09269215961759472, 0.2580209075975219, 0.17201393839834794, 0.057337979466115976, 0.028668989733057988, 0.43003484599586983, 0.10312289415811236, 0.10312289415811236, 0.07734217061858427, 0.05156144707905618, 0.07734217061858427, 0.283587958934809, 0.05156144707905618, 0.12890361769764044, 0.07734217061858427, 0.2525377565954359, 0.08417925219847863, 0.08417925219847863, 0.5050755131908718, 0.08417925219847863, 0.17410108752931544, 0.17410108752931544, 0.17410108752931544, 0.3482021750586309, 0.059143792765437804, 0.29571896382718904, 0.47315034212350243, 0.17743137829631342, 0.20345073525728086, 0.4069014705145617, 0.06781691175242696, 0.06781691175242696, 0.20345073525728086, 0.06781691175242696, 0.3231648703780282, 0.12522638727148594, 0.0848307784742324, 0.1858198004673662, 0.020197804398626763, 0.06059341319588029, 0.06463297407560564, 0.03231648703780282, 0.024237365278352116, 0.0767516567147817, 0.16162582049119967, 0.12121936536839975, 0.40406455122799917, 0.20203227561399958, 0.04040645512279992, 
0.15622922901296735, 0.15622922901296735, 0.022318461287566763, 0.04463692257513353, 0.0669553838627003, 0.08927384515026705, 0.022318461287566763, 0.022318461287566763, 0.20086615158810087, 0.22318461287566763, 0.14064445977707618, 0.046881486592358725, 0.1875259463694349, 0.14064445977707618, 0.42193337933122854, 0.0807689892873527, 0.0807689892873527, 0.24230696786205808, 0.0807689892873527, 0.4038449464367635, 0.1615379785747054, 0.0854603879599811, 0.017092077591996217, 0.13673662073596973, 0.06836831036798487, 0.15382869832796595, 0.15382869832796595, 0.2905653190639357, 0.11964454314397352, 0.017125236054851875, 0.2055028326582225, 0.03425047210970375, 0.1883775966033706, 0.11987665238396313, 0.3425047210970375, 0.017125236054851875, 0.051375708164555625, 0.017125236054851875, 0.4575091278565973, 0.15250304261886577, 0.07625152130943288, 0.07625152130943288, 0.12708586884905482, 0.12708586884905482, 0.07041139988333243, 0.07041139988333243, 0.07041139988333243, 0.6337025989499918, 0.14082279976666487, 0.3816050368741667, 0.2289630221245, 0.07632100737483334, 0.15264201474966668, 0.15264201474966668, 0.12756405667075502, 0.07653843400245301, 0.35717935867811407, 0.12756405667075502, 0.07653843400245301, 0.07653843400245301, 0.12756405667075502, 0.025512811334151005, 0.2836403488196871, 0.14182017440984354, 0.3403684185836245, 0.22691227905574965, 0.03177319410048047, 0.09531958230144141, 0.15886597050240236, 0.3495051351052852, 0.06354638820096094, 0.09531958230144141, 0.19063916460288283, 0.44430758598694464, 0.2665845515921668, 0.08886151719738894, 0.05446623139749812, 0.05446623139749812, 0.1361655784937453, 0.02723311569874906, 0.08169934709624718, 0.16339869419249436, 0.08169934709624718, 0.10893246279499624, 0.2995642726862397, 0.06832320355642979, 0.20496961066928937, 0.06832320355642979, 0.06832320355642979, 0.47826242489500853, 0.06832320355642979, 0.11303202893447618, 0.11303202893447618, 0.07064501808404762, 0.22606405786895237, 
0.11303202893447618, 0.3108380795698095, 0.04238701085042857, 0.1396831909736487, 0.06984159548682435, 0.06984159548682435, 0.41904957292094613, 0.20952478646047307, 0.11722497044120148, 0.8205747930884103, 0.5951103590100919, 0.05951103590100919, 0.05951103590100919, 0.05951103590100919, 0.17853310770302758, 0.3981761959581756, 0.03981761959581756, 0.03981761959581756, 0.03981761959581756, 0.3185409567665405, 0.15927047838327024, 0.15872270827084967, 0.31744541654169933, 0.12697816661667974, 0.031744541654169935, 0.09523362496250981, 0.19046724992501962, 0.06348908330833987, 0.058513686423090024, 0.11702737284618005, 0.08777052963463504, 0.4388526481731752, 0.11702737284618005, 0.17554105926927008, 0.5303038681021781, 0.17676795603405937, 0.08838397801702969, 0.08838397801702969, 0.11592075905797353, 0.023184151811594708, 0.44049888442029944, 0.023184151811594708, 0.11592075905797353, 0.06955245543478412, 0.20865736630435236, 0.09372479597856499, 0.062483197319043324, 0.09372479597856499, 0.2499327892761733, 0.062483197319043324, 0.031241598659521662, 0.3436575852547383, 0.031241598659521662, 0.031241598659521662, 0.12697583969341686, 0.4655780788758618, 0.12697583969341686, 0.12697583969341686, 0.08465055979561123, 0.08465055979561123, 0.14988456144820325, 0.5245959650687113, 0.14988456144820325, 0.07494228072410163, 0.3892366249543987, 0.12974554165146623, 0.25949108330293247, 0.12974554165146623, 0.2409726016167794, 0.2409726016167794, 0.4819452032335588, 0.042777972769530014, 0.10694493192382502, 0.08555594553906003, 0.10694493192382502, 0.042777972769530014, 0.21388986384765005, 0.021388986384765007, 0.06416695915429502, 0.12833391830859003, 0.19250087746288505, 0.40225200719500115, 0.10056300179875029, 0.10056300179875029, 0.20112600359750057, 0.20112600359750057, 0.3960933053575463, 0.0565847579082209, 0.0565847579082209, 0.0565847579082209, 0.3960933053575463, 0.4138919241451798, 0.25868245259073736, 0.31041894310888485, 0.04830460452864546, 
0.04830460452864546, 0.4347414407578092, 0.09660920905729092, 0.14491381358593639, 0.14491381358593639, 0.09660920905729092, 0.021198886124996874, 0.14839220287497812, 0.12719331674998124, 0.12719331674998124, 0.06359665837499062, 0.19078997512497187, 0.06359665837499062, 0.27558551962495936, 0.12405611741826338, 0.06202805870913169, 0.37216835225479017, 0.06202805870913169, 0.3101402935456585, 0.06202805870913169, 0.1500356590311931, 0.08335314390621838, 0.2000475453749241, 0.16670628781243677, 0.1333650302499494, 0.0666825151249747, 0.05001188634373103, 0.1500356590311931, 0.12214047427004265, 0.12214047427004265, 0.04071349142334755, 0.36642142281012796, 0.12214047427004265, 0.12214047427004265, 0.04071349142334755, 0.0814269828466951, 0.4853422556651913, 0.09706845113303825, 0.09706845113303825, 0.09706845113303825, 0.09706845113303825, 0.09706845113303825, 0.09706845113303825, 0.27181352612947945, 0.04942064111445081, 0.04942064111445081, 0.07413096167167621, 0.22239288501502863, 0.17297224390057783, 0.17297224390057783, 0.281491160528375, 0.37532154737116663, 0.09383038684279166, 0.09383038684279166, 0.05544236874269994, 0.24949065934214976, 0.11088473748539988, 0.11088473748539988, 0.3049330280848497, 0.05544236874269994, 0.05544236874269994, 0.02772118437134997, 0.44527235346595767, 0.08348856627486706, 0.22263617673297884, 0.1948066546413565, 0.027829522091622354, 0.18488108758072364, 0.03697621751614473, 0.36976217516144727, 0.07395243503228946, 0.03697621751614473, 0.03697621751614473, 0.18488108758072364, 0.03697621751614473, 0.131419809428972, 0.131419809428972, 0.525679237715888, 0.131419809428972, 0.44924886505017525, 0.06911521000771927, 0.03455760500385963, 0.03455760500385963, 0.03455760500385963, 0.27646084003087706, 0.03455760500385963, 0.03455760500385963, 0.19136384129256676, 0.10631324516253708, 0.021262649032507415, 0.021262649032507415, 0.17010119226005932, 0.04252529806501483, 0.3614650335526261, 0.06378794709752225, 0.021262649032507415, 
0.12936852617391204, 0.35576344697825807, 0.06468426308695602, 0.29107918389130205, 0.12936852617391204, 0.083458468346943, 0.194736426142867, 0.027819489448981, 0.361653362836753, 0.083458468346943, 0.111277957795924, 0.027819489448981, 0.027819489448981, 0.055638978897962, 0.17656946625320477, 0.35313893250640954, 0.17656946625320477, 0.35313893250640954, 0.0724458805485765, 0.3984523430171707, 0.25356058192001774, 0.144891761097153, 0.10866882082286475, 0.15412458355988948, 0.07706229177994474, 0.2311868753398342, 0.15412458355988948, 0.3853114588997237, 0.07706229177994474, 0.061479096931758175, 0.061479096931758175, 0.18443729079527452, 0.18443729079527452, 0.4918327754540654, 0.05182836397953509, 0.10365672795907017, 0.10365672795907017, 0.15548509193860527, 0.46645527581581575, 0.10365672795907017, 0.22642053526770772, 0.45284107053541545, 0.22642053526770772, 0.05581548818227679, 0.2790774409113839, 0.05581548818227679, 0.05581548818227679, 0.05581548818227679, 0.3907084172759375, 0.11163097636455357, 0.043829645578029104, 0.021914822789014552, 0.10957411394507276, 0.06574446836704366, 0.021914822789014552, 0.10957411394507276, 0.10957411394507276, 0.10957411394507276, 0.17531858231211642, 0.2191482278901455, 0.07332074359805009, 0.10152102959730012, 0.07896080079790009, 0.07332074359805009, 0.10716108679715013, 0.022560228799400027, 0.056400571998500064, 0.17484177319535021, 0.20304205919460025, 0.11844120119685014, 0.05856047103048378, 0.02928023551524189, 0.08784070654572566, 0.14640117757620943, 0.11712094206096756, 0.35136282618290265, 0.08784070654572566, 0.11712094206096756, 0.056459277755413366, 0.028229638877706683, 0.08468891663312005, 0.42344458316560024, 0.056459277755413366, 0.056459277755413366, 0.22583711102165346, 0.14775395494221769, 0.07387697747110884, 0.1846924436777721, 0.03693848873555442, 0.07387697747110884, 0.05540773310333163, 0.29550790988443537, 0.11081546620666326, 0.2296170501156423, 0.03826950835260705, 0.1530780334104282, 
0.11480852505782115, 0.03826950835260705, 0.0765390167052141, 0.03826950835260705, 0.3061560668208564, 0.09353173207277093, 0.023382933018192734, 0.2572122632001201, 0.16368053112734912, 0.0701487990545782, 0.09353173207277093, 0.18706346414554187, 0.11691466509096367, 0.023382933018192734, 0.18724924926028302, 0.18724924926028302, 0.5617477477808491, 0.2066924313304613, 0.0295274901900659, 0.0885824705701977, 0.0885824705701977, 0.295274901900659, 0.2657474117105931, 0.13609643726631526, 0.05103616397486822, 0.08506027329144704, 0.2041446558994729, 0.10207232794973645, 0.08506027329144704, 0.08506027329144704, 0.2381687652160517, 0.10259587703685046, 0.17099312839475075, 0.5129793851842522, 0.10259587703685046, 0.10259587703685046, 0.16118917822543785, 0.04029729455635946, 0.08059458911271893, 0.06044594183453919, 0.04029729455635946, 0.12089188366907838, 0.06044594183453919, 0.22163512005997704, 0.18133782550361757, 0.06044594183453919, 0.10278901541423986, 0.10278901541423986, 0.13705202055231983, 0.10278901541423986, 0.034263005138079956, 0.41115606165695945, 0.13705202055231983, 0.11490978452059987, 0.4596391380823995, 0.22981956904119974, 0.11490978452059987, 0.11490978452059987, 0.06417711858835871, 0.09626567788253806, 0.09626567788253806, 0.16044279647089676, 0.5775940672952283, 0.35726488822835356, 0.4912392213139861, 0.044658111028544195, 0.044658111028544195, 0.044658111028544195, 0.19298641853717424, 0.04824660463429356, 0.3859728370743485, 0.28947962780576136, 0.04824660463429356, 0.300710571137844, 0.100236857045948, 0.50118428522974, 0.100236857045948, 0.014559408930515615, 0.1019158625136093, 0.014559408930515615, 0.8590051269004213, 0.43197534736448423, 0.08639506947289684, 0.08639506947289684, 0.17279013894579368, 0.17279013894579368], \"Topic\": [1, 2, 3, 4, 6, 7, 9, 10, 6, 7, 8, 10, 2, 4, 5, 7, 8, 10, 2, 4, 5, 7, 8, 10, 2, 5, 7, 8, 2, 7, 8, 9, 2, 4, 6, 7, 8, 10, 1, 2, 4, 7, 8, 1, 2, 7, 10, 1, 2, 4, 5, 6, 7, 1, 2, 3, 6, 8, 1, 2, 3, 4, 6, 7, 8, 
9, 10, 1, 2, 3, 4, 5, 6, 8, 9, 1, 2, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 6, 7, 8, 9, 10, 4, 7, 1, 2, 3, 7, 8, 10, 1, 3, 5, 10, 1, 2, 3, 5, 6, 7, 8, 10, 1, 2, 3, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 10, 1, 2, 3, 4, 5, 6, 7, 9, 1, 2, 3, 4, 5, 6, 9, 10, 1, 2, 3, 4, 5, 6, 8, 9, 10, 1, 2, 3, 6, 9, 10, 2, 3, 5, 6, 8, 9, 10, 1, 2, 4, 5, 7, 9, 10, 1, 3, 6, 7, 8, 9, 10, 7, 8, 1, 4, 5, 6, 9, 10, 1, 2, 6, 8, 9, 2, 3, 4, 7, 4, 5, 7, 9, 10, 1, 2, 5, 6, 8, 9, 1, 2, 3, 4, 5, 6, 7, 9, 10, 1, 6, 8, 9, 2, 3, 7, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 4, 6, 9, 10, 2, 3, 4, 5, 7, 8, 9, 10, 4, 5, 6, 8, 10, 2, 3, 4, 5, 7, 8, 9, 10, 2, 3, 4, 7, 8, 9, 1, 2, 3, 4, 6, 7, 10, 1, 2, 3, 4, 5, 8, 9, 2, 3, 4, 6, 8, 10, 2, 3, 5, 6, 9, 10, 1, 2, 5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 4, 5, 9, 2, 3, 7, 8, 9, 10, 2, 9, 10, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 5, 6, 7, 8, 9, 10, 2, 8, 1, 2, 5, 6, 10, 1, 2, 3, 4, 5, 8, 9, 4, 10, 3, 4, 7, 4, 9, 10, 1, 2, 3, 4, 7, 8, 9, 10, 1, 2, 5, 6, 7, 8, 1, 2, 3, 4, 6, 2, 6, 7, 8, 9, 1, 2, 3, 6, 9, 2, 8, 3, 4, 6, 9, 10, 1, 4, 5, 7, 8, 1, 2, 3, 5, 6, 7, 9, 2, 3, 4, 7, 10, 1, 2, 6, 9, 10, 1, 4, 5, 7, 8, 9, 10, 2, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 8, 3, 7, 8, 3, 6, 7, 9, 2, 5, 6, 9, 10, 2, 4, 6, 8, 3, 4, 6, 7, 8, 9, 10, 3, 4, 5, 6, 7, 8, 9, 10, 1, 3, 5, 7, 9, 10, 1, 2, 4, 5, 6, 8, 10, 1, 4, 5, 7, 8, 1, 3, 6, 7, 9, 1, 3, 4, 6, 9, 10, 4, 6, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 4, 5, 6, 7, 1, 4, 1, 2, 4, 6, 8, 1, 5, 6, 8, 10, 1, 2, 3, 4, 6, 8, 10, 2, 3, 6, 7, 8, 9, 10, 4, 5, 7, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 5, 6, 7, 8, 10, 2, 3, 4, 5, 6, 7, 9, 10, 2, 4, 5, 7, 10, 1, 2, 4, 5, 6, 9, 1, 3, 5, 7, 10, 2, 4, 5, 8, 9, 2, 3, 5, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 6, 7, 8, 10, 2, 3, 4, 5, 8, 9, 2, 3, 7, 8, 10, 1, 2, 3, 6, 8, 9, 1, 2, 3, 4, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 10, 1, 2, 3, 4, 8, 9, 10, 2, 5, 1, 2, 3, 6, 7, 8, 1, 2, 3, 4, 6, 7, 8, 10, 2, 4, 5, 6, 8, 2, 3, 5, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 5, 9, 1, 3, 5, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
10, 2, 7, 9, 1, 8, 10, 2, 3, 8, 2, 4, 5, 6, 7, 9, 1, 3, 5, 6, 7, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 4, 6, 7, 8, 10, 2, 4, 5, 7, 8, 10, 1, 2, 3, 4, 5, 6, 8, 10, 2, 3, 5, 9, 10, 2, 5, 8, 3, 6, 9, 10, 1, 5, 10, 2, 3, 4, 6, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 7, 1, 2, 3, 4, 6, 9, 1, 4, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 10, 3, 4, 2, 3, 7, 8, 9, 2, 3, 8, 9, 1, 3, 4, 2, 4, 6, 8, 1, 4, 6, 7, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 9, 10, 1, 4, 5, 7, 10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 3, 5, 9, 1, 2, 4, 5, 6, 1, 2, 3, 5, 7, 2, 4, 5, 6, 10, 4, 7, 8, 9, 1, 4, 6, 7, 8, 10, 1, 2, 3, 5, 6, 8, 9, 10, 2, 5, 6, 7, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 6, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 4, 2, 4, 8, 1, 3, 10, 4, 6, 8, 9, 2, 4, 9, 10, 1, 2, 4, 5, 6, 7, 8, 9, 10, 1, 2, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 4, 6, 10, 1, 2, 4, 9, 1, 2, 4, 6, 7, 8, 9, 1, 4, 8, 10, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 6, 8, 9, 1, 2, 3, 6, 7, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 4, 7, 9, 1, 2, 4, 5, 6, 7, 8, 9, 2, 3, 4, 6, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 7, 8, 9, 10, 2, 3, 4, 5, 9, 10, 3, 5, 6, 9, 10, 3, 5, 6, 8, 10, 3, 4, 7, 9, 10, 2, 3, 4, 5, 6, 8, 9, 2, 5, 6, 7, 8, 10, 1, 2, 3, 4, 5, 6, 8, 9, 10, 2, 3, 5, 6, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 10, 1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 8, 9, 10, 1, 4, 9, 2, 3, 4, 5, 6, 7, 9, 1, 4, 7, 8, 2, 3, 4, 5, 6, 8, 9, 10, 1, 2, 3, 4, 6, 7, 8, 9, 10, 2, 5, 6, 9, 10, 2, 3, 5, 8, 1, 2, 5, 10, 4, 5, 6, 9, 10, 2, 4, 6, 9, 3, 5, 10, 1, 2, 3, 6, 8, 9, 10, 4, 5, 6, 7, 10, 2, 7, 9, 2, 3, 5, 7, 9, 10, 2, 3, 7, 9, 10, 1, 3, 4, 5, 6, 7, 1, 6, 7, 8, 2, 5, 6, 7, 3, 4, 9, 1, 2, 3, 4, 6, 8, 10, 1, 2, 4, 5, 6, 7, 9, 10, 1, 2, 3, 6, 7, 8, 10, 2, 3, 4, 5, 7, 9, 1, 2, 4, 8, 9, 10, 3, 4, 5, 6, 8, 9, 10, 1, 2, 4, 5, 6, 10, 2, 3, 5, 8, 1, 3, 4, 5, 6, 8, 9, 10, 2, 5, 6, 8, 9, 2, 5, 8, 10, 2, 7, 1, 2, 3, 4, 5, 6, 7, 9, 2, 6, 7, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 2, 3, 4, 5, 8, 1, 2, 3, 4, 5, 6, 7, 10, 1, 2, 3, 4, 5, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 7, 9, 2, 5, 6, 8, 9, 1, 2, 3, 4, 5, 8, 1, 2, 3, 4, 5, 6, 9, 1, 3, 4, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 5, 7, 8, 2, 3, 5, 7, 4, 5, 7, 8, 9, 10, 1, 2, 6, 8, 9, 4, 5, 10, 3, 4, 6, 7, 2, 4, 5, 7, 8, 1, 2, 5, 7, 8, 10, 1, 2, 5, 6, 8, 9, 2, 3, 6, 9, 10, 8, 9, 1, 6, 8, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 8, 9, 10, 1, 5, 6, 7, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 3, 4, 5, 8, 2, 3, 4, 5, 6, 8, 9, 10, 2, 3, 4, 5, 6, 7, 8, 2, 3, 5, 7, 8, 9, 3, 5, 6, 9, 1, 4, 5, 1, 2, 5, 10, 2, 4, 9, 4, 9, 10, 1, 3, 4, 5, 7, 9, 10, 5, 10, 2, 6, 7, 9, 10, 1, 3, 5, 6, 8, 2, 4, 5, 6, 9, 10, 4, 9, 10, 2, 3, 4, 5, 7, 1, 2, 3, 4, 5, 5, 6, 7, 9, 10, 2, 3, 4, 5, 6, 7, 8, 10, 1, 2, 4, 5, 6, 7, 9, 3, 4, 5, 7, 1, 2, 3, 5, 6, 7, 8, 10, 1, 2, 3, 5, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 5, 7, 8, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 6, 7, 8, 9, 10, 2, 5, 6, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 6, 7, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 5, 6, 9, 1, 5, 7, 8, 9, 2, 3, 7, 1, 2, 4, 6, 8, 1, 7, 1, 2, 3, 5, 6, 8, 9, 10, 1, 2, 9, 1, 4, 1, 2, 3, 4, 6, 8, 9, 10, 3, 5, 6, 9, 4, 5, 6, 8, 9, 2, 7, 9, 10, 2, 3, 9, 1, 2, 4, 10, 2, 4, 5, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 10, 5, 6, 7, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 7, 9, 10, 1, 2, 3, 5, 6, 7, 8, 9, 2, 3, 5, 7, 8, 9, 10, 1, 2, 5, 6, 7, 1, 2, 3, 4, 5, 6, 8, 9, 10, 1, 2, 4, 7, 8, 2, 3, 9, 10, 1, 6, 8, 10, 3, 4, 5, 6, 7, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 4, 6, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 4, 6, 7, 8, 9, 1, 2, 3, 4, 5, 8, 1, 2, 3, 5, 6, 7, 8, 10, 1, 2, 3, 4, 5, 6, 7, 8, 10, 1, 3, 5, 6, 8, 9, 2, 4, 6, 7, 9, 1, 3, 4, 9, 10, 1, 2, 4, 5, 6, 7, 8, 9, 2, 4, 8, 9, 2, 3, 5, 6, 7, 8, 9, 2, 3, 8, 1, 2, 3, 4, 5, 6, 8, 9, 10, 1, 3, 6, 7, 8, 9, 2, 4, 5, 6, 7, 8, 9, 
2, 3, 5, 8, 9, 1, 2, 1, 2, 3, 4, 8, 1, 2, 3, 4, 7, 9, 1, 2, 3, 5, 7, 9, 10, 2, 3, 4, 5, 6, 9, 2, 6, 8, 10, 1, 3, 5, 6, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 4, 5, 8, 9, 2, 4, 6, 8, 2, 5, 7, 8, 3, 6, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 6, 8, 4, 5, 6, 8, 9, 1, 3, 4, 1, 2, 3, 6, 8, 9, 10, 1, 2, 3, 4, 5, 6, 9, 10, 2, 3, 4, 7, 8, 10, 1, 2, 4, 5, 6, 7, 9, 10, 1, 2, 3, 4, 5, 7, 8, 9, 2, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 7, 8, 9, 1, 2, 6, 7, 2, 3, 4, 5, 7, 8, 9, 10, 1, 6, 7, 9, 10, 1, 2, 3, 4, 6, 7, 9, 10, 1, 2, 5, 10, 1, 3, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 7, 8, 9, 10, 3, 4, 5, 7, 10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 6, 7, 8, 2, 7, 8, 9, 10, 1, 3, 5, 6, 8, 10, 1, 2, 5, 6, 8, 2, 3, 4, 6, 7, 10, 1, 2, 10, 1, 3, 5, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 3, 4, 6, 7, 8, 9, 10, 1, 3, 4, 7, 8, 9, 10, 1, 2, 4, 5, 7, 8, 9, 10, 1, 2, 3, 4, 5, 7, 8, 9, 1, 2, 3, 5, 6, 7, 8, 9, 10, 2, 5, 10, 1, 3, 4, 5, 6, 8, 2, 3, 4, 5, 7, 8, 9, 10, 3, 4, 5, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 3, 4, 5, 6, 7, 9, 3, 5, 6, 9, 10, 2, 6, 7, 8, 9, 1, 2, 6, 8, 9, 2, 6, 8, 9, 10, 3, 4, 7, 10, 2, 3, 6, 9, 3, 4, 6, 7, 9]}, \"plot.opts\": {\"xlab\": \"PC1\", \"ylab\": \"PC2\"}, \"mdsDat\": {\"x\": [0.006434175002056762, -0.03710442079780138, -0.06495077359408179, -0.05996330199266473, -0.06318925788867215, -0.052212360816310276, 0.07104915692769345, 0.14107868387568653, -0.037153827481593384, 0.096011926765687], \"topics\": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], \"cluster\": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], \"Freq\": [11.53474827116822, 11.488947015713888, 10.686543821489673, 10.684178896347749, 9.882998795120118, 9.689397325959634, 9.35167987298835, 9.047454833462895, 8.970306216256162, 8.663744951493305], \"y\": [-0.029811136117041948, -0.01982391657872439, -0.0019428560085652262, 0.08729150594280384, 0.0430174508662789, -0.0632393556044962, 0.05462859222588817, -0.0940447466606916, -0.08313005725462877, 0.10705451918917717]}, \"topic.order\": [2, 1, 8, 10, 4, 5, 3, 
7, 6, 9], \"tinfo\": {\"loglift\": [30.0, 29.0, 28.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 21.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 1.9568, 1.7915, 1.6804, 1.6487, 1.6436, 1.6417, 1.5184, 1.4915, 1.4792, 1.4542, 1.434, 1.4154, 1.4096, 1.3701, 1.3698, 1.3692, 1.3584, 1.3395, 1.3236, 1.3029, 1.2855, 1.2762, 1.267, 1.2387, 1.2277, 1.2226, 1.2173, 1.2165, 1.2103, 1.2099, 1.199, 1.1661, 1.0305, 1.0604, 1.1354, 0.9994, 0.8663, 1.1037, 0.8695, 0.9716, 1.0086, -0.3334, 0.8646, 0.833, -0.3948, 0.0822, 0.8359, 0.2354, 0.7131, -0.4733, 0.8337, 0.3701, 0.076, -0.4808, 1.9253, 1.7998, 1.5869, 1.5753, 1.5352, 1.5256, 1.5146, 1.4752, 1.4666, 1.4292, 1.4287, 1.4133, 1.4101, 1.4084, 1.3911, 1.3616, 1.3579, 1.3463, 1.3448, 1.3304, 1.2902, 1.2876, 1.2607, 1.2595, 1.2566, 1.2437, 1.2193, 1.1929, 1.1798, 1.1659, 1.1404, 1.16, 1.1024, 1.0656, 0.8123, 0.8236, 1.1274, 1.0132, 1.0172, 0.9362, 0.8005, 0.8923, 0.0965, 0.7551, -0.2153, 0.6078, 0.2391, -0.2189, 0.3107, 0.5792, -0.1403, 0.0518, 0.3403, -0.4311, -0.4455, -0.3857, 0.2885, 1.6644, 1.6412, 1.6122, 1.5771, 1.5671, 1.5609, 1.559, 1.5326, 1.5271, 1.5125, 1.5116, 1.5048, 1.4776, 1.475, 1.4699, 1.444, 1.4332, 1.4109, 1.409, 1.4074, 1.3415, 1.333, 1.3244, 1.3172, 1.2721, 1.2684, 1.2676, 1.2557, 1.2465, 1.2417, 1.2382, 1.2285, 1.1158, 0.5707, 0.6918, 1.152, 0.8319, 1.1728, 1.1301, 1.1221, 1.0269, 1.1438, 1.0875, 0.9453, 1.063, 0.9375, 0.8329, -0.2404, 0.4389, -0.281, 0.5617, 0.1535, -0.3385, -0.1509, 0.3909, 0.2319, -0.8521, 1.8043, 1.7135, 1.7091, 1.6959, 1.6413, 1.5996, 1.5715, 1.5561, 1.5528, 1.5466, 1.5343, 1.5087, 1.4641, 1.4581, 1.448, 1.4259, 1.3629, 1.3609, 1.3385, 1.3317, 1.3134, 1.31, 1.2986, 1.2962, 1.285, 1.2737, 1.2702, 1.2663, 1.2605, 1.2442, 1.1713, 1.1958, 1.207, 1.1781, 1.1006, 1.1286, 0.5556, 1.0946, 0.7757, 0.4654, 0.86, -0.1828, 0.503, 0.8207, 0.7296, 0.6369, -0.4923, 0.5185, 0.6559, 0.5937, 0.0458, 0.2058, 0.545, 0.2692, -0.1259, -0.3655, 
-0.7446, 2.0829, 2.0673, 1.8326, 1.8194, 1.7578, 1.7533, 1.714, 1.7127, 1.6624, 1.6596, 1.636, 1.6211, 1.5204, 1.519, 1.5138, 1.5, 1.484, 1.4487, 1.4307, 1.3846, 1.3718, 1.3465, 1.3421, 1.3212, 1.3183, 1.3165, 1.3074, 1.2877, 1.252, 1.2493, 1.2027, 0.9057, 1.0847, 1.177, 1.0761, 0.6829, 1.1837, 0.6095, -0.0213, 0.8075, 0.5653, 0.888, 0.0668, 0.7124, 0.6638, 0.6603, 0.2949, 0.3824, 0.4485, -0.7464, 0.1846, 1.9916, 1.7878, 1.7723, 1.759, 1.6849, 1.5207, 1.4993, 1.4848, 1.478, 1.475, 1.4676, 1.4237, 1.371, 1.358, 1.3305, 1.3109, 1.3029, 1.291, 1.2536, 1.2523, 1.2422, 1.2306, 1.2197, 1.2099, 1.204, 1.1934, 1.1833, 1.1779, 1.1745, 1.1691, 1.1625, 1.1517, 1.1621, 1.1084, 1.0378, 0.8114, 1.0816, 1.0326, 0.8674, 0.5942, 0.8446, 1.0344, 0.6427, 0.6258, 0.5943, 0.9139, -0.5722, 0.8316, -0.4504, -0.4289, 0.3082, 0.4169, 0.1701, 2.0676, 2.0448, 1.9371, 1.8754, 1.8419, 1.8193, 1.788, 1.7748, 1.7606, 1.6839, 1.6705, 1.653, 1.6174, 1.6151, 1.6117, 1.6097, 1.5792, 1.569, 1.5434, 1.5417, 1.535, 1.5343, 1.5281, 1.4915, 1.4886, 1.4691, 1.4677, 1.4582, 1.4581, 1.434, 1.4285, 1.3871, 1.3358, 1.3384, 1.3655, 1.0596, 1.0427, 1.1748, 1.2013, 1.2238, 0.7139, 0.7969, 1.1706, 0.4085, 0.2172, 1.1298, 0.5842, -0.7611, 0.9163, -0.051, 0.3371, 0.7082, -0.3473, 0.5463, 0.381, -0.8503, 0.4469, 2.0663, 1.9935, 1.8006, 1.7956, 1.761, 1.7508, 1.7255, 1.6944, 1.6763, 1.6599, 1.6587, 1.6484, 1.6278, 1.6037, 1.603, 1.5871, 1.5353, 1.5235, 1.5215, 1.5203, 1.5095, 1.5063, 1.4855, 1.4831, 1.4589, 1.4529, 1.4523, 1.4448, 1.4231, 1.4135, 1.3883, 1.3721, 1.3888, 1.1272, 1.2336, 1.3643, 1.3034, 1.1996, 1.1589, 0.6746, 1.0344, 0.9531, 0.7629, 0.6278, 0.4447, 0.8069, 0.042, 0.9177, -0.0831, 0.3982, -1.2461, 0.2262, 0.7514, 0.7086, 0.1213, 2.2639, 2.0827, 1.9616, 1.8866, 1.761, 1.7401, 1.7109, 1.6869, 1.6711, 1.6595, 1.5798, 1.5669, 1.5543, 1.5523, 1.5243, 1.4974, 1.4803, 1.4413, 1.4364, 1.425, 1.3864, 1.3579, 1.3547, 1.3475, 1.343, 1.3399, 1.3314, 1.3183, 1.3074, 1.3041, 1.2654, 1.2206, 1.2304, 1.276, 1.1768, 
0.8165, 0.8829, 0.1264, 1.0397, 1.2224, 0.7102, 0.83, 1.0097, 1.101, 0.8552, -0.079, 0.2041, 0.6987, 0.15, -0.264, 0.8127, 0.7964, 0.717, 0.7023, 0.4086, 0.0844, 2.2537, 2.0296, 1.9849, 1.9816, 1.9748, 1.9412, 1.8902, 1.8645, 1.8366, 1.8138, 1.8041, 1.7549, 1.7063, 1.6665, 1.6449, 1.6051, 1.5823, 1.4915, 1.4657, 1.4396, 1.4295, 1.3954, 1.3726, 1.3655, 1.3646, 1.364, 1.3466, 1.324, 1.3217, 1.3198, 1.3172, 1.2982, 1.2819, 1.2519, 1.2521, 1.1489, 1.2567, 1.2337, 1.1268, 0.9827, 1.1196, 1.1468, 1.0895, 1.1976, 1.0112, 0.3529, -0.4103, 0.4865, 0.1953, 0.295, 0.7287, -0.1241, 0.4505, 0.144, 0.3483, 0.9334, -0.6909], \"Term\": [\"drug\", \"information\", \"|\", \"fda\", \"product\", \"1\", \"amp\", \"2\", \"agency\", \"safety\", \"policy\", \"2013\", \"0\", \"accessibility\", \"public\", \"2016\", \"budget\", \"file\", \"regulatory\", \"2017\", \"3\", \"2014\", \"research\", \"archive\", \"2015\", \"site\", \"forms\", \"form\", \"february\", \"use\", \"committee_on\", \"m.d._before\", \"house\", \"recalls\", \"specific\", \"registration\", \"learn_more\", \"prevention\", \"collection\", \"2014\", \"nutrition\", \"alerts\", \"labeling\", \"adverse\", \"tobacco_products\", \"product\", \"section_508\", \"tissue\", \"control\", \"radiological_health\", \"sentinel\", \"subscribe\", \"supplement\", \"partner\", \"sponsor\", \"health_professionals_science_amp\", \"recall\", \"outreach\", \"february\", \"board\", \"receive\", \"forms\", \"safety\", \"include\", \"law\", \"regulatory\", \"amp\", \"emergency\", \"products\", \"2012\", \"announcement\", \"fda\", \"page_last_updated\", \"'s\", \"information\", \"program\", \"initiative\", \"public\", \"available\", \"food\", \"therapy\", \"electronic\", \"office_of\", \"use\", \"small\", \"health_professional\", \"american\", \"plan\", \"need\", \"stakeholder\", \"local\", \"cber\", \"microsoft\", \"state\", \"regulated_products\", \"work\", \"national\", \"testimony\", \"list\", \"potential\", \"if_you\", \"medical_device\", 
\"datum\", \"1_888_463_6332\", \"this_page\", \"currently\", \"student\", \"identify\", \"share_tweet\", \"issue\", \"produce\", \"advisory\", \"undeclared\", \"letter\", \"2015\", \"export\", \"complaint\", \"recall\", \"fda\", \"report\", \"government\", \"compliance\", \"spotlight\", \"number_of\", \"request\", \"session\", \"safety\", \"industry\", \"information\", \"new\", \"health\", \"food\", \"2017\", \"section\", \"use\", \"program\", \"2014\", \"product\", \"drug\", \"amp\", \"fda_'s\", \"pediatric\", \"on_twitter_follow\", \"kb\", \"register\", \"navigate_the\", \"important\", \"fda_photo_on\", \"fda_archive_combination\", \"check\", \"clinical_trial\", \"meeting\", \"option\", \"note\", \"practice\", \"outbreak\", \"support\", \"policy\", \"coordinator\", \"\\u00e2\\u0080\\u0093\", \"pdf\", \"long\", \"patients\", \"federal_register\", \"health_professionals_science_amp\", \"good\", \"administrative\", \"contact\", \"draft\", \"flickr\", \"fda_on_facebook\", \"animal\", \"tobacco\", \"guidance\", \"fda\", \"information\", \"do_not\", \"page\", \"process\", \"official\", \"regulations\", \"announcement\", \"press\", \"inspection\", \"requirement\", \"disability\", \"accessible\", \"view\", \"safety\", \"federal\", \"drug\", \"patient\", \"office_of\", \"use\", \"program\", \"document\", \"fda_'s\", \"food\", \"budget\", \"recalls_market_withdrawals_amp\", \"podcasts\", \"current\", \"record\", \"key\", \"drug\", \"2016\", \"stay\", \"medical_devices\", \"application\", \"address\", \"begin\", \"fda_voice\", \"require\", \"m.d.\", \"safety_alerts\", \"heart\", \"facility\", \"contain\", \"subscribe_to\", \"health_care\", \"answer\", \"robert\", \"privacy\", \"industry\", \"contact_fda_browse_by\", \"fda_archive_combination\", \"expert\", \"treat\", \"service\", \"test\", \"technical\", \"20\", \"contact_fda\", \"july\", \"safety\", \"base\", \"2017\", \"food\", \"march\", \"fda\", \"agency\", \"more_in\", \"form\", \"technology\", \"information\", 
\"guidance\", \"approval\", \"section\", \"health\", \"accessibility\", \"patient\", \"policy\", \"report\", \"use\", \"product\", \"director\", \"ph.d.\", \"division_of\", \"appropriate\", \"instruction\", \"email_print\", \"tobacco_product\", \"resource\", \"mail\", \"web_site\", \"9\", \"comment_on\", \"medwatch\", \"problem_with\", \"staff\", \"standard\", \"option_linkedin_pin_it\", \"problem\", \"wide\", \"late\", \"en\", \"screening\", \"approved\", \"301_796\", \"treatment\", \"2012\", \"leadership\", \"download\", \"collect\", \"learn_more\", \"form\", \"information\", \"provide\", \"microsoft\", \"find\", \"food\", \"base\", \"report\", \"fda\", \"human\", \"office_of\", \"page_last_updated\", \"use\", \"web\", \"development\", \"news\", \"accessibility\", \"fda_'s\", \"document\", \"product\", \"1\", \"2013\", \"protect\", \"fdasia\", \"adverse_event\", \"guidance_documents\", \"activity\", \"investigation\", \"back_to\", \"clearance\", \"action\", \"electronically\", \"electronic\", \"inform\", \"chemical\", \"agency\", \"investigator\", \"come\", \"march\", \"change\", \"share\", \"section\", \"international_programs\", \"2015\", \"human\", \"pet\", \"communications\", \"evaluation\", \"regulation\", \"impact\", \"trial\", \"final_rule\", \"visit\", \"government\", \"response\", \"research\", \"information\", \"combination_product\", \"form\", \"health\", \"amp\", \"site\", \"compliance\", \"accessibility\", \"include\", \"office_of\", \"january\", \"fda\", \"submit\", \"safety\", \"food\", \"2017\", \"federal\", \"page\", \"more_sharing_option_linkedin\", \"cheese\", \"color_additive\", \"security\", \"archive\", \"reaction\", \"reduce\", \"condition\", \"advanced\", \"220\", \"look\", \"radiation_emitting\", \"result\", \"you_should\", \"indicate\", \"milk\", \"center_for_biologics_evaluation\", \"u.s._food\", \"listing_of\", \"resources\", \"animal_drug\", \"authority\", \"veterinary\", \"forms\", \"biologic\", \"reporting\", \"20\", \"tribal\", 
\"what_'_new\", \"cell\", \"effort\", \"certain\", \"online\", \"standards\", \"internet\", \"regulatory\", \"research\", \"cosmetic\", \"disease\", \"know\", \"program\", \"page\", \"time\", \"food\", \"product\", \"clinical\", \"2017\", \"fda\", \"help\", \"drug\", \"health\", \"human\", \"safety\", \"fda_'s\", \"policy\", \"information\", \"provide\", \"outbreaks\", \"avenue_silver_spring_md\", \"3\", \"function\", \"open\", \"course\", \"u.s._department_of\", \"professional\", \"early\", \"e_mail\", \"and_human_services\", \"0_0\", \"file\", \"right\", \"silver_spring_md_20993\", \"learn\", \"affairs\", \"committees\", \"site_map\", \"13\", \"public\", \"submissions\", \"protection\", \"workshops\", \"building\", \"medwatch_safety_alerts_news\", \"type\", \"radiation\", \"person\", \"our_website\", \"training\", \"2\", \"user\", \"amp\", \"site\", \"7\", \"session\", \"4\", \"search\", \"use\", \"biologics\", \"regulation\", \"1\", \"page\", \"program\", \"development\", \"food\", \"website\", \"drug\", \"regulatory\", \"fda\", \"health\", \"consumer\", \"products\", \"office_of\", \"|\", \"player\", \"enhance\", \"windows\", \"food_safety_modernization\", \"healthcare\", \"comment\", \"drug_evaluation\", \"legislation\", \"publications\", \"science_amp_research\", \"no_fear_act\", \"news_amp_event\", \"import\", \"recently\", \"employment\", \"business\", \"upcoming\", \"subscribe_to\", \"download\", \"effective\", \"biologics\", \"media\", \"preparedness\", \"maintain\", \"meeting_conference\", \"accessibility\", \"apply\", \"print_this_page_home\", \"date\", \"devices\", \"veterinary_cosmetics_tobacco_products\", \"know\", \"center\", \"cookie\", \"use\", \"agency\", \"fda\", \"combination_product\", \"video\", \"research\", \"consumer\", \"requirement\", \"and_drug_administration\", \"'s\", \"food\", \"report\", \"if_you\", \"program\", \"product\", \"event\", \"science\", \"available\", \"website\", \"guidance\", \"include\", \"popular\", 
\"advisory_committees\", \"establish\", \"reportable_food_registry\", \"2007\", \"operation\", \"0\", \"violation\", \"prescription_drug\", \"analysis\", \"1\", \"it_more_sharing_option\", \"linkedin_pin_it_email\", \"imaging\", \"cholesterol\", \"icsrs\", \"2\", \"order\", \"301_796\", \"begin\", \"code\", \"provision\", \"freedom_of_information\", \"center_for_tobacco_products\", \"attachment\", \"biological_product\", \"rfr\", \"market\", \"cder\", \"general\", \"field\", \"new\", \"approve\", \"act\", \"management\", \"fda_'s\", \"devices\", \"clinical\", \"patient\", \"2017\", \"february\", \"system\", \"4\", \"share_tweet_linkedin_pin\", \"web\", \"product\", \"fda\", \"report\", \"drug\", \"use\", \"federal\", \"safety\", \"include\", \"program\", \"office_of\", \"update\", \"food\"], \"logprob\": [30.0, 29.0, 28.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 21.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, -5.3629, -5.763, -5.647, -5.6382, -5.3444, -5.752, -5.5394, -6.0705, -5.8106, -3.963, -5.9185, -5.4995, -5.6372, -6.3853, -5.0746, -3.0035, -4.7788, -4.8885, -5.3741, -5.5987, -5.9512, -6.2364, -5.581, -5.8654, -5.3584, -6.3445, -4.6144, -6.0422, -4.6337, -5.8653, -5.0911, -4.9373, -3.2674, -4.1854, -4.9263, -4.2437, -3.7157, -4.9797, -4.7998, -5.1004, -5.1944, -3.4308, -5.1271, -5.1016, -4.2618, -4.8708, -5.2387, -5.0046, -5.2065, -4.8417, -5.2753, -5.1907, -5.1672, -5.1125, -5.7406, -5.8403, -5.9757, -5.5491, -4.4908, -5.8579, -5.7531, -6.4943, -4.8873, -5.218, -6.0839, -5.2875, -4.3818, -6.0688, -4.8896, -5.8558, -4.3803, -4.7367, -5.0167, -5.7061, -6.1531, -6.6144, -6.5066, -6.0376, -6.1323, -4.5185, -6.26, -6.7776, -7.1444, -6.7422, -4.4006, -6.4652, -4.8591, -4.7661, -2.2851, -4.044, -5.7545, -5.2271, -5.3423, -5.301, -5.0345, -5.3546, -4.2014, -5.2246, -4.0823, -5.0921, -4.7833, -4.5872, -4.9953, -5.1632, -4.772, -4.9011, -5.0769, -4.8038, -4.8386, -4.9677, -5.1911, -5.3568, -5.9518, 
-4.2691, -6.3496, -4.8644, -5.2512, -6.1631, -6.1631, -6.1293, -5.271, -4.4958, -5.7448, -4.7868, -5.72, -6.2385, -5.3353, -3.9144, -6.25, -5.9518, -4.2393, -5.775, -6.0215, -5.5303, -6.25, -5.4091, -5.7224, -4.4879, -5.6008, -6.1631, -6.4502, -5.1418, -5.2836, -4.4911, -2.5267, -3.1752, -5.2359, -4.3729, -5.4091, -5.3187, -5.3399, -5.1761, -5.4322, -5.3568, -5.2021, -5.3754, -5.288, -5.221, -4.5383, -5.0243, -4.6741, -5.1185, -5.0896, -4.9702, -5.1039, -5.2076, -5.2476, -5.2204, -4.2252, -5.5135, -4.9696, -5.2791, -5.3333, -5.6242, -2.8215, -3.8931, -5.6657, -4.9349, -5.5616, -4.7392, -6.5672, -5.4431, -4.8778, -5.2392, -5.2379, -6.7272, -6.6776, -6.0255, -5.6242, -7.0064, -5.4956, -5.8224, -5.1995, -4.706, -6.3883, -6.4294, -6.1967, -5.134, -4.9697, -5.4126, -5.8224, -5.6074, -5.2742, -5.5681, -3.7423, -5.4632, -4.5303, -3.903, -4.9198, -3.2802, -4.7014, -5.2068, -5.1372, -5.0785, -4.3593, -5.0883, -5.1655, -5.1487, -4.9766, -5.0446, -5.1352, -5.0785, -4.9935, -4.9972, -5.1173, -5.6788, -4.9082, -5.3697, -5.7046, -5.3841, -5.0879, -6.0662, -4.6198, -5.727, -4.7744, -5.1104, -5.3712, -4.9081, -5.4615, -4.7642, -4.5452, -5.7361, -4.6741, -6.2153, -5.8309, -6.2991, -5.9469, -6.2881, -5.307, -4.9092, -4.7555, -5.8872, -5.4615, -5.0841, -5.8085, -4.6641, -2.9614, -4.3765, -5.1769, -4.9972, -3.6854, -5.3741, -4.2582, -3.1187, -4.8595, -4.6779, -5.1037, -4.5649, -5.0233, -5.0002, -5.0233, -4.9555, -5.0972, -5.15, -5.1192, -5.1488, -4.1328, -4.6544, -5.3946, -5.4081, -6.0275, -4.6811, -4.9859, -5.8175, -6.4178, -4.7974, -5.7407, -4.1371, -6.3267, -6.4178, -3.8739, -5.6068, -5.8476, -4.4887, -5.5217, -5.1081, -4.5002, -5.6272, -4.3212, -4.4571, -6.4178, -6.523, -6.0851, -4.4881, -5.6066, -6.9064, -5.4657, -5.1354, -5.7199, -5.043, -4.3984, -3.0556, -4.9422, -4.8341, -4.1551, -3.9878, -4.7055, -5.2059, -4.6077, -4.6199, -4.6489, -5.1635, -3.6696, -5.1331, -4.7483, -4.7973, -4.9979, -5.0462, -5.0347, -5.3133, -5.4257, -4.7893, -5.2807, -4.2273, -6.4781, -5.5411, -6.0256, 
-5.6624, -6.655, -5.433, -5.4507, -5.7174, -5.8942, -5.7173, -6.2875, -5.4949, -5.2808, -5.6625, -4.7159, -5.9943, -5.7174, -4.7141, -4.6119, -6.3926, -6.2875, -5.3179, -5.0333, -4.9778, -5.5138, -5.3179, -5.1635, -4.7996, -5.0052, -5.2807, -4.1835, -4.3935, -4.8302, -5.0121, -5.0681, -4.239, -4.4078, -5.0534, -3.9598, -4.1556, -5.1032, -4.7218, -3.8585, -4.9783, -4.444, -4.6853, -4.9588, -4.6453, -4.9332, -4.9667, -4.7173, -5.0143, -5.7873, -6.1932, -4.2297, -4.921, -5.0465, -5.1368, -5.295, -5.2196, -5.6266, -6.3281, -5.1368, -4.6091, -3.9142, -5.3781, -5.5231, -6.1259, -5.54, -5.5647, -5.6266, -5.1648, -3.7305, -6.8802, -5.3781, -5.295, -5.5746, -6.1574, -5.7941, -5.6265, -6.2033, -5.1649, -4.5704, -4.254, -4.8899, -3.4547, -4.3165, -5.1648, -4.9435, -4.5929, -4.5815, -3.9571, -4.8124, -4.7128, -4.5704, -4.577, -4.5083, -4.8571, -4.3264, -4.9872, -4.4761, -4.8449, -4.3436, -4.7962, -4.9684, -4.9607, -5.1219, -3.3162, -4.6314, -5.3976, -4.4837, -5.46, -6.0318, -4.3575, -6.0779, -5.8095, -5.7498, -5.1696, -5.5959, -5.061, -5.5715, -6.1618, -5.8095, -5.7498, -5.4826, -5.5012, -5.3242, -6.0779, -4.489, -5.3242, -6.0815, -6.1618, -5.4939, -3.9191, -5.6687, -5.5474, -5.477, -5.1223, -4.5974, -5.0615, -5.4589, -5.0394, -3.8152, -4.3215, -2.9711, -4.9841, -5.3241, -4.726, -4.8898, -5.1377, -5.284, -5.0794, -4.4473, -4.6635, -5.0395, -4.803, -4.6367, -5.1562, -5.2108, -5.2026, -5.2026, -5.1983, -5.1613, -5.1923, -5.1446, -5.6764, -5.6669, -5.5766, -5.0126, -4.1809, -5.5766, -5.7981, -5.2842, -3.5293, -4.847, -5.631, -6.1217, -5.7165, -6.2244, -4.0438, -5.6286, -5.1626, -6.5916, -5.6534, -5.8887, -5.2841, -6.3589, -6.8871, -6.7423, -6.7148, -5.5541, -6.3589, -5.1163, -5.0848, -4.4017, -4.6057, -4.6058, -5.0681, -4.3307, -5.131, -4.9994, -4.5534, -4.3233, -4.7244, -4.8089, -4.703, -5.0086, -4.7245, -4.0198, -3.5077, -4.3811, -4.1978, -4.3367, -4.7344, -4.422, -4.7952, -4.8089, -4.8949, -5.0555, -5.0592], \"Total\": [225.0, 380.0, 68.0, 822.0, 229.0, 87.0, 186.0, 65.0, 
99.0, 247.0, 86.0, 39.0, 42.0, 95.0, 96.0, 78.0, 43.0, 71.0, 96.0, 90.0, 43.0, 80.0, 79.0, 42.0, 71.0, 70.0, 40.0, 51.0, 52.0, 177.0, 12.05908910201556, 9.535860040480838, 11.96773947964597, 12.462008905975473, 16.803606001135698, 11.19930855450023, 15.670739572399684, 9.463909956230067, 12.426310436344531, 80.83258899018422, 11.67069490598017, 18.07770630572424, 15.84263780884295, 7.8002489943216835, 28.937190522558293, 229.71437132327247, 39.343477329794304, 35.93306405721694, 22.463159662099187, 18.319147400310307, 13.102552421625235, 9.944015016588608, 19.328717313155067, 14.961559207503415, 25.114509861484535, 9.416235249060477, 53.40247056947255, 12.818755296022498, 52.74616108262496, 15.398854651346145, 33.767286552542295, 40.69734868539678, 247.55165964177507, 95.95083792519765, 42.429248127720015, 96.20362717505738, 186.34365771118624, 41.52065163921537, 62.823194542271935, 41.992973490502706, 36.83949188848192, 822.3375461161522, 45.50875717984954, 48.18284220713449, 380.8999747621172, 128.58800784121337, 41.887064398662005, 96.50365928612233, 48.91095699731717, 230.7242253858534, 40.46892057446805, 70.0191616285552, 96.19379161509806, 177.30316636267347, 8.530605691230155, 8.753981328192673, 9.45840431437342, 14.661483542913164, 43.97435188052437, 11.314267839441687, 12.703619461214267, 6.29668291555436, 31.679466946064654, 23.626541925168596, 9.944337271275458, 22.392348824625127, 55.56939416135488, 10.302008410842353, 34.08390475483987, 13.359258254639613, 58.63690351759961, 41.538890355845425, 31.439689404289915, 16.00766615937093, 10.657528266141036, 6.736870918387996, 7.707393928696887, 12.335436845928648, 11.2534652967796, 57.24637460266129, 10.280427320130336, 6.2906884681416075, 4.41656053333525, 6.695714289312071, 71.4209357477166, 8.884500302292063, 46.90489545048467, 53.40247056947255, 822.3375461161522, 140.0414948476558, 18.6824748854038, 35.490026558234455, 31.501478613052864, 35.60168805982047, 53.22318789074396, 35.255914899318846, 
247.55165964177507, 46.05695305421706, 380.8999747621172, 60.92469234070225, 119.95573081105249, 230.7242253858534, 90.33420370383624, 58.393355676792716, 177.30316636267347, 128.58800784121337, 80.83258899018422, 229.71437132327247, 225.0973712138107, 186.34365771118624, 75.94537285952231, 16.25473368967203, 9.176078196730938, 50.82069396443826, 6.572722649626224, 29.313985401169838, 20.035105964854775, 8.064384952985634, 8.280363194217752, 8.612138542598213, 20.617315736613154, 44.79905368401462, 12.934852131786775, 34.648003385002355, 13.661091774572437, 8.17550799681535, 20.701960191123867, 86.65120513632655, 8.573272791531254, 11.57473460118823, 64.2566103639083, 14.777837471094069, 11.646415534762626, 19.199223954530453, 9.416235249060477, 22.838237704421882, 16.7560220383652, 57.62883054995072, 19.164250976420792, 11.022728379311364, 8.311024209993075, 30.864811853737184, 27.04441035818159, 66.86616418176348, 822.3375461161522, 380.8999747621172, 30.61831821790701, 99.96246491921194, 25.219998248972644, 28.810024593956026, 28.432526563168175, 36.83949188848192, 25.37097237857826, 28.939598904693565, 38.94568507710455, 29.111465139868876, 36.02096924681485, 42.76623463882676, 247.55165964177507, 77.2020937063623, 225.0973712138107, 62.138279083744536, 96.19379161509806, 177.30316636267347, 128.58800784121337, 67.42800423419368, 75.94537285952231, 230.7242253858534, 43.81810830817925, 13.230274345085123, 22.893737952427664, 17.021586149126414, 17.028458446003434, 13.273251349975867, 225.0973712138107, 78.28766471121483, 13.343602440943561, 27.88316287117334, 15.084794496114519, 35.22291916292133, 5.919969025193321, 18.326033910424275, 32.57959603327572, 23.206633394844648, 24.748520922235908, 5.59278758413074, 6.010149025686698, 11.615750861740418, 17.672603665142045, 4.451441241328245, 20.397011492145122, 14.745584459089043, 27.799894225974242, 46.05695305421706, 8.593760451151274, 8.280363194217752, 10.510660351376401, 30.919421580352093, 39.19599400092053, 
24.56188268409081, 16.12173620795231, 20.573061560328092, 31.022660531125343, 22.48411141136177, 247.55165964177507, 25.835392700988884, 90.33420370383624, 230.7242253858534, 56.250173773342624, 822.3375461161522, 99.99996094266857, 43.907830256994615, 51.562717368931544, 59.98573977756088, 380.8999747621172, 66.86616418176348, 53.958296109131005, 58.393355676792716, 119.95573081105249, 95.49878634615975, 62.138279083744536, 86.65120513632655, 140.0414948476558, 177.30316636267347, 229.71437132327247, 7.751105220203356, 17.015226222101123, 13.562069525767104, 9.83142377754805, 14.405795729646956, 19.461190868747394, 7.609202937860494, 32.365196931181906, 11.247651189883968, 29.240941124003044, 21.39543919983173, 16.73121483496449, 29.400970550500443, 16.930161940883266, 34.180037564865884, 43.13291286765499, 13.322588731888391, 39.91440919787081, 8.702479115873114, 13.384566343247052, 8.488007277125694, 12.380989397332797, 8.84132314119648, 24.07938961800821, 35.9460227274814, 41.992973490502706, 13.667283348620435, 21.334593736481974, 32.25036105782531, 15.670739572399684, 51.562717368931544, 380.8999747621172, 77.35071971470546, 31.679466946064654, 41.94225344336118, 230.7242253858534, 25.835392700988884, 140.0414948476558, 822.3375461161522, 62.962577844294024, 96.19379161509806, 45.50875717984954, 177.30316636267347, 58.78184734803526, 63.15224799488618, 61.92808305019667, 95.49878634615975, 75.94537285952231, 67.42800423419368, 229.71437132327247, 87.9014000233212, 39.850153325005174, 29.002038977826405, 14.049328783061865, 14.048288376274913, 8.143331504142738, 36.88332667459517, 27.781442161347396, 12.27070377573435, 6.77833972922404, 34.37082337047841, 13.480285342407965, 70.0191616285552, 8.263379921325098, 7.643000907609065, 99.99996094266857, 18.026247131112278, 14.283635815050244, 56.250173773342624, 20.786314146535258, 31.473071194465717, 58.393355676792716, 19.13811200085716, 71.4209357477166, 62.962577844294024, 8.91548644564183, 8.110513123182393, 
12.69380062123209, 63.02761831056468, 20.664603828484285, 5.663493361677701, 24.08070689125748, 33.86674565169903, 18.6824748854038, 38.78867086359147, 79.31117085986799, 380.8999747621172, 44.06963142788137, 51.562717368931544, 119.95573081105249, 186.34365771118624, 70.77639918007257, 35.490026558234455, 95.49878634615975, 95.95083792519765, 96.19379161509806, 41.77145106582382, 822.3375461161522, 46.753033641289434, 247.55165964177507, 230.7242253858534, 90.33420370383624, 77.2020937063623, 99.96246491921194, 11.344262061050209, 10.370672625901072, 21.826294787092447, 14.202245682615905, 42.11045882281569, 4.536609698488328, 11.946196659609843, 7.456851005926468, 10.875531799300502, 4.352111043010015, 14.969606058629447, 14.96657772940881, 11.879411777645524, 9.97637026409962, 11.947613680074454, 6.768951629878356, 15.416776029503804, 19.294454295236086, 13.514058538480674, 34.88089428023715, 9.779265379179222, 12.908626356199017, 35.4237616829634, 40.69734868539678, 6.877721972391971, 7.791024750031233, 20.573061560328092, 27.606814698855896, 29.18599801651953, 17.49325210622733, 21.395820187866253, 26.02230306564542, 39.41634586590961, 32.00860528611985, 23.64893665436882, 96.20362717505738, 79.31117085986799, 44.90746364678599, 36.459996144486475, 33.70633222605261, 128.58800784121337, 99.96246491921194, 36.073494790269734, 230.7242253858534, 229.71437132327247, 35.74763191777474, 90.33420370383624, 822.3375461161522, 50.143070167490336, 225.0973712138107, 119.95573081105249, 62.962577844294024, 247.55165964177507, 75.94537285952231, 86.65120513632655, 380.8999747621172, 77.35071971470546, 7.070608987867749, 5.067556826388528, 43.78508509772793, 22.043799746479543, 20.127206135315486, 18.578672414558508, 16.26569110327044, 18.093305701726067, 12.26317833757367, 6.18187852816746, 20.369363588007875, 34.88593838720869, 71.34660371044211, 16.907945081675177, 14.636316038285234, 8.13763132993814, 15.397048173068782, 15.200763070882125, 14.318115057790317, 
22.74765039872796, 96.50365928612233, 4.14984937412224, 19.028943054374672, 20.72684715494368, 16.055879486847473, 9.018067645811179, 12.976515191834036, 15.459042012298566, 8.87388140785125, 25.311197312511258, 47.03082849513009, 65.59352630891726, 34.15273075516923, 186.34365771118624, 70.77639918007257, 26.586212734583242, 35.255914899318846, 55.53740628122179, 58.506638213968465, 177.30316636267347, 52.60169036964414, 63.02761831056468, 87.9014000233212, 99.96246491921194, 128.58800784121337, 63.15224799488618, 230.7242253858534, 49.63112342946042, 225.0973712138107, 96.20362717505738, 822.3375461161522, 119.95573081105249, 59.72271054051249, 62.823194542271935, 96.19379161509806, 68.68410694228542, 22.097390890887617, 11.592026055303341, 31.163754995425837, 13.31129294119307, 7.672600047738372, 42.14400593741546, 7.727284994993759, 10.267642307195583, 11.026493187712434, 21.330381621537388, 14.109259522603665, 24.393927290642118, 14.670501819317668, 8.35988524965917, 12.215906379685235, 13.190399203417542, 17.91617403281142, 17.672603665142045, 21.334593736481974, 10.43626880654949, 52.60169036964414, 22.888783403168585, 10.811588580788111, 10.022727845387669, 19.60580684133717, 95.49878634615975, 16.819510197096466, 19.198350598924378, 20.66618954856206, 30.6264016577828, 54.14406675698508, 33.70633222605261, 21.643941435872865, 36.35407108352216, 177.30316636267347, 99.99996094266857, 822.3375461161522, 44.06963142788137, 26.130463730712563, 79.31117085986799, 59.72271054051249, 38.94568507710455, 30.707737238519496, 48.18284220713449, 230.7242253858534, 140.0414948476558, 58.63690351759961, 128.58800784121337, 229.71437132327247, 46.553286817413, 44.80595624919192, 48.91095699731717, 49.63112342946042, 66.86616418176348, 95.95083792519765, 10.62854169939168, 13.947969329387533, 8.569781650368812, 8.680803209238666, 9.565694593263323, 17.387583487087518, 42.03397156682934, 10.680950700207774, 8.801057063143448, 15.05198348928748, 87.9014000233212, 
24.720577694246625, 11.849638673199186, 7.548532000941596, 11.567220218351526, 7.243591900959204, 65.59352630891726, 14.72326040202865, 24.07938961800821, 5.919969025193321, 15.281373307809488, 12.496731587686686, 23.403376151875115, 8.045816810336106, 4.748811471353731, 5.491452787428752, 5.7437895086762065, 18.75394562568977, 8.406152917740345, 29.177858571397287, 30.19218121990669, 60.92469234070225, 50.49879149281083, 52.03654207429078, 32.76560286096731, 75.94537285952231, 30.6264016577828, 35.74763191777474, 62.138279083744536, 90.33420370383624, 52.74616108262496, 47.17228981294636, 55.53740628122179, 36.719999689419836, 58.78184734803526, 229.71437132327247, 822.3375461161522, 140.0414948476558, 225.0973712138107, 177.30316636267347, 77.2020937063623, 247.55165964177507, 95.95083792519765, 128.58800784121337, 96.19379161509806, 45.631215439318055, 230.7242253858534], \"Category\": [\"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", 
\"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", 
\"Topic4\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", 
\"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", 
\"Topic10\", \"Topic10\"], \"Freq\": [225.0, 380.0, 68.0, 822.0, 229.0, 87.0, 186.0, 65.0, 99.0, 247.0, 86.0, 39.0, 42.0, 95.0, 96.0, 78.0, 43.0, 71.0, 96.0, 90.0, 43.0, 80.0, 79.0, 42.0, 71.0, 70.0, 40.0, 51.0, 52.0, 177.0, 9.843814405505569, 6.597999325915664, 7.409625558382336, 7.475056218601638, 10.028115761786042, 6.67103267269038, 8.251058584042587, 4.851108772839954, 6.291362632917979, 39.916323166479515, 5.64774897439857, 8.58716034349238, 7.482347537721302, 3.5412332436571496, 13.133557412324256, 104.19036239451877, 17.6541707377022, 15.820446015213026, 9.734639409914603, 7.775742063850685, 5.465913300344349, 4.109669431895738, 7.9148824354975424, 5.955949456701312, 9.888440215878367, 3.6885919866455796, 20.80737460858309, 4.990597196194905, 20.409653705033985, 5.956091682734755, 12.918342999554987, 15.065908173599167, 80.02303496158126, 31.956738232943785, 15.232772756442209, 30.14640398679118, 51.11464356577711, 14.440546389054294, 17.287471398547595, 12.798602501843076, 11.650771634675063, 67.96460956107178, 12.4618416013958, 12.783603809803614, 29.60401243276663, 16.102837416372317, 11.14563332200458, 14.085562715414904, 11.510895496294577, 16.578103527870148, 10.745075740383493, 11.693955784381536, 11.972387570335625, 12.645493623241904, 6.72074343138896, 6.082926196716722, 5.312433513594281, 8.139388012192924, 23.452854595327114, 5.977030170862721, 6.637430301975349, 3.162799207148165, 15.775785165644145, 11.333612836576627, 4.7679497826976105, 10.57228957440751, 26.15356737133244, 4.840324745918632, 15.739760181786288, 5.989446570652817, 26.192594416912698, 18.340591640080877, 13.861180857628048, 6.956368821196262, 4.448894947072594, 2.8050305980174963, 3.1240670951389244, 4.993615216501446, 4.542384769769135, 22.811386836388166, 3.997766982276992, 2.3825775854241225, 1.6510091055260894, 2.4684901502537864, 25.667115932681504, 3.256168451049482, 16.227175235135103, 17.80835813764229, 212.87542450886852, 36.66378940285382, 6.6275527990660885, 
11.230544183940898, 10.008756908697878, 10.431299474571631, 13.61593145430336, 9.886735470638436, 31.32348997816199, 11.259004946276011, 35.28632685566649, 12.854404048781472, 17.504721775004377, 21.297305220071316, 14.160731613320639, 11.972155913047693, 17.703470891281743, 15.559135906882666, 13.051417539862884, 17.148858758608345, 16.5636564532782, 14.557737973472063, 11.642834772574357, 9.17574833313024, 5.06116653021207, 27.229608199734063, 3.4001675429565252, 15.014031610255737, 10.198154030984904, 4.097017346701181, 4.097048936760848, 4.237942440989335, 9.998412583631648, 21.705856314845494, 6.224814688252652, 16.225900138499252, 6.38124288475508, 3.7995473921415925, 9.37542136781309, 38.820204750211694, 3.7562094674992195, 5.061281073102423, 28.0531078963007, 6.040034470406558, 4.72025348753578, 7.714524139137409, 3.7562565757190547, 8.708625366321272, 6.366162029920663, 21.87730252267483, 7.1890304695777685, 4.09702479412966, 3.0744557617604027, 11.37720289298406, 9.872766591867036, 21.808345948011084, 155.50255652505064, 81.29986938571165, 10.354595549637635, 24.544973218680546, 8.70824284277964, 9.531768027646708, 9.33227212523632, 10.992979866105685, 8.509458304288609, 9.175463552649072, 10.710984260302691, 9.006380125509896, 9.82951375899563, 10.511036029081893, 20.801636079289267, 12.795754133316892, 18.161467179416345, 11.644975550872205, 11.985861958200081, 13.50638043644665, 11.816652819517136, 10.652536755624281, 10.234579602694094, 10.516393325810764, 28.443976365049544, 7.843053978668646, 13.511383072407044, 9.914712522693602, 9.391541470090553, 7.0215769821644285, 115.7768330546229, 39.64916431336219, 6.735662810674476, 13.9880922163085, 7.475021976566656, 17.012766423932675, 2.734586142183535, 8.415268455825862, 14.810303754707753, 10.31894366220071, 10.331882200173865, 2.330160078416823, 2.4486434245957795, 4.700217705789385, 7.021665377243443, 1.7626257720016687, 7.985259087633539, 5.758687781529338, 10.736556813482458, 17.587177950588085, 
3.2701736995058974, 3.138494582946191, 3.9608630752875946, 11.463448634469746, 13.510731071435108, 8.675841730567283, 5.758883752606476, 7.14004464182651, 9.963830779024613, 7.426244593701831, 46.099844906677546, 8.247876559309374, 20.96487616295947, 39.25884887455711, 14.201821531361595, 73.1813960722261, 17.66730797486714, 10.659006926879636, 11.427242636820958, 12.1170676665648, 24.874006699905927, 11.999148683172622, 11.108118855374427, 11.296172410969751, 13.416935403696153, 12.535082464207386, 11.449973977779129, 12.11788984062978, 13.19276200807422, 13.144056682323484, 11.655866133518767, 6.149741709142479, 13.290394531679395, 8.377246808276016, 5.993085256136818, 8.257023294618715, 11.104357120546576, 4.174743995665425, 17.733005306351497, 5.86061055811107, 15.19237955815202, 10.857412325117346, 8.364496180315177, 13.291176719873276, 7.642338094357593, 15.349012544763793, 19.105482465435298, 5.807302426633208, 16.794869480013734, 3.596297379098769, 5.281991841929855, 3.307110423494884, 4.703545913153443, 3.343903241390263, 8.9190734495584, 13.276629532281513, 15.481953565205592, 4.992854566302301, 7.64235793318314, 11.146597911584665, 5.401840492713419, 16.964851284380593, 93.11571139076203, 22.617440544426092, 10.158914005206595, 12.158126026062156, 45.140977809376366, 8.340086373638153, 25.45852128233122, 79.55689705085852, 13.952575008974224, 16.73118411744892, 10.930157250132932, 18.734195705945066, 11.845176891589606, 12.1215975562574, 11.844920244838823, 12.67562719689409, 11.001299788204319, 10.435424104282681, 10.762199303454498, 10.448442360447764, 28.292598922907782, 16.794010517181214, 8.010676357860026, 7.903597186994548, 4.254354830993891, 16.350986007501806, 12.055252018258438, 5.248362645116055, 2.8795324537214473, 14.556289264520444, 5.667431162352601, 28.17263940069193, 3.1541085546223004, 2.8795430185728152, 36.653908066686526, 6.47946003308796, 5.09280892107027, 19.82026686630764, 7.055098903323441, 10.6687588627391, 19.59415334647742, 
6.348318826409393, 23.434084139541092, 20.456653680981994, 2.879541833329323, 2.592083164904396, 4.016108255230223, 19.832823261239213, 6.480445234164893, 1.7664621533101983, 7.461442442093974, 10.381491072404353, 5.786463457937428, 11.386477241441343, 21.69338207733157, 83.08091291033305, 12.594628475985806, 14.031275610608214, 27.669883710358967, 32.70962995797205, 15.957836823707035, 9.674702905713481, 17.596321946630717, 17.383597760124047, 16.88710144489605, 10.094280552624758, 44.96092106500136, 10.405160265599745, 15.288172131295514, 14.557977903280596, 11.911913372135652, 11.350002509971539, 11.481792864531371, 8.386718622591022, 7.494811527525532, 14.162518430958977, 8.664398694856821, 24.84364616856212, 2.616481917084457, 6.677879632098388, 4.113603917248658, 5.915050315701786, 2.1923301870552887, 7.440548075756942, 7.309901321113412, 5.598780619927701, 4.691354611855614, 5.599135026535212, 3.1659341402036563, 6.993676682816938, 8.664154208764078, 5.914991212857744, 15.241372864121942, 4.244720586059509, 5.5988443909297985, 15.269357147208579, 16.912538900730155, 2.849940572274351, 3.1659191354605545, 8.348511271458396, 11.097204525734883, 11.72974366604092, 6.863115326393852, 8.348262054302209, 9.742239764303422, 14.017650459636751, 11.41325187899275, 8.664181808856785, 25.956885619325988, 21.041144889884976, 13.595745185098536, 11.334639969911644, 10.717094555303586, 24.55498473306396, 20.74096495662641, 10.876024002938443, 32.463246331015114, 26.69276117533241, 10.347055501808642, 15.151596276364442, 35.925572029089416, 11.723605793030007, 20.00445833343777, 15.715349696303585, 11.955141399695483, 16.35737269885638, 12.26470629712506, 11.860862940228268, 15.220599858455818, 11.309505162002926, 5.050944642341784, 3.365684257991135, 23.979119985985044, 12.011967324622471, 10.595055234978368, 9.68039828276747, 8.26335848084052, 8.910517103760561, 5.9313757078922995, 2.941125721427711, 9.679850226878276, 16.40785455699602, 32.87485798097472, 
7.604618481070253, 6.5784388253418005, 3.600039497302103, 6.467749660441671, 6.3101123594782065, 5.931687052809065, 9.41246335735919, 39.50172078983669, 1.6932285957731121, 7.60448983458463, 8.263545474034862, 6.248223224535304, 3.488352596217247, 5.0168291331607175, 5.931824289352842, 3.3319945340739046, 9.412246021242634, 17.054783128943576, 23.403335334088315, 12.39129771302928, 52.04701643413849, 21.986131061483444, 9.412490152877622, 11.744050295966126, 16.67578279813854, 16.866720195410903, 31.492111051434964, 13.389108128047075, 14.79111276928598, 17.055124851500082, 16.943117487682176, 18.14893569399139, 12.804167236036665, 21.769282742329487, 11.242109051696286, 18.742081972227457, 12.961459934956931, 21.398406326459842, 13.608209458951407, 11.455453787076662, 11.544262625956009, 9.82566905845741, 59.27335563001129, 15.908981287159895, 7.394075502434358, 18.441330531396236, 6.947049123446189, 3.921449549100814, 20.92160931519268, 3.7449381872263072, 4.89810624532528, 5.199353782202019, 9.287660794520274, 6.06437373183614, 10.353821315169402, 6.214288068495912, 3.44368016172445, 4.898124587975418, 5.199394361499123, 6.791715158740745, 6.66688905299512, 7.95709175844048, 3.7450192392617376, 18.34491069710929, 7.9577063750300985, 3.7316686398475762, 3.4437411018053625, 6.715759429754429, 32.43501888527922, 5.638606043206991, 6.365724764231196, 6.830036594493868, 9.738227962549182, 16.460423939324432, 10.348488160942717, 6.954787823120966, 10.57919016661839, 35.98378007958355, 21.68977562245766, 83.70169892821859, 11.181412288535723, 7.958261099220858, 14.472921461026326, 12.28648862879558, 9.588928831280194, 8.283857968972766, 10.164533397355118, 19.124951114034875, 15.406545335638505, 10.578864402535444, 13.40079048147238, 15.825143671466888, 9.412881112994858, 8.912828509864562, 8.986690899646923, 8.986432476620077, 9.025147832196975, 9.3648645923118, 8.768925344550201, 9.197432467996014, 5.4040826721241695, 5.455827101108681, 5.971229881529043, 
10.495877232905013, 24.11039889908805, 5.971245452292099, 4.784866149596228, 7.999031806814991, 46.25940869296463, 12.385682354626184, 5.654942155048587, 3.4618266824915063, 5.191618416587595, 3.124163831581516, 27.654019849389567, 5.668425770811919, 9.033891441904291, 2.1639403609258485, 5.529607782597849, 4.37032037595212, 7.999826013095641, 2.7308734063537834, 1.6103718794271427, 1.8611705318774328, 1.913095224054671, 6.106792219052609, 2.7309232954223512, 9.461350574937482, 9.76477629763674, 19.333116092698116, 15.765481281736598, 15.764886989508721, 9.92898020656746, 20.756833757378683, 9.323271802915807, 10.634871965809129, 16.61154498327215, 20.909417996825056, 14.000563956199242, 12.866358042657156, 14.303674989673306, 10.536990822719513, 13.999762825731931, 28.3237203990627, 47.27006404930755, 19.735911140153, 23.707070382137367, 20.632271012624262, 13.861566616872624, 18.944522931753987, 13.043909085314827, 12.86649148981175, 11.806009794793491, 10.054237084963956, 10.01713853750761]}, \"lambda.step\": 0.01, \"R\": 30};\n", | |
"\n", | |
"function LDAvis_load_lib(url, callback){\n", | |
" var s = document.createElement('script');\n", | |
" s.src = url;\n", | |
" s.async = true;\n", | |
" s.onreadystatechange = s.onload = callback;\n", | |
" s.onerror = function(){console.warn(\"failed to load library \" + url);};\n", | |
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n", | |
"}\n", | |
"\n", | |
"if(typeof(LDAvis) !== \"undefined\"){\n", | |
" // already loaded: just create the visualization\n", | |
" !function(LDAvis){\n", | |
" new LDAvis(\"#\" + \"ldavis_el247220212014195049402964239\", ldavis_el247220212014195049402964239_data);\n", | |
" }(LDAvis);\n", | |
"}else if(typeof define === \"function\" && define.amd){\n", | |
" // require.js is available: use it to load d3/LDAvis\n", | |
" require.config({paths: {d3: \"https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min\"}});\n", | |
" require([\"d3\"], function(d3){\n", | |
" window.d3 = d3;\n", | |
" LDAvis_load_lib(\"https://cdn.rawgit.com/bmabey/pyLDAvis/files/ldavis.v1.0.0.js\", function(){\n", | |
" new LDAvis(\"#\" + \"ldavis_el247220212014195049402964239\", ldavis_el247220212014195049402964239_data);\n", | |
" });\n", | |
" });\n", | |
"}else{\n", | |
" // require.js not available: dynamically load d3 & LDAvis\n", | |
" LDAvis_load_lib(\"https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js\", function(){\n", | |
" LDAvis_load_lib(\"https://cdn.rawgit.com/bmabey/pyLDAvis/files/ldavis.v1.0.0.js\", function(){\n", | |
" new LDAvis(\"#\" + \"ldavis_el247220212014195049402964239\", ldavis_el247220212014195049402964239_data);\n", | |
" })\n", | |
" });\n", | |
"}\n", | |
"</script>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"execution_count": 57, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"pyLDAvis.display(LDAvis_prepared)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Describing text with LDA\n", | |
"Beyond data exploration, one of the key uses for an LDA model is providing a compact, quantitative description of natural language text. Once an LDA model has been trained, it can be used to represent free text as a mixture of the topics the model learned from the original corpus. This mixture can be interpreted as a probability distribution across the topics, so the LDA representation of a paragraph of text might look like 50% _Topic A_, 20% _Topic B_, 20% _Topic C_, and 10% _Topic D_.\n", | |
"\n", | |
"To use an LDA model to generate a vector representation of new text, you'll need to apply any text preprocessing steps you used on the model's training corpus to the new text, too. For our model, the preprocessing steps we used include:\n", | |
"1. Using spaCy to remove punctuation and lemmatize the text\n", | |
"1. Applying our first-order phrase model to join word pairs\n", | |
"1. Applying our second-order phrase model to join longer phrases\n", | |
"1. Removing stopwords\n", | |
"1. Creating a bag-of-words representation\n", | |
"\n", | |
"Once you've applied these preprocessing steps to the new text, it's ready to pass directly to the model to create an LDA representation. The `lda_description(...)` function will perform all these steps for us, including printing the resulting topical description of the input text." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 58, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def get_sample_desc(desc):\n", | |
"    \"\"\"\n", | |
"    retrieve the description at a particular index\n", | |
"    from the descriptions file and return it\n", | |
"    \"\"\"\n", | |
"\n", | |
"    return list(it.islice(line_review(desc_txt_filepath),\n", | |
"                          desc, desc+1))[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def lda_description(desc_text, min_topic_freq=0.05):\n", | |
"    \"\"\"\n", | |
"    accept the original text of a description and (1) parse it with spaCy,\n", | |
"    (2) apply text pre-processing steps, (3) create a bag-of-words\n", | |
"    representation, (4) create an LDA representation, and\n", | |
"    (5) print a sorted list of the top topics in the LDA representation\n", | |
"    \"\"\"\n", | |
"\n", | |
"    # parse the description text with spaCy\n", | |
"    parsed_desc = nlp(desc_text)\n", | |
"\n", | |
"    # lemmatize the text and remove punctuation and whitespace\n", | |
"    unigram_desc = [token.lemma_ for token in parsed_desc\n", | |
"                    if not punct_space(token)]\n", | |
"\n", | |
"    # apply the first-order and second-order phrase models\n", | |
"    bigram_desc = bigram_model[unigram_desc]\n", | |
"    trigram_desc = trigram_model[bigram_desc]\n", | |
"\n", | |
"    # remove any remaining stopwords\n", | |
"    trigram_desc = [term for term in trigram_desc\n", | |
"                    if term not in spacy.en.stop_words.STOP_WORDS]\n", | |
"\n", | |
"    # create a bag-of-words representation\n", | |
"    desc_bow = trigram_dictionary.doc2bow(trigram_desc)\n", | |
"\n", | |
"    # create an LDA representation\n", | |
"    desc_lda = lda[desc_bow]\n", | |
"\n", | |
"    # sort with the most highly related topics first\n", | |
"    desc_lda = sorted(desc_lda, key=lambda topic_lda: -topic_lda[1])\n", | |
"\n", | |
"    for topic_number, freq in desc_lda:\n", | |
"        # topics are sorted, so stop at the first one below the threshold\n", | |
"        if freq < min_topic_freq:\n", | |
"            break\n", | |
"\n", | |
"        # print the most highly related topic names and frequencies\n", | |
"        print('{:2} {}'.format(topic_names[topic_number],\n", | |
"                               round(freq, 2)))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 62, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" appropriate based on surrogate or intermediate endpoints reasonably \n", | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"sample_desc = get_sample_desc(89)\n", | |
"print(sample_desc)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 63, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"UNK3 0.7\n" | |
] | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\aashis_tiwari\\AppData\\Local\\Continuum\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\gensim\\models\\phrases.py:274: UserWarning: For a faster implementation, use the gensim.models.phrases.Phraser class\n", | |
" warnings.warn(\"For a faster implementation, use the gensim.models.phrases.Phraser class\")\n" | |
] | |
} | |
], | |
"source": [ | |
"lda_description(sample_desc)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 67, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" Nutrition Month Learn how the changes to the Nutrition Facts Label \n", | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"sample_desc= get_sample_desc(313)\n", | |
"print(sample_desc)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 68, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"UNK9 0.5\n", | |
"UNK1 0.39\n" | |
] | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\aashis_tiwari\\AppData\\Local\\Continuum\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\gensim\\models\\phrases.py:274: UserWarning: For a faster implementation, use the gensim.models.phrases.Phraser class\n", | |
" warnings.warn(\"For a faster implementation, use the gensim.models.phrases.Phraser class\")\n" | |
] | |
} | |
], | |
"source": [ | |
"lda_description(sample_desc)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Word Vector Embedding with Word2Vec" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Pop quiz! Can you complete this text snippet?\n", | |
"\n", | |
"<br><br>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<br><br><br>\n", | |
"You just demonstrated the core machine learning concept behind word vector embedding models!\n", | |
"<br><br><br>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The goal of *word vector embedding models*, or *word vector models* for short, is to learn dense, numerical vector representations for each term in a corpus vocabulary. If the model is successful, the vectors it learns about each term should encode some information about the *meaning* or *concept* the term represents, and the relationship between it and other terms in the vocabulary. Word vector models are also fully unsupervised &mdash they learn all of these meanings and relationships solely by analyzing the text of the corpus, without any advance knowledge provided.\n", | |
"\n", | |
"Perhaps the best-known word vector model is [word2vec](https://arxiv.org/pdf/1301.3781v3.pdf), originally proposed in 2013. The general idea of word2vec is, for a given *focus word*, to use the *context* of the word — i.e., the other words immediately before and after it — to provide hints about what the focus word might mean. To do this, word2vec uses a *sliding window* technique, where it considers snippets of text only a few tokens long at a time.\n", | |
"\n", | |
"At the start of the learning process, the model initializes random vectors for all terms in the corpus vocabulary. The model then slides the window across every snippet of text in the corpus, with each word taking turns as the focus word. Each time the model considers a new snippet, it tries to learn some information about the focus word based on the surrouding context, and it \"nudges\" the words' vector representations accordingly. One complete pass sliding the window across all of the corpus text is known as a training *epoch*. It's common to train a word2vec model for multiple passes/epochs over the corpus. Over time, the model rearranges the terms' vector representations such that terms that frequently appear in similar contexts have vector representations that are *close* to each other in vector space.\n", | |
"\n", | |
"For a deeper dive into word2vec's machine learning process, see [here](https://arxiv.org/pdf/1411.2738v4.pdf).\n", | |
"\n", | |
"Word2vec has a number of user-defined hyperparameters, including:\n", | |
"- The dimensionality of the vectors. Typical choices include a few dozen to several hundred.\n", | |
"- The width of the sliding window, in tokens. Five is a common default choice, but narrower and wider windows are possible.\n", | |
"- The number of training epochs.\n", | |
"\n", | |
"For using word2vec in Python, [gensim](https://rare-technologies.com/deep-learning-with-word2vec-and-gensim/) comes to the rescue again! It offers a [highly-optimized](https://rare-technologies.com/word2vec-in-python-part-two-optimizing/), [parallelized](https://rare-technologies.com/parallelizing-word2vec-in-python/) implementation of the word2vec algorithm with its [Word2Vec](https://radimrehurek.com/gensim/models/word2vec.html) class." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 69, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from gensim.models import Word2Vec\n", | |
"\n", | |
"trigram_sentences = LineSentence(trigram_sentences_filepath)\n", | |
"word2vec_filepath = os.path.join(intermediate_directory, 'word2vec_model_all')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We'll train our word2vec model using the normalized sentences with our phrase models applied. We'll use 100-dimensional vectors, and set up our training process to run for twelve epochs." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 70, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"12 training epochs so far.\n", | |
"Wall time: 5.86 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# this is a bit time consuming - make the if statement True\n", | |
"# if you want to train the word2vec model yourself.\n", | |
"if 1 == 1:\n", | |
"\n", | |
" # initiate the model and perform the first epoch of training\n", | |
" desc2vec = Word2Vec(trigram_sentences, size=100, window=5,\n", | |
" min_count=20, sg=1, workers=1)\n", | |
" \n", | |
" desc2vec.save(word2vec_filepath)\n", | |
"\n", | |
" # perform another 11 epochs of training\n", | |
" for i in range(1,12):\n", | |
"\n", | |
" desc2vec.train(trigram_sentences)\n", | |
" desc2vec.save(word2vec_filepath)\n", | |
" \n", | |
"# load the finished model from disk\n", | |
"desc2vec = Word2Vec.load(word2vec_filepath)\n", | |
"desc2vec.init_sims()\n", | |
"\n", | |
"print('{} training epochs so far.'.format(desc2vec.train_count))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 71, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"342 terms in the recall2vec vocabulary.\n" | |
] | |
} | |
], | |
"source": [ | |
"print('{:,} terms in the recall2vec vocabulary.'.format(len(desc2vec.wv.vocab)))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's take a peek at the word vectors our model has learned. We'll create a pandas DataFrame with the terms as the row labels, and the 100 dimensions of the word vector model as the columns." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" <th>1</th>\n", | |
" <th>2</th>\n", | |
" <th>3</th>\n", | |
" <th>4</th>\n", | |
" <th>5</th>\n", | |
" <th>6</th>\n", | |
" <th>7</th>\n", | |
" <th>8</th>\n", | |
" <th>9</th>\n", | |
" <th>...</th>\n", | |
" <th>90</th>\n", | |
" <th>91</th>\n", | |
" <th>92</th>\n", | |
" <th>93</th>\n", | |
" <th>94</th>\n", | |
" <th>95</th>\n", | |
" <th>96</th>\n", | |
" <th>97</th>\n", | |
" <th>98</th>\n", | |
" <th>99</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>and</th>\n", | |
" <td>-0.100746</td>\n", | |
" <td>-0.138351</td>\n", | |
" <td>-0.065297</td>\n", | |
" <td>0.005219</td>\n", | |
" <td>-0.064796</td>\n", | |
" <td>0.121452</td>\n", | |
" <td>0.006101</td>\n", | |
" <td>0.318425</td>\n", | |
" <td>-0.126467</td>\n", | |
" <td>-0.380278</td>\n", | |
" <td>...</td>\n", | |
" <td>0.096713</td>\n", | |
" <td>-0.085560</td>\n", | |
" <td>0.658379</td>\n", | |
" <td>-0.261223</td>\n", | |
" <td>-0.071542</td>\n", | |
" <td>-0.181564</td>\n", | |
" <td>-0.014139</td>\n", | |
" <td>-0.078411</td>\n", | |
" <td>-0.020709</td>\n", | |
" <td>-0.003840</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>the</th>\n", | |
" <td>-0.018843</td>\n", | |
" <td>-0.254019</td>\n", | |
" <td>0.248716</td>\n", | |
" <td>-0.190241</td>\n", | |
" <td>-0.161016</td>\n", | |
" <td>0.074082</td>\n", | |
" <td>-0.170656</td>\n", | |
" <td>-0.169396</td>\n", | |
" <td>0.166634</td>\n", | |
" <td>-0.168215</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.044964</td>\n", | |
" <td>-0.344766</td>\n", | |
" <td>0.553132</td>\n", | |
" <td>-0.360908</td>\n", | |
" <td>-0.294987</td>\n", | |
" <td>0.203086</td>\n", | |
" <td>-0.141143</td>\n", | |
" <td>-0.366750</td>\n", | |
" <td>-0.245072</td>\n", | |
" <td>-0.123179</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>fda</th>\n", | |
" <td>0.077440</td>\n", | |
" <td>-0.207896</td>\n", | |
" <td>0.269897</td>\n", | |
" <td>0.162306</td>\n", | |
" <td>-0.050462</td>\n", | |
" <td>0.209877</td>\n", | |
" <td>-0.053269</td>\n", | |
" <td>0.277357</td>\n", | |
" <td>-0.344069</td>\n", | |
" <td>-0.206898</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.087527</td>\n", | |
" <td>-0.177071</td>\n", | |
" <td>0.737522</td>\n", | |
" <td>-0.076948</td>\n", | |
" <td>-0.137575</td>\n", | |
" <td>-0.101029</td>\n", | |
" <td>-0.451010</td>\n", | |
" <td>0.197468</td>\n", | |
" <td>-0.017319</td>\n", | |
" <td>0.105784</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>to</th>\n", | |
" <td>-0.313661</td>\n", | |
" <td>-0.045335</td>\n", | |
" <td>0.089928</td>\n", | |
" <td>-0.039525</td>\n", | |
" <td>-0.035883</td>\n", | |
" <td>0.346876</td>\n", | |
" <td>0.005409</td>\n", | |
" <td>0.336155</td>\n", | |
" <td>-0.142290</td>\n", | |
" <td>-0.306906</td>\n", | |
" <td>...</td>\n", | |
" <td>0.022846</td>\n", | |
" <td>-0.148466</td>\n", | |
" <td>0.072798</td>\n", | |
" <td>-0.228918</td>\n", | |
" <td>0.020430</td>\n", | |
" <td>-0.064480</td>\n", | |
" <td>0.007564</td>\n", | |
" <td>-0.602445</td>\n", | |
" <td>-0.077429</td>\n", | |
" <td>-0.071003</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>of</th>\n", | |
" <td>-0.001229</td>\n", | |
" <td>0.145161</td>\n", | |
" <td>-0.044128</td>\n", | |
" <td>0.021164</td>\n", | |
" <td>-0.402606</td>\n", | |
" <td>-0.152231</td>\n", | |
" <td>-0.153400</td>\n", | |
" <td>0.136081</td>\n", | |
" <td>0.073617</td>\n", | |
" <td>-0.055965</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.155064</td>\n", | |
" <td>-0.245643</td>\n", | |
" <td>0.417786</td>\n", | |
" <td>-0.173233</td>\n", | |
" <td>0.190563</td>\n", | |
" <td>-0.218247</td>\n", | |
" <td>-0.337335</td>\n", | |
" <td>-0.130735</td>\n", | |
" <td>0.017617</td>\n", | |
" <td>-0.065531</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>for</th>\n", | |
" <td>0.042443</td>\n", | |
" <td>0.284259</td>\n", | |
" <td>-0.266208</td>\n", | |
" <td>0.018548</td>\n", | |
" <td>-0.168083</td>\n", | |
" <td>0.183761</td>\n", | |
" <td>-0.005107</td>\n", | |
" <td>0.149055</td>\n", | |
" <td>0.020657</td>\n", | |
" <td>-0.257668</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.319383</td>\n", | |
" <td>0.010611</td>\n", | |
" <td>0.213731</td>\n", | |
" <td>0.004790</td>\n", | |
" <td>-0.263443</td>\n", | |
" <td>-0.361939</td>\n", | |
" <td>-0.567995</td>\n", | |
" <td>-0.588227</td>\n", | |
" <td>-0.011243</td>\n", | |
" <td>0.094405</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>be</th>\n", | |
" <td>0.123699</td>\n", | |
" <td>-0.042003</td>\n", | |
" <td>-0.086108</td>\n", | |
" <td>-0.259822</td>\n", | |
" <td>-0.393856</td>\n", | |
" <td>0.329685</td>\n", | |
" <td>-0.109405</td>\n", | |
" <td>-0.234873</td>\n", | |
" <td>-0.156663</td>\n", | |
" <td>-0.137926</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.382407</td>\n", | |
" <td>-0.409366</td>\n", | |
" <td>0.301657</td>\n", | |
" <td>-0.129887</td>\n", | |
" <td>-0.080969</td>\n", | |
" <td>0.249293</td>\n", | |
" <td>0.010820</td>\n", | |
" <td>-0.530850</td>\n", | |
" <td>-0.094270</td>\n", | |
" <td>-0.052313</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>a</th>\n", | |
" <td>-0.141158</td>\n", | |
" <td>-0.072698</td>\n", | |
" <td>0.483275</td>\n", | |
" <td>-0.025337</td>\n", | |
" <td>-0.195636</td>\n", | |
" <td>0.584957</td>\n", | |
" <td>-0.140417</td>\n", | |
" <td>0.080549</td>\n", | |
" <td>-0.007860</td>\n", | |
" <td>-0.360949</td>\n", | |
" <td>...</td>\n", | |
" <td>0.057271</td>\n", | |
" <td>-0.366956</td>\n", | |
" <td>0.507431</td>\n", | |
" <td>-0.140168</td>\n", | |
" <td>-0.306811</td>\n", | |
" <td>0.294925</td>\n", | |
" <td>-0.060049</td>\n", | |
" <td>-0.188385</td>\n", | |
" <td>-0.162945</td>\n", | |
" <td>-0.171217</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>in</th>\n", | |
" <td>0.117567</td>\n", | |
" <td>-0.305639</td>\n", | |
" <td>0.376731</td>\n", | |
" <td>-0.012498</td>\n", | |
" <td>-0.344085</td>\n", | |
" <td>0.148888</td>\n", | |
" <td>0.235434</td>\n", | |
" <td>0.266651</td>\n", | |
" <td>0.153096</td>\n", | |
" <td>-0.142239</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.518270</td>\n", | |
" <td>0.122086</td>\n", | |
" <td>0.506006</td>\n", | |
" <td>-0.362484</td>\n", | |
" <td>-0.330909</td>\n", | |
" <td>-0.322330</td>\n", | |
" <td>-0.178481</td>\n", | |
" <td>-0.121798</td>\n", | |
" <td>0.535190</td>\n", | |
" <td>-0.152664</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>information</th>\n", | |
" <td>0.205563</td>\n", | |
" <td>0.389818</td>\n", | |
" <td>-0.072361</td>\n", | |
" <td>0.305935</td>\n", | |
" <td>-0.253689</td>\n", | |
" <td>0.419405</td>\n", | |
" <td>-0.259644</td>\n", | |
" <td>-0.377517</td>\n", | |
" <td>0.043917</td>\n", | |
" <td>-0.027382</td>\n", | |
" <td>...</td>\n", | |
" <td>0.128690</td>\n", | |
" <td>-0.172287</td>\n", | |
" <td>0.063882</td>\n", | |
" <td>0.148540</td>\n", | |
" <td>-0.184837</td>\n", | |
" <td>-0.053577</td>\n", | |
" <td>0.385220</td>\n", | |
" <td>-0.080125</td>\n", | |
" <td>-0.290900</td>\n", | |
" <td>-0.150123</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>on</th>\n", | |
" <td>-0.447642</td>\n", | |
" <td>0.087927</td>\n", | |
" <td>0.101921</td>\n", | |
" <td>-0.095423</td>\n", | |
" <td>-0.244817</td>\n", | |
" <td>0.300095</td>\n", | |
" <td>-0.165446</td>\n", | |
" <td>0.435071</td>\n", | |
" <td>0.079664</td>\n", | |
" <td>-0.477322</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.127808</td>\n", | |
" <td>-0.369725</td>\n", | |
" <td>0.878684</td>\n", | |
" <td>-0.102216</td>\n", | |
" <td>-0.069740</td>\n", | |
" <td>-0.369469</td>\n", | |
" <td>-0.197006</td>\n", | |
" <td>-0.157897</td>\n", | |
" <td>0.202324</td>\n", | |
" <td>-0.127013</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>drug</th>\n", | |
" <td>-0.360087</td>\n", | |
" <td>0.276967</td>\n", | |
" <td>-0.217438</td>\n", | |
" <td>-0.554882</td>\n", | |
" <td>0.135724</td>\n", | |
" <td>0.015508</td>\n", | |
" <td>-0.177810</td>\n", | |
" <td>0.301658</td>\n", | |
" <td>0.196800</td>\n", | |
" <td>0.180682</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.241992</td>\n", | |
" <td>-0.422613</td>\n", | |
" <td>-0.096184</td>\n", | |
" <td>-0.014281</td>\n", | |
" <td>-0.358708</td>\n", | |
" <td>0.127414</td>\n", | |
" <td>0.215810</td>\n", | |
" <td>-0.090970</td>\n", | |
" <td>-0.330258</td>\n", | |
" <td>-0.244218</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>or</th>\n", | |
" <td>0.151288</td>\n", | |
" <td>0.054059</td>\n", | |
" <td>-0.286732</td>\n", | |
" <td>-0.172177</td>\n", | |
" <td>-0.458850</td>\n", | |
" <td>0.540072</td>\n", | |
" <td>-0.026338</td>\n", | |
" <td>0.591284</td>\n", | |
" <td>0.046338</td>\n", | |
" <td>-0.475921</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.236732</td>\n", | |
" <td>-0.054929</td>\n", | |
" <td>0.108159</td>\n", | |
" <td>0.056314</td>\n", | |
" <td>-0.030552</td>\n", | |
" <td>0.006140</td>\n", | |
" <td>-0.018091</td>\n", | |
" <td>0.140386</td>\n", | |
" <td>-0.191139</td>\n", | |
" <td>0.004434</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>food</th>\n", | |
" <td>0.397796</td>\n", | |
" <td>-0.316026</td>\n", | |
" <td>-0.421999</td>\n", | |
" <td>-0.214100</td>\n", | |
" <td>-0.304859</td>\n", | |
" <td>-0.145224</td>\n", | |
" <td>-0.686528</td>\n", | |
" <td>0.363681</td>\n", | |
" <td>0.023887</td>\n", | |
" <td>0.143341</td>\n", | |
" <td>...</td>\n", | |
" <td>0.473187</td>\n", | |
" <td>-0.202776</td>\n", | |
" <td>0.423708</td>\n", | |
" <td>0.389340</td>\n", | |
" <td>-0.423926</td>\n", | |
" <td>-0.001822</td>\n", | |
" <td>-0.120035</td>\n", | |
" <td>-0.066002</td>\n", | |
" <td>0.069482</td>\n", | |
" <td>0.010907</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>safety</th>\n", | |
" <td>0.487511</td>\n", | |
" <td>-0.051075</td>\n", | |
" <td>-0.222669</td>\n", | |
" <td>-0.172921</td>\n", | |
" <td>0.134228</td>\n", | |
" <td>-0.374469</td>\n", | |
" <td>0.048707</td>\n", | |
" <td>0.131852</td>\n", | |
" <td>-0.418310</td>\n", | |
" <td>-0.020593</td>\n", | |
" <td>...</td>\n", | |
" <td>0.291371</td>\n", | |
" <td>0.527957</td>\n", | |
" <td>0.115536</td>\n", | |
" <td>-0.268350</td>\n", | |
" <td>-0.239712</td>\n", | |
" <td>-0.113702</td>\n", | |
" <td>-0.277039</td>\n", | |
" <td>-0.497297</td>\n", | |
" <td>-0.129665</td>\n", | |
" <td>0.271672</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>product</th>\n", | |
" <td>0.006890</td>\n", | |
" <td>0.065497</td>\n", | |
" <td>-0.016135</td>\n", | |
" <td>-0.207994</td>\n", | |
" <td>0.357020</td>\n", | |
" <td>-0.002370</td>\n", | |
" <td>-0.011452</td>\n", | |
" <td>0.155764</td>\n", | |
" <td>-0.054490</td>\n", | |
" <td>-0.525449</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.644613</td>\n", | |
" <td>-0.294260</td>\n", | |
" <td>0.122726</td>\n", | |
" <td>-0.413096</td>\n", | |
" <td>0.482507</td>\n", | |
" <td>0.260640</td>\n", | |
" <td>-0.114928</td>\n", | |
" <td>0.113158</td>\n", | |
" <td>-0.188181</td>\n", | |
" <td>-0.664799</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>that</th>\n", | |
" <td>-0.214687</td>\n", | |
" <td>-0.350257</td>\n", | |
" <td>0.036187</td>\n", | |
" <td>0.269608</td>\n", | |
" <td>-0.176686</td>\n", | |
" <td>0.147215</td>\n", | |
" <td>0.151621</td>\n", | |
" <td>0.378561</td>\n", | |
" <td>0.029806</td>\n", | |
" <td>-0.233842</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.469595</td>\n", | |
" <td>-0.112579</td>\n", | |
" <td>0.075520</td>\n", | |
" <td>-0.171400</td>\n", | |
" <td>-0.008943</td>\n", | |
" <td>0.028535</td>\n", | |
" <td>0.409268</td>\n", | |
" <td>-0.269048</td>\n", | |
" <td>-0.106427</td>\n", | |
" <td>0.378756</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>with</th>\n", | |
" <td>-0.130633</td>\n", | |
" <td>0.080453</td>\n", | |
" <td>0.136298</td>\n", | |
" <td>0.194036</td>\n", | |
" <td>-0.113387</td>\n", | |
" <td>-0.332066</td>\n", | |
" <td>-0.186781</td>\n", | |
" <td>0.441635</td>\n", | |
" <td>-0.019013</td>\n", | |
" <td>0.180890</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.078067</td>\n", | |
" <td>0.235780</td>\n", | |
" <td>-0.418466</td>\n", | |
" <td>-0.138317</td>\n", | |
" <td>-0.423310</td>\n", | |
" <td>-0.569176</td>\n", | |
" <td>0.115315</td>\n", | |
" <td>-0.749452</td>\n", | |
" <td>-0.364128</td>\n", | |
" <td>0.384155</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>amp</th>\n", | |
" <td>0.362384</td>\n", | |
" <td>0.076609</td>\n", | |
" <td>0.261473</td>\n", | |
" <td>-0.177449</td>\n", | |
" <td>-0.162025</td>\n", | |
" <td>0.036550</td>\n", | |
" <td>-0.382035</td>\n", | |
" <td>0.428040</td>\n", | |
" <td>-0.032117</td>\n", | |
" <td>-0.134149</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.468903</td>\n", | |
" <td>0.374384</td>\n", | |
" <td>0.595388</td>\n", | |
" <td>0.146841</td>\n", | |
" <td>0.059823</td>\n", | |
" <td>-0.967745</td>\n", | |
" <td>-0.528763</td>\n", | |
" <td>-0.331435</td>\n", | |
" <td>0.111991</td>\n", | |
" <td>-0.294012</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>about</th>\n", | |
" <td>0.114440</td>\n", | |
" <td>0.045978</td>\n", | |
" <td>-0.097556</td>\n", | |
" <td>-0.140161</td>\n", | |
" <td>-0.015015</td>\n", | |
" <td>-0.540265</td>\n", | |
" <td>-0.320632</td>\n", | |
" <td>-0.008932</td>\n", | |
" <td>0.230474</td>\n", | |
" <td>-0.181113</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.380960</td>\n", | |
" <td>-0.294769</td>\n", | |
" <td>0.287081</td>\n", | |
" <td>0.035232</td>\n", | |
" <td>0.108893</td>\n", | |
" <td>-0.547401</td>\n", | |
" <td>-0.028685</td>\n", | |
" <td>-0.301636</td>\n", | |
" <td>0.081251</td>\n", | |
" <td>0.108835</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>report</th>\n", | |
" <td>-0.313987</td>\n", | |
" <td>0.244142</td>\n", | |
" <td>-0.004940</td>\n", | |
" <td>-0.598828</td>\n", | |
" <td>0.134250</td>\n", | |
" <td>-0.183933</td>\n", | |
" <td>-0.016496</td>\n", | |
" <td>-0.228413</td>\n", | |
" <td>-0.393799</td>\n", | |
" <td>-0.139182</td>\n", | |
" <td>...</td>\n", | |
" <td>0.386709</td>\n", | |
" <td>0.627971</td>\n", | |
" <td>0.437185</td>\n", | |
" <td>-0.328031</td>\n", | |
" <td>-0.653785</td>\n", | |
" <td>-0.237967</td>\n", | |
" <td>-0.187325</td>\n", | |
" <td>-0.336905</td>\n", | |
" <td>-0.112999</td>\n", | |
" <td>0.312071</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>by</th>\n", | |
" <td>-0.095231</td>\n", | |
" <td>0.569164</td>\n", | |
" <td>-0.214255</td>\n", | |
" <td>0.067282</td>\n", | |
" <td>-0.362989</td>\n", | |
" <td>0.147236</td>\n", | |
" <td>0.174758</td>\n", | |
" <td>0.113209</td>\n", | |
" <td>-0.363191</td>\n", | |
" <td>-0.561365</td>\n", | |
" <td>...</td>\n", | |
" <td>0.161771</td>\n", | |
" <td>-0.400323</td>\n", | |
" <td>0.641776</td>\n", | |
" <td>-0.247879</td>\n", | |
" <td>-0.109704</td>\n", | |
" <td>0.250926</td>\n", | |
" <td>-0.616060</td>\n", | |
" <td>-0.687903</td>\n", | |
" <td>-0.012988</td>\n", | |
" <td>0.731686</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>use</th>\n", | |
" <td>-0.421068</td>\n", | |
" <td>-0.043560</td>\n", | |
" <td>-0.198231</td>\n", | |
" <td>-0.648959</td>\n", | |
" <td>0.036279</td>\n", | |
" <td>-0.229710</td>\n", | |
" <td>-0.432814</td>\n", | |
" <td>0.224305</td>\n", | |
" <td>0.303759</td>\n", | |
" <td>-0.281524</td>\n", | |
" <td>...</td>\n", | |
" <td>0.057442</td>\n", | |
" <td>0.054344</td>\n", | |
" <td>0.399492</td>\n", | |
" <td>0.155429</td>\n", | |
" <td>-0.440814</td>\n", | |
" <td>-0.308274</td>\n", | |
" <td>-0.043453</td>\n", | |
" <td>-0.346402</td>\n", | |
" <td>-0.212835</td>\n", | |
" <td>-0.432940</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>health</th>\n", | |
" <td>-0.326454</td>\n", | |
" <td>0.373163</td>\n", | |
" <td>-0.061973</td>\n", | |
" <td>-0.024176</td>\n", | |
" <td>-0.419274</td>\n", | |
" <td>0.268802</td>\n", | |
" <td>-0.177778</td>\n", | |
" <td>0.145061</td>\n", | |
" <td>-0.462755</td>\n", | |
" <td>0.480433</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.356257</td>\n", | |
" <td>0.055684</td>\n", | |
" <td>-0.239780</td>\n", | |
" <td>-0.022977</td>\n", | |
" <td>0.290967</td>\n", | |
" <td>0.433374</td>\n", | |
" <td>0.002770</td>\n", | |
" <td>-0.305807</td>\n", | |
" <td>0.059909</td>\n", | |
" <td>-0.072929</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>from</th>\n", | |
" <td>-0.181439</td>\n", | |
" <td>0.015173</td>\n", | |
" <td>0.195936</td>\n", | |
" <td>-0.076520</td>\n", | |
" <td>-0.253937</td>\n", | |
" <td>-0.256348</td>\n", | |
" <td>-0.405886</td>\n", | |
" <td>0.271998</td>\n", | |
" <td>-0.486178</td>\n", | |
" <td>-0.553072</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.282554</td>\n", | |
" <td>0.349735</td>\n", | |
" <td>0.305131</td>\n", | |
" <td>0.021239</td>\n", | |
" <td>-0.092425</td>\n", | |
" <td>0.469081</td>\n", | |
" <td>0.114366</td>\n", | |
" <td>-0.325147</td>\n", | |
" <td>0.707807</td>\n", | |
" <td>-0.181133</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>have</th>\n", | |
" <td>-0.249859</td>\n", | |
" <td>0.093082</td>\n", | |
" <td>-0.043083</td>\n", | |
" <td>-0.118076</td>\n", | |
" <td>-0.468669</td>\n", | |
" <td>0.090478</td>\n", | |
" <td>0.309928</td>\n", | |
" <td>-0.359987</td>\n", | |
" <td>0.090183</td>\n", | |
" <td>0.141688</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.459824</td>\n", | |
" <td>-0.281754</td>\n", | |
" <td>-0.019800</td>\n", | |
" <td>0.209448</td>\n", | |
" <td>0.184108</td>\n", | |
" <td>-0.272827</td>\n", | |
" <td>-0.095660</td>\n", | |
" <td>-0.142429</td>\n", | |
" <td>-0.179014</td>\n", | |
" <td>0.024573</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>program</th>\n", | |
" <td>-0.338854</td>\n", | |
" <td>0.377686</td>\n", | |
" <td>-0.018415</td>\n", | |
" <td>0.285381</td>\n", | |
" <td>0.062863</td>\n", | |
" <td>0.159740</td>\n", | |
" <td>-0.224956</td>\n", | |
" <td>0.124830</td>\n", | |
" <td>0.292951</td>\n", | |
" <td>0.535436</td>\n", | |
" <td>...</td>\n", | |
" <td>0.119695</td>\n", | |
" <td>-0.123484</td>\n", | |
" <td>0.443025</td>\n", | |
" <td>-0.173975</td>\n", | |
" <td>-0.095587</td>\n", | |
" <td>0.225543</td>\n", | |
" <td>0.099020</td>\n", | |
" <td>-0.280379</td>\n", | |
" <td>-0.131907</td>\n", | |
" <td>-0.017053</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>include</th>\n", | |
" <td>-0.123905</td>\n", | |
" <td>0.013977</td>\n", | |
" <td>0.036592</td>\n", | |
" <td>0.352391</td>\n", | |
" <td>0.263708</td>\n", | |
" <td>0.087568</td>\n", | |
" <td>-0.284553</td>\n", | |
" <td>-0.029643</td>\n", | |
" <td>0.024506</td>\n", | |
" <td>0.081021</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.357194</td>\n", | |
" <td>-0.250013</td>\n", | |
" <td>0.086160</td>\n", | |
" <td>-0.073455</td>\n", | |
" <td>0.343971</td>\n", | |
" <td>0.777451</td>\n", | |
" <td>-0.029254</td>\n", | |
" <td>-0.032943</td>\n", | |
" <td>-0.092698</td>\n", | |
" <td>-0.255635</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>at</th>\n", | |
" <td>-0.729614</td>\n", | |
" <td>-0.058433</td>\n", | |
" <td>0.402095</td>\n", | |
" <td>0.190323</td>\n", | |
" <td>-0.091917</td>\n", | |
" <td>0.355921</td>\n", | |
" <td>-0.238918</td>\n", | |
" <td>-0.038873</td>\n", | |
" <td>-0.285274</td>\n", | |
" <td>-0.863475</td>\n", | |
" <td>...</td>\n", | |
" <td>0.142590</td>\n", | |
" <td>0.143932</td>\n", | |
" <td>0.142247</td>\n", | |
" <td>-0.402303</td>\n", | |
" <td>-0.283580</td>\n", | |
" <td>-0.227817</td>\n", | |
" <td>0.224417</td>\n", | |
" <td>-0.174887</td>\n", | |
" <td>0.204955</td>\n", | |
" <td>-0.350327</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>you</th>\n", | |
" <td>0.302627</td>\n", | |
" <td>-0.056322</td>\n", | |
" <td>0.098500</td>\n", | |
" <td>-0.016854</td>\n", | |
" <td>-0.298733</td>\n", | |
" <td>0.293087</td>\n", | |
" <td>-0.110522</td>\n", | |
" <td>0.185406</td>\n", | |
" <td>0.022000</td>\n", | |
" <td>-0.256998</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.436992</td>\n", | |
" <td>-0.353025</td>\n", | |
" <td>0.228000</td>\n", | |
" <td>0.003869</td>\n", | |
" <td>0.044580</td>\n", | |
" <td>-0.016428</td>\n", | |
" <td>0.173895</td>\n", | |
" <td>0.191892</td>\n", | |
" <td>0.656802</td>\n", | |
" <td>0.121159</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>icsrs</th>\n", | |
" <td>-0.670037</td>\n", | |
" <td>0.986876</td>\n", | |
" <td>-0.056592</td>\n", | |
" <td>-0.056073</td>\n", | |
" <td>-0.387325</td>\n", | |
" <td>-0.066606</td>\n", | |
" <td>0.444818</td>\n", | |
" <td>0.074839</td>\n", | |
" <td>-0.227907</td>\n", | |
" <td>-0.315928</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.192464</td>\n", | |
" <td>0.171606</td>\n", | |
" <td>0.282802</td>\n", | |
" <td>0.064755</td>\n", | |
" <td>-0.483003</td>\n", | |
" <td>0.597319</td>\n", | |
" <td>0.138253</td>\n", | |
" <td>-0.216816</td>\n", | |
" <td>0.173946</td>\n", | |
" <td>0.455001</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>globalization</th>\n", | |
" <td>-0.756844</td>\n", | |
" <td>0.164792</td>\n", | |
" <td>0.273173</td>\n", | |
" <td>0.456379</td>\n", | |
" <td>-0.530471</td>\n", | |
" <td>0.419227</td>\n", | |
" <td>-0.193192</td>\n", | |
" <td>0.011261</td>\n", | |
" <td>-0.318280</td>\n", | |
" <td>0.212006</td>\n", | |
" <td>...</td>\n", | |
" <td>0.035500</td>\n", | |
" <td>-0.218479</td>\n", | |
" <td>0.272757</td>\n", | |
" <td>-0.001142</td>\n", | |
" <td>-0.395881</td>\n", | |
" <td>-0.569329</td>\n", | |
" <td>0.261788</td>\n", | |
" <td>-0.411565</td>\n", | |
" <td>0.215041</td>\n", | |
" <td>-0.138835</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>number_of</th>\n", | |
" <td>-0.086318</td>\n", | |
" <td>-0.028921</td>\n", | |
" <td>0.600787</td>\n", | |
" <td>0.241617</td>\n", | |
" <td>-0.099482</td>\n", | |
" <td>-0.128321</td>\n", | |
" <td>-0.325093</td>\n", | |
" <td>-0.573600</td>\n", | |
" <td>-0.236761</td>\n", | |
" <td>-0.461566</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.505880</td>\n", | |
" <td>-0.347972</td>\n", | |
" <td>0.558945</td>\n", | |
" <td>-0.242274</td>\n", | |
" <td>-0.407655</td>\n", | |
" <td>0.265753</td>\n", | |
" <td>0.093912</td>\n", | |
" <td>-0.054711</td>\n", | |
" <td>0.028873</td>\n", | |
" <td>0.205457</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>colorectal_cancer</th>\n", | |
" <td>0.211972</td>\n", | |
" <td>0.017612</td>\n", | |
" <td>0.019982</td>\n", | |
" <td>0.036653</td>\n", | |
" <td>-0.599313</td>\n", | |
" <td>-0.387482</td>\n", | |
" <td>-0.131744</td>\n", | |
" <td>0.319752</td>\n", | |
" <td>-0.584418</td>\n", | |
" <td>0.390028</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.138576</td>\n", | |
" <td>-0.715155</td>\n", | |
" <td>-0.281504</td>\n", | |
" <td>-0.004626</td>\n", | |
" <td>0.075382</td>\n", | |
" <td>-0.141210</td>\n", | |
" <td>0.241308</td>\n", | |
" <td>-0.086936</td>\n", | |
" <td>0.515117</td>\n", | |
" <td>-0.573498</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>forms</th>\n", | |
" <td>-0.145254</td>\n", | |
" <td>0.462194</td>\n", | |
" <td>-0.179167</td>\n", | |
" <td>-0.349078</td>\n", | |
" <td>0.021910</td>\n", | |
" <td>-0.228419</td>\n", | |
" <td>-0.323046</td>\n", | |
" <td>-0.177653</td>\n", | |
" <td>0.444248</td>\n", | |
" <td>0.253918</td>\n", | |
" <td>...</td>\n", | |
" <td>0.337017</td>\n", | |
" <td>0.195908</td>\n", | |
" <td>0.205079</td>\n", | |
" <td>-0.332613</td>\n", | |
" <td>-0.465560</td>\n", | |
" <td>0.348318</td>\n", | |
" <td>-0.214522</td>\n", | |
" <td>0.061973</td>\n", | |
" <td>-0.548650</td>\n", | |
" <td>-0.881712</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>history</th>\n", | |
" <td>-0.277609</td>\n", | |
" <td>-0.151912</td>\n", | |
" <td>-0.314123</td>\n", | |
" <td>0.367317</td>\n", | |
" <td>-0.522343</td>\n", | |
" <td>-0.093235</td>\n", | |
" <td>-0.059787</td>\n", | |
" <td>0.542382</td>\n", | |
" <td>-0.681108</td>\n", | |
" <td>-0.108222</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.355341</td>\n", | |
" <td>-0.190357</td>\n", | |
" <td>0.106477</td>\n", | |
" <td>-0.024125</td>\n", | |
" <td>-0.065548</td>\n", | |
" <td>-0.226189</td>\n", | |
" <td>0.665346</td>\n", | |
" <td>-0.475630</td>\n", | |
" <td>0.857476</td>\n", | |
" <td>-0.289213</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>keep</th>\n", | |
" <td>0.140956</td>\n", | |
" <td>-0.316210</td>\n", | |
" <td>-0.067040</td>\n", | |
" <td>-0.046209</td>\n", | |
" <td>-0.539977</td>\n", | |
" <td>-0.114015</td>\n", | |
" <td>-0.455139</td>\n", | |
" <td>-0.436040</td>\n", | |
" <td>0.050257</td>\n", | |
" <td>-0.290427</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.486789</td>\n", | |
" <td>0.348746</td>\n", | |
" <td>0.011119</td>\n", | |
" <td>-0.118036</td>\n", | |
" <td>-0.721557</td>\n", | |
" <td>-0.051965</td>\n", | |
" <td>-0.094726</td>\n", | |
" <td>-0.511012</td>\n", | |
" <td>-0.227344</td>\n", | |
" <td>0.279749</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2013</th>\n", | |
" <td>0.137664</td>\n", | |
" <td>0.315705</td>\n", | |
" <td>0.389434</td>\n", | |
" <td>0.640446</td>\n", | |
" <td>-0.238969</td>\n", | |
" <td>0.887329</td>\n", | |
" <td>-0.377709</td>\n", | |
" <td>-0.020824</td>\n", | |
" <td>-0.455942</td>\n", | |
" <td>-0.011407</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.537258</td>\n", | |
" <td>-0.044758</td>\n", | |
" <td>0.682783</td>\n", | |
" <td>-0.247158</td>\n", | |
" <td>-0.568316</td>\n", | |
" <td>-0.301633</td>\n", | |
" <td>-0.068568</td>\n", | |
" <td>-0.222043</td>\n", | |
" <td>0.071884</td>\n", | |
" <td>-0.268367</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>international_programs</th>\n", | |
" <td>-0.424657</td>\n", | |
" <td>-0.083993</td>\n", | |
" <td>0.253733</td>\n", | |
" <td>0.188070</td>\n", | |
" <td>-0.617051</td>\n", | |
" <td>0.055949</td>\n", | |
" <td>-0.939139</td>\n", | |
" <td>0.422482</td>\n", | |
" <td>-0.673611</td>\n", | |
" <td>0.270475</td>\n", | |
" <td>...</td>\n", | |
" <td>0.057390</td>\n", | |
" <td>-0.171511</td>\n", | |
" <td>-0.008584</td>\n", | |
" <td>-0.142159</td>\n", | |
" <td>-0.160608</td>\n", | |
" <td>-0.377406</td>\n", | |
" <td>-0.228351</td>\n", | |
" <td>-0.449024</td>\n", | |
" <td>0.464816</td>\n", | |
" <td>0.046855</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>what_'_new</th>\n", | |
" <td>0.153876</td>\n", | |
" <td>0.367768</td>\n", | |
" <td>0.067474</td>\n", | |
" <td>0.032732</td>\n", | |
" <td>0.033835</td>\n", | |
" <td>0.041480</td>\n", | |
" <td>-0.223525</td>\n", | |
" <td>-0.173208</td>\n", | |
" <td>0.370511</td>\n", | |
" <td>-0.040983</td>\n", | |
" <td>...</td>\n", | |
" <td>0.056626</td>\n", | |
" <td>0.079707</td>\n", | |
" <td>0.237721</td>\n", | |
" <td>-0.176131</td>\n", | |
" <td>0.144763</td>\n", | |
" <td>0.258459</td>\n", | |
" <td>-0.087715</td>\n", | |
" <td>-0.342799</td>\n", | |
" <td>-0.259102</td>\n", | |
" <td>-0.649807</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>control</th>\n", | |
" <td>-0.860089</td>\n", | |
" <td>-0.127252</td>\n", | |
" <td>0.116539</td>\n", | |
" <td>-0.016179</td>\n", | |
" <td>-0.233742</td>\n", | |
" <td>-0.291944</td>\n", | |
" <td>0.107615</td>\n", | |
" <td>-0.219637</td>\n", | |
" <td>-0.367005</td>\n", | |
" <td>-0.194711</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.532837</td>\n", | |
" <td>0.247479</td>\n", | |
" <td>0.456738</td>\n", | |
" <td>-0.616877</td>\n", | |
" <td>-0.442985</td>\n", | |
" <td>-0.194895</td>\n", | |
" <td>-0.007769</td>\n", | |
" <td>-0.274928</td>\n", | |
" <td>0.194839</td>\n", | |
" <td>0.486809</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>fdasia</th>\n", | |
" <td>-0.347185</td>\n", | |
" <td>0.586335</td>\n", | |
" <td>0.366932</td>\n", | |
" <td>-0.271150</td>\n", | |
" <td>-0.153863</td>\n", | |
" <td>-0.034427</td>\n", | |
" <td>-0.097134</td>\n", | |
" <td>-0.431051</td>\n", | |
" <td>-0.191800</td>\n", | |
" <td>0.021186</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.208896</td>\n", | |
" <td>-0.440354</td>\n", | |
" <td>-0.060221</td>\n", | |
" <td>-0.166022</td>\n", | |
" <td>0.100170</td>\n", | |
" <td>1.181485</td>\n", | |
" <td>-0.529689</td>\n", | |
" <td>0.140017</td>\n", | |
" <td>-0.067644</td>\n", | |
" <td>-0.222800</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>i</th>\n", | |
" <td>0.391899</td>\n", | |
" <td>0.633526</td>\n", | |
" <td>0.268310</td>\n", | |
" <td>-0.213539</td>\n", | |
" <td>0.076514</td>\n", | |
" <td>0.411798</td>\n", | |
" <td>-0.359650</td>\n", | |
" <td>-0.234105</td>\n", | |
" <td>-0.250011</td>\n", | |
" <td>-0.058321</td>\n", | |
" <td>...</td>\n", | |
" <td>0.299877</td>\n", | |
" <td>0.097668</td>\n", | |
" <td>0.424918</td>\n", | |
" <td>-0.090243</td>\n", | |
" <td>-0.253589</td>\n", | |
" <td>0.198022</td>\n", | |
" <td>-0.582632</td>\n", | |
" <td>-0.021947</td>\n", | |
" <td>0.316409</td>\n", | |
" <td>-0.328948</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>cell</th>\n", | |
" <td>-0.155692</td>\n", | |
" <td>0.498993</td>\n", | |
" <td>0.231628</td>\n", | |
" <td>-0.347468</td>\n", | |
" <td>-0.589963</td>\n", | |
" <td>-0.039768</td>\n", | |
" <td>0.164418</td>\n", | |
" <td>0.318974</td>\n", | |
" <td>0.236614</td>\n", | |
" <td>0.151425</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.637283</td>\n", | |
" <td>0.042374</td>\n", | |
" <td>0.212400</td>\n", | |
" <td>-0.055624</td>\n", | |
" <td>-0.466727</td>\n", | |
" <td>0.389587</td>\n", | |
" <td>0.333547</td>\n", | |
" <td>-0.050657</td>\n", | |
" <td>0.168267</td>\n", | |
" <td>-0.069596</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>base</th>\n", | |
" <td>-0.579550</td>\n", | |
" <td>0.292780</td>\n", | |
" <td>0.053669</td>\n", | |
" <td>0.255007</td>\n", | |
" <td>-0.630248</td>\n", | |
" <td>-0.252487</td>\n", | |
" <td>0.366064</td>\n", | |
" <td>0.548158</td>\n", | |
" <td>0.037580</td>\n", | |
" <td>-0.515223</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.509152</td>\n", | |
" <td>0.127183</td>\n", | |
" <td>-0.063938</td>\n", | |
" <td>-0.237319</td>\n", | |
" <td>-0.305357</td>\n", | |
" <td>0.554057</td>\n", | |
" <td>-0.025749</td>\n", | |
" <td>0.192353</td>\n", | |
" <td>-0.126196</td>\n", | |
" <td>0.117023</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>sponsor</th>\n", | |
" <td>-0.099899</td>\n", | |
" <td>0.580624</td>\n", | |
" <td>0.164068</td>\n", | |
" <td>0.061469</td>\n", | |
" <td>0.197635</td>\n", | |
" <td>0.064082</td>\n", | |
" <td>-0.346589</td>\n", | |
" <td>0.380860</td>\n", | |
" <td>0.355099</td>\n", | |
" <td>-0.087534</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.502186</td>\n", | |
" <td>-0.234967</td>\n", | |
" <td>0.225882</td>\n", | |
" <td>-0.227946</td>\n", | |
" <td>-0.552372</td>\n", | |
" <td>0.056744</td>\n", | |
" <td>0.063037</td>\n", | |
" <td>-0.426578</td>\n", | |
" <td>-0.430068</td>\n", | |
" <td>-0.762406</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>manufacturer</th>\n", | |
" <td>-0.499727</td>\n", | |
" <td>0.115834</td>\n", | |
" <td>-0.246060</td>\n", | |
" <td>0.086174</td>\n", | |
" <td>-0.078136</td>\n", | |
" <td>-0.191147</td>\n", | |
" <td>-0.041554</td>\n", | |
" <td>0.192491</td>\n", | |
" <td>-0.091883</td>\n", | |
" <td>-0.377924</td>\n", | |
" <td>...</td>\n", | |
" <td>0.154475</td>\n", | |
" <td>0.140800</td>\n", | |
" <td>0.079055</td>\n", | |
" <td>-0.026018</td>\n", | |
" <td>-0.528376</td>\n", | |
" <td>-0.084371</td>\n", | |
" <td>0.177761</td>\n", | |
" <td>-0.195612</td>\n", | |
" <td>0.082558</td>\n", | |
" <td>-0.293565</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>after</th>\n", | |
" <td>0.219793</td>\n", | |
" <td>0.726680</td>\n", | |
" <td>-0.298105</td>\n", | |
" <td>-0.464775</td>\n", | |
" <td>0.228087</td>\n", | |
" <td>0.060263</td>\n", | |
" <td>-0.142558</td>\n", | |
" <td>-0.732037</td>\n", | |
" <td>-0.413099</td>\n", | |
" <td>-0.102423</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.297644</td>\n", | |
" <td>0.174359</td>\n", | |
" <td>0.395813</td>\n", | |
" <td>0.184923</td>\n", | |
" <td>-0.560124</td>\n", | |
" <td>0.143060</td>\n", | |
" <td>-0.279353</td>\n", | |
" <td>0.086654</td>\n", | |
" <td>-0.203589</td>\n", | |
" <td>0.021547</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>radiation_emitting_products</th>\n", | |
" <td>-0.517896</td>\n", | |
" <td>0.345238</td>\n", | |
" <td>0.152520</td>\n", | |
" <td>0.355690</td>\n", | |
" <td>0.226520</td>\n", | |
" <td>0.261320</td>\n", | |
" <td>-0.123914</td>\n", | |
" <td>0.186653</td>\n", | |
" <td>0.177671</td>\n", | |
" <td>0.259878</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.435348</td>\n", | |
" <td>0.277482</td>\n", | |
" <td>-0.301940</td>\n", | |
" <td>-0.322466</td>\n", | |
" <td>-0.066306</td>\n", | |
" <td>-0.245094</td>\n", | |
" <td>-0.696399</td>\n", | |
" <td>-0.589838</td>\n", | |
" <td>-0.056358</td>\n", | |
" <td>-0.468397</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>reports</th>\n", | |
" <td>-0.819913</td>\n", | |
" <td>0.672922</td>\n", | |
" <td>0.118040</td>\n", | |
" <td>-0.208243</td>\n", | |
" <td>-0.358976</td>\n", | |
" <td>-0.218042</td>\n", | |
" <td>0.023756</td>\n", | |
" <td>-0.087797</td>\n", | |
" <td>-0.087497</td>\n", | |
" <td>-0.167597</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.016510</td>\n", | |
" <td>0.224201</td>\n", | |
" <td>0.714018</td>\n", | |
" <td>-0.067980</td>\n", | |
" <td>0.026771</td>\n", | |
" <td>0.385963</td>\n", | |
" <td>-0.262126</td>\n", | |
" <td>-0.142826</td>\n", | |
" <td>-0.066285</td>\n", | |
" <td>-0.161707</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>propose</th>\n", | |
" <td>-0.291297</td>\n", | |
" <td>0.065714</td>\n", | |
" <td>0.231442</td>\n", | |
" <td>-0.173887</td>\n", | |
" <td>-0.794620</td>\n", | |
" <td>-0.301345</td>\n", | |
" <td>0.312986</td>\n", | |
" <td>0.188217</td>\n", | |
" <td>-0.334185</td>\n", | |
" <td>-0.069473</td>\n", | |
" <td>...</td>\n", | |
" <td>0.026771</td>\n", | |
" <td>0.273053</td>\n", | |
" <td>0.969158</td>\n", | |
" <td>-0.517199</td>\n", | |
" <td>0.173537</td>\n", | |
" <td>-0.233968</td>\n", | |
" <td>-0.628791</td>\n", | |
" <td>-1.071669</td>\n", | |
" <td>0.045059</td>\n", | |
" <td>-0.019422</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>state</th>\n", | |
" <td>-0.194972</td>\n", | |
" <td>-0.334247</td>\n", | |
" <td>-0.294576</td>\n", | |
" <td>-0.301573</td>\n", | |
" <td>-0.745521</td>\n", | |
" <td>-0.365826</td>\n", | |
" <td>0.323370</td>\n", | |
" <td>0.032289</td>\n", | |
" <td>0.044034</td>\n", | |
" <td>0.208383</td>\n", | |
" <td>...</td>\n", | |
" <td>0.195559</td>\n", | |
" <td>-0.146882</td>\n", | |
" <td>0.125041</td>\n", | |
" <td>0.036993</td>\n", | |
" <td>0.238543</td>\n", | |
" <td>-0.068296</td>\n", | |
" <td>-0.066709</td>\n", | |
" <td>-0.224307</td>\n", | |
" <td>-0.036184</td>\n", | |
" <td>-0.192618</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>good</th>\n", | |
" <td>0.107126</td>\n", | |
" <td>0.441641</td>\n", | |
" <td>0.234304</td>\n", | |
" <td>0.386480</td>\n", | |
" <td>0.083099</td>\n", | |
" <td>0.268010</td>\n", | |
" <td>-0.102117</td>\n", | |
" <td>-0.368875</td>\n", | |
" <td>-0.528338</td>\n", | |
" <td>-0.511297</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.223398</td>\n", | |
" <td>-0.090410</td>\n", | |
" <td>0.192321</td>\n", | |
" <td>-0.209002</td>\n", | |
" <td>-0.249478</td>\n", | |
" <td>0.310825</td>\n", | |
" <td>-0.500189</td>\n", | |
" <td>-0.544082</td>\n", | |
" <td>-0.050430</td>\n", | |
" <td>-0.113742</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>instruction</th>\n", | |
" <td>-0.087943</td>\n", | |
" <td>0.607522</td>\n", | |
" <td>-0.029160</td>\n", | |
" <td>-0.416139</td>\n", | |
" <td>-0.224984</td>\n", | |
" <td>-0.330316</td>\n", | |
" <td>0.617920</td>\n", | |
" <td>-0.185820</td>\n", | |
" <td>-0.237984</td>\n", | |
" <td>-0.320105</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.005201</td>\n", | |
" <td>0.224316</td>\n", | |
" <td>0.183818</td>\n", | |
" <td>0.411965</td>\n", | |
" <td>-0.583193</td>\n", | |
" <td>-0.007658</td>\n", | |
" <td>-0.223261</td>\n", | |
" <td>-0.108533</td>\n", | |
" <td>-0.170367</td>\n", | |
" <td>0.090651</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>pediatric</th>\n", | |
" <td>-0.544837</td>\n", | |
" <td>0.127466</td>\n", | |
" <td>0.021078</td>\n", | |
" <td>-0.018524</td>\n", | |
" <td>-0.358618</td>\n", | |
" <td>0.121592</td>\n", | |
" <td>-0.442547</td>\n", | |
" <td>0.141838</td>\n", | |
" <td>0.077958</td>\n", | |
" <td>0.225415</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.327372</td>\n", | |
" <td>0.183654</td>\n", | |
" <td>0.329552</td>\n", | |
" <td>-0.039640</td>\n", | |
" <td>-0.260104</td>\n", | |
" <td>-0.096961</td>\n", | |
" <td>-0.070001</td>\n", | |
" <td>-0.088493</td>\n", | |
" <td>-0.082268</td>\n", | |
" <td>-0.255420</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>assistance</th>\n", | |
" <td>0.063515</td>\n", | |
" <td>0.326815</td>\n", | |
" <td>-0.259475</td>\n", | |
" <td>0.223801</td>\n", | |
" <td>-0.038623</td>\n", | |
" <td>-0.022181</td>\n", | |
" <td>-0.480358</td>\n", | |
" <td>0.408379</td>\n", | |
" <td>-0.095099</td>\n", | |
" <td>0.316031</td>\n", | |
" <td>...</td>\n", | |
" <td>0.201947</td>\n", | |
" <td>-0.032633</td>\n", | |
" <td>-0.299168</td>\n", | |
" <td>-0.756430</td>\n", | |
" <td>-0.081868</td>\n", | |
" <td>-0.429563</td>\n", | |
" <td>-0.711387</td>\n", | |
" <td>-0.083787</td>\n", | |
" <td>-0.266346</td>\n", | |
" <td>0.450779</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>opportunity</th>\n", | |
" <td>0.026528</td>\n", | |
" <td>0.074498</td>\n", | |
" <td>0.156543</td>\n", | |
" <td>0.154675</td>\n", | |
" <td>0.007096</td>\n", | |
" <td>0.282774</td>\n", | |
" <td>-0.243165</td>\n", | |
" <td>0.424377</td>\n", | |
" <td>-0.482240</td>\n", | |
" <td>0.333609</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.065741</td>\n", | |
" <td>-0.501121</td>\n", | |
" <td>0.519100</td>\n", | |
" <td>-0.054072</td>\n", | |
" <td>-0.234008</td>\n", | |
" <td>-0.586764</td>\n", | |
" <td>-0.114451</td>\n", | |
" <td>-0.314732</td>\n", | |
" <td>0.258744</td>\n", | |
" <td>-0.287433</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>print</th>\n", | |
" <td>0.398673</td>\n", | |
" <td>0.049656</td>\n", | |
" <td>0.055799</td>\n", | |
" <td>-0.028932</td>\n", | |
" <td>-0.152425</td>\n", | |
" <td>0.034694</td>\n", | |
" <td>-0.088721</td>\n", | |
" <td>-0.066744</td>\n", | |
" <td>-0.448654</td>\n", | |
" <td>-0.380830</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.095950</td>\n", | |
" <td>-0.211475</td>\n", | |
" <td>0.155255</td>\n", | |
" <td>0.350879</td>\n", | |
" <td>-0.153447</td>\n", | |
" <td>0.106156</td>\n", | |
" <td>-0.424197</td>\n", | |
" <td>-0.484203</td>\n", | |
" <td>0.269765</td>\n", | |
" <td>-0.559124</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>communication</th>\n", | |
" <td>-0.084541</td>\n", | |
" <td>-0.120396</td>\n", | |
" <td>-0.184933</td>\n", | |
" <td>0.563721</td>\n", | |
" <td>-0.253147</td>\n", | |
" <td>-0.255660</td>\n", | |
" <td>-0.603447</td>\n", | |
" <td>0.375056</td>\n", | |
" <td>-0.396814</td>\n", | |
" <td>0.228875</td>\n", | |
" <td>...</td>\n", | |
" <td>0.200097</td>\n", | |
" <td>0.108296</td>\n", | |
" <td>0.123390</td>\n", | |
" <td>-0.680676</td>\n", | |
" <td>-0.443950</td>\n", | |
" <td>0.020271</td>\n", | |
" <td>-0.292703</td>\n", | |
" <td>-0.072843</td>\n", | |
" <td>-0.289178</td>\n", | |
" <td>0.099642</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>registration</th>\n", | |
" <td>-0.081540</td>\n", | |
" <td>0.165071</td>\n", | |
" <td>0.312197</td>\n", | |
" <td>-0.411570</td>\n", | |
" <td>0.002028</td>\n", | |
" <td>-0.123760</td>\n", | |
" <td>-0.441877</td>\n", | |
" <td>0.222860</td>\n", | |
" <td>0.245775</td>\n", | |
" <td>-0.002378</td>\n", | |
" <td>...</td>\n", | |
" <td>0.212378</td>\n", | |
" <td>-0.000226</td>\n", | |
" <td>-0.199470</td>\n", | |
" <td>-0.170622</td>\n", | |
" <td>0.001819</td>\n", | |
" <td>0.271526</td>\n", | |
" <td>-0.584541</td>\n", | |
" <td>-0.156514</td>\n", | |
" <td>0.128360</td>\n", | |
" <td>-0.626579</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>342 rows × 100 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0 1 2 3 4 \\\n", | |
"and -0.100746 -0.138351 -0.065297 0.005219 -0.064796 \n", | |
"the -0.018843 -0.254019 0.248716 -0.190241 -0.161016 \n", | |
"fda 0.077440 -0.207896 0.269897 0.162306 -0.050462 \n", | |
"to -0.313661 -0.045335 0.089928 -0.039525 -0.035883 \n", | |
"of -0.001229 0.145161 -0.044128 0.021164 -0.402606 \n", | |
"for 0.042443 0.284259 -0.266208 0.018548 -0.168083 \n", | |
"be 0.123699 -0.042003 -0.086108 -0.259822 -0.393856 \n", | |
"a -0.141158 -0.072698 0.483275 -0.025337 -0.195636 \n", | |
"in 0.117567 -0.305639 0.376731 -0.012498 -0.344085 \n", | |
"information 0.205563 0.389818 -0.072361 0.305935 -0.253689 \n", | |
"on -0.447642 0.087927 0.101921 -0.095423 -0.244817 \n", | |
"drug -0.360087 0.276967 -0.217438 -0.554882 0.135724 \n", | |
"or 0.151288 0.054059 -0.286732 -0.172177 -0.458850 \n", | |
"food 0.397796 -0.316026 -0.421999 -0.214100 -0.304859 \n", | |
"safety 0.487511 -0.051075 -0.222669 -0.172921 0.134228 \n", | |
"product 0.006890 0.065497 -0.016135 -0.207994 0.357020 \n", | |
"that -0.214687 -0.350257 0.036187 0.269608 -0.176686 \n", | |
"with -0.130633 0.080453 0.136298 0.194036 -0.113387 \n", | |
"amp 0.362384 0.076609 0.261473 -0.177449 -0.162025 \n", | |
"about 0.114440 0.045978 -0.097556 -0.140161 -0.015015 \n", | |
"report -0.313987 0.244142 -0.004940 -0.598828 0.134250 \n", | |
"by -0.095231 0.569164 -0.214255 0.067282 -0.362989 \n", | |
"use -0.421068 -0.043560 -0.198231 -0.648959 0.036279 \n", | |
"health -0.326454 0.373163 -0.061973 -0.024176 -0.419274 \n", | |
"from -0.181439 0.015173 0.195936 -0.076520 -0.253937 \n", | |
"have -0.249859 0.093082 -0.043083 -0.118076 -0.468669 \n", | |
"program -0.338854 0.377686 -0.018415 0.285381 0.062863 \n", | |
"include -0.123905 0.013977 0.036592 0.352391 0.263708 \n", | |
"at -0.729614 -0.058433 0.402095 0.190323 -0.091917 \n", | |
"you 0.302627 -0.056322 0.098500 -0.016854 -0.298733 \n", | |
"... ... ... ... ... ... \n", | |
"icsrs -0.670037 0.986876 -0.056592 -0.056073 -0.387325 \n", | |
"globalization -0.756844 0.164792 0.273173 0.456379 -0.530471 \n", | |
"number_of -0.086318 -0.028921 0.600787 0.241617 -0.099482 \n", | |
"colorectal_cancer 0.211972 0.017612 0.019982 0.036653 -0.599313 \n", | |
"forms -0.145254 0.462194 -0.179167 -0.349078 0.021910 \n", | |
"history -0.277609 -0.151912 -0.314123 0.367317 -0.522343 \n", | |
"keep 0.140956 -0.316210 -0.067040 -0.046209 -0.539977 \n", | |
"2013 0.137664 0.315705 0.389434 0.640446 -0.238969 \n", | |
"international_programs -0.424657 -0.083993 0.253733 0.188070 -0.617051 \n", | |
"what_'_new 0.153876 0.367768 0.067474 0.032732 0.033835 \n", | |
"control -0.860089 -0.127252 0.116539 -0.016179 -0.233742 \n", | |
"fdasia -0.347185 0.586335 0.366932 -0.271150 -0.153863 \n", | |
"i 0.391899 0.633526 0.268310 -0.213539 0.076514 \n", | |
"cell -0.155692 0.498993 0.231628 -0.347468 -0.589963 \n", | |
"base -0.579550 0.292780 0.053669 0.255007 -0.630248 \n", | |
"sponsor -0.099899 0.580624 0.164068 0.061469 0.197635 \n", | |
"manufacturer -0.499727 0.115834 -0.246060 0.086174 -0.078136 \n", | |
"after 0.219793 0.726680 -0.298105 -0.464775 0.228087 \n", | |
"radiation_emitting_products -0.517896 0.345238 0.152520 0.355690 0.226520 \n", | |
"reports -0.819913 0.672922 0.118040 -0.208243 -0.358976 \n", | |
"propose -0.291297 0.065714 0.231442 -0.173887 -0.794620 \n", | |
"state -0.194972 -0.334247 -0.294576 -0.301573 -0.745521 \n", | |
"good 0.107126 0.441641 0.234304 0.386480 0.083099 \n", | |
"instruction -0.087943 0.607522 -0.029160 -0.416139 -0.224984 \n", | |
"pediatric -0.544837 0.127466 0.021078 -0.018524 -0.358618 \n", | |
"assistance 0.063515 0.326815 -0.259475 0.223801 -0.038623 \n", | |
"opportunity 0.026528 0.074498 0.156543 0.154675 0.007096 \n", | |
"print 0.398673 0.049656 0.055799 -0.028932 -0.152425 \n", | |
"communication -0.084541 -0.120396 -0.184933 0.563721 -0.253147 \n", | |
"registration -0.081540 0.165071 0.312197 -0.411570 0.002028 \n", | |
"\n", | |
" 5 6 7 8 9 \\\n", | |
"and 0.121452 0.006101 0.318425 -0.126467 -0.380278 \n", | |
"the 0.074082 -0.170656 -0.169396 0.166634 -0.168215 \n", | |
"fda 0.209877 -0.053269 0.277357 -0.344069 -0.206898 \n", | |
"to 0.346876 0.005409 0.336155 -0.142290 -0.306906 \n", | |
"of -0.152231 -0.153400 0.136081 0.073617 -0.055965 \n", | |
"for 0.183761 -0.005107 0.149055 0.020657 -0.257668 \n", | |
"be 0.329685 -0.109405 -0.234873 -0.156663 -0.137926 \n", | |
"a 0.584957 -0.140417 0.080549 -0.007860 -0.360949 \n", | |
"in 0.148888 0.235434 0.266651 0.153096 -0.142239 \n", | |
"information 0.419405 -0.259644 -0.377517 0.043917 -0.027382 \n", | |
"on 0.300095 -0.165446 0.435071 0.079664 -0.477322 \n", | |
"drug 0.015508 -0.177810 0.301658 0.196800 0.180682 \n", | |
"or 0.540072 -0.026338 0.591284 0.046338 -0.475921 \n", | |
"food -0.145224 -0.686528 0.363681 0.023887 0.143341 \n", | |
"safety -0.374469 0.048707 0.131852 -0.418310 -0.020593 \n", | |
"product -0.002370 -0.011452 0.155764 -0.054490 -0.525449 \n", | |
"that 0.147215 0.151621 0.378561 0.029806 -0.233842 \n", | |
"with -0.332066 -0.186781 0.441635 -0.019013 0.180890 \n", | |
"amp 0.036550 -0.382035 0.428040 -0.032117 -0.134149 \n", | |
"about -0.540265 -0.320632 -0.008932 0.230474 -0.181113 \n", | |
"report -0.183933 -0.016496 -0.228413 -0.393799 -0.139182 \n", | |
"by 0.147236 0.174758 0.113209 -0.363191 -0.561365 \n", | |
"use -0.229710 -0.432814 0.224305 0.303759 -0.281524 \n", | |
"health 0.268802 -0.177778 0.145061 -0.462755 0.480433 \n", | |
"from -0.256348 -0.405886 0.271998 -0.486178 -0.553072 \n", | |
"have 0.090478 0.309928 -0.359987 0.090183 0.141688 \n", | |
"program 0.159740 -0.224956 0.124830 0.292951 0.535436 \n", | |
"include 0.087568 -0.284553 -0.029643 0.024506 0.081021 \n", | |
"at 0.355921 -0.238918 -0.038873 -0.285274 -0.863475 \n", | |
"you 0.293087 -0.110522 0.185406 0.022000 -0.256998 \n", | |
"... ... ... ... ... ... \n", | |
"icsrs -0.066606 0.444818 0.074839 -0.227907 -0.315928 \n", | |
"globalization 0.419227 -0.193192 0.011261 -0.318280 0.212006 \n", | |
"number_of -0.128321 -0.325093 -0.573600 -0.236761 -0.461566 \n", | |
"colorectal_cancer -0.387482 -0.131744 0.319752 -0.584418 0.390028 \n", | |
"forms -0.228419 -0.323046 -0.177653 0.444248 0.253918 \n", | |
"history -0.093235 -0.059787 0.542382 -0.681108 -0.108222 \n", | |
"keep -0.114015 -0.455139 -0.436040 0.050257 -0.290427 \n", | |
"2013 0.887329 -0.377709 -0.020824 -0.455942 -0.011407 \n", | |
"international_programs 0.055949 -0.939139 0.422482 -0.673611 0.270475 \n", | |
"what_'_new 0.041480 -0.223525 -0.173208 0.370511 -0.040983 \n", | |
"control -0.291944 0.107615 -0.219637 -0.367005 -0.194711 \n", | |
"fdasia -0.034427 -0.097134 -0.431051 -0.191800 0.021186 \n", | |
"i 0.411798 -0.359650 -0.234105 -0.250011 -0.058321 \n", | |
"cell -0.039768 0.164418 0.318974 0.236614 0.151425 \n", | |
"base -0.252487 0.366064 0.548158 0.037580 -0.515223 \n", | |
"sponsor 0.064082 -0.346589 0.380860 0.355099 -0.087534 \n", | |
"manufacturer -0.191147 -0.041554 0.192491 -0.091883 -0.377924 \n", | |
"after 0.060263 -0.142558 -0.732037 -0.413099 -0.102423 \n", | |
"radiation_emitting_products 0.261320 -0.123914 0.186653 0.177671 0.259878 \n", | |
"reports -0.218042 0.023756 -0.087797 -0.087497 -0.167597 \n", | |
"propose -0.301345 0.312986 0.188217 -0.334185 -0.069473 \n", | |
"state -0.365826 0.323370 0.032289 0.044034 0.208383 \n", | |
"good 0.268010 -0.102117 -0.368875 -0.528338 -0.511297 \n", | |
"instruction -0.330316 0.617920 -0.185820 -0.237984 -0.320105 \n", | |
"pediatric 0.121592 -0.442547 0.141838 0.077958 0.225415 \n", | |
"assistance -0.022181 -0.480358 0.408379 -0.095099 0.316031 \n", | |
"opportunity 0.282774 -0.243165 0.424377 -0.482240 0.333609 \n", | |
"print 0.034694 -0.088721 -0.066744 -0.448654 -0.380830 \n", | |
"communication -0.255660 -0.603447 0.375056 -0.396814 0.228875 \n", | |
"registration -0.123760 -0.441877 0.222860 0.245775 -0.002378 \n", | |
"\n", | |
" ... 90 91 92 93 \\\n", | |
"and ... 0.096713 -0.085560 0.658379 -0.261223 \n", | |
"the ... -0.044964 -0.344766 0.553132 -0.360908 \n", | |
"fda ... -0.087527 -0.177071 0.737522 -0.076948 \n", | |
"to ... 0.022846 -0.148466 0.072798 -0.228918 \n", | |
"of ... -0.155064 -0.245643 0.417786 -0.173233 \n", | |
"for ... -0.319383 0.010611 0.213731 0.004790 \n", | |
"be ... -0.382407 -0.409366 0.301657 -0.129887 \n", | |
"a ... 0.057271 -0.366956 0.507431 -0.140168 \n", | |
"in ... -0.518270 0.122086 0.506006 -0.362484 \n", | |
"information ... 0.128690 -0.172287 0.063882 0.148540 \n", | |
"on ... -0.127808 -0.369725 0.878684 -0.102216 \n", | |
"drug ... -0.241992 -0.422613 -0.096184 -0.014281 \n", | |
"or ... -0.236732 -0.054929 0.108159 0.056314 \n", | |
"food ... 0.473187 -0.202776 0.423708 0.389340 \n", | |
"safety ... 0.291371 0.527957 0.115536 -0.268350 \n", | |
"product ... -0.644613 -0.294260 0.122726 -0.413096 \n", | |
"that ... -0.469595 -0.112579 0.075520 -0.171400 \n", | |
"with ... -0.078067 0.235780 -0.418466 -0.138317 \n", | |
"amp ... -0.468903 0.374384 0.595388 0.146841 \n", | |
"about ... -0.380960 -0.294769 0.287081 0.035232 \n", | |
"report ... 0.386709 0.627971 0.437185 -0.328031 \n", | |
"by ... 0.161771 -0.400323 0.641776 -0.247879 \n", | |
"use ... 0.057442 0.054344 0.399492 0.155429 \n", | |
"health ... -0.356257 0.055684 -0.239780 -0.022977 \n", | |
"from ... -0.282554 0.349735 0.305131 0.021239 \n", | |
"have ... -0.459824 -0.281754 -0.019800 0.209448 \n", | |
"program ... 0.119695 -0.123484 0.443025 -0.173975 \n", | |
"include ... -0.357194 -0.250013 0.086160 -0.073455 \n", | |
"at ... 0.142590 0.143932 0.142247 -0.402303 \n", | |
"you ... -0.436992 -0.353025 0.228000 0.003869 \n", | |
"... ... ... ... ... ... \n", | |
"icsrs ... -0.192464 0.171606 0.282802 0.064755 \n", | |
"globalization ... 0.035500 -0.218479 0.272757 -0.001142 \n", | |
"number_of ... -0.505880 -0.347972 0.558945 -0.242274 \n", | |
"colorectal_cancer ... -0.138576 -0.715155 -0.281504 -0.004626 \n", | |
"forms ... 0.337017 0.195908 0.205079 -0.332613 \n", | |
"history ... -0.355341 -0.190357 0.106477 -0.024125 \n", | |
"keep ... -0.486789 0.348746 0.011119 -0.118036 \n", | |
"2013 ... -0.537258 -0.044758 0.682783 -0.247158 \n", | |
"international_programs ... 0.057390 -0.171511 -0.008584 -0.142159 \n", | |
"what_'_new ... 0.056626 0.079707 0.237721 -0.176131 \n", | |
"control ... -0.532837 0.247479 0.456738 -0.616877 \n", | |
"fdasia ... -0.208896 -0.440354 -0.060221 -0.166022 \n", | |
"i ... 0.299877 0.097668 0.424918 -0.090243 \n", | |
"cell ... -0.637283 0.042374 0.212400 -0.055624 \n", | |
"base ... -0.509152 0.127183 -0.063938 -0.237319 \n", | |
"sponsor ... -0.502186 -0.234967 0.225882 -0.227946 \n", | |
"manufacturer ... 0.154475 0.140800 0.079055 -0.026018 \n", | |
"after ... -0.297644 0.174359 0.395813 0.184923 \n", | |
"radiation_emitting_products ... -0.435348 0.277482 -0.301940 -0.322466 \n", | |
"reports ... -0.016510 0.224201 0.714018 -0.067980 \n", | |
"propose ... 0.026771 0.273053 0.969158 -0.517199 \n", | |
"state ... 0.195559 -0.146882 0.125041 0.036993 \n", | |
"good ... -0.223398 -0.090410 0.192321 -0.209002 \n", | |
"instruction ... -0.005201 0.224316 0.183818 0.411965 \n", | |
"pediatric ... -0.327372 0.183654 0.329552 -0.039640 \n", | |
"assistance ... 0.201947 -0.032633 -0.299168 -0.756430 \n", | |
"opportunity ... -0.065741 -0.501121 0.519100 -0.054072 \n", | |
"print ... -0.095950 -0.211475 0.155255 0.350879 \n", | |
"communication ... 0.200097 0.108296 0.123390 -0.680676 \n", | |
"registration ... 0.212378 -0.000226 -0.199470 -0.170622 \n", | |
"\n", | |
" 94 95 96 97 98 \\\n", | |
"and -0.071542 -0.181564 -0.014139 -0.078411 -0.020709 \n", | |
"the -0.294987 0.203086 -0.141143 -0.366750 -0.245072 \n", | |
"fda -0.137575 -0.101029 -0.451010 0.197468 -0.017319 \n", | |
"to 0.020430 -0.064480 0.007564 -0.602445 -0.077429 \n", | |
"of 0.190563 -0.218247 -0.337335 -0.130735 0.017617 \n", | |
"for -0.263443 -0.361939 -0.567995 -0.588227 -0.011243 \n", | |
"be -0.080969 0.249293 0.010820 -0.530850 -0.094270 \n", | |
"a -0.306811 0.294925 -0.060049 -0.188385 -0.162945 \n", | |
"in -0.330909 -0.322330 -0.178481 -0.121798 0.535190 \n", | |
"information -0.184837 -0.053577 0.385220 -0.080125 -0.290900 \n", | |
"on -0.069740 -0.369469 -0.197006 -0.157897 0.202324 \n", | |
"drug -0.358708 0.127414 0.215810 -0.090970 -0.330258 \n", | |
"or -0.030552 0.006140 -0.018091 0.140386 -0.191139 \n", | |
"food -0.423926 -0.001822 -0.120035 -0.066002 0.069482 \n", | |
"safety -0.239712 -0.113702 -0.277039 -0.497297 -0.129665 \n", | |
"product 0.482507 0.260640 -0.114928 0.113158 -0.188181 \n", | |
"that -0.008943 0.028535 0.409268 -0.269048 -0.106427 \n", | |
"with -0.423310 -0.569176 0.115315 -0.749452 -0.364128 \n", | |
"amp 0.059823 -0.967745 -0.528763 -0.331435 0.111991 \n", | |
"about 0.108893 -0.547401 -0.028685 -0.301636 0.081251 \n", | |
"report -0.653785 -0.237967 -0.187325 -0.336905 -0.112999 \n", | |
"by -0.109704 0.250926 -0.616060 -0.687903 -0.012988 \n", | |
"use -0.440814 -0.308274 -0.043453 -0.346402 -0.212835 \n", | |
"health 0.290967 0.433374 0.002770 -0.305807 0.059909 \n", | |
"from -0.092425 0.469081 0.114366 -0.325147 0.707807 \n", | |
"have 0.184108 -0.272827 -0.095660 -0.142429 -0.179014 \n", | |
"program -0.095587 0.225543 0.099020 -0.280379 -0.131907 \n", | |
"include 0.343971 0.777451 -0.029254 -0.032943 -0.092698 \n", | |
"at -0.283580 -0.227817 0.224417 -0.174887 0.204955 \n", | |
"you 0.044580 -0.016428 0.173895 0.191892 0.656802 \n", | |
"... ... ... ... ... ... \n", | |
"icsrs -0.483003 0.597319 0.138253 -0.216816 0.173946 \n", | |
"globalization -0.395881 -0.569329 0.261788 -0.411565 0.215041 \n", | |
"number_of -0.407655 0.265753 0.093912 -0.054711 0.028873 \n", | |
"colorectal_cancer 0.075382 -0.141210 0.241308 -0.086936 0.515117 \n", | |
"forms -0.465560 0.348318 -0.214522 0.061973 -0.548650 \n", | |
"history -0.065548 -0.226189 0.665346 -0.475630 0.857476 \n", | |
"keep -0.721557 -0.051965 -0.094726 -0.511012 -0.227344 \n", | |
"2013 -0.568316 -0.301633 -0.068568 -0.222043 0.071884 \n", | |
"international_programs -0.160608 -0.377406 -0.228351 -0.449024 0.464816 \n", | |
"what_'_new 0.144763 0.258459 -0.087715 -0.342799 -0.259102 \n", | |
"control -0.442985 -0.194895 -0.007769 -0.274928 0.194839 \n", | |
"fdasia 0.100170 1.181485 -0.529689 0.140017 -0.067644 \n", | |
"i -0.253589 0.198022 -0.582632 -0.021947 0.316409 \n", | |
"cell -0.466727 0.389587 0.333547 -0.050657 0.168267 \n", | |
"base -0.305357 0.554057 -0.025749 0.192353 -0.126196 \n", | |
"sponsor -0.552372 0.056744 0.063037 -0.426578 -0.430068 \n", | |
"manufacturer -0.528376 -0.084371 0.177761 -0.195612 0.082558 \n", | |
"after -0.560124 0.143060 -0.279353 0.086654 -0.203589 \n", | |
"radiation_emitting_products -0.066306 -0.245094 -0.696399 -0.589838 -0.056358 \n", | |
"reports 0.026771 0.385963 -0.262126 -0.142826 -0.066285 \n", | |
"propose 0.173537 -0.233968 -0.628791 -1.071669 0.045059 \n", | |
"state 0.238543 -0.068296 -0.066709 -0.224307 -0.036184 \n", | |
"good -0.249478 0.310825 -0.500189 -0.544082 -0.050430 \n", | |
"instruction -0.583193 -0.007658 -0.223261 -0.108533 -0.170367 \n", | |
"pediatric -0.260104 -0.096961 -0.070001 -0.088493 -0.082268 \n", | |
"assistance -0.081868 -0.429563 -0.711387 -0.083787 -0.266346 \n", | |
"opportunity -0.234008 -0.586764 -0.114451 -0.314732 0.258744 \n", | |
"print -0.153447 0.106156 -0.424197 -0.484203 0.269765 \n", | |
"communication -0.443950 0.020271 -0.292703 -0.072843 -0.289178 \n", | |
"registration 0.001819 0.271526 -0.584541 -0.156514 0.128360 \n", | |
"\n", | |
" 99 \n", | |
"and -0.003840 \n", | |
"the -0.123179 \n", | |
"fda 0.105784 \n", | |
"to -0.071003 \n", | |
"of -0.065531 \n", | |
"for 0.094405 \n", | |
"be -0.052313 \n", | |
"a -0.171217 \n", | |
"in -0.152664 \n", | |
"information -0.150123 \n", | |
"on -0.127013 \n", | |
"drug -0.244218 \n", | |
"or 0.004434 \n", | |
"food 0.010907 \n", | |
"safety 0.271672 \n", | |
"product -0.664799 \n", | |
"that 0.378756 \n", | |
"with 0.384155 \n", | |
"amp -0.294012 \n", | |
"about 0.108835 \n", | |
"report 0.312071 \n", | |
"by 0.731686 \n", | |
"use -0.432940 \n", | |
"health -0.072929 \n", | |
"from -0.181133 \n", | |
"have 0.024573 \n", | |
"program -0.017053 \n", | |
"include -0.255635 \n", | |
"at -0.350327 \n", | |
"you 0.121159 \n", | |
"... ... \n", | |
"icsrs 0.455001 \n", | |
"globalization -0.138835 \n", | |
"number_of 0.205457 \n", | |
"colorectal_cancer -0.573498 \n", | |
"forms -0.881712 \n", | |
"history -0.289213 \n", | |
"keep 0.279749 \n", | |
"2013 -0.268367 \n", | |
"international_programs 0.046855 \n", | |
"what_'_new -0.649807 \n", | |
"control 0.486809 \n", | |
"fdasia -0.222800 \n", | |
"i -0.328948 \n", | |
"cell -0.069596 \n", | |
"base 0.117023 \n", | |
"sponsor -0.762406 \n", | |
"manufacturer -0.293565 \n", | |
"after 0.021547 \n", | |
"radiation_emitting_products -0.468397 \n", | |
"reports -0.161707 \n", | |
"propose -0.019422 \n", | |
"state -0.192618 \n", | |
"good -0.113742 \n", | |
"instruction 0.090651 \n", | |
"pediatric -0.255420 \n", | |
"assistance 0.450779 \n", | |
"opportunity -0.287433 \n", | |
"print -0.559124 \n", | |
"communication 0.099642 \n", | |
"registration -0.626579 \n", | |
"\n", | |
"[342 rows x 100 columns]" | |
] | |
}, | |
"execution_count": 72, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# build a list of the terms, integer indices,\n", | |
"# and term counts from the desc2vec model vocabulary\n", | |
"ordered_vocab = [(term, voc.index, voc.count)\n", | |
"                 for term, voc in desc2vec.wv.vocab.items()]\n", | |
"\n", | |
"# sort by the term counts, so the most common terms appear first\n", | |
"ordered_vocab = sorted(ordered_vocab, key=lambda term_index: -term_index[2])\n", | |
"\n", | |
"# unzip the terms, integer indices, and counts into separate lists\n", | |
"ordered_terms, term_indices, term_counts = zip(*ordered_vocab)\n", | |
"\n", | |
"# create a DataFrame with the desc2vec vectors as data,\n", | |
"# and the terms as row labels\n", | |
"word_vectors = pd.DataFrame(desc2vec.wv.syn0[term_indices, :],\n", | |
"                            index=ordered_terms)\n", | |
"\n", | |
"word_vectors" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### So... what can we do with all these numbers?\n", | |
"The first thing we can use them for is to simply look up related words and phrases for a given term of interest." | |
] | |
}, | |
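As a rough illustration of what such a lookup does, the sketch below ranks terms by cosine similarity to a query term's vector. The three-dimensional vectors and the names `toy_vectors`, `cosine`, and `related_terms` are made up for this example; they are not values or functions from `desc2vec`, and gensim's own implementation may differ in detail.

```python
import math

# Hypothetical toy vectors for illustration only, NOT values from the trained model.
toy_vectors = {
    'tobacco':   [0.9, 0.1, 0.0],
    'cigarette': [0.8, 0.2, 0.1],
    'blood':     [0.0, 0.9, 0.3],
    'tissue':    [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def related_terms(token, vectors, topn=5):
    """Return the topn (term, similarity) pairs closest to token, excluding token itself."""
    query = vectors[token]
    scores = [(term, cosine(query, vec))
              for term, vec in vectors.items() if term != token]
    return sorted(scores, key=lambda pair: -pair[1])[:topn]


for term, similarity in related_terms('tobacco', toy_vectors, topn=2):
    print('{:20} {}'.format(term, round(similarity, 3)))
```

With these toy vectors, *cigarette* ranks first for *tobacco* simply because its vector points in nearly the same direction.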
{ | |
"cell_type": "code", | |
"execution_count": 73, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def get_related_terms(token, topn=5):\n", | |
"    \"\"\"\n", | |
"    look up the topn most similar terms to token\n", | |
"    and print them as a formatted list\n", | |
"    \"\"\"\n", | |
"\n", | |
"    for word, similarity in desc2vec.wv.most_similar(positive=[token], topn=topn):\n", | |
"\n", | |
"        print('{:20} {}'.format(word, round(similarity, 3)))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 74, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"international_programs 0.442\n", | |
"globalization 0.435\n", | |
"animal 0.396\n", | |
"u.s. 0.388\n", | |
"cosmetics 0.382\n" | |
] | |
} | |
], | |
"source": [ | |
"get_related_terms('health')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 75, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"cigarette 0.497\n", | |
"control 0.438\n", | |
"radiation 0.423\n", | |
"keep 0.406\n", | |
"video 0.404\n" | |
] | |
} | |
], | |
"source": [ | |
"get_related_terms('tobacco')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 76, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"cell 0.609\n", | |
"tissue 0.595\n", | |
"therapy 0.486\n", | |
"certain 0.472\n", | |
"biologics 0.461\n" | |
] | |
} | |
], | |
"source": [ | |
"get_related_terms('blood')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 77, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"control 0.532\n", | |
"medwatch 0.491\n", | |
"radiation_emitting_products 0.485\n", | |
"reports 0.476\n", | |
"inspection 0.468\n", | |
"electronic 0.443\n", | |
"assistance 0.433\n", | |
"submit 0.429\n", | |
"tobacco 0.423\n", | |
"icsrs 0.412\n", | |
"online 0.41\n", | |
"updates 0.408\n", | |
"response 0.406\n", | |
"regulatory_information 0.399\n", | |
"datum 0.394\n", | |
"international_programs 0.394\n", | |
"manufacturer 0.389\n", | |
"state 0.383\n", | |
"innovation 0.376\n", | |
"medical 0.375\n" | |
] | |
} | |
], | |
"source": [ | |
"get_related_terms('radiation', topn=20)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 78, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"treat 0.534\n", | |
"history 0.473\n", | |
"colorectal_cancer 0.467\n", | |
"treatment 0.437\n", | |
"clinical_trial 0.428\n", | |
"first 0.412\n", | |
"approve 0.411\n", | |
"certain 0.405\n", | |
"april 0.399\n", | |
"u.s. 0.393\n", | |
"late 0.39\n", | |
"risk 0.389\n", | |
"february 0.385\n", | |
"before 0.384\n", | |
"m.d. 0.38\n", | |
"national 0.376\n", | |
"opportunity 0.376\n", | |
"more_than 0.374\n", | |
"commissioner 0.374\n", | |
"test 0.371\n" | |
] | |
} | |
], | |
"source": [ | |
"get_related_terms('disease', topn=20)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Word algebra!\n", | |
"No self-respecting word2vec demo would be complete without a healthy dose of *word algebra*, also known as *analogy completion*.\n", | |
"\n", | |
"The core idea is that once words are represented as numerical vectors, you can do math with them. The mathematical procedure goes like this:\n", | |
"1. Provide a set of words or phrases that you'd like to add or subtract.\n", | |
"1. Look up the vectors that represent those terms in the word vector model.\n", | |
"1. Add and subtract those vectors to produce a new, combined vector.\n", | |
"1. Look up the most similar vector(s) to this new, combined vector via cosine similarity.\n", | |
"1. Return the word(s) associated with the similar vector(s)." | |
] | |
}, | |
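The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the two-dimensional vectors and the names `toy_vectors`, `unit`, and `toy_word_algebra` are hypothetical and chosen so the arithmetic is easy to follow. The sketch unit-normalizes each vector before combining and excludes the input terms from the answer; gensim's `most_similar` is assumed to behave along similar lines, but this is not a description of its internals.

```python
import math

# Hypothetical 2-d toy vectors for illustration only, NOT values from the trained model.
toy_vectors = {
    'drug':      [1.0, 0.0],
    'tobacco':   [0.0, 1.0],
    'product':   [0.7, 0.7],
    'cigarette': [0.1, 0.9],
    'clinical':  [0.9, 0.1],
}


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def toy_word_algebra(vectors, add=(), subtract=(), topn=1):
    """Combine term vectors and return the topn most similar remaining terms."""
    dims = len(next(iter(vectors.values())))
    combined = [0.0] * dims

    # steps 2-3: look up each term and add or subtract its unit vector
    for sign, terms in ((1.0, add), (-1.0, subtract)):
        for term in terms:
            for i, a in enumerate(unit(vectors[term])):
                combined[i] += sign * a
    combined = unit(combined)

    # step 4: rank the remaining terms by cosine similarity to the combined vector
    excluded = set(add) | set(subtract)
    scores = [(term, sum(a * b for a, b in zip(combined, unit(vec))))
              for term, vec in vectors.items() if term not in excluded]

    # step 5: return the best-matching term(s)
    return sorted(scores, key=lambda pair: -pair[1])[:topn]


for term, similarity in toy_word_algebra(toy_vectors,
                                         add=['product', 'tobacco'],
                                         subtract=['drug']):
    print(term)
```

In this toy space, *product* + *tobacco* - *drug* lands closest to *cigarette*, because subtracting *drug* pulls the combined vector away from the first axis.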
{ | |
"cell_type": "code", | |
"execution_count": 79, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def word_algebra(add=None, subtract=None, topn=1):\n", | |
"    \"\"\"\n", | |
"    combine the vectors associated with the words provided\n", | |
"    in add= and subtract=, look up the topn most similar\n", | |
"    terms to the combined vector, and print the result(s)\n", | |
"    \"\"\"\n", | |
"    answers = desc2vec.wv.most_similar(positive=add or [],\n", | |
"                                       negative=subtract or [],\n", | |
"                                       topn=topn)\n", | |
"\n", | |
"    for term, similarity in answers:\n", | |
"        print(term)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 80, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"reports\n" | |
] | |
} | |
], | |
"source": [ | |
"word_algebra(add=['devices', 'radiation'])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 81, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"clinical\n" | |
] | |
} | |
], | |
"source": [ | |
"word_algebra(add=['new', 'drug'], subtract=['animal'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we're getting a bit more nuanced. Starting from *new* + *drug* and subtracting *animal*, the model returns *clinical*. One plausible reading is that, once the veterinary sense is removed, new drugs are most closely associated with clinical work. With a vocabulary of only 342 terms, treat results like this as suggestive rather than conclusive.\n", | |
"\n", | |
"What else?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 82, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"cigarette\n" | |
] | |
} | |
], | |
"source": [ | |
"word_algebra(add=['product', 'tobacco'], subtract=['drug'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Word Vector Visualization with t-SNE" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"[t-Distributed Stochastic Neighbor Embedding](https://lvdmaaten.github.io/publications/papers/JMLR_2008.pdf), or *t-SNE* for short, is a dimensionality reduction technique to assist with visualizing high-dimensional datasets. It attempts to map high-dimensional data onto a low two- or three-dimensional representation such that the relative distances between points are preserved as closely as possible in both high-dimensional and low-dimensional space.\n", | |
"\n", | |
"scikit-learn provides a convenient implementation of the t-SNE algorithm with its [TSNE](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) class." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 83, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.manifold import TSNE" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Our input for t-SNE will be the DataFrame of word vectors we created before. Let's first:\n", | |
"1. Drop stopwords — it's probably not too interesting to visualize *the*, *of*, *or*, and so on\n", | |
"1. Take only the 5,000 most frequent terms in the vocabulary — no need to visualize all ~50,000 terms right now." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 84, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"tsne_input = word_vectors.drop(spacy.en.stop_words.STOP_WORDS, errors='ignore')\n", | |
"tsne_input = tsne_input.head(1000)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 85, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" <th>1</th>\n", | |
" <th>2</th>\n", | |
" <th>3</th>\n", | |
" <th>4</th>\n", | |
" <th>5</th>\n", | |
" <th>6</th>\n", | |
" <th>7</th>\n", | |
" <th>8</th>\n", | |
" <th>9</th>\n", | |
" <th>...</th>\n", | |
" <th>90</th>\n", | |
" <th>91</th>\n", | |
" <th>92</th>\n", | |
" <th>93</th>\n", | |
" <th>94</th>\n", | |
" <th>95</th>\n", | |
" <th>96</th>\n", | |
" <th>97</th>\n", | |
" <th>98</th>\n", | |
" <th>99</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>fda</th>\n", | |
" <td>0.077440</td>\n", | |
" <td>-0.207896</td>\n", | |
" <td>0.269897</td>\n", | |
" <td>0.162306</td>\n", | |
" <td>-0.050462</td>\n", | |
" <td>0.209877</td>\n", | |
" <td>-0.053269</td>\n", | |
" <td>0.277357</td>\n", | |
" <td>-0.344069</td>\n", | |
" <td>-0.206898</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.087527</td>\n", | |
" <td>-0.177071</td>\n", | |
" <td>0.737522</td>\n", | |
" <td>-0.076948</td>\n", | |
" <td>-0.137575</td>\n", | |
" <td>-0.101029</td>\n", | |
" <td>-0.451010</td>\n", | |
" <td>0.197468</td>\n", | |
" <td>-0.017319</td>\n", | |
" <td>0.105784</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>information</th>\n", | |
" <td>0.205563</td>\n", | |
" <td>0.389818</td>\n", | |
" <td>-0.072361</td>\n", | |
" <td>0.305935</td>\n", | |
" <td>-0.253689</td>\n", | |
" <td>0.419405</td>\n", | |
" <td>-0.259644</td>\n", | |
" <td>-0.377517</td>\n", | |
" <td>0.043917</td>\n", | |
" <td>-0.027382</td>\n", | |
" <td>...</td>\n", | |
" <td>0.128690</td>\n", | |
" <td>-0.172287</td>\n", | |
" <td>0.063882</td>\n", | |
" <td>0.148540</td>\n", | |
" <td>-0.184837</td>\n", | |
" <td>-0.053577</td>\n", | |
" <td>0.385220</td>\n", | |
" <td>-0.080125</td>\n", | |
" <td>-0.290900</td>\n", | |
" <td>-0.150123</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>drug</th>\n", | |
" <td>-0.360087</td>\n", | |
" <td>0.276967</td>\n", | |
" <td>-0.217438</td>\n", | |
" <td>-0.554882</td>\n", | |
" <td>0.135724</td>\n", | |
" <td>0.015508</td>\n", | |
" <td>-0.177810</td>\n", | |
" <td>0.301658</td>\n", | |
" <td>0.196800</td>\n", | |
" <td>0.180682</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.241992</td>\n", | |
" <td>-0.422613</td>\n", | |
" <td>-0.096184</td>\n", | |
" <td>-0.014281</td>\n", | |
" <td>-0.358708</td>\n", | |
" <td>0.127414</td>\n", | |
" <td>0.215810</td>\n", | |
" <td>-0.090970</td>\n", | |
" <td>-0.330258</td>\n", | |
" <td>-0.244218</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>food</th>\n", | |
" <td>0.397796</td>\n", | |
" <td>-0.316026</td>\n", | |
" <td>-0.421999</td>\n", | |
" <td>-0.214100</td>\n", | |
" <td>-0.304859</td>\n", | |
" <td>-0.145224</td>\n", | |
" <td>-0.686528</td>\n", | |
" <td>0.363681</td>\n", | |
" <td>0.023887</td>\n", | |
" <td>0.143341</td>\n", | |
" <td>...</td>\n", | |
" <td>0.473187</td>\n", | |
" <td>-0.202776</td>\n", | |
" <td>0.423708</td>\n", | |
" <td>0.389340</td>\n", | |
" <td>-0.423926</td>\n", | |
" <td>-0.001822</td>\n", | |
" <td>-0.120035</td>\n", | |
" <td>-0.066002</td>\n", | |
" <td>0.069482</td>\n", | |
" <td>0.010907</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>safety</th>\n", | |
" <td>0.487511</td>\n", | |
" <td>-0.051075</td>\n", | |
" <td>-0.222669</td>\n", | |
" <td>-0.172921</td>\n", | |
" <td>0.134228</td>\n", | |
" <td>-0.374469</td>\n", | |
" <td>0.048707</td>\n", | |
" <td>0.131852</td>\n", | |
" <td>-0.418310</td>\n", | |
" <td>-0.020593</td>\n", | |
" <td>...</td>\n", | |
" <td>0.291371</td>\n", | |
" <td>0.527957</td>\n", | |
" <td>0.115536</td>\n", | |
" <td>-0.268350</td>\n", | |
" <td>-0.239712</td>\n", | |
" <td>-0.113702</td>\n", | |
" <td>-0.277039</td>\n", | |
" <td>-0.497297</td>\n", | |
" <td>-0.129665</td>\n", | |
" <td>0.271672</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows × 100 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0 1 2 3 4 5 \\\n", | |
"fda 0.077440 -0.207896 0.269897 0.162306 -0.050462 0.209877 \n", | |
"information 0.205563 0.389818 -0.072361 0.305935 -0.253689 0.419405 \n", | |
"drug -0.360087 0.276967 -0.217438 -0.554882 0.135724 0.015508 \n", | |
"food 0.397796 -0.316026 -0.421999 -0.214100 -0.304859 -0.145224 \n", | |
"safety 0.487511 -0.051075 -0.222669 -0.172921 0.134228 -0.374469 \n", | |
"\n", | |
" 6 7 8 9 ... 90 \\\n", | |
"fda -0.053269 0.277357 -0.344069 -0.206898 ... -0.087527 \n", | |
"information -0.259644 -0.377517 0.043917 -0.027382 ... 0.128690 \n", | |
"drug -0.177810 0.301658 0.196800 0.180682 ... -0.241992 \n", | |
"food -0.686528 0.363681 0.023887 0.143341 ... 0.473187 \n", | |
"safety 0.048707 0.131852 -0.418310 -0.020593 ... 0.291371 \n", | |
"\n", | |
" 91 92 93 94 95 96 \\\n", | |
"fda -0.177071 0.737522 -0.076948 -0.137575 -0.101029 -0.451010 \n", | |
"information -0.172287 0.063882 0.148540 -0.184837 -0.053577 0.385220 \n", | |
"drug -0.422613 -0.096184 -0.014281 -0.358708 0.127414 0.215810 \n", | |
"food -0.202776 0.423708 0.389340 -0.423926 -0.001822 -0.120035 \n", | |
"safety 0.527957 0.115536 -0.268350 -0.239712 -0.113702 -0.277039 \n", | |
"\n", | |
" 97 98 99 \n", | |
"fda 0.197468 -0.017319 0.105784 \n", | |
"information -0.080125 -0.290900 -0.150123 \n", | |
"drug -0.090970 -0.330258 -0.244218 \n", | |
"food -0.066002 0.069482 0.010907 \n", | |
"safety -0.497297 -0.129665 0.271672 \n", | |
"\n", | |
"[5 rows x 100 columns]" | |
] | |
}, | |
"execution_count": 85, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"tsne_input.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 86, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"tsne_filepath = os.path.join(intermediate_directory,\n", | |
"                             'tsne_model')\n", | |
"\n", | |
"tsne_vectors_filepath = os.path.join(intermediate_directory,\n", | |
"                                     'tsne_vectors.npy')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 87, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Wall time: 2.99 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"\n", | |
"# set this flag to False to skip the (potentially slow) t-SNE fit\n", | |
"# and load the previously saved results from disk instead\n", | |
"if True:\n", | |
"\n", | |
"    tsne = TSNE()\n", | |
"    tsne_vectors = tsne.fit_transform(tsne_input.values)\n", | |
"\n", | |
"    with open(tsne_filepath, 'wb') as f:\n", | |
"        pickle.dump(tsne, f)\n", | |
"\n", | |
"    pd.np.save(tsne_vectors_filepath, tsne_vectors)\n", | |
"\n", | |
"with open(tsne_filepath, \"rb\") as f:\n", | |
"    tsne = pickle.load(f)\n", | |
"\n", | |
"tsne_vectors = pd.np.load(tsne_vectors_filepath)\n", | |
"\n", | |
"tsne_vectors = pd.DataFrame(tsne_vectors,\n", | |
"                            index=pd.Index(tsne_input.index),\n", | |
"                            columns=['x_coord', 'y_coord'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we have a two-dimensional representation of our data! Let's take a look." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 88, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>x_coord</th>\n", | |
" <th>y_coord</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>fda</th>\n", | |
" <td>36.795262</td>\n", | |
" <td>-2.148801</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>information</th>\n", | |
" <td>-0.608412</td>\n", | |
" <td>12.562070</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>drug</th>\n", | |
" <td>20.612260</td>\n", | |
" <td>-19.267174</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>food</th>\n", | |
" <td>6.174195</td>\n", | |
" <td>-2.198315</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>safety</th>\n", | |
" <td>11.311978</td>\n", | |
" <td>-6.762396</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" x_coord y_coord\n", | |
"fda 36.795262 -2.148801\n", | |
"information -0.608412 12.562070\n", | |
"drug 20.612260 -19.267174\n", | |
"food 6.174195 -2.198315\n", | |
"safety 11.311978 -6.762396" | |
] | |
}, | |
"execution_count": 88, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"tsne_vectors.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 89, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"tsne_vectors['word'] = tsne_vectors.index" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Plotting with Bokeh" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 90, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\aashis_tiwari\\AppData\\Local\\Continuum\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\bokeh\\core\\json_encoder.py:52: DeprecationWarning: parsing timezone aware datetimes is deprecated; this will raise an error in the future\n", | |
" NP_EPOCH = np.datetime64('1970-01-01T00:00:00Z')\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"\n", | |
" <div class=\"bk-root\">\n", | |
" <a href=\"http://bokeh.pydata.org\" target=\"_blank\" class=\"bk-logo bk-logo-small bk-logo-notebook\"></a>\n", | |
" <span id=\"9ff52526-b461-486b-b753-f6bc811d08ed\">Loading BokehJS ...</span>\n", | |
" </div>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"application/javascript": [ | |
"\n", | |
"(function(global) {\n", | |
" function now() {\n", | |
" return new Date();\n", | |
" }\n", | |
"\n", | |
" var force = true;\n", | |
"\n", | |
" if (typeof (window._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n", | |
" window._bokeh_onload_callbacks = [];\n", | |
" window._bokeh_is_loading = undefined;\n", | |
" }\n", | |
"\n", | |
"\n", | |
" \n", | |
" if (typeof (window._bokeh_timeout) === \"undefined\" || force === true) {\n", | |
" window._bokeh_timeout = Date.now() + 5000;\n", | |
" window._bokeh_failed_load = false;\n", | |
" }\n", | |
"\n", | |
" var NB_LOAD_WARNING = {'data': {'text/html':\n", | |
" \"<div style='background-color: #fdd'>\\n\"+\n", | |
" \"<p>\\n\"+\n", | |
" \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", | |
" \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", | |
" \"</p>\\n\"+\n", | |
" \"<ul>\\n\"+\n", | |
" \"<li>re-rerun `output_notebook()` to attempt to load from CDN again, or</li>\\n\"+\n", | |
" \"<li>use INLINE resources instead, as so:</li>\\n\"+\n", | |
" \"</ul>\\n\"+\n", | |
" \"<code>\\n\"+\n", | |
" \"from bokeh.resources import INLINE\\n\"+\n", | |
" \"output_notebook(resources=INLINE)\\n\"+\n", | |
" \"</code>\\n\"+\n", | |
" \"</div>\"}};\n", | |
"\n", | |
" function display_loaded() {\n", | |
" if (window.Bokeh !== undefined) {\n", | |
" document.getElementById(\"9ff52526-b461-486b-b753-f6bc811d08ed\").textContent = \"BokehJS successfully loaded.\";\n", | |
" } else if (Date.now() < window._bokeh_timeout) {\n", | |
" setTimeout(display_loaded, 100)\n", | |
" }\n", | |
" }\n", | |
"\n", | |
" function run_callbacks() {\n", | |
" window._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n", | |
" delete window._bokeh_onload_callbacks\n", | |
" console.info(\"Bokeh: all callbacks have finished\");\n", | |
" }\n", | |
"\n", | |
" function load_libs(js_urls, callback) {\n", | |
" window._bokeh_onload_callbacks.push(callback);\n", | |
" if (window._bokeh_is_loading > 0) {\n", | |
" console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", | |
" return null;\n", | |
" }\n", | |
" if (js_urls == null || js_urls.length === 0) {\n", | |
" run_callbacks();\n", | |
" return null;\n", | |
" }\n", | |
" console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", | |
" window._bokeh_is_loading = js_urls.length;\n", | |
" for (var i = 0; i < js_urls.length; i++) {\n", | |
" var url = js_urls[i];\n", | |
" var s = document.createElement('script');\n", | |
" s.src = url;\n", | |
" s.async = false;\n", | |
" s.onreadystatechange = s.onload = function() {\n", | |
" window._bokeh_is_loading--;\n", | |
" if (window._bokeh_is_loading === 0) {\n", | |
" console.log(\"Bokeh: all BokehJS libraries loaded\");\n", | |
" run_callbacks()\n", | |
" }\n", | |
" };\n", | |
" s.onerror = function() {\n", | |
" console.warn(\"failed to load library \" + url);\n", | |
" };\n", | |
" console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", | |
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n", | |
" }\n", | |
" };var element = document.getElementById(\"9ff52526-b461-486b-b753-f6bc811d08ed\");\n", | |
" if (element == null) {\n", | |
" console.log(\"Bokeh: ERROR: autoload.js configured with elementid '9ff52526-b461-486b-b753-f6bc811d08ed' but no matching script tag was found. \")\n", | |
" return false;\n", | |
" }\n", | |
"\n", | |
" var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-0.12.4.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.12.4.min.js\"];\n", | |
"\n", | |
" var inline_js = [\n", | |
" function(Bokeh) {\n", | |
" Bokeh.set_log_level(\"info\");\n", | |
" },\n", | |
" \n", | |
" function(Bokeh) {\n", | |
" \n", | |
" document.getElementById(\"9ff52526-b461-486b-b753-f6bc811d08ed\").textContent = \"BokehJS is loading...\";\n", | |
" },\n", | |
" function(Bokeh) {\n", | |
" console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.12.4.min.css\");\n", | |
" Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.12.4.min.css\");\n", | |
" console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.12.4.min.css\");\n", | |
" Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.12.4.min.css\");\n", | |
" }\n", | |
" ];\n", | |
"\n", | |
" function run_inline_js() {\n", | |
" \n", | |
" if ((window.Bokeh !== undefined) || (force === true)) {\n", | |
" for (var i = 0; i < inline_js.length; i++) {\n", | |
" inline_js[i](window.Bokeh);\n", | |
" }if (force === true) {\n", | |
" display_loaded();\n", | |
" }} else if (Date.now() < window._bokeh_timeout) {\n", | |
" setTimeout(run_inline_js, 100);\n", | |
" } else if (!window._bokeh_failed_load) {\n", | |
" console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", | |
" window._bokeh_failed_load = true;\n", | |
" } else if (force !== true) {\n", | |
" var cell = $(document.getElementById(\"9ff52526-b461-486b-b753-f6bc811d08ed\")).parents('.cell').data().cell;\n", | |
" cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", | |
" }\n", | |
"\n", | |
" }\n", | |
"\n", | |
" if (window._bokeh_is_loading === 0) {\n", | |
" console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", | |
" run_inline_js();\n", | |
" } else {\n", | |
" load_libs(js_urls, function() {\n", | |
" console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n", | |
" run_inline_js();\n", | |
" });\n", | |
" }\n", | |
"}(this));" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"from bokeh.plotting import figure, show, output_notebook\n", | |
"from bokeh.models import HoverTool, ColumnDataSource, value\n", | |
"\n", | |
"output_notebook()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 91, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"\n", | |
"\n", | |
" <div class=\"bk-root\">\n", | |
" <div class=\"bk-plotdiv\" id=\"fdf50d06-d538-436e-a838-bf1520a822ad\"></div>\n", | |
" </div>\n", | |
"<script type=\"text/javascript\">\n", | |
" \n", | |
" (function(global) {\n", | |
" function now() {\n", | |
" return new Date();\n", | |
" }\n", | |
" \n", | |
" var force = false;\n", | |
" \n", | |
" if (typeof (window._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n", | |
" window._bokeh_onload_callbacks = [];\n", | |
" window._bokeh_is_loading = undefined;\n", | |
" }\n", | |
" \n", | |
" \n", | |
" \n", | |
" if (typeof (window._bokeh_timeout) === \"undefined\" || force === true) {\n", | |
" window._bokeh_timeout = Date.now() + 0;\n", | |
" window._bokeh_failed_load = false;\n", | |
" }\n", | |
" \n", | |
" var NB_LOAD_WARNING = {'data': {'text/html':\n", | |
" \"<div style='background-color: #fdd'>\\n\"+\n", | |
" \"<p>\\n\"+\n", | |
" \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", | |
" \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", | |
" \"</p>\\n\"+\n", | |
" \"<ul>\\n\"+\n", | |
" \"<li>re-rerun `output_notebook()` to attempt to load from CDN again, or</li>\\n\"+\n", | |
" \"<li>use INLINE resources instead, as so:</li>\\n\"+\n", | |
" \"</ul>\\n\"+\n", | |
" \"<code>\\n\"+\n", | |
" \"from bokeh.resources import INLINE\\n\"+\n", | |
" \"output_notebook(resources=INLINE)\\n\"+\n", | |
" \"</code>\\n\"+\n", | |
" \"</div>\"}};\n", | |
" \n", | |
" function display_loaded() {\n", | |
" if (window.Bokeh !== undefined) {\n", | |
" document.getElementById(\"fdf50d06-d538-436e-a838-bf1520a822ad\").textContent = \"BokehJS successfully loaded.\";\n", | |
" } else if (Date.now() < window._bokeh_timeout) {\n", | |
" setTimeout(display_loaded, 100)\n", | |
" }\n", | |
" }\n", | |
" \n", | |
" function run_callbacks() {\n", | |
" window._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n", | |
" delete window._bokeh_onload_callbacks\n", | |
" console.info(\"Bokeh: all callbacks have finished\");\n", | |
" }\n", | |
" \n", | |
" function load_libs(js_urls, callback) {\n", | |
" window._bokeh_onload_callbacks.push(callback);\n", | |
" if (window._bokeh_is_loading > 0) {\n", | |
" console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", | |
" return null;\n", | |
" }\n", | |
" if (js_urls == null || js_urls.length === 0) {\n", | |
" run_callbacks();\n", | |
" return null;\n", | |
" }\n", | |
" console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", | |
" window._bokeh_is_loading = js_urls.length;\n", | |
" for (var i = 0; i < js_urls.length; i++) {\n", | |
" var url = js_urls[i];\n", | |
" var s = document.createElement('script');\n", | |
" s.src = url;\n", | |
" s.async = false;\n", | |
" s.onreadystatechange = s.onload = function() {\n", | |
" window._bokeh_is_loading--;\n", | |
" if (window._bokeh_is_loading === 0) {\n", | |
" console.log(\"Bokeh: all BokehJS libraries loaded\");\n", | |
" run_callbacks()\n", | |
" }\n", | |
" };\n", | |
" s.onerror = function() {\n", | |
" console.warn(\"failed to load library \" + url);\n", | |
" };\n", | |
" console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", | |
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n", | |
" }\n", | |
" };var element = document.getElementById(\"fdf50d06-d538-436e-a838-bf1520a822ad\");\n", | |
" if (element == null) {\n", | |
" console.log(\"Bokeh: ERROR: autoload.js configured with elementid 'fdf50d06-d538-436e-a838-bf1520a822ad' but no matching script tag was found. \")\n", | |
" return false;\n", | |
" }\n", | |
" \n", | |
" var js_urls = [];\n", | |
" \n", | |
" var inline_js = [\n", | |
" function(Bokeh) {\n", | |
" (function() {\n", | |
" var fn = function() {\n", | |
" var docs_json = {\"9d1b0cc3-8343-4641-a1f2-e22b2ae1751f\":{\"roots\":{\"references\":[{\"attributes\":{\"callback\":null},\"id\":\"0d41bcdd-8813-4c16-9a8e-bbeceb16a61b\",\"type\":\"DataRange1d\"},{\"attributes\":{\"callback\":null,\"overlay\":{\"id\":\"70a07a0c-cb68-4b32-a878-fb51d406f2db\",\"type\":\"BoxAnnotation\"},\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"renderers\":[{\"id\":\"70b6ed1b-41f8-4906-a8dc-ce38bb55e79f\",\"type\":\"GlyphRenderer\"}]},\"id\":\"2961d740-e3ed-4775-a544-b0efc460156e\",\"type\":\"BoxSelectTool\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_scroll\":{\"id\":\"71c8158f-21a0-4733-b509-32f386ee6ec2\",\"type\":\"WheelZoomTool\"},\"active_tap\":\"auto\",\"tools\":[{\"id\":\"bb20ca5d-fb46-42c6-a87e-a98761f3c786\",\"type\":\"PanTool\"},{\"id\":\"71c8158f-21a0-4733-b509-32f386ee6ec2\",\"type\":\"WheelZoomTool\"},{\"id\":\"d66b2a0a-646b-48b0-b453-92d1f0cadc4f\",\"type\":\"BoxZoomTool\"},{\"id\":\"2961d740-e3ed-4775-a544-b0efc460156e\",\"type\":\"BoxSelectTool\"},{\"id\":\"f0771732-e3f0-4f96-8f4c-827b16642637\",\"type\":\"ResizeTool\"},{\"id\":\"8181250d-2c13-4228-8e2a-1bb0ffbf20c8\",\"type\":\"ResetTool\"},{\"id\":\"c21f70a8-e74c-40de-9115-52dc1ecf6d73\",\"type\":\"HoverTool\"}]},\"id\":\"8d7423a4-48c0-4a0c-9907-e1d9da010a92\",\"type\":\"Toolbar\"},{\"attributes\":{\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"}},\"id\":\"71c8158f-21a0-4733-b509-32f386ee6ec2\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"54f7e5cb-9f7b-4823-9a5b-14a56476cf82\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"plot\":null,\"text\":\"t-SNE Word 
Embeddings\",\"text_font_size\":{\"value\":\"16pt\"}},\"id\":\"63dff529-5163-4eab-8df3-43845cb8b054\",\"type\":\"Title\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"size\":{\"units\":\"screen\",\"value\":10},\"x\":{\"field\":\"x_coord\"},\"y\":{\"field\":\"y_coord\"}},\"id\":\"ffce2547-d244-47b1-8d36-8c6697ccd1b1\",\"type\":\"Circle\"},{\"attributes\":{\"grid_line_color\":{\"value\":null},\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"01f0da90-dedb-44a9-ae7b-fdb498bfd1cb\",\"type\":\"BasicTicker\"}},\"id\":\"2bf2a076-45b8-4084-aa5f-a68ec2a41ec8\",\"type\":\"Grid\"},{\"attributes\":{\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"}},\"id\":\"f0771732-e3f0-4f96-8f4c-827b16642637\",\"type\":\"ResizeTool\"},{\"attributes\":{\"data_source\":{\"id\":\"81d3887f-b2cf-412b-bf9f-365d8dbf8ccc\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1c0cd207-1401-482d-b3dc-bfaa6f569a92\",\"type\":\"Circle\"},\"hover_glyph\":{\"id\":\"ffce2547-d244-47b1-8d36-8c6697ccd1b1\",\"type\":\"Circle\"},\"nonselection_glyph\":{\"id\":\"42b6555f-aa5c-46de-a99b-528d27186838\",\"type\":\"Circle\"},\"selection_glyph\":null},\"id\":\"70b6ed1b-41f8-4906-a8dc-ce38bb55e79f\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"01f0da90-dedb-44a9-ae7b-fdb498bfd1cb\",\"type\":\"BasicTicker\"},{\"attributes\":{\"formatter\":{\"id\":\"7f324ff7-68e2-4aba-9d3b-c76d368e0b56\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"01f0da90-dedb-44a9-ae7b-fdb498bfd1cb\",\"type\":\"BasicTicker\"},\"visible\":false},\"id\":\"475cafa7-12ca-4194-990c-4f184d896b02\",\"type\":\"LinearAxis\"},{\"attributes\":{\"formatter\":{\"id\":\"54f7e5cb-9f7b-4823-9a5b-14a56476cf82\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figur
e\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"f46513e2-5b59-467e-9e42-6898a83d0b97\",\"type\":\"BasicTicker\"},\"visible\":false},\"id\":\"1a7129a4-bbd7-451a-844c-b789d77dc809\",\"type\":\"LinearAxis\"},{\"attributes\":{\"overlay\":{\"id\":\"4c2dbec2-bd03-44dd-9c3d-e39972c3b5ec\",\"type\":\"BoxAnnotation\"},\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"}},\"id\":\"d66b2a0a-646b-48b0-b453-92d1f0cadc4f\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"}},\"id\":\"bb20ca5d-fb46-42c6-a87e-a98761f3c786\",\"type\":\"PanTool\"},{\"attributes\":{},\"id\":\"ad76bc53-8d30-46a8-8317-e10575affcc1\",\"type\":\"ToolEvents\"},{\"attributes\":{\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"}},\"id\":\"8181250d-2c13-4228-8e2a-1bb0ffbf20c8\",\"type\":\"ResetTool\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"70a07a0c-cb68-4b32-a878-fb51d406f2db\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"below\":[{\"id\":\"475cafa7-12ca-4194-990c-4f184d896b02\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"1a7129a4-bbd7-451a-844c-b789d77dc809\",\"type\":\"LinearAxis\"}],\"outline_line_color\":{\"value\":null},\"plot_height\":800,\"plot_width\":800,\"renderers\":[{\"id\":\"475cafa7-12ca-4194-990c-4f184d896b02\",\"type\":\"LinearAxis\"},{\"id\":\"2bf2a076-45b8-4084-aa5f-a68ec2a41ec8\",\"type\":\"Grid\"},{\"id\":\"1a7129a4-bbd7-451a-844c-b789d77dc809\",\"type\":\"LinearAxis\"},{\"id\":\"9c536907-1df7-45e6-aabb-ab1678ef92de\",\"type\":\"Grid\"},{\"id\":\"4c2dbec2-bd03-44dd-9c3d
-e39972c3b5ec\",\"type\":\"BoxAnnotation\"},{\"id\":\"70a07a0c-cb68-4b32-a878-fb51d406f2db\",\"type\":\"BoxAnnotation\"},{\"id\":\"70b6ed1b-41f8-4906-a8dc-ce38bb55e79f\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"63dff529-5163-4eab-8df3-43845cb8b054\",\"type\":\"Title\"},\"tool_events\":{\"id\":\"ad76bc53-8d30-46a8-8317-e10575affcc1\",\"type\":\"ToolEvents\"},\"toolbar\":{\"id\":\"8d7423a4-48c0-4a0c-9907-e1d9da010a92\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"b6ce2da3-200d-4210-bd7e-7bef4237a285\",\"type\":\"DataRange1d\"},\"y_range\":{\"id\":\"0d41bcdd-8813-4c16-9a8e-bbeceb16a61b\",\"type\":\"DataRange1d\"}},\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"f46513e2-5b59-467e-9e42-6898a83d0b97\",\"type\":\"BasicTicker\"},{\"attributes\":{\"callback\":null,\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"tooltips\":\"@word\"},\"id\":\"c21f70a8-e74c-40de-9115-52dc1ecf6d73\",\"type\":\"HoverTool\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"blue\"},\"line_alpha\":{\"value\":0.2},\"line_color\":{\"value\":\"blue\"},\"size\":{\"units\":\"screen\",\"value\":10},\"x\":{\"field\":\"x_coord\"},\"y\":{\"field\":\"y_coord\"}},\"id\":\"1c0cd207-1401-482d-b3dc-bfaa6f569a92\",\"type\":\"Circle\"},{\"attributes\":{\"dimension\":1,\"grid_line_color\":{\"value\":null},\"plot\":{\"id\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"f46513e2-5b59-467e-9e42-6898a83d0b97\",\"type\":\"BasicTicker\"}},\"id\":\"9c536907-1df7-45e6-aabb-ab1678ef92de\",\"type\":\"Grid\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"size\":{\"units\":\"screen\",\"value\":10},\"x\":{\"field\":\"x_coord\"},\"y\":{\"field\":\"y_coord\"}},\"id\":\"42b6555f-aa5c-46de-a99b-528d27186838\"
,\"type\":\"Circle\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"4c2dbec2-bd03-44dd-9c3d-e39972c3b5ec\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"7f324ff7-68e2-4aba-9d3b-c76d368e0b56\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"callback\":null},\"id\":\"b6ce2da3-200d-4210-bd7e-7bef4237a285\",\"type\":\"DataRange1d\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"index\",\"word\",\"y_coord\",\"x_coord\"],\"data\":{\"index\":[\"fda\",\"information\",\"drug\",\"food\",\"safety\",\"product\",\"amp\",\"report\",\"use\",\"health\",\"program\",\"include\",\"office_of\",\"research\",\"recall\",\"consumer\",\"biologics\",\"patient\",\"public\",\"medical\",\"fda_'s\",\"regulatory\",\"approval\",\"products\",\"new\",\"guidance\",\"agency\",\"device\",\"submit\",\"page\",\"2017\",\"veterinary_cosmetics_tobacco_products\",\"regulation\",\"industry\",\"2016\",\"|\",\"cosmetic\",\"issue\",\"electronic\",\"human\",\"provide\",\"2014\",\"home\",\"document\",\"development\",\"news\",\"federal\",\"approve\",\"risk\",\"disease\",\"help\",\"section\",\"'s\",\"2015\",\"policy\",\"national\",\"update\",\"search\",\"blood\",\"emergency\",\"list\",\"science\",\"tobacco\",\"relate\",\"resource_for_you\",\"medical_device\",\"1\",\"pdf\",\"find\",\"act\",\"march\",\"how_to\",\"center_for\",\"page_last_updated\",\"more_in\",\"problem\",\"question\",\"comment\",\"animal\",\"international\",\"event\",\"accessibility\",\"if_you\",\"and_drug_administration\",\"law\",\"contact\",\"response\",\"treatment\",\"initiative\",\"drugs\",\"request\",\"meeting\",\"form\",\"kb\",\"view\",\"veterinary\",\"available\",\"medical_devices\",\"review\
",\"time\",\"system\",\"office\",\"activity\",\"year\",\"organization\",\"tobacco_products\",\"regulatory_science\",\"know\",\"file\",\"2\",\"site\",\"fda_\\u2019s\",\"share_tweet_linkedin_pin\",\"need\",\"post\",\"market\",\"compliance\",\"for_industry\",\"public_health\",\"medwatch\",\"day\",\"petition\",\"datum\",\"technology\",\"therapy\",\"note\",\"address\",\"receive\",\"devices\",\"contact_fda\",\"test\",\"cdrh\",\"4\",\"child\",\"database\",\"email\",\"tissue\",\"dietary_supplement\",\"protect\",\"cancer\",\"medical_product\",\"vaccine\",\"website\",\"treat\",\"resources\",\"spotlight\",\"notice\",\"training\",\"safe\",\"rule\",\"official\",\"work\",\"conference\",\"clinical\",\"do_not\",\"february\",\"innovation\",\"require\",\"study\",\"complaint\",\"cholesterol\",\"access\",\"important\",\"video\",\"you_can\",\"people\",\"adverse_event\",\"web\",\"scientific\",\"management\",\"staff\",\"\\u2019s\",\"label\",\"clinical_trial\",\"cigarette\",\"standard\",\"requirement\",\"vaccines\",\"u.s.\",\"follow\",\"announcement\",\"april\",\"application\",\"way\",\"updates\",\"navigate_the\",\"service\",\"quality\",\"commissioner\",\"inspection\",\"share\",\"such_as\",\"guidance_document\",\"support\",\"related\",\"2012\",\"approvals\",\"online\",\"consumers\",\"federal_register\",\"final_rule\",\"process\",\"enforcement\",\"foreign\",\"format\",\"resource\",\"certain\",\"consumer_updates\",\"report_a_problem\",\"january\",\"labeling\",\"center\",\"evaluation\",\"cosmetics\",\"news_amp_event\",\"develop\",\"action\",\"contain\",\"submission\",\"medicine\",\"advisory_committee\",\"number\",\"3\",\"draft\",\"regulations\",\"radiation\",\"regulatory_information\",\"statin\",\"email_print\",\"budget\",\"late\",\"import\",\"link_to\",\"user_fee\",\"answer\",\"user\",\"m.d.\",\"radiological_health\",\"u.s._food\",\"standards\",\"change\",\"more_than\",\"foods\",\"administration\",\"potential\",\"date\",\"guide\",\"player\",\"programs\",\"icsrs\",\"globalization\",\"number_o
f\",\"colorectal_cancer\",\"forms\",\"history\",\"2013\",\"international_programs\",\"what_'_new\",\"control\",\"fdasia\",\"cell\",\"base\",\"sponsor\",\"manufacturer\",\"radiation_emitting_products\",\"reports\",\"propose\",\"state\",\"good\",\"instruction\",\"pediatric\",\"assistance\",\"opportunity\",\"print\",\"communication\",\"registration\"],\"word\":[\"fda\",\"information\",\"drug\",\"food\",\"safety\",\"product\",\"amp\",\"report\",\"use\",\"health\",\"program\",\"include\",\"office_of\",\"research\",\"recall\",\"consumer\",\"biologics\",\"patient\",\"public\",\"medical\",\"fda_'s\",\"regulatory\",\"approval\",\"products\",\"new\",\"guidance\",\"agency\",\"device\",\"submit\",\"page\",\"2017\",\"veterinary_cosmetics_tobacco_products\",\"regulation\",\"industry\",\"2016\",\"|\",\"cosmetic\",\"issue\",\"electronic\",\"human\",\"provide\",\"2014\",\"home\",\"document\",\"development\",\"news\",\"federal\",\"approve\",\"risk\",\"disease\",\"help\",\"section\",\"'s\",\"2015\",\"policy\",\"national\",\"update\",\"search\",\"blood\",\"emergency\",\"list\",\"science\",\"tobacco\",\"relate\",\"resource_for_you\",\"medical_device\",\"1\",\"pdf\",\"find\",\"act\",\"march\",\"how_to\",\"center_for\",\"page_last_updated\",\"more_in\",\"problem\",\"question\",\"comment\",\"animal\",\"international\",\"event\",\"accessibility\",\"if_you\",\"and_drug_administration\",\"law\",\"contact\",\"response\",\"treatment\",\"initiative\",\"drugs\",\"request\",\"meeting\",\"form\",\"kb\",\"view\",\"veterinary\",\"available\",\"medical_devices\",\"review\",\"time\",\"system\",\"office\",\"activity\",\"year\",\"organization\",\"tobacco_products\",\"regulatory_science\",\"know\",\"file\",\"2\",\"site\",\"fda_\\u2019s\",\"share_tweet_linkedin_pin\",\"need\",\"post\",\"market\",\"compliance\",\"for_industry\",\"public_health\",\"medwatch\",\"day\",\"petition\",\"datum\",\"technology\",\"therapy\",\"note\",\"address\",\"receive\",\"devices\",\"contact_fda\",\"test\",\"cdrh\",\"4\",\"child\
",\"database\",\"email\",\"tissue\",\"dietary_supplement\",\"protect\",\"cancer\",\"medical_product\",\"vaccine\",\"website\",\"treat\",\"resources\",\"spotlight\",\"notice\",\"training\",\"safe\",\"rule\",\"official\",\"work\",\"conference\",\"clinical\",\"do_not\",\"february\",\"innovation\",\"require\",\"study\",\"complaint\",\"cholesterol\",\"access\",\"important\",\"video\",\"you_can\",\"people\",\"adverse_event\",\"web\",\"scientific\",\"management\",\"staff\",\"\\u2019s\",\"label\",\"clinical_trial\",\"cigarette\",\"standard\",\"requirement\",\"vaccines\",\"u.s.\",\"follow\",\"announcement\",\"april\",\"application\",\"way\",\"updates\",\"navigate_the\",\"service\",\"quality\",\"commissioner\",\"inspection\",\"share\",\"such_as\",\"guidance_document\",\"support\",\"related\",\"2012\",\"approvals\",\"online\",\"consumers\",\"federal_register\",\"final_rule\",\"process\",\"enforcement\",\"foreign\",\"format\",\"resource\",\"certain\",\"consumer_updates\",\"report_a_problem\",\"january\",\"labeling\",\"center\",\"evaluation\",\"cosmetics\",\"news_amp_event\",\"develop\",\"action\",\"contain\",\"submission\",\"medicine\",\"advisory_committee\",\"number\",\"3\",\"draft\",\"regulations\",\"radiation\",\"regulatory_information\",\"statin\",\"email_print\",\"budget\",\"late\",\"import\",\"link_to\",\"user_fee\",\"answer\",\"user\",\"m.d.\",\"radiological_health\",\"u.s._food\",\"standards\",\"change\",\"more_than\",\"foods\",\"administration\",\"potential\",\"date\",\"guide\",\"player\",\"programs\",\"icsrs\",\"globalization\",\"number_of\",\"colorectal_cancer\",\"forms\",\"history\",\"2013\",\"international_programs\",\"what_'_new\",\"control\",\"fdasia\",\"cell\",\"base\",\"sponsor\",\"manufacturer\",\"radiation_emitting_products\",\"reports\",\"propose\",\"state\",\"good\",\"instruction\",\"pediatric\",\"assistance\",\"opportunity\",\"print\",\"communication\",\"registration\"],\"x_coord\":[36.79526241672093,-0.6084115517077229,20.612259666559904,6.17419522431721,
11.311978444902447,-17.220275706229987,3.9834154071186556,-37.40140925699767,52.558707253460774,-27.74570911952135,-7.940684835173961,39.296371336013415,-14.99338038202181,-82.19767557676084,-16.422826313613683,36.204959215359146,-35.84636974933975,-40.40793370640576,3.5984624929292575,20.664497699276904,135.11755260550328,-48.173443588995006,75.14017445223875,-7.149688680974811,18.643179290194855,8.836859656124393,-41.03000474649207,8.587656555231526,-28.34593474712203,-23.33556976720919,49.451127645009805,-89.0850367716734,-9.382012024272303,-5.775573923928772,24.775304899415186,-49.103023671945785,23.65556399698217,-41.647398279830355,29.83976856232998,-83.86223840914184,-26.493812909550645,35.46121977561761,71.90129181969115,-95.14480377425232,-24.418684380058878,-58.35001568024719,-4.547479057539734,65.08625537416745,-65.97256222467198,48.59962633787268,-22.270636116772028,0.6250052263114699,-54.77403887001563,32.474504972331864,-96.2528046773141,-47.16116834454645,27.15665529601955,-25.478139427719206,-101.95007543643555,-179.21197874998626,32.73824107057032,33.05600906526963,-84.18195350699368,-100.30635691110592,68.88298244354411,18.403112836155966,-27.210741501813477,-48.98561748119006,42.22038349172803,-66.72045788376728,-74.44793794708792,-15.966911344684274,-35.89056674787026,-7.83838609325336,46.5142769901522,-2.2083645857843526,-19.25432631309558,-63.63814820036086,79.9777856749852,84.4571343305333,107.22407452834311,-69.464881451059,51.400775150078346,-110.75991648340532,-36.321373836516145,7.734638458610831,-8.329002272376698,44.940907135399314,44.74035073371913,-159.75355827919188,-44.422855942269344,29.074753314915963,-46.683121946574765,-28.456907638743946,-11.438913047428523,35.456948125883585,38.892151108248726,74.41220365918537,37.635052017767535,33.925108779390996,40.752570032702664,-61.03371041573522,47.47482812322784,25.965190351747612,-66.89062396984669,11.891750552064984,22.576332851357858,34.36564337191908,-63.203104534578024,27.427487687
40254,-128.35245779213628,-30.05730798880277,55.38149198001704,57.110007977598904,-6.270691946140692,-78.2582299516983,66.9734909461413,-68.49177639969369,-70.40191918923682,38.70730397040344,45.75215333213016,55.24996829030938,123.21723954221966,14.359001843314772,116.3646083986654,-4.5233757879567875,48.94292663008685,11.411223061589995,2.5133868802541013,-9.410015800821615,72.55475553652975,-60.059970871160886,30.31964551116014,-15.508443094169694,-150.8045600994725,90.4176509681034,140.78057018064914,-27.024815976242216,-41.8555187331145,-15.59937830336386,15.586900530042447,72.66702225261767,2.4442202222269214,60.85724576604183,25.82014218134471,65.3080031430386,-80.36869676097476,48.43589296833909,137.5282849149218,-49.412295255525464,13.480673686098106,-43.36547770699172,-7.283349313897143,-18.69830439803742,-38.82688399084915,52.543221538552814,47.82336282833334,5.348126312578858,-19.36051500640309,20.81803127828845,72.54750198813917,29.876749603569497,10.054294680003625,68.65813037744128,-72.61121480103573,-131.97023220082016,-71.60868639433438,-4.025725238453169,-38.68894220423148,-57.743592381366604,5.518849025417468,49.61384298642256,-74.44240399874606,4.248339643910993,36.96045476982145,27.607602753399036,-55.668334047989426,65.70054449693289,-37.810561554954646,-31.11493712487137,58.514434682472924,60.963674711096566,-26.848897064905724,-17.722501706500093,22.290029028593157,59.53541515159754,-88.21088577851306,77.2699462234936,-126.72163474334346,16.91723746366929,-26.57241568862486,48.67889110714831,-11.63113112960147,38.47649149375841,-5.663355779464563,88.71812175426749,22.92971562934462,20.19233324006043,-16.33873042171314,152.91418800176257,12.73902713188675,-30.692365504743425,-9.616807040432459,-41.468459742247816,-24.008421171116627,45.31755919363852,59.54336890689346,49.04937230159467,-68.02828049930174,57.59519291987925,15.277639368095564,-45.622653057172734,173.57559180821656,19.017828834241634,5.702223098848553,-2.350801349152257,-151.7364
545145962,8.746208811054817,-67.00911860506086,-56.709697306470574,28.389062980808617,-20.67655231669565,33.32154018733024,4.214416642087825,54.09158649566163,-80.83913200745029,-0.6448117196779144,12.067853650999178,79.34146065338712,-69.7462564021706,-51.42256110908383,-56.64174730519882,-82.86227589893939,16.88542764207801,46.53780456910116,-93.6692556600177,62.8368696022437,0.05572608925607808,-10.96102265690717,78.22361789317443,20.004445034045894,57.05860887545082,12.74640633606202,10.884712938775971,-100.80763431206513,-63.283388391868456,-56.76546812931701,-6.3723447935396225,-8.240225412438846,-29.43172397858853,-16.755043885434088,12.407024804617691,48.24420155694262,-39.59647864607428,-112.71015150575374,26.162572512103278,-19.57684687558176,13.588031640148504,-45.89725543095684,6.651008429808158,65.17603494968589,69.17557320523112,55.07000862433236,80.45896470881536,17.92667660448721,-58.86507228115562,-53.40075519655574,-30.146310638524508,2.80803286397477,-35.98629904253914,29.998321694586686,-3.2914707051238565,-47.18752805080504,88.46935321601642,-54.510094132627145,47.92549018364463],\"y_coord\":[-2.148800620716362,12.56206956172203,-19.267174053086666,-2.198314870972887,-6.762395809605516,-15.470435493811754,-36.0595649476894,8.507020884739752,-185.23045141346245,-24.67505366413561,-62.68482790554669,49.62922684610518,86.99141744806887,-56.87443469196098,21.86059478639576,71.01984544858358,-65.27908611178434,105.55826367440393,33.58020774237338,15.860607927994423,103.9560196051252,-36.85751191369051,-9.137275696832317,-84.0609895239541,103.79153791807215,-61.765260039729036,-13.371542911253789,-27.869579576545306,69.14770966562251,-8.81768666151877,-42.00460427281244,-36.10682602871727,-51.34249546650208,57.438437079079534,-216.13249081380934,-63.15878126496767,7.526770623662116,-74.41911820758506,133.0641351683756,-85.81657920226108,57.817912825722544,-93.5757126151096,-49.05812539828386,12.210611527824327,-39.281947492115954,-29.347586978272258,1
26.3747940751577,-1.6773389847540712,66.14600641378487,33.10608945684205,-99.2090518736315,-57.666620445652995,93.34283950357612,-35.98761535341346,-10.466614877810487,20.40684741353767,-59.44841435400525,-0.24941875294578247,81.1898296969098,77.93954293898442,-25.39605431592446,-79.64741014818742,-18.95419283180958,33.290233039076675,-85.81439689227152,-95.37569496973818,-86.4484945651229,42.4908810070152,-126.9310414869381,-2.391221702302278,39.35948293479779,37.80364427279415,-3.7926377175466373,30.43029999163399,-22.273037538332876,-45.454920171903936,-65.78246860918864,54.26423609633515,-158.13082056578193,-8.971051650748928,-58.77484551134845,-42.097280089113255,50.74464344288632,12.772651277058646,-43.90535512605435,87.19880324737711,-29.07385834128183,-51.1640556260465,41.803203719207914,22.675322859294187,-0.8955021567680214,-6.253882713031273,-48.42994204377297,11.278980642138105,-20.960473423162153,-53.678403415919014,-14.834461043733572,-34.52297856056426,5.3801493313103865,14.75388878223425,-35.21790707283692,43.79615879037006,12.959088315575654,44.84761860272291,18.09060706094946,-42.643153984373235,-39.58507481807603,38.69361706356217,-18.508845133201888,63.22735215665121,-35.262931552490215,43.55921955752536,-18.713890549691587,44.363527732759465,78.03901702971679,83.65537340984535,23.987975074666387,120.25020349845852,-29.68052251171975,31.499231521010465,-160.26172384572592,105.51327321572043,-27.980990571544048,37.30804196591689,-7.855553135220685,-2.109553341704916,-93.84233978299876,189.8839583503646,-23.149482328597394,47.578610779532454,2.014890560132201,9.07160924515238,55.543454302861214,64.98971078820826,51.35593365404451,25.073681705341798,33.25971317677465,82.56765591545498,-30.034401864253255,-44.2004988770962,-79.84288048806926,45.2201290417868,-78.5313840556671,53.80143501688223,-70.7545132407963,-63.13284997091748,53.37677005786327,-11.841183531072632,-57.257575190670025,-19.602439528729363,-70.13866509318953,32.86674231767256,-37.328
45975304842,3.100549562908569,-87.00838124488209,-34.915772743126304,120.17339853359299,-94.70184413037128,10.361119695884675,56.10697714620479,58.52411921101847,-46.07871296925631,61.85328224874108,34.92327306467134,-11.901451902889852,25.352727204663157,27.41886776294679,-17.17497276476369,23.241172227546542,-40.45105480686999,45.0411444450855,22.245489814117843,6.710803436300098,23.915927874558,23.642014798847303,21.40100858152056,22.831616959703254,69.76355381049187,-24.068565232838,-57.158869743665484,-28.26708898966066,-35.884387290287236,-48.70266839661377,-77.42964346606647,-110.90774970156095,-102.16539698398604,41.0550747587851,19.953622472253837,48.98096165777057,69.70206763134186,21.280845073980736,4.911192439798728,2.4866853041449026,-64.94604688998693,20.693199707393855,-87.34488526074388,-27.89289804183878,27.91691287043586,-56.520948781056305,-1.3044048447542511,5.174561924835972,-14.333644307304137,11.731060528791376,82.06793937188225,31.15152472075446,-5.243138246629208,16.5940818311722,-68.50263130240585,154.18967446800744,-43.05492625478067,-32.95746050897208,61.91107572877444,94.14043760986534,-0.9495911550417054,-49.92672359038136,40.670935116556485,-12.699841128832867,-145.16009024815912,-53.06616426336984,76.18058401164548,-0.5779619771992696,48.00612387947799,60.15342501248934,73.98334205666205,-56.6684707801139,68.22295799396089,-68.32809691471981,27.695561471424917,9.004446607955972,98.56088744034146,-87.2776707208973,-55.14067668796675,16.541350221948242,-57.15139342228419,-80.20848096255379,-65.22354456876158,8.678733155580508,3.5040541070961035,-8.248804138458581,-66.61935425085908,-10.582959290801398,33.34551652166759,15.990674252560785,-15.746533768474794,-43.264923424616164,-82.74249972485252,32.5990896313632,-11.551877657206784,-93.46950422633535,31.78268288867606,-31.382531302610772,51.42186274372668,73.57921006157004,45.74103403114116,42.43118794267442,34.2128359567792,-26.569396950448372,-22.368991752997044,135.5288765288077,10.0
69726043237942,-10.959002958852132,-19.682461194877106,-1.639094702014784,-22.510072523979407,-48.52936190468384,-68.99243771517494,-12.76287165610247,-31.916356580941756,-10.619793458167818,15.172802548147729,-14.800837585598181,63.357423906530585,10.937712513242008,9.824946671555036,-1.1408427638284153,61.132032111968414]}},\"id\":\"81d3887f-b2cf-412b-bf9f-365d8dbf8ccc\",\"type\":\"ColumnDataSource\"}],\"root_ids\":[\"49db60ff-11f7-4cf8-970b-9c3641d1276b\"]},\"title\":\"Bokeh Application\",\"version\":\"0.12.4\"}};\n", | |
" var render_items = [{\"docid\":\"9d1b0cc3-8343-4641-a1f2-e22b2ae1751f\",\"elementid\":\"fdf50d06-d538-436e-a838-bf1520a822ad\",\"modelid\":\"49db60ff-11f7-4cf8-970b-9c3641d1276b\"}];\n", | |
" \n", | |
" Bokeh.embed.embed_items(docs_json, render_items);\n", | |
" };\n", | |
" if (document.readyState != \"loading\") fn();\n", | |
" else document.addEventListener(\"DOMContentLoaded\", fn);\n", | |
" })();\n", | |
" },\n", | |
" function(Bokeh) {\n", | |
" }\n", | |
" ];\n", | |
" \n", | |
" function run_inline_js() {\n", | |
" \n", | |
" if ((window.Bokeh !== undefined) || (force === true)) {\n", | |
" for (var i = 0; i < inline_js.length; i++) {\n", | |
" inline_js[i](window.Bokeh);\n", | |
" }if (force === true) {\n", | |
" display_loaded();\n", | |
" }} else if (Date.now() < window._bokeh_timeout) {\n", | |
" setTimeout(run_inline_js, 100);\n", | |
" } else if (!window._bokeh_failed_load) {\n", | |
" console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", | |
" window._bokeh_failed_load = true;\n", | |
" } else if (force !== true) {\n", | |
" var cell = $(document.getElementById(\"fdf50d06-d538-436e-a838-bf1520a822ad\")).parents('.cell').data().cell;\n", | |
" cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", | |
" }\n", | |
" \n", | |
" }\n", | |
" \n", | |
" if (window._bokeh_is_loading === 0) {\n", | |
" console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", | |
" run_inline_js();\n", | |
" } else {\n", | |
" load_libs(js_urls, function() {\n", | |
" console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n", | |
" run_inline_js();\n", | |
" });\n", | |
" }\n", | |
" }(this));\n", | |
"</script>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"# add our DataFrame as a ColumnDataSource for Bokeh\n", | |
"plot_data = ColumnDataSource(tsne_vectors)\n", | |
"\n", | |
"# create the plot and configure the\n", | |
"# title, dimensions, and tools\n", | |
"tsne_plot = figure(title='t-SNE Word Embeddings',\n", | |
" plot_width = 800,\n", | |
" plot_height = 800,\n", | |
" tools= ('pan, wheel_zoom, box_zoom,'\n", | |
" 'box_select, resize, reset'),\n", | |
" active_scroll='wheel_zoom')\n", | |
"\n", | |
"# add a hover tool to display words on roll-over\n", | |
"tsne_plot.add_tools( HoverTool(tooltips = '@word') )\n", | |
"\n", | |
"# draw the words as circles on the plot\n", | |
"tsne_plot.circle('x_coord', 'y_coord', source=plot_data,\n", | |
" color='blue', line_alpha=0.2, fill_alpha=0.1,\n", | |
" size=10, hover_line_color='black')\n", | |
"\n", | |
"# configure visual elements of the plot\n", | |
"tsne_plot.title.text_font_size = value('16pt')\n", | |
"tsne_plot.xaxis.visible = False\n", | |
"tsne_plot.yaxis.visible = False\n", | |
"tsne_plot.grid.grid_line_color = None\n", | |
"tsne_plot.outline_line_color = None\n", | |
"\n", | |
"# engage!\n", | |
"show(tsne_plot);" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Conclusion" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Whew! Let's round up the major components that we've seen:\n", | |
"1. Text processing with **spaCy**\n", | |
"1. Automated **phrase modeling**\n", | |
"1. Topic modeling with **LDA** $\\ \\longrightarrow\\ $ visualization with **pyLDAvis**\n", | |
"1. Word vector modeling with **word2vec** $\\ \\longrightarrow\\ $ visualization with **t-SNE**\n", | |
"\n", | |
"#### Why use these models?\n", | |
"Dense vector representations for text like LDA and word2vec can greatly improve performance for a number of common, text-heavy problems like:\n", | |
"- Text classification\n", | |
"- Search\n", | |
"- Recommendations\n", | |
"- Question answering" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.2" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
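The conclusion lists search among the problems that dense vectors help with. As a rough illustration of why, here is a minimal sketch of nearest-neighbour lookup by cosine similarity. The helper names (`cosine_similarity`, `most_similar`) and the three-dimensional `toy_vectors` are assumptions invented for this example. In practice the vectors would come from the trained word2vec model rather than a hand-written dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # a zero vector has no direction; treat it as unrelated
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, topn=3):
    """Rank every other term in `vectors` by cosine similarity to `query`."""
    query_vec = vectors[query]
    scored = [(term, cosine_similarity(query_vec, vec))
              for term, vec in vectors.items() if term != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical vectors, made up for illustration only;
# they are NOT taken from the model trained in this notebook.
toy_vectors = {
    'drug':    [0.9, 0.1, 0.0],
    'vaccine': [0.8, 0.2, 0.1],
    'recall':  [0.1, 0.9, 0.2],
    'tobacco': [0.0, 0.2, 0.9],
}

for term, score in most_similar('drug', toy_vectors):
    print('{:<8} {:.3f}'.format(term, score))
```

The same ranking idea carries over to documents: represent each document as a vector (for example, its LDA topic distribution) and return the documents whose vectors lie closest to the query's.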