| { | |
| "cells": [ | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import pandas as pd\n", | |
| "from collections import Counter, defaultdict\n", | |
| "import itertools" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Search Query</th>\n", | |
| " <th>Clicks</th>\n", | |
| " <th>Impressions</th>\n", | |
| " <th>CTR</th>\n", | |
| " <th>Average Position</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>(other)</td>\n", | |
| " <td>5,937</td>\n", | |
| " <td>125,914</td>\n", | |
| " <td>4.72%</td>\n", | |
| " <td>20.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>youtube merchandise</td>\n", | |
| " <td>911</td>\n", | |
| " <td>2,413</td>\n", | |
| " <td>37.75%</td>\n", | |
| " <td>1.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>youtube merch</td>\n", | |
| " <td>895</td>\n", | |
| " <td>3,678</td>\n", | |
| " <td>24.33%</td>\n", | |
| " <td>1.7</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>youtube shop</td>\n", | |
| " <td>630</td>\n", | |
| " <td>1,398</td>\n", | |
| " <td>45.06%</td>\n", | |
| " <td>1.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>google t shirt</td>\n", | |
| " <td>476</td>\n", | |
| " <td>1,694</td>\n", | |
| " <td>28.10%</td>\n", | |
| " <td>1.0</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " Search Query Clicks Impressions CTR Average Position\n", | |
| "0 (other) 5,937 125,914 4.72% 20.0\n", | |
| "1 youtube merchandise 911 2,413 37.75% 1.0\n", | |
| "2 youtube merch 895 3,678 24.33% 1.7\n", | |
| "3 youtube shop 630 1,398 45.06% 1.0\n", | |
| "4 google t shirt 476 1,694 28.10% 1.0" | |
| ] | |
| }, | |
| "execution_count": 8, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df = pd.read_csv('/Users/jeff/Downloads/Analytics 3 Raw Data View Queries 20180218-20180319.csv', \n", | |
| " skiprows=5)\n", | |
| "\n", | |
| "df.dropna(inplace=True)\n", | |
| "df.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Search Query</th>\n", | |
| " <th>Clicks</th>\n", | |
| " <th>Impressions</th>\n", | |
| " <th>CTR</th>\n", | |
| " <th>Average Position</th>\n", | |
| " <th>unigrams</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>(other)</td>\n", | |
| " <td>5,937</td>\n", | |
| " <td>125,914</td>\n", | |
| " <td>4.72%</td>\n", | |
| " <td>20.0</td>\n", | |
| " <td>[(other)]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>youtube merchandise</td>\n", | |
| " <td>911</td>\n", | |
| " <td>2,413</td>\n", | |
| " <td>37.75%</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>[youtube, merchandise]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>youtube merch</td>\n", | |
| " <td>895</td>\n", | |
| " <td>3,678</td>\n", | |
| " <td>24.33%</td>\n", | |
| " <td>1.7</td>\n", | |
| " <td>[youtube, merch]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>youtube shop</td>\n", | |
| " <td>630</td>\n", | |
| " <td>1,398</td>\n", | |
| " <td>45.06%</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>[youtube, shop]</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>google t shirt</td>\n", | |
| " <td>476</td>\n", | |
| " <td>1,694</td>\n", | |
| " <td>28.10%</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>[google, t, shirt]</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " Search Query Clicks Impressions CTR Average Position \\\n", | |
| "0 (other) 5,937 125,914 4.72% 20.0 \n", | |
| "1 youtube merchandise 911 2,413 37.75% 1.0 \n", | |
| "2 youtube merch 895 3,678 24.33% 1.7 \n", | |
| "3 youtube shop 630 1,398 45.06% 1.0 \n", | |
| "4 google t shirt 476 1,694 28.10% 1.0 \n", | |
| "\n", | |
| " unigrams \n", | |
| "0 [(other)] \n", | |
| "1 [youtube, merchandise] \n", | |
| "2 [youtube, merch] \n", | |
| "3 [youtube, shop] \n", | |
| "4 [google, t, shirt] " | |
| ] | |
| }, | |
| "execution_count": 9, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# What I would usually do first: check unigram frequency across all queries\n", | |
| "\n", | |
| "df['unigrams'] = df['Search Query'].map(lambda x: x.split())\n", | |
| "df.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 13, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('google', 881),\n", | |
| " ('t', 520),\n", | |
| " ('youtube', 438),\n", | |
| " ('store', 430),\n", | |
| " ('shirt', 385),\n", | |
| " ('shirts', 366),\n", | |
| " ('android', 272),\n", | |
| " ('black', 254),\n", | |
| " ('for', 238),\n", | |
| " ('buy', 203),\n", | |
| " ('merchandise', 195),\n", | |
| " ('shop', 193),\n", | |
| " ('baby', 157),\n", | |
| " ('merch', 148),\n", | |
| " ('cotton', 139),\n", | |
| " ('brand', 125),\n", | |
| " ('apparel', 120),\n", | |
| " ('logo', 117),\n", | |
| " ('cool', 117),\n", | |
| " ('men', 114)]" | |
| ] | |
| }, | |
| "execution_count": 13, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "merged = list(itertools.chain.from_iterable(df['unigrams'].tolist()))\n", | |
| "Counter(merged).most_common(20)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 14, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Search Query</th>\n", | |
| " <th>Clicks</th>\n", | |
| " <th>Impressions</th>\n", | |
| " <th>CTR</th>\n", | |
| " <th>Average Position</th>\n", | |
| " <th>unigrams</th>\n", | |
| " <th>startswith</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>(other)</td>\n", | |
| " <td>5,937</td>\n", | |
| " <td>125,914</td>\n", | |
| " <td>4.72%</td>\n", | |
| " <td>20.0</td>\n", | |
| " <td>[(other)]</td>\n", | |
| " <td>(other)</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>youtube merchandise</td>\n", | |
| " <td>911</td>\n", | |
| " <td>2,413</td>\n", | |
| " <td>37.75%</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>[youtube, merchandise]</td>\n", | |
| " <td>youtube</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>youtube merch</td>\n", | |
| " <td>895</td>\n", | |
| " <td>3,678</td>\n", | |
| " <td>24.33%</td>\n", | |
| " <td>1.7</td>\n", | |
| " <td>[youtube, merch]</td>\n", | |
| " <td>youtube</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>youtube shop</td>\n", | |
| " <td>630</td>\n", | |
| " <td>1,398</td>\n", | |
| " <td>45.06%</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>[youtube, shop]</td>\n", | |
| " <td>youtube</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>google t shirt</td>\n", | |
| " <td>476</td>\n", | |
| " <td>1,694</td>\n", | |
| " <td>28.10%</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>[google, t, shirt]</td>\n", | |
| " <td>google</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " Search Query Clicks Impressions CTR Average Position \\\n", | |
| "0 (other) 5,937 125,914 4.72% 20.0 \n", | |
| "1 youtube merchandise 911 2,413 37.75% 1.0 \n", | |
| "2 youtube merch 895 3,678 24.33% 1.7 \n", | |
| "3 youtube shop 630 1,398 45.06% 1.0 \n", | |
| "4 google t shirt 476 1,694 28.10% 1.0 \n", | |
| "\n", | |
| " unigrams startswith \n", | |
| "0 [(other)] (other) \n", | |
| "1 [youtube, merchandise] youtube \n", | |
| "2 [youtube, merch] youtube \n", | |
| "3 [youtube, shop] youtube \n", | |
| "4 [google, t, shirt] google " | |
| ] | |
| }, | |
| "execution_count": 14, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# Another route: take the first word of each query as its root\n", | |
| "df['startswith'] = df['Search Query'].map(lambda x: x.split()[0])\n", | |
| "df.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 15, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('google', 556),\n", | |
| " ('black', 219),\n", | |
| " ('android', 208),\n", | |
| " ('buy', 176),\n", | |
| " ('baby', 112),\n", | |
| " ('cool', 111),\n", | |
| " ('youtube', 99),\n", | |
| " ('brand', 72),\n", | |
| " ('best', 69),\n", | |
| " ('branded', 68),\n", | |
| " ('cotton', 63),\n", | |
| " ('blue', 62),\n", | |
| " ('apo', 52),\n", | |
| " ('free', 50),\n", | |
| " ('gift', 50),\n", | |
| " ('apparel', 48),\n", | |
| " ('100', 46),\n", | |
| " ('all', 45),\n", | |
| " ('99', 42),\n", | |
| " ('cheap', 42)]" | |
| ] | |
| }, | |
| "execution_count": 15, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "Counter(df.startswith.tolist()).most_common(20)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 20, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "startswith\n", | |
| "\"apparel\" [\"apparel\"]\n", | |
| "\"bags\" [\"bags\"]\n", | |
| "\"brand\" [\"brand\"]\n", | |
| "\"fun\" [\"fun\"]\n", | |
| "\"google [\"google mini\", \"google store\"]\n", | |
| "\"shop\" [\"shop\"]\n", | |
| "\"waze\" [\"waze\"]\n", | |
| "\"youtube\" [\"youtube\"]\n", | |
| "#ifam [#ifam merch]\n", | |
| "#shirt [#shirt]\n", | |
| "$ [$ 25 99]\n", | |
| "$10.00 [$10.00]\n", | |
| "$250.00 [$250.00]\n", | |
| "'waze ['waze]\n", | |
| "'שזק ['שזק]\n", | |
| "(apparel,) [(apparel,)]\n", | |
| "(other) [(other)]\n", | |
| "*.cap [*.cap]\n", | |
| "*.shop [*.shop]\n", | |
| "*you're [*you're sticker]\n", | |
| "+< [+< youtube]\n", | |
| "+outube [+outube]\n", | |
| "+y+outube [+y+outube]\n", | |
| "+yoiutiube [+yoiutiube]\n", | |
| "+yotub [+yotub]\n", | |
| "+yotube [+yotube]\n", | |
| "+you [+you tobe, +you tuve]\n", | |
| "+youtbe [+youtbe]\n", | |
| "+youtobe [+youtobe]\n", | |
| "+youtoube [+youtoube]\n", | |
| " ... \n", | |
| "21 [21 99, 21 apparel, 21 clothes shop, 21 jersey...\n", | |
| "22 [22 ounce bottle, 22 ounce mug]\n", | |
| "23/99 [23/99]\n", | |
| "24 [24 hours shipping, 24 merchandise]\n", | |
| "25 [25 99, 25 oz, 25 oz bottle]\n", | |
| "3 [3 for 99 shirts, 3 lines, 3 shirts for 99]\n", | |
| "360 [360 cafe near me, 360 t shirts]\n", | |
| "3lines [3lines]\n", | |
| "3xl [3xl sweater]\n", | |
| "4 [4 you tube]\n", | |
| "48 [48 hours video store]\n", | |
| "59 [59 99, 59 most popular women]\n", | |
| "6 [6 head best cap t shirt for sale]\n", | |
| "65 [65 * 4]\n", | |
| "7 [7 dog, 7 dog com]\n", | |
| "7dog [7dog, 7dog com, 7dog kid]\n", | |
| "7dog. [7dog. com]\n", | |
| "7dog.com [7dog.com]\n", | |
| "8 [8 kids clothes, 8 pc, 8 shirt, 8 sticker, 8 s...\n", | |
| "80g [80g woodfree inner paper notebook branded]\n", | |
| "9 [9, 9 11 decals, 9 brand, 9 pack]\n", | |
| "9184939 [9184939]\n", | |
| "99 [99 accessories, 99 and up store, 99 apparel, ...\n", | |
| "99$ [99$ laptop]\n", | |
| "99%is [99%is cap, 99%is hat]\n", | |
| "99*13 [99*13]\n", | |
| "<outube [<outube]\n", | |
| "=youtube [=youtube]\n", | |
| "?????youtube [?????youtube]\n", | |
| "?brand= [?brand=]\n", | |
| "Name: Search Query, dtype: object" | |
| ] | |
| }, | |
| "execution_count": 20, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# Aggregate the queries under their first word\n", | |
| "df.groupby('startswith')['Search Query'].apply(list)[:100]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 24, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>Search Query</th>\n", | |
| " <th>nbr_terms</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>startswith</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>google</th>\n", | |
| " <td>[google t shirt, google merchandise store, goo...</td>\n", | |
| " <td>556</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>black</th>\n", | |
| " <td>[black youtube, black, black action tees, blac...</td>\n", | |
| " <td>219</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>android</th>\n", | |
| " <td>[android merchandise, android stickers, androi...</td>\n", | |
| " <td>208</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>buy</th>\n", | |
| " <td>[buy google t shirt, buy youtube t shirt india...</td>\n", | |
| " <td>176</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>baby</th>\n", | |
| " <td>[baby merchandise, baby waze, baby, baby appar...</td>\n", | |
| " <td>112</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>cool</th>\n", | |
| " <td>[cool mens tees, cool android accessories, coo...</td>\n", | |
| " <td>111</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>youtube</th>\n", | |
| " <td>[youtube merchandise, youtube merch, youtube s...</td>\n", | |
| " <td>99</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>brand</th>\n", | |
| " <td>[brand google, brand, brand account google, br...</td>\n", | |
| " <td>72</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>best</th>\n", | |
| " <td>[best in class youtube merch, best youtube mer...</td>\n", | |
| " <td>69</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>branded</th>\n", | |
| " <td>[branded baby clothes, branded baby products, ...</td>\n", | |
| " <td>68</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>cotton</th>\n", | |
| " <td>[cotton bag in stock, cotton bag youtube, cott...</td>\n", | |
| " <td>63</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>blue</th>\n", | |
| " <td>[blue, blue amp white, blue android, blue baby...</td>\n", | |
| " <td>62</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>apo</th>\n", | |
| " <td>[apo address, apo address list, apo address sh...</td>\n", | |
| " <td>52</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>free</th>\n", | |
| " <td>[free google stickers, free google sticker, fr...</td>\n", | |
| " <td>50</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>gift</th>\n", | |
| " <td>[gift card google, gift card shop, gift, gift ...</td>\n", | |
| " <td>50</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>apparel</th>\n", | |
| " <td>[apparel store, apparel, apparel clothing stor...</td>\n", | |
| " <td>48</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>100</th>\n", | |
| " <td>[100 apparel, 100 cotton baby onesies, 100 cot...</td>\n", | |
| " <td>46</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>all</th>\n", | |
| " <td>[all about google, all about merch shopee, all...</td>\n", | |
| " <td>45</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>99</th>\n", | |
| " <td>[99 accessories, 99 and up store, 99 apparel, ...</td>\n", | |
| " <td>42</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>cheap</th>\n", | |
| " <td>[cheap 100 cotton funny t shirts, cheap 100 co...</td>\n", | |
| " <td>42</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>backpack</th>\n", | |
| " <td>[backpack google, backpack, backpack bag brand...</td>\n", | |
| " <td>40</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>fun</th>\n", | |
| " <td>[fun google swag, fun googley swag, fun googly...</td>\n", | |
| " <td>38</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>gogle</th>\n", | |
| " <td>[gogle hat, gogle men, gogle t, gogle youtub, ...</td>\n", | |
| " <td>34</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>funny</th>\n", | |
| " <td>[funny car t shirts, funny computer stickers, ...</td>\n", | |
| " <td>34</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>a</th>\n", | |
| " <td>[a really cool google t-shirt, a really cool g...</td>\n", | |
| " <td>31</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>for</th>\n", | |
| " <td>[for all google, for all orders, for kids, for...</td>\n", | |
| " <td>26</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>merchandise</th>\n", | |
| " <td>[merchandise store, merchandise shop, merchand...</td>\n", | |
| " <td>26</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>gaming</th>\n", | |
| " <td>[gaming apparel and accessories, gaming appare...</td>\n", | |
| " <td>25</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>add</th>\n", | |
| " <td>[add a kid, add a tee, add accessories, add ap...</td>\n", | |
| " <td>25</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bottle</th>\n", | |
| " <td>[bottle google, bottle, bottle black, bottle b...</td>\n", | |
| " <td>22</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>...</th>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>child</th>\n", | |
| " <td>[child tee]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>children</th>\n", | |
| " <td>[children youtube]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>chill</th>\n", | |
| " <td>[chill waze]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>choicesgiveaway.top</th>\n", | |
| " <td>[choicesgiveaway.top google]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>chrome,</th>\n", | |
| " <td>[chrome, youtube, google maps, gmail,]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>cick</th>\n", | |
| " <td>[cick ball]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>classroom.google.coms</th>\n", | |
| " <td>[classroom.google.coms]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>carry</th>\n", | |
| " <td>[carry a big sticker]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>cardigan</th>\n", | |
| " <td>[cardigan hoodie men]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>carbon</th>\n", | |
| " <td>[carbon fiber coffee mug]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>brend</th>\n", | |
| " <td>[brend store]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bots</th>\n", | |
| " <td>[bots google]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bottleo</th>\n", | |
| " <td>[bottleo]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bottol</th>\n", | |
| " <td>[bottol]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>boulder</th>\n", | |
| " <td>[boulder hat store]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>box</th>\n", | |
| " <td>[box ships stock]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>boy</th>\n", | |
| " <td>[boy red tube]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bran</th>\n", | |
| " <td>[bran shop]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>brand-store.com</th>\n", | |
| " <td>[brand-store.com]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bucks</th>\n", | |
| " <td>[bucks youtube]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>car-t</th>\n", | |
| " <td>[car-t]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bunny's</th>\n", | |
| " <td>[bunny's boutique]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>but</th>\n", | |
| " <td>[but can you do this merch]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>button</th>\n", | |
| " <td>[button login google]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bybrand</th>\n", | |
| " <td>[bybrand]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>bytarifa</th>\n", | |
| " <td>[bytarifa brand shop]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>cache</th>\n", | |
| " <td>[cache]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>canda</th>\n", | |
| " <td>[canda google]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>cannot</th>\n", | |
| " <td>[cannot use this product under a guest account]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>。youtube</th>\n", | |
| " <td>[。youtube]</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>986 rows × 2 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " Search Query \\\n", | |
| "startswith \n", | |
| "google [google t shirt, google merchandise store, goo... \n", | |
| "black [black youtube, black, black action tees, blac... \n", | |
| "android [android merchandise, android stickers, androi... \n", | |
| "buy [buy google t shirt, buy youtube t shirt india... \n", | |
| "baby [baby merchandise, baby waze, baby, baby appar... \n", | |
| "cool [cool mens tees, cool android accessories, coo... \n", | |
| "youtube [youtube merchandise, youtube merch, youtube s... \n", | |
| "brand [brand google, brand, brand account google, br... \n", | |
| "best [best in class youtube merch, best youtube mer... \n", | |
| "branded [branded baby clothes, branded baby products, ... \n", | |
| "cotton [cotton bag in stock, cotton bag youtube, cott... \n", | |
| "blue [blue, blue amp white, blue android, blue baby... \n", | |
| "apo [apo address, apo address list, apo address sh... \n", | |
| "free [free google stickers, free google sticker, fr... \n", | |
| "gift [gift card google, gift card shop, gift, gift ... \n", | |
| "apparel [apparel store, apparel, apparel clothing stor... \n", | |
| "100 [100 apparel, 100 cotton baby onesies, 100 cot... \n", | |
| "all [all about google, all about merch shopee, all... \n", | |
| "99 [99 accessories, 99 and up store, 99 apparel, ... \n", | |
| "cheap [cheap 100 cotton funny t shirts, cheap 100 co... \n", | |
| "backpack [backpack google, backpack, backpack bag brand... \n", | |
| "fun [fun google swag, fun googley swag, fun googly... \n", | |
| "gogle [gogle hat, gogle men, gogle t, gogle youtub, ... \n", | |
| "funny [funny car t shirts, funny computer stickers, ... \n", | |
| "a [a really cool google t-shirt, a really cool g... \n", | |
| "for [for all google, for all orders, for kids, for... \n", | |
| "merchandise [merchandise store, merchandise shop, merchand... \n", | |
| "gaming [gaming apparel and accessories, gaming appare... \n", | |
| "add [add a kid, add a tee, add accessories, add ap... \n", | |
| "bottle [bottle google, bottle, bottle black, bottle b... \n", | |
| "... ... \n", | |
| "child [child tee] \n", | |
| "children [children youtube] \n", | |
| "chill [chill waze] \n", | |
| "choicesgiveaway.top [choicesgiveaway.top google] \n", | |
| "chrome, [chrome, youtube, google maps, gmail,] \n", | |
| "cick [cick ball] \n", | |
| "classroom.google.coms [classroom.google.coms] \n", | |
| "carry [carry a big sticker] \n", | |
| "cardigan [cardigan hoodie men] \n", | |
| "carbon [carbon fiber coffee mug] \n", | |
| "brend [brend store] \n", | |
| "bots [bots google] \n", | |
| "bottleo [bottleo] \n", | |
| "bottol [bottol] \n", | |
| "boulder [boulder hat store] \n", | |
| "box [box ships stock] \n", | |
| "boy [boy red tube] \n", | |
| "bran [bran shop] \n", | |
| "brand-store.com [brand-store.com] \n", | |
| "bucks [bucks youtube] \n", | |
| "car-t [car-t] \n", | |
| "bunny's [bunny's boutique] \n", | |
| "but [but can you do this merch] \n", | |
| "button [button login google] \n", | |
| "bybrand [bybrand] \n", | |
| "bytarifa [bytarifa brand shop] \n", | |
| "cache [cache] \n", | |
| "canda [canda google] \n", | |
| "cannot [cannot use this product under a guest account] \n", | |
| "。youtube [。youtube] \n", | |
| "\n", | |
| " nbr_terms \n", | |
| "startswith \n", | |
| "google 556 \n", | |
| "black 219 \n", | |
| "android 208 \n", | |
| "buy 176 \n", | |
| "baby 112 \n", | |
| "cool 111 \n", | |
| "youtube 99 \n", | |
| "brand 72 \n", | |
| "best 69 \n", | |
| "branded 68 \n", | |
| "cotton 63 \n", | |
| "blue 62 \n", | |
| "apo 52 \n", | |
| "free 50 \n", | |
| "gift 50 \n", | |
| "apparel 48 \n", | |
| "100 46 \n", | |
| "all 45 \n", | |
| "99 42 \n", | |
| "cheap 42 \n", | |
| "backpack 40 \n", | |
| "fun 38 \n", | |
| "gogle 34 \n", | |
| "funny 34 \n", | |
| "a 31 \n", | |
| "for 26 \n", | |
| "merchandise 26 \n", | |
| "gaming 25 \n", | |
| "add 25 \n", | |
| "bottle 22 \n", | |
| "... ... \n", | |
| "child 1 \n", | |
| "children 1 \n", | |
| "chill 1 \n", | |
| "choicesgiveaway.top 1 \n", | |
| "chrome, 1 \n", | |
| "cick 1 \n", | |
| "classroom.google.coms 1 \n", | |
| "carry 1 \n", | |
| "cardigan 1 \n", | |
| "carbon 1 \n", | |
| "brend 1 \n", | |
| "bots 1 \n", | |
| "bottleo 1 \n", | |
| "bottol 1 \n", | |
| "boulder 1 \n", | |
| "box 1 \n", | |
| "boy 1 \n", | |
| "bran 1 \n", | |
| "brand-store.com 1 \n", | |
| "bucks 1 \n", | |
| "car-t 1 \n", | |
| "bunny's 1 \n", | |
| "but 1 \n", | |
| "button 1 \n", | |
| "bybrand 1 \n", | |
| "bytarifa 1 \n", | |
| "cache 1 \n", | |
| "canda 1 \n", | |
| "cannot 1 \n", | |
| "。youtube 1 \n", | |
| "\n", | |
| "[986 rows x 2 columns]" | |
| ] | |
| }, | |
| "execution_count": 24, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# Then rank the query roots by how many queries start with each one. This gives a basic\n", | |
| "# root-phrase analysis that can be compared against the existing site architecture to find gaps.\n", | |
| "out = df.groupby('startswith')['Search Query'].apply(list).to_frame().copy()\n", | |
| "out['nbr_terms'] = out['Search Query'].map(len)  # number of queries under each root\n", | |
| "out.sort_values(by=['nbr_terms'], ascending=False)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 27, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Clustering: first convert the queries to a vectorized format (bag of words)\n", | |
| "from sklearn.feature_extraction.text import CountVectorizer\n", | |
| "\n", | |
| "# Creating Bag of Words\n", | |
| "count_vect = CountVectorizer(stop_words='english', ngram_range=(1,2))\n", | |
| "bag_of_words = count_vect.fit_transform(df['Search Query'].values)\n", | |
| "\n", | |
| "# Further tweaks are possible here, such as TF-IDF (term frequency-inverse document frequency)\n", | |
| "# weighting, but our queries are so short that it is probably not worth it. For more detail, see\n", | |
| "# http://jonathansoma.com/lede/algorithms-2017/classes/clustering/k-means-clustering-with-scikit-learn/" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 28, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<5000x5740 sparse matrix of type '<class 'numpy.int64'>'\n", | |
| "\twith 19695 stored elements in Compressed Sparse Row format>" | |
| ] | |
| }, | |
| "execution_count": 28, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "bag_of_words # 5,000 observed queries (rows) and 5,740 unique unigrams and bigrams (columns)\n", | |
| "# This is high-dimensional: there are 5,740 columns. How do we reduce the dimensionality?" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 31, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>00</th>\n", | |
| " <th>00 google</th>\n", | |
| " <th>00 shirts</th>\n", | |
| " <th>00 youtube</th>\n", | |
| " <th>006</th>\n", | |
| " <th>06</th>\n", | |
| " <th>06 tee</th>\n", | |
| " <th>10</th>\n", | |
| " <th>10 00</th>\n", | |
| " <th>10 99</th>\n", | |
| " <th>...</th>\n", | |
| " <th>рюкзак google</th>\n", | |
| " <th>такое</th>\n", | |
| " <th>что</th>\n", | |
| " <th>что такое</th>\n", | |
| " <th>ютуб</th>\n", | |
| " <th>ютуб магазин</th>\n", | |
| " <th>שזק</th>\n", | |
| " <th>गल</th>\n", | |
| " <th>デモアカウント</th>\n", | |
| " <th>日本</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>5</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>6</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>7</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>8</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>9</th>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " <td>0</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>10 rows × 5740 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " 00 00 google 00 shirts 00 youtube 006 06 06 tee 10 10 00 10 99 \\\n", | |
| "0 0 0 0 0 0 0 0 0 0 0 \n", | |
| "1 0 0 0 0 0 0 0 0 0 0 \n", | |
| "2 0 0 0 0 0 0 0 0 0 0 \n", | |
| "3 0 0 0 0 0 0 0 0 0 0 \n", | |
| "4 0 0 0 0 0 0 0 0 0 0 \n", | |
| "5 0 0 0 0 0 0 0 0 0 0 \n", | |
| "6 0 0 0 0 0 0 0 0 0 0 \n", | |
| "7 0 0 0 0 0 0 0 0 0 0 \n", | |
| "8 0 0 0 0 0 0 0 0 0 0 \n", | |
| "9 0 0 0 0 0 0 0 0 0 0 \n", | |
| "\n", | |
| " ... рюкзак google такое что что такое ютуб ютуб магазин שזק गल \\\n", | |
| "0 ... 0 0 0 0 0 0 0 0 \n", | |
| "1 ... 0 0 0 0 0 0 0 0 \n", | |
| "2 ... 0 0 0 0 0 0 0 0 \n", | |
| "3 ... 0 0 0 0 0 0 0 0 \n", | |
| "4 ... 0 0 0 0 0 0 0 0 \n", | |
| "5 ... 0 0 0 0 0 0 0 0 \n", | |
| "6 ... 0 0 0 0 0 0 0 0 \n", | |
| "7 ... 0 0 0 0 0 0 0 0 \n", | |
| "8 ... 0 0 0 0 0 0 0 0 \n", | |
| "9 ... 0 0 0 0 0 0 0 0 \n", | |
| "\n", | |
| " デモアカウント 日本 \n", | |
| "0 0 0 \n", | |
| "1 0 0 \n", | |
| "2 0 0 \n", | |
| "3 0 0 \n", | |
| "4 0 0 \n", | |
| "5 0 0 \n", | |
| "6 0 0 \n", | |
| "7 0 0 \n", | |
| "8 0 0 \n", | |
| "9 0 0 \n", | |
| "\n", | |
| "[10 rows x 5740 columns]" | |
| ] | |
| }, | |
| "execution_count": 31, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# Quick look at the bag of words as a DataFrame\n", | |
| "pd.DataFrame(bag_of_words.toarray(), columns=count_vect.get_feature_names()).head(10)\n", | |
| "# The columns are the unique terms (n-grams of up to 2 words), and each original row\n", | |
| "# of the report gets one row of counts." | |
| ] | |
| }, | |
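As a rough illustration of what the count matrix above holds, here is a minimal pure-Python sketch that counts unigrams and bigrams per query. The `queries` list is hypothetical sample data and the tokenizer is a plain whitespace split, which is an assumption: scikit-learn's `CountVectorizer` uses its own token pattern (by default it drops single-character tokens, for example), so its columns will not match this sketch exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample queries, not the report's real data.
queries = ["youtube merch", "google t shirt", "youtube shop"]

# One Counter per query: how often each unigram and bigram occurs in it.
rows = []
for query in queries:
    tokens = query.split()
    rows.append(Counter(ngrams(tokens, 1) + ngrams(tokens, 2)))

# The sorted union of all n-grams plays the role of the column headers.
vocabulary = sorted(set().union(*rows))

# Dense matrix: one row per query, one column per n-gram.
matrix = [[row[term] for term in vocabulary] for row in rows]

for query, counts in zip(queries, matrix):
    print(query, counts)
```

Most cells are zero because each query contains only a handful of the n-grams in the vocabulary, which is why the real matrix is kept sparse until `.toarray()` is called.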
| { | |
| "cell_type": "code", | |
| "execution_count": 34, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Top terms per cluster:\n", | |
| "Cluster 0: baby board baby board onesie shirt clothes blue onesies sticker car baby onesie decal baby clothes gear window\n", | |
| "Cluster 1: sale shirts shirts sale shirt google shirt sale brand spring online bags sale google sale online sale android sale android book\n", | |
| "Cluster 2: cotton shirts 100 100 cotton cotton shirts men shirt shirts men mens buy tee cotton shirt price short cotton short\n", | |
| "Cluster 3: online shirts shopping shirts online online shopping store buy online store shop branded shop online store online 99 best shirt\n", | |
| "Cluster 4: android store toy shirt 21 android 21 logo android store robot laptop figure android robot android android black android laptop\n", | |
| "Cluster 5: shirt youtube tee shirt tee youtube shirt shirt youtube black shirt india india buy black tee logo online youtube logo google\n", | |
| "Cluster 6: shirts store cool men shirts men tee black tee shirts funny buy brand cool shirts play store play branded\n", | |
| "Cluster 7: sticker google google sticker review laptop pack sticker pack buy free emoji sticker emoji laptop sticker review google computer computer sticker\n", | |
| "Cluster 8: bags logo branded backpack store buy stickers book logo stickers book bags stock canvas goodie goodie bags bags stock\n", | |
| "Cluster 9: youtube merch black youtube merch logo merchandise shop free youtube merchandise black youtube store youtube logo free youtube buy buy youtube\n", | |
| "Cluster 10: car shirts shirts car cool car cool funny best car online shirts men shirts online sale funny car shirts sale best men\n", | |
| "Cluster 11: shirt black men cotton buy cool black shirt blue cool shirt mens brand blue shirt shirt men men shirt cotton shirt\n", | |
| "Cluster 12: apparel amp men amp apparel apparel men american black american apparel clothing store apparel amp apparel online baby apparel store women\n", | |
| "Cluster 13: merchandise store google merchandise store google merchandise shop merchandise shop uk gaming merchandise gaming online merchandise uk buy youtuber youtubers\n", | |
| "Cluster 14: google android google android android google help account com help google com google youtube cards google cards android robot store robot\n", | |
| "Cluster 15: black shop stickers buy apo brand gift bottle bag tube backpack clothing 99 decals gogle\n", | |
| "Cluster 16: mens shirts mens shirts buy cool cool mens buy mens shirts mens online tees mens tees funny mens tshirts apparel buy online\n", | |
| "Cluster 17: man shirt shirt man cotton cotton man day day man buy dan black buy man black man cheap man sale shirt sale\n", | |
| "Cluster 18: google store google store shirt logo gift buy youtube shop google shirt brand canada google logo card buy google\n", | |
| "Cluster 19: merch youtuber youtuber merch best buy youtubers merch store youtubers merch merch uk buy merch store uk fun merch youtube best merch\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# KMeans: a centroid-based clustering algorithm\n", | |
| "from sklearn.cluster import KMeans\n", | |
| "\n", | |
| "number_of_clusters=20 # Somewhat arbitrary\n", | |
| "km = KMeans(n_clusters=number_of_clusters)\n", | |
| "km.fit(bag_of_words)\n", | |
| "\n", | |
| "print(\"Top terms per cluster:\")\n", | |
| "order_centroids = km.cluster_centers_.argsort()[:, ::-1]\n", | |
| "terms = count_vect.get_feature_names()\n", | |
| "for i in range(number_of_clusters):\n", | |
| "    top_terms = [terms[ind] for ind in order_centroids[i, :15]]  # 15 highest-weighted terms\n", | |
| "    print(\"Cluster {}: {}\".format(i, ' '.join(top_terms)))" | |
| ] | |
| }, | |
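The "top terms per cluster" step above sorts each centroid's weights in descending order and maps the leading indices back to vocabulary terms. A minimal sketch of that idea in plain Python follows; the `terms` and `centroids` values are made-up stand-ins for `count_vect.get_feature_names()` and `km.cluster_centers_`, and `top_terms` is a hypothetical helper, not part of scikit-learn.

```python
# Hypothetical vocabulary and centroids (one weight per term, one row per cluster).
terms = ["baby", "google", "merch", "shirt", "youtube"]
centroids = [
    [0.9, 0.1, 0.0, 0.3, 0.0],
    [0.0, 0.2, 0.8, 0.1, 0.7],
]


def top_terms(centroid, terms, n):
    """Return the n terms with the largest weights in one centroid."""
    # Indices sorted by weight, largest first (the role of argsort()[::-1]).
    order = sorted(range(len(centroid)), key=lambda i: centroid[i], reverse=True)
    return [terms[i] for i in order[:n]]


for i, centroid in enumerate(centroids):
    print("Cluster {}: {}".format(i, " ".join(top_terms(centroid, terms, 3))))
```

A term with a high centroid weight appears often in the queries assigned to that cluster, so the leading terms work as a rough label for the cluster.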
| { | |
| "cell_type": "code", | |
| "execution_count": 35, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Last but not least, word vectors\n", | |
| "\n", | |
| "\n", | |
| "import gensim\n", | |
| "from gensim.models.word2vec import Word2Vec\n", | |
| "from sklearn.manifold import TSNE\n", | |
| "import pandas as pd\n", | |
| "from bokeh.io import output_notebook\n", | |
| "from bokeh.plotting import show, figure\n", | |
| "%matplotlib inline" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 36, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Train a skip-gram model (sg=1) rather than continuous bag of words (sg=0); both learn word embeddings\n", | |
| "model = Word2Vec(sentences=df.unigrams.tolist(), size=64, sg=1, window=10, min_count=2, seed=42, workers=8)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 40, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "array([ 3.90623398e-02, 3.80114503e-02, 2.09816948e-01,\n", | |
| " -4.32483777e-02, -3.31618339e-02, -1.54967338e-01,\n", | |
| " -1.37299791e-01, -3.00018877e-01, -6.85684010e-02,\n", | |
| " -9.34302583e-02, 2.82926427e-04, -3.10967892e-01,\n", | |
| " -3.80439341e-01, 1.23443812e-01, 1.17342416e-02,\n", | |
| " 1.00670390e-01, 9.45484787e-02, -3.14042389e-01,\n", | |
| " -2.06643015e-01, -3.45663309e-01, 2.60394737e-02,\n", | |
| " 1.41544849e-01, 8.37497562e-02, -5.17287254e-02,\n", | |
| " -2.66427845e-01, -1.94739942e-02, 4.85510081e-02,\n", | |
| " -1.85098365e-01, -2.03265160e-01, 1.58352286e-01,\n", | |
| " -1.03561627e-02, -1.11384660e-01, 1.75107196e-01,\n", | |
| " -4.28393960e-01, 1.05355702e-01, 4.28379774e-02,\n", | |
| " 1.40934452e-01, -4.05294925e-01, 1.29581377e-01,\n", | |
| " -3.46361921e-04, 9.38271582e-02, 3.26842308e-01,\n", | |
| " 1.28976613e-01, -2.19951440e-02, 2.50045247e-02,\n", | |
| " 1.50180221e-01, -1.02843665e-01, -2.15156898e-01,\n", | |
| " -1.17003910e-01, -1.93101496e-01, -2.47246936e-01,\n", | |
| " 6.85426369e-02, 1.02172114e-01, -8.88543427e-02,\n", | |
| " 2.28570297e-01, -1.57261118e-01, 1.85210884e-01,\n", | |
| " -2.30028614e-01, -3.20697457e-01, -5.83343804e-02,\n", | |
| " -2.16704950e-01, 2.75112339e-03, -7.26818666e-02,\n", | |
| " 1.66232765e-01], dtype=float32)" | |
| ] | |
| }, | |
| "execution_count": 40, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.wv['google'] # the vector for 'google': one point in the 64-dimensional embedding space" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 44, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "[('google', 0.9991686940193176),\n", | |
| " ('in', 0.9991583824157715),\n", | |
| " ('to', 0.99900221824646),\n", | |
| " ('best', 0.9989601969718933),\n", | |
| " ('store', 0.9989563822746277),\n", | |
| " ('pen', 0.9989402294158936),\n", | |
| " ('cheap', 0.9989197850227356),\n", | |
| " ('black', 0.9989176392555237),\n", | |
| " ('bottle', 0.998894214630127),\n", | |
| " ('apo', 0.9988889694213867)]" | |
| ] | |
| }, | |
| "execution_count": 44, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "model.wv.most_similar('merch') # nearest terms by cosine similarity; the corpus is tiny, so every score is ~0.999 and the ranking carries little signal" | |
| ] | |
| }, | |
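`most_similar` ranks the other words by cosine similarity to the query word's vector. The sketch below shows that ranking in plain Python on made-up 3-dimensional vectors; the `vectors` dictionary and both helper functions are hypothetical and only mirror the idea, not gensim's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; a real model would hold 64 dimensions per word.
vectors = {
    "merch": [1.0, 2.0, 0.5],
    "store": [0.9, 2.1, 0.4],
    "robot": [-1.0, 0.2, 3.0],
}


def most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`, best first."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


print(most_similar("merch", vectors))
```

When nearly all words score close to 1.0, as in the output above, the vectors point in almost the same direction, which usually means the model has seen too little text to separate them.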
| { | |
| "cell_type": "code", | |
| "execution_count": 45, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "/Users/jeff/anaconda/envs/3point6/lib/python3.6/site-packages/ipykernel_launcher.py:2: DeprecationWarning: Call to deprecated `__getitem__` (Method will be removed in 4.0.0, use self.wv.__getitem__() instead).\n", | |
| " \n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Fancy chart for the client: project the 64-d vectors down to 2-d with t-SNE (a nonlinear alternative to principal component analysis)\n", | |
| "X = model.wv[model.wv.vocab] # index through .wv; model[...] is deprecated\n", | |
| "tsne = TSNE(n_components=2, n_iter=1000) # n_iter should be at least 250; default is 1000\n", | |
| "X_2d = tsne.fit_transform(X)\n", | |
| "\n", | |
| "coords_df = pd.DataFrame(X_2d, columns=['x','y'])\n", | |
| "coords_df['token'] = list(model.wv.vocab.keys())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 46, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "\n", | |
| " <div class=\"bk-root\">\n", | |
| " <a href=\"https://bokeh.pydata.org\" target=\"_blank\" class=\"bk-logo bk-logo-small bk-logo-notebook\"></a>\n", | |
| " <span id=\"6d4d2d4d-ae79-4576-9b02-03cd517f49f0\">Loading BokehJS ...</span>\n", | |
| " </div>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "application/javascript": [ | |
| "\n", | |
| "(function(root) {\n", | |
| " function now() {\n", | |
| " return new Date();\n", | |
| " }\n", | |
| "\n", | |
| " var force = true;\n", | |
| "\n", | |
| " if (typeof (root._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n", | |
| " root._bokeh_onload_callbacks = [];\n", | |
| " root._bokeh_is_loading = undefined;\n", | |
| " }\n", | |
| "\n", | |
| " var JS_MIME_TYPE = 'application/javascript';\n", | |
| " var HTML_MIME_TYPE = 'text/html';\n", | |
| " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", | |
| " var CLASS_NAME = 'output_bokeh rendered_html';\n", | |
| "\n", | |
| " /**\n", | |
| " * Render data to the DOM node\n", | |
| " */\n", | |
| " function render(props, node) {\n", | |
| " var script = document.createElement(\"script\");\n", | |
| " node.appendChild(script);\n", | |
| " }\n", | |
| "\n", | |
| " /**\n", | |
| " * Handle when an output is cleared or removed\n", | |
| " */\n", | |
| " function handleClearOutput(event, handle) {\n", | |
| " var cell = handle.cell;\n", | |
| "\n", | |
| " var id = cell.output_area._bokeh_element_id;\n", | |
| " var server_id = cell.output_area._bokeh_server_id;\n", | |
| " // Clean up Bokeh references\n", | |
| " if (id !== undefined) {\n", | |
| " Bokeh.index[id].model.document.clear();\n", | |
| " delete Bokeh.index[id];\n", | |
| " }\n", | |
| "\n", | |
| " if (server_id !== undefined) {\n", | |
| " // Clean up Bokeh references\n", | |
| " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", | |
| " cell.notebook.kernel.execute(cmd, {\n", | |
| " iopub: {\n", | |
| " output: function(msg) {\n", | |
| " var element_id = msg.content.text.trim();\n", | |
| " Bokeh.index[element_id].model.document.clear();\n", | |
| " delete Bokeh.index[element_id];\n", | |
| " }\n", | |
| " }\n", | |
| " });\n", | |
| " // Destroy server and session\n", | |
| " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", | |
| " cell.notebook.kernel.execute(cmd);\n", | |
| " }\n", | |
| " }\n", | |
| "\n", | |
| " /**\n", | |
| " * Handle when a new output is added\n", | |
| " */\n", | |
| " function handleAddOutput(event, handle) {\n", | |
| " var output_area = handle.output_area;\n", | |
| " var output = handle.output;\n", | |
| "\n", | |
| " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", | |
| " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", | |
| " return\n", | |
| " }\n", | |
| "\n", | |
| " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", | |
| "\n", | |
| " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", | |
| " toinsert[0].firstChild.textContent = output.data[JS_MIME_TYPE];\n", | |
| " // store reference to embed id on output_area\n", | |
| " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", | |
| " }\n", | |
| " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", | |
| " var bk_div = document.createElement(\"div\");\n", | |
| " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", | |
| " var script_attrs = bk_div.children[0].attributes;\n", | |
| " for (var i = 0; i < script_attrs.length; i++) {\n", | |
| " toinsert[0].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", | |
| " }\n", | |
| " // store reference to server id on output_area\n", | |
| " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", | |
| " }\n", | |
| " }\n", | |
| "\n", | |
| " function register_renderer(events, OutputArea) {\n", | |
| "\n", | |
| " function append_mime(data, metadata, element) {\n", | |
| " // create a DOM node to render to\n", | |
| " var toinsert = this.create_output_subarea(\n", | |
| " metadata,\n", | |
| " CLASS_NAME,\n", | |
| " EXEC_MIME_TYPE\n", | |
| " );\n", | |
| " this.keyboard_manager.register_events(toinsert);\n", | |
| " // Render to node\n", | |
| " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", | |
| " render(props, toinsert[0]);\n", | |
| " element.append(toinsert);\n", | |
| " return toinsert\n", | |
| " }\n", | |
| "\n", | |
| " /* Handle when an output is cleared or removed */\n", | |
| " events.on('clear_output.CodeCell', handleClearOutput);\n", | |
| " events.on('delete.Cell', handleClearOutput);\n", | |
| "\n", | |
| " /* Handle when a new output is added */\n", | |
| " events.on('output_added.OutputArea', handleAddOutput);\n", | |
| "\n", | |
| " /**\n", | |
| " * Register the mime type and append_mime function with output_area\n", | |
| " */\n", | |
| " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", | |
| " /* Is output safe? */\n", | |
| " safe: true,\n", | |
| " /* Index of renderer in `output_area.display_order` */\n", | |
| " index: 0\n", | |
| " });\n", | |
| " }\n", | |
| "\n", | |
| " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", | |
| " if (root.Jupyter !== undefined) {\n", | |
| " var events = require('base/js/events');\n", | |
| " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", | |
| "\n", | |
| " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", | |
| " register_renderer(events, OutputArea);\n", | |
| " }\n", | |
| " }\n", | |
| "\n", | |
| " \n", | |
| " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", | |
| " root._bokeh_timeout = Date.now() + 5000;\n", | |
| " root._bokeh_failed_load = false;\n", | |
| " }\n", | |
| "\n", | |
| " var NB_LOAD_WARNING = {'data': {'text/html':\n", | |
| " \"<div style='background-color: #fdd'>\\n\"+\n", | |
| " \"<p>\\n\"+\n", | |
| " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", | |
| " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", | |
| " \"</p>\\n\"+\n", | |
| " \"<ul>\\n\"+\n", | |
| " \"<li>re-rerun `output_notebook()` to attempt to load from CDN again, or</li>\\n\"+\n", | |
| " \"<li>use INLINE resources instead, as so:</li>\\n\"+\n", | |
| " \"</ul>\\n\"+\n", | |
| " \"<code>\\n\"+\n", | |
| " \"from bokeh.resources import INLINE\\n\"+\n", | |
| " \"output_notebook(resources=INLINE)\\n\"+\n", | |
| " \"</code>\\n\"+\n", | |
| " \"</div>\"}};\n", | |
| "\n", | |
| " function display_loaded() {\n", | |
| " var el = document.getElementById(\"6d4d2d4d-ae79-4576-9b02-03cd517f49f0\");\n", | |
| " if (el != null) {\n", | |
| " el.textContent = \"BokehJS is loading...\";\n", | |
| " }\n", | |
| " if (root.Bokeh !== undefined) {\n", | |
| " if (el != null) {\n", | |
| " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", | |
| " }\n", | |
| " } else if (Date.now() < root._bokeh_timeout) {\n", | |
| " setTimeout(display_loaded, 100)\n", | |
| " }\n", | |
| " }\n", | |
| "\n", | |
| "\n", | |
| " function run_callbacks() {\n", | |
| " try {\n", | |
| " root._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n", | |
| " }\n", | |
| " finally {\n", | |
| " delete root._bokeh_onload_callbacks\n", | |
| " }\n", | |
| " console.info(\"Bokeh: all callbacks have finished\");\n", | |
| " }\n", | |
| "\n", | |
| " function load_libs(js_urls, callback) {\n", | |
| " root._bokeh_onload_callbacks.push(callback);\n", | |
| " if (root._bokeh_is_loading > 0) {\n", | |
| " console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", | |
| " return null;\n", | |
| " }\n", | |
| " if (js_urls == null || js_urls.length === 0) {\n", | |
| " run_callbacks();\n", | |
| " return null;\n", | |
| " }\n", | |
| " console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", | |
| " root._bokeh_is_loading = js_urls.length;\n", | |
| " for (var i = 0; i < js_urls.length; i++) {\n", | |
| " var url = js_urls[i];\n", | |
| " var s = document.createElement('script');\n", | |
| " s.src = url;\n", | |
| " s.async = false;\n", | |
| " s.onreadystatechange = s.onload = function() {\n", | |
| " root._bokeh_is_loading--;\n", | |
| " if (root._bokeh_is_loading === 0) {\n", | |
| " console.log(\"Bokeh: all BokehJS libraries loaded\");\n", | |
| " run_callbacks()\n", | |
| " }\n", | |
| " };\n", | |
| " s.onerror = function() {\n", | |
| " console.warn(\"failed to load library \" + url);\n", | |
| " };\n", | |
| " console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", | |
| " document.getElementsByTagName(\"head\")[0].appendChild(s);\n", | |
| " }\n", | |
| " };var element = document.getElementById(\"6d4d2d4d-ae79-4576-9b02-03cd517f49f0\");\n", | |
| " if (element == null) {\n", | |
| " console.log(\"Bokeh: ERROR: autoload.js configured with elementid '6d4d2d4d-ae79-4576-9b02-03cd517f49f0' but no matching script tag was found. \")\n", | |
| " return false;\n", | |
| " }\n", | |
| "\n", | |
| " var js_urls = [\"https://cdn.pydata.org/bokeh/dev/bokeh-0.12.15dev2.min.js\", \"https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.15dev2.min.js\", \"https://cdn.pydata.org/bokeh/dev/bokeh-tables-0.12.15dev2.min.js\", \"https://cdn.pydata.org/bokeh/dev/bokeh-gl-0.12.15dev2.min.js\"];\n", | |
| "\n", | |
| " var inline_js = [\n", | |
| " function(Bokeh) {\n", | |
| " Bokeh.set_log_level(\"info\");\n", | |
| " },\n", | |
| " \n", | |
| " function(Bokeh) {\n", | |
| " \n", | |
| " },\n", | |
| " function(Bokeh) {\n", | |
| " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/dev/bokeh-0.12.15dev2.min.css\");\n", | |
| " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/dev/bokeh-0.12.15dev2.min.css\");\n", | |
| " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.15dev2.min.css\");\n", | |
| " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.15dev2.min.css\");\n", | |
| " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/dev/bokeh-tables-0.12.15dev2.min.css\");\n", | |
| " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/dev/bokeh-tables-0.12.15dev2.min.css\");\n", | |
| " }\n", | |
| " ];\n", | |
| "\n", | |
| " function run_inline_js() {\n", | |
| " \n", | |
| " if ((root.Bokeh !== undefined) || (force === true)) {\n", | |
| " for (var i = 0; i < inline_js.length; i++) {\n", | |
| " inline_js[i].call(root, root.Bokeh);\n", | |
| " }if (force === true) {\n", | |
| " display_loaded();\n", | |
| " }} else if (Date.now() < root._bokeh_timeout) {\n", | |
| " setTimeout(run_inline_js, 100);\n", | |
| " } else if (!root._bokeh_failed_load) {\n", | |
| " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", | |
| " root._bokeh_failed_load = true;\n", | |
| " } else if (force !== true) {\n", | |
| " var cell = $(document.getElementById(\"6d4d2d4d-ae79-4576-9b02-03cd517f49f0\")).parents('.cell').data().cell;\n", | |
| " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", | |
| " }\n", | |
| "\n", | |
| " }\n", | |
| "\n", | |
| " if (root._bokeh_is_loading === 0) {\n", | |
| " console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", | |
| " run_inline_js();\n", | |
| " } else {\n", | |
| " load_libs(js_urls, function() {\n", | |
| " console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n", | |
| " run_inline_js();\n", | |
| " });\n", | |
| " }\n", | |
| "}(window));" | |
| ], | |
| "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof (root._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"<div style='background-color: #fdd'>\\n\"+\n \"<p>\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"</p>\\n\"+\n \"<ul>\\n\"+\n \"<li>re-rerun `output_notebook()` to attempt to load from CDN again, or</li>\\n\"+\n \"<li>use INLINE resources instead, as so:</li>\\n\"+\n \"</ul>\\n\"+\n \"<code>\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"</code>\\n\"+\n \"</div>\"}};\n\n function display_loaded() {\n var el = document.getElementById(\"6d4d2d4d-ae79-4576-9b02-03cd517f49f0\");\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n }\n finally {\n delete root._bokeh_onload_callbacks\n }\n console.info(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(js_urls, callback) {\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n 
run_callbacks();\n return null;\n }\n console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = js_urls.length;\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var s = document.createElement('script');\n s.src = url;\n s.async = false;\n s.onreadystatechange = s.onload = function() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.log(\"Bokeh: all BokehJS libraries loaded\");\n run_callbacks()\n }\n };\n s.onerror = function() {\n console.warn(\"failed to load library \" + url);\n };\n console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.getElementsByTagName(\"head\")[0].appendChild(s);\n }\n };var element = document.getElementById(\"6d4d2d4d-ae79-4576-9b02-03cd517f49f0\");\n if (element == null) {\n console.log(\"Bokeh: ERROR: autoload.js configured with elementid '6d4d2d4d-ae79-4576-9b02-03cd517f49f0' but no matching script tag was found. \")\n return false;\n }\n\n var js_urls = [\"https://cdn.pydata.org/bokeh/dev/bokeh-0.12.15dev2.min.js\", \"https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.15dev2.min.js\", \"https://cdn.pydata.org/bokeh/dev/bokeh-tables-0.12.15dev2.min.js\", \"https://cdn.pydata.org/bokeh/dev/bokeh-gl-0.12.15dev2.min.js\"];\n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n \n function(Bokeh) {\n \n },\n function(Bokeh) {\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/dev/bokeh-0.12.15dev2.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/dev/bokeh-0.12.15dev2.min.css\");\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.15dev2.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.15dev2.min.css\");\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/dev/bokeh-tables-0.12.15dev2.min.css\");\n 
Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/dev/bokeh-tables-0.12.15dev2.min.css\");\n }\n ];\n\n function run_inline_js() {\n \n if ((root.Bokeh !== undefined) || (force === true)) {\n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }if (force === true) {\n display_loaded();\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(\"6d4d2d4d-ae79-4576-9b02-03cd517f49f0\")).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(js_urls, function() {\n console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| } | |
| ], | |
| "source": [ | |
| "output_notebook() # output bokeh plots inline in notebook\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 48, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "p = figure(plot_width=800, plot_height=800)\n", | |
| "_ = p.text(x=coords_df.x, y=coords_df.y, text=coords_df.token)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 49, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "\n", | |
| "<div class=\"bk-root\">\n", | |
| " <div class=\"bk-plotdiv\" id=\"49eee00e-5aee-4dd6-84f2-85aa053b8697\"></div>\n", | |
| "</div>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "application/javascript": [ | |
| "(function(root) {\n", | |
| " function embed_document(root) {\n", | |
| " \n", | |
| " var docs_json = {\"4007d922-a980-4a65-a6fb-e8d28537fda9\":{\"roots\":{\"references\":[{\"attributes\":{\"overlay\":{\"id\":\"67244f87-eebf-4a38-bc94-d5a72d8cddbe\",\"type\":\"BoxAnnotation\"}},\"id\":\"d54061d9-7c6a-41b8-99db-35bb7669c32e\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"67244f87-eebf-4a38-bc94-d5a72d8cddbe\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"660320a5-4ab0-4cee-b35e-038ba1b7cb20\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"e7e940aa-b571-4a5f-ad17-dbbd53a9aa4b\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"4c620534-67c7-456a-8db6-01a7987ab321\",\"type\":\"HelpTool\"},{\"attributes\":{\"source\":{\"id\":\"38a2b923-9ecd-497c-9fc1-d4982c583d9b\",\"type\":\"ColumnDataSource\"}},\"id\":\"4a26fdb2-ec30-435c-8396-90b29d4a53b3\",\"type\":\"CDSView\"},{\"attributes\":{\"below\":[{\"id\":\"62565003-6f37-4abd-b79a-fdc565c4480a\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"384d5dc7-80ff-466c-8482-6172a2f811d8\",\"type\":\"LinearAxis\"}],\"plot_height\":800,\"plot_width\":800,\"renderers\":[{\"id\":\"62565003-6f37-4abd-b79a-fdc565c4480a\",\"type\":\"LinearAxis\"},{\"id\":\"8fc8fab9-7391-4a01-adc2-c1d97393d772\",\"type\":\"Grid\"},{\"id\":\"384d5dc7-80ff-466c-8482-6172a2f811d8\",\"type\":\"LinearAxis\"},{\"id\":\"55a4cad2-a6cc-4232-8c73-8178e8cb7247\",\"type\":\"Grid\"},{\"id\":\"67244f87-eebf-4a38-bc94-d5a72d8cddbe\",\"type\":\"BoxAnnotation\"},{\"id\":\"f4ba4047-1978-40c8-9fd0-852021d8ef4c\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"d25221c1-caae-4812-acad-73850a912811\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"e5c00639-86d6-4b9f-aa59-43f34abcae89\",\"type\":
\"Toolbar\"},\"x_range\":{\"id\":\"32177461-1c51-46c6-8a1c-b61d2099e7b8\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"d6eaf2a8-c6ca-443b-9b16-3483ead81b91\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"858aa8a5-72ad-44c2-a59c-652d2823efb3\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"11f88bbc-6b1d-4e95-bf1f-b3a8dd8f0b3b\",\"type\":\"LinearScale\"}},\"id\":\"031c534a-dbb3-47b6-b00d-7a94d806867f\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"43dcadb7-baa9-475f-ab2b-6a0c63900c2e\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"da9d0c2e-302c-4dde-ae5a-b28df011d06e\",\"type\":\"PanTool\"},{\"id\":\"ba20369f-6e6f-4af6-aaae-89ec9e06dd13\",\"type\":\"WheelZoomTool\"},{\"id\":\"d54061d9-7c6a-41b8-99db-35bb7669c32e\",\"type\":\"BoxZoomTool\"},{\"id\":\"660320a5-4ab0-4cee-b35e-038ba1b7cb20\",\"type\":\"SaveTool\"},{\"id\":\"e7e940aa-b571-4a5f-ad17-dbbd53a9aa4b\",\"type\":\"ResetTool\"},{\"id\":\"4c620534-67c7-456a-8db6-01a7987ab321\",\"type\":\"HelpTool\"}]},\"id\":\"e5c00639-86d6-4b9f-aa59-43f34abcae89\",\"type\":\"Toolbar\"},{\"attributes\":{\"callback\":null},\"id\":\"32177461-1c51-46c6-8a1c-b61d2099e7b8\",\"type\":\"DataRange1d\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x\",\"y\",\"text\"],\"data\":{\"text\":[\"youtube\",\"merchandise\",\"merch\",\"shop\",\"google\",\"t\",\"shirt\",\"store\",\"stickers\",\"backpack\",\"apparel\",\"hoodie\",\"tshirt\",\"water\",\"bottle\",\"jacket\",\"sticker\",\"hat\",\"t-shirt\",\"shirts\",\"clothing\",\"bag\",\"socks\",\"sweatshirt\",\"tshirts\",\"sunglasses\",\"sale\",\"android\",\"cap\",\"mountain\",\"view\",\"mug\",\"free\",\"swag\",\"clothes\",\"accessories\",\"beanie\",\"gift\",\"yt\",\"youtuber\",\"official\",\"googleplex\",\"hats\",\"bags\",\"sweater\",\"employee\",\"firebase\",\"backpacks\",\"rucksack\",\"uk\",\"laptop\",\"t-shirts\",\"tee\",\"cup\",\"re
view\",\"us\",\"on\",\"partners\",\"fun\",\"partner\",\"thermal\",\"campus\",\"for\",\"card\",\"san\",\"francisco\",\"how\",\"to\",\"get\",\"you\",\"tube\",\"toy\",\"cloud\",\"hours\",\"sweat\",\"waze\",\"merchandising\",\"india\",\"hoodies\",\"logo\",\"notebook\",\"webshop\",\"figure\",\"find\",\"goat\",\"mugs\",\"polo\",\"goodies\",\"business\",\"decals\",\"men\",\"stylus\",\"pen\",\"shops\",\"ave\",\"buy\",\"alpine\",\"style\",\"ball\",\"bookbag\",\"shopping\",\"shipping\",\"time\",\"online\",\"tote\",\"spring\",\"autocollant\",\"cups\",\"onesie\",\"what\",\"is\",\"jackets\",\"company\",\"center\",\"printed\",\"jose\",\"line\",\"stores\",\"a\",\"really\",\"cool\",\"product\",\"coffee\",\"developer\",\"cards\",\"diego\",\"headgear\",\"mogz\",\"rate\",\"tensorflow\",\"yotube\",\"check\",\"out\",\"chills\",\"decal\",\"hoody\",\"paper\",\"in\",\"sweatshirts\",\"maze\",\"gaming\",\"figurine\",\"accesories\",\"cycling\",\"jersey\",\"io\",\"pay\",\"pens\",\"alto\",\"tees\",\"bottles\",\"google.merchandise\",\"nelk\",\"merchendise\",\"stor\",\"boosted\",\"figures\",\"back\",\"pack\",\"baseball\",\"chrome\",\"mens\",\"local\",\"guide\",\"merchadise\",\"my\",\"pulli\",\"dublin\",\"snapback\",\"packs\",\"return\",\"policy\",\"design\",\"googly\",\"tumbler\",\"items\",\"mascot\",\"robot\",\"googel\",\"accessory\",\"assistant\",\"baby\",\"brand\",\"g\",\"it\",\"maps\",\"analytics\",\"messenger\",\"souvenir\",\"california\",\"sf\",\"usa\",\"swags\",\"yellow\",\"googler\",\"i\",\"love\",\"like\",\"youtubers\",\"flamingo\",\"flow\",\"security\",\"golang\",\"caps\",\"gear\",\"lifestyle\",\"merchandize\",\"notebooks\",\"plushie\",\"canada\",\"near\",\"me\",\"storefront\",\"sun\",\"glasses\",\"toddler\",\"yuotube\",\"head\",\"herschel\",\"home\",\"men's\",\"boys\",\"ok\",\"with\",\"led\",\"perfectlaughs\",\"sketch\",\"shart\",\"tripolar\",\"brands\",\"creator\",\"fan\",\"jacke\",\"21\",\"doll\",\"soft\",\"toys\",\"apparels\",\"as\",\"seen\",\"bearing\",\"best\",\"class\",\"black\",
\"boutique\",\"corporate\",\"famous\",\"flower\",\"gogle\",\"youtub\",\"goigle\",\"googe\",\"googie\",\"blue\",\"branded\",\"cart\",\"clearance\",\"cloth\",\"cloths\",\"doodle\",\"emoji\",\"garments\",\"certificate\",\"giftcard\",\"gifts\",\"goods\",\"mini\",\"instruments\",\"deutschland\",\"demo\",\"account\",\"merchandiser\",\".com\",\"login\",\"souvenirs\",\"sheet\",\"exchange\",\"order\",\"physical\",\"products\",\"sign\",\"tracking\",\"stroe\",\"sweaters\",\"thermos\",\"ti\\u015f\\u00f6rt\",\"zip\",\"up\",\"hey\",\"here\",\"journal\",\"kid\",\"malibu\",\"kids\",\"nest\",\"new\",\"women's\",\"office\",\"please\",\"rolltop\",\"seekrz\",\"information\",\"at\",\"by\",\"now\",\"small\",\"para\",\"pc\",\"com\",\"store?trackid=sp-006\",\"light\",\"that\",\"family\",\"the\",\"track\",\"unspeakable\",\"ninja\",\"are\",\"wind\",\"writing\",\"eshop\",\"shop.com\",\"wear\",\"\\u043c\\u0430\\u0433\\u0430\\u0437\\u0438\\u043d\",\"yutube\",\"99\",\".store\",\".youtube\",\"\\\"google\",\"+you\",\"25\",\"0\",\"00\",\"1\",\"100\",\"cotton\",\"onesies\",\"funny\",\"shorts\",\"short\",\"sleeve\",\"price\",\"made\",\"white\",\"original\",\"plain\",\"100%\",\"13\",\"17\",\"oz\",\"18\",\"19\",\"page\",\"1981\",\"drive\",\"22\",\"ounce\",\"24\",\"3\",\"lines\",\"360\",\"4\",\"video\",\"59\",\"women\",\"7\",\"dog\",\"7dog\",\"8\",\"9\",\"and\",\"99%is\",\"e\",\"of\",\"red\",\"day\",\"goo\",\"window\",\"z\",\"about\",\"play\",\"above\",\"accessoires\",\"phones\",\"sports\",\"achat\",\"acheter\",\"add\",\"toddlers\",\"channel\",\"address\",\"apo\",\"your\",\"questions\",\"addresses\",\"question\",\"adidas\",\"aliexpress\",\"admin\",\"ae\",\"discount\",\"african\",\"american\",\"agua\",\"ai\",\"world\",\"aldosworld\",\"store.com\",\"tv\",\"alfie\",\"all\",\"shopee\",\"jakarta\",\"logos\",\"navy\",\"orders\",\"aluminum\",\"drinkwares\",\"amazon\",\"express\",\"ship\",\"amp\",\"designs\",\"an\",\"anastasia\",\"from\",\"name\",\"women39s\",\"andoid\",\"andorid\",\"andrid\",\"andriod\",\"-\",
\"16\",\"redesign\",\"chest\",\"hot\",\"action\",\"atore\",\"dash\",\"de\",\"fashion\",\"game\",\"help\",\"green\",\"man\",\"headquarters\",\"item\",\"grey\",\"size\",\"matrix\",\"no\",\"notepads\",\"or\",\"pages\",\"phone\",\"plush\",\"badge\",\"support\",\"woman\",\"androids\",\"angular\",\"angularjs\",\"certification\",\"anroid\",\"ant\",\"real\",\"fleece\",\"any\",\"other\",\"ap\",\"box\",\"apa\",\"itu\",\"api\",\"country\",\"state\",\"fpo\",\"boxes\",\"delivery\",\"locations\",\"mail\",\"restrictions\",\"marketing\",\"service\",\"stock\",\"take\",\"apo/fpo\",\"app\",\"games\",\"&\",\"definition\",\"womens\",\"apparel,\",\"apperal\",\"armed\",\"forces\",\"articles\",\"were\",\"ask\",\"asked\",\"ordering\",\"classroom\",\"austin\",\"australia\",\"australian\",\"authentic\",\"youth\",\"auto\",\"click\",\"web\",\"refresh\",\"views\",\"board\",\"awesome\",\"guys\",\"babies\",\"bib\",\"cappy\",\"gamer\",\"bibs\",\"googles\",\"cling\",\"car\",\"little\",\"windows\",\"vinyl\",\"suit\",\"onsie\",\"stuff\",\"school\",\"supplies\",\"book\",\"canvas\",\"computer\",\"roll\",\"top\",\"styles\",\"goog\",\"ballpoint\",\"band\",\"indonesia\",\"philippines\",\"tr\\u00f6ja\",\"there\",\"payment\",\"basic\",\"be\",\"beach\",\"bear\",\"beautiful\",\"belgium\",\"berlin\",\"tubes\",\"keep\",\"drinks\",\"cold\",\"give\",\"place\",\"removable\",\"selling\",\"laptops\",\"companies\",\"channels\",\"bicycle\",\"print\",\"fabric\",\"themed\",\"big\",\"bike\",\"icon\",\"download\",\"books\",\"boy\",\"brain\",\"glass\",\"drinkware\",\"extended\",\"friday\",\"goggle\",\"goodie\",\"googl\",\"google.com\",\"hardcover\",\"zipper\",\"female\",\"journals\",\"m\",\"metal\",\"note\",\"girls\",\"childrens\",\"collar\",\"transmission\",\"tumblers\",\"drink\",\"winter\",\"blacked\",\"blank\",\"blend\",\"bottled\",\"close\",\"cheap\",\"bot\",\"wall\",\"boulder\",\"gogal\",\"braille\",\"accounts\",\"www\",\"x\",\"can\",\"gmail\",\"low\",\"swell\",\"pakistan\",\"branding\",\"brandonio\",\"brat\",\"break\"
,\"hours,\",\"full\",\"do\",\"this\",\"flask\",\"mental\",\"fitness\",\"flashlight\",\"code\",\"high\",\"kit\",\"pads\",\"po\",\"u\",\"windshield\",\"subscribe\",\"buying\",\"bye\",\"callux\",\"send\",\"cookies\",\"go\",\"canadian\",\"cancel\",\"canda\",\"cannot\",\"shoulder\",\"cara\",\"filter\",\"fiber\",\"certificates\",\"cash\",\"causing\",\"centre\",\"chance\",\"charges\",\"chances\",\"cecelicious\",\"cell\",\"cellphone\",\"certified\",\"cgoogle\",\"charge\",\"nyc\",\"leather\",\"colored\",\"drawstring\",\"child\",\"childs\",\"wikipedia\",\"youtube,\",\"chubbs\",\"chunky\",\"classic\",\"clava\",\"gym\",\"clear\",\"insulated\",\"clockwork\",\"goggles\",\"coldplay\",\"college\",\"sheets\",\"come\",\"faq\",\"wiki\",\"does\",\"comprar\",\"contact\",\"converse\",\"pride\",\"engineering\",\"material\",\"coolest\",\"coral\",\"crew\",\"neck\",\"crew-neck\",\"crewneck\",\"current\",\"favorite\",\"custom\",\"customer\",\"customized\",\"cute\",\"dark\",\"ddg\",\"set\",\"des\",\"earth\",\"dev\",\"good\",\"dgoogle\",\"diary\",\"did\",\"work\",\"not\",\"draw\",\"due\",\"regional\",\"differences,\",\"trying\",\"sent\",\"recipient's\",\"eclipse\",\"edu\",\"education\",\"elastic\",\"closure\",\"electronic\",\"embossed\",\"en\",\"engineer\",\"es\",\"eso\",\"est\",\"excellent\",\"facebook\",\"fake\",\"fandroid\",\"fender\",\"fgoogle\",\"pricing\",\"flutter\",\"fold\",\"food\",\"foogle\",\"forgot\",\"password\",\"request\",\"2018\",\"freedom\",\"french\",\"frequently\",\"furry\",\"futbol\",\"gle\",\"o\",\"l\",\"gboard\",\"geek\",\"gents\",\"geogle\",\"germany\",\"ggle\",\"ggogle\",\"ggole\",\"ggoogle\",\"ideas\",\"today\",\"giigle\",\"giogle\",\"gioogle\",\"girl\",\"gloogle\",\"gloom\",\"gopher\",\"le\",\"outside\",\"gogel\",\"gogl\",\"gogoel\",\"gogogle\",\"gogole\",\"gogoole\",\"going\",\"gole\",\"golf\",\"gooble\",\"goodgle\",\"goodle\",\"gooe\",\"gooel\",\"goofle\",\"goog;e\",\"googal\",\"googele\",\"googgle\",\"googke\",\"googkle\",\"googl;e\",\"googla\",\"googld\",\".\",\"+\
",\"services\",\"apps\",\"building\",\"buys\"],\"x\":{\"__ndarray__\":\"DVtvQVxhC0Gu/yNBeEs2QZffdEH0n0VBe0BMQd8maUEu1ihBlZwTQcJQ4UAn0MU/syaiwP4jGEGDYzZB58qKwe3JOUHHV//Al06HQDwYX0GzLZZAlptFQaC7W8GcQsjB3sBkwbFWu8AkeatAI1hEQQ/ECMAxCNLBmWGnwU0Mm8BevKTAlEtywXCvykBeA0jBPkeowQ71OEFOIzvAp4NMwAc2KcHuzEpB7/LUwVU71kBy6tPBugjNwb0SusEMb3TBZhPBwaclyMHAzqZAT/AcwYrsvT/iCxfACKifwfopEsBrtEZBs1OrQcIGf8DPnvS+4pK0waJb28EjO19BPzA0wSyozsG/7mfA0MDUwQQvekFZb4/AiuIOQQd9jEARI2fBVFBjwXN2zMFeFOFBCFTLwSlTy8Hmz7bBF0HdQLFffj/L1j/AFQMaQO35jsE9jr/BgBSMQZsXwcC1+tjBjByyQVXXUMEPyAdBNXK3QJE8WsFP7RpBo42JwG0LVz+fG25BJugVP4YRycEsnYDBGSfhQU2E0kBwRNO/i37PwQprwkBa4KbBC4M8QTXprEH6WbXBHgYWwfT5EMHpKqbB6UijwM3lFMEzDcdB03aowf/4/8AwYtHBQZYlQbwaaEEWEABBmCV8QATszsEPj5BA7VvLwYHBicBrtW9ADUsPwVYG/UGi667BF4nUQckl3kGiLmbBR7mCwTdme0EanDFA4PfmQVPshsE1L3xBSDzQwDlRnEDNOEHBpjPmQRwxDEIngJnBwdTIwQZ7EELXWqFBJxjdwYPD3kEJHiPBKiYZQfsM+UGD3snBDQAQQrgq0MFhLbJBPiT6QdQzhcEB5wvBRfOAwaxvWb/y4dc+fGWSQU9lU0GW0AJCvRrOwWkLqkHTZBhC2P3pQJIgi72CwErBxmt7wTxWmsF41qDB1ru8wWajt8E7fulBFyjVwcFws8E/LZdBaFqaQQe0VUFfcsBAFROowHtuvsEfKxxAxt16QCwM4cEQnZrBrq/zwKS41kC1GwO/TIeHQfyjxMEEnnlBFWQdwRmCskGvtM3BoyLYwZNSasExmMjBIlOXwbnKpcHppYTBDhvUwQZb90Eq+wFCy1oav8MwBkIvJUVA9htkQfviVkGhm8BBX17/v2kaucHxIb9B0EQCQt9SdsEYJbZBGEa/wfWFhsEfBYXBEV5ZQWs5EEENupjBGwMUQtX0C0JzK51BJwkHQrVn+L8bo/ZBbGudQL7BlkFGqcbBUJtWwYyZo8HieUrBk1E3QSFvtsHpOeY/5wYvQYEnJEEwAjtB2A9nQd1Fc8E1rR1B/0/OwdV6uEE3gU/BE/G3QbjhyUHP9rvBAB4JQQf75EBOTztBKigswefLRkGO7NrByBUXQj6SjkFiZd7BsDgRQn3uqUEGA5ZB4HWPQYQ92kFD6HXB9QSOQXtS10FgH1zBDnmswdVGzEGUxqM/OuW2QHdvq8HSN/VBnhK7Qamo3cAIKQxCRyNewb0ukcHW/JE/UzIUQp+qqkHLUdTB9bfkQfLtwsGT9LXBY27BQXDSOsFqfoDB+R6ywSWzl0FcPRY+utvVwYbzYsFp2F7BvWcnwTeKzMF2aKpBCPPVQQ+RlD3ACCHBPDmYwSCBCUFC8ZjBja2eQTL4kcH1wM3B+buMwW0ZmsGUyGFAw9ixwbZLX0FgpJLBwL28QefAKUGhMjjACbLcwUhOnECu88VBspEmQb0k3cFTIgpCzJvIQZa1kMBL+gNCBs4KQv01EEJ4YRVC1BbVQPeIjEHc0gBC48R/QXp+L8DlmgZBZKB6weUYqj/3IA/Bq5/ewA9u0MDDrylB6pDVwV6ZzEB56eO/ZVu8wSRaiMFR4YhBGaiewfWrjMExVzZAQm+PwcSEcz8GEQXA4VSzwXYbDkEyq/ZA/nqwQRC4vcE/xwVAIYS1PscL
cEF2MyDBFIEIQnV1dsGRlz6/PzQxPo+5qkEkypfBCX2lwfZrVUG4YU9BxjAlQUkryMDuCALBosvMwUfna8Fzllq+sba/we6Anr/ZWjrB26qVQEB1FkJAeYTByuNBweQ/oUHjA7dB9CwswXwUn8A1lMDBvRDpwKwN+EAwti3ByiZtQKlc2sFUdhFBaMbPQVmQ+kH6KwJCqeiNwZdHTD6CMmlByV9SwetwzD8ZkBlB4umSwXFhmcErPKPBm/uDwc6Wu0G0e0e//4EqQIDE4sAnbnTBClbQwcUGaEFf3NXBb+7LwfMWucExz7xArlNrwFDh4sD6fqVAqIH9QTH95UGSHmjAwPvFwWytdEC0kbtBjXERQvB0C0LTQ+FA1WuZQYXMr8DpjlhB0RmZwW/EfsExIXfATnsJQjXiDEJ/lkRBv3dqQQVfuMA7N6NA1a/JwUk1ocEo28VBi3gqQQiJHcEgq8nBVm+gQZO9gMFm20RBS5S/wdNY9kEM5VnBIAjAwYqxgEED4YtBxkpDv7e0zkFyuKJB9gInQZWDwMH4NuBBQPPNwEiS3ME/o8FB+cLKQJg938BFk/LASC6Rv0HHucH516vBCoe6QfVh/0AcSojBMWeSwV+f5UGBWczB52ikQAYVvcFqE9/BKPl2vzAs+EDSHkdBqy1UwTIHVkEM6cjB8Vh2QYjjh8CVlaRAwBE3QHg+2EHXWaZB6ssvwZzYRcEeA4BBXjw2wcOmM8Hx4T1AL3fSwagNAUJ9FNzAshiWwFgZ60F6MifBJcfGQXu8mMFKa9fB6c+KwQ/W18HYLsfBBfbsQAB3acD/s63B60IBwU7VasEMO/bAKiG+v/0XukG1arhBeQJSwc/jQj/LIspB7chJQVLkrMFQEAw/XC8CQplVssG5vtHBjJPXwTFa98Bkh9dABkOkv83MX8GOraHBNLKJQcn64sE51bjB3ZHHwUrutkH3n9rBT9yrQWi928GYT9vBzrevQC9zd8FzcPLA+uCnwRkFQsHO5IxBVrOBQTkUu8EHIn7BNAVXwaHUDsG5F1vBK/TFwZZGwMBP+ytBq3QrwaNYj8EeEhDBo2CtwYRMu8GTasHBM7PsQHoyv8GAd8DBcgIGQRfxuMEguTPBfwZlwWZMdcGgiEVA9K6ywUhepcHJqX3AMiqxwdvHk8FV6s/B9/jHQSJI2sCXIp/AovNpwIGpqsHVOfpBezmlwYFIsMGJJ0LB2QPOwbc5LsHNxqNBIeMIwROIQ0D2euu/DptHwafuiEF7g+FB07wlwdMfYkEX11BBr3XzQfLmyUFATUnAh+CMQcjZl0E6twtCf7N5Qdt8XkF55F5ARtQUwfsp28EmHKHBIoEmwaQHrUHzGgFC6q8CwSc5okFkG8PB2eOJwO6hwsHbRszBn5bJwWEx1MHs3m1BCg/HwRXMfUEFIQXAuucBQbrLwMFBKILAdZZOQR3b8UDTCqhBxuNUQJro8kGa6gdC/r6iQK3PycEFMnfBZtqxwWwxkcHAzgJCzBqlwc9lscGTJprB2ZAFQalJHsGYm37B0dCgwaXWxsEmUb7BWiDaweQBi8HtHsfBv6/CQUlQKMG3NbVBMS6YwaE6DkKl8QRCRvCbwU2mi8HENdLBA+K2wadDdEFFzBJAhkXGwEnKqkFAFedBCGFkQcGo5cAA/SO+i4hrwWgkUcG5T83B4AZcvbFIbEEGvSjAUiM2wH76s8F1njXBHYnjQEOfqUD/1rPB2CyrwRnFV8CBKNvB2sDiwSeAvL6W5Z1B47r+QWwn7kEvrcPBPbjSwUGk0EGy6rg/mvdOwcb6CECS33bAOL31Qev3kcBObBdBT/ZKQX0wEULh695BV//QQSJbDUKC2W5B6tEMwQZiCkKXF4lB1zSrQQuea0BgEbVAn+7SQfSK08FmqNDBc2rTwSK5s8EZ1tHB+Y+JwRBdk0H7pdNBq3O9Qa5EqUEAL1JBqmKVQT9XMkE74dhBQxmaQJ19zkHiOBBCzb9PQX9U
BcGjywBCeDXcQc3jZ0FO/OdBMVoIQjTinkErP5VBl3k/P4wpl0FpzxJCPVTnQe5HhUEIwLBB1J8EwOvenkGVaOpBZDBDwZZ+m0GvSopBVve3QOxS1sFTabDBs+qEQLJB9UFFIdXAktxEQUKtE0LoNRhCf/H5wBBCB0LlmAZCqirNwdX2hEEnkv9B0pwNQl/cBUKek9JB3iQPQhDWw8AmeAXBWeYWwQVhwEEy5cFBHoyCQUWKCEKtBg9CBCSBQUB0FEJ1qDlAjrMKQlTlEUE5lHtBnpUEQkNEbUAH5NpBjLsSQoM8zEFgLwdCwtPQQQxZFELMN99BaOLLQTwhEkJq/hdClWzvQVEk/EFpds5AyVOXQVzHkUB+oaxALqRQQUbaSEE=\",\"dtype\":\"float32\",\"shape\":[809]},\"y\":{\"__ndarray__\":\"qu4cQnWlEkKpOBdCfEYYQimhG0JLcBpChjcYQknRGULM0BRC3t8VQpaoD0JMnwdCH9T4QUNEE0K6vxZCmaO7QahBGUIvcOpBdYcLQm0jGUIChQ1CflgXQkwOgcEghmpBBJ3OQVO/gsHCCg9C3zIZQiVjAUKKGw9BZT+TQZSJ/EEnZvdBrHLIQagwEULOuNJBFc2QQeggGUK5mnjBlm/+QWIxYsEd2RLBAes+QPmxEUJUuQPAn6otQEUBAcGxmMRBSZzkwDBSQkGfiAxCkm/kQQeaBkJjDgBC1/GZQbz0AkJe3RpCMQkRwSOe9UFlsIjBCu6NQZeSm0AUexlCGtzXQbQmokB73Y/B49zmQMt3HEIRI/hB6PIVQta3CkKipsNBcEjHQRBwA0GlGBXBCr0vQdq4TkFFvYBBpU1DwZB5B0JvovxBIq6PwYVVPsHlr2/A0MhNwaKb80HEtrNAvaWrwMjSzUFvfRRCtBMPQtLCyUF9lxVCQxH6QQ7HlMH/hBpCUJ6NwdKrWUHhOcVBLdgZwURQDkLprf9BzOrxQPnwDULYpizBJ1SCwVyGNcE9N3lB33bhQVVShMFBKJBBzmmCwcgR4UE1mHDAY/XswHCbZMHWNyZBLXYXQp6dHEK+HivBBAgNQq/0QEFaiA1CKOMJvfd5/kG8YYfBXkJKwQ4l28Ccw7fA+WUNwWsf1cDNoVvB0ltuwd6CHsF6UQlCuPOjwMx9x0HnJBxCEzKIwQHPisHuKNZBYQBGvvSx/D4qRhDBlBajwAXa7b//OCfBVFAlQCuZiMDahN1BeMETQlD4lMDDYqS/9Fe5vz4sc8BxhhzBeCEywLnZwkFqUeRBiAi9QYykBELokwNC1XfCwOLsRcGVXqjAZ0hKQLmAiMCZ/gDAOaZbwar0X8EFsj/BnK1FwTCVMMFi9SfBHUlYQVI9kkG200zAOt0lP7GztsCulGvAazL0wCpRG0IPbRBCOw30QZkHgUGmY5XBHDqCwf0TMD3/UKhBFEnvQRn6e8HSnQFCb5rQwLmYpr/472PBqvvhQUk6T8A4Y9hAMtNCwMohTcGTIF5Bl/kfweZtmkEUkhvBtmzEQArrCsBz3zrAM5cDQiazoMChRwtCz9McQmF1HEK7pLnACUeVwS9NxsDbO37AWZJywKTJYsGHWvDAe1X1v0njuEEMob5B0GAswbEGFEIMSxzBDaeZv5buNr8iuEXBzv16v64TAEKr3oTAfzaBwSBgA8HDWS9BoT1EwaekE8GaZU7BdcplwZYsiUEv8InBIEIewYmKFUICmmTBBXoaQglwZ8Fhhm3BsisEQWFKz8DcAc9BOP3mwMutU8BJU63A++5swazuD0L/XxZC6jc8wS4sOcGyGuU/50YfwA59NsFP3Os/IjomwHCFVsFRYFHBiDYDwcQWkcDWyULBDCemwHxWGcDdX3TBxsubQXeU8sAC2H3BHmZCwTIPmEH/zrLARHiiwDG870GsKa/A7TDSQRImtUG5DnPBu8eRwDXvI8GYYd0/E6H+wIZ4YUGXuWZB8WQwwRi6S
8HXg8dBxEKHQcbTZ8ALywNC69MBQW+OxUH0EnXBOFPkQdyXr0CxALrA/hyQwPKQgMHFFN1BQjCrQTBZhsFqVlLBPrwmwWYkOMEKUCZBwAoWwY26q0EovwtCw58MwcP9G0J94zDBo2bWwOYEfsFHmgBCJeckO3OFhcHyICrBGQ1Ywc5ySEBV2pDALLSbwIAf9EH+eCa/heAYwOi3tMBlDJy/Gj6GwYIVFsGZQnnA+oBfwVPa90H9ghJCImi9QcLgBUISN0PBqiPrQYpq8kFuJhZCi1uOPrq2DkK0gWTB/1l8QfQmtkFsSfTArMv8wCG1X8EhpnjBg7BJwSbzZcEet4LB8rmCQW8tScH61CzBEjhvwOOMYkHPF23B5m1xwcjOKsHByE3BPM8QwOwayEEM9YjBUhWGweGP9sA5tU/BjjAvwR7TGEIBig7B0ROIwRkn70HNEetB0uFoQGetzEEmwgVCREqHQTiMAEKD8tpB4hSZwW+fOj/FETzBj310wW1w0sCpDlfAie7bQUcfj8Eu1QPBCWXrQZzREUKXHOJB5FsJQtb8wj9ixWTBLjzCwBwY8sBPYqY/vYqwQSSnBULn2wPBw/HXQegCfsGGwUPBBKgjwVS9TcH/9R3BZFVAwUcVOsGHjwFCLWyJwcifb8FI4cxBHld9v3u4RsGxukZAGCFQwK+HikGBn1/BY+n7QXMN7UGlzo/BHxxTwIiO08CW9vZBqn5UQe/DbMGzi8fAjrWMwJvz3r+4AlvB8YopwSP7YsG+MlzB6icKwZz3UcGY6ZTB8gZuwJW+iL/JrDjBREE6wfm8bsEyaUHBE+iLwAaeoEE+xsjAG5AewXL1W8EBqoDAHyKkwGYpM8HsCUTBzbVPQTyzpj8i6tJB82I/wMlKCMFcsDfB7dNkwRmRGMEw6iPB9DNHwceBysAMAp3ATJRfwVXQhEBdrqTAmHF8wWcHcsFOYkXBfFgDQjHJA8Ff+LjAMyMRwQv4RsEu6UzBZT+yQSjIy8AAzNE/w4tCwQD1Z0G9es6/cFR0wezBjsFbLRdCsHpywYgcZMFKLIzAu11SwYITgMFeN3LBq5RzwSTv+MCxrPPAAOqBwZJSbsGoXBvBiBl7wWXwbMGm/wdCsZA2wJKprj8AdHXB8u90wbluRMCOlIbBx5MEwTY5qEG5fxlALmqzQblr47/8r2VBYQ0TQktLh8GFPAXBk8hjwbbqSsE5n0/BWdl8wV9uYsDeXh/BINWEwWDQBUJGWcbAOAJgwZ4WiEFWT5LBRuabwBZV2cA83v9Agr4uv3ba5kH+CRJCq4gDQlJlYsHLGx7B12ZjwR/Dbz/QSIxB9ZV3wLqhBMEEjFxALe7VwNtr3kAZhE0/mICHwRVjvkF3BIvBbHoHwU+QYsEksGTBOp8swXkXkMAL+b9BOh/PQTmI6EHKWHrBH41CQb8jccGslXzB+a/fQXFdukGmpY7BBiiSQYHzd0HjT3FBhgeFwUvP6cCh5I3A8o5wwfkrdsAEPHbB0kY6wQw0N8HM6wlCTKIiwSA39MCDJVnBOeKUQUgJMsGFu5o/errQwOhdgsFv5pLBb7SDwQf7nkHXcDi/MpaaQc2TikFwKYbBBfAUQfqnZsHKKtTAJhxdwYTXC0JVy4jBQqFcwRmjHcEzG0HAbS5/wftwTsFe0BlCYKp7wJdEF8EelW3BPH3GwDfK/sCrwT7AvXfGwM3aHMHUhglCv15xwdOv50ClOxHBSBpYwVZ1NsEj2dy/ty9ywVjYPMGzNb3AhuLzQYNxLcAbdpJATuijP6729EBKwQ3BNzgiwDvwF8GhHIbBpopJwZHqlsA5vfpBk/l4wQJhccE4+VnBQw5MwcGqtMD20Dc/bUgMQoj0Q0EkHMxBVxyXwKfBrkEvBjW/vwybQQn78sBhu6NBFCV1wRWHe8GUAFTBHFqjQUZTIcCHCnxBx8aLPliHuUHUBgbAaRoBwVP33EGSpkzBD64SwXILccB1SQjABP46wYP7X
cEmqJg/E+u2wFkWRcF05ILByRFFwaENwsB4AIvApCh1wa/MYMG5aZjBPl50waZL0kGBclFB5rRuwUeIKsGWEV7BYViRwcNr+8D+OVDBNMlgwSmMjcF0e31BXpQjwbLKgsG/e71A68wWQBMIj8F92ujAiJZ/wMBwqsDS/lVBQfYKQVrWcMAIe1fBYQBrwXL/ZcEyeFnBMITFv7w5i8HSmVvBOLtLwaI8yD7rJwXBsTX+wK29TcAchzrBzo1qweSHjb2cYPbAR/2vwI+RjsEAJnnB/2E3wSlIl0D36Rc+a9aDQAsugkFhLTFA1NAqwTt5F8HUDOPArjM7wVrQEMEWegPB+ttSwZaRO8Et4rPAe3p7wcPzXsDVb8K/a7VgwWvoVcHEY82+Lsmuv121J8Gc/u7A4+S2wLs0OcF7c63AQmqIwas6C8Gozby9Xi6hwObSRcEexSDBXXCSwfbQD8HDMhTAOn3XQWG9y8AHARrB4dCNwdKmsUBGVOrA/9GKwaJCwb5QH0XBoXgowVxJccDBh5q/aGeCwfhUHD+Hp+m/W8wwQeg1JsEpqsPAyvGNwJwba8DTF77AGgHBPu6CfsGWRYDBskBvwRGD9cDI35jANNBLwQ8igr96rLC+CPQ9wSWnNsCMiWnBiGpkwAe0e8EGge3Arr5SwJV+jcEzGbDAeYwpwJGbIcCDBdPADCvbwLsCSb9pFTfAki6mwBR9wr+JalzAKQHZwMqxlr/iEETB8zknweLskMFfC5fBxkIywSWhfcE=\",\"dtype\":\"float32\",\"shape\":[809]}},\"selected\":null,\"selection_policy\":null},\"id\":\"38a2b923-9ecd-497c-9fc1-d4982c583d9b\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"callback\":null},\"id\":\"858aa8a5-72ad-44c2-a59c-652d2823efb3\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"d6eaf2a8-c6ca-443b-9b16-3483ead81b91\",\"type\":\"LinearScale\"},{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"d25221c1-caae-4812-acad-73850a912811\",\"type\":\"Title\"},{\"attributes\":{},\"id\":\"11f88bbc-6b1d-4e95-bf1f-b3a8dd8f0b3b\",\"type\":\"LinearScale\"},{\"attributes\":{\"plot\":{\"id\":\"031c534a-dbb3-47b6-b00d-7a94d806867f\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"16a9a7a0-15f9-4eb3-9631-634bb57d6b32\",\"type\":\"BasicTicker\"}},\"id\":\"8fc8fab9-7391-4a01-adc2-c1d97393d772\",\"type\":\"Grid\"},{\"attributes\":{\"text_color\":{\"value\":\"black\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"7b45b8fe-ad19-4636-b168-6bcc93db5496\",\"type\":\"Text\"},{\"attributes\":{},\"id\":\"716a7179-f86d-4c58-afb9-58a4e52200a3\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"formatter\":{\"id\":\"716a7179-f86d-4c58-afb9-58a4e52200a3\",\"type\":\"BasicTickFormatter\"
},\"plot\":{\"id\":\"031c534a-dbb3-47b6-b00d-7a94d806867f\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"16a9a7a0-15f9-4eb3-9631-634bb57d6b32\",\"type\":\"BasicTicker\"}},\"id\":\"62565003-6f37-4abd-b79a-fdc565c4480a\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"16a9a7a0-15f9-4eb3-9631-634bb57d6b32\",\"type\":\"BasicTicker\"},{\"attributes\":{\"formatter\":{\"id\":\"43dcadb7-baa9-475f-ab2b-6a0c63900c2e\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"031c534a-dbb3-47b6-b00d-7a94d806867f\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"7fb809e5-4a38-4dd4-86ca-9f30c8a47f5b\",\"type\":\"BasicTicker\"}},\"id\":\"384d5dc7-80ff-466c-8482-6172a2f811d8\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"7fb809e5-4a38-4dd4-86ca-9f30c8a47f5b\",\"type\":\"BasicTicker\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"031c534a-dbb3-47b6-b00d-7a94d806867f\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"7fb809e5-4a38-4dd4-86ca-9f30c8a47f5b\",\"type\":\"BasicTicker\"}},\"id\":\"55a4cad2-a6cc-4232-8c73-8178e8cb7247\",\"type\":\"Grid\"},{\"attributes\":{\"data_source\":{\"id\":\"38a2b923-9ecd-497c-9fc1-d4982c583d9b\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"7b45b8fe-ad19-4636-b168-6bcc93db5496\",\"type\":\"Text\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"d3f85cb0-fd09-4cd3-ad29-03b025771eaa\",\"type\":\"Text\"},\"selection_glyph\":null,\"view\":{\"id\":\"4a26fdb2-ec30-435c-8396-90b29d4a53b3\",\"type\":\"CDSView\"}},\"id\":\"f4ba4047-1978-40c8-9fd0-852021d8ef4c\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"text_alpha\":{\"value\":0.1},\"text_color\":{\"value\":\"black\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"d3f85cb0-fd09-4cd3-ad29-03b025771eaa\",\"type\":\"Text\"},{\"attributes\":{},\"id\":\"da9d0c2e-302c-4dde-ae5a-b28df011d06e\",\"type\":\"PanTool\"},{\"attributes\":{},\"id\":\"ba20369f-6e6f-4af6-aaae-89ec9e06dd13\",\"type\":\"WheelZoomTool\"}],
\"root_ids\":[\"031c534a-dbb3-47b6-b00d-7a94d806867f\"]},\"title\":\"Bokeh Application\",\"version\":\"0.12.15dev2\"}};\n", | |
| " var render_items = [{\"docid\":\"4007d922-a980-4a65-a6fb-e8d28537fda9\",\"elementid\":\"49eee00e-5aee-4dd6-84f2-85aa053b8697\",\"modelid\":\"031c534a-dbb3-47b6-b00d-7a94d806867f\"}];\n", | |
| " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", | |
| "\n", | |
| " }\n", | |
| " if (root.Bokeh !== undefined) {\n", | |
| " embed_document(root);\n", | |
| " } else {\n", | |
| " var attempts = 0;\n", | |
| " var timer = setInterval(function(root) {\n", | |
| " if (root.Bokeh !== undefined) {\n", | |
| " embed_document(root);\n", | |
| " clearInterval(timer);\n", | |
| " }\n", | |
| " attempts++;\n", | |
| " if (attempts > 100) {\n", | |
| " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", | |
| " clearInterval(timer);\n", | |
| " }\n", | |
| " }, 10, root)\n", | |
| " }\n", | |
| "})(window);" | |
| ], | |
| "application/vnd.bokehjs_exec.v0+json": "" | |
| }, | |
| "metadata": { | |
| "application/vnd.bokehjs_exec.v0+json": { | |
| "id": "031c534a-dbb3-47b6-b00d-7a94d806867f" | |
| } | |
| }, | |
| "output_type": "display_data" | |
| } | |
| ], | |
| "source": [ | |
| "show(p)\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.6.1" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |
My bad, just saw your comment. Regarding reducing terms to their base form (de-pluralization, etc.), there are two approaches: stemming, which applies a crude heuristic to trim terms down, and lemmatization, which is more complex and may change the word entirely, depending on the word.
To use a plain old stemmer without NLTK, you can just google "porter stemmer python" and find a standalone implementation. NLTK is easy enough, however, unless you're trying to keep an env clean of bloated packages or something.
Filtering out stop words is definitely normal: [x for x in my_list if x not in ['a', 'the', 'and', 'in', 'at', etc...]]
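A minimal sketch of the same filter using a set, which keeps lookups cheap once the stop list grows. The stop list here is just the example words above, and `remove_stop_words` is a made-up helper name, not something from the notebook.

```python
# Example stop list only (assumption); extend it for real data.
STOP_WORDS = {"a", "the", "and", "in", "at"}

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words that are not in the stop list (case-insensitive)."""
    return [w for w in words if w.lower() not in stop_words]

# Invented sample phrase, for illustration.
filtered = remove_stop_words("the best shop in town".split())
```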
Yeah, the startswith or endswith approximations are usually good enough for grouping. To your point about grouping and then counting the metric, yes, agreed.
You could also group by a lexical sort: if you had a bunch of phrases, split each phrase into a list of words, sort it alphabetically, and use the sorted words as the key in a defaultdict(list), appending the original, unmodified phrase under that key. That gives you a lexically ordered grouping, where phrases made of the same words in a different order land together.
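Roughly, the lexical sort grouping could look like the sketch below. The function name and the sample phrases are made up for illustration, and lowercasing before the split is an extra assumption on my part.

```python
from collections import defaultdict

def group_by_sorted_words(phrases):
    """Group phrases that contain the same words, regardless of word order."""
    groups = defaultdict(list)
    for phrase in phrases:
        # The key is the phrase's words in alphabetical order.
        key = " ".join(sorted(phrase.lower().split()))
        # Keep the original, unmodified phrase under that key.
        groups[key].append(phrase)
    return dict(groups)

# Invented sample phrases, for illustration only.
sample = ["youtube shop", "shop youtube", "google t shirt", "t shirt google"]
groups = group_by_sorted_words(sample)
```

Each key then maps to every original phrase that shares its words, so you can sum a metric per key afterwards.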
@jsma ^^
Thanks! I had no idea gists would render a notebook like this.
For the unigrams, it looks like you're just doing a raw count of how many rows had "youtube" in the search phrase. In my case I take the pageviews associated with the original search phrase and sum them, which gives me a better sense of how many times a word was used in search. Using your data set, I'd sum the pageviews for every row that contains 'youtube' to come up with the number.
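A small sketch of that weighted count, assuming the data is available as (phrase, pageviews) pairs. The function name and the numbers are invented, and any per-row metric could stand in for pageviews.

```python
from collections import Counter

def weighted_unigram_counts(rows):
    """Sum a per-row metric over every distinct word in that row's phrase."""
    totals = Counter()
    for phrase, metric in rows:
        # set() so a word repeated within one phrase is credited only once,
        # matching "every row that contains the word".
        for word in set(phrase.lower().split()):
            totals[word] += metric
    return totals

# Hypothetical (phrase, pageviews) rows; the numbers are made up.
rows = [("youtube merch", 10), ("youtube shop", 5), ("google shop", 2)]
totals = weighted_unigram_counts(rows)
```

Compared with a raw row count, a word that appears in a few high-traffic phrases can outrank one that appears in many rarely used phrases.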
I'll give your startswith stuff a try up through "Out[24]:". I'll have to dig into the clustering stuff some other day.
I'm using a custom stop words list to just filter in Python: if word in stop_list: continue, etc. Any thoughts on merging singular vs. plural forms without bringing in NLTK? In my data set I see "fast facts" as the top search phrase (well, of phrases that match exactly; it only represents a tiny fraction of actual search volume, the data is all long tail) and "fast fact" is #20 in the list. I may just do
if word.endswith('s'): # remove the trailing 's' and see if this results in a word that appears elsewhere
or some other hacky approximation.
Thanks again!
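One way the trailing 's' idea could be sketched, assuming the counts are already in a Counter keyed by word or phrase. `merge_plurals` is a made-up name and the sample counts are invented.

```python
from collections import Counter

def merge_plurals(counts):
    """Fold 'terms' into 'term' when the singular form also appears in counts."""
    merged = Counter()
    for term, n in counts.items():
        singular = term[:-1]
        if term.endswith("s") and singular in counts:
            # The singular exists elsewhere, so credit the plural's count to it.
            merged[singular] += n
        else:
            merged[term] += n
    return merged

# Invented counts, for illustration only.
counts = Counter({"fast facts": 30, "fast fact": 4, "glass": 3})
merged = merge_plurals(counts)
```

It is still a hacky approximation: pairs like "a" and "as" would be merged incorrectly, and plurals such as "batteries" are not handled at all.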