Created
November 13, 2017 17:45
-
-
Save Keerthivasan-A/88a461c32ab1a39d6afc3a4a994fa175 to your computer and use it in GitHub Desktop.
Countvectorizer and TFIDF
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Natural Language Processing Project\n", | |
"\n", | |
"Source: [Yelp Review Data Set from Kaggle](https://www.kaggle.com/c/yelp-recsys-2013).\n", | |
"\n", | |
"Each observation in this dataset is a review of a particular business by a particular user.\n", | |
"\n", | |
"The \"stars\" column is the number of stars (1 through 5) assigned by the reviewer to the business. (Higher stars is better.) In other words, it is the rating of the business by the person who wrote the review.\n", | |
"\n", | |
"The \"cool\" column is the number of \"cool\" votes this review received from other Yelp users. \n", | |
"\n", | |
"All reviews start with 0 \"cool\" votes, and there is no limit to how many \"cool\" votes a review can receive. In other words, it is a rating of the review itself, not a rating of the business.\n", | |
"\n", | |
"The \"useful\" and \"funny\" columns are similar to the \"cool\" column." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Imports\n", | |
" **Import the usual suspects. :) **" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"import pandas as pd" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## The Data\n", | |
"\n", | |
"**Read the yelp.csv file**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"yelp = pd.read_csv('yelp.csv')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"** Check the head, info , and describe methods on yelp.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>business_id</th>\n", | |
" <th>date</th>\n", | |
" <th>review_id</th>\n", | |
" <th>stars</th>\n", | |
" <th>text</th>\n", | |
" <th>type</th>\n", | |
" <th>user_id</th>\n", | |
" <th>cool</th>\n", | |
" <th>useful</th>\n", | |
" <th>funny</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>9yKzy9PApeiPPOUJEtnvkg</td>\n", | |
" <td>2011-01-26</td>\n", | |
" <td>fWKvX83p0-ka4JS3dc6E5A</td>\n", | |
" <td>5</td>\n", | |
" <td>My wife took me here on my birthday for breakf...</td>\n", | |
" <td>review</td>\n", | |
" <td>rLtl8ZkDX5vH5nAx9C3q5Q</td>\n", | |
" <td>2</td>\n", | |
" <td>5</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>ZRJwVLyzEJq1VAihDhYiow</td>\n", | |
" <td>2011-07-27</td>\n", | |
" <td>IjZ33sJrzXqU-0X6U8NwyA</td>\n", | |
" <td>5</td>\n", | |
" <td>I have no idea why some people give bad review...</td>\n", | |
" <td>review</td>\n", | |
" <td>0a2KyEL0d3Yb1V6aivbIuQ</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>6oRAC4uyJCsJl1X0WZpVSA</td>\n", | |
" <td>2012-06-14</td>\n", | |
" <td>IESLBzqUCLdSzSqm0eCSxQ</td>\n", | |
" <td>4</td>\n", | |
" <td>love the gyro plate. Rice is so good and I als...</td>\n", | |
" <td>review</td>\n", | |
" <td>0hT2KtfLiobPvh6cDC8JQg</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>_1QQZuf4zZOyFCvXc0o6Vg</td>\n", | |
" <td>2010-05-27</td>\n", | |
" <td>G-WvGaISbqqaMHlNnByodA</td>\n", | |
" <td>5</td>\n", | |
" <td>Rosie, Dakota, and I LOVE Chaparral Dog Park!!...</td>\n", | |
" <td>review</td>\n", | |
" <td>uZetl9T0NcROGOyFfughhg</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>6ozycU1RpktNG2-1BroVtw</td>\n", | |
" <td>2012-01-05</td>\n", | |
" <td>1uJFq2r5QfJG_6ExMRCaGw</td>\n", | |
" <td>5</td>\n", | |
" <td>General Manager Scott Petello is a good egg!!!...</td>\n", | |
" <td>review</td>\n", | |
" <td>vYmM4KTsC8ZfQBg-j5MWkw</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" business_id date review_id stars \\\n", | |
"0 9yKzy9PApeiPPOUJEtnvkg 2011-01-26 fWKvX83p0-ka4JS3dc6E5A 5 \n", | |
"1 ZRJwVLyzEJq1VAihDhYiow 2011-07-27 IjZ33sJrzXqU-0X6U8NwyA 5 \n", | |
"2 6oRAC4uyJCsJl1X0WZpVSA 2012-06-14 IESLBzqUCLdSzSqm0eCSxQ 4 \n", | |
"3 _1QQZuf4zZOyFCvXc0o6Vg 2010-05-27 G-WvGaISbqqaMHlNnByodA 5 \n", | |
"4 6ozycU1RpktNG2-1BroVtw 2012-01-05 1uJFq2r5QfJG_6ExMRCaGw 5 \n", | |
"\n", | |
" text type \\\n", | |
"0 My wife took me here on my birthday for breakf... review \n", | |
"1 I have no idea why some people give bad review... review \n", | |
"2 love the gyro plate. Rice is so good and I als... review \n", | |
"3 Rosie, Dakota, and I LOVE Chaparral Dog Park!!... review \n", | |
"4 General Manager Scott Petello is a good egg!!!... review \n", | |
"\n", | |
" user_id cool useful funny \n", | |
"0 rLtl8ZkDX5vH5nAx9C3q5Q 2 5 0 \n", | |
"1 0a2KyEL0d3Yb1V6aivbIuQ 0 0 0 \n", | |
"2 0hT2KtfLiobPvh6cDC8JQg 0 1 0 \n", | |
"3 uZetl9T0NcROGOyFfughhg 1 2 0 \n", | |
"4 vYmM4KTsC8ZfQBg-j5MWkw 0 0 0 " | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"yelp.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<class 'pandas.core.frame.DataFrame'>\n", | |
"RangeIndex: 10000 entries, 0 to 9999\n", | |
"Data columns (total 10 columns):\n", | |
"business_id 10000 non-null object\n", | |
"date 10000 non-null object\n", | |
"review_id 10000 non-null object\n", | |
"stars 10000 non-null int64\n", | |
"text 10000 non-null object\n", | |
"type 10000 non-null object\n", | |
"user_id 10000 non-null object\n", | |
"cool 10000 non-null int64\n", | |
"useful 10000 non-null int64\n", | |
"funny 10000 non-null int64\n", | |
"dtypes: int64(4), object(6)\n", | |
"memory usage: 781.3+ KB\n" | |
] | |
} | |
], | |
"source": [ | |
"yelp.info()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>stars</th>\n", | |
" <th>cool</th>\n", | |
" <th>useful</th>\n", | |
" <th>funny</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>count</th>\n", | |
" <td>10000.000000</td>\n", | |
" <td>10000.000000</td>\n", | |
" <td>10000.000000</td>\n", | |
" <td>10000.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>mean</th>\n", | |
" <td>3.777500</td>\n", | |
" <td>0.876800</td>\n", | |
" <td>1.409300</td>\n", | |
" <td>0.701300</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>std</th>\n", | |
" <td>1.214636</td>\n", | |
" <td>2.067861</td>\n", | |
" <td>2.336647</td>\n", | |
" <td>1.907942</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>min</th>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.000000</td>\n", | |
" <td>0.000000</td>\n", | |
" <td>0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25%</th>\n", | |
" <td>3.000000</td>\n", | |
" <td>0.000000</td>\n", | |
" <td>0.000000</td>\n", | |
" <td>0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>50%</th>\n", | |
" <td>4.000000</td>\n", | |
" <td>0.000000</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>75%</th>\n", | |
" <td>5.000000</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>2.000000</td>\n", | |
" <td>1.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>max</th>\n", | |
" <td>5.000000</td>\n", | |
" <td>77.000000</td>\n", | |
" <td>76.000000</td>\n", | |
" <td>57.000000</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" stars cool useful funny\n", | |
"count 10000.000000 10000.000000 10000.000000 10000.000000\n", | |
"mean 3.777500 0.876800 1.409300 0.701300\n", | |
"std 1.214636 2.067861 2.336647 1.907942\n", | |
"min 1.000000 0.000000 0.000000 0.000000\n", | |
"25% 3.000000 0.000000 0.000000 0.000000\n", | |
"50% 4.000000 0.000000 1.000000 0.000000\n", | |
"75% 5.000000 1.000000 2.000000 1.000000\n", | |
"max 5.000000 77.000000 76.000000 57.000000" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"yelp.describe()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Create a new column called \"text length\" which is the number of words in the text column.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"yelp['text length'] = yelp['text'].apply(len)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# EDA\n", | |
"\n", | |
"Let's explore the data\n", | |
"\n", | |
"## Imports\n", | |
"\n", | |
"**Import the data visualization libraries**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import matplotlib.pyplot as plt\n", | |
"import seaborn as sns\n", | |
"sns.set_style('white')\n", | |
"%matplotlib inline" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Use FacetGrid from the seaborn library to create a grid of 5 histograms of text length based off of the star ratings.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<seaborn.axisgrid.FacetGrid at 0x1b20a14c240>" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAABC0AAADQCAYAAAAjz5vfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAFSVJREFUeJzt3X+wpXddH/D3XVaIaJJCRYWJdDvG\nfqrTDhLQxDYhmy4Q4840iK2itAxoSrGBNpaW7EAw0IGZMAP4o0BwIhhFHZgG0HEygbTbJiwQiIZg\nodCvgkkxlLSizY8iDWxy+8c50etm7+7de797zvec83rN7Ow5z3l+fJ+b5527932fH2vr6+sBAAAA\nGM2ueQ8AAAAA4GiUFgAAAMCQlBYAAADAkJQWAAAAwJCUFgAAAMCQlBYAAADAkJQWC6KqXjqHbT6q\nqq6rqh+c9bZhFmadq6raV1W3VNWHptl67Cy3D7Mwh1ydV1Ufr6qPVdXPznLbMCvz+HfgdLuvqqp3\nz2PbcLLN4fvVc6vq81V10/TP+bPc/iJTWiyOK2a5sar6ziQ3J/m+WW4XZmymuUrytiTPaa09I8kf\nJrlkxtuHWZh1rn4+yfNaa+ckuaCqnjrj7cMszDpXqaqLklw06+3CDM06V2cleUVrbe/0z80z3v7C\n2j3vAfBXVdXfSnJtkq8nOZzkBUlemOTxVfW2JAeS/HKSv5bkW5Jc01q7uqpuSvInSR6X5NIkv7Jx\nHa21L27YxkuT/KMjNv2C1toXNrz/5iT/LMnlffcQZm+gXO1trf2v6evdSf5fv72E2RooV2e31g5X\n1TcnOT3Jn/bdU5idUXJVVWcm+edJXhMFOwtulFwleVqSp1bVZUluTXJ5a+1w151dUkqL8TwryW1J\n/nWS85I8rrX2+qp6WWvtX1TVWUne3Vp7X1U9KZOzIa6eLvubrbX3V9WlR64jyV+EqrX2liRvOdYg\nWmu/nyRV1XfvYD5GydWXkqSqfjjJBUle3XMnYcZGydXhqjonybuTfCaTf2DCopp7rqYF4Fsz+cHu\nu7vvIcze3HM19R+T/FaSO5K8PclLtrAMUVqM6B2ZnN3wgST3JnnlEZ/fneSyqnpukvuSfMOGz9pW\n1rHFJhCWyTC5qqqfmc73g601Z1qwyIbJVWvtY0n2VNXrMvmN2ZXb3CeYtxFy9ewk357kPZn85vlJ\nVXWgtXbVDvYL5mmEXCXJO1tr90zn/+0kP7LdHVo1SovxXJzkUGvttVX145mE40VJ1qaf/5skt0xP\nWbogyf4Nyz50nHUk2XITCMtkiFxV1asyOTXwma21r3bYL5inueeqqtaSfCjJP2yt/Z8k9yc5pcve\nwXzMPVettfcleV+SVNXeJC9RWLDg5p6r6fer/1pVf6+1dleSfZmcucEWKC3G83tJfr2qDmcSkp+Z\nTv9MVf16Ji3f1VX1/Eyu2z1cVY/Z4jpgVc09V1X1bZn89vcTSW6YXnr1ntba1cdcEMY191y11tar\n6o2ZZOqBJF+K6+9ZbHPPFSyhuedq+v3qkiTvq6qvZnI54zXb3qMVs7a+vj7vMQAAAAA8gkeeAgAA\nAENSWgAAAABDUloAAAAAQ5r7jTiraneSM5Lc1Vo7PO/xwDKQK+hPrqAvmYL+5IplNPfSIpNQ3XHw\n4MF5jwNO1NrxZ5kbuWJRyRX0N2quZIpFJlfQ31Fz5fIQAAAAYEhKCwAAAGBISgsAAABgSEoLAAAA\nYEhKCwAAAGBISgsAAABgSEoLAAAAYEi75z0AAAAAZmPPget3tPydV+3vNBLYGmdaAAAAAENSWgAA\nAABDUloAAAAAQ1JaAAAAAENSWgAAAABDUloAAAAAQ1JaAAAAAENSWgAAAABDUloAAAAAQ1JaAAAA\nAENSWgAAAABDUloAAAAAQ1JaAAAAAEPaPe8BAADA0ew5cP2Olr/zqv2dRgLAvDjTAgAAABiS0gIA\nAAAYktICAAAAGJLSAgAAABiS0gIAAAAY0paeHlJVZyd5Q2ttb1WdleR3kvzh9OOrW2vvqaork+xP\ncjjJZa21W6vqzCTXJllP8ukkl7bWHuq9E7CI5Ar6kyvoT66gP7mCrTtuaVFVr0jyT5N8ZTrprCRv\nbq29acM8ZyU5P8nZSb4jyXuTfF+SNye5orV2U1W9PcnFSd7fdQ9gAckV9CdX0J9cQX9yBSdmK2da\nfD7Jc5O8a/r+aUmqqi7OpA28LMm5SW5sra0n+UJV7a6qJ0znvXm63A1Jnh2hgkSu4GSQK+hPrqA/\nuYITcNx7WrTW3pvk6xsm3Zrk37bWnpHkj5JcmeS0JPdumOf+JKcnWZsGbeM0WHlyBf3JFfQnV9Cf\nXMGJ2c6NON/fWrvt4ddJnprkviSnbpjn1CT3JHnoKNOAR5Ir6E+uoD+5gv7kCo5hO6XFB6vq+6ev\n9yW5LclHklxYVbuq6slJdrXWvpzk9qraO533oiSHdjpgWFJyBf3JFfQnV9CfXMExbOnpIUf46SRv\nqaqvJbk7yYtba/dV1aEkt2RShFw6nfflSa6pqkcn+WyS6zqMGZaRXEF/cgX9yRX0J1dwDGvr6+vH\nn+skqqo9Se44ePBgzjjjjLmOBU7Q2rwHsBm5YoHJFfQ3ZK62kqk9B67f0TbuvGr/jpaHY5Ar6O+o\nudrO5SEAAAAAJ53SAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAAGJLSAgAA\nABiS0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAA\nGJLSAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAY\nktICAAAAGJLSAgAAABjS7nkPAAAAToY9B67f0fJ3XrW/00gA2C5nWgAAAABDUloAAAAAQ1JaAAAA\nAENSWgAAAABDUloAAAAAQ1JaAAAAAENSWgAAAABD2r2Vmarq7CRvaK3traozk1ybZD3Jp5Nc2lp7\nqKquTLI/yeEkl7XWbt1s3v67AYtHrqA/uYL+5Ar6kyvYuuOeaVFVr0jyy0lOmU56c5IrWmvnJVlL\ncnFVnZXk/CRnJ3lekrduNm/f4cNikivoT66gP7mC/uQKTsxWLg/5fJLnbnj/tCQ3T1/fkOSZSc5N\ncmNrbb219oUku6vqCZvMC8gVnAxyBf3JFfQnV3ACjltatNbem+TrGyattdbWp6/vT3J6ktOS3Lth\nnoenH21eWHlyBf3JFfQnV9CfXMGJ2c6NODdeM3VqknuS3Dd9feT0o80LPJJcQX9yBf3JFfQnV3AM\n2yktbq+qvdPXFyU5lOQjSS6sql1V9eQku1prX95kXuCR5Ar6kyvoT66gP7mCY9jS00OO8PIk11TV\no5N8Nsl1rbUHq+pQklsyKUIu3WzeDmOGZSRX0J9cQX9yBf0tVK72HLh+R8vfedX+TiNhVaytr68f\nf66TqKr2JLnj4MGDOeOMM+Y6FjhBa/MewGbkigUmV9DfkLnaSqZ2+sPRTvnhimOQq22SK47hqLna\nzuUhAAAAACed0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAY\nktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABiS\n0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABiS0gIAAAAYktICAAAAGJLS\nAgAAABiS0gIAAAAYktICAAAAGJLSAgAAABjS7nkPAACWwZ4D1+9o+Tuv2t9pJAAAy8OZFgAAAMCQ\nlBYAAADAkJQWAAAAwJCUFgAAAMCQlBYAAADAkLb99JCquj3JvdO3dyT5pSS/kORwkhtba6+tql1J\n3pbkKUkeSHJJa+1zOxsyLC+5gv7kCvqTK+hPruDotlVaVNUpSdJa27th2ieT/EiSP0pyfVWdlWRP\nklNaaz9QVeckeVOSi3c4ZlhKcgX9yRX0J1fQn1zB5rZ7psVTkjy2qm6cruM1SR7TWvt8klTVB5Ps\nS/LEJB9Iktbax6rq6TseMSwvuYL+FiZXew5cv6Pl77xqf6eRwHEtTK5ggcgVbGK797T48yRvTHJh\nkpck+ZXptIfdn+T0JKflL09xSpIHq2rbl6TAkpMr6E+uoD+5gv7kCjax3QP8D5J8rrW2nuQPqure\nJI/f8PmpSe5J8tjp64ftaq0d3uY2YdnJFfQnV9CfXEF/cgWb2O6ZFj+ZyfVTqaonZRKer1TVd1bV\nWiYN4aEkH0nyQ9P5zknyqR2PGJaXXEF/cgX9yRX0J1ewie2eafGOJNdW1YeTrGcSsoeS/EaSR2Vy\nd9uPV9XvJnlWVX00yVqSF3UYMywruYL+5Ar6W5lcudcMM7QyuYITta3SorX2tSQ/cZSPzjlivocy\nuSYLOA65gv7kCvqTK+hPrmBz2708BAAAAOCkUloAAAAAQ1JaAAAAAENSWgAAAABDUloAAAAAQ9ru\nI08BAADghHiUMCdKaQEryjcMAABgdC4PAQAAAIaktAAAAACGpLQAAAAAhqS0AAAAAIbkRpzAtuzk\nRp5u4gkAAGzFQpQWfjgCAACA1ePyEAAAAGBISgsAAABgSEoLAAAAYEhKCwAAAGBIC3Ejzp3Y7k08\n3cATAAAA5suZFgAAAMCQlv5MCwBYBTt5PHjiDEM4GeQSYOecaQEAAAAMSWkBAAAADElpAQAAAAxJ\naQEAAAAMSWkBAAAADMnTQ4CZczd1AABgK5QWAAAALAS//Fo9Lg8BAAAAhuRMi03spMHT3gEAAMDO\nOdMCAAAAGJIzLYCF41pG6E+uAIARKS1OApeWAACwU8pE6M/PaovH5SEAAADAkJxpAawcv7mC/vzm\nCgA4GU56aVFVu5K8LclTkjyQ5JLW2udO9nYX1Xb/0ecffKtFrqA/uYL+5Gq+lInLSa5YNbM40+I5\nSU5prf1AVZ2T5E1JLt7w+aOS5O677958DV/5s5M5vqWw52Xvmvk2P3z5BTPf5kj27du3J8ldrbXD\nc9i8XM3RTvK26rk5noXOlUxt206/h8nVsQ2cK9+rBiaXxyZXq2keP3NttKq5mkVpcW6SDyRJa+1j\nVfX0Iz5/YpI8//nP33QFjzlpQ2Mn9t34unkPYd7uSPI3k9w5h23L1YKSm+Na2FzJ1PzI1XGNmivf\nq5bYCuRSrpi5Vc3VLEqL05Lcu+H9g1W1e0N78rtJzkvypSQPzmA80NNdc9quXLHM5Ar6GzFXMsWi\nkyvo7xG5mkVpcV+SUze837XxdI/W2gNJPjyDccAykSvoT66gv01zJVOwbXLFSpnFI08/kuSHkmR6\nzdWnZrBNWHZyBf3JFfQnV9CfXLFSZnGmxfuTPKuqPppkLcmLZrBNWHZyBf3JFfQnV9CfXLFS1tbX\n1+c9hqNa5kf5VNU3JHlnkj2Z3AvndUk+k+TaJOtJPp3k0tbaQ1V1ZZL9SQ4nuay1dmtVnXm0eWe8\nGztWVd+a5LYkz8pk/67NCu3/PMjV8h9XcjV7crXcx5VMzYdcLfexJVfzIVfLfWwtc65mcXnIdv3F\no3ySHMjkUT7L4p8k+dPW2nlJLkryliRvTnLFdNpakour6qwk5yc5O8nzkrx1uvwj5p3x+Hds+j+W\nX0ry1emkldr/OZKrJT6u5Gpu5GpJjyuZmiu5WtJjS67mSq6W9Nha9lyNXFr8lUf5JDny0XOL7D8k\nefWG94eTPC3JzdP3NyR5ZiZfgxtba+uttS8k2V1VT9hk3kXzxiRvT/I/p+9Xbf/nRa6W+7iSq/mQ\nq+U9rmRqfuRqeY8tuZofuVreY2upczVyaXHUR/nMazA9tdb+b2vt/qo6Ncl1Sa5IstZae/hanfuT\nnJ5Hfg0enn60eRdGVb0wyZ+01j64YfLK7P+cydWSHldyNVdytYTHlUzNnVwt4bElV3MnV0t4bK1C\nrkYuLY756LlFV1XfkeS/JHlXa+03k2y8bujUJPfkkV+Dh6cfbd5F8pOZ3DzopiTfm+TXknzrhs+X\nff/nSa6W97iSq/mRq+U8rmRqvuRqOY8tuZovuVrOY2vpczVyabG0j/Kpqm9LcmOSy1tr75xOvr2q\n9k5fX5TkUCZfgwuraldVPTmT/7F8eZN5F0Zr7RmttfNba3uTfDLJC5LcsCr7P2dytaTHlVzNlVwt\n4XElU3MnV0t4bMnV3MnVEh5bq5CrkU8HWuZH+bwyyeOSvLqqHr726l8l+cWqenSSzya5rrX2YFUd\nSnJLJgXTpdN5X57kmo3zznT0J8cj9mnF9n9W5Gq1jiu5mg25Wp3jSqZmR65W59iSq9mRq9U5tpYq\nV8M+8hQAAABYbSNfHgIAAACsMKUFAAAAMCSlBQAAADAkpQUAAAAwJKUFAAAAMCSlxQCq6pSqumQb\ny/1wVT3piGkvrKqreo+rql5TVS/psV6YBbmC/uQK+pIp6E+ulo/SYgzfnuSEg5XJs4dP6zyWjbY7\nLhiBXEF/cgV9yRT0J1dLZve8B0CS5FVJvqeqfjbJLyR5R5K/Pv3sXya5J8l/TvKMJN+d5LVJ3pjk\ne5P8WlWd21r72pErraqXJfmJJOtJ3t1a+8WqujbJA0n2JHlikhe21j5RVT+V5KVJ/izJ15K8J8nf\n3zCuJLm4qv7xdGyvbq39TtevAvQlV9CfXEFfMgX9ydWScabFGF6f5DOttX+X5JVJDrbWLkjy4iRX\nt9b+OMkrkvxqkp9L8uOttd9O8skkL9gkVN+T5MeSnDv985yqqunH/6O1dmGSf5/kxVX1LUkuzyRI\nz07yTUcZV5J8sbW2L8llSX6661cA+pMr6E+uoC+Zgv7kask402I8fzfJP6iqH5u+f9z079/K5ED/\nT621u7awnr+T5G8kObhhPWdOX98+/fuPMwnTmZkE6M+TpKo+usk6b5v+fXeSx25hDDAKuYL+5Ar6\nkinoT66WgDMtxvBQ/vK/xX9P8nOttb1JfjTJb0ynvzzJjUmeXlXnHGW5I7Uk/y3JBdN1XZvkU9PP\n1o+Y93NJ/nZVfWNV7Ury/Zus/8jlYGRyBf3JFfQlU9CfXC0ZpcUY/neSR1fVGzJp/H60qm5K8oEk\nn66qp2dy/dTlSX4qyTur6vQkH83kuqvHH7nC1trvZ9IEfriqfi/JdyX54tE23lr7cpI3JDk03eY3\nJvn6EeOCRSNX0J9cQV8yBf3J1ZJZW19X8Ky6qtqd5PLW2uun7z+U5IrW2ofmOzJYXHIF/ckV9CVT\n0J9c9eeeFqS1driqvqmqPpHJ3W0/nkkzCGyTXEF/cgV9yRT0J1f9OdMCAAAAGJJ7WgAAAABDUloA\nAAAAQ1JaAAAAAENSWgAAAABDUloAAAAAQ/r/gzPb7yk99UIAAAAASUVORK5CYII=\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x1b20abfee48>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"g = sns.FacetGrid(yelp,col='stars')\n", | |
"g.map(plt.hist,'text length')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Create a boxplot of text length for each star category.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<matplotlib.axes._subplots.AxesSubplot at 0x1b208fdc128>" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAEBCAYAAACaHMnBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3XuUnHWd5/F39SWXDkk6pQmJEAyR\nM9/F2V3xmkSSdOdCFFDhsFogEGZADqLxrGAGiYArnhU4jlzOOF4HUewBZuzBC3p2hD7k0glJIMrg\nMgh8WaC5ExLSdghJgKS794+q6uqnqK5UV6fqeaqez+ucPvTv6V+lvv3QVd/63RODg4OIiEi8NYQd\ngIiIhE/JQERElAxERETJQEREUDIQERGgKewAymFm44EPAy8D/SGHIyJSKxqBWcAf3P3N4T+oyWRA\nOhFsCjsIEZEatQi4b/iFWk0GLwPcfvvtzJw5M+xYRERqwvbt2znnnHMg8x46XK0mg36AmTNncvTR\nR4cdi4hIrXlb97oGkEVERMlARESUDEREhAqOGZjZQ8DuTLEH+DHwD8BBoMvdv2lmDcAPgPcBbwIX\nuvuTZjY/v26l4hSRyujt7eX666/nsssuY9q0aWGHE6pauBcVaRmY2QQAd2/PfJ0P/Ag4G1gIzDOz\nDwCnAxPcfQGwBrgh808UqisR1dvbyxVXXMFf/vKXsEMJne5FTkdHB48++igdHR1hhxK6WrgXleom\neh/QYmZdZrbOzBYD4939KXcfBO4BlpF+s78bwN3vBz5kZlNGqCsR1dnZyWOPPcYvfvGLsEMJne5F\nWm9vL93d3QBs2LAh1smxt7eXjRs3AtDd3R3Ze1GpZLAPuB74GHAx8LPMtaw9wFRgCrmuJEhPd5oC\nvFagrkRQb28va9euZXBwkLVr10b2D70aent7WbduHYODg6xbty7W96Kjo4PsWSmDg4OR/kRcaR0d\nHQwMDAAwMDAQ2XtRqWTwBHCbuw+6+xOk3/CTw34+Gegj/aY/OS+e/GvZuhJBnZ2dHDx4EICDBw/G\n+hNxZ2dn4EUf53uR/SSclW0lxNGmTcHNEvLvTVRUKhlcQKb/38zeBbQAe83sPWaWIN1i2ARsBk7J\n1JsP/Ke7vwa8VaCuRNCGDRsCnwA3bNgQbkAh6u7uDiTGOL8BZpPiSGWJnkolg1uAVjO7D/gF6eRw\nIXA7sA14yN0fAH4NvGFmW4CbgEszj7+4QF2JoOnTpxctx0lbWxtNTekJek1NTbS1tYUcUXgaGxuL\nluNk8eLFgXJU/y4qMrXU3d8iPRso3/y8egOk3/jzH39/fl2Jpp07dxYtx0kqlWLdunUANDQ0cOaZ\nZ4YcUXjmzZvHli1bhsrz58f35bxy5Uq6u7sZGBigoaGBlStXhh1SQVp0JmOS/yJfsGBBSJGEL5lM\nsnTpUhKJBEuXLo3sfPJqGDduXNFynCSTyaHXyYIFCyL7d6FkIGOSSCSKluNmxYoVTJw4kY9//ONh\nhxKqBx4I9uzef//9IUUSDdlkGOWkqGQgY5L/It+6dWtIkURDV1cX+/fv5+677w47lFC1tbUNjRM0\nNjZGtp+8Gnp7e4e6zDZv3hzZKcdKBjImetHnaJ1BTiqVCvxdxHn8pFamHCsZyJjoRZ9TKy/6atD4\nSU6tTDlWMpAx0Ys+p1Ze9NWSSqU4/vjjY/0BAWpnyrGSgYyZXvRptfKir5a+vj6eeeYZdu/efejK\ndSyVStHQkH6rjfKUYyWDMml3ypxkMsm1114b61YB1M6Lvlpuuukm9u3bxw033HDoynWsVlrPSgZl\n0u6Ukq9WXvTV8PTTT/P8888D8Pzzz/PMM8+EG1DIaqH1rGRQBs0akZHUwou+Gm666aZAWa2D6Lee\nlQzKoFkjMpJaeNFXQ7ZVMFJZokfJoAyaNSJS3KRJk4qWJXqUDMrQ1tYWGCiM+6wRkXzZD0sjleOm\nFiacKBmUIZVKBbqJ4t4/LJJvyZIlRctxUwsTTpQMytDXFzx4Le7zqEXypVKpoTUXzc3Nsf7AVCsT\nTpQMyqCZEiLFJZNJli9fTiKRYNmyZbEeUK+VCSdKBmXQTAmRQ9M027RamXCiZFCGlpaWomWJr6ef\nfpqzzz479ousQNNss9ra2obO+UgkEpGdcKJkUIY33nijaFniS1swSL4VK1YwODgIwODgYGQPPlIy\nKINO95JCtAWDFNLV1RVoGUT14CMlgzIsXrw4UI5qs0+qSxMLpJDu7u5Ay0BjBnVk5cqVgUVnK1eu\nDDkiiQJNLJBCauU0QCWDMiSTyaH/oe3t7bEfIJO02bNnFy1LPKVSqUDLIKqzq5QMyrRy5Ure+973\nqlUgQy699NJAefXq1SFFIjJ6SgZl0rQ5ydfa2jr0fSKRYOrUqSFGI1HR2dkZGEDWojOROtfZ2TnU\nN9zQ0BDZF71UV3d3N/39/QD09/drAFnqVy3syFgNtfKirxb9XaRpAFlioxZ2ZKwGbW0epL+LNA0g\nSyzUyo6M1aCtzXP0d1F7lAxkTDo7OwNdI3H+FKitzXNqZafOatAAssSC+slztAI5p1Z26qyGWnmN\nKBnImMyfPz9QXrBgQUiRhE8rkHPa2tqGDrdpamqK9fhJrdyLpkr9w2Y2A3gQOAk4CNwKDAKPAKvc\nfcDMvgGcmvn5Je6+zcyOK1S3UnHK2GgH15zZs2cHEkCcVyCnUinWrVsHpAfT4zx+Uiv3oiItAzNr\nBn4M7M9cuhG4yt0XAQngNDP7ANAGzAPOAr4/Ut1KxCiHx7Zt2wLlBx54IKRIwqcVyDnJZJKlS5eS\nSCRYunRprBdnJpNJTjzxRAAWLlwY2XtRqW6i64EfAS9lyh8Esh1lvweWAwuBLncfdPfngCYzmz5C\nXZHImzt37lBrYPbs2cyZMyfcgEK2YsUKJk6cGNn9+6tp+NTSqDrsycDM/hbY6e73DLuccPfsXdgD\nTAWmAMOnW2SvF6orEaXtvIMuvfRSWlpaYt0qyOrq6mL//v2R3b+/Wnp7e9m8eTMA9913X2Sn2Vai\nZXABcJKZbQBOADqAGcN+PhnoA17LfJ9/faDANYkobecdNHfuXO64447Ytwq0ziCns7MzMLMqNlNL\n3X2xu7e5ezvwJ+A84Pdm1p6pcjKwCdgMfMzMGszsGKDB3V8FHipQVyJK23lLIVpnkLNhw4ZAN9GG\nDRvCDWgE1Zpauhr4ppltBcYBd7r7g6Tf6LcCvwRWjVS3SjFKmbSdt+TTOoOc6dOnFy1HRcWmlgJk\nWgdZb+tMdvergavzrj1RqK6I1I62tjbuueceBgcHSSQSsR5L2rlzZ9FyVGjRmYyZNiSTfCtWrAh0\njcR5RlF7e3tgO4r29vZwAxqBkoGMSW9vL2vXrmVwcJC1a9fGeqBQcn77298GynfddVdIkYQvlUoF\nyrFadCbxUSszJaS6Nm0KzvvYuHFjSJFEQyzXGUi81MpMCamu7Eyikcpx0tHRUbQcFUoGMia1MlNC\nqiu79mSkcpzUSispvv+HxkhH+qXVykwJqa6PfOQjgfK8efNCikRKpWRQJs2gScufGRHVmRJSXRMm\nTChajpP8RJi/7XtUKBmUQUvtc1KpVGDaXFRnSkh1bd26NVDesmVLSJGEb/z48UXLUaFkUAYttQ8a\nngxEQGNJw91///2Bcn6ijAolgzJoqX1OZ2dnYDZR3BOjpGksKadWTgNUMihD/tL6OC+119RSKaRW\nVt1WQ/7agqiuNVAyKMOKFSsC5TgvtVd3gBSisaSc/NP/8ruNokLJoAy/+93vAuX8pfdxou4AkeI0\nm6iO5S8aifOYgboDpJDOzs5AOc5jSfkTK6I60ULJQMYklUrR1JTeCb2pqSnW3QGS093dHZhxF+cP\nTJpNVMeOPPLIouU4SSaTLFy4EIBFixbppDMBamcGTTW0tbUFWs9RnXCiZFCGXbt2FS3HTS3syCjV\n9eabbxYtx0mtnO2gZFCGGTNmFC3HSW9v79Dq0s2bN8d6Nbbk1MoMmmro6uoKtAzuvvvukCMqTMmg\nDDt27ChajhOtxg7SBoZp/f39Rctx0t3dHWgZRHX8RMmgDGoZ5Gg1dlBHRwePPvpoZPesl+pra2uj\nsbERgMbGRo0Z1BPNrc9pa2sLzCaK6h96NfT29g4lww0bNsS6daDzDHJSqVSg9RzVGXfx/T80Bppb\nn5NKpYZe6A0NDZH9Q6+Gjo6OQHdAnFsH+TPsZs6cGVIk0VALkyyUDMqgufU5yWSSpUuXkkgkWLp0\naaynltbKiVbV0NvbGyjHecadjr2sY8lkkmXLlpFIJFi2bFms3wAhnRyPP/74WCdF0Lm/w2U/LI1U\njpNa+ZCgZFCmFStWMHHixMjOGa6mZDLJtddeG/ukWCu7U1bD3r17i5YlepQMytTV1cX+/fsjO2e4\nmjSdMk2Dpjn5x1xOnDgxpEjCt3jx4kA5qpMs4vvXOgY69jJI50GnaQuGnDfeeCNQ3r9/f0iRhO+T\nn/xkoPypT30qpEiKUzIoQ2dn59Aimv7+/li/CSox5qibSArp6uoKlKPam6BkUIbu7u5AMojzQisl\nxpxt27YFyvlbMkg85Z/+F9XTAJUMyqDugBwlRilk/PjxgXL+GEKcJJPJouWoUDIoQ35/aH45TpQY\nc7TQKid/l9I4v0a2b99etBwVSgZlUHdAjvrJc7TQSgqplfUnFVkJYmaNwM2AAf3A+UACuBUYBB4B\nVrn7gJl9AzgVOAhc4u7bzOy4QnUrEWs5tCNjTqFTnL785S+HFE24FixYwPr164fKH/3oR0OMRmR0\nDtkyMLNTzez/mNm67FcJ/+4nAdz9ROB/ATdmvq5y90WkE8NpZvYBoA2YB5wFfD/z+LfVHeXvJVUy\nffr0ouU4USspJ7tL50hliZ5Suon+N/D3wBeGfRXl7r8BLsoU3w28AnwQyI4u/h5YDiwEutx90N2f\nA5rMbPoIdSWCdLZDjg50yfnsZz8bKJ977rkhRRK+WhlLKiUZ9Lp7tw9Tyj/s7gfN7OfAPwJ3Agl3\nz35U2gNMBaYAu4c9LHu9UF2JIH0CzDnhhBMC5fe///0hRRK+deuCHQj33ntvSJGE7/LLLw+U16xZ\nE1IkxY04ZmBm2U/2b5nZPwEPku7Dx93/qZR/3N3/xswuBx4Ahq9Hnwz0Aa9lvs+/PlDgWmQ0Nzdz\n4MCBofK4ceNCjCZc+/btK1qOk2effbZoOU5eeumlQPnFF18MKZLwtba2BspTp0bzs22xlsGszNcD\nwIvAzEz5kG0cM1tpZl/LFPeRfnP/o5m1Z66dDGwCNgMfM7MGMzsGaHD3V4GHCtSNjOGJAOCtt94K\nKRKJEr0BSiGdnZ2Bk86iujBzxJaBu38TwMyucvdvZa+b2XUl/Lu/An5mZhuBZuAS4DHgZjMbl/n+\nTnfvN7NNwFbSiWlV5vGr8+uO+jeroIkTJwb2WonzJly6FzktLS2BllFLS0uI0YRrwoQJgbUFcV50\nVmhh5sUXXxxyVG9XrJvoc8CFwPFmdkrmciPpN/evjfQ4AHffC6QK/Oht2/W5+9XA1XnXnihUNyry\nF9Tkl+MkfwOyOG9IpsWIOboXOfPnzw9MOY7qwsxi6wxuA9YCVwDXZK4NAPGdLpKRPfJypHKcqGUg\nUlz+4sP8xYlRMeKYgbu/6e7PkO7Xb8t8LQHOMLOF1QkvmubNmxco52/JECdqJeXUyhRCqa6HH344\nUP7Tn/4UUiTFlbIC+UxgErAF+AgwAThoZv/h7pdWMrioyh8wjvMAcq0sta8GbUchtayUdQbNwBJ3\n/xpwErDH3bOrhmPpwQcfDJT/+Mc/hhSJREn+FMI4HwOq9Se1p5Rk8A7SCYHMf7P7r44vXL3+adsB\nKeSVV14pWo4TtRhzamWMsZRk8H3gYTP7FfAQ8AMzuwKI5nE9VfCud72raDlO8jdjO/HEE0OKJHri\n/CFBH5hyZsyYUbQcFYdMBu5+C/BR4NvAIne/Ffi2u19Z4dgi66KLLgqUozhnuFo+/elPB8qf+cxn\nQopEJJp27txZtBwVpexaegLpzeo+D3zHzH7q7vHdsxnedppXVI+xq4aurq6hZm8ikYjs+a4iYamV\nVlIp3US3Av8B/GLYV6xt3LgxUI7zUY/d3d1Df9yDg4OxvhcihdRKMihlaul2d/9JxSOpITrcJqdW\nVleKhKWhoSEwgB7VmVWlJINnzGwN6cHj7K6lXRWNKuLy/+c2NMT39FAtOsvR3kRSSP5Mqqh+eCzl\nXWw86eMrzwI+m/lvrLW1BbdNam9vDyeQCNCBLjlf+ELw3KdVq1aNUFMkekqZTXQ+cB3QCXyd9OZ1\nsaZkkKMus5wtW7YEyps3bw4pEpHRK2U20ZeAHwLXAv8D+G6lg4q6W265JVC++eabQ4pEomTr1q2B\ncn5ykHgaPz64Pjeq23mX0k10FukziPvc/R+I8TYUWc8//3zRcpzkj5fEefxEpJD8cbSobuddyis3\nWyc7Hyq+I4QZs2fPLlqOk6am4ByE5ubmEWqKSJSVkgzuADYCx5nZvwO/qWxI0XfKKacEyp/4xCdC\niiR8+Tu2xnk2kUgtK2UA+XvARaSPolzj7tdXPKqI6+joCJRvvfXWcAIRETlMih17eR25rqGs95vZ\nWe5+RWXDijYd9Sgi9abYorPHqxaF1Kxx48YFuorGjRsXYjQiUq4Rk4G7/7yagUht0qlvIvVB8wBF\nRKSkRWdNeeXWkerGRf7B5/llEZFaU2wAeSYwBegws5VAgnTy6AA+Up3womn37t1FyxJPiUQisD1x\nVI83FCmkWMtgPvBj0pvU/Tjz9X3gnirEFWnt7e2BA13ivDeR5NTKvvUihRQbQP4N8Bsz+5S7/zZ7\n3cwmVyWyCEulUqxdu5YDBw7Q1NTEmWeeGXZIIiJjUsp5BqvN7A/u/rKZzQNuAf5rheOqqvXr13Pv\nvfeO6jHjxo3jwIEDHHHEEVx//ejW4S1fvpwlS5aM6jHVUs69yHfllaUfjx3leyESJ6Ukg28C/25m\n3cCHgE8fon4sDAwM0NDQwPTp08MOJVTNzc0cOHAgUBaR2lNKMvgzsAM4ifR4wVMVjSgES5YsGfWn\n0+yn32uuuaYSIYVmtPfi6aef5itf+cpQ+Tvf+Q5z5sypQGQiUkmlrDPYBPzA3f8aeAnYeoj6EiNz\n584dag3MnDlTiUCkRpWSDJa6+10AmU3qPl/ZkKTWzJ49m4aGBtasWRN2KCJSplK6iaaa2b8ArcDt\nwCPFKptZM/BTYA7p85O/BTwK3Ep647tHgFXuPmBm3wBOBQ4Cl7j7NjM7rlDdUf9mUjUTJ07k+OOP\nV6tApIaV0jL4LnA+8CrpmURXH6L+ucAud18EnAx8D7gRuCpzLQGcZmYfANpIn5x2Fuk1DBSqO5pf\nSERERq+kvYnc/Ulg0N13AnsOUf3fgK8PKx8EPgh0Z8q/J32M5kKgy90H3f05oMnMpo9QV0REKqiU\nbqJeM/s8MMnMzgL6ilV299dhaHHancBVwPXunl2OuQeYSnqri13DHpq9nihQV0REKqiUlsHngGNJ\ndxN9CLjgUA8ws9nAeuCf3f0OYHif/2TSCeW1zPf51wvVFRGRCiolGfxPd1/j7qe6+98BXy1W2cyO\nBLqAy939p5nLD5lZe+b7k0lPV90MfMzMGszsGKDB3V8doa6IiFRQsV1LPwdcCBxvZtkT4BuAccDX\nivybVwDTgK+bWXbs4MvAd81sHPAYcKe795vZJtLrFhqAVZm6q4Gbh9ct6zcTGSNtzZGje1H/io0Z\n3AasJf3mnl1mO0B6NfKI3P3LpN/887UVqHs1ebOT3P2JQnVFom7SpEns3bs3UBapFcV2LX0TeAa4\nqGrRiETIaLfm6O3t5YILckNq3/ve95g2bVolQqu60d6L008//W3X6m3rlnqjYy9FDpNkMjnUGvjw\nhz9cN4mgHKtXrw6UL7vsspAikVIpGYgcRkcddRQtLS188YtfDDuUUC1atChQPvHEE0OKREqlZCBy\nGDU3N3PsscfGulWQddRRRwFqFdSKUhadiYiMWmtrK62trWoV1Ai1DERERC0DEZHRqNc1F2oZiIiI\nWgYiIqNRr2su1DIQEamgM844I1BOpVIhRVKckoGISAWdd955gfLZZ58dUiTFKRmIiFRYMpkEotsq\nAI0ZiIhU3KxZs5g1a1ZkWwWgloGIiKBkICIiKBmIiAhKBiIigpKBiIigZCAiIigZiIgISgYiIoKS\ngYiIoGQgIiIoGYiICEoGIiKCkoGIiKBkICIiKBmIiAhKBiIigpKBiIigk87q1k9+8hN6enqq8lzZ\n57nyyiur8nzHHnssF154YVWeSyQu6i4ZVOtNMOpvgD09PTzy5JP0z5hRwajSEhMmAPB/X3ut4s/V\nuGNHxZ9DJI4qlgzMbB7wbXdvN7PjgFuBQeARYJW7D5jZN4BTgYPAJe6+baS6pT5vT08Pjzz+FAMT\njzq8v1CeRP8RADz87BsVfR6Ahv0vlvW4/hkz2HfOOYc5mnC13H572CGI1KWKJAMz+yqwEtibuXQj\ncJW7bzCzHwGnmdmzQBswD5gN/BL4cKG6wK9H8/wDE4/izf+y6vD8MhEw/vHvhx2CiNS5Sg0gPwWc\nMaz8QaA78/3vgeXAQqDL3Qfd/Tmgycymj1BXREQqqCItA3f/pZnNGXYp4e6Dme/3AFOBKcCuYXWy\n1wvVFSmbBtNFDq1aA8jD+/wnA33Aa5nv868XqitStp6eHvzp/8cRx7yj8k82pRmAFw/2VvypXn9u\n16Er5VFilJFUKxk8ZGbt7r4BOBlYDzwJ/L2ZXQ8cDTS4+6tmVqiuyJgcccw7eP/XTgs7jMPqoevu\nGvVjenp6ePKpxzly5qQKRBQ0oSX9uW7P3ucr/lyvbN976EpSVLWSwWrgZjMbBzwG3Onu/Wa2CdhK\neuxi1Uh1qxSjSCwcOXMS513w38MO47Dq+OnDYYdQ8yqWDNz9GWB+5vsnSM8cyq9zNXB13rWCdUVE\npHK0HYWIiNTfCmQRkVJoMD1IyUBEYqmnp4enH/8zR7dUvoNk8kB6tvxbzz1W8ed6YV/JGzYEKBmI\nSGwd3dLA6r+eEHYYh9UNfy5vixyNGYiIiJKBiIgoGYiICEoGIiKCBpDrVl9fH42vvlp3+/837thB\n30B5syVEZGRqGYiISP21DPr6+mjYt6uuDoRp2PcifX2j23GztbWVZxsa6vKks9YpU0b1mL6+Pvb0\n7iprY7co2/PcLvqS+jwnh4f+kkREpP5aBq2trTy3e0LdHXvZ2lpfC2OqqbW1lb1HDNTlFtatTa2j\nekxfXx+vvrq37nb5fOXlvfS/U0efjIVaBiIiUn8tAxEZWWtrK43Ne+ryPIPJk0bXSpIgJQMRiaW+\nvj569w2UvZdPVL2wb4Bk3+i7zNRNJCIiahmISDy1trbS8trLdblr6bjW0XeZqWUgIiJqGdSzxh07\nqrIdRWLvXgAGJ02q+HM17tgBo1x0JiKHpmRQp4499tiqPVfPrl3p55w1q/JPNmVKWb/b689VZwXy\nW7v3ATBuakvFn+v153bB3GTFn0fioS6TQcP+Fyu+HUXiwB4ABpsnV/R5IP37wHtG9ZjRnn86Ftlz\nXa+55pqqPedoVDUxvrYbgKPeUYU36bnJsn63V7ZXZ9HZ66+/BcARR4yr+HO9sn0vk0f3EpE8dZcM\nqvXC7+l5Of18755ehWd7T1Xf0OqNEmNONf+OXt2RPgR+1pGzK/5ck99T3d+tHtVdMqjWCz/qL3qR\nQpQYZSR1lwxEREr1QpUWnb12YBCAKc2Jij/XC/sGmFvG45QMRCSWqtmttKcn3WX2zmMq/5xzKe93\nUzIQkVhSl1mQFp2JiIiSgYiIKBmIiAhKBiIiQkQHkM2sAfgB8D7gTeBCd38y3KhEROpXVFsGpwMT\n3H0BsAa4IeR4RETqWmJwcDDsGN7GzG4Etrn7v2bKL7r7UcN+PgfoWbt2LUcfffSYn2/9+vXce++9\no3pMT2becDnzeZcvX86SJUtG/bhq0L3I0b3I0b3IqeV78cILL7Bs2TKAY939meE/i2Q3ETAF2D2s\n3G9mTe5+MKyA8k2bNi3sECJD9yJH9yJH9yKnFu5FlFsG97t7Z6b8grsfPezncziMLQMRkTgo1jKI\n6pjBZuAUADObD/xnuOGIiNS3qHYT/Ro4ycy2AAng/JDjERGpa5FMBu4+AFwcdhwiInER1W4iERGp\nIiUDERFRMhARESUDEREhogPIJWgE2L59e9hxiIjUjGHvmY35P6vVZDAL4Jxzzgk7DhGRWjQLeGr4\nhVpNBn8AFgEvA/0hxyIiUisaSSeCP+T/IJLbUYiISHVpAFlERGq2mygSzGwe8G13bw87lrCYWTPw\nU2AOMB74lrv/NtSgQmJmjcDNgJHuvjzf3Z8q/qj6ZWYzgAeBk9z98bDjCZOZPURuJ+Yed4/cFjtK\nBmUys68CK4G9YccSsnOBXe6+0szeATwExDIZAJ8EcPcTzawduBE4LdSIQpL5kPBjYH/YsYTNzCYA\nRP1Do7qJyvcUcEbYQUTAvwFfH1aOzJkT1ebuvwEuyhTfDbwSYjhhux74EfBS2IFEwPuAFjPrMrN1\nmZ2YI0fJoEzu/kvgQNhxhM3dX3f3PWY2GbgTuCrsmMLk7gfN7OfAP5K+H7FjZn8L7HT3e8KOJSL2\nkU6OHyO9AeftZha5XhklAxkzM5sNrAf+2d3vCDuesLn73wB/BdxsZpPCjicEF5Degn4DcALQYWYz\nww0pVE8At7n7oLs/Aewis1YqSiKXnaS2mNmRQBfwJXdfG3Y8YTKzlcDR7n4d6U+DA8RwHYy7L85+\nn0kIF7t7nLcLuAD4b8AXzexdpI/1fTnckN5OyUDG6gpgGvB1M8uOHZzs7nEcOPwV8DMz2wg0A5e4\n+xshxyThuwW41czuAwaBC6J0nnuWFp2JiIjGDERERMlARERQMhAREZQMREQEJQMREUHJQGRMzOxL\nYccgcjgoGYiMTay335D6oXUGIiUys78CbiW9J9VBYB3wDeAnwJrMf1uBdwI3u/sPMytwd5JemLcK\n+Nmwx5/n7i9W97cQKUwtA5HSnUR6f/7lwDXAXUCvu38ROA74V3dfAXwC+Mqwx93h7sszjxv++GlV\njF2kKCUDkdLdArwK3A18ieACVXOhAAAAkElEQVR23duB083sNtJdR83DfuYlPF4kVEoGIqU7Ddjk\n7stIn+NwOZDI/OzvgK3ufm7mZ4lhjxso8niRSNBGdSKl+yNwm5kdJP0GfykwJ9MauAX4oZmdQ3qL\n4oNmNr6Ex4tEggaQRURE3UQiIqJkICIiKBmIiAhKBiIigpKBiIigZCAiIigZiIgISgYiIgL8f9SZ\nkjJL/PI5AAAAAElFTkSuQmCC\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x1b20ab90080>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"sns.boxplot(x='stars',y='text length',data=yelp,palette='rainbow')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Create a countplot of the number of occurrences for each type of star rating.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<matplotlib.axes._subplots.AxesSubplot at 0x1b20b46b1d0>" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAEBCAYAAACaHMnBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAEY5JREFUeJzt3XuwXWV9xvHvOQmX2glILRUcQEYZ\nnn/aUoMaVJAot0Zq49ipgwK1MA5jDS2peJ/QpB16sQMZb4g2ArFcRmogtrYDZKYKBIQGaTqVkf4i\nQaVYooCFBKjoSU7/2CvD5nACG8ja61y+n5nMWetd73vyO2eS/ex3vWutPTI+Po4kaXYb7boASVL3\nDANJkmEgSTIMJEkYBpIkYG7XBbwQSfYCXgc8AGzvuBxJmi7mAAcCd1TVk/0HpmUY0AuC9V0XIUnT\n1DHALf0N0zUMHgC48sorOeCAA7quRZKmhS1btnDqqadC8xrab7qGwXaAAw44gIMOOqjrWiRpunnG\n6XUXkCVJhoEkyTCQJGEYSJIwDCRJGAaSJFq6tDTJHGAVEHqXMJ0B7At8Hfhe0+3iqro6yXLgZGAM\nWFpVG5IcBqwGxoG7gCVVtaONWiVJ7d1n8HaAqnpTkoXASnpBsLKqLtzZKcl84FhgAXAwcA29u4tX\nAsuq6sYkXwAWA2tbqlWaVb79nxd0XUIrXvubH+q6hGmtlTCoqq8l+edm95XAj4EjgSRZTG92sBQ4\nGlhXVePAfUnmJtm/6XtTM/464EQMA0lqTWtrBlU1luTLwGeBNcAG4MNV9WbgXmA5sA/waN+wbfRO\nJ400AdHfJklqSasLyFX1XuBweusH66rqzubQWuA1wFZgXt+QecAjwI5J2iRJLWklDJKcnuTjze4T\n9F7cr03y+qbtOOBO4FbgpCSjSQ4BRqvqIWBjs9YAsAifUCpJrWprAfla4LIkNwN70Fsf+G/gc0l+\nDmwBzqqqrUnWA7fRC6YlzfhzgVVJ9gTupneaSZLUkrYWkB8H3jXJoTdO0ncFsGJC2yZ6VxlJkobA\nm84kSYaBJMkwkCRhGEiSmL4feylJL9qWv3lP1yW04oCPXfW8xzgzkCQZBpIkw0CShGEgScIwkCRh\nGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkWvpwmyRzgFVAgO3AGcAI\nsBoYB+4CllTVjiTLgZOBMWBpVW1IcthkfduoVZLU3szg7QBV9Sbgz4CVzZ9lVXUMvWBYnGQ+cCyw\nADgFuKgZ/4y+LdUpSaKlMKiqrwFnNbuvBH4MHAnc1LRdBxwPHA2sq6rxqroPmJtk/130lSS1pLU1\ng6oaS/Jl4LPAGmCkqsabw9uAfYF9gEf7hu1sn6yvJKklrS4gV9V7gcPprR/8Ut+hecAjwNZme2L7\njknaJEktaSUMkpye5OPN7hP0Xty/nWRh07YIWA/cCpyUZDTJIcBoVT0EbJykrySpJa1cTQRcC1yW\n5GZgD2ApcDewKsmezfaaqtqeZD1wG71gWtKMP3di35bqlCTRUhhU1ePAuyY5dOwkfVcAKya0bZqs\nrySpHd50JkkyDCRJhoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQ\nJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgSQLm7u5vmGQP4FLgUGAv4HzgfuDrwPeabhdX1dVJ\nlgMnA2PA0qrakOQwYDUwDtwFLKmqHbu7TknSU9qYGZwGPFxVxwCLgM8B84GVVbWw+XN1kvnAscAC\n4BTgomb8SmBZM34EWNxCjZKkPrt9ZgB8FVjTtz8GHAkkyWJ6s4OlwNHAuqoaB+5LMjfJ/k3fm5qx\n1wEnAmtbqFOS1NjtM4OqeqyqtiWZRy8UlgEbgA9X1ZuBe4HlwD7Ao31DtwH7AiNNQPS3SZJa1MoC\ncpKDgW8Cl1fVVcDaqrqzObwWeA2wFZjXN2we8AiwY5I2SVKLdnsYJHk5sA74aFVd2jTfkOT1zfZx\nwJ3ArcBJSUaTHAKMVtVDwMYkC5u+i4D1u7tGSdLTtbFm8AlgP+C8JOc1bR8EPpXk58AW4Kyq2ppk\nPXAbvVBa0vQ9F1iVZE/gbp6+/iBJasFuD4OqOgc4Z5JDb5yk7wpgxYS2TfSuMpIkDYk3nUmSDANJ\nkmEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaB\nJAnDQJKEYSBJwjCQJGEYSJIwDCRJwNzd/Q2T7AFcChwK7AWcD3wXWA2MA3cBS6pqR5LlwMnAGLC0\nqjYkOWyyvru7TknSUwaaGSR534T9P3mW7qcBD1fVMcAi4HPASmBZ0zYCLE4yHzgWWACcAlzUjH9G\n38F/HEnSC/GsM4Mk7wZ+F3hLkrc2zXOAXwc+s4thXwXW9O2PAUcCNzX71wEnAgWsq6px4L4kc5Ps\nv4u+a5/PDyVJen6e6zTR9cADwMuALzZtO4DNuxpQVY8BJJlHLxSWARc0L/oA24B9gX2Ah/uG7mwf\nmaSvJKlFz3qaqKr+t6purKoTgbuB7wM/5LlnFAcD3wQur6qr6AXITvOAR4CtzfbE9sn6SpJaNOia\nwUXABuArwNXN1131fTmwDvhoVV3aNG9MsrDZXgSsB24FTkoymuQQYLSqHtpFX0lSiwa9mmgB8KoB\nr+r5BLAfcF6S85q2c4DPJNmT3gxjTVVtT7IeuI1eKC1p+p4LrOrvO2CNkqQXaNAwuAfYG3jiuTpW\n1Tn0XvwnOnaSviuAFRPaNk3WV5LUnkHD4BDgh0nuafbHq+qNLdUkSRqyQcPg3a1WIUnq1KBh8N5J\n2v5idxYiSerOoGHw4+brCDAfn2kkSTPKQGFQVV/s309yXTvlSJK6MFAYJDm8b/dAegvKkqQZYtDT\nRP0zg58BH2qhFklSRwY9TfSWJC8DXg3c29wpLEmaIQZ9HMXvA9+id3fx7UlOa7UqSdJQDXpV0AeB\nI6vqHcBrmPwOY0nSNDVoGOzY+WjqqtpGb91AkjRDDLqAvDnJhcDNwDE8y+cZSJKmn0FnBn8H/BQ4\nATiD3kdZSpJmiEHDYCWwtqrOBl7X7EuSZohBw2Csqr4LUFX38vRPI5MkTXODrhn8MMlf0fsgmtcD\nP2qvJEnSsA06MzgD+AnwNuBB4MzWKpIkDd2gdyD/DPhUy7VIkjrio6glSYaBJMkwkCQx+NVEz1uS\nBcAnq2phkvnA14HvNYcvrqqrkywHTgbGgKVVtSHJYcBqYBy4C1hSVV7KKkktaiUMknwEOB14vGma\nD6ysqgv7+swHjgUWAAcD1/DUDW3LqurGJF8AFgNr26hTktTT1sxgM/BO4PJm/0ggSRbTmx0sBY4G\n1lXVOHBfkrlJ9m/63tSMuw44EcNAklrVyppBVV0D/KKvaQPw4ap6M3AvsBzYB3i0r882YF9gpAmI\n/jZJUouGtYC8tqru3LlN7zMRtgLz+vrMAx7h6Y+62NkmSWpRawvIE9yQ5I+ragNwHHAncCvwt0ku\nAA4CRqvqoSQbkyysqhuBRcA3h1SjZrBPP3J91yW04pyX/nbXJWiGGFYY/BHwuSQ/B7YAZ1XV1iTr\n6T3vaBRY0vQ9F1iVZE/gbmDNkGqUpFmrtTCoqh8ARzXb/w68cZI+K4AVE9o20bvKSJI0JN50Jkky\nDCRJhoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJ\nGAaSJAwDSRKGgSQJw0CShGEgSQLmtvWNkywAPllVC5McBqwGxoG7gCVVtSPJcuBkYAxYWlUbdtW3\nrTolSS3NDJJ8BPgSsHfTtBJYVlXHACPA4iTzgWOBBcApwEW76ttGjZKkp7R1mmgz8M6+/SOBm5rt\n64DjgaOBdVU1XlX3AXOT7L+LvpKkFrUSBlV1DfCLvqaRqhpvtrcB+wL7AI/29dnZPllfSVKLhrWA\n3H/Ofx7wCLC12Z7YPllfSVKLhhUGG5MsbLYXAeuBW4GTkowmOQQYraqHdtFXktSi1q4mmuBcYFWS\nPYG7gTVVtT3JeuA2eqG0ZFd9h1SjJM1arYVBVf0AOKrZ3kTvyqGJfVYAKya0TdpXktSeYc0M1IG3\n3n571yW04htHHdV1CdKM4x3IkiTDQJJkGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEliht6BfOL5\n93Zdwm63btmrui5B0gzmzECSZBhIkgwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kS\nQ342UZKNwKPN7veBLwKfBsaAdVX150lGgc8DRwBPAu+rqnuGWackzTZDC4MkewNU1cK+tv8Afg+4\nF/iXJPOBQ4G9q+oNSY4CLgQWD6tOSZqNhjkzOAJ4SZJ1zd+7AtirqjYDJLkBOA44ELgeoKpuT/La\nIdYoSbPSMNcMngAuAE4C3g9c1rTttA3YF9iHp04lAWxPMiMftS1JU8UwX2Q3AfdU1TiwKcmjwK/0\nHZ8HPAK8pNneabSqxoZXpiTNPsOcGZxJ7/w/SV5B70X/8SSvTjJCb8awHrgVeFvT7yjgO0OsUZJm\npWHODC4BVie5BRinFw47gCuBOfSuJvq3JHcAJyT5FjACnDHEGiVpVhpaGFTVz4H3THLoqAn9dtBb\nU5AkDYk3nUmSDANJkmEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAM\nJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJKAuV0XMJkko8DngSOAJ4H3VdU93VYlSTPXVJ0Z\nvAPYu6reAHwMuLDjeiRpRpuSMwPgaOB6gKq6PclrJxyfA7Bly5ZJB4899uNWi+vC/ffv+bzHjD34\nYAuVdO/+++9/3mO2bn24hUq6d/9jz/938eBPtrVQSfdeyL+LB7c92UIl3Rvbxe+i7zVzzsRjUzUM\n9gEe7dvfnmRuVY01+wcCnHrqqUMvrCvHre26gqnjuK4LmEKu6LqAKeUrXRcwdax5zv8lBwKb+xum\nahhsBeb17Y/2BQHAHcAxwAPA9mEWJknT2Bx6QXDHxANTNQxuBd4O/EOSo4Dv9B+sqieBW7ooTJKm\nuc2TNU7VMFgLnJDkW8AIcEbH9UjSjDYyPj7edQ3TVpIFwCeramHXtXQlyR7ApcChwF7A+VX1T50W\n1ZEkc4BVQOidvjyjqiZ9FzYbJPk14E7ghKr6r67r6VKSjTy1Dvr9qppyb3Cn6sxgykvyEeB04PGu\na+nYacDDVXV6kpcBG4FZGQb0Tm1SVW9KshBYCSzutKKONG8Svgj8X9e1dC3J3gBT/U3jVL3PYDrY\nDLyz6yKmgK8C5/Xtj+2q40xXVV8Dzmp2XwnMvGucB3cB8AXgf7ouZAo4AnhJknVJvtGsg045hsEL\nVFXXAL/ouo6uVdVjVbUtyTxgDbCs65q6VFVjSb4MfJbe72PWSfKHwINVdUPXtUwRT9ALx5OA9wNX\nJplyZ2UMA71oSQ4GvglcXlVXdV1P16rqvcDhwKokv9x1PR04k94FIDcCvwX8fZIDui2pU5uAK6pq\nvKo2AQ/T3Cs1lUy5dNL0kuTlwDrg7Kr6167r6VKS04GDquqv6b0b3MEsvA+mqt68c7sJhPdX1eSP\nC5gdzgR+A/hAklfQu6n2gW5LeibDQC/WJ4D9gPOS7Fw7WFRVs3Hh8FrgsiQ3A3sAS6vqZx3XpO5d\nAqxOcgswDpw54SbaKcFLSyVJrhlIkgwDSRKGgSQJw0CShGEgScIwkF6UJGd3XYO0OxgG0oszqx+/\noZnD+wykASU5HFhN75lUY8A3gOXAl4CPNV9fCvwqsKqqLm7uwH2Q3o15S4DL+sb/QVX9aLg/hTQ5\nZwbS4E6g93z+44G/BP4R+GlVfQA4DPhKVZ0I/A7wwb5xV1XV8c24/vH7DbF26VkZBtLgLgEeAq4H\nzubpj+veArwjyRX0Th3t0XesBhgvdcowkAa3GFhfVcfR+xyHj9L7WFaADwG3VdVpzbGRvnE7nmW8\nNCX4oDppcN8GrkgyRu8F/k+BQ5vZwCXAxUlOpfeI4rEkew0wXpoSXECWJHmaSJJkGEiSMAwkSRgG\nkiQMA0kShoEkCcNAkoRhIEkC/h+PS/Wt2pIedgAAAABJRU5ErkJggg==\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x1b2079287b8>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"sns.countplot(x='stars',data=yelp,palette='rainbow')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**groupby to get the mean values of the numerical columns**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>cool</th>\n", | |
" <th>useful</th>\n", | |
" <th>funny</th>\n", | |
" <th>text length</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>stars</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>0.576769</td>\n", | |
" <td>1.604806</td>\n", | |
" <td>1.056075</td>\n", | |
" <td>826.515354</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>0.719525</td>\n", | |
" <td>1.563107</td>\n", | |
" <td>0.875944</td>\n", | |
" <td>842.256742</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>0.788501</td>\n", | |
" <td>1.306639</td>\n", | |
" <td>0.694730</td>\n", | |
" <td>758.498289</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>0.954623</td>\n", | |
" <td>1.395916</td>\n", | |
" <td>0.670448</td>\n", | |
" <td>712.923142</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>0.944261</td>\n", | |
" <td>1.381780</td>\n", | |
" <td>0.608631</td>\n", | |
" <td>624.999101</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" cool useful funny text length\n", | |
"stars \n", | |
"1 0.576769 1.604806 1.056075 826.515354\n", | |
"2 0.719525 1.563107 0.875944 842.256742\n", | |
"3 0.788501 1.306639 0.694730 758.498289\n", | |
"4 0.954623 1.395916 0.670448 712.923142\n", | |
"5 0.944261 1.381780 0.608631 624.999101" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"stars = yelp.groupby('stars').mean()\n", | |
"stars" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**corr() method on that groupby dataframe to produce this dataframe:**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>cool</th>\n", | |
" <th>useful</th>\n", | |
" <th>funny</th>\n", | |
" <th>text length</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>cool</th>\n", | |
" <td>1.000000</td>\n", | |
" <td>-0.743329</td>\n", | |
" <td>-0.944939</td>\n", | |
" <td>-0.857664</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>useful</th>\n", | |
" <td>-0.743329</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.894506</td>\n", | |
" <td>0.699881</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>funny</th>\n", | |
" <td>-0.944939</td>\n", | |
" <td>0.894506</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.843461</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>text length</th>\n", | |
" <td>-0.857664</td>\n", | |
" <td>0.699881</td>\n", | |
" <td>0.843461</td>\n", | |
" <td>1.000000</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" cool useful funny text length\n", | |
"cool 1.000000 -0.743329 -0.944939 -0.857664\n", | |
"useful -0.743329 1.000000 0.894506 0.699881\n", | |
"funny -0.944939 0.894506 1.000000 0.843461\n", | |
"text length -0.857664 0.699881 0.843461 1.000000" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"stars.corr()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**A heatmap based on that .corr() dataframe:**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<matplotlib.axes._subplots.AxesSubplot at 0x1b20b4a4940>" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAVkAAAD3CAYAAAC3kyfxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3Xl8VOX1+PHPzCRkJ+xCQHY4IJsL\n4gZVa7F1qXUX3BdE3EVwr3Wp0NbignXHBZSvxe+3rfrtl5bqzwWpLFUUAwoHArLvEELInpn5/XGH\nLGy5JLkzmfG8X695JXeeO3fOTJIzT87z3Of6wuEwxhhjvOGPdQDGGJPILMkaY4yHLMkaY4yHLMka\nY4yHLMkaY4yHLMkaY4yHkhr7gDOTxeaEAe+M/yTWITQZa5bkxTqEJiOrdctYh9Ck/HPqQF9DHn84\n+eacCm3Qc9VXoydZY4yJFl9yTPLmYbEka4yJW/4kS7LGGOMZX3LTH1ayJGuMiVuBNEuyxhjjGSsX\nGGOMh2zgyxhjPGQ9WWOM8ZAvYEnWGGM847cka4wx3vEnB2IdQp0syRpj4pb1ZI0xxkM+vyVZY4zx\njPVkjTHGQza7wBhjPORPsoEvY4zxjJULjDHGQzbwZYwxHvL5bRUuY4zxTEN6siLiB14EBgFlwChV\nzavRPh4YCYSAiar6Xn2ep+l/DBhjzEH4Az7XtwM4H0hV1ZOA+4Gn9jaISAvgDuAk4Ezg2XrHWN8H\nGmNMrPmTAq5vBzAUmAWgqvOBwTXaioA1QEbkFqp3jPV9oDHGxJrP73N9O4DmQEGN7aCI1CyhrgO+\nB74GnqtvjAlXk20xZCB9Jo5n/s+ujnUoUXFMn1QuOCOLUAhmf1XEp18W12q/bURLsrOcT/G2LQPk\nrS3n+Rn5ADRL9vHomDbM+NducpeXRT32xnbK8a25dmRngkGY+dEm/v7h5lrtvXtkMv6WXlRUhFmx\nag+Tp+QRjlxQOiXFz8tPHsPL01ax4Ov8GETfuE44OovLzzuCYCjMh3PymTV7Z6327p1Tuf2ajgSD\nsGFzGc++uZ5wGAYPyOKK89sBkLe6hBfe3hiL8F1r4OyC3UBWjW2/qlZGvj8L6AB0i2z/S0S+UNX/\nHO6TJFRPtvu4UQx45Qn8qSmxDiUqAn648txsfv/6dn776jZOH5JBdmbtH+nzM/KZMGU7z7y9g+KS\nENNnVn9wX3teNq4vWt/EBQI+bh/Vg7sfXsxtDyzivF90oFWL5Fr73Htrb56bspJb719EUXElw09t\nV9V295hehMOJ8W4EAjB6ZA4PTfqBe3+3irNObUXL7Nr9qSt+dQTvfLCV8RNXkpzsY8igLNJS/dxw\nWQceeWY1Y3+7ki3bK6o+oJuqBvZkvwDOBhCRE4HFNdrygRKgTFVLgV1Ai/rEmFBJtnjVWhZecnus\nw4ianHZJbNlRSXFpmGAQlq8uR7o2O+C+Fw9vzr/mFbGr0CktnT0skxVry1m7qSKaIXum65HpbNhU\nQmFRJZWVYXK/382gftm19mnbJoUly3YDsHjpbgYe5bSPvKATS5YWkLe6KOpxe+HIDqls3FrOnuIg\nlcEw360ool/vjFr7rFxTQmaGk0DTUgNUBsP07ZnO6vWl3DiiA398oAe7dldSUBiMxUtwzef3u74d\nwHtAqYjMBZ4BxorI3SJynqrOAb4E5ovIPGA58FF9YjxkuSBy8H0/3n1AWFVPrs8Temnzex+S1qVj\nrMOImrQUPyWl1fX4krIQ6an7/zI1z/DTr0cKb/+f04vt1yOF9q2TeOP9XfTucuCkHG8y0gPsKa6s\n2i4uCZKRUfvXe+PmEo7un82iJQWcMqQVqakBjhvYgk45afzxhRUMOCp738PGpYw0P0XF1cmxpDRE\nRlrtHumGLeXcelUOI89rR1FxiNxlRQwdnM3Avhnc9psVlJSGmPRgD5bmFbFhS3m0X4JrgQZcElxV\nQ8CYfe5eVqP9EeCRej9BRF012RENfQLT+C4ZnkXvril0bp/MynXVfwBpKX6KS/fvmQ4ZkMbcRcVV\n9cfTBqfTpkWAh25sQ07bJLrmNKOgMJ81cdirvfHKrgw8KpseXTP4fnlh1f3paQH27Kmste/Eycpd\nN/bk8gvDLFtRSHlFmHPPbM8RbVP508RBdO6UTu/umezIX0beD/HXq736wiPo1zuDbp1SWbaqujaf\nllo76QKMuSKH8RNXsnZjGeee0ZobR3RgwaLdrPihhPwC531brEV075zWpJNs3J+MoKprAESkE053\n+iicbvNY70MzB/M/HxUChQT88OTYI8hI81FaHqZPt2bMnFO43/79e6Tw/qfV97/wbvXAzk0Xt2Be\nbklcJliAKdNXA05NdvqLg8nKTKKkNMjR/bL589/W1dr35MGtmficsmNnOXeN7sn8hTuZv7B6QOjB\nu4SPP98alwkW4K2/bQGcmuwrE4TMjAClpSH6SwZ//ee2WvsWFlVSHPkvaGd+BUf1TCdvdQldOqbS\nPDPAnuIgfXqkM2v2jqi/jsORSKfVTgFeAj4HTgNeB87wKCbjUjAE02cWcN/1bfD7YPZXxeTvDtGx\nXRLDT8pg6gdOeaBD2yS27qys42jxLRgM8/xrq3j68QH4fT5mfrSZ7TvL6XpkOhedm8NTL+WxfmMJ\nkx4ZQGlZkG8W76qVYBNJMAhTZmxkwrhu+Pzw4Zx8duyqpHNOCr88ozUvvL2RyW+s5/6bOxMKhqmo\nDDN56noKCoNM/csmnhjvDKjP+U8BazY07Vkn8ZBkfW5GVEXkU1U9vcb256r6kwPtOzNZEmOItoHe\nGf9JrENoMtYsyat7px+JrNYtYx1Ck/LPqQMblCXXjrnQdb7p/PLfYpKR3RY0kkRkAEDkqyVSY0zM\nNXAKV1S4LRfcAbwhIh2AjcBo70Iyxhh3fIGmPY8XXCZZVf1GRH4B9ABWqep2b8Myxpi6xUNN1lW5\nQEQuBeYCD+BMzr3S06iMMcaFBp6MEBVun3kscJyqXgAcA9zpXUjGGONOPNRk3SbZkKruAVDVQqDU\nu5CMMcadeOjJuh34WikiT+HMkx0GrPQuJGOMcScearJuk+yrwKnAcJzLMfzcs4iMMcYlXxxcEtxt\nH/pp4D1VvQ04PrJtjDEx5fP5XN9ixW2SrVTV7wFUdRUNuBSDMcY0lkSqya4RkYnAPGAIsMG7kIwx\nxp14qMm6Te/XAVtxVhHfBlzvWUTGGOOW3+/+FiNuz/gqpQGXxDXGGC/4E+W0WmOMaZLioFxgSdYY\nE7fi/soIxhjTlMXDwJclWWNM/PJZT9YYYzxjPVljjPFQwizabYwxTZINfBljjHesXGCMMV6ygS9j\njPHQj7En+874Txr7kHHp8kk/jXUITUbrY7NjHUKTcWT3brEOoYl5t0GP9llP1hhjvBMPi3ZbkjXG\nxK8YLsbtliVZY0z8silcxhjjIevJGmOMd2wVLmOM8ZKdVmuMMd6xKVzGGOOlH+PJCMYYEzXWkzXG\nGA81YHaBiPiBF4FBQBkwSlXzDrDPTOADVX25Ps/T9D8GjDHmYBp2SfDzgVRVPQm4H3jqAPs8AbRq\nUIgNebAxxsRUIOD+tr+hwCwAVZ0PDK7ZKCIXAyHgnw0J0ZKsMSZ++fzub/trDhTU2A6KSBKAiPQH\nLgd+09AQrSZrjIlfDTvjazeQVWPbr6qVke+vBjoCnwBdgXIRWa2qsw73SSzJGmPiV8PO+PoC+CXw\n3yJyIrB4b4Oq3rv3exF5FNhcnwQLlmSNMfGsYT3Z94DhIjIX8AHXicjdQJ6q/m9jhAeWZI0x8awB\n82RVNQSM2efuZQfY79F6PwmWZI0x8czWLjDGGA/ZGV/GGOMhW0/WGGM8ZOvJGmOMd8LWk/XGMX1S\nueCMLEIhmP1VEZ9+WVyr/bYRLcnOcgribVsGyFtbzvMz8gFoluzj0TFtmPGv3eQuL4t67NHUYshA\n+kwcz/yfXR3rULzn89Ht3vtI79WLcHk5KydOoGz9+qrmDldcQeszz4RQmA1Tp5I/+zMCzZvT87HH\nCGRkUFlQwKqJE6nMz4/hi/CAz0eLS24gOacL4coK8me8QnD7FgCSO3Yh+4JrqnZt1rUXO16bRNmy\nb2MV7eHzN/0U1vQj3EfAD1eem83Dz2+lrCLMI2Pa8vXSUgr2hKr22ZtQ01N9/PrGNkyfWX3m3LXn\nZROOetTR133cKDpeeR7BopJYhxIVLU89FX+zZnw36gYy+/eny513svyeewAIZGbS/tLLWHTRhfjT\n0hj49nTyZ39Gx2uvpXDRt2ycNpXmxx9P55tvYdXECTF+JY0rdcDxkJTMtmcfJrlLL7LPv4qdr00C\noGLDGrY//7iz39EnkrY7P74SLPHRk236BY195LRLYsuOSopLwwSDsHx1OdK12QH3vXh4c/41r4hd\nhU4CPntYJivWlrN2U0U0Q46J4lVrWXjJ7bEOI2qaDzqaXfPnAbBnyRIy+/StaguVlFC2eTP+tDQC\naWmEw87HbFq3buyaNxeAwtxcsgYNin7gHkvpLpQtdRJnxZoVNDuyx377+Jql0PysSyj469QoR9cI\nGrZ2QVQcsicrImcerE1VP2z8cOqWluKnpLS611pSFiI9df83sHmGn349Unj7/5xebL8eKbRvncQb\n7++id5cDJ+VEsvm9D0nr0jHWYURNICOD4J49VdvhUMiZQxkMAlC+ZQuDZryLz+9nw7RpABQvX07L\nYT+hePlyWg0bhj81NSaxe8mXmk6otLqcFg6HnMGiUPXfUPqJp1OyaD6hosJYhNgwcdCTratcMPIg\n94eBqCbZS4Zn0btrCp3bJ7NyXXnV/WkpfopL9++ZDhmQxtxFxUQ6LZw2OJ02LQI8dGMbctom0TWn\nGQWF+az5EfRqfwyCRUX40zOq7/D7qhJsi5NPJrlNa7654HwA+k5+jsLcb9kwbRpdx42j75+eZ9f8\neZRv2RKL0D0VLi3Gn1L94eHz+WolWID044ay881noh1a44j32QWqel20AqnL/3xUCBQS8MOTY48g\nI81HaXmYPt2aMXPO/p/A/Xuk8P6n1fe/8G71gMZNF7dgXm6JJdgEUpj7LS2HDmPnx/+PzP79Kclb\nWdVWWVhIqKyMcLnz4Vy5p5CkrCyaH3MM2//xD3YvXEir00+nMDc3VuF7puwHJbXfcZQsmk9yl15U\nbFxbq92XmgZJyQR37YhRhA0TDzVZVwNfIrIJp/fqw1klfJWq9j30o7wRDMH0mQXcd30b/D6Y/VUx\n+btDdGyXxPCTMpj6gVMe6NA2ia07K+s4mkkUOz/7jOwhJ9Bvymvg87Hyt4/TfuTllK1fR/6cORQd\nfzz9X3+DcDhM4beLKFiwgJROnej5yKMAlG/bxqoJT8T2RXigNPdLUmUgbe56HB8+8t95iczTzqFy\n+2ZKlywkqV0OwZ3bYh1mvYXjYHaBb+8ggFsi0gV49GC93Cse2PBjGLyv0+WTfhrrEJqM1sdmxzqE\nJuPIE7vFOoQmpePkdxvUFd2z4O+u803mCb+MSbf3sAsaqroG6ONBLMYYc1jCPr/rW6y4LRf8Gaqm\nl3YAEm+EwBgTf+K9JisiP1HVz4FpwN5Z7aXAV14HZowxdUqAVbgmicjpOJfLHY4z8AUQAIJeBmaM\nMXUJ++N/PdmPgEVAJ0CpTrJhoLuHcRljTJ3CxHm5QFUfAh4SkYdV9bdRiskYY1yJ5YCWW24nmb0p\nItOBtsBfgFxVXeBdWMYY40IcJFm3Eb4CvAE0Az4HJnsWkTHGuBT2+VzfYsVtkk1V1U+AsKoqzgwD\nY4yJqYSZJwuUicjPgYCInIglWWNME5AIswv2Gg1MAtoA44GbPYvIGGNcivvZBXup6noRuQJnCtdJ\nwAZPozLGGBcSZnaBiPwBWAV0AY7FOa32mkM+yBhjvBYHp9W6/RgYqqqvACep6i9wTk4wxpiYCuN3\nfYsVtzXZgIgMAVaLSDOc+bLGGBNTCbNoN84CMX8CrgP+ADzrWUTGGONSyJc4swvujXz9P5zBr9Nx\nTk4wxpiYSZiBL6oX6fYBxwEXexOOMca4lzDlAlUtq7H5hYj8zqN4jDHGtYSZJxtJqjWvjBA6xO7G\nGBMViVQuWFbj+2+BWR7EYowxhyVhBr5UdZrXgRhjzOFKmHKBMcY0RYlULnBtzZK8xj5kXGp9bHas\nQ2gydnxdEOsQmoy0lptiHUKT0rGBj29IT1ZE/MCLwCCgDBilqnk12m8EbgIqgSdU9f/q8zxN/2PA\nGGMOooGLdp+Ps1b2STgXi31qb4OItAfuAE4Bfg78TkRS6hOjJVljTNwKh32ubwcwlMggvqrOBwbX\naBsCfKGqZapaAOQBA+sToyVZY0zcChFwfTuA5kDNWlZQRJIO0lYI1KsGaANfxpi41cDZBbuBrBrb\nflWtPEhbFrCrPk9iPVljTNwK43N9O4AvgLMBIpfVWlyj7T/AMBFJFZFsoC+wpD4xWk/WGBO3GtiT\nfQ8YLiJzcdZluU5E7gbyVPV/ReQ5YA5OZ/QhVa3XtQ0tyRpj4tZBBrRcUdUQMGafu5fVaJ8CTKn3\nE0RYkjXGxC0748sYYzwUioNhJUuyxpi41ZByQbRYkjXGxK2QlQuMMcY7VpM1xhgPWbnAGGM8FArb\nwJcxxnjGygXGGOMhKxcYY4yH4uGKrpZkjTFxy3qyxhjjIavJGmOMh2x2gTHGeCgUjnUEdbMka4yJ\nW1YuMMYYD9nAl0dOOb41147sTDAIMz/axN8/3FyrvXePTMbf0ouKijArVu1h8pQ8wpF/K1JS/Lz8\n5DG8PG0VC77Oj0H0jcjno9u995Heqxfh8nJWTpxA2fr1Vc0drriC1meeCaEwG6ZOJX/2ZwSaN6fn\nY48RyMigsqCAVRMnUpkf5++DCy2GDKTPxPHM/9nVsQ7Fez4fHW+7m7TuPQhXVLDumScp37Shqrnt\nRSNocdoZEA6xZcZ0ds+dU9WW0qkzvSa/zHcjzidcUR6L6A9LOA7KBU2/aryPQMDH7aN6cPfDi7nt\ngUWc94sOtGqRXGufe2/tzXNTVnLr/YsoKq5k+KntqtruHtOLcDz8ZFxoeeqp+Js147tRN7D2xRfo\ncuedVW2BzEzaX3oZ391wA0vvuJ2uY8cC0PHaaylc9C3fjx7N5v/+bzrffEuswo+a7uNGMeCVJ/Cn\npsQ6lKjIPnkY/mbNyBt7C5veeIWc0bdWtfkzMmnzq4vIG3szqx4cR8ebbq9uS08nZ/SthCoqYhF2\nvYTwub7FSp1JVkTGiUjbaATjRtcj09mwqYTCokoqK8Pkfr+bQf1qX6m3bZsUlizbDcDipbsZeJTT\nPvKCTixZWkDe6qKox+2F5oOOZtf8eQDsWbKEzD59q9pCJSWUbd6MPy2NQFpa1QdLWrdu7Jo3F4DC\n3FyyBg2KfuBRVrxqLQsvub3uHRNERr8BFH61AIDiZd+T3kuq2kKlJZRv3Yw/NRV/alqtDkenO+5h\n05uvEi6r16WsYiIU8rm+xYqbckER8L6IbAJeB2apasy6ghnpAfYUV1ZtF5cEycio/TI2bi7h6P7Z\nLFpSwClDWpGaGuC4gS3olJPGH19YwYCj6nX59CYnkJFBcM+equ1wKASBAASDAJRv2cKgGe/i8/vZ\nMG0aAMXLl9Ny2E8oXr6cVsOG4U9NjUns0bT5vQ9J69Ix1mFEjT89g2BRdUciHAqBPwAh5/eiYttW\n5NW3we9n67vTATjiyuso/HIepT+sjEnM9ZUQA1+q+jLwsoj0Ax4CXhGRN4BnVbVe1yGvjxuv7MrA\no7Lp0TWD75cXVt2fnhZgz57KWvtOnKzcdWNPLr8wzLIVhZRXhDn3zPYc0TaVP00cROdO6fTunsmO\n/GXk/RC/vdpgURH+9IzqO/y+qgTb4uSTSW7Tmm8uOB+AvpOfozD3WzZMm0bXcePo+6fn2TV/HuVb\ntsQidOOhUHER/rT06jt8vqoE2/z4E0lu1Zql11wGQPcJkyj6bgktfzqciu3baPXzc0hq2YruE59i\n5T1Nv/efEFO4RKQFMAK4GtgF3Bl53AfAqZ5GV8OU6asBpyY7/cXBZGUmUVIa5Oh+2fz5b+tq7Xvy\n4NZMfE7ZsbOcu0b3ZP7CncxfuLOq/cG7hI8/3xrXCRagMPdbWg4dxs6P/x+Z/ftTklfdC6ksLCRU\nVka43Bm8qNxTSFJWFs2POYbt//gHuxcupNXpp1OYmxur8I1Hir5bQvMTT6Zgzqek9zmK0tWrqtqC\nhYWEysqrBrWCRXsIZGay7PrLq/bpO+1dVj04Lupx10eizC74EpgOXKaqVdlMRI72LKpDCAbDPP/a\nKp5+fAB+n4+ZH21m+85yuh6ZzkXn5vDUS3ms31jCpEcGUFoW5JvFu2ol2ESy87PPyB5yAv2mvAY+\nHyt/+zjtR15O2fp15M+ZQ9Hxx9P/9TcIh8MUfruIggULSOnUiZ6PPApA+bZtrJrwRGxfhGl0BXM/\nJ/PYwfR8+kXwwbqnfk+bCy+lfOMGds//guLlS+n57MsQDlH03WL2fP1lrEOut3gYw/bVNdIuIr7D\nqcEO/eXsOHjZ3pu09Z5Yh9Bk7Pi6INYhNBmdzjgi1iE0KYNmfd6grujfF1a6zje/PC4pJt1eNz3Z\n+0XkPqAY8AFhVc3xNixjjKlbopQLLgNyVLXY62CMMeZwJMTAF7AaKPE4DmOMOWzxUJN1k2SbAYtF\nZHFkO6yqlx/qAcYYEw0JMU8W+IPnURhjTD0kSrnga+AsIPFPDTLGxJVQHFzky02S/QDYCOydIxsH\nnx3GmB+DUILMLvCr6pWeR2KMMYcpUQa+ckXkBGARkV6sqjb9hSaNMQkvUZLsqcAva2yHge7ehGOM\nMe4lxMCXqib+gqPGmLiUEGd8icin7DPYpao/9SwiY4xxKZggswvGRL76gOMA69kaY5qEhKjJqqrW\n2FwmItd7GI8xxrjW2ElWRNJwlnZtBxQC16jqtgPslw7MBe5X1VmHOuZBk6yIZKtqgYiMrnF3DpBV\nn+CNMaaxeTDwdTOwWFUfFZERwK9xLlSwrxdwec7AoS6k+PfI12OB9pFbMXCJ63CNMcZD4bD7m0tD\ngb09038CP9t3BxEZj9OL/dbNAQ9VLigRkS+BXsDSGvefD5zs5uDGGOOlyCXt6kVEbgDG7nP3FmDv\nKvOFQPY+jzkD6KWqN4nIKW6e51BJ9iyc8sArwC1uDmaMMdHUkJqsqr6OcwXuKiLyN6pLolk41zWs\n6Qagi4h8BvQBjhWRzaq66GDPc9Akq6ohYD1wzmFHb4wxUeBBTfYL4GzgPzgdzTk1G2su8yoiU4EZ\nh0qw4G4KlzHGNEl1XaOwNlcnLrwETBORfwPlwOUAIvIk8BdV/c/hxmhJ1hgTtxp7ClfkMlv7De6r\n6r0HuO9aN8e0JGuMiVuJsp7sYclq3bKxDxmXjuzeLdYhNBlpLTfFOoQmY/3HW2IdQpPS0NNHE+W0\nWmOMaZIS4rRaY4xpqsKHNb0gNit2WZI1xsSthFhP1hhjmiorFxhjjIeCwaafZS3JGmPilvVkjTHG\nQ6E4yLKWZI0xcSts82SNMcY7h7d2QWxYkjXGxK0f5Wm1xhgTLcE4mChrSdYYE7cO74yv2LAka4yJ\nW3FQkrUka4yJXyHryRpjjHdsdoExxnjI5skaY4yHgnEwh8uSrDEmbllN1hhjPBQHJVlLssaY+GXz\nZI0xxkO2CpcxxngoVGkDX8YY45k4qBbEZ5I94egsLj/vCIKhMB/OyWfW7J212rt3TuX2azoSDMKG\nzWU8++Z6wmEYPCCLK85vB0De6hJeeHtjLML3hs9Hi0tuIDmnC+HKCvJnvEJw+xYAkjt2IfuCa6p2\nbda1Fztem0TZsm9jFW3j8/noeNvdpHXvQbiignXPPEn5pg1VzW0vGkGL086AcIgtM6aze+6cqraU\nTp3pNfllvhtxPuGK8lhEH1Uthgykz8TxzP/Z1bEOpcGsJuuBQABGj8zhzsfyKC0L8dRDPViwaDf5\nBZVV+1zxqyN454OtfJlbyL03HcmQQVnkLivihss6cN/vV7J7T5CLz2pLdlaAgsJgDF9N40kdcDwk\nJbPt2YdJ7tKL7POvYudrkwCo2LCG7c8/7ux39Imk7c5PrAQLZJ88DH+zZuSNvYX0PkeRM/pWVj/2\nIAD+jEza/Ooill0/En9qKr1feKMqyfrT08kZfSuhiopYhh813ceNouOV5xEsKol1KI0iHs748te1\ng4icIyIzReSTvbdoBHYwR3ZIZePWcvYUB6kMhvluRRH9emfU2mflmhIyMwIApKUGqAyG6dszndXr\nS7lxRAf++EAPdu2uTJgEC5DSXShb6iTOijUraHZkj/328TVLoflZl1Dw16lRjs57Gf0GUPjVAgCK\nl31Pei+paguVllC+dTP+1FT8qWm1/jA73XEPm958lXBZadRjjoXiVWtZeMntsQ6j0YRCYde3WHHT\nk/0tMBbY7HEsrmSk+Skqrk6OJaUhMtICtfbZsKWcW6/KYeR57SgqDpG7rIihg7MZ2DeD236zgpLS\nEJMe7MHSvCI2bEmMfw99qemESourtsPhEPj9tVY1Tj/xdEoWzSdUVBiLED3lT88gWFRUtR0OhcAf\ngJDzu1KxbSvy6tvg97P13ekAHHHldRR+OY/SH1bGJOZY2Pzeh6R16RjrMBpNPPRk3STZnao62/NI\n6nD1hUfQr3cG3TqlsmxVdTJJS62ddAHGXJHD+IkrWbuxjHPPaM2NIzqwYNFuVvxQUlVWWKxFdO+c\nljBJNlxajD8ltWrb5/Ptt2x8+nFD2fnmM9EOLSpCxUX409Kr7/D5qhJs8+NPJLlVa5ZecxkA3SdM\noui7JbT86XAqtm+j1c/PIallK7pPfIqV9yROL+/HIK5nF4jI6Mi35SLyKrAQCAOo6qtRiK2Wt/7m\nDOIEAvDKBCEzI0BpaYj+ksFf/7mt1r6FRZUUlzpv/s78Co7qmU7e6hK6dEyleWaAPcVB+vRIZ9bs\nHdF+GZ4p+0FJ7XccJYvmk9ylFxUb19Zq96WmQVIywV2J85prKvpuCc1PPJmCOZ+S3ucoSlevqmoL\nFhYSKiuvGtQKFu0hkJnJsusvr9qn77R3WfXguKjHbRom3ufJdoh8XRD52j7yNaavKhiEKTM2MmFc\nN3x++HBOPjt2VdI5J4VfntHZDrb6AAAJZklEQVSaF97eyOQ31nP/zZ0JBcNUVIaZPHU9BYVBpv5l\nE0+M7wbAnP8UsGZDWSxfSqMqzf2SVBlIm7sex4eP/HdeIvO0c6jcvpnSJQtJapdDcOe2ug8Upwrm\nfk7msYPp+fSL4IN1T/2eNhdeSvnGDeye/wXFy5fS89mXIRyi6LvF7Pn6y1iHbBpBPMwu8NVV0xCR\nX6vqEzW2f6eqDxxs/7OuzW36rzoKXsueEOsQmoztuinWITQZ6z/eEusQmpRzKtTXkMdf/fAm1/nm\nrd92aNBz1dehygU3AKOAviJyduTuAJAMHDTJGmNMtMT7KlzTgY+BB4G93bIQsNXroIwxxo14KBcc\nNMmqahmwWkS+AE6t0VQhIutU9d+eR2eMMYcQCjb9ue5upnBdBmQAc4EhQCpQKSJfq+pYL4MzxphD\nifdywV7JwOmqGhIRP/APVf2FiMz1ODZjjDmkxj4ZQUTScEql7YBC4BpV3bbPPk8DQ3HKp+NU9YtD\nHbPO02qB1jiJlsjXVpHvU9yHbowxjS8cCru+uXQzsFhVhwFvAb+u2Sgig4CTgROAq4Dn6jqgmyT7\nApArIn8DvgFeFJEHgVluozbGGC94kGSHUp3b/gn8bJ/2DUAxTiezOVDnykJ1lgtU9XUReR/oCeSp\n6g4RCahq0684G2MSWrABA1+Raar7jittAQoi3xcC2fu0V+KUCZZF2m6s63nqTLIicjQwGmfACxFB\nVa+v63HGGOO1hkzhUtXXgddr3hf5jz0rspkF7NrnYVfjLJb180j7v0Vknqpu4CDcDHxNBZ4H1rmK\n3BhjosSDVbi+AM4G/gOcBczZpz0f2KOqQREpBMqAzEMd0E2S3ayqr9UjWGOM8VQo1OircL0ETBOR\nfwPlwOUAIvIk8BfgHeCUyOyqAPBfqqqHOqCbJLtaRO7HGfTauwrXh/V+CcYY00ga+4wvVS0GLjnA\n/ffW2BxzOMd0k2RTAIncwEm0lmSNMTEXDsfxerJ7qep1ItIb6AEsBhLo6oPGmHgW14t27yUitwEX\n4JyEMBXoBdzmbVjGGFO3UBz0ZN2cjDACZ0LuLlWdjHOmgzHGxJwHJyM0Ojc12b2JeG+UiXM5AWNM\nXAs3/uyCRucmyb4DfA50EZF/AO97G5IxxrgT1+vJ7qWqz4vIx0B/Z1NzvQ/LGGPqFtezC0Tkd+x/\n0cRjRGSEqj7obVjGGFO3YGXTX0LlUD3ZZVGLwhhj6iGuywWqOi2agRhjzOGK63KBMcY0dfHQk/XV\ntYqNiCSpamWN7Raquu/yX8YYYw7gUANf7XFW/n5LRK4CfDhzZt/CuaCiMcaYOhyqXHAicCfOwjCv\n4CTZEPCvKMRljDEJwU254DxV/d8a21mqWuh5ZMYYkwDcrF0wTkQ6AIjICcA8b0Pyjoh0FZH5sY4j\nVkRkgoh8JSKnHaR9qoj8Isph1ZuIBETkXyLybxFpGet4GpuIpIrIqHo87gIRydnnvmtF5PeNHZeI\nPCoih7W+6o+Nm9kFjwH/EJHZwGDgYm9DMh66DDgmgf4T6QC0UdXjYh2IR9oDo4DDvTLJnTgLS3u1\nLGl94/pRcpNkvwO2AsNx6rErPY2oDiKSBrwJdAGSca42ORpnvdsA8LSqvisixwB/AoJAKS6uKtnU\nici1QB9VvV9EUnFOGHkSuAanXv5vVb1HRI4EXsW5+GUpzvtzHdAJmBk5m+8aVR0ROe5mVW0f9RfU\ncK8CvUTkFeAbVX1ZRPoAL6vqaSKSC8wGBuKcvfgr4BjgPpxLi3QD3gV+BywHhqjqThG5GchU1T9G\n/yXV8hBwlIj8BpiMc9G/1pG2O3Au8vcJ8BOgL06HaBJwNM6A9VBVLd/3oCJyO85lVcLADFV9TkSm\n4iz+1BXnw+taVf06ckXX24CdOO/Zu8ApNeIC+JWIXBKJ7WFV/Xujvgtxzk25YA7woqr2w/lkjHW5\nYAywWlVPAq4FTgW2q+rJOEsyPiEibYApwG2qeirwIvB0jOL12nXAnZH3Y5WIJOH8oT2nqqdHvv+9\nqj6Oc5XNM4GSmEXbuG4Bvgc2HaS9OfDnyO/ABpwL44HzAX0RcBJwr6qGgP/CWdYT4CqcWTSxNgH4\nPvKzexD4OPIzHQ28pKrrgHuBacAzwEhV/QBYBFx9kAR7FM5/NEMjt/NFZO9VT9ao6s9xOiejI39H\n9+Ek1TOBjAPEBbBBVc8A7gJubtR3IAG4SbI/jfzgUNVJwE3ehlQnIZLoVXUJzqfu55HtQpw/uh5A\njqouijzmc6Bf9EP1lC/y9TpgTKSc0yVy/wDgQRH5DPgN0M7lsRLBvq/lm8jXdUQuaw8sVtVKVS2i\n+gPndeAqEemPc/HQLd6HelgGANdHfqZTgL016Pdx/kOZrarrXRynP87vycc4veDWQM9I277vVU+c\nZFqsqkFg7kGOuTDydTOQ7vYF/Vi4SbLZIjJHRBZHLqjYweug6rAUOB5ARLoDI4Fhke0snF/GH4CN\nIjIw8phTcf4djHelVL//x0a+3giMifTWjgFOxikj3Keqp+F8KP7lYMcRkS44V72IZwd6X/Y60PSZ\n/e5T1bU4/34/hJNwm4IQ1X+jy4BnIj/TS3F63gDjcK65N1hETjzA4/alOCXA0yPHmopzWSnY/33J\nA/qISJqI+KmeH7/v8Zv+aVcx5CbJPofTW9qO88v3qJcBufAK0D3Sc3sL+AXQOnIJ38+Ax1R1K07y\neV5E5uAMBIyNUbyNaRbQNfJaLwV24/yBfCkin+DUzhcA44FHarxH+y5P+RWwS0QW4NTxfohS/F55\nFzhbRD7F+aCpryk4H9izGiWqhtsKNBORP+D8i35ppCc7C1giIoNxaqv3ATcAb4hINk6P8y0R2e/D\nU1W/xenF/ltEvsK5nNSGAz25qm4H/oBTMpwFpAEV+8Rl6uBmnuzHqnqGiHyiqj8VkU8jdSFjEoqI\nXAr0V9Xf1Lnzj0Ckvn+fqk6IbH8O/FpVP49tZPHFzeyCnSJyE5AhIiNw/qUyJqGIyEScXuyvYh1L\nU6GqlSKSISJf48wsWIDTqzWHwU1PtjnOyOYAnHroBFXNj0JsxhgT99z0ZO9Q1fv3bkTmWD7gXUjG\nGJM4DtqTjUxCHoUzyfn7yN1+oJmq7juCa4wx5gAO1ZOdjjMK+SDOyCY4Uze2eh2UMcYkijprssYY\nY+rPzTxZY4wx9WRJ1hhjPGRJ1hhjPGRJ1hhjPGRJ1hhjPPT/AWIgj6lnLg09AAAAAElFTkSuQmCC\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x1b20b4d9a20>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"sns.heatmap(stars.corr(),cmap='coolwarm',annot=True)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## NLP Classification Task\n", | |
"\n", | |
"**A dataframe called yelp_class that contains the columns of yelp dataframe but for only the 1 or 5 star reviews.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"yelp_class = yelp[(yelp.stars==1) | (yelp.stars==5)]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"** Create two objects X and y. X will be the 'text' column of yelp_class and y will be the 'stars' column of yelp_class. (Your features and target/labels)**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"X = yelp_class['text']\n", | |
"y = yelp_class['stars']" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Import CountVectorizer and create a CountVectorizer object.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.feature_extraction.text import CountVectorizer\n", | |
"cv = CountVectorizer()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"** Use the fit_transform method on the CountVectorizer object and pass in X (the 'text' column). Save this result by overwriting X.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"X = cv.fit_transform(X)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<4086x19183 sparse matrix of type '<class 'numpy.int64'>'\n", | |
"\twith 317288 stored elements in Compressed Sparse Row format>" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"X" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Train Test Split\n", | |
"\n", | |
"Let's split our data into training and testing data." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.model_selection import train_test_split" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3,random_state=101)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Training a Model\n", | |
"\n", | |
"Time to train a model!\n", | |
"\n", | |
"** Import MultinomialNB and create an instance of the estimator and call is nb **" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.naive_bayes import MultinomialNB\n", | |
"nb = MultinomialNB()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Now fit nb using the training data.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"nb.fit(X_train,y_train)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Predictions and Evaluations\n", | |
"\n", | |
"Time to see how our model did!\n", | |
"\n", | |
"**Use the predict method off of nb to predict labels from X_test.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"predictions = nb.predict(X_test)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"** Create a confusion matrix and classification report using these predictions and y_test **" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.metrics import confusion_matrix,classification_report" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[[159 69]\n", | |
" [ 22 976]]\n", | |
"\n", | |
"\n", | |
" precision recall f1-score support\n", | |
"\n", | |
" 1 0.88 0.70 0.78 228\n", | |
" 5 0.93 0.98 0.96 998\n", | |
"\n", | |
"avg / total 0.92 0.93 0.92 1226\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(confusion_matrix(y_test,predictions))\n", | |
"print('\\n')\n", | |
"print(classification_report(y_test,predictions))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Great! Let's see what happens if we try to include TF-IDF to this process using a pipeline.**" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Using Text Processing\n", | |
"\n", | |
"** Import TfidfTransformer from sklearn. **" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.feature_extraction.text import TfidfTransformer" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"** Import Pipeline from sklearn. **" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.pipeline import Pipeline" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"** Now create a pipeline with the following steps:CountVectorizer(), TfidfTransformer(),MultinomialNB()**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"pipeline = Pipeline([\n", | |
" ('bow', CountVectorizer()), # strings to token integer counts\n", | |
" ('tfidf', TfidfTransformer()), # integer counts to weighted TF-IDF scores\n", | |
" ('classifier', MultinomialNB()), # train on TF-IDF vectors w/ Naive Bayes classifier\n", | |
"])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Using the Pipeline" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Train Test Split\n", | |
"\n", | |
"**Redo the train test split on the yelp_class object.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"X = yelp_class['text']\n", | |
"y = yelp_class['stars']\n", | |
"X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3,random_state=101)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"**Fit the pipeline to the training data **" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Pipeline(memory=None,\n", | |
" steps=[('bow', CountVectorizer(analyzer='word', binary=False, decode_error='strict',\n", | |
" dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',\n", | |
" lowercase=True, max_df=1.0, max_features=None, min_df=1,\n", | |
" ngram_range=(1, 1), preprocessor=None, stop_words=None,\n", | |
" strip_...f=False, use_idf=True)), ('classifier', MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))])" | |
] | |
}, | |
"execution_count": 31, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"pipeline.fit(X_train,y_train)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Predictions and Evaluation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"predictions = pipeline.predict(X_test)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[[ 0 228]\n", | |
" [ 0 998]]\n", | |
" precision recall f1-score support\n", | |
"\n", | |
" 1 0.00 0.00 0.00 228\n", | |
" 5 0.81 1.00 0.90 998\n", | |
"\n", | |
"avg / total 0.66 0.81 0.73 1226\n", | |
"\n" | |
] | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Miniconda3\\lib\\site-packages\\sklearn\\metrics\\classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n", | |
" 'precision', 'predicted', average, warn_for)\n" | |
] | |
} | |
], | |
"source": [ | |
"print(confusion_matrix(y_test,predictions))\n", | |
"print(classification_report(y_test,predictions))" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 1 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment