Skip to content

Instantly share code, notes, and snippets.

@MarwenAouiti
Created March 29, 2019 07:24
Show Gist options
  • Save MarwenAouiti/7882caaf1642f7322f21ddf4dc1359de to your computer and use it in GitHub Desktop.
Save MarwenAouiti/7882caaf1642f7322f21ddf4dc1359de to your computer and use it in GitHub Desktop.
Created on Cognitive Class Labs
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
" <a href=\"http://cocl.us/DA0101EN_NotbookLink_Top\">\n",
" <img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/Images/TopAd.png\" width=\"750\" align=\"center\">\n",
" </a>\n",
"</div>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://www.bigdatauniversity.com\"><img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/Images/CCLog.png\" width=300, align=\"center\"></a>\n",
"\n",
"<h1 align=center><font size = 5>Data Analysis with Python</font></h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1>Introduction</h1>\n",
"<h3>Welcome!</h3>\n",
"\n",
"<p>\n",
"In this section, you will learn how to approach data acquisition in various ways, and obtain necessary insights from a dataset. By the end of this lab, you will successfully load the data into Jupyter Notebook, and gain some fundamental insights via Pandas Library.\n",
"</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Table of Contents</h2>\n",
"\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
"<ol>\n",
" <li><a href=\"#data_acquisition\">Data Acquisition</a>\n",
" <li><a href=\"#basic_insight\">Basic Insight of Dataset</a></li>\n",
"</ol>\n",
"\n",
"Estimated Time Needed: <strong>10 min</strong>\n",
"</div>\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1 id=\"data_acquisition\">Data Acquisition</h1>\n",
"<p>\n",
"There are various formats for a dataset, .csv, .json, .xlsx etc. The dataset can be stored in different places, on your local machine or sometimes online.<br>\n",
"In this section, you will learn how to load a dataset into our Jupyter Notebook.<br>\n",
"In our case, the Automobile Dataset is an online source, and it is in CSV (comma separated value) format. Let's use this dataset as an example to practice data reading.\n",
"<ul>\n",
" <li>data source: <a href=\"https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data\" target=\"_blank\">https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data</a></li>\n",
" <li>data type: csv</li>\n",
"</ul>\n",
"The Pandas Library is a useful tool that enables us to read various datasets into a data frame; our Jupyter notebook platforms have a built-in <b>Pandas Library</b> so that all we need to do is import Pandas without installing.\n",
"</p>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# import pandas library\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Read Data</h2>\n",
"<p>\n",
"We use <code>pandas.read_csv()</code> function to read the csv file. In the bracket, we put the file path along with a quotation mark, so that pandas will read the file into a data frame from that address. The file path can be either an URL or your local file address.<br>\n",
"Because the data does not include headers, we can add an argument <code>headers = None</code> inside the <code>read_csv()</code> method, so that pandas will not automatically set the first row as a header.<br>\n",
"You can also assign the dataset to any variable you create.\n",
"</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dataset was hosted on IBM Cloud object click <a href=\"https://cocl.us/cognitive_class_DA0101EN_objectstorage\">HERE</a> for free storage."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Import pandas library\n",
"import pandas as pd\n",
"\n",
"# Read the online file by the URL provides above, and assign it to variable \"df\"\n",
"other_path = \"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/auto.csv\"\n",
"df = pd.read_csv(other_path, header=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After reading the dataset, we can use the <code>dataframe.head(n)</code> method to check the top n rows of the dataframe; where n is an integer. Contrary to <code>dataframe.head(n)</code>, <code>dataframe.tail(n)</code> will show you the bottom n rows of the dataframe.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The first 5 rows of the dataframe\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>16</th>\n",
" <th>17</th>\n",
" <th>18</th>\n",
" <th>19</th>\n",
" <th>20</th>\n",
" <th>21</th>\n",
" <th>22</th>\n",
" <th>23</th>\n",
" <th>24</th>\n",
" <th>25</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>88.6</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.47</td>\n",
" <td>2.68</td>\n",
" <td>9.0</td>\n",
" <td>111</td>\n",
" <td>5000</td>\n",
" <td>21</td>\n",
" <td>27</td>\n",
" <td>13495</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>88.6</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.47</td>\n",
" <td>2.68</td>\n",
" <td>9.0</td>\n",
" <td>111</td>\n",
" <td>5000</td>\n",
" <td>21</td>\n",
" <td>27</td>\n",
" <td>16500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>94.5</td>\n",
" <td>...</td>\n",
" <td>152</td>\n",
" <td>mpfi</td>\n",
" <td>2.68</td>\n",
" <td>3.47</td>\n",
" <td>9.0</td>\n",
" <td>154</td>\n",
" <td>5000</td>\n",
" <td>19</td>\n",
" <td>26</td>\n",
" <td>16500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>164</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>99.8</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>10.0</td>\n",
" <td>102</td>\n",
" <td>5500</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>13950</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>164</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>4wd</td>\n",
" <td>front</td>\n",
" <td>99.4</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.0</td>\n",
" <td>115</td>\n",
" <td>5500</td>\n",
" <td>18</td>\n",
" <td>22</td>\n",
" <td>17450</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4 5 6 7 8 9 ... \\\n",
"0 3 ? alfa-romero gas std two convertible rwd front 88.6 ... \n",
"1 3 ? alfa-romero gas std two convertible rwd front 88.6 ... \n",
"2 1 ? alfa-romero gas std two hatchback rwd front 94.5 ... \n",
"3 2 164 audi gas std four sedan fwd front 99.8 ... \n",
"4 2 164 audi gas std four sedan 4wd front 99.4 ... \n",
"\n",
" 16 17 18 19 20 21 22 23 24 25 \n",
"0 130 mpfi 3.47 2.68 9.0 111 5000 21 27 13495 \n",
"1 130 mpfi 3.47 2.68 9.0 111 5000 21 27 16500 \n",
"2 152 mpfi 2.68 3.47 9.0 154 5000 19 26 16500 \n",
"3 109 mpfi 3.19 3.40 10.0 102 5500 24 30 13950 \n",
"4 136 mpfi 3.19 3.40 8.0 115 5500 18 22 17450 \n",
"\n",
"[5 rows x 26 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show the first 5 rows using dataframe.head() method\n",
"print(\"The first 5 rows of the dataframe\") \n",
"df.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-danger alertdanger\" style=\"margin-top: 20px\">\n",
"<h1> Question #1: </h1>\n",
"<b>check the bottom 10 rows of data frame \"df\".</b>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>...</th>\n",
" <th>16</th>\n",
" <th>17</th>\n",
" <th>18</th>\n",
" <th>19</th>\n",
" <th>20</th>\n",
" <th>21</th>\n",
" <th>22</th>\n",
" <th>23</th>\n",
" <th>24</th>\n",
" <th>25</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>195</th>\n",
" <td>-1</td>\n",
" <td>74</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.5</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>13415</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>-2</td>\n",
" <td>103</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.5</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>15985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>197</th>\n",
" <td>-1</td>\n",
" <td>74</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.5</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>16515</td>\n",
" </tr>\n",
" <tr>\n",
" <th>198</th>\n",
" <td>-2</td>\n",
" <td>103</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.62</td>\n",
" <td>3.15</td>\n",
" <td>7.5</td>\n",
" <td>162</td>\n",
" <td>5100</td>\n",
" <td>17</td>\n",
" <td>22</td>\n",
" <td>18420</td>\n",
" </tr>\n",
" <tr>\n",
" <th>199</th>\n",
" <td>-1</td>\n",
" <td>74</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.62</td>\n",
" <td>3.15</td>\n",
" <td>7.5</td>\n",
" <td>162</td>\n",
" <td>5100</td>\n",
" <td>17</td>\n",
" <td>22</td>\n",
" <td>18950</td>\n",
" </tr>\n",
" <tr>\n",
" <th>200</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.5</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>16845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>201</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>8.7</td>\n",
" <td>160</td>\n",
" <td>5300</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>19045</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>173</td>\n",
" <td>mpfi</td>\n",
" <td>3.58</td>\n",
" <td>2.87</td>\n",
" <td>8.8</td>\n",
" <td>134</td>\n",
" <td>5500</td>\n",
" <td>18</td>\n",
" <td>23</td>\n",
" <td>21485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>203</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>diesel</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>145</td>\n",
" <td>idi</td>\n",
" <td>3.01</td>\n",
" <td>3.40</td>\n",
" <td>23.0</td>\n",
" <td>106</td>\n",
" <td>4800</td>\n",
" <td>26</td>\n",
" <td>27</td>\n",
" <td>22470</td>\n",
" </tr>\n",
" <tr>\n",
" <th>204</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.5</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>22625</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4 5 6 7 8 9 ... 16 \\\n",
"195 -1 74 volvo gas std four wagon rwd front 104.3 ... 141 \n",
"196 -2 103 volvo gas std four sedan rwd front 104.3 ... 141 \n",
"197 -1 74 volvo gas std four wagon rwd front 104.3 ... 141 \n",
"198 -2 103 volvo gas turbo four sedan rwd front 104.3 ... 130 \n",
"199 -1 74 volvo gas turbo four wagon rwd front 104.3 ... 130 \n",
"200 -1 95 volvo gas std four sedan rwd front 109.1 ... 141 \n",
"201 -1 95 volvo gas turbo four sedan rwd front 109.1 ... 141 \n",
"202 -1 95 volvo gas std four sedan rwd front 109.1 ... 173 \n",
"203 -1 95 volvo diesel turbo four sedan rwd front 109.1 ... 145 \n",
"204 -1 95 volvo gas turbo four sedan rwd front 109.1 ... 141 \n",
"\n",
" 17 18 19 20 21 22 23 24 25 \n",
"195 mpfi 3.78 3.15 9.5 114 5400 23 28 13415 \n",
"196 mpfi 3.78 3.15 9.5 114 5400 24 28 15985 \n",
"197 mpfi 3.78 3.15 9.5 114 5400 24 28 16515 \n",
"198 mpfi 3.62 3.15 7.5 162 5100 17 22 18420 \n",
"199 mpfi 3.62 3.15 7.5 162 5100 17 22 18950 \n",
"200 mpfi 3.78 3.15 9.5 114 5400 23 28 16845 \n",
"201 mpfi 3.78 3.15 8.7 160 5300 19 25 19045 \n",
"202 mpfi 3.58 2.87 8.8 134 5500 18 23 21485 \n",
"203 idi 3.01 3.40 23.0 106 4800 26 27 22470 \n",
"204 mpfi 3.78 3.15 9.5 114 5400 19 25 22625 \n",
"\n",
"[10 rows x 26 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Write your code below and press Shift+Enter to execute \n",
"df.tail(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-danger alertdanger\" style=\"margin-top: 20px\">\n",
"<h1> Question #1 Answer: </h1>\n",
"<b>Run the code below for the solution!</b>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click <b>here</b> for the solution.\n",
"\n",
"<!-- The answer is below:\n",
"\n",
"print(\"The last 10 rows of the dataframe\\n\")\n",
"df.tail(10)\n",
"\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Add Headers</h3>\n",
"<p>\n",
"Take a look at our dataset; pandas automatically set the header by an integer from 0.\n",
"</p>\n",
"<p>\n",
"To better describe our data we can introduce a header, this information is available at: <a href=\"https://archive.ics.uci.edu/ml/datasets/Automobile\" target=\"_blank\">https://archive.ics.uci.edu/ml/datasets/Automobile</a>\n",
"</p>\n",
"<p>\n",
"Thus, we have to add headers manually.\n",
"</p>\n",
"<p>\n",
"Firstly, we create a list \"headers\" that include all column names in order.\n",
"Then, we use <code>dataframe.columns = headers</code> to replace the headers by the list we created.\n",
"</p>"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"headers\n",
" ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-of-doors', 'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']\n"
]
}
],
"source": [
"# create headers list\n",
"headers = [\"symboling\",\"normalized-losses\",\"make\",\"fuel-type\",\"aspiration\", \"num-of-doors\",\"body-style\",\n",
" \"drive-wheels\",\"engine-location\",\"wheel-base\", \"length\",\"width\",\"height\",\"curb-weight\",\"engine-type\",\n",
" \"num-of-cylinders\", \"engine-size\",\"fuel-system\",\"bore\",\"stroke\",\"compression-ratio\",\"horsepower\",\n",
" \"peak-rpm\",\"city-mpg\",\"highway-mpg\",\"price\"]\n",
"print(\"headers\\n\", headers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We replace headers and recheck our data frame"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>symboling</th>\n",
" <th>normalized-losses</th>\n",
" <th>make</th>\n",
" <th>fuel-type</th>\n",
" <th>aspiration</th>\n",
" <th>num-of-doors</th>\n",
" <th>body-style</th>\n",
" <th>drive-wheels</th>\n",
" <th>engine-location</th>\n",
" <th>wheel-base</th>\n",
" <th>...</th>\n",
" <th>engine-size</th>\n",
" <th>fuel-system</th>\n",
" <th>bore</th>\n",
" <th>stroke</th>\n",
" <th>compression-ratio</th>\n",
" <th>horsepower</th>\n",
" <th>peak-rpm</th>\n",
" <th>city-mpg</th>\n",
" <th>highway-mpg</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>88.6</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.47</td>\n",
" <td>2.68</td>\n",
" <td>9.0</td>\n",
" <td>111</td>\n",
" <td>5000</td>\n",
" <td>21</td>\n",
" <td>27</td>\n",
" <td>13495</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>88.6</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.47</td>\n",
" <td>2.68</td>\n",
" <td>9.0</td>\n",
" <td>111</td>\n",
" <td>5000</td>\n",
" <td>21</td>\n",
" <td>27</td>\n",
" <td>16500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>94.5</td>\n",
" <td>...</td>\n",
" <td>152</td>\n",
" <td>mpfi</td>\n",
" <td>2.68</td>\n",
" <td>3.47</td>\n",
" <td>9.0</td>\n",
" <td>154</td>\n",
" <td>5000</td>\n",
" <td>19</td>\n",
" <td>26</td>\n",
" <td>16500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>164</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>99.8</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>10.0</td>\n",
" <td>102</td>\n",
" <td>5500</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>13950</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>164</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>4wd</td>\n",
" <td>front</td>\n",
" <td>99.4</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.0</td>\n",
" <td>115</td>\n",
" <td>5500</td>\n",
" <td>18</td>\n",
" <td>22</td>\n",
" <td>17450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2</td>\n",
" <td>?</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>99.8</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.5</td>\n",
" <td>110</td>\n",
" <td>5500</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>15250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1</td>\n",
" <td>158</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>105.8</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.5</td>\n",
" <td>110</td>\n",
" <td>5500</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>17710</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1</td>\n",
" <td>?</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>105.8</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.5</td>\n",
" <td>110</td>\n",
" <td>5500</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>18920</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>158</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>105.8</td>\n",
" <td>...</td>\n",
" <td>131</td>\n",
" <td>mpfi</td>\n",
" <td>3.13</td>\n",
" <td>3.40</td>\n",
" <td>8.3</td>\n",
" <td>140</td>\n",
" <td>5500</td>\n",
" <td>17</td>\n",
" <td>20</td>\n",
" <td>23875</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0</td>\n",
" <td>?</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>4wd</td>\n",
" <td>front</td>\n",
" <td>99.5</td>\n",
" <td>...</td>\n",
" <td>131</td>\n",
" <td>mpfi</td>\n",
" <td>3.13</td>\n",
" <td>3.40</td>\n",
" <td>7.0</td>\n",
" <td>160</td>\n",
" <td>5500</td>\n",
" <td>16</td>\n",
" <td>22</td>\n",
" <td>?</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" symboling normalized-losses make fuel-type aspiration num-of-doors \\\n",
"0 3 ? alfa-romero gas std two \n",
"1 3 ? alfa-romero gas std two \n",
"2 1 ? alfa-romero gas std two \n",
"3 2 164 audi gas std four \n",
"4 2 164 audi gas std four \n",
"5 2 ? audi gas std two \n",
"6 1 158 audi gas std four \n",
"7 1 ? audi gas std four \n",
"8 1 158 audi gas turbo four \n",
"9 0 ? audi gas turbo two \n",
"\n",
" body-style drive-wheels engine-location wheel-base ... engine-size \\\n",
"0 convertible rwd front 88.6 ... 130 \n",
"1 convertible rwd front 88.6 ... 130 \n",
"2 hatchback rwd front 94.5 ... 152 \n",
"3 sedan fwd front 99.8 ... 109 \n",
"4 sedan 4wd front 99.4 ... 136 \n",
"5 sedan fwd front 99.8 ... 136 \n",
"6 sedan fwd front 105.8 ... 136 \n",
"7 wagon fwd front 105.8 ... 136 \n",
"8 sedan fwd front 105.8 ... 131 \n",
"9 hatchback 4wd front 99.5 ... 131 \n",
"\n",
" fuel-system bore stroke compression-ratio horsepower peak-rpm city-mpg \\\n",
"0 mpfi 3.47 2.68 9.0 111 5000 21 \n",
"1 mpfi 3.47 2.68 9.0 111 5000 21 \n",
"2 mpfi 2.68 3.47 9.0 154 5000 19 \n",
"3 mpfi 3.19 3.40 10.0 102 5500 24 \n",
"4 mpfi 3.19 3.40 8.0 115 5500 18 \n",
"5 mpfi 3.19 3.40 8.5 110 5500 19 \n",
"6 mpfi 3.19 3.40 8.5 110 5500 19 \n",
"7 mpfi 3.19 3.40 8.5 110 5500 19 \n",
"8 mpfi 3.13 3.40 8.3 140 5500 17 \n",
"9 mpfi 3.13 3.40 7.0 160 5500 16 \n",
"\n",
" highway-mpg price \n",
"0 27 13495 \n",
"1 27 16500 \n",
"2 26 16500 \n",
"3 30 13950 \n",
"4 22 17450 \n",
"5 25 15250 \n",
"6 25 17710 \n",
"7 25 18920 \n",
"8 20 23875 \n",
"9 22 ? \n",
"\n",
"[10 rows x 26 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns = headers\n",
"df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"we can drop missing values along the column \"price\" as follows "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>symboling</th>\n",
" <th>normalized-losses</th>\n",
" <th>make</th>\n",
" <th>fuel-type</th>\n",
" <th>aspiration</th>\n",
" <th>num-of-doors</th>\n",
" <th>body-style</th>\n",
" <th>drive-wheels</th>\n",
" <th>engine-location</th>\n",
" <th>wheel-base</th>\n",
" <th>...</th>\n",
" <th>engine-size</th>\n",
" <th>fuel-system</th>\n",
" <th>bore</th>\n",
" <th>stroke</th>\n",
" <th>compression-ratio</th>\n",
" <th>horsepower</th>\n",
" <th>peak-rpm</th>\n",
" <th>city-mpg</th>\n",
" <th>highway-mpg</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>88.6</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.47</td>\n",
" <td>2.68</td>\n",
" <td>9.00</td>\n",
" <td>111</td>\n",
" <td>5000</td>\n",
" <td>21</td>\n",
" <td>27</td>\n",
" <td>13495</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>88.6</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.47</td>\n",
" <td>2.68</td>\n",
" <td>9.00</td>\n",
" <td>111</td>\n",
" <td>5000</td>\n",
" <td>21</td>\n",
" <td>27</td>\n",
" <td>16500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>?</td>\n",
" <td>alfa-romero</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>94.5</td>\n",
" <td>...</td>\n",
" <td>152</td>\n",
" <td>mpfi</td>\n",
" <td>2.68</td>\n",
" <td>3.47</td>\n",
" <td>9.00</td>\n",
" <td>154</td>\n",
" <td>5000</td>\n",
" <td>19</td>\n",
" <td>26</td>\n",
" <td>16500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>164</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>99.8</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>10.00</td>\n",
" <td>102</td>\n",
" <td>5500</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>13950</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>164</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>4wd</td>\n",
" <td>front</td>\n",
" <td>99.4</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.00</td>\n",
" <td>115</td>\n",
" <td>5500</td>\n",
" <td>18</td>\n",
" <td>22</td>\n",
" <td>17450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2</td>\n",
" <td>?</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>99.8</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.50</td>\n",
" <td>110</td>\n",
" <td>5500</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>15250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1</td>\n",
" <td>158</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>105.8</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.50</td>\n",
" <td>110</td>\n",
" <td>5500</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>17710</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1</td>\n",
" <td>?</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>105.8</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.50</td>\n",
" <td>110</td>\n",
" <td>5500</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>18920</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>158</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>105.8</td>\n",
" <td>...</td>\n",
" <td>131</td>\n",
" <td>mpfi</td>\n",
" <td>3.13</td>\n",
" <td>3.40</td>\n",
" <td>8.30</td>\n",
" <td>140</td>\n",
" <td>5500</td>\n",
" <td>17</td>\n",
" <td>20</td>\n",
" <td>23875</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0</td>\n",
" <td>?</td>\n",
" <td>audi</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>4wd</td>\n",
" <td>front</td>\n",
" <td>99.5</td>\n",
" <td>...</td>\n",
" <td>131</td>\n",
" <td>mpfi</td>\n",
" <td>3.13</td>\n",
" <td>3.40</td>\n",
" <td>7.00</td>\n",
" <td>160</td>\n",
" <td>5500</td>\n",
" <td>16</td>\n",
" <td>22</td>\n",
" <td>?</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2</td>\n",
" <td>192</td>\n",
" <td>bmw</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>101.2</td>\n",
" <td>...</td>\n",
" <td>108</td>\n",
" <td>mpfi</td>\n",
" <td>3.50</td>\n",
" <td>2.80</td>\n",
" <td>8.80</td>\n",
" <td>101</td>\n",
" <td>5800</td>\n",
" <td>23</td>\n",
" <td>29</td>\n",
" <td>16430</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>0</td>\n",
" <td>192</td>\n",
" <td>bmw</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>101.2</td>\n",
" <td>...</td>\n",
" <td>108</td>\n",
" <td>mpfi</td>\n",
" <td>3.50</td>\n",
" <td>2.80</td>\n",
" <td>8.80</td>\n",
" <td>101</td>\n",
" <td>5800</td>\n",
" <td>23</td>\n",
" <td>29</td>\n",
" <td>16925</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>0</td>\n",
" <td>188</td>\n",
" <td>bmw</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>101.2</td>\n",
" <td>...</td>\n",
" <td>164</td>\n",
" <td>mpfi</td>\n",
" <td>3.31</td>\n",
" <td>3.19</td>\n",
" <td>9.00</td>\n",
" <td>121</td>\n",
" <td>4250</td>\n",
" <td>21</td>\n",
" <td>28</td>\n",
" <td>20970</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>0</td>\n",
" <td>188</td>\n",
" <td>bmw</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>101.2</td>\n",
" <td>...</td>\n",
" <td>164</td>\n",
" <td>mpfi</td>\n",
" <td>3.31</td>\n",
" <td>3.19</td>\n",
" <td>9.00</td>\n",
" <td>121</td>\n",
" <td>4250</td>\n",
" <td>21</td>\n",
" <td>28</td>\n",
" <td>21105</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>1</td>\n",
" <td>?</td>\n",
" <td>bmw</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>103.5</td>\n",
" <td>...</td>\n",
" <td>164</td>\n",
" <td>mpfi</td>\n",
" <td>3.31</td>\n",
" <td>3.19</td>\n",
" <td>9.00</td>\n",
" <td>121</td>\n",
" <td>4250</td>\n",
" <td>20</td>\n",
" <td>25</td>\n",
" <td>24565</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>0</td>\n",
" <td>?</td>\n",
" <td>bmw</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>103.5</td>\n",
" <td>...</td>\n",
" <td>209</td>\n",
" <td>mpfi</td>\n",
" <td>3.62</td>\n",
" <td>3.39</td>\n",
" <td>8.00</td>\n",
" <td>182</td>\n",
" <td>5400</td>\n",
" <td>16</td>\n",
" <td>22</td>\n",
" <td>30760</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>0</td>\n",
" <td>?</td>\n",
" <td>bmw</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>103.5</td>\n",
" <td>...</td>\n",
" <td>209</td>\n",
" <td>mpfi</td>\n",
" <td>3.62</td>\n",
" <td>3.39</td>\n",
" <td>8.00</td>\n",
" <td>182</td>\n",
" <td>5400</td>\n",
" <td>16</td>\n",
" <td>22</td>\n",
" <td>41315</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>0</td>\n",
" <td>?</td>\n",
" <td>bmw</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>110.0</td>\n",
" <td>...</td>\n",
" <td>209</td>\n",
" <td>mpfi</td>\n",
" <td>3.62</td>\n",
" <td>3.39</td>\n",
" <td>8.00</td>\n",
" <td>182</td>\n",
" <td>5400</td>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" <td>36880</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>2</td>\n",
" <td>121</td>\n",
" <td>chevrolet</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>88.4</td>\n",
" <td>...</td>\n",
" <td>61</td>\n",
" <td>2bbl</td>\n",
" <td>2.91</td>\n",
" <td>3.03</td>\n",
" <td>9.50</td>\n",
" <td>48</td>\n",
" <td>5100</td>\n",
" <td>47</td>\n",
" <td>53</td>\n",
" <td>5151</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>1</td>\n",
" <td>98</td>\n",
" <td>chevrolet</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>94.5</td>\n",
" <td>...</td>\n",
" <td>90</td>\n",
" <td>2bbl</td>\n",
" <td>3.03</td>\n",
" <td>3.11</td>\n",
" <td>9.60</td>\n",
" <td>70</td>\n",
" <td>5400</td>\n",
" <td>38</td>\n",
" <td>43</td>\n",
" <td>6295</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>0</td>\n",
" <td>81</td>\n",
" <td>chevrolet</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>94.5</td>\n",
" <td>...</td>\n",
" <td>90</td>\n",
" <td>2bbl</td>\n",
" <td>3.03</td>\n",
" <td>3.11</td>\n",
" <td>9.60</td>\n",
" <td>70</td>\n",
" <td>5400</td>\n",
" <td>38</td>\n",
" <td>43</td>\n",
" <td>6575</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>1</td>\n",
" <td>118</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>93.7</td>\n",
" <td>...</td>\n",
" <td>90</td>\n",
" <td>2bbl</td>\n",
" <td>2.97</td>\n",
" <td>3.23</td>\n",
" <td>9.41</td>\n",
" <td>68</td>\n",
" <td>5500</td>\n",
" <td>37</td>\n",
" <td>41</td>\n",
" <td>5572</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>1</td>\n",
" <td>118</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>93.7</td>\n",
" <td>...</td>\n",
" <td>90</td>\n",
" <td>2bbl</td>\n",
" <td>2.97</td>\n",
" <td>3.23</td>\n",
" <td>9.40</td>\n",
" <td>68</td>\n",
" <td>5500</td>\n",
" <td>31</td>\n",
" <td>38</td>\n",
" <td>6377</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>1</td>\n",
" <td>118</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>93.7</td>\n",
" <td>...</td>\n",
" <td>98</td>\n",
" <td>mpfi</td>\n",
" <td>3.03</td>\n",
" <td>3.39</td>\n",
" <td>7.60</td>\n",
" <td>102</td>\n",
" <td>5500</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>7957</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>1</td>\n",
" <td>148</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>93.7</td>\n",
" <td>...</td>\n",
" <td>90</td>\n",
" <td>2bbl</td>\n",
" <td>2.97</td>\n",
" <td>3.23</td>\n",
" <td>9.40</td>\n",
" <td>68</td>\n",
" <td>5500</td>\n",
" <td>31</td>\n",
" <td>38</td>\n",
" <td>6229</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>1</td>\n",
" <td>148</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>93.7</td>\n",
" <td>...</td>\n",
" <td>90</td>\n",
" <td>2bbl</td>\n",
" <td>2.97</td>\n",
" <td>3.23</td>\n",
" <td>9.40</td>\n",
" <td>68</td>\n",
" <td>5500</td>\n",
" <td>31</td>\n",
" <td>38</td>\n",
" <td>6692</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>1</td>\n",
" <td>148</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>93.7</td>\n",
" <td>...</td>\n",
" <td>90</td>\n",
" <td>2bbl</td>\n",
" <td>2.97</td>\n",
" <td>3.23</td>\n",
" <td>9.40</td>\n",
" <td>68</td>\n",
" <td>5500</td>\n",
" <td>31</td>\n",
" <td>38</td>\n",
" <td>7609</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>1</td>\n",
" <td>148</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>?</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>93.7</td>\n",
" <td>...</td>\n",
" <td>98</td>\n",
" <td>mpfi</td>\n",
" <td>3.03</td>\n",
" <td>3.39</td>\n",
" <td>7.60</td>\n",
" <td>102</td>\n",
" <td>5500</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>8558</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>-1</td>\n",
" <td>110</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>103.3</td>\n",
" <td>...</td>\n",
" <td>122</td>\n",
" <td>2bbl</td>\n",
" <td>3.34</td>\n",
" <td>3.46</td>\n",
" <td>8.50</td>\n",
" <td>88</td>\n",
" <td>5000</td>\n",
" <td>24</td>\n",
" <td>30</td>\n",
" <td>8921</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>3</td>\n",
" <td>145</td>\n",
" <td>dodge</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>95.9</td>\n",
" <td>...</td>\n",
" <td>156</td>\n",
" <td>mfi</td>\n",
" <td>3.60</td>\n",
" <td>3.90</td>\n",
" <td>7.00</td>\n",
" <td>145</td>\n",
" <td>5000</td>\n",
" <td>19</td>\n",
" <td>24</td>\n",
" <td>12964</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>175</th>\n",
" <td>-1</td>\n",
" <td>65</td>\n",
" <td>toyota</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>102.4</td>\n",
" <td>...</td>\n",
" <td>122</td>\n",
" <td>mpfi</td>\n",
" <td>3.31</td>\n",
" <td>3.54</td>\n",
" <td>8.70</td>\n",
" <td>92</td>\n",
" <td>4200</td>\n",
" <td>27</td>\n",
" <td>32</td>\n",
" <td>9988</td>\n",
" </tr>\n",
" <tr>\n",
" <th>176</th>\n",
" <td>-1</td>\n",
" <td>65</td>\n",
" <td>toyota</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>102.4</td>\n",
" <td>...</td>\n",
" <td>122</td>\n",
" <td>mpfi</td>\n",
" <td>3.31</td>\n",
" <td>3.54</td>\n",
" <td>8.70</td>\n",
" <td>92</td>\n",
" <td>4200</td>\n",
" <td>27</td>\n",
" <td>32</td>\n",
" <td>10898</td>\n",
" </tr>\n",
" <tr>\n",
" <th>177</th>\n",
" <td>-1</td>\n",
" <td>65</td>\n",
" <td>toyota</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>102.4</td>\n",
" <td>...</td>\n",
" <td>122</td>\n",
" <td>mpfi</td>\n",
" <td>3.31</td>\n",
" <td>3.54</td>\n",
" <td>8.70</td>\n",
" <td>92</td>\n",
" <td>4200</td>\n",
" <td>27</td>\n",
" <td>32</td>\n",
" <td>11248</td>\n",
" </tr>\n",
" <tr>\n",
" <th>178</th>\n",
" <td>3</td>\n",
" <td>197</td>\n",
" <td>toyota</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>102.9</td>\n",
" <td>...</td>\n",
" <td>171</td>\n",
" <td>mpfi</td>\n",
" <td>3.27</td>\n",
" <td>3.35</td>\n",
" <td>9.30</td>\n",
" <td>161</td>\n",
" <td>5200</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>16558</td>\n",
" </tr>\n",
" <tr>\n",
" <th>179</th>\n",
" <td>3</td>\n",
" <td>197</td>\n",
" <td>toyota</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>102.9</td>\n",
" <td>...</td>\n",
" <td>171</td>\n",
" <td>mpfi</td>\n",
" <td>3.27</td>\n",
" <td>3.35</td>\n",
" <td>9.30</td>\n",
" <td>161</td>\n",
" <td>5200</td>\n",
" <td>19</td>\n",
" <td>24</td>\n",
" <td>15998</td>\n",
" </tr>\n",
" <tr>\n",
" <th>180</th>\n",
" <td>-1</td>\n",
" <td>90</td>\n",
" <td>toyota</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.5</td>\n",
" <td>...</td>\n",
" <td>171</td>\n",
" <td>mpfi</td>\n",
" <td>3.27</td>\n",
" <td>3.35</td>\n",
" <td>9.20</td>\n",
" <td>156</td>\n",
" <td>5200</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>15690</td>\n",
" </tr>\n",
" <tr>\n",
" <th>181</th>\n",
" <td>-1</td>\n",
" <td>?</td>\n",
" <td>toyota</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.5</td>\n",
" <td>...</td>\n",
" <td>161</td>\n",
" <td>mpfi</td>\n",
" <td>3.27</td>\n",
" <td>3.35</td>\n",
" <td>9.20</td>\n",
" <td>156</td>\n",
" <td>5200</td>\n",
" <td>19</td>\n",
" <td>24</td>\n",
" <td>15750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>182</th>\n",
" <td>2</td>\n",
" <td>122</td>\n",
" <td>volkswagen</td>\n",
" <td>diesel</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>97.3</td>\n",
" <td>...</td>\n",
" <td>97</td>\n",
" <td>idi</td>\n",
" <td>3.01</td>\n",
" <td>3.40</td>\n",
" <td>23.00</td>\n",
" <td>52</td>\n",
" <td>4800</td>\n",
" <td>37</td>\n",
" <td>46</td>\n",
" <td>7775</td>\n",
" </tr>\n",
" <tr>\n",
" <th>183</th>\n",
" <td>2</td>\n",
" <td>122</td>\n",
" <td>volkswagen</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>97.3</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>9.00</td>\n",
" <td>85</td>\n",
" <td>5250</td>\n",
" <td>27</td>\n",
" <td>34</td>\n",
" <td>7975</td>\n",
" </tr>\n",
" <tr>\n",
" <th>184</th>\n",
" <td>2</td>\n",
" <td>94</td>\n",
" <td>volkswagen</td>\n",
" <td>diesel</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>97.3</td>\n",
" <td>...</td>\n",
" <td>97</td>\n",
" <td>idi</td>\n",
" <td>3.01</td>\n",
" <td>3.40</td>\n",
" <td>23.00</td>\n",
" <td>52</td>\n",
" <td>4800</td>\n",
" <td>37</td>\n",
" <td>46</td>\n",
" <td>7995</td>\n",
" </tr>\n",
" <tr>\n",
" <th>185</th>\n",
" <td>2</td>\n",
" <td>94</td>\n",
" <td>volkswagen</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>97.3</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>9.00</td>\n",
" <td>85</td>\n",
" <td>5250</td>\n",
" <td>27</td>\n",
" <td>34</td>\n",
" <td>8195</td>\n",
" </tr>\n",
" <tr>\n",
" <th>186</th>\n",
" <td>2</td>\n",
" <td>94</td>\n",
" <td>volkswagen</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>97.3</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>9.00</td>\n",
" <td>85</td>\n",
" <td>5250</td>\n",
" <td>27</td>\n",
" <td>34</td>\n",
" <td>8495</td>\n",
" </tr>\n",
" <tr>\n",
" <th>187</th>\n",
" <td>2</td>\n",
" <td>94</td>\n",
" <td>volkswagen</td>\n",
" <td>diesel</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>97.3</td>\n",
" <td>...</td>\n",
" <td>97</td>\n",
" <td>idi</td>\n",
" <td>3.01</td>\n",
" <td>3.40</td>\n",
" <td>23.00</td>\n",
" <td>68</td>\n",
" <td>4500</td>\n",
" <td>37</td>\n",
" <td>42</td>\n",
" <td>9495</td>\n",
" </tr>\n",
" <tr>\n",
" <th>188</th>\n",
" <td>2</td>\n",
" <td>94</td>\n",
" <td>volkswagen</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>97.3</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>10.00</td>\n",
" <td>100</td>\n",
" <td>5500</td>\n",
" <td>26</td>\n",
" <td>32</td>\n",
" <td>9995</td>\n",
" </tr>\n",
" <tr>\n",
" <th>189</th>\n",
" <td>3</td>\n",
" <td>?</td>\n",
" <td>volkswagen</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>convertible</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>94.5</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.50</td>\n",
" <td>90</td>\n",
" <td>5500</td>\n",
" <td>24</td>\n",
" <td>29</td>\n",
" <td>11595</td>\n",
" </tr>\n",
" <tr>\n",
" <th>190</th>\n",
" <td>3</td>\n",
" <td>256</td>\n",
" <td>volkswagen</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>two</td>\n",
" <td>hatchback</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>94.5</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.50</td>\n",
" <td>90</td>\n",
" <td>5500</td>\n",
" <td>24</td>\n",
" <td>29</td>\n",
" <td>9980</td>\n",
" </tr>\n",
" <tr>\n",
" <th>191</th>\n",
" <td>0</td>\n",
" <td>?</td>\n",
" <td>volkswagen</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>100.4</td>\n",
" <td>...</td>\n",
" <td>136</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>8.50</td>\n",
" <td>110</td>\n",
" <td>5500</td>\n",
" <td>19</td>\n",
" <td>24</td>\n",
" <td>13295</td>\n",
" </tr>\n",
" <tr>\n",
" <th>192</th>\n",
" <td>0</td>\n",
" <td>?</td>\n",
" <td>volkswagen</td>\n",
" <td>diesel</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>100.4</td>\n",
" <td>...</td>\n",
" <td>97</td>\n",
" <td>idi</td>\n",
" <td>3.01</td>\n",
" <td>3.40</td>\n",
" <td>23.00</td>\n",
" <td>68</td>\n",
" <td>4500</td>\n",
" <td>33</td>\n",
" <td>38</td>\n",
" <td>13845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>193</th>\n",
" <td>0</td>\n",
" <td>?</td>\n",
" <td>volkswagen</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>100.4</td>\n",
" <td>...</td>\n",
" <td>109</td>\n",
" <td>mpfi</td>\n",
" <td>3.19</td>\n",
" <td>3.40</td>\n",
" <td>9.00</td>\n",
" <td>88</td>\n",
" <td>5500</td>\n",
" <td>25</td>\n",
" <td>31</td>\n",
" <td>12290</td>\n",
" </tr>\n",
" <tr>\n",
" <th>194</th>\n",
" <td>-2</td>\n",
" <td>103</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.50</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>12940</td>\n",
" </tr>\n",
" <tr>\n",
" <th>195</th>\n",
" <td>-1</td>\n",
" <td>74</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.50</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>13415</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>-2</td>\n",
" <td>103</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.50</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>15985</td>\n",
" </tr>\n",
" <tr>\n",
" <th>197</th>\n",
" <td>-1</td>\n",
" <td>74</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.50</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>24</td>\n",
" <td>28</td>\n",
" <td>16515</td>\n",
" </tr>\n",
" <tr>\n",
" <th>198</th>\n",
" <td>-2</td>\n",
" <td>103</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.62</td>\n",
" <td>3.15</td>\n",
" <td>7.50</td>\n",
" <td>162</td>\n",
" <td>5100</td>\n",
" <td>17</td>\n",
" <td>22</td>\n",
" <td>18420</td>\n",
" </tr>\n",
" <tr>\n",
" <th>199</th>\n",
" <td>-1</td>\n",
" <td>74</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>wagon</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>104.3</td>\n",
" <td>...</td>\n",
" <td>130</td>\n",
" <td>mpfi</td>\n",
" <td>3.62</td>\n",
" <td>3.15</td>\n",
" <td>7.50</td>\n",
" <td>162</td>\n",
" <td>5100</td>\n",
" <td>17</td>\n",
" <td>22</td>\n",
" <td>18950</td>\n",
" </tr>\n",
" <tr>\n",
" <th>200</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.50</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>23</td>\n",
" <td>28</td>\n",
" <td>16845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>201</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>8.70</td>\n",
" <td>160</td>\n",
" <td>5300</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>19045</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>173</td>\n",
" <td>mpfi</td>\n",
" <td>3.58</td>\n",
" <td>2.87</td>\n",
" <td>8.80</td>\n",
" <td>134</td>\n",
" <td>5500</td>\n",
" <td>18</td>\n",
" <td>23</td>\n",
" <td>21485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>203</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>diesel</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>145</td>\n",
" <td>idi</td>\n",
" <td>3.01</td>\n",
" <td>3.40</td>\n",
" <td>23.00</td>\n",
" <td>106</td>\n",
" <td>4800</td>\n",
" <td>26</td>\n",
" <td>27</td>\n",
" <td>22470</td>\n",
" </tr>\n",
" <tr>\n",
" <th>204</th>\n",
" <td>-1</td>\n",
" <td>95</td>\n",
" <td>volvo</td>\n",
" <td>gas</td>\n",
" <td>turbo</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>rwd</td>\n",
" <td>front</td>\n",
" <td>109.1</td>\n",
" <td>...</td>\n",
" <td>141</td>\n",
" <td>mpfi</td>\n",
" <td>3.78</td>\n",
" <td>3.15</td>\n",
" <td>9.50</td>\n",
" <td>114</td>\n",
" <td>5400</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>22625</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>205 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" symboling normalized-losses make fuel-type aspiration \\\n",
"0 3 ? alfa-romero gas std \n",
"1 3 ? alfa-romero gas std \n",
"2 1 ? alfa-romero gas std \n",
"3 2 164 audi gas std \n",
"4 2 164 audi gas std \n",
"5 2 ? audi gas std \n",
"6 1 158 audi gas std \n",
"7 1 ? audi gas std \n",
"8 1 158 audi gas turbo \n",
"9 0 ? audi gas turbo \n",
"10 2 192 bmw gas std \n",
"11 0 192 bmw gas std \n",
"12 0 188 bmw gas std \n",
"13 0 188 bmw gas std \n",
"14 1 ? bmw gas std \n",
"15 0 ? bmw gas std \n",
"16 0 ? bmw gas std \n",
"17 0 ? bmw gas std \n",
"18 2 121 chevrolet gas std \n",
"19 1 98 chevrolet gas std \n",
"20 0 81 chevrolet gas std \n",
"21 1 118 dodge gas std \n",
"22 1 118 dodge gas std \n",
"23 1 118 dodge gas turbo \n",
"24 1 148 dodge gas std \n",
"25 1 148 dodge gas std \n",
"26 1 148 dodge gas std \n",
"27 1 148 dodge gas turbo \n",
"28 -1 110 dodge gas std \n",
"29 3 145 dodge gas turbo \n",
".. ... ... ... ... ... \n",
"175 -1 65 toyota gas std \n",
"176 -1 65 toyota gas std \n",
"177 -1 65 toyota gas std \n",
"178 3 197 toyota gas std \n",
"179 3 197 toyota gas std \n",
"180 -1 90 toyota gas std \n",
"181 -1 ? toyota gas std \n",
"182 2 122 volkswagen diesel std \n",
"183 2 122 volkswagen gas std \n",
"184 2 94 volkswagen diesel std \n",
"185 2 94 volkswagen gas std \n",
"186 2 94 volkswagen gas std \n",
"187 2 94 volkswagen diesel turbo \n",
"188 2 94 volkswagen gas std \n",
"189 3 ? volkswagen gas std \n",
"190 3 256 volkswagen gas std \n",
"191 0 ? volkswagen gas std \n",
"192 0 ? volkswagen diesel turbo \n",
"193 0 ? volkswagen gas std \n",
"194 -2 103 volvo gas std \n",
"195 -1 74 volvo gas std \n",
"196 -2 103 volvo gas std \n",
"197 -1 74 volvo gas std \n",
"198 -2 103 volvo gas turbo \n",
"199 -1 74 volvo gas turbo \n",
"200 -1 95 volvo gas std \n",
"201 -1 95 volvo gas turbo \n",
"202 -1 95 volvo gas std \n",
"203 -1 95 volvo diesel turbo \n",
"204 -1 95 volvo gas turbo \n",
"\n",
" num-of-doors body-style drive-wheels engine-location wheel-base ... \\\n",
"0 two convertible rwd front 88.6 ... \n",
"1 two convertible rwd front 88.6 ... \n",
"2 two hatchback rwd front 94.5 ... \n",
"3 four sedan fwd front 99.8 ... \n",
"4 four sedan 4wd front 99.4 ... \n",
"5 two sedan fwd front 99.8 ... \n",
"6 four sedan fwd front 105.8 ... \n",
"7 four wagon fwd front 105.8 ... \n",
"8 four sedan fwd front 105.8 ... \n",
"9 two hatchback 4wd front 99.5 ... \n",
"10 two sedan rwd front 101.2 ... \n",
"11 four sedan rwd front 101.2 ... \n",
"12 two sedan rwd front 101.2 ... \n",
"13 four sedan rwd front 101.2 ... \n",
"14 four sedan rwd front 103.5 ... \n",
"15 four sedan rwd front 103.5 ... \n",
"16 two sedan rwd front 103.5 ... \n",
"17 four sedan rwd front 110.0 ... \n",
"18 two hatchback fwd front 88.4 ... \n",
"19 two hatchback fwd front 94.5 ... \n",
"20 four sedan fwd front 94.5 ... \n",
"21 two hatchback fwd front 93.7 ... \n",
"22 two hatchback fwd front 93.7 ... \n",
"23 two hatchback fwd front 93.7 ... \n",
"24 four hatchback fwd front 93.7 ... \n",
"25 four sedan fwd front 93.7 ... \n",
"26 four sedan fwd front 93.7 ... \n",
"27 ? sedan fwd front 93.7 ... \n",
"28 four wagon fwd front 103.3 ... \n",
"29 two hatchback fwd front 95.9 ... \n",
".. ... ... ... ... ... ... \n",
"175 four hatchback fwd front 102.4 ... \n",
"176 four sedan fwd front 102.4 ... \n",
"177 four hatchback fwd front 102.4 ... \n",
"178 two hatchback rwd front 102.9 ... \n",
"179 two hatchback rwd front 102.9 ... \n",
"180 four sedan rwd front 104.5 ... \n",
"181 four wagon rwd front 104.5 ... \n",
"182 two sedan fwd front 97.3 ... \n",
"183 two sedan fwd front 97.3 ... \n",
"184 four sedan fwd front 97.3 ... \n",
"185 four sedan fwd front 97.3 ... \n",
"186 four sedan fwd front 97.3 ... \n",
"187 four sedan fwd front 97.3 ... \n",
"188 four sedan fwd front 97.3 ... \n",
"189 two convertible fwd front 94.5 ... \n",
"190 two hatchback fwd front 94.5 ... \n",
"191 four sedan fwd front 100.4 ... \n",
"192 four sedan fwd front 100.4 ... \n",
"193 four wagon fwd front 100.4 ... \n",
"194 four sedan rwd front 104.3 ... \n",
"195 four wagon rwd front 104.3 ... \n",
"196 four sedan rwd front 104.3 ... \n",
"197 four wagon rwd front 104.3 ... \n",
"198 four sedan rwd front 104.3 ... \n",
"199 four wagon rwd front 104.3 ... \n",
"200 four sedan rwd front 109.1 ... \n",
"201 four sedan rwd front 109.1 ... \n",
"202 four sedan rwd front 109.1 ... \n",
"203 four sedan rwd front 109.1 ... \n",
"204 four sedan rwd front 109.1 ... \n",
"\n",
" engine-size fuel-system bore stroke compression-ratio horsepower \\\n",
"0 130 mpfi 3.47 2.68 9.00 111 \n",
"1 130 mpfi 3.47 2.68 9.00 111 \n",
"2 152 mpfi 2.68 3.47 9.00 154 \n",
"3 109 mpfi 3.19 3.40 10.00 102 \n",
"4 136 mpfi 3.19 3.40 8.00 115 \n",
"5 136 mpfi 3.19 3.40 8.50 110 \n",
"6 136 mpfi 3.19 3.40 8.50 110 \n",
"7 136 mpfi 3.19 3.40 8.50 110 \n",
"8 131 mpfi 3.13 3.40 8.30 140 \n",
"9 131 mpfi 3.13 3.40 7.00 160 \n",
"10 108 mpfi 3.50 2.80 8.80 101 \n",
"11 108 mpfi 3.50 2.80 8.80 101 \n",
"12 164 mpfi 3.31 3.19 9.00 121 \n",
"13 164 mpfi 3.31 3.19 9.00 121 \n",
"14 164 mpfi 3.31 3.19 9.00 121 \n",
"15 209 mpfi 3.62 3.39 8.00 182 \n",
"16 209 mpfi 3.62 3.39 8.00 182 \n",
"17 209 mpfi 3.62 3.39 8.00 182 \n",
"18 61 2bbl 2.91 3.03 9.50 48 \n",
"19 90 2bbl 3.03 3.11 9.60 70 \n",
"20 90 2bbl 3.03 3.11 9.60 70 \n",
"21 90 2bbl 2.97 3.23 9.41 68 \n",
"22 90 2bbl 2.97 3.23 9.40 68 \n",
"23 98 mpfi 3.03 3.39 7.60 102 \n",
"24 90 2bbl 2.97 3.23 9.40 68 \n",
"25 90 2bbl 2.97 3.23 9.40 68 \n",
"26 90 2bbl 2.97 3.23 9.40 68 \n",
"27 98 mpfi 3.03 3.39 7.60 102 \n",
"28 122 2bbl 3.34 3.46 8.50 88 \n",
"29 156 mfi 3.60 3.90 7.00 145 \n",
".. ... ... ... ... ... ... \n",
"175 122 mpfi 3.31 3.54 8.70 92 \n",
"176 122 mpfi 3.31 3.54 8.70 92 \n",
"177 122 mpfi 3.31 3.54 8.70 92 \n",
"178 171 mpfi 3.27 3.35 9.30 161 \n",
"179 171 mpfi 3.27 3.35 9.30 161 \n",
"180 171 mpfi 3.27 3.35 9.20 156 \n",
"181 161 mpfi 3.27 3.35 9.20 156 \n",
"182 97 idi 3.01 3.40 23.00 52 \n",
"183 109 mpfi 3.19 3.40 9.00 85 \n",
"184 97 idi 3.01 3.40 23.00 52 \n",
"185 109 mpfi 3.19 3.40 9.00 85 \n",
"186 109 mpfi 3.19 3.40 9.00 85 \n",
"187 97 idi 3.01 3.40 23.00 68 \n",
"188 109 mpfi 3.19 3.40 10.00 100 \n",
"189 109 mpfi 3.19 3.40 8.50 90 \n",
"190 109 mpfi 3.19 3.40 8.50 90 \n",
"191 136 mpfi 3.19 3.40 8.50 110 \n",
"192 97 idi 3.01 3.40 23.00 68 \n",
"193 109 mpfi 3.19 3.40 9.00 88 \n",
"194 141 mpfi 3.78 3.15 9.50 114 \n",
"195 141 mpfi 3.78 3.15 9.50 114 \n",
"196 141 mpfi 3.78 3.15 9.50 114 \n",
"197 141 mpfi 3.78 3.15 9.50 114 \n",
"198 130 mpfi 3.62 3.15 7.50 162 \n",
"199 130 mpfi 3.62 3.15 7.50 162 \n",
"200 141 mpfi 3.78 3.15 9.50 114 \n",
"201 141 mpfi 3.78 3.15 8.70 160 \n",
"202 173 mpfi 3.58 2.87 8.80 134 \n",
"203 145 idi 3.01 3.40 23.00 106 \n",
"204 141 mpfi 3.78 3.15 9.50 114 \n",
"\n",
" peak-rpm city-mpg highway-mpg price \n",
"0 5000 21 27 13495 \n",
"1 5000 21 27 16500 \n",
"2 5000 19 26 16500 \n",
"3 5500 24 30 13950 \n",
"4 5500 18 22 17450 \n",
"5 5500 19 25 15250 \n",
"6 5500 19 25 17710 \n",
"7 5500 19 25 18920 \n",
"8 5500 17 20 23875 \n",
"9 5500 16 22 ? \n",
"10 5800 23 29 16430 \n",
"11 5800 23 29 16925 \n",
"12 4250 21 28 20970 \n",
"13 4250 21 28 21105 \n",
"14 4250 20 25 24565 \n",
"15 5400 16 22 30760 \n",
"16 5400 16 22 41315 \n",
"17 5400 15 20 36880 \n",
"18 5100 47 53 5151 \n",
"19 5400 38 43 6295 \n",
"20 5400 38 43 6575 \n",
"21 5500 37 41 5572 \n",
"22 5500 31 38 6377 \n",
"23 5500 24 30 7957 \n",
"24 5500 31 38 6229 \n",
"25 5500 31 38 6692 \n",
"26 5500 31 38 7609 \n",
"27 5500 24 30 8558 \n",
"28 5000 24 30 8921 \n",
"29 5000 19 24 12964 \n",
".. ... ... ... ... \n",
"175 4200 27 32 9988 \n",
"176 4200 27 32 10898 \n",
"177 4200 27 32 11248 \n",
"178 5200 20 24 16558 \n",
"179 5200 19 24 15998 \n",
"180 5200 20 24 15690 \n",
"181 5200 19 24 15750 \n",
"182 4800 37 46 7775 \n",
"183 5250 27 34 7975 \n",
"184 4800 37 46 7995 \n",
"185 5250 27 34 8195 \n",
"186 5250 27 34 8495 \n",
"187 4500 37 42 9495 \n",
"188 5500 26 32 9995 \n",
"189 5500 24 29 11595 \n",
"190 5500 24 29 9980 \n",
"191 5500 19 24 13295 \n",
"192 4500 33 38 13845 \n",
"193 5500 25 31 12290 \n",
"194 5400 23 28 12940 \n",
"195 5400 23 28 13415 \n",
"196 5400 24 28 15985 \n",
"197 5400 24 28 16515 \n",
"198 5100 17 22 18420 \n",
"199 5100 17 22 18950 \n",
"200 5400 23 28 16845 \n",
"201 5300 19 25 19045 \n",
"202 5500 18 23 21485 \n",
"203 4800 26 27 22470 \n",
"204 5400 19 25 22625 \n",
"\n",
"[205 rows x 26 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dropna(subset=[\"price\"], axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we have successfully read the raw dataset and add the correct headers into the data frame."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <div class=\"alert alert-danger alertdanger\" style=\"margin-top: 20px\">\n",
"<h1> Question #2: </h1>\n",
"<b>Find the name of the columns of the dataframe</b>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',\n",
" 'num-of-doors', 'body-style', 'drive-wheels', 'engine-location',\n",
" 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',\n",
" 'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke',\n",
" 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg',\n",
" 'highway-mpg', 'price'],\n",
" dtype='object')\n"
]
}
],
"source": [
"# Write your code below and press Shift+Enter to execute \n",
"print(df.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click <b>here</b> for the solution.\n",
"\n",
"<!-- The answer is below:\n",
"\n",
"print(df.columns)\n",
"\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Save Dataset</h2>\n",
"<p>\n",
"Correspondingly, Pandas enables us to save the dataset to csv by using the <code>dataframe.to_csv()</code> method, you can add the file path and name along with quotation marks in the brackets.\n",
"</p>\n",
"<p>\n",
" For example, if you would save the dataframe <b>df</b> as <b>automobile.csv</b> to your local machine, you may use the syntax below:\n",
"</p>"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"df.to_csv(\"automobile.csv\", index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can also read and save other file formats, we can use similar functions to **`pd.read_csv()`** and **`df.to_csv()`** for other data formats, the functions are listed in the following table:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Read/Save Other Data Formats</h2>\n",
"\n",
"\n",
"\n",
"| Data Formate | Read | Save |\n",
"| ------------- |:--------------:| ----------------:|\n",
"| csv | `pd.read_csv()` |`df.to_csv()` |\n",
"| json | `pd.read_json()` |`df.to_json()` |\n",
"| excel | `pd.read_excel()`|`df.to_excel()` |\n",
"| hdf | `pd.read_hdf()` |`df.to_hdf()` |\n",
"| sql | `pd.read_sql()` |`df.to_sql()` |\n",
"| ... | ... | ... |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1 id=\"basic_insight\">Basic Insight of Dataset</h1>\n",
"<p>\n",
"After reading data into Pandas dataframe, it is time for us to explore the dataset.<br>\n",
"There are several ways to obtain essential insights of the data to help us better understand our dataset.\n",
"</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Data Types</h2>\n",
"<p>\n",
"Data has a variety of types.<br>\n",
"The main types stored in Pandas dataframes are <b>object</b>, <b>float</b>, <b>int</b>, <b>bool</b> and <b>datetime64</b>. In order to better learn about each attribute, it is always good for us to know the data type of each column. In Pandas:\n",
"</p>"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"symboling int64\n",
"normalized-losses object\n",
"make object\n",
"fuel-type object\n",
"aspiration object\n",
"num-of-doors object\n",
"body-style object\n",
"drive-wheels object\n",
"engine-location object\n",
"wheel-base float64\n",
"length float64\n",
"width float64\n",
"height float64\n",
"curb-weight int64\n",
"engine-type object\n",
"num-of-cylinders object\n",
"engine-size int64\n",
"fuel-system object\n",
"bore object\n",
"stroke object\n",
"compression-ratio float64\n",
"horsepower object\n",
"peak-rpm object\n",
"city-mpg int64\n",
"highway-mpg int64\n",
"price object\n",
"dtype: object"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"returns a Series with the data type of each column."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"symboling int64\n",
"normalized-losses object\n",
"make object\n",
"fuel-type object\n",
"aspiration object\n",
"num-of-doors object\n",
"body-style object\n",
"drive-wheels object\n",
"engine-location object\n",
"wheel-base float64\n",
"length float64\n",
"width float64\n",
"height float64\n",
"curb-weight int64\n",
"engine-type object\n",
"num-of-cylinders object\n",
"engine-size int64\n",
"fuel-system object\n",
"bore object\n",
"stroke object\n",
"compression-ratio float64\n",
"horsepower object\n",
"peak-rpm object\n",
"city-mpg int64\n",
"highway-mpg int64\n",
"price object\n",
"dtype: object\n"
]
}
],
"source": [
"# check the data type of data frame \"df\" by .dtypes\n",
"print(df.dtypes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>\n",
"As a result, as shown above, it is clear to see that the data type of \"symboling\" and \"curb-weight\" are <code>int64</code>, \"normalized-losses\" is <code>object</code>, and \"wheel-base\" is <code>float64</code>, etc.\n",
"</p>\n",
"<p>\n",
"These data types can be changed; we will learn how to accomplish this in a later module.\n",
"</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Describe</h2>\n",
"If we would like to get a statistical summary of each column, such as count, column mean value, column standard deviation, etc. We use the describe method:"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"dataframe.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This method will provide various summary statistics, excluding <code>NaN</code> (Not a Number) values."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>symboling</th>\n",
" <th>wheel-base</th>\n",
" <th>length</th>\n",
" <th>width</th>\n",
" <th>height</th>\n",
" <th>curb-weight</th>\n",
" <th>engine-size</th>\n",
" <th>compression-ratio</th>\n",
" <th>city-mpg</th>\n",
" <th>highway-mpg</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.834146</td>\n",
" <td>98.756585</td>\n",
" <td>174.049268</td>\n",
" <td>65.907805</td>\n",
" <td>53.724878</td>\n",
" <td>2555.565854</td>\n",
" <td>126.907317</td>\n",
" <td>10.142537</td>\n",
" <td>25.219512</td>\n",
" <td>30.751220</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1.245307</td>\n",
" <td>6.021776</td>\n",
" <td>12.337289</td>\n",
" <td>2.145204</td>\n",
" <td>2.443522</td>\n",
" <td>520.680204</td>\n",
" <td>41.642693</td>\n",
" <td>3.972040</td>\n",
" <td>6.542142</td>\n",
" <td>6.886443</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>-2.000000</td>\n",
" <td>86.600000</td>\n",
" <td>141.100000</td>\n",
" <td>60.300000</td>\n",
" <td>47.800000</td>\n",
" <td>1488.000000</td>\n",
" <td>61.000000</td>\n",
" <td>7.000000</td>\n",
" <td>13.000000</td>\n",
" <td>16.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>0.000000</td>\n",
" <td>94.500000</td>\n",
" <td>166.300000</td>\n",
" <td>64.100000</td>\n",
" <td>52.000000</td>\n",
" <td>2145.000000</td>\n",
" <td>97.000000</td>\n",
" <td>8.600000</td>\n",
" <td>19.000000</td>\n",
" <td>25.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1.000000</td>\n",
" <td>97.000000</td>\n",
" <td>173.200000</td>\n",
" <td>65.500000</td>\n",
" <td>54.100000</td>\n",
" <td>2414.000000</td>\n",
" <td>120.000000</td>\n",
" <td>9.000000</td>\n",
" <td>24.000000</td>\n",
" <td>30.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>2.000000</td>\n",
" <td>102.400000</td>\n",
" <td>183.100000</td>\n",
" <td>66.900000</td>\n",
" <td>55.500000</td>\n",
" <td>2935.000000</td>\n",
" <td>141.000000</td>\n",
" <td>9.400000</td>\n",
" <td>30.000000</td>\n",
" <td>34.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>3.000000</td>\n",
" <td>120.900000</td>\n",
" <td>208.100000</td>\n",
" <td>72.300000</td>\n",
" <td>59.800000</td>\n",
" <td>4066.000000</td>\n",
" <td>326.000000</td>\n",
" <td>23.000000</td>\n",
" <td>49.000000</td>\n",
" <td>54.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" symboling wheel-base length width height \\\n",
"count 205.000000 205.000000 205.000000 205.000000 205.000000 \n",
"mean 0.834146 98.756585 174.049268 65.907805 53.724878 \n",
"std 1.245307 6.021776 12.337289 2.145204 2.443522 \n",
"min -2.000000 86.600000 141.100000 60.300000 47.800000 \n",
"25% 0.000000 94.500000 166.300000 64.100000 52.000000 \n",
"50% 1.000000 97.000000 173.200000 65.500000 54.100000 \n",
"75% 2.000000 102.400000 183.100000 66.900000 55.500000 \n",
"max 3.000000 120.900000 208.100000 72.300000 59.800000 \n",
"\n",
" curb-weight engine-size compression-ratio city-mpg highway-mpg \n",
"count 205.000000 205.000000 205.000000 205.000000 205.000000 \n",
"mean 2555.565854 126.907317 10.142537 25.219512 30.751220 \n",
"std 520.680204 41.642693 3.972040 6.542142 6.886443 \n",
"min 1488.000000 61.000000 7.000000 13.000000 16.000000 \n",
"25% 2145.000000 97.000000 8.600000 19.000000 25.000000 \n",
"50% 2414.000000 120.000000 9.000000 24.000000 30.000000 \n",
"75% 2935.000000 141.000000 9.400000 30.000000 34.000000 \n",
"max 4066.000000 326.000000 23.000000 49.000000 54.000000 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>\n",
"This shows the statistical summary of all numeric-typed (int, float) columns.<br>\n",
"For example, the attribute \"symboling\" has 205 counts, the mean value of this column is 0.83, the standard deviation is 1.25, the minimum value is -2, 25th percentile is 0, 50th percentile is 1, 75th percentile is 2, and the maximum value is 3.\n",
"<br>\n",
"However, what if we would also like to check all the columns including those that are of type object.\n",
"<br><br>\n",
"\n",
"You can add an argument <code>include = \"all\"</code> inside the bracket. Let's try it again.\n",
"</p>"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>symboling</th>\n",
" <th>normalized-losses</th>\n",
" <th>make</th>\n",
" <th>fuel-type</th>\n",
" <th>aspiration</th>\n",
" <th>num-of-doors</th>\n",
" <th>body-style</th>\n",
" <th>drive-wheels</th>\n",
" <th>engine-location</th>\n",
" <th>wheel-base</th>\n",
" <th>...</th>\n",
" <th>engine-size</th>\n",
" <th>fuel-system</th>\n",
" <th>bore</th>\n",
" <th>stroke</th>\n",
" <th>compression-ratio</th>\n",
" <th>horsepower</th>\n",
" <th>peak-rpm</th>\n",
" <th>city-mpg</th>\n",
" <th>highway-mpg</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>205.000000</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205.000000</td>\n",
" <td>...</td>\n",
" <td>205.000000</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205.000000</td>\n",
" <td>205</td>\n",
" <td>205</td>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" <td>205</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>NaN</td>\n",
" <td>52</td>\n",
" <td>22</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>8</td>\n",
" <td>39</td>\n",
" <td>37</td>\n",
" <td>NaN</td>\n",
" <td>60</td>\n",
" <td>24</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>187</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>NaN</td>\n",
" <td>?</td>\n",
" <td>toyota</td>\n",
" <td>gas</td>\n",
" <td>std</td>\n",
" <td>four</td>\n",
" <td>sedan</td>\n",
" <td>fwd</td>\n",
" <td>front</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>mpfi</td>\n",
" <td>3.62</td>\n",
" <td>3.40</td>\n",
" <td>NaN</td>\n",
" <td>68</td>\n",
" <td>5500</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>?</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>NaN</td>\n",
" <td>41</td>\n",
" <td>32</td>\n",
" <td>185</td>\n",
" <td>168</td>\n",
" <td>114</td>\n",
" <td>96</td>\n",
" <td>120</td>\n",
" <td>202</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>94</td>\n",
" <td>23</td>\n",
" <td>20</td>\n",
" <td>NaN</td>\n",
" <td>19</td>\n",
" <td>37</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.834146</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>98.756585</td>\n",
" <td>...</td>\n",
" <td>126.907317</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>10.142537</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>25.219512</td>\n",
" <td>30.751220</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1.245307</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>6.021776</td>\n",
" <td>...</td>\n",
" <td>41.642693</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3.972040</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>6.542142</td>\n",
" <td>6.886443</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>-2.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>86.600000</td>\n",
" <td>...</td>\n",
" <td>61.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>7.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>13.000000</td>\n",
" <td>16.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>94.500000</td>\n",
" <td>...</td>\n",
" <td>97.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>8.600000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>19.000000</td>\n",
" <td>25.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>97.000000</td>\n",
" <td>...</td>\n",
" <td>120.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>9.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>24.000000</td>\n",
" <td>30.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>2.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>102.400000</td>\n",
" <td>...</td>\n",
" <td>141.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>9.400000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>30.000000</td>\n",
" <td>34.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>3.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>120.900000</td>\n",
" <td>...</td>\n",
" <td>326.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>23.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>49.000000</td>\n",
" <td>54.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>11 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" symboling normalized-losses make fuel-type aspiration \\\n",
"count 205.000000 205 205 205 205 \n",
"unique NaN 52 22 2 2 \n",
"top NaN ? toyota gas std \n",
"freq NaN 41 32 185 168 \n",
"mean 0.834146 NaN NaN NaN NaN \n",
"std 1.245307 NaN NaN NaN NaN \n",
"min -2.000000 NaN NaN NaN NaN \n",
"25% 0.000000 NaN NaN NaN NaN \n",
"50% 1.000000 NaN NaN NaN NaN \n",
"75% 2.000000 NaN NaN NaN NaN \n",
"max 3.000000 NaN NaN NaN NaN \n",
"\n",
" num-of-doors body-style drive-wheels engine-location wheel-base ... \\\n",
"count 205 205 205 205 205.000000 ... \n",
"unique 3 5 3 2 NaN ... \n",
"top four sedan fwd front NaN ... \n",
"freq 114 96 120 202 NaN ... \n",
"mean NaN NaN NaN NaN 98.756585 ... \n",
"std NaN NaN NaN NaN 6.021776 ... \n",
"min NaN NaN NaN NaN 86.600000 ... \n",
"25% NaN NaN NaN NaN 94.500000 ... \n",
"50% NaN NaN NaN NaN 97.000000 ... \n",
"75% NaN NaN NaN NaN 102.400000 ... \n",
"max NaN NaN NaN NaN 120.900000 ... \n",
"\n",
" engine-size fuel-system bore stroke compression-ratio horsepower \\\n",
"count 205.000000 205 205 205 205.000000 205 \n",
"unique NaN 8 39 37 NaN 60 \n",
"top NaN mpfi 3.62 3.40 NaN 68 \n",
"freq NaN 94 23 20 NaN 19 \n",
"mean 126.907317 NaN NaN NaN 10.142537 NaN \n",
"std 41.642693 NaN NaN NaN 3.972040 NaN \n",
"min 61.000000 NaN NaN NaN 7.000000 NaN \n",
"25% 97.000000 NaN NaN NaN 8.600000 NaN \n",
"50% 120.000000 NaN NaN NaN 9.000000 NaN \n",
"75% 141.000000 NaN NaN NaN 9.400000 NaN \n",
"max 326.000000 NaN NaN NaN 23.000000 NaN \n",
"\n",
" peak-rpm city-mpg highway-mpg price \n",
"count 205 205.000000 205.000000 205 \n",
"unique 24 NaN NaN 187 \n",
"top 5500 NaN NaN ? \n",
"freq 37 NaN NaN 4 \n",
"mean NaN 25.219512 30.751220 NaN \n",
"std NaN 6.542142 6.886443 NaN \n",
"min NaN 13.000000 16.000000 NaN \n",
"25% NaN 19.000000 25.000000 NaN \n",
"50% NaN 24.000000 30.000000 NaN \n",
"75% NaN 30.000000 34.000000 NaN \n",
"max NaN 49.000000 54.000000 NaN \n",
"\n",
"[11 rows x 26 columns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# describe all the columns in \"df\" \n",
"df.describe(include = \"all\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>\n",
"Now, it provides the statistical summary of all the columns, including object-typed attributes.<br>\n",
"We can now see how many unique values, which is the top value and the frequency of top value in the object-typed columns.<br>\n",
"Some values in the table above show as \"NaN\", this is because those numbers are not available regarding a particular column type.<br>\n",
"</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-danger alertdanger\" style=\"margin-top: 20px\">\n",
"<h1> Question #3: </h1>\n",
"\n",
"<p>\n",
"You can select the columns of a data frame by indicating the name of each column, for example, you can select the three columns as follows:\n",
"</p>\n",
"<p>\n",
" <code>dataframe[[' column 1 ',column 2', 'column 3']]</code>\n",
"</p>\n",
"<p>\n",
"Where \"column\" is the name of the column, you can apply the method \".describe()\" to get the statistics of those columns as follows:\n",
"</p>\n",
"<p>\n",
" <code>dataframe[[' column 1 ',column 2', 'column 3'] ].describe()</code>\n",
"</p>\n",
"\n",
"Apply the method to \".describe()\" to the columns 'length' and 'compression-ratio'.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>length</th>\n",
" <th>compression-ratio</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>205.000000</td>\n",
" <td>205.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>174.049268</td>\n",
" <td>10.142537</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>12.337289</td>\n",
" <td>3.972040</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>141.100000</td>\n",
" <td>7.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>166.300000</td>\n",
" <td>8.600000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>173.200000</td>\n",
" <td>9.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>183.100000</td>\n",
" <td>9.400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>208.100000</td>\n",
" <td>23.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" length compression-ratio\n",
"count 205.000000 205.000000\n",
"mean 174.049268 10.142537\n",
"std 12.337289 3.972040\n",
"min 141.100000 7.000000\n",
"25% 166.300000 8.600000\n",
"50% 173.200000 9.000000\n",
"75% 183.100000 9.400000\n",
"max 208.100000 23.000000"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Write your code below and press Shift+Enter to execute \n",
"df[['length','compression-ratio']].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click <b>here</b> for the solution.\n",
"\n",
"<!-- The answer is below:\n",
"\n",
"df[['length', 'compression-ratio']].describe()\n",
"\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2>Info</h2>\n",
"Another method you can use to check your dataset is:"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"dataframe.info"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It provide a concise summary of your DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<bound method DataFrame.info of symboling normalized-losses make fuel-type aspiration \\\n",
"0 3 ? alfa-romero gas std \n",
"1 3 ? alfa-romero gas std \n",
"2 1 ? alfa-romero gas std \n",
"3 2 164 audi gas std \n",
"4 2 164 audi gas std \n",
"5 2 ? audi gas std \n",
"6 1 158 audi gas std \n",
"7 1 ? audi gas std \n",
"8 1 158 audi gas turbo \n",
"9 0 ? audi gas turbo \n",
"10 2 192 bmw gas std \n",
"11 0 192 bmw gas std \n",
"12 0 188 bmw gas std \n",
"13 0 188 bmw gas std \n",
"14 1 ? bmw gas std \n",
"15 0 ? bmw gas std \n",
"16 0 ? bmw gas std \n",
"17 0 ? bmw gas std \n",
"18 2 121 chevrolet gas std \n",
"19 1 98 chevrolet gas std \n",
"20 0 81 chevrolet gas std \n",
"21 1 118 dodge gas std \n",
"22 1 118 dodge gas std \n",
"23 1 118 dodge gas turbo \n",
"24 1 148 dodge gas std \n",
"25 1 148 dodge gas std \n",
"26 1 148 dodge gas std \n",
"27 1 148 dodge gas turbo \n",
"28 -1 110 dodge gas std \n",
"29 3 145 dodge gas turbo \n",
".. ... ... ... ... ... \n",
"175 -1 65 toyota gas std \n",
"176 -1 65 toyota gas std \n",
"177 -1 65 toyota gas std \n",
"178 3 197 toyota gas std \n",
"179 3 197 toyota gas std \n",
"180 -1 90 toyota gas std \n",
"181 -1 ? toyota gas std \n",
"182 2 122 volkswagen diesel std \n",
"183 2 122 volkswagen gas std \n",
"184 2 94 volkswagen diesel std \n",
"185 2 94 volkswagen gas std \n",
"186 2 94 volkswagen gas std \n",
"187 2 94 volkswagen diesel turbo \n",
"188 2 94 volkswagen gas std \n",
"189 3 ? volkswagen gas std \n",
"190 3 256 volkswagen gas std \n",
"191 0 ? volkswagen gas std \n",
"192 0 ? volkswagen diesel turbo \n",
"193 0 ? volkswagen gas std \n",
"194 -2 103 volvo gas std \n",
"195 -1 74 volvo gas std \n",
"196 -2 103 volvo gas std \n",
"197 -1 74 volvo gas std \n",
"198 -2 103 volvo gas turbo \n",
"199 -1 74 volvo gas turbo \n",
"200 -1 95 volvo gas std \n",
"201 -1 95 volvo gas turbo \n",
"202 -1 95 volvo gas std \n",
"203 -1 95 volvo diesel turbo \n",
"204 -1 95 volvo gas turbo \n",
"\n",
" num-of-doors body-style drive-wheels engine-location wheel-base ... \\\n",
"0 two convertible rwd front 88.6 ... \n",
"1 two convertible rwd front 88.6 ... \n",
"2 two hatchback rwd front 94.5 ... \n",
"3 four sedan fwd front 99.8 ... \n",
"4 four sedan 4wd front 99.4 ... \n",
"5 two sedan fwd front 99.8 ... \n",
"6 four sedan fwd front 105.8 ... \n",
"7 four wagon fwd front 105.8 ... \n",
"8 four sedan fwd front 105.8 ... \n",
"9 two hatchback 4wd front 99.5 ... \n",
"10 two sedan rwd front 101.2 ... \n",
"11 four sedan rwd front 101.2 ... \n",
"12 two sedan rwd front 101.2 ... \n",
"13 four sedan rwd front 101.2 ... \n",
"14 four sedan rwd front 103.5 ... \n",
"15 four sedan rwd front 103.5 ... \n",
"16 two sedan rwd front 103.5 ... \n",
"17 four sedan rwd front 110.0 ... \n",
"18 two hatchback fwd front 88.4 ... \n",
"19 two hatchback fwd front 94.5 ... \n",
"20 four sedan fwd front 94.5 ... \n",
"21 two hatchback fwd front 93.7 ... \n",
"22 two hatchback fwd front 93.7 ... \n",
"23 two hatchback fwd front 93.7 ... \n",
"24 four hatchback fwd front 93.7 ... \n",
"25 four sedan fwd front 93.7 ... \n",
"26 four sedan fwd front 93.7 ... \n",
"27 ? sedan fwd front 93.7 ... \n",
"28 four wagon fwd front 103.3 ... \n",
"29 two hatchback fwd front 95.9 ... \n",
".. ... ... ... ... ... ... \n",
"175 four hatchback fwd front 102.4 ... \n",
"176 four sedan fwd front 102.4 ... \n",
"177 four hatchback fwd front 102.4 ... \n",
"178 two hatchback rwd front 102.9 ... \n",
"179 two hatchback rwd front 102.9 ... \n",
"180 four sedan rwd front 104.5 ... \n",
"181 four wagon rwd front 104.5 ... \n",
"182 two sedan fwd front 97.3 ... \n",
"183 two sedan fwd front 97.3 ... \n",
"184 four sedan fwd front 97.3 ... \n",
"185 four sedan fwd front 97.3 ... \n",
"186 four sedan fwd front 97.3 ... \n",
"187 four sedan fwd front 97.3 ... \n",
"188 four sedan fwd front 97.3 ... \n",
"189 two convertible fwd front 94.5 ... \n",
"190 two hatchback fwd front 94.5 ... \n",
"191 four sedan fwd front 100.4 ... \n",
"192 four sedan fwd front 100.4 ... \n",
"193 four wagon fwd front 100.4 ... \n",
"194 four sedan rwd front 104.3 ... \n",
"195 four wagon rwd front 104.3 ... \n",
"196 four sedan rwd front 104.3 ... \n",
"197 four wagon rwd front 104.3 ... \n",
"198 four sedan rwd front 104.3 ... \n",
"199 four wagon rwd front 104.3 ... \n",
"200 four sedan rwd front 109.1 ... \n",
"201 four sedan rwd front 109.1 ... \n",
"202 four sedan rwd front 109.1 ... \n",
"203 four sedan rwd front 109.1 ... \n",
"204 four sedan rwd front 109.1 ... \n",
"\n",
" engine-size fuel-system bore stroke compression-ratio horsepower \\\n",
"0 130 mpfi 3.47 2.68 9.00 111 \n",
"1 130 mpfi 3.47 2.68 9.00 111 \n",
"2 152 mpfi 2.68 3.47 9.00 154 \n",
"3 109 mpfi 3.19 3.40 10.00 102 \n",
"4 136 mpfi 3.19 3.40 8.00 115 \n",
"5 136 mpfi 3.19 3.40 8.50 110 \n",
"6 136 mpfi 3.19 3.40 8.50 110 \n",
"7 136 mpfi 3.19 3.40 8.50 110 \n",
"8 131 mpfi 3.13 3.40 8.30 140 \n",
"9 131 mpfi 3.13 3.40 7.00 160 \n",
"10 108 mpfi 3.50 2.80 8.80 101 \n",
"11 108 mpfi 3.50 2.80 8.80 101 \n",
"12 164 mpfi 3.31 3.19 9.00 121 \n",
"13 164 mpfi 3.31 3.19 9.00 121 \n",
"14 164 mpfi 3.31 3.19 9.00 121 \n",
"15 209 mpfi 3.62 3.39 8.00 182 \n",
"16 209 mpfi 3.62 3.39 8.00 182 \n",
"17 209 mpfi 3.62 3.39 8.00 182 \n",
"18 61 2bbl 2.91 3.03 9.50 48 \n",
"19 90 2bbl 3.03 3.11 9.60 70 \n",
"20 90 2bbl 3.03 3.11 9.60 70 \n",
"21 90 2bbl 2.97 3.23 9.41 68 \n",
"22 90 2bbl 2.97 3.23 9.40 68 \n",
"23 98 mpfi 3.03 3.39 7.60 102 \n",
"24 90 2bbl 2.97 3.23 9.40 68 \n",
"25 90 2bbl 2.97 3.23 9.40 68 \n",
"26 90 2bbl 2.97 3.23 9.40 68 \n",
"27 98 mpfi 3.03 3.39 7.60 102 \n",
"28 122 2bbl 3.34 3.46 8.50 88 \n",
"29 156 mfi 3.60 3.90 7.00 145 \n",
".. ... ... ... ... ... ... \n",
"175 122 mpfi 3.31 3.54 8.70 92 \n",
"176 122 mpfi 3.31 3.54 8.70 92 \n",
"177 122 mpfi 3.31 3.54 8.70 92 \n",
"178 171 mpfi 3.27 3.35 9.30 161 \n",
"179 171 mpfi 3.27 3.35 9.30 161 \n",
"180 171 mpfi 3.27 3.35 9.20 156 \n",
"181 161 mpfi 3.27 3.35 9.20 156 \n",
"182 97 idi 3.01 3.40 23.00 52 \n",
"183 109 mpfi 3.19 3.40 9.00 85 \n",
"184 97 idi 3.01 3.40 23.00 52 \n",
"185 109 mpfi 3.19 3.40 9.00 85 \n",
"186 109 mpfi 3.19 3.40 9.00 85 \n",
"187 97 idi 3.01 3.40 23.00 68 \n",
"188 109 mpfi 3.19 3.40 10.00 100 \n",
"189 109 mpfi 3.19 3.40 8.50 90 \n",
"190 109 mpfi 3.19 3.40 8.50 90 \n",
"191 136 mpfi 3.19 3.40 8.50 110 \n",
"192 97 idi 3.01 3.40 23.00 68 \n",
"193 109 mpfi 3.19 3.40 9.00 88 \n",
"194 141 mpfi 3.78 3.15 9.50 114 \n",
"195 141 mpfi 3.78 3.15 9.50 114 \n",
"196 141 mpfi 3.78 3.15 9.50 114 \n",
"197 141 mpfi 3.78 3.15 9.50 114 \n",
"198 130 mpfi 3.62 3.15 7.50 162 \n",
"199 130 mpfi 3.62 3.15 7.50 162 \n",
"200 141 mpfi 3.78 3.15 9.50 114 \n",
"201 141 mpfi 3.78 3.15 8.70 160 \n",
"202 173 mpfi 3.58 2.87 8.80 134 \n",
"203 145 idi 3.01 3.40 23.00 106 \n",
"204 141 mpfi 3.78 3.15 9.50 114 \n",
"\n",
" peak-rpm city-mpg highway-mpg price \n",
"0 5000 21 27 13495 \n",
"1 5000 21 27 16500 \n",
"2 5000 19 26 16500 \n",
"3 5500 24 30 13950 \n",
"4 5500 18 22 17450 \n",
"5 5500 19 25 15250 \n",
"6 5500 19 25 17710 \n",
"7 5500 19 25 18920 \n",
"8 5500 17 20 23875 \n",
"9 5500 16 22 ? \n",
"10 5800 23 29 16430 \n",
"11 5800 23 29 16925 \n",
"12 4250 21 28 20970 \n",
"13 4250 21 28 21105 \n",
"14 4250 20 25 24565 \n",
"15 5400 16 22 30760 \n",
"16 5400 16 22 41315 \n",
"17 5400 15 20 36880 \n",
"18 5100 47 53 5151 \n",
"19 5400 38 43 6295 \n",
"20 5400 38 43 6575 \n",
"21 5500 37 41 5572 \n",
"22 5500 31 38 6377 \n",
"23 5500 24 30 7957 \n",
"24 5500 31 38 6229 \n",
"25 5500 31 38 6692 \n",
"26 5500 31 38 7609 \n",
"27 5500 24 30 8558 \n",
"28 5000 24 30 8921 \n",
"29 5000 19 24 12964 \n",
".. ... ... ... ... \n",
"175 4200 27 32 9988 \n",
"176 4200 27 32 10898 \n",
"177 4200 27 32 11248 \n",
"178 5200 20 24 16558 \n",
"179 5200 19 24 15998 \n",
"180 5200 20 24 15690 \n",
"181 5200 19 24 15750 \n",
"182 4800 37 46 7775 \n",
"183 5250 27 34 7975 \n",
"184 4800 37 46 7995 \n",
"185 5250 27 34 8195 \n",
"186 5250 27 34 8495 \n",
"187 4500 37 42 9495 \n",
"188 5500 26 32 9995 \n",
"189 5500 24 29 11595 \n",
"190 5500 24 29 9980 \n",
"191 5500 19 24 13295 \n",
"192 4500 33 38 13845 \n",
"193 5500 25 31 12290 \n",
"194 5400 23 28 12940 \n",
"195 5400 23 28 13415 \n",
"196 5400 24 28 15985 \n",
"197 5400 24 28 16515 \n",
"198 5100 17 22 18420 \n",
"199 5100 17 22 18950 \n",
"200 5400 23 28 16845 \n",
"201 5300 19 25 19045 \n",
"202 5500 18 23 21485 \n",
"203 4800 26 27 22470 \n",
"204 5400 19 25 22625 \n",
"\n",
"[205 rows x 26 columns]>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# look at the info of \"df\"\n",
"df.info"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>\n",
"Here we are able to see the information of our dataframe, with the top 30 rows and the bottom 30 rows.\n",
"<br><br>\n",
"And, it also shows us the whole data frame has 205 rows and 26 columns in total.\n",
"</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1>Excellent! You have just completed the Introduction Notebook!</h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
"\n",
" <p><a href=\"https://cocl.us/DA0101EN_NotbookLink_Top_bottom\"><img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/Images/BottomAd.png\" width=\"750\" align=\"center\"></a></p>\n",
"</div>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>About the Authors:</h3>\n",
"\n",
"This notebook was written by <a href=\"https://www.linkedin.com/in/mahdi-noorian-58219234/\" target=\"_blank\">Mahdi Noorian PhD</a>, <a href=\"https://www.linkedin.com/in/joseph-s-50398b136/\" target=\"_blank\">Joseph Santarcangelo</a>, Bahare Talayian, Eric Xiao, Steven Dong, Parizad, Hima Vsudevan and <a href=\"https://www.linkedin.com/in/fiorellawever/\" target=\"_blank\">Fiorella Wenver</a> and <a href=\" https://www.linkedin.com/in/yi-leng-yao-84451275/ \" target=\"_blank\" >Yi Yao</a>.\n",
"\n",
"<p><a href=\"https://www.linkedin.com/in/joseph-s-50398b136/\" target=\"_blank\">Joseph Santarcangelo</a> is a Data Scientist at IBM, and holds a PhD in Electrical Engineering. His research focused on using Machine Learning, Signal Processing, and Computer Vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"<p>Copyright &copy; 2018 IBM Developer Skills Network. This notebook and its source code are released under the terms of the <a href=\"https://cognitiveclass.ai/mit-license/\">MIT License</a>.</p>"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment