Created
March 20, 2019 14:48
Data Analysis with Pandas
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Data Analysis with Pandas\n", | |
"\n", | |
"\n", | |
"Pandas is an open source library for data analysis in Python. In this project, I explore pandas and important data analysis tools of pandas. " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Table of contents\n", | |
"\n", | |
"The contents of this project are divided into various categories which are listed below:-\n", | |
"\n", | |
"\n", | |
"\n", | |
"1.\tIntroduction to Pandas\n", | |
"\n", | |
"2.\tKey features of Pandas\n", | |
"\n", | |
"3.\tAdvantages of Pandas\n", | |
"\n", | |
"4.\tImporting Pandas\n", | |
"\n", | |
"5.\tData structures in Pandas\n", | |
"\n", | |
"6.\tPandas series\n", | |
"\n", | |
"7.\tPandas dataframe\n", | |
"\n", | |
"8.\tPandas panel\n", | |
"\n", | |
"9.\tData import with pandas\n", | |
"\n", | |
"10.\tDataset description\n", | |
"\n", | |
"11.\tExploratory data analysis\n", | |
"\n", | |
"12.\tHandle missing values with pandas\n", | |
"\n", | |
"13.\tIndexing and slicing in pandas\n", | |
"\n", | |
"14.\tIndexing and reindexing in pandas\n", | |
"\n", | |
"15.\tMultiIndex or Advanced indexing\n", | |
"\n", | |
"16.\tSorting in pandas\n", | |
"\n", | |
"17.\tCategorical data in pandas\n", | |
"\n", | |
"18.\tBasic functionality in pandas\n", | |
"\n", | |
"19.\tDescriptive statistics in pandas\n", | |
"\n", | |
"20.\tStatistical functions in pandas\n", | |
"\n", | |
"21.\tWindow functions in pandas\n", | |
"\n", | |
"22.\tAggregations in pandas\n", | |
"\n", | |
"23.\tIteration in pandas\n", | |
"\n", | |
"24.\tFunction application in pandas\n", | |
"\n", | |
"25.\tPandas GroupBy operations\n", | |
"\n", | |
"26.\tPandas merging and joining\n", | |
" \n", | |
"27.\tPandas concatenation operation\n", | |
"\n", | |
"28.\tReshaping by melt and pivot\n", | |
"\n", | |
"29.\tReshaping by stacking and unstacking\n", | |
"\n", | |
"30.\tOptions and customization with pandas\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 1. Introduction to Pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"Today, Python is considered as the most popular programming language for doing data science work. The reason behind this popularity is that Python provides great packages for doing data analysis and visualization work. \n", | |
"\n", | |
"\n", | |
"\n", | |
"**Pandas** is one of those packages that makes analysing data much easier. Pandas is an open source library for data analysis \n", | |
"in Python. It was developed by Wes McKinney in 2008. Over the years, it has become the standard library for data analysis using Python.\n", | |
"\n", | |
"\n", | |
"According to the Wikipedia page on Pandas,\n", | |
"\n", | |
"\n", | |
"\n", | |
"**\"Pandas offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term 'panel data', an econometrics term for data sets that include observations over multiple time periods for the same individuals.\"**\n", | |
"\n", | |
"\n", | |
"\n", | |
"In this project, I explore Pandas and various data analysis tools provided by Pandas.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 2. Key features of Pandas\n", | |
"\n", | |
"\n", | |
"Some key features of Pandas are as follows:-\n", | |
"\n", | |
"\n", | |
"1.\tIt provides tools for reading and writing data from a wide variety of sources such as CSV files, excel files, \n", | |
" databases such as SQL, JSON files.\n", | |
" \n", | |
" \n", | |
"2.\tIt provides different data structures like series, dataframe and panel for data manipulation and indexing.\n", | |
"\n", | |
"\n", | |
"3.\tIt can handle wide variety of data sets in different formats – time series, heterogeneous data, tabular and matrix data.\n", | |
"\n", | |
"\n", | |
"4.\tIt can perform variety of operations on datasets. It includes subsetting, slicing, filtering, merging, joining, groupby, \n", | |
" reordering and reshaping operations.\n", | |
" \n", | |
" \n", | |
"5.\tIt can deal with missing data by either deleting them or filling them with zeros or a suitable test statistic.\n", | |
"\n", | |
"\n", | |
"6.\tIt can be used for parsing and conversion of data.\n", | |
"\n", | |
"\n", | |
"7.\tIt provides data filtration techniques.\n", | |
"\n", | |
"\n", | |
"8.\tIt provides time series functionality – date range generation, frequency conversion, moving window statistics, \n", | |
" data shifting and lagging.\n", | |
" \n", | |
" \n", | |
"9.\tIt integrates well with other Python libraries such as Scikit-learn, statsmodels and SciPy.\n", | |
"\n", | |
"\n", | |
"10.\tIt delivers fast performance. Also, it can be speeded up even more by making use of Cython (C extensions to Python).\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 3. Advantages of Pandas\n", | |
"\n", | |
"\n", | |
"Pandas is a core component of the Python data analysis toolkit. Pandas provides data structure and operations facilities, \n", | |
"which is particularly useful for data analysis. There are various advantages of using Pandas for data analysis. \n", | |
"\n", | |
"\n", | |
"These advantages are as follows:-\n", | |
"\n", | |
"\n", | |
"1.\t**Data representation** \n", | |
"\n", | |
"It represents data in a form that is very much suited for data analysis through its Dataframe and Series data structures.\n", | |
" \n", | |
" \n", | |
"2.\t**Data subsetting and filtering** \n", | |
"\n", | |
"It provides for easy subsetting and filtering of data. It provides procedures that are suited for data analysis.\n", | |
" \n", | |
" \n", | |
"3.\t**Concise and clear code** \n", | |
"\n", | |
"It provides functionality to write clear and concise code. It allows us to focus on the task at hand, rather than have to write tedious code.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 4. Importing Pandas\n", | |
"\n", | |
"\n", | |
"In order to use Pandas in our work, we need to import the Pandas library first. We can import the Pandas library with the following command:-\n", | |
"\n", | |
"\n", | |
"`import pandas`\n", | |
"\n", | |
"\n", | |
"Usually, we import the Pandas library by appending the alias `as pd`. It makes things easier because now instead of writing `pandas.command` we need to write `pd.command`. So, we will import pandas with the following command:-\n", | |
"\n", | |
"\n", | |
"`import pandas as pd`\n", | |
"\n", | |
"\n", | |
"Also, I will import Numpy as well, because it is very useful library for scientific computing with Python. I will import Numpy with the following command:-\n", | |
"\n", | |
"\n", | |
"`import numpy as np`\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# import pandas and numpy\n", | |
"\n", | |
"import pandas as pd\n", | |
"\n", | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 5. Data structures in Pandas\n", | |
"\n", | |
"\n", | |
"Pandas provide easy to use data structures. \n", | |
"\n", | |
"\n", | |
"There are three main data structures in Pandas. They are:-\n", | |
"\n", | |
"\n", | |
"-\tSeries\n", | |
"\n", | |
"-\tDataframe\n", | |
"\n", | |
"-\tPanel\n", | |
"\n", | |
"\n", | |
"These data structures are built on top of Numpy array, which means they are fast. I have described these data structures in the following sections.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 6. Pandas Series\n", | |
"\n", | |
"\n", | |
"\n", | |
"A Pandas Series is a one-dimensional array like structure with homogeneous data. \n", | |
"\n", | |
"\n", | |
"The data can be of any type (integer, string, float, etc.). The axis labels are collectively called index. \n", | |
"\n", | |
"\n", | |
"For example, the following series is a collection of integers 10, 20, 30, 40, 50, 60, 70, 80, 90, 100.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Key Points of Pandas Series\n", | |
"\n", | |
"\n", | |
"-\tHomogeneous data\n", | |
"\n", | |
"-\tSize of series immutable\n", | |
"\n", | |
"-\tValues of data mutable\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Series Constructor\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"A Pandas Series can be created using the following constructor −\n", | |
"\n", | |
"\n", | |
"`pandas. Series (data, index, dtype, copy)`\n", | |
"\n", | |
"\n", | |
"The parameters of the constructor are as follows –\n", | |
"\n", | |
"\n", | |
"-\t**data** - data takes various forms like ndarray, list, dictionary, constants, etc.\n", | |
"\n", | |
"\n", | |
"-\t**index**- index values must be unique, hashable and have the same length as data. The default index is RangeIndex (0, 1, 2,\n", | |
" …, n) if no index is passed.\n", | |
" \n", | |
" \n", | |
"-\t**dtype** - dtype is for data type. If none, data type will be inferred.\n", | |
"\n", | |
"\n", | |
"-\t**copy** - Copy input data. Default value is False.\n" | |
] | |
}, | |
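{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The Series constructor described above can be sketched on a small toy example. This is only an illustration: the values, the labels `a` to `e` and the variable names `s` and `default_index` are assumptions made for this sketch and are not part of the BlackFriday analysis." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Illustrative sketch: build a Series from a list with an explicit index\n", | |
"\n", | |
"import pandas as pd\n", | |
"\n", | |
"s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'], dtype='int64')\n", | |
"\n", | |
"# values are mutable: assign a new value through a label\n", | |
"s['a'] = 15\n", | |
"\n", | |
"# without an index argument, pandas falls back to a RangeIndex starting at 0\n", | |
"default_index = pd.Series([10, 20, 30]).index\n", | |
"\n", | |
"print(s)\n", | |
"print(default_index)" | |
] | |
}, | |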
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 7. Pandas DataFrame\n", | |
"\n", | |
"\n", | |
"\n", | |
"A Dataframe is a two-dimensional data structure. So, data is aligned in a tabular fashion in rows and columns. \n", | |
"Its column types can be heterogeneous: - that is, of varying types. It is similar to structured arrays in NumPy \n", | |
"with mutability added.\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Properties of Dataframe are as follows:-\n", | |
"\n", | |
"\n", | |
"-\tThe dataframe is conceptually analogous to a table or spreadsheet of data. \n", | |
"\n", | |
"\n", | |
"-\tIts columns are of different types – float64, int, bool, and so on.\n", | |
"\n", | |
"\n", | |
"-\tA Dataframe column is a Series structure.\n", | |
"\n", | |
"\n", | |
"-\tIts size is mutable – columns can be inserted and deleted.\n", | |
"\n", | |
"\n", | |
"-\tIt has labelled axes (rows and columns).\n", | |
"\n", | |
"\n", | |
"-\tIt can be thought of as a dictionary of Series structures where both the rows and columns are indexed, \n", | |
" denoted as `index` in the case of rows and `columns` in the case of columns.\n", | |
" \n", | |
" \n", | |
"-\tIt can perform arithmetic operations on rows and columns.\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Dataframe Constructor\n", | |
"\n", | |
"\n", | |
"\n", | |
"Dataframe is the most commonly used data structure in pandas. \n", | |
"\n", | |
"\n", | |
"A pandas Dataframe can be created using the following constructor-\n", | |
"\n", | |
"\n", | |
"`pandas.DataFrame(data, index, columns, dtype, copy)`\n", | |
"\n", | |
"\n", | |
"The constructor accepts many different types of arguments: \n", | |
"\n", | |
"\n", | |
"\tDictionary of 1D ndarrays, lists, dictionaries, or Series structures \n", | |
" \n", | |
"\t2D NumPy array\n", | |
" \n", | |
"\tStructured or record ndarray\n", | |
" \n", | |
"\tSeries structures\n", | |
" \n", | |
"\tAnother DataFrame structure \n", | |
"\n", | |
"\n", | |
"\n", | |
"The parameters description of the constructor is as follows –\n", | |
"\n", | |
"\n", | |
"-**data** - data takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame.\n", | |
"\n", | |
"\n", | |
"-**index**- Index or array-like \n", | |
"\n", | |
"\n", | |
" Index to use for resulting frame. Will default to RangeIndex if no indexing information part of \n", | |
" input data and no index provided\n", | |
" \n", | |
"\n", | |
"-**columns**- Index or array-like\n", | |
"\n", | |
"\n", | |
" Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are \n", | |
" provided.\n", | |
" \n", | |
" \n", | |
"-**dtype** - data type of each column\n", | |
"\n", | |
"\n", | |
"\n", | |
"-**copy** - boolean, default False\n", | |
"\n", | |
"\n", | |
" Copy data from inputs. Only affects DataFrame / 2d ndarray input\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Dataframe Creation\n", | |
"\n", | |
"\n", | |
"A pandas Dataframe can be created using various inputs like −\n", | |
"\n", | |
"\n", | |
"•\tLists\n", | |
"\n", | |
"•\tdict\n", | |
"\n", | |
"•\tSeries\n", | |
"\n", | |
"•\tNumpy ndarrays\n", | |
"\n", | |
"•\tAnother Dataframe\n", | |
"\n" | |
] | |
}, | |
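{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As a minimal sketch of the creation routes listed above, the following cell builds one DataFrame from a dict of lists and another from a 2D NumPy array. The column names, row labels and variable names (`df_from_dict`, `df_from_array`) are made up for this illustration and do not come from the BlackFriday dataset." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Illustrative sketch: two common ways to create a DataFrame\n", | |
"\n", | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"\n", | |
"# from a dict of lists: the keys become the column labels\n", | |
"df_from_dict = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [28, 34]})\n", | |
"\n", | |
"# from a 2D NumPy array, with explicit row and column labels\n", | |
"df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), index=['r1', 'r2', 'r3'], columns=['x', 'y'])\n", | |
"\n", | |
"# each column is a Series, and columns can be inserted and deleted\n", | |
"df_from_dict['senior'] = df_from_dict['age'] > 30\n", | |
"del df_from_dict['senior']\n", | |
"\n", | |
"print(type(df_from_dict['age']))\n", | |
"print(df_from_array)" | |
] | |
}, | |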
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 8. Pandas Panel\n", | |
"\n", | |
"\n", | |
"\n", | |
"A panel is a 3D container of data. \n", | |
"\n", | |
"\n", | |
"The term Panel data is derived from **econometrics** and is partially responsible for the name pandas − pan(el)-da(ta)-s.\n", | |
"\n", | |
"\n", | |
"The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data. \n", | |
"\n", | |
"\n", | |
"They are −\n", | |
"\n", | |
"\n", | |
"`items − axis 0`, each item corresponds to a DataFrame contained inside.\n", | |
"\n", | |
"\n", | |
"`major_axis − axis 1`, it is the index (rows) of each of the DataFrames.\n", | |
"\n", | |
"\n", | |
"`minor_axis − axis 2`, it is the columns of each of the DataFrames." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 9. Data import with Pandas\n", | |
"\n", | |
"\n", | |
"Pandas input output API provides several functions that can be used to import and export various file formats. \n", | |
"\n", | |
"\n", | |
"Below is the list of file formats and the corresponding functions to import these file formats.\n", | |
"\n", | |
"\n", | |
"- Flat files - read_csv(), to_csv()\n", | |
"\n", | |
"- Excel files - read_excel(), ExcelWriter(), to_excel()\n", | |
"\n", | |
"- JSON files - read_json(), to_json()\n", | |
"\n", | |
"- HTML tables - read_html(), to_html()\n", | |
"\n", | |
"- SAS files - read_sas()\n", | |
"\n", | |
"- SQL files - read_sql(), read_sql_query(), read_sql_table(), to_sql()\n", | |
"\n", | |
"- STATA files - read_stata(), to_stata()\n", | |
"\n", | |
"- pickle object - read_pickle(), to_pickle()\n", | |
"\n", | |
"- HDF5 files - read_hdf(), to_hdf()" | |
] | |
}, | |
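{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The reader and writer functions listed above come in pairs. A minimal sketch of a CSV round trip is shown below; it writes to an in-memory buffer instead of a file so that it runs anywhere. The toy data and the names `toy`, `buffer` and `round_trip` are assumptions made for this example." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Illustrative sketch: write a DataFrame as CSV text and read it back\n", | |
"\n", | |
"import io\n", | |
"import pandas as pd\n", | |
"\n", | |
"toy = pd.DataFrame({'id': [1, 2], 'score': [9.5, 7.0]})\n", | |
"\n", | |
"# to_csv() accepts a file path or any writable text buffer\n", | |
"buffer = io.StringIO()\n", | |
"toy.to_csv(buffer, index=False)\n", | |
"\n", | |
"# rewind the buffer and parse the CSV text with read_csv()\n", | |
"buffer.seek(0)\n", | |
"round_trip = pd.read_csv(buffer)\n", | |
"\n", | |
"print(round_trip)" | |
] | |
}, | |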
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In this project, I work with the **BlackFriday dataset** which is a comma-separated values (CSV) file type. In a CSV file type, the data is stored as a comma-separated values where each row is separated by a new line, and each column by a comma (,). Also, in some sections, I create my own dataset to discuss the respective functionality.\n", | |
"\n", | |
"\n", | |
"So, I use the **read_csv()** function to import the file as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"data = 'C:/datasets/BlackFriday.csv'\n", | |
"\n", | |
"df = pd.read_csv(data)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 10. Dataset description\n", | |
"\n", | |
"\n", | |
"I have used **BlackFriday** dataset for this project. The dataset is the sample of the transactions made in a retail store.\n", | |
"\n", | |
"\n", | |
"The dataset contains 12 variables and 537577 instances.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"I have downloaded this dataset from the following url:-\n", | |
"\n", | |
"\n", | |
"https://www.kaggle.com/mehdidag/black-friday" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 11. Exploratory Data Analysis\n", | |
"\n", | |
"\n", | |
"The next step is to conduct exploratory data analysis. " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### check the type of df\n", | |
"\n", | |
"\n", | |
"I have imported the dataset. The next step is to check its type. We can check its type with the following command:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"pandas.core.frame.DataFrame" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"type(df)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that the `df` is the pandas dataframe." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### check shape of dataframe\n", | |
"\n", | |
"\n", | |
"The next step is to check the shape of the dataframe. We can check the shape of the dataframe as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(537577, 12)" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.shape" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"There are 537577 rows and 12 columns in the dataset." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### view the first five rows of the dataframe\n", | |
"\n", | |
"\n", | |
"We can view the first 5 rows of the dataframe with **head()** method as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00069042</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>3</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>8370</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00248942</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00087842</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>1422</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00085442</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>14.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>1057</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>1000002</td>\n", | |
" <td>P00285442</td>\n", | |
" <td>M</td>\n", | |
" <td>55+</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>7969</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"0 1000001 P00069042 F 0-17 10 A \n", | |
"1 1000001 P00248942 F 0-17 10 A \n", | |
"2 1000001 P00087842 F 0-17 10 A \n", | |
"3 1000001 P00085442 F 0-17 10 A \n", | |
"4 1000002 P00285442 M 55+ 16 C \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"0 2 0 3 \n", | |
"1 2 0 1 \n", | |
"2 2 0 12 \n", | |
"3 2 0 12 \n", | |
"4 4+ 0 8 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"0 NaN NaN 8370 \n", | |
"1 6.0 14.0 15200 \n", | |
"2 NaN NaN 1422 \n", | |
"3 14.0 NaN 1057 \n", | |
"4 NaN NaN 7969 " | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### view concise summary of dataframe\n", | |
"\n", | |
"We can view the concise summary of dataframe with **info()** method as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<class 'pandas.core.frame.DataFrame'>\n", | |
"RangeIndex: 537577 entries, 0 to 537576\n", | |
"Data columns (total 12 columns):\n", | |
"User_ID 537577 non-null int64\n", | |
"Product_ID 537577 non-null object\n", | |
"Gender 537577 non-null object\n", | |
"Age 537577 non-null object\n", | |
"Occupation 537577 non-null int64\n", | |
"City_Category 537577 non-null object\n", | |
"Stay_In_Current_City_Years 537577 non-null object\n", | |
"Marital_Status 537577 non-null int64\n", | |
"Product_Category_1 537577 non-null int64\n", | |
"Product_Category_2 370591 non-null float64\n", | |
"Product_Category_3 164278 non-null float64\n", | |
"Purchase 537577 non-null int64\n", | |
"dtypes: float64(2), int64(5), object(5)\n", | |
"memory usage: 49.2+ MB\n" | |
] | |
} | |
], | |
"source": [ | |
"df.info()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 12. Handle missing values with pandas\n", | |
"\n", | |
"\n", | |
"We can check the total number of missing values in each column in the dataset with the following command:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 0\n", | |
"Product_ID 0\n", | |
"Gender 0\n", | |
"Age 0\n", | |
"Occupation 0\n", | |
"City_Category 0\n", | |
"Stay_In_Current_City_Years 0\n", | |
"Marital_Status 0\n", | |
"Product_Category_1 0\n", | |
"Product_Category_2 166986\n", | |
"Product_Category_3 373299\n", | |
"Purchase 0\n", | |
"dtype: int64" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.isnull().sum()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that there are 166986 missing values in `Product_Category_2` and 373299 columns in `Product_Category_3` columns." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### isna() and notna() functions to detect 'NA' values\n", | |
"\n", | |
"\n", | |
"Pandas provides `isna()` and `notna()` functions to detect 'NA' values. \n", | |
"\n", | |
"These are also methods on Series and DataFrame objects.\n", | |
"\n", | |
"Examples of isna() and notna() commands.\n", | |
"\n", | |
"\n", | |
"\n", | |
"detect ‘NA’ values in the dataframe\t\n", | |
"\n", | |
"`df.isna().sum()`\n", | |
"\n", | |
"\n", | |
"\n", | |
"detect ‘NA’ values in a particular column in the dataframe\n", | |
"\n", | |
"\n", | |
"`pd.isna(df[‘col_name’])`\n", | |
"\n", | |
"\n", | |
"`df[‘col_name’].notna()`\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 0\n", | |
"Product_ID 0\n", | |
"Gender 0\n", | |
"Age 0\n", | |
"Occupation 0\n", | |
"City_Category 0\n", | |
"Stay_In_Current_City_Years 0\n", | |
"Marital_Status 0\n", | |
"Product_Category_1 0\n", | |
"Product_Category_2 166986\n", | |
"Product_Category_3 373299\n", | |
"Purchase 0\n", | |
"dtype: int64" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.isna().sum()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that all the missing values are encoded as `NA` values. If the missing values are encoded in different ways we should encode them first." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Encode missing numerical values\n", | |
"\n", | |
"\n", | |
"Missing values are encoded in different ways. They can appear as `NaN`, `NA`, `?`, `zeros`, `xx`, `-1` or a blank space `“ ”`. \n", | |
"We can use various pandas methods to deal with missing values. \n", | |
"\n", | |
"But, pandas always recognize missing values as `NaN`. So, it is essential that we should first convert all the `?`, `zeros`, `xx`, `-1` or `“ ”` to `NaN`. If the missing values isn’t identified as `NaN`, then we have to first convert or replace \n", | |
"such `non NaN` entry with a `NaN`.\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Convert '?' to ‘NaN’\n", | |
"\n", | |
"`df[df == '?'] = np.nan`\n" | |
] | |
}, | |
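{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The conversion described above can also be written with `replace()`, which maps several placeholders to `NaN` in one call. The cell below is a sketch on a made-up frame: the name `raw`, its columns and the choice of `?` and `-1` as placeholders are assumptions for illustration only. When the placeholders are known before loading, the `na_values` argument of `read_csv()` is another option." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Illustrative sketch: convert placeholder values to NaN on a toy frame\n", | |
"\n", | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"\n", | |
"raw = pd.DataFrame({'a': ['1', '?', '3'], 'b': [10, -1, 30]})\n", | |
"\n", | |
"# replace() maps every listed placeholder to NaN\n", | |
"cleaned = raw.replace(['?', -1], np.nan)\n", | |
"\n", | |
"# the placeholders are now counted as missing values\n", | |
"print(cleaned.isna().sum())" | |
] | |
}, | |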
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Handle missing numerical values\n", | |
"\n", | |
"There are several methods to handle missing values. Each method has its own advantages and disadvantages. The choice of the method is subjective and depends on the nature of data and the missing values. In this section, I have listed the most commonly used methods to deal with missing values. They are as follows:-\n", | |
"\n", | |
"\n", | |
"- Drop missing values with dropna() method\n", | |
"\n", | |
"- Fill missing values with zeros\n", | |
"\n", | |
"- Fill missing values with a test statistic\n", | |
"\n", | |
"- Fill missing values backward or forward\n", | |
"\n", | |
"\n", | |
"\n", | |
"In this section, I have fill the missing values with forward or backward filling.\n", | |
"\n", | |
"\n", | |
"The **pad or fill** option fill values forward, while **bfill or backfill** option fill values backward. \n", | |
"\n", | |
"\n", | |
"The following code helps us to achieve this task:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"df = df.fillna(method = 'pad')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Again, we should check whether missing values are removed or not." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 0\n", | |
"Product_ID 0\n", | |
"Gender 0\n", | |
"Age 0\n", | |
"Occupation 0\n", | |
"City_Category 0\n", | |
"Stay_In_Current_City_Years 0\n", | |
"Marital_Status 0\n", | |
"Product_Category_1 0\n", | |
"Product_Category_2 1\n", | |
"Product_Category_3 1\n", | |
"Purchase 0\n", | |
"dtype: int64" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.isnull().sum()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that the `Product_Category_2` and `Product_Category_3` have 1 missing value. We can use the **head()** to check this." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Product_Category_2 Product_Category_3\n", | |
"0 NaN NaN\n", | |
"1 6.0 14.0\n", | |
"2 6.0 14.0\n", | |
"3 14.0 14.0\n", | |
"4 14.0 14.0" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df[['Product_Category_2', 'Product_Category_3']].head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that the first element of each column are NaN. So, in this case **pad** or **fill** option does not work. Here, we\n", | |
"should use **bfill** or **backfill** options as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"df = df.fillna(method = 'backfill')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Again, we should check whether missing values are filled or not." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 0\n", | |
"Product_ID 0\n", | |
"Gender 0\n", | |
"Age 0\n", | |
"Occupation 0\n", | |
"City_Category 0\n", | |
"Stay_In_Current_City_Years 0\n", | |
"Marital_Status 0\n", | |
"Product_Category_1 0\n", | |
"Product_Category_2 0\n", | |
"Product_Category_3 0\n", | |
"Purchase 0\n", | |
"dtype: int64" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.isnull().sum()" | |
] | |
}, | |
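{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The interplay of forward and backward filling used above can be seen on a small toy Series. This is a sketch only: the values and the names `toy`, `forward`, `both` and `with_mean` are assumptions for this example. It also shows filling with a summary statistic, one of the other methods listed earlier." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Illustrative sketch: forward fill, backward fill and mean fill on a toy Series\n", | |
"\n", | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"\n", | |
"toy = pd.Series([np.nan, 6.0, np.nan, 14.0, np.nan])\n", | |
"\n", | |
"# forward fill leaves the leading NaN, because no earlier value exists\n", | |
"forward = toy.ffill()\n", | |
"\n", | |
"# a backward fill afterwards fills the remaining leading NaN\n", | |
"both = toy.ffill().bfill()\n", | |
"\n", | |
"# alternative: fill every NaN with the mean of the observed values\n", | |
"with_mean = toy.fillna(toy.mean())\n", | |
"\n", | |
"print(forward.tolist())\n", | |
"print(both.tolist())\n", | |
"print(with_mean.tolist())" | |
] | |
}, | |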
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Check with ASSERT statement\n", | |
"\n", | |
"\n", | |
"Finally, we should check for missing values programmatically. If we drop or fill missing values, we expect no missing values. \n", | |
"We can write an assert statement to verify this. So, we can use an assert statement to programmatically check that no missing or unexpected '0' value is present. This gives confidence that our code is running properly.\n", | |
"\n", | |
"\n", | |
"Assert statement will return nothing if the value being tested is true and will throw an AssertionError if the value is false.\n", | |
"\n", | |
"\n", | |
"Asserts\n", | |
"\n", | |
"\n", | |
"•\tassert 1 == 1 (return Nothing if the value is True)\n", | |
"\n", | |
"\n", | |
"•\tassert 1 == 2 (return AssertionError if the value is False)\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"#assert that there are no missing values in the dataframe\n", | |
"\n", | |
"assert pd.notnull(df).all().all()\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The above command does not throw any AssertionError. So, it is confirmed that there are no missing values in the dataframe." | |
] | |
}, | |
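{
"cell_type": "markdown",
"metadata": {},
"source": [
"A bare assert only tells us that something failed. As a minimal sketch, the check can be wrapped in a small helper that also reports how many values are missing. The helper name `assert_no_missing` and the toy dataframe are my own assumptions and are not part of the dataset.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# toy frame (hypothetical data) with one missing value\n",
"toy = pd.DataFrame({'x': [1.0, np.nan], 'y': ['a', 'b']})\n",
"\n",
"def assert_no_missing(frame):\n",
"    # hypothetical helper: fail with a message that counts the missing values\n",
"    n_missing = int(frame.isnull().sum().sum())\n",
"    assert n_missing == 0, 'found ' + str(n_missing) + ' missing value(s)'\n",
"\n",
"try:\n",
"    assert_no_missing(toy)\n",
"    outcome = 'passed'\n",
"except AssertionError as err:\n",
"    outcome = str(err)\n",
"\n",
"# after filling, the same check passes silently\n",
"assert_no_missing(toy.ffill())\n",
"\n",
"print(outcome)\n",
"```\n",
"\n",
"The message after the comma is only evaluated when the assertion fails, so the check costs nothing extra on clean data."
]
},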
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 13.\tIndexing and slicing in pandas" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In this section, I will discuss how to slice and dice the data and get the subset of pandas dataframe.\n", | |
"\n", | |
"\n", | |
"Pandas provides three types of multi-axes indexing, which are listed below:-\n",
"\n", | |
"\n", | |
"- 1. **.loc** - Label based\n", | |
"\n", | |
"\n", | |
"- 2. **.iloc** - Integer based\n", | |
"\n", | |
"\n", | |
"- 3. **.ix** - Both Label and Integer based\n", | |
"\n", | |
"\n", | |
"Starting with pandas 0.20.0, the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers, and it was removed entirely in pandas 1.0. So, I will not discuss it here and limit the discussion to the .loc and .iloc indexers."
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Label based indexing using .loc indexer\n", | |
"\n", | |
"\n", | |
"Pandas provides the **.loc indexer** for purely label based indexing. When slicing with .loc, both the start and the stop bound are included. \n",
"Integers are valid labels, but they refer to the label and not the position.\n",
"\n",
"\n",
"The .loc indexer accepts several kinds of input:\n",
"\n", | |
"\n", | |
"- A single scalar label\n", | |
"\n", | |
"- A list of labels\n", | |
"\n", | |
"- A slice object\n", | |
"\n", | |
"- A Boolean array\n", | |
"\n", | |
"\n", | |
"**Syntax**-\n", | |
"\n", | |
"\n", | |
".loc takes two indexers separated by ','. Each one can be a single label, a list of labels or a slice. \n",
"\n",
"\n",
"The first one selects the rows and the second one selects the columns.\n",
"\n", | |
"\n", | |
"Below are the examples of selecting data using .loc indexer:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# make a copy of dataframe\n", | |
"df1 = df.copy()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 1000001\n", | |
"Product_ID P00069042\n", | |
"Gender F\n", | |
"Age 0-17\n", | |
"Occupation 10\n", | |
"City_Category A\n", | |
"Stay_In_Current_City_Years 2\n", | |
"Marital_Status 0\n", | |
"Product_Category_1 3\n", | |
"Product_Category_2 6\n", | |
"Product_Category_3 14\n", | |
"Purchase 8370\n", | |
"Name: 0, dtype: object" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# select first row of dataframe\n", | |
"\n", | |
"df1.loc[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0 8370\n", | |
"1 15200\n", | |
"2 1422\n", | |
"3 1057\n", | |
"4 7969\n", | |
"Name: Purchase, dtype: int64" | |
] | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#select first five rows for a specific column\n", | |
"\n", | |
"df1.loc[:,'Purchase'].head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Similar examples of selecting data using .loc indexer are as follows:-\n", | |
"\n", | |
"\n", | |
"Select all rows for multiple columns, passed as a list\n",
"\n",
"`df1.loc[:,['Age','Occupation']]`\n",
"\n",
"\n",
"\n",
"Select first five rows for multiple columns, with both rows and columns passed as lists\n",
"\n",
"`df1.loc[[0, 1, 2, 3, 4],['Age','Occupation']]`\n",
"\n",
"\n",
"\n",
"Select a range of rows for all columns (both bounds are included, so this returns the rows labelled 0 to 4)\n",
"\n",
"`df1.loc[0:4]`\n",
"\n",
"\n",
"\n",
"With the default integer index, the same result is given by\n",
"\n",
"`df1.head()`"
] | |
}, | |
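{
"cell_type": "markdown",
"metadata": {},
"source": [
"The point that .loc works on labels and not on positions is easier to see when the index does not start at 0. Below is a small sketch on a toy dataframe. The frame and its index labels 10, 20 and 30 are made up for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# toy frame (hypothetical data) with non-default integer labels\n",
"toy = pd.DataFrame(\n",
"    {'Age': ['0-17', '55+', '26-35'], 'Purchase': [8370, 7969, 15227]},\n",
"    index=[10, 20, 30],\n",
")\n",
"\n",
"# a single scalar label: 10 is the row label, not the position\n",
"by_label = toy.loc[10, 'Purchase']\n",
"\n",
"# a slice of labels: both 10 and 20 are included\n",
"inclusive = toy.loc[10:20, ['Age']]\n",
"\n",
"# a boolean array: keep rows where the condition is True\n",
"mask_rows = toy.loc[toy['Purchase'] > 8000]\n",
"\n",
"print(by_label)\n",
"print(inclusive)\n",
"print(mask_rows)\n",
"```\n",
"\n",
"On this frame `toy.loc[0]` would raise a KeyError, because no row carries the label 0."
]
},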
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Integer position based indexing using .iloc indexer\n", | |
"\n", | |
"\n", | |
"Pandas provides **.iloc indexer** for integer position based indexing.\n", | |
"\n", | |
"\n", | |
".iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. \n", | |
".iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. Allowed inputs of .iloc indexer are:-\n", | |
"\n", | |
"\n", | |
"- An integer e.g. 5.\n", | |
"\n", | |
"\n", | |
"- A list or array of integers [4, 3, 0].\n", | |
"\n", | |
"\n", | |
"- A slice object with ints 1:7.\n", | |
"\n", | |
"\n", | |
"- A boolean array." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Rows selection using .iloc indexer\n", | |
"\n", | |
"\n", | |
"Below are the examples of row selection using .iloc indexer\n",
"\n",
"\n",
"#### select first row of dataframe\n",
"\n",
"\n",
"`df1.iloc[0]`\n",
"\n",
"\n",
"\n",
"#### select second row of dataframe\n",
"\n",
"\n",
"`df1.iloc[1]`\n",
"\n",
"\n",
"\n",
"#### select last row of dataframe\n",
"\n",
"\n",
"`df1.iloc[-1]`\n",
"\n",
"\n",
"\n",
"#### select second last row of dataframe\n",
"\n",
"\n",
"`df1.iloc[-2]`"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 1000001\n", | |
"Product_ID P00069042\n", | |
"Gender F\n", | |
"Age 0-17\n", | |
"Occupation 10\n", | |
"City_Category A\n", | |
"Stay_In_Current_City_Years 2\n", | |
"Marital_Status 0\n", | |
"Product_Category_1 3\n", | |
"Product_Category_2 6\n", | |
"Product_Category_3 14\n", | |
"Purchase 8370\n", | |
"Name: 0, dtype: object" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#select first row of dataframe\n", | |
"\n", | |
"df1.iloc[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 1004737\n", | |
"Product_ID P00118242\n", | |
"Gender M\n", | |
"Age 36-45\n", | |
"Occupation 16\n", | |
"City_Category C\n", | |
"Stay_In_Current_City_Years 1\n", | |
"Marital_Status 0\n", | |
"Product_Category_1 5\n", | |
"Product_Category_2 8\n", | |
"Product_Category_3 16\n", | |
"Purchase 6875\n", | |
"Name: 537576, dtype: object" | |
] | |
}, | |
"execution_count": 19, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"#select last row of dataframe\n", | |
"\n", | |
"df1.iloc[-1]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Columns selection using .iloc indexer\n", | |
"\n", | |
"\n", | |
"#### select first column of dataframe\n", | |
"\n", | |
"`df1.iloc[:,0]`\n", | |
"\n", | |
"\n", | |
"\n", | |
"#### select second column of dataframe\n", | |
"\n", | |
"`df1.iloc[:,1]`\n", | |
"\n", | |
"\n", | |
"\n", | |
"#### select last column of dataframe\n", | |
"\n", | |
"`df1.iloc[:,-1]`\n", | |
"\n", | |
"\n", | |
"\n", | |
"#### select second last column of dataframe\n", | |
"\n", | |
"`df1.iloc[:,-2]`" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Multiple rows and columns selection using .iloc indexer\n", | |
"\n", | |
"\n", | |
"\n", | |
"#### select first five rows of dataframe\n", | |
"\n", | |
"`df1.iloc[0:5]`\n", | |
"\n", | |
"\n", | |
"\n", | |
"#### select first five columns of data frame with all rows\n",
"\n",
"`df1.iloc[:, 0:5]`\n",
"\n",
"\n",
"\n",
"\n",
"#### select 1st, 5th and 10th rows with 1st, 4th and 7th columns\n",
"\n",
"`df1.iloc[[0,4,9], [0,3,6]]`\n",
"\n",
"\n",
"\n",
"\n",
"#### select first 5 rows and 6th, 7th, 8th columns of data frame\n",
"\n",
"`df1.iloc[0:5, 5:8]`"
] | |
}, | |
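{
"cell_type": "markdown",
"metadata": {},
"source": [
"The .iloc rules described above (zero-based positions, excluded stop bound, IndexError only for non-slice indexers) can be checked with a short sketch. The toy dataframe and its column names `a`, `b` and `c` are assumptions made for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# toy frame (hypothetical data) with 10 rows and 3 columns\n",
"toy = pd.DataFrame({'a': range(10), 'b': range(10, 20), 'c': range(20, 30)})\n",
"\n",
"# slice of positions: the stop bound is excluded, so this gives columns a and b\n",
"first_two_cols = toy.iloc[:, 0:2]\n",
"\n",
"# lists of positions: 1st, 5th and 10th rows with 1st and 3rd columns\n",
"picked = toy.iloc[[0, 4, 9], [0, 2]]\n",
"\n",
"# an out-of-bounds slice is clipped to the available rows\n",
"clipped = toy.iloc[8:100]\n",
"\n",
"# an out-of-bounds integer raises IndexError\n",
"try:\n",
"    toy.iloc[100]\n",
"    raised = False\n",
"except IndexError:\n",
"    raised = True\n",
"\n",
"print(picked)\n",
"print(len(clipped), raised)\n",
"```"
]
},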
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Indexing first occurrence of maximum or minimum values with idxmax() and idxmin()\n", | |
"\n", | |
"\n", | |
"Pandas provides two methods, **idxmax()** and **idxmin()**, that return the index label of the first occurrence of the maximum or minimum value over the requested axis. NA/null values are skipped by default."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"87440" | |
] | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# get index of first occurrence of maximum Purchase value \n",
"\n", | |
"df1['Purchase'].idxmax()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 1001474\n", | |
"Product_ID P00052842\n", | |
"Gender M\n", | |
"Age 26-35\n", | |
"Occupation 4\n", | |
"City_Category A\n", | |
"Stay_In_Current_City_Years 2\n", | |
"Marital_Status 1\n", | |
"Product_Category_1 10\n", | |
"Product_Category_2 15\n", | |
"Product_Category_3 8\n", | |
"Purchase 23961\n", | |
"Name: 87440, dtype: object" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# get the row with the maximum Purchase value \n", | |
"\n", | |
"df1.loc[df1['Purchase'].idxmax()]" | |
] | |
}, | |
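{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two properties mentioned above, namely that ties resolve to the first occurrence and that NaN values are skipped, can be seen on a small toy Series. The values and labels below are made up for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# toy Series (hypothetical data): the maximum 9.0 appears twice, at 'c' and 'd'\n",
"s = pd.Series([5.0, np.nan, 9.0, 9.0, 1.0], index=['a', 'b', 'c', 'd', 'e'])\n",
"\n",
"top = s.idxmax()   # label of the first maximum\n",
"low = s.idxmin()   # label of the minimum, the NaN at 'b' is skipped\n",
"\n",
"print(top, low)\n",
"```\n",
"\n",
"Both methods return an index label, so the result should be passed to .loc and not to .iloc."
]
},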
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Indexing a single value with .at and .iat\n",
"\n",
"\n",
"Pandas provides the **.at** and **.iat** accessors to get or set a single value for a row and column pair. **.at** works by label and **.iat** works by integer position. Like .loc and .iloc, they are used with square brackets."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"15200" | |
] | |
}, | |
"execution_count": 22, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# get value at row label 1 (the second row) and Purchase column pair\n",
"\n", | |
"df1.at[1, 'Purchase']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"15200" | |
] | |
}, | |
"execution_count": 23, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# get value at position 1 (the second row) and position 11 (the 12th column) pair\n",
"\n", | |
"df1.iat[1, 11]" | |
] | |
}, | |
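{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the dataframe above uses the default index, the row label and the row position happen to be the same number. A small sketch with a different index shows the distinction between the two accessors. The toy dataframe and its labels 100 and 200 are assumptions made for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# toy frame (hypothetical data) with row labels 100 and 200\n",
"toy = pd.DataFrame(\n",
"    {'Age': ['0-17', '55+'], 'Purchase': [8370, 7969]},\n",
"    index=[100, 200],\n",
")\n",
"\n",
"by_label = toy.at[200, 'Purchase']   # row label 200, column label Purchase\n",
"by_pos = toy.iat[1, 1]               # second row, second column\n",
"\n",
"# both accessors can also set a single value\n",
"toy.at[200, 'Purchase'] = 8000\n",
"\n",
"print(by_label, by_pos, toy.iat[1, 1])\n",
"```"
]
},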
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Boolean indexing in pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"**Boolean indexing** is the use of boolean vectors to filter and select the data. The operators for boolean indexing are -\n", | |
"\n", | |
"\n", | |
"- 1. | for or, \n", | |
"\n", | |
"\n", | |
"- 2. & for and,\n", | |
"\n", | |
"\n", | |
"- 3. ~ for not. \n", | |
"\n", | |
"\n", | |
"Each condition must be wrapped in parentheses, because these operators bind more tightly than comparison operators such as == and >. Using a boolean vector to index a Series works exactly as in a NumPy ndarray.\n",
"\n",
"\n",
"Conditional selection with **df.loc[selection]** is the most common way to filter a pandas DataFrame. We pass an array or Series of True/False values to the .loc indexer, and it returns the rows where the value is True. The boolean Series is usually built from conditions on the values of one or more columns.\n",
"\n",
"\n",
"A second argument can be passed to the .loc indexer to select columns for the rows that were kept. The columns are referred to by name and can be given as a single string, a list of column names, or a slice \":\" operation."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# make a copy of dataframe df\n", | |
"\n", | |
"df2 = df.copy()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00069042</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>3</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8370</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00248942</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00087842</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1422</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00085442</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1057</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>1000002</td>\n", | |
" <td>P00285442</td>\n", | |
" <td>M</td>\n", | |
" <td>55+</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7969</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"0 1000001 P00069042 F 0-17 10 A \n", | |
"1 1000001 P00248942 F 0-17 10 A \n", | |
"2 1000001 P00087842 F 0-17 10 A \n", | |
"3 1000001 P00085442 F 0-17 10 A \n", | |
"4 1000002 P00285442 M 55+ 16 C \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"0 2 0 3 \n", | |
"1 2 0 1 \n", | |
"2 2 0 12 \n", | |
"3 2 0 12 \n", | |
"4 4+ 0 8 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"0 6.0 14.0 8370 \n", | |
"1 6.0 14.0 15200 \n", | |
"2 6.0 14.0 1422 \n", | |
"3 14.0 14.0 1057 \n", | |
"4 14.0 14.0 7969 " | |
] | |
}, | |
"execution_count": 25, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df2.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0 8370\n", | |
"Name: Purchase, dtype: int64" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# get the purchase amount with a given user_id and product_id\n", | |
"\n", | |
"df2.loc[((df2['User_ID'] == 1000001) & (df2['Product_ID'] == 'P00069042')), 'Purchase']" | |
] | |
}, | |
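{
"cell_type": "markdown",
"metadata": {},
"source": [
"The example above uses the & operator. The other two operators work in the same way, as the following sketch on a toy dataframe shows. The frame reuses some column names of the dataset, but its values are made up for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# toy frame (hypothetical data)\n",
"toy = pd.DataFrame({\n",
"    'Gender': ['F', 'M', 'M', 'F'],\n",
"    'City_Category': ['A', 'C', 'A', 'B'],\n",
"    'Purchase': [8370, 7969, 15227, 1057],\n",
"})\n",
"\n",
"# | for or: female customers or purchases above 10000\n",
"either = toy.loc[(toy['Gender'] == 'F') | (toy['Purchase'] > 10000)]\n",
"\n",
"# ~ for not: rows outside city category A, keeping two columns only\n",
"not_city_a = toy.loc[~(toy['City_Category'] == 'A'), ['Gender', 'Purchase']]\n",
"\n",
"print(either)\n",
"print(not_city_a)\n",
"```\n",
"\n",
"Without the parentheses around each comparison, Python would try to evaluate `'F' | toy['Purchase']` first and raise an error."
]
},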
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Indexing with isin() method\n", | |
"\n", | |
"\n", | |
"The **isin()** method of a Series returns a boolean vector that is True wherever the Series element exists in the passed list. This allows us to select the rows where one or more columns have the values we want. The same method is available for Index objects. It is useful when we don't know which of the sought labels are in fact present.\n",
"\n",
"\n",
"DataFrame also has an **isin()** method. When calling isin, we pass a set of values as either an array or a dict. If values is an array, isin returns a DataFrame of booleans that has the same shape as the original DataFrame, with True wherever the element is in the sequence of values."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>True</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"0 True True True False True True \n", | |
"1 True False True False True True \n", | |
"2 True False True False True True \n", | |
"3 True False True False True True \n", | |
"4 False False False False False False \n", | |
"5 False False False False False True \n", | |
"6 False False False False False False \n", | |
"7 False False False False False False \n", | |
"8 False False False False False False \n", | |
"9 False False False False False True \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"0 False True True \n", | |
"1 False True False \n", | |
"2 False True False \n", | |
"3 False True False \n", | |
"4 False True False \n", | |
"5 False True False \n", | |
"6 False False False \n", | |
"7 False False False \n", | |
"8 False False False \n", | |
"9 False False False \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"0 True True True \n", | |
"1 True True False \n", | |
"2 True True False \n", | |
"3 True True False \n", | |
"4 True True False \n", | |
"5 True True False \n", | |
"6 False False False \n", | |
"7 False False False \n", | |
"8 False False False \n", | |
"9 False False False " | |
] | |
}, | |
"execution_count": 27, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
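"# note: 0-17 is unquoted here, so Python evaluates it to the integer -17;\n",
"# it therefore never matches the string '0-17' stored in the Age column\n",
"values=[1000001,'P00069042','F',0-17,10,'A',2,0,3,6,14,8370]\n",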
"\n", | |
"df2_indexed=df2.isin(values)\n", | |
"\n", | |
"\n", | |
"df2_indexed.head(10)" | |
] | |
}, | |
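{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list form above checks every column against the same values. As a minimal sketch, the Series form and the dict form of isin look as follows on a toy dataframe. The frame and the chosen values are assumptions made for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# toy frame (hypothetical data)\n",
"toy = pd.DataFrame({\n",
"    'Gender': ['F', 'M', 'F'],\n",
"    'City_Category': ['A', 'A', 'C'],\n",
"})\n",
"\n",
"# Series.isin: boolean vector used directly as a row filter\n",
"by_city = toy[toy['City_Category'].isin(['A', 'B'])]\n",
"\n",
"# DataFrame.isin with a dict: each key is a column name with its own values;\n",
"# columns that are not listed in the dict come back as False\n",
"mask = toy.isin({'Gender': ['F']})\n",
"\n",
"print(by_city)\n",
"print(mask)\n",
"```"
]
},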
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can combine DataFrame's isin with the **any()** and **all()** methods to quickly select subsets of the data that meet given criteria. We can select the rows where each column meets its own criterion as follows:-"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00069042</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>3</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8370</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00248942</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00087842</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1422</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00085442</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1057</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>1000002</td>\n", | |
" <td>P00285442</td>\n", | |
" <td>M</td>\n", | |
" <td>55+</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7969</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>1000003</td>\n", | |
" <td>P00193542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>15</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15227</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00274942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>7871</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00251242</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>5254</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00014542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>3957</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00031342</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>6073</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00145042</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>15665</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00231342</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5378</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00190242</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>4</td>\n", | |
" <td>5.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>2079</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P0096642</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>2</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>13055</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00058442</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>8851</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>1000007</td>\n", | |
" <td>P00036842</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>14.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>11788</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00220442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>8584</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00156442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>9872</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00213742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>9743</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00214442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>5982</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00303442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11927</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00135742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>6</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>16662</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>26</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00039942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5887</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>27</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00161442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>6973</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00078742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5391</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29</th>\n", | |
" <td>1000010</td>\n", | |
" <td>P00085942</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>4.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16352</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>30</th>\n", | |
" <td>1000010</td>\n", | |
" <td>P00118742</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>11.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>8886</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>31</th>\n", | |
" <td>1000010</td>\n", | |
" <td>P00297942</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>11.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5875</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>32</th>\n", | |
" <td>1000010</td>\n", | |
" <td>P00266842</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>11.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>8854</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>33</th>\n", | |
" <td>1000010</td>\n", | |
" <td>P00058342</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>3</td>\n", | |
" <td>4.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>10946</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537547</th>\n", | |
" <td>1004733</td>\n", | |
" <td>P00244042</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>18</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>11543</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537548</th>\n", | |
" <td>1004734</td>\n", | |
" <td>P00111042</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>15</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>20924</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537549</th>\n", | |
" <td>1004734</td>\n", | |
" <td>P00345842</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>13082</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537550</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00278242</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11658</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537551</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00313442</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>6863</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537552</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P0098642</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>6</td>\n", | |
" <td>8.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16415</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537553</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00119342</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>10</td>\n", | |
" <td>13.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>18526</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537554</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00114042</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7099</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537555</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00135142</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>13</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>578</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537556</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00194542</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>2183</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537557</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00175242</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>12724</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537558</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00101942</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>17.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7796</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537559</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00109142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>17.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7770</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537560</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00084842</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5940</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537561</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00078142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7834</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537562</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00146742</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>13.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11508</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537563</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00154642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>13.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>6074</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537564</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00117442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7084</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537565</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00051142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7934</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537566</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00048742</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5350</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537567</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00157542</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1994</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537568</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00250642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>11</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5930</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537569</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00023142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7042</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537570</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00162442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15491</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537571</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00221442</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>11852</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537572</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00193542</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>11664</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537573</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00111142</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>19196</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537574</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00345942</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8043</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537575</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00285842</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>7172</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537576</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00118242</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>6875</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>497270 rows × 12 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"0 1000001 P00069042 F 0-17 10 A \n", | |
"1 1000001 P00248942 F 0-17 10 A \n", | |
"2 1000001 P00087842 F 0-17 10 A \n", | |
"3 1000001 P00085442 F 0-17 10 A \n", | |
"4 1000002 P00285442 M 55+ 16 C \n", | |
"5 1000003 P00193542 M 26-35 15 A \n", | |
"9 1000005 P00274942 M 26-35 20 A \n", | |
"10 1000005 P00251242 M 26-35 20 A \n", | |
"11 1000005 P00014542 M 26-35 20 A \n", | |
"12 1000005 P00031342 M 26-35 20 A \n", | |
"13 1000005 P00145042 M 26-35 20 A \n", | |
"14 1000006 P00231342 F 51-55 9 A \n", | |
"15 1000006 P00190242 F 51-55 9 A \n", | |
"16 1000006 P0096642 F 51-55 9 A \n", | |
"17 1000006 P00058442 F 51-55 9 A \n", | |
"18 1000007 P00036842 M 36-45 1 B \n", | |
"20 1000008 P00220442 M 26-35 12 C \n", | |
"21 1000008 P00156442 M 26-35 12 C \n", | |
"22 1000008 P00213742 M 26-35 12 C \n", | |
"23 1000008 P00214442 M 26-35 12 C \n", | |
"24 1000008 P00303442 M 26-35 12 C \n", | |
"25 1000009 P00135742 M 26-35 17 C \n", | |
"26 1000009 P00039942 M 26-35 17 C \n", | |
"27 1000009 P00161442 M 26-35 17 C \n", | |
"28 1000009 P00078742 M 26-35 17 C \n", | |
"29 1000010 P00085942 F 36-45 1 B \n", | |
"30 1000010 P00118742 F 36-45 1 B \n", | |
"31 1000010 P00297942 F 36-45 1 B \n", | |
"32 1000010 P00266842 F 36-45 1 B \n", | |
"33 1000010 P00058342 F 36-45 1 B \n", | |
"... ... ... ... ... ... ... \n", | |
"537547 1004733 P00244042 M 18-25 18 C \n", | |
"537548 1004734 P00111042 M 51-55 1 B \n", | |
"537549 1004734 P00345842 M 51-55 1 B \n", | |
"537550 1004735 P00278242 M 46-50 3 C \n", | |
"537551 1004735 P00313442 M 46-50 3 C \n", | |
"537552 1004735 P0098642 M 46-50 3 C \n", | |
"537553 1004735 P00119342 M 46-50 3 C \n", | |
"537554 1004735 P00114042 M 46-50 3 C \n", | |
"537555 1004735 P00135142 M 46-50 3 C \n", | |
"537556 1004736 P00194542 M 18-25 20 A \n", | |
"537557 1004736 P00175242 M 18-25 20 A \n", | |
"537558 1004736 P00101942 M 18-25 20 A \n", | |
"537559 1004736 P00109142 M 18-25 20 A \n", | |
"537560 1004736 P00084842 M 18-25 20 A \n", | |
"537561 1004736 P00078142 M 18-25 20 A \n", | |
"537562 1004736 P00146742 M 18-25 20 A \n", | |
"537563 1004736 P00154642 M 18-25 20 A \n", | |
"537564 1004736 P00117442 M 18-25 20 A \n", | |
"537565 1004736 P00051142 M 18-25 20 A \n", | |
"537566 1004736 P00048742 M 18-25 20 A \n", | |
"537567 1004736 P00157542 M 18-25 20 A \n", | |
"537568 1004736 P00250642 M 18-25 20 A \n", | |
"537569 1004736 P00023142 M 18-25 20 A \n", | |
"537570 1004736 P00162442 M 18-25 20 A \n", | |
"537571 1004737 P00221442 M 36-45 16 C \n", | |
"537572 1004737 P00193542 M 36-45 16 C \n", | |
"537573 1004737 P00111142 M 36-45 16 C \n", | |
"537574 1004737 P00345942 M 36-45 16 C \n", | |
"537575 1004737 P00285842 M 36-45 16 C \n", | |
"537576 1004737 P00118242 M 36-45 16 C \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"0 2 0 3 \n", | |
"1 2 0 1 \n", | |
"2 2 0 12 \n", | |
"3 2 0 12 \n", | |
"4 4+ 0 8 \n", | |
"5 3 0 1 \n", | |
"9 1 1 8 \n", | |
"10 1 1 5 \n", | |
"11 1 1 8 \n", | |
"12 1 1 8 \n", | |
"13 1 1 1 \n", | |
"14 1 0 5 \n", | |
"15 1 0 4 \n", | |
"16 1 0 2 \n", | |
"17 1 0 5 \n", | |
"18 1 1 1 \n", | |
"20 4+ 1 5 \n", | |
"21 4+ 1 8 \n", | |
"22 4+ 1 8 \n", | |
"23 4+ 1 8 \n", | |
"24 4+ 1 1 \n", | |
"25 0 0 6 \n", | |
"26 0 0 8 \n", | |
"27 0 0 5 \n", | |
"28 0 0 5 \n", | |
"29 4+ 1 2 \n", | |
"30 4+ 1 5 \n", | |
"31 4+ 1 8 \n", | |
"32 4+ 1 5 \n", | |
"33 4+ 1 3 \n", | |
"... ... ... ... \n", | |
"537547 1 0 1 \n", | |
"537548 1 1 15 \n", | |
"537549 1 1 2 \n", | |
"537550 3 0 1 \n", | |
"537551 3 0 5 \n", | |
"537552 3 0 6 \n", | |
"537553 3 0 10 \n", | |
"537554 3 0 5 \n", | |
"537555 3 0 13 \n", | |
"537556 1 1 8 \n", | |
"537557 1 1 2 \n", | |
"537558 1 1 8 \n", | |
"537559 1 1 8 \n", | |
"537560 1 1 8 \n", | |
"537561 1 1 8 \n", | |
"537562 1 1 1 \n", | |
"537563 1 1 8 \n", | |
"537564 1 1 5 \n", | |
"537565 1 1 8 \n", | |
"537566 1 1 5 \n", | |
"537567 1 1 8 \n", | |
"537568 1 1 11 \n", | |
"537569 1 1 5 \n", | |
"537570 1 1 1 \n", | |
"537571 1 0 1 \n", | |
"537572 1 0 1 \n", | |
"537573 1 0 1 \n", | |
"537574 1 0 8 \n", | |
"537575 1 0 5 \n", | |
"537576 1 0 5 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"0 6.0 14.0 8370 \n", | |
"1 6.0 14.0 15200 \n", | |
"2 6.0 14.0 1422 \n", | |
"3 14.0 14.0 1057 \n", | |
"4 14.0 14.0 7969 \n", | |
"5 2.0 14.0 15227 \n", | |
"9 16.0 17.0 7871 \n", | |
"10 11.0 17.0 5254 \n", | |
"11 11.0 17.0 3957 \n", | |
"12 11.0 17.0 6073 \n", | |
"13 2.0 5.0 15665 \n", | |
"14 8.0 14.0 5378 \n", | |
"15 5.0 14.0 2079 \n", | |
"16 3.0 4.0 13055 \n", | |
"17 14.0 4.0 8851 \n", | |
"18 14.0 16.0 11788 \n", | |
"20 14.0 15.0 8584 \n", | |
"21 14.0 15.0 9872 \n", | |
"22 14.0 15.0 9743 \n", | |
"23 14.0 15.0 5982 \n", | |
"24 8.0 14.0 11927 \n", | |
"25 8.0 14.0 16662 \n", | |
"26 8.0 14.0 5887 \n", | |
"27 14.0 14.0 6973 \n", | |
"28 8.0 14.0 5391 \n", | |
"29 4.0 8.0 16352 \n", | |
"30 11.0 8.0 8886 \n", | |
"31 11.0 8.0 5875 \n", | |
"32 11.0 8.0 8854 \n", | |
"33 4.0 8.0 10946 \n", | |
"... ... ... ... \n", | |
"537547 2.0 15.0 11543 \n", | |
"537548 2.0 15.0 20924 \n", | |
"537549 8.0 14.0 13082 \n", | |
"537550 8.0 14.0 11658 \n", | |
"537551 6.0 8.0 6863 \n", | |
"537552 8.0 8.0 16415 \n", | |
"537553 13.0 8.0 18526 \n", | |
"537554 14.0 8.0 7099 \n", | |
"537555 16.0 8.0 578 \n", | |
"537556 14.0 8.0 2183 \n", | |
"537557 14.0 8.0 12724 \n", | |
"537558 17.0 8.0 7796 \n", | |
"537559 17.0 8.0 7770 \n", | |
"537560 16.0 8.0 5940 \n", | |
"537561 16.0 8.0 7834 \n", | |
"537562 13.0 14.0 11508 \n", | |
"537563 13.0 14.0 6074 \n", | |
"537564 14.0 14.0 7084 \n", | |
"537565 14.0 14.0 7934 \n", | |
"537566 14.0 14.0 5350 \n", | |
"537567 14.0 14.0 1994 \n", | |
"537568 14.0 14.0 5930 \n", | |
"537569 14.0 14.0 7042 \n", | |
"537570 16.0 14.0 15491 \n", | |
"537571 2.0 5.0 11852 \n", | |
"537572 2.0 5.0 11664 \n", | |
"537573 15.0 16.0 19196 \n", | |
"537574 15.0 16.0 8043 \n", | |
"537575 15.0 16.0 7172 \n", | |
"537576 8.0 16.0 6875 \n", | |
"\n", | |
"[497270 rows x 12 columns]" | |
] | |
}, | |
"execution_count": 28, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Flag rows in which at least one column holds a value matched by `values`\n", | |
"row_mask = df2.isin(values).any(axis=1)\n", | |
"\n", | |
"df[row_mask]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### The where() method and masking\n", | |
"\n", | |
"\n", | |
"Selecting values from a Series with a boolean vector returns only the subset of the data for which the vector is `True`. To guarantee that the output \n", | |
"has the same shape as the original data, we can use the `where()` method of Series and DataFrame.\n", | |
"\n", | |
"\n", | |
"Selecting values from a DataFrame with a boolean criterion of the same shape also preserves the shape of the input data.\n", | |
"\n", | |
"\n", | |
"The code below is equivalent to\n", | |
"\n", | |
"\n", | |
"`df2[df2 == 0]`\n", | |
"\n", | |
"\n", | |
"It keeps each value where the condition is true and replaces it with `NaN` where the condition is false. Integer columns that receive `NaN` are upcast to float, which is why `Marital_Status` shows `0.0` in the output. The `mask()` method is the inverse of `where()`: it replaces values where the condition is true.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>0.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>0.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>0.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>0.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>0.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>0.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"0 NaN NaN NaN NaN NaN NaN \n", | |
"1 NaN NaN NaN NaN NaN NaN \n", | |
"2 NaN NaN NaN NaN NaN NaN \n", | |
"3 NaN NaN NaN NaN NaN NaN \n", | |
"4 NaN NaN NaN NaN NaN NaN \n", | |
"5 NaN NaN NaN NaN NaN NaN \n", | |
"6 NaN NaN NaN NaN NaN NaN \n", | |
"7 NaN NaN NaN NaN NaN NaN \n", | |
"8 NaN NaN NaN NaN NaN NaN \n", | |
"9 NaN NaN NaN NaN NaN NaN \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"0 NaN 0.0 NaN \n", | |
"1 NaN 0.0 NaN \n", | |
"2 NaN 0.0 NaN \n", | |
"3 NaN 0.0 NaN \n", | |
"4 NaN 0.0 NaN \n", | |
"5 NaN 0.0 NaN \n", | |
"6 NaN NaN NaN \n", | |
"7 NaN NaN NaN \n", | |
"8 NaN NaN NaN \n", | |
"9 NaN NaN NaN \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"0 NaN NaN NaN \n", | |
"1 NaN NaN NaN \n", | |
"2 NaN NaN NaN \n", | |
"3 NaN NaN NaN \n", | |
"4 NaN NaN NaN \n", | |
"5 NaN NaN NaN \n", | |
"6 NaN NaN NaN \n", | |
"7 NaN NaN NaN \n", | |
"8 NaN NaN NaN \n", | |
"9 NaN NaN NaN " | |
] | |
}, | |
"execution_count": 29, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Keep only the cells equal to 0; every other cell becomes NaN\n", | |
"df2_where = df2.where(df2 == 0)\n", | |
"\n", | |
"df2_where.head(10)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Indexing with query() method\n", | |
"\n", | |
"\n", | |
"DataFrame objects have a **query()** method that selects rows using a boolean expression written as a string. Column names can be referenced directly inside the expression.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>165</th>\n", | |
" <td>1000033</td>\n", | |
" <td>P00111742</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>15</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>17391</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>304</th>\n", | |
" <td>1000053</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>0</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>16.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>3794</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>351</th>\n", | |
" <td>1000058</td>\n", | |
" <td>P00288642</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>2</td>\n", | |
" <td>B</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>12.0</td>\n", | |
" <td>16579</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>387</th>\n", | |
" <td>1000062</td>\n", | |
" <td>P00087242</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>3</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>14</td>\n", | |
" <td>12.0</td>\n", | |
" <td>6.0</td>\n", | |
" <td>11279</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>724</th>\n", | |
" <td>1000137</td>\n", | |
" <td>P00124642</td>\n", | |
" <td>F</td>\n", | |
" <td>46-50</td>\n", | |
" <td>6</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>6.0</td>\n", | |
" <td>16828</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1179</th>\n", | |
" <td>1000195</td>\n", | |
" <td>P00183142</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>15</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>21224</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1356</th>\n", | |
" <td>1000216</td>\n", | |
" <td>P00281942</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>13</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>15</td>\n", | |
" <td>11.0</td>\n", | |
" <td>10.0</td>\n", | |
" <td>12953</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1766</th>\n", | |
" <td>1000284</td>\n", | |
" <td>P00180142</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>B</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>11</td>\n", | |
" <td>5.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>1602</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1806</th>\n", | |
" <td>1000293</td>\n", | |
" <td>P00255842</td>\n", | |
" <td>M</td>\n", | |
" <td>55+</td>\n", | |
" <td>1</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>16875</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1935</th>\n", | |
" <td>1000308</td>\n", | |
" <td>P00031142</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>2</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>12.0</td>\n", | |
" <td>16712</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1967</th>\n", | |
" <td>1000314</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>F</td>\n", | |
" <td>55+</td>\n", | |
" <td>9</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>13.0</td>\n", | |
" <td>2356</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2025</th>\n", | |
" <td>1000324</td>\n", | |
" <td>P00312742</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>17</td>\n", | |
" <td>B</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>15</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>4198</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2174</th>\n", | |
" <td>1000338</td>\n", | |
" <td>P00111742</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>7</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>15</td>\n", | |
" <td>14.0</td>\n", | |
" <td>12.0</td>\n", | |
" <td>8750</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3226</th>\n", | |
" <td>1000529</td>\n", | |
" <td>P00111042</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>15</td>\n", | |
" <td>10.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>17045</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3351</th>\n", | |
" <td>1000544</td>\n", | |
" <td>P0094442</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>12</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>14.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>3079</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3506</th>\n", | |
" <td>1000566</td>\n", | |
" <td>P00111042</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>15</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>16661</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3851</th>\n", | |
" <td>1000637</td>\n", | |
" <td>P00124342</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>12</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>11</td>\n", | |
" <td>9.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>7566</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3944</th>\n", | |
" <td>1000648</td>\n", | |
" <td>P00298442</td>\n", | |
" <td>F</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>12635</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4380</th>\n", | |
" <td>1000720</td>\n", | |
" <td>P00020542</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>0</td>\n", | |
" <td>B</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>897</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5080</th>\n", | |
" <td>1000833</td>\n", | |
" <td>P00288642</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>7</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>15.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>12403</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5113</th>\n", | |
" <td>1000839</td>\n", | |
" <td>P00046942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>0</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>16</td>\n", | |
" <td>15.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16421</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5174</th>\n", | |
" <td>1000850</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>0</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>11.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>2365</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5361</th>\n", | |
" <td>1000875</td>\n", | |
" <td>P00355642</td>\n", | |
" <td>M</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>16</td>\n", | |
" <td>13.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>20963</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5909</th>\n", | |
" <td>1000957</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>C</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>16.0</td>\n", | |
" <td>12.0</td>\n", | |
" <td>3059</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6067</th>\n", | |
" <td>1000984</td>\n", | |
" <td>P00313742</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>16</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>3023</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7916</th>\n", | |
" <td>1001227</td>\n", | |
" <td>P00061442</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>17</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>12767</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8320</th>\n", | |
" <td>1001281</td>\n", | |
" <td>P00276142</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>19</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1717</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8321</th>\n", | |
" <td>1001283</td>\n", | |
" <td>P0097142</td>\n", | |
" <td>F</td>\n", | |
" <td>18-25</td>\n", | |
" <td>1</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1759</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8772</th>\n", | |
" <td>1001340</td>\n", | |
" <td>P00001842</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>7</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>16</td>\n", | |
" <td>11.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>4107</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8882</th>\n", | |
" <td>1001357</td>\n", | |
" <td>P00326742</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>0</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>11</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6076</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>529816</th>\n", | |
" <td>1003622</td>\n", | |
" <td>P00113642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>6.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>8128</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>530284</th>\n", | |
" <td>1003688</td>\n", | |
" <td>P00119242</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>6</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>16.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1590</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>530289</th>\n", | |
" <td>1003688</td>\n", | |
" <td>P00115842</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>6</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>16312</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>530711</th>\n", | |
" <td>1003747</td>\n", | |
" <td>P00355642</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>13</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>16868</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>530889</th>\n", | |
" <td>1003769</td>\n", | |
" <td>P00115842</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>15</td>\n", | |
" <td>B</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>16</td>\n", | |
" <td>10.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>20568</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>530897</th>\n", | |
" <td>1003769</td>\n", | |
" <td>P00136442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>15</td>\n", | |
" <td>B</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>14</td>\n", | |
" <td>13.0</td>\n", | |
" <td>10.0</td>\n", | |
" <td>15227</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>531004</th>\n", | |
" <td>1003780</td>\n", | |
" <td>P00111042</td>\n", | |
" <td>M</td>\n", | |
" <td>0-17</td>\n", | |
" <td>0</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>15</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>21551</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>531339</th>\n", | |
" <td>1003824</td>\n", | |
" <td>P00271442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>16.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>3107</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>531533</th>\n", | |
" <td>1003841</td>\n", | |
" <td>P00152542</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>18</td>\n", | |
" <td>A</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>16</td>\n", | |
" <td>15.0</td>\n", | |
" <td>13.0</td>\n", | |
" <td>12791</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>531540</th>\n", | |
" <td>1003841</td>\n", | |
" <td>P00246842</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>18</td>\n", | |
" <td>A</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>17</td>\n", | |
" <td>16.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>13201</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>531542</th>\n", | |
" <td>1003841</td>\n", | |
" <td>P00313742</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>18</td>\n", | |
" <td>A</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>3786</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>531543</th>\n", | |
" <td>1003841</td>\n", | |
" <td>P00112042</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>18</td>\n", | |
" <td>A</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>3869</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>531998</th>\n", | |
" <td>1003910</td>\n", | |
" <td>P00116742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>11</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>5935</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>532279</th>\n", | |
" <td>1003958</td>\n", | |
" <td>P00174842</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>17</td>\n", | |
" <td>11.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>13231</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>532582</th>\n", | |
" <td>1004001</td>\n", | |
" <td>P00298742</td>\n", | |
" <td>F</td>\n", | |
" <td>26-35</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>14</td>\n", | |
" <td>11.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14737</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>532784</th>\n", | |
" <td>1004024</td>\n", | |
" <td>P00002342</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>5</td>\n", | |
" <td>B</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>4610</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>533119</th>\n", | |
" <td>1004064</td>\n", | |
" <td>P00119242</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>0</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>15.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>2346</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>533703</th>\n", | |
" <td>1004150</td>\n", | |
" <td>P00327642</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>0</td>\n", | |
" <td>B</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>3825</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>533880</th>\n", | |
" <td>1004187</td>\n", | |
" <td>P00111742</td>\n", | |
" <td>F</td>\n", | |
" <td>55+</td>\n", | |
" <td>20</td>\n", | |
" <td>C</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>15</td>\n", | |
" <td>14.0</td>\n", | |
" <td>12.0</td>\n", | |
" <td>16649</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>533881</th>\n", | |
" <td>1004187</td>\n", | |
" <td>P00100542</td>\n", | |
" <td>F</td>\n", | |
" <td>55+</td>\n", | |
" <td>20</td>\n", | |
" <td>C</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>14.0</td>\n", | |
" <td>12.0</td>\n", | |
" <td>20307</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>534320</th>\n", | |
" <td>1004271</td>\n", | |
" <td>P00344142</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>1565</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>535677</th>\n", | |
" <td>1004452</td>\n", | |
" <td>P00296242</td>\n", | |
" <td>F</td>\n", | |
" <td>26-35</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>6.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>8065</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>535679</th>\n", | |
" <td>1004452</td>\n", | |
" <td>P00263542</td>\n", | |
" <td>F</td>\n", | |
" <td>26-35</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>16</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>12547</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>535813</th>\n", | |
" <td>1004472</td>\n", | |
" <td>P00307142</td>\n", | |
" <td>F</td>\n", | |
" <td>46-50</td>\n", | |
" <td>16</td>\n", | |
" <td>B</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>6.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6045</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>536467</th>\n", | |
" <td>1004560</td>\n", | |
" <td>P00054042</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>0</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>3136</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>536596</th>\n", | |
" <td>1004579</td>\n", | |
" <td>P00112042</td>\n", | |
" <td>F</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>3074</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>536944</th>\n", | |
" <td>1004644</td>\n", | |
" <td>P00327642</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>15.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>3127</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537361</th>\n", | |
" <td>1004708</td>\n", | |
" <td>P00344142</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>0</td>\n", | |
" <td>B</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>14.0</td>\n", | |
" <td>6.0</td>\n", | |
" <td>2351</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537410</th>\n", | |
" <td>1004725</td>\n", | |
" <td>P00123742</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>5</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>11</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>4690</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537417</th>\n", | |
" <td>1004725</td>\n", | |
" <td>P00349142</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>5</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>11</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>4645</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>2243 rows × 12 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"165 1000033 P00111742 M 46-50 3 A \n", | |
"304 1000053 P00117542 M 26-35 0 B \n", | |
"351 1000058 P00288642 M 26-35 2 B \n", | |
"387 1000062 P00087242 F 36-45 3 A \n", | |
"724 1000137 P00124642 F 46-50 6 C \n", | |
"1179 1000195 P00183142 M 26-35 12 B \n", | |
"1356 1000216 P00281942 M 46-50 13 B \n", | |
"1766 1000284 P00180142 M 26-35 12 B \n", | |
"1806 1000293 P00255842 M 55+ 1 C \n", | |
"1935 1000308 P00031142 M 26-35 2 A \n", | |
"1967 1000314 P00117542 F 55+ 9 C \n", | |
"2025 1000324 P00312742 M 36-45 17 B \n", | |
"2174 1000338 P00111742 M 36-45 7 A \n", | |
"3226 1000529 P00111042 M 36-45 12 C \n", | |
"3351 1000544 P0094442 M 36-45 12 B \n", | |
"3506 1000566 P00111042 M 26-35 17 A \n", | |
"3851 1000637 P00124342 M 36-45 12 B \n", | |
"3944 1000648 P00298442 F 26-35 12 B \n", | |
"4380 1000720 P00020542 M 18-25 0 B \n", | |
"5080 1000833 P00288642 M 36-45 7 C \n", | |
"5113 1000839 P00046942 M 26-35 0 A \n", | |
"5174 1000850 P00117542 M 36-45 0 A \n", | |
"5361 1000875 P00355642 M 0-17 10 C \n", | |
"5909 1000957 P00117542 M 36-45 1 C \n", | |
"6067 1000984 P00313742 M 51-55 16 A \n", | |
"7916 1001227 P00061442 M 51-55 1 B \n", | |
"8320 1001281 P00276142 M 26-35 19 C \n", | |
"8321 1001283 P0097142 F 18-25 1 C \n", | |
"8772 1001340 P00001842 M 26-35 7 A \n", | |
"8882 1001357 P00326742 M 46-50 0 C \n", | |
"... ... ... ... ... ... ... \n", | |
"529816 1003622 P00113642 M 18-25 1 B \n", | |
"530284 1003688 P00119242 M 18-25 6 B \n", | |
"530289 1003688 P00115842 M 18-25 6 B \n", | |
"530711 1003747 P00355642 M 46-50 13 C \n", | |
"530889 1003769 P00115842 M 26-35 15 B \n", | |
"530897 1003769 P00136442 M 26-35 15 B \n", | |
"531004 1003780 P00111042 M 0-17 0 C \n", | |
"531339 1003824 P00271442 M 26-35 17 A \n", | |
"531533 1003841 P00152542 M 46-50 18 A \n", | |
"531540 1003841 P00246842 M 46-50 18 A \n", | |
"531542 1003841 P00313742 M 46-50 18 A \n", | |
"531543 1003841 P00112042 M 46-50 18 A \n", | |
"531998 1003910 P00116742 M 26-35 20 A \n", | |
"532279 1003958 P00174842 M 46-50 7 C \n", | |
"532582 1004001 P00298742 F 26-35 1 B \n", | |
"532784 1004024 P00002342 M 26-35 5 B \n", | |
"533119 1004064 P00119242 M 46-50 0 A \n", | |
"533703 1004150 P00327642 M 26-35 0 B \n", | |
"533880 1004187 P00111742 F 55+ 20 C \n", | |
"533881 1004187 P00100542 F 55+ 20 C \n", | |
"534320 1004271 P00344142 M 36-45 7 B \n", | |
"535677 1004452 P00296242 F 26-35 7 B \n", | |
"535679 1004452 P00263542 F 26-35 7 B \n", | |
"535813 1004472 P00307142 F 46-50 16 B \n", | |
"536467 1004560 P00054042 M 26-35 0 C \n", | |
"536596 1004579 P00112042 F 18-25 4 B \n", | |
"536944 1004644 P00327642 M 51-55 1 C \n", | |
"537361 1004708 P00344142 M 26-35 0 B \n", | |
"537410 1004725 P00123742 M 36-45 5 A \n", | |
"537417 1004725 P00349142 M 36-45 5 A \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"165 1 1 15 \n", | |
"304 1 0 18 \n", | |
"351 3 0 16 \n", | |
"387 1 0 14 \n", | |
"724 4+ 1 16 \n", | |
"1179 4+ 1 15 \n", | |
"1356 1 0 15 \n", | |
"1766 3 1 11 \n", | |
"1806 1 1 16 \n", | |
"1935 3 1 16 \n", | |
"1967 1 0 18 \n", | |
"2025 0 1 15 \n", | |
"2174 1 0 15 \n", | |
"3226 1 0 15 \n", | |
"3351 2 0 18 \n", | |
"3506 3 0 15 \n", | |
"3851 2 1 11 \n", | |
"3944 1 1 16 \n", | |
"4380 0 0 18 \n", | |
"5080 1 1 16 \n", | |
"5113 2 0 16 \n", | |
"5174 3 1 18 \n", | |
"5361 4+ 0 16 \n", | |
"5909 2 1 18 \n", | |
"6067 2 1 18 \n", | |
"7916 2 0 17 \n", | |
"8320 1 0 12 \n", | |
"8321 3 0 12 \n", | |
"8772 2 0 16 \n", | |
"8882 1 1 11 \n", | |
"... ... ... ... \n", | |
"529816 2 1 8 \n", | |
"530284 1 0 18 \n", | |
"530289 1 0 16 \n", | |
"530711 1 1 16 \n", | |
"530889 0 0 16 \n", | |
"530897 0 0 14 \n", | |
"531004 0 0 15 \n", | |
"531339 3 1 18 \n", | |
"531533 4+ 0 16 \n", | |
"531540 4+ 0 17 \n", | |
"531542 4+ 0 18 \n", | |
"531543 4+ 0 18 \n", | |
"531998 2 0 11 \n", | |
"532279 1 0 17 \n", | |
"532582 1 1 14 \n", | |
"532784 3 1 16 \n", | |
"533119 1 1 18 \n", | |
"533703 3 0 18 \n", | |
"533880 2 1 15 \n", | |
"533881 2 1 16 \n", | |
"534320 2 0 18 \n", | |
"535677 2 1 8 \n", | |
"535679 2 1 16 \n", | |
"535813 0 1 8 \n", | |
"536467 1 0 18 \n", | |
"536596 1 1 18 \n", | |
"536944 4+ 0 18 \n", | |
"537361 3 0 18 \n", | |
"537410 2 0 11 \n", | |
"537417 2 0 11 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"165 8.0 5.0 17391 \n", | |
"304 16.0 5.0 3794 \n", | |
"351 14.0 12.0 16579 \n", | |
"387 12.0 6.0 11279 \n", | |
"724 14.0 6.0 16828 \n", | |
"1179 14.0 5.0 21224 \n", | |
"1356 11.0 10.0 12953 \n", | |
"1766 5.0 4.0 1602 \n", | |
"1806 14.0 9.0 16875 \n", | |
"1935 14.0 12.0 16712 \n", | |
"1967 17.0 13.0 2356 \n", | |
"2025 14.0 5.0 4198 \n", | |
"2174 14.0 12.0 8750 \n", | |
"3226 10.0 9.0 17045 \n", | |
"3351 14.0 4.0 3079 \n", | |
"3506 8.0 5.0 16661 \n", | |
"3851 9.0 5.0 7566 \n", | |
"3944 14.0 8.0 12635 \n", | |
"4380 17.0 15.0 897 \n", | |
"5080 15.0 5.0 12403 \n", | |
"5113 15.0 8.0 16421 \n", | |
"5174 11.0 8.0 2365 \n", | |
"5361 13.0 5.0 20963 \n", | |
"5909 16.0 12.0 3059 \n", | |
"6067 8.0 5.0 3023 \n", | |
"7916 16.0 8.0 12767 \n", | |
"8320 8.0 5.0 1717 \n", | |
"8321 8.0 5.0 1759 \n", | |
"8772 11.0 8.0 4107 \n", | |
"8882 8.0 5.0 6076 \n", | |
"... ... ... ... \n", | |
"529816 6.0 5.0 8128 \n", | |
"530284 16.0 5.0 1590 \n", | |
"530289 14.0 5.0 16312 \n", | |
"530711 14.0 5.0 16868 \n", | |
"530889 10.0 8.0 20568 \n", | |
"530897 13.0 10.0 15227 \n", | |
"531004 14.0 8.0 21551 \n", | |
"531339 16.0 5.0 3107 \n", | |
"531533 15.0 13.0 12791 \n", | |
"531540 16.0 15.0 13201 \n", | |
"531542 17.0 15.0 3786 \n", | |
"531543 17.0 15.0 3869 \n", | |
"531998 8.0 5.0 5935 \n", | |
"532279 11.0 5.0 13231 \n", | |
"532582 11.0 8.0 14737 \n", | |
"532784 14.0 5.0 4610 \n", | |
"533119 15.0 4.0 2346 \n", | |
"533703 17.0 14.0 3825 \n", | |
"533880 14.0 12.0 16649 \n", | |
"533881 14.0 12.0 20307 \n", | |
"534320 17.0 9.0 1565 \n", | |
"535677 6.0 5.0 8065 \n", | |
"535679 8.0 5.0 12547 \n", | |
"535813 6.0 5.0 6045 \n", | |
"536467 16.0 8.0 3136 \n", | |
"536596 17.0 14.0 3074 \n", | |
"536944 15.0 8.0 3127 \n", | |
"537361 14.0 6.0 2351 \n", | |
"537410 8.0 5.0 4690 \n", | |
"537417 8.0 5.0 4645 \n", | |
"\n", | |
"[2243 rows x 12 columns]" | |
] | |
}, | |
"execution_count": 30, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df2.query('(Product_Category_1 > Product_Category_2) & (Product_Category_2 > Product_Category_3)')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 14. Indexing and reindexing in pandas\n", | |
"\n", | |
"\n", | |
"Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match \n", | |
"a given set of labels along a particular axis.\n", | |
"\n", | |
"\n", | |
"Reindexing can accomplish several operations, such as:-\n", | |
"\n", | |
"\n", | |
"- Reorder the existing data to match a new set of labels.\n", | |
"\n", | |
"\n", | |
"- Insert missing value (NA) markers in label locations where no data for the label existed." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Create a new dataframe\n", | |
"\n", | |
"\n", | |
"First of all, I will create a new dataframe as follows:- " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Place</th>\n", | |
" <th>Time</th>\n", | |
" <th>Food</th>\n", | |
" <th>Price($)</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>Home</td>\n", | |
" <td>Lunch</td>\n", | |
" <td>Soup</td>\n", | |
" <td>10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Home</td>\n", | |
" <td>Dinner</td>\n", | |
" <td>Rice</td>\n", | |
" <td>20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Hotel</td>\n", | |
" <td>Lunch</td>\n", | |
" <td>Soup</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>Hotel</td>\n", | |
" <td>Dinner</td>\n", | |
" <td>Chapati</td>\n", | |
" <td>40</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Place Time Food Price($)\n", | |
"0 Home Lunch Soup 10\n", | |
"1 Home Dinner Rice 20\n", | |
"2 Hotel Lunch Soup 30\n", | |
"3 Hotel Dinner Chapati 40" | |
] | |
}, | |
"execution_count": 31, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# let's create a new dataframe \n", | |
"\n", | |
"food = pd.DataFrame({'Place':['Home', 'Home', 'Hotel', 'Hotel'],\n", | |
" 'Time': ['Lunch', 'Dinner', 'Lunch', 'Dinner'],\n", | |
" 'Food':['Soup', 'Rice', 'Soup', 'Chapati'],\n", | |
" 'Price($)':[10, 20, 30, 40]})\n", | |
"\n", | |
"food" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Set an index \n", | |
"\n", | |
"\n", | |
"DataFrame has a **set_index()** method which takes a column name (for a regular Index) or a list of column names (for a MultiIndex). This method sets the dataframe index using existing columns. By default it returns a new DataFrame and leaves the original unchanged.\n", | |
"\n", | |
"I will create a new, re-indexed DataFrame with the **set_index()** method as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Time</th>\n", | |
" <th>Food</th>\n", | |
" <th>Price($)</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Place</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>Home</th>\n", | |
" <td>Lunch</td>\n", | |
" <td>Soup</td>\n", | |
" <td>10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Home</th>\n", | |
" <td>Dinner</td>\n", | |
" <td>Rice</td>\n", | |
" <td>20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Hotel</th>\n", | |
" <td>Lunch</td>\n", | |
" <td>Soup</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Hotel</th>\n", | |
" <td>Dinner</td>\n", | |
" <td>Chapati</td>\n", | |
" <td>40</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Time Food Price($)\n", | |
"Place \n", | |
"Home Lunch Soup 10\n", | |
"Home Dinner Rice 20\n", | |
"Hotel Lunch Soup 30\n", | |
"Hotel Dinner Chapati 40" | |
] | |
}, | |
"execution_count": 32, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# a single column name gives a regular Index; food itself is not modified\n", | |
"food_indexed1 = food.set_index('Place')\n", | |
"\n", | |
"food_indexed1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th>Food</th>\n", | |
" <th>Price($)</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Place</th>\n", | |
" <th>Time</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">Home</th>\n", | |
" <th>Lunch</th>\n", | |
" <td>Soup</td>\n", | |
" <td>10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Dinner</th>\n", | |
" <td>Rice</td>\n", | |
" <td>20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">Hotel</th>\n", | |
" <th>Lunch</th>\n", | |
" <td>Soup</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Dinner</th>\n", | |
" <td>Chapati</td>\n", | |
" <td>40</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Food Price($)\n", | |
"Place Time \n", | |
"Home Lunch Soup 10\n", | |
" Dinner Rice 20\n", | |
"Hotel Lunch Soup 30\n", | |
" Dinner Chapati 40" | |
] | |
}, | |
"execution_count": 33, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# a list of column names gives a MultiIndex with levels Place and Time\n", | |
"food_indexed2 = food.set_index(['Place', 'Time'])\n", | |
"\n", | |
"food_indexed2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Reset the index\n", | |
"\n", | |
"\n", | |
"The **reset_index()** method transfers the index values into the DataFrame’s columns and sets a simple integer index. This is the inverse operation of **set_index()**." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Place</th>\n", | |
" <th>Time</th>\n", | |
" <th>Food</th>\n", | |
" <th>Price($)</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>Home</td>\n", | |
" <td>Lunch</td>\n", | |
" <td>Soup</td>\n", | |
" <td>10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Home</td>\n", | |
" <td>Dinner</td>\n", | |
" <td>Rice</td>\n", | |
" <td>20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Hotel</td>\n", | |
" <td>Lunch</td>\n", | |
" <td>Soup</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>Hotel</td>\n", | |
" <td>Dinner</td>\n", | |
" <td>Chapati</td>\n", | |
" <td>40</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Place Time Food Price($)\n", | |
"0 Home Lunch Soup 10\n", | |
"1 Home Dinner Rice 20\n", | |
"2 Hotel Lunch Soup 30\n", | |
"3 Hotel Dinner Chapati 40" | |
] | |
}, | |
"execution_count": 34, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"food_indexed2.reset_index()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 15. MultiIndex or Advanced indexing \n", | |
"\n", | |
"\n", | |
"In this section, I will explore indexing with a MultiIndex and other advanced indexing strategies.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Hierarchical indexing or MultiIndex\n", | |
"\n", | |
"\n", | |
"\n", | |
"The MultiIndex object is the hierarchical analogue of the standard Index object, which stores the axis labels in pandas objects. A MultiIndex can be thought of as an array of tuples where each tuple is unique. It can be created from a list of arrays (using **MultiIndex.from_arrays()**), an array of tuples (using **MultiIndex.from_tuples()**), a crossed set of iterables (using **MultiIndex.from_product()**), or a DataFrame (using **MultiIndex.from_frame()**). The Index constructor will attempt to return a MultiIndex when it is passed a list of tuples.\n", | |
"\n", | |
"\n", | |
"To demonstrate the concept of hierarchical or multiple indexing, first I will create a hypothetical dataframe as follows:- " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Items</th>\n", | |
" <th>Mode</th>\n", | |
" <th>Price</th>\n", | |
" <th>Profit</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>books</td>\n", | |
" <td>online</td>\n", | |
" <td>200</td>\n", | |
" <td>50</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>books</td>\n", | |
" <td>retail</td>\n", | |
" <td>250</td>\n", | |
" <td>75</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>toys</td>\n", | |
" <td>online</td>\n", | |
" <td>100</td>\n", | |
" <td>20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>toys</td>\n", | |
" <td>retail</td>\n", | |
" <td>140</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>watches</td>\n", | |
" <td>online</td>\n", | |
" <td>500</td>\n", | |
" <td>100</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>watches</td>\n", | |
" <td>retail</td>\n", | |
" <td>600</td>\n", | |
" <td>150</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>computers</td>\n", | |
" <td>online</td>\n", | |
" <td>1000</td>\n", | |
" <td>200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>computers</td>\n", | |
" <td>retail</td>\n", | |
" <td>1200</td>\n", | |
" <td>300</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>laptops</td>\n", | |
" <td>online</td>\n", | |
" <td>1100</td>\n", | |
" <td>400</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>laptops</td>\n", | |
" <td>retail</td>\n", | |
" <td>1400</td>\n", | |
" <td>500</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>smartphones</td>\n", | |
" <td>online</td>\n", | |
" <td>600</td>\n", | |
" <td>200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>smartphones</td>\n", | |
" <td>retail</td>\n", | |
" <td>800</td>\n", | |
" <td>250</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Items Mode Price Profit\n", | |
"0 books online 200 50\n", | |
"1 books retail 250 75\n", | |
"2 toys online 100 20\n", | |
"3 toys retail 140 30\n", | |
"4 watches online 500 100\n", | |
"5 watches retail 600 150\n", | |
"6 computers online 1000 200\n", | |
"7 computers retail 1200 300\n", | |
"8 laptops online 1100 400\n", | |
"9 laptops retail 1400 500\n", | |
"10 smartphones online 600 200\n", | |
"11 smartphones retail 800 250" | |
] | |
}, | |
"execution_count": 35, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sales=pd.DataFrame([['books','online', 200, 50],['books','retail', 250, 75], \n", | |
" ['toys','online', 100, 20],['toys','retail', 140, 30],\n", | |
" ['watches','online', 500, 100],['watches','retail', 600, 150],\n", | |
" ['computers','online', 1000, 200],['computers','retail', 1200, 300],\n", | |
" ['laptops','online', 1100, 400],['laptops','retail', 1400, 500],\n", | |
" ['smartphones','online', 600, 200],['smartphones','retail', 800, 250]],\n", | |
" columns=['Items', 'Mode', 'Price', 'Profit'])\n", | |
"\n", | |
"\n", | |
"sales" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Create the hierarchical index in pandas\n", | |
"\n", | |
"\n", | |
"We can create a hierarchical index in pandas using the **set_index()** function which is used for indexing. First the data is indexed on `Items` and then on `Mode` column as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th>Price</th>\n", | |
" <th>Profit</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Items</th>\n", | |
" <th>Mode</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">books</th>\n", | |
" <th>online</th>\n", | |
" <td>200</td>\n", | |
" <td>50</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <td>250</td>\n", | |
" <td>75</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">toys</th>\n", | |
" <th>online</th>\n", | |
" <td>100</td>\n", | |
" <td>20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <td>140</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">watches</th>\n", | |
" <th>online</th>\n", | |
" <td>500</td>\n", | |
" <td>100</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <td>600</td>\n", | |
" <td>150</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">computers</th>\n", | |
" <th>online</th>\n", | |
" <td>1000</td>\n", | |
" <td>200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <td>1200</td>\n", | |
" <td>300</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">laptops</th>\n", | |
" <th>online</th>\n", | |
" <td>1100</td>\n", | |
" <td>400</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <td>1400</td>\n", | |
" <td>500</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">smartphones</th>\n", | |
" <th>online</th>\n", | |
" <td>600</td>\n", | |
" <td>200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <td>800</td>\n", | |
" <td>250</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Price Profit\n", | |
"Items Mode \n", | |
"books online 200 50\n", | |
" retail 250 75\n", | |
"toys online 100 20\n", | |
" retail 140 30\n", | |
"watches online 500 100\n", | |
" retail 600 150\n", | |
"computers online 1000 200\n", | |
" retail 1200 300\n", | |
"laptops online 1100 400\n", | |
" retail 1400 500\n", | |
"smartphones online 600 200\n", | |
" retail 800 250" | |
] | |
}, | |
"execution_count": 36, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sales1=sales.set_index(['Items', 'Mode'])\n", | |
"\n", | |
"sales1" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" The resultant dataframe will be a hierarchical dataframe as shown above." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### View index in hierarchical index\n", | |
"\n", | |
"\n", | |
"One can view the details of index as shown below:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 37, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"MultiIndex(levels=[['books', 'computers', 'laptops', 'smartphones', 'toys', 'watches'], ['online', 'retail']],\n", | |
" labels=[[0, 0, 4, 4, 5, 5, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],\n", | |
" names=['Items', 'Mode'])" | |
] | |
}, | |
"execution_count": 37, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# View index\n", | |
"\n", | |
"sales1.index" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Swap the column in hierarchical index\n", | |
"\n", | |
"\n", | |
"Now, I will swap the \"Items\" and \"Mode\" columns in the above hierarchical dataframe as shown below:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 38, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th>Price</th>\n", | |
" <th>Profit</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Mode</th>\n", | |
" <th>Items</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>online</th>\n", | |
" <th>books</th>\n", | |
" <td>200</td>\n", | |
" <td>50</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <th>books</th>\n", | |
" <td>250</td>\n", | |
" <td>75</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>online</th>\n", | |
" <th>toys</th>\n", | |
" <td>100</td>\n", | |
" <td>20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <th>toys</th>\n", | |
" <td>140</td>\n", | |
" <td>30</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>online</th>\n", | |
" <th>watches</th>\n", | |
" <td>500</td>\n", | |
" <td>100</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <th>watches</th>\n", | |
" <td>600</td>\n", | |
" <td>150</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>online</th>\n", | |
" <th>computers</th>\n", | |
" <td>1000</td>\n", | |
" <td>200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <th>computers</th>\n", | |
" <td>1200</td>\n", | |
" <td>300</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>online</th>\n", | |
" <th>laptops</th>\n", | |
" <td>1100</td>\n", | |
" <td>400</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <th>laptops</th>\n", | |
" <td>1400</td>\n", | |
" <td>500</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>online</th>\n", | |
" <th>smartphones</th>\n", | |
" <td>600</td>\n", | |
" <td>200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>retail</th>\n", | |
" <th>smartphones</th>\n", | |
" <td>800</td>\n", | |
" <td>250</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Price Profit\n", | |
"Mode Items \n", | |
"online books 200 50\n", | |
"retail books 250 75\n", | |
"online toys 100 20\n", | |
"retail toys 140 30\n", | |
"online watches 500 100\n", | |
"retail watches 600 150\n", | |
"online computers 1000 200\n", | |
"retail computers 1200 300\n", | |
"online laptops 1100 400\n", | |
"retail laptops 1400 500\n", | |
"online smartphones 600 200\n", | |
"retail smartphones 800 250" | |
] | |
}, | |
"execution_count": 38, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Swap the column in multiple index\n", | |
"\n", | |
"sales2=sales1.swaplevel('Mode', 'Items')\n", | |
"\n", | |
"sales2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 16. Sorting in pandas\n", | |
"\n", | |
"\n", | |
"Pandas provides two kinds of sorting. They are:-\n", | |
"\n", | |
"\n", | |
"- 1. Sorting by label\n", | |
"\n", | |
"- 2. Sorting by actual value\n", | |
"\n", | |
"\n", | |
"They are described below:-\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 1. Sorting by label\n", | |
"\n", | |
"\n", | |
"We can use the **sort_index()** method to sort the object by labels. DataFrame can be sorted by passing the axis arguments and the order of sorting. By default, sorting is done on row labels in ascending order.\n", | |
"\n", | |
"\n", | |
"The following examples illustrate the idea of sorting by label." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 39, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00069042</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>3</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8370</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00248942</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00087842</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1422</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00085442</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1057</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>1000002</td>\n", | |
" <td>P00285442</td>\n", | |
" <td>M</td>\n", | |
" <td>55+</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7969</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>1000003</td>\n", | |
" <td>P00193542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>15</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15227</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P00184942</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>19215</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P00346142</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>15854</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P0097242</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>15686</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00274942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>7871</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00251242</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>5254</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00014542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>3957</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00031342</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>6073</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00145042</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>15665</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00231342</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5378</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00190242</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>4</td>\n", | |
" <td>5.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>2079</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P0096642</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>2</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>13055</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00058442</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>8851</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>1000007</td>\n", | |
" <td>P00036842</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>14.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>11788</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00249542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>19614</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00220442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>8584</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00156442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>9872</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00213742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>9743</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00214442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>5982</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00303442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11927</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00135742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>6</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>16662</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>26</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00039942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5887</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>27</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00161442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>6973</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00078742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5391</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29</th>\n", | |
" <td>1000010</td>\n", | |
" <td>P00085942</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>4.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16352</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537547</th>\n", | |
" <td>1004733</td>\n", | |
" <td>P00244042</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>18</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>11543</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537548</th>\n", | |
" <td>1004734</td>\n", | |
" <td>P00111042</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>15</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>20924</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537549</th>\n", | |
" <td>1004734</td>\n", | |
" <td>P00345842</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>13082</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537550</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00278242</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11658</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537551</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00313442</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>6863</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537552</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P0098642</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>6</td>\n", | |
" <td>8.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16415</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537553</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00119342</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>10</td>\n", | |
" <td>13.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>18526</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537554</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00114042</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7099</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537555</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00135142</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>13</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>578</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537556</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00194542</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>2183</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537557</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00175242</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>12724</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537558</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00101942</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>17.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7796</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537559</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00109142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>17.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7770</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537560</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00084842</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5940</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537561</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00078142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7834</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537562</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00146742</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>13.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11508</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537563</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00154642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>13.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>6074</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537564</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00117442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7084</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537565</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00051142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7934</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537566</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00048742</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5350</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537567</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00157542</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1994</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537568</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00250642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>11</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5930</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537569</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00023142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7042</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537570</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00162442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15491</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537571</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00221442</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>11852</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537572</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00193542</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>11664</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537573</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00111142</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>19196</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537574</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00345942</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8043</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537575</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00285842</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>7172</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537576</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00118242</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>6875</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>537577 rows × 12 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"0 1000001 P00069042 F 0-17 10 A \n", | |
"1 1000001 P00248942 F 0-17 10 A \n", | |
"2 1000001 P00087842 F 0-17 10 A \n", | |
"3 1000001 P00085442 F 0-17 10 A \n", | |
"4 1000002 P00285442 M 55+ 16 C \n", | |
"5 1000003 P00193542 M 26-35 15 A \n", | |
"6 1000004 P00184942 M 46-50 7 B \n", | |
"7 1000004 P00346142 M 46-50 7 B \n", | |
"8 1000004 P0097242 M 46-50 7 B \n", | |
"9 1000005 P00274942 M 26-35 20 A \n", | |
"10 1000005 P00251242 M 26-35 20 A \n", | |
"11 1000005 P00014542 M 26-35 20 A \n", | |
"12 1000005 P00031342 M 26-35 20 A \n", | |
"13 1000005 P00145042 M 26-35 20 A \n", | |
"14 1000006 P00231342 F 51-55 9 A \n", | |
"15 1000006 P00190242 F 51-55 9 A \n", | |
"16 1000006 P0096642 F 51-55 9 A \n", | |
"17 1000006 P00058442 F 51-55 9 A \n", | |
"18 1000007 P00036842 M 36-45 1 B \n", | |
"19 1000008 P00249542 M 26-35 12 C \n", | |
"20 1000008 P00220442 M 26-35 12 C \n", | |
"21 1000008 P00156442 M 26-35 12 C \n", | |
"22 1000008 P00213742 M 26-35 12 C \n", | |
"23 1000008 P00214442 M 26-35 12 C \n", | |
"24 1000008 P00303442 M 26-35 12 C \n", | |
"25 1000009 P00135742 M 26-35 17 C \n", | |
"26 1000009 P00039942 M 26-35 17 C \n", | |
"27 1000009 P00161442 M 26-35 17 C \n", | |
"28 1000009 P00078742 M 26-35 17 C \n", | |
"29 1000010 P00085942 F 36-45 1 B \n", | |
"... ... ... ... ... ... ... \n", | |
"537547 1004733 P00244042 M 18-25 18 C \n", | |
"537548 1004734 P00111042 M 51-55 1 B \n", | |
"537549 1004734 P00345842 M 51-55 1 B \n", | |
"537550 1004735 P00278242 M 46-50 3 C \n", | |
"537551 1004735 P00313442 M 46-50 3 C \n", | |
"537552 1004735 P0098642 M 46-50 3 C \n", | |
"537553 1004735 P00119342 M 46-50 3 C \n", | |
"537554 1004735 P00114042 M 46-50 3 C \n", | |
"537555 1004735 P00135142 M 46-50 3 C \n", | |
"537556 1004736 P00194542 M 18-25 20 A \n", | |
"537557 1004736 P00175242 M 18-25 20 A \n", | |
"537558 1004736 P00101942 M 18-25 20 A \n", | |
"537559 1004736 P00109142 M 18-25 20 A \n", | |
"537560 1004736 P00084842 M 18-25 20 A \n", | |
"537561 1004736 P00078142 M 18-25 20 A \n", | |
"537562 1004736 P00146742 M 18-25 20 A \n", | |
"537563 1004736 P00154642 M 18-25 20 A \n", | |
"537564 1004736 P00117442 M 18-25 20 A \n", | |
"537565 1004736 P00051142 M 18-25 20 A \n", | |
"537566 1004736 P00048742 M 18-25 20 A \n", | |
"537567 1004736 P00157542 M 18-25 20 A \n", | |
"537568 1004736 P00250642 M 18-25 20 A \n", | |
"537569 1004736 P00023142 M 18-25 20 A \n", | |
"537570 1004736 P00162442 M 18-25 20 A \n", | |
"537571 1004737 P00221442 M 36-45 16 C \n", | |
"537572 1004737 P00193542 M 36-45 16 C \n", | |
"537573 1004737 P00111142 M 36-45 16 C \n", | |
"537574 1004737 P00345942 M 36-45 16 C \n", | |
"537575 1004737 P00285842 M 36-45 16 C \n", | |
"537576 1004737 P00118242 M 36-45 16 C \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"0 2 0 3 \n", | |
"1 2 0 1 \n", | |
"2 2 0 12 \n", | |
"3 2 0 12 \n", | |
"4 4+ 0 8 \n", | |
"5 3 0 1 \n", | |
"6 2 1 1 \n", | |
"7 2 1 1 \n", | |
"8 2 1 1 \n", | |
"9 1 1 8 \n", | |
"10 1 1 5 \n", | |
"11 1 1 8 \n", | |
"12 1 1 8 \n", | |
"13 1 1 1 \n", | |
"14 1 0 5 \n", | |
"15 1 0 4 \n", | |
"16 1 0 2 \n", | |
"17 1 0 5 \n", | |
"18 1 1 1 \n", | |
"19 4+ 1 1 \n", | |
"20 4+ 1 5 \n", | |
"21 4+ 1 8 \n", | |
"22 4+ 1 8 \n", | |
"23 4+ 1 8 \n", | |
"24 4+ 1 1 \n", | |
"25 0 0 6 \n", | |
"26 0 0 8 \n", | |
"27 0 0 5 \n", | |
"28 0 0 5 \n", | |
"29 4+ 1 2 \n", | |
"... ... ... ... \n", | |
"537547 1 0 1 \n", | |
"537548 1 1 15 \n", | |
"537549 1 1 2 \n", | |
"537550 3 0 1 \n", | |
"537551 3 0 5 \n", | |
"537552 3 0 6 \n", | |
"537553 3 0 10 \n", | |
"537554 3 0 5 \n", | |
"537555 3 0 13 \n", | |
"537556 1 1 8 \n", | |
"537557 1 1 2 \n", | |
"537558 1 1 8 \n", | |
"537559 1 1 8 \n", | |
"537560 1 1 8 \n", | |
"537561 1 1 8 \n", | |
"537562 1 1 1 \n", | |
"537563 1 1 8 \n", | |
"537564 1 1 5 \n", | |
"537565 1 1 8 \n", | |
"537566 1 1 5 \n", | |
"537567 1 1 8 \n", | |
"537568 1 1 11 \n", | |
"537569 1 1 5 \n", | |
"537570 1 1 1 \n", | |
"537571 1 0 1 \n", | |
"537572 1 0 1 \n", | |
"537573 1 0 1 \n", | |
"537574 1 0 8 \n", | |
"537575 1 0 5 \n", | |
"537576 1 0 5 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"0 6.0 14.0 8370 \n", | |
"1 6.0 14.0 15200 \n", | |
"2 6.0 14.0 1422 \n", | |
"3 14.0 14.0 1057 \n", | |
"4 14.0 14.0 7969 \n", | |
"5 2.0 14.0 15227 \n", | |
"6 8.0 17.0 19215 \n", | |
"7 15.0 17.0 15854 \n", | |
"8 16.0 17.0 15686 \n", | |
"9 16.0 17.0 7871 \n", | |
"10 11.0 17.0 5254 \n", | |
"11 11.0 17.0 3957 \n", | |
"12 11.0 17.0 6073 \n", | |
"13 2.0 5.0 15665 \n", | |
"14 8.0 14.0 5378 \n", | |
"15 5.0 14.0 2079 \n", | |
"16 3.0 4.0 13055 \n", | |
"17 14.0 4.0 8851 \n", | |
"18 14.0 16.0 11788 \n", | |
"19 5.0 15.0 19614 \n", | |
"20 14.0 15.0 8584 \n", | |
"21 14.0 15.0 9872 \n", | |
"22 14.0 15.0 9743 \n", | |
"23 14.0 15.0 5982 \n", | |
"24 8.0 14.0 11927 \n", | |
"25 8.0 14.0 16662 \n", | |
"26 8.0 14.0 5887 \n", | |
"27 14.0 14.0 6973 \n", | |
"28 8.0 14.0 5391 \n", | |
"29 4.0 8.0 16352 \n", | |
"... ... ... ... \n", | |
"537547 2.0 15.0 11543 \n", | |
"537548 2.0 15.0 20924 \n", | |
"537549 8.0 14.0 13082 \n", | |
"537550 8.0 14.0 11658 \n", | |
"537551 6.0 8.0 6863 \n", | |
"537552 8.0 8.0 16415 \n", | |
"537553 13.0 8.0 18526 \n", | |
"537554 14.0 8.0 7099 \n", | |
"537555 16.0 8.0 578 \n", | |
"537556 14.0 8.0 2183 \n", | |
"537557 14.0 8.0 12724 \n", | |
"537558 17.0 8.0 7796 \n", | |
"537559 17.0 8.0 7770 \n", | |
"537560 16.0 8.0 5940 \n", | |
"537561 16.0 8.0 7834 \n", | |
"537562 13.0 14.0 11508 \n", | |
"537563 13.0 14.0 6074 \n", | |
"537564 14.0 14.0 7084 \n", | |
"537565 14.0 14.0 7934 \n", | |
"537566 14.0 14.0 5350 \n", | |
"537567 14.0 14.0 1994 \n", | |
"537568 14.0 14.0 5930 \n", | |
"537569 14.0 14.0 7042 \n", | |
"537570 16.0 14.0 15491 \n", | |
"537571 2.0 5.0 11852 \n", | |
"537572 2.0 5.0 11664 \n", | |
"537573 15.0 16.0 19196 \n", | |
"537574 15.0 16.0 8043 \n", | |
"537575 15.0 16.0 7172 \n", | |
"537576 8.0 16.0 6875 \n", | |
"\n", | |
"[537577 rows x 12 columns]" | |
] | |
}, | |
"execution_count": 39, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# sort the dataframe df2 by label\n", | |
"\n", | |
"df2.sort_index()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Order of sorting\n", | |
"\n", | |
"By passing the Boolean value to ascending parameter, the order of the sorting can be controlled. \n", | |
"\n", | |
"\n", | |
"\n", | |
"#### sort the dataframe df2 by label in reverse order\n", | |
"\n", | |
"`df2.sort_index(ascending=False)`\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Sorting by columns\n", | |
"\n", | |
"\n", | |
"By passing the axis argument with a value 0 or 1, the sorting can be done on the row or column labels. \n", | |
"\n", | |
"The default value of axis=0. In this case, sorting can be done by rows. \n", | |
"\n", | |
"If we set axis=1, sorting is done by columns.\n", | |
"\n", | |
"\n", | |
"#### sort the dataframe df2 by columns\n", | |
"\n", | |
"`df2.sort_index(axis=1)`\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 2. Sorting by values\n", | |
"\n", | |
"\n", | |
"The second method of sorting is sorting by values. Pandas provides **sort_values()** method to sort by values. It accepts a 'by' argument which will use the column name of the DataFrame with which the values are to be sorted.\n", | |
"\n", | |
"\n", | |
"The following example illustrates the idea:-\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 40, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>133832</th>\n", | |
" <td>1002649</td>\n", | |
" <td>P00114942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>19479</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>205338</th>\n", | |
" <td>1001676</td>\n", | |
" <td>P00155442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>11.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>7872</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>205336</th>\n", | |
" <td>1001676</td>\n", | |
" <td>P00251642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>4466</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429787</th>\n", | |
" <td>1000163</td>\n", | |
" <td>P00222942</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>4430</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429786</th>\n", | |
" <td>1000163</td>\n", | |
" <td>P00184442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>3912</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429785</th>\n", | |
" <td>1000163</td>\n", | |
" <td>P00030842</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>8133</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429784</th>\n", | |
" <td>1000163</td>\n", | |
" <td>P00182742</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11673</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>330703</th>\n", | |
" <td>1002986</td>\n", | |
" <td>P00016342</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>11598</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429783</th>\n", | |
" <td>1000163</td>\n", | |
" <td>P00354042</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>3923</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429782</th>\n", | |
" <td>1000163</td>\n", | |
" <td>P00237542</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>19097</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429781</th>\n", | |
" <td>1000163</td>\n", | |
" <td>P00025442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>8284</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429773</th>\n", | |
" <td>1000162</td>\n", | |
" <td>P00334242</td>\n", | |
" <td>F</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>19102</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429772</th>\n", | |
" <td>1000162</td>\n", | |
" <td>P00182742</td>\n", | |
" <td>F</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15187</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>107002</th>\n", | |
" <td>1004458</td>\n", | |
" <td>P00070342</td>\n", | |
" <td>F</td>\n", | |
" <td>26-35</td>\n", | |
" <td>4</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11804</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>205330</th>\n", | |
" <td>1001675</td>\n", | |
" <td>P00143642</td>\n", | |
" <td>F</td>\n", | |
" <td>26-35</td>\n", | |
" <td>6</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>6.0</td>\n", | |
" <td>4282</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106995</th>\n", | |
" <td>1004458</td>\n", | |
" <td>P00338442</td>\n", | |
" <td>F</td>\n", | |
" <td>26-35</td>\n", | |
" <td>4</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>16.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>15772</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>330702</th>\n", | |
" <td>1002986</td>\n", | |
" <td>P0098342</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>7844</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106984</th>\n", | |
" <td>1004457</td>\n", | |
" <td>P00114942</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>14</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>19409</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>205341</th>\n", | |
" <td>1001676</td>\n", | |
" <td>P00221442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>15408</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429814</th>\n", | |
" <td>1000169</td>\n", | |
" <td>P00080342</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>15447</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>330700</th>\n", | |
" <td>1002986</td>\n", | |
" <td>P00193542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>11829</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106965</th>\n", | |
" <td>1004454</td>\n", | |
" <td>P00184442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>11439</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106968</th>\n", | |
" <td>1004454</td>\n", | |
" <td>P00112642</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>15795</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429804</th>\n", | |
" <td>1000166</td>\n", | |
" <td>P00345642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>7958</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106970</th>\n", | |
" <td>1004454</td>\n", | |
" <td>P00031842</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>5.0</td>\n", | |
" <td>12.0</td>\n", | |
" <td>4216</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429801</th>\n", | |
" <td>1000165</td>\n", | |
" <td>P00293842</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>16</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>15666</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106972</th>\n", | |
" <td>1004454</td>\n", | |
" <td>P00173842</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>4044</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106973</th>\n", | |
" <td>1004455</td>\n", | |
" <td>P00080342</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>15249</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>429799</th>\n", | |
" <td>1000165</td>\n", | |
" <td>P00201442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>16</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>15915</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>106976</th>\n", | |
" <td>1004455</td>\n", | |
" <td>P00183442</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>12010</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>38029</th>\n", | |
" <td>1005848</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>3857</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>335244</th>\n", | |
" <td>1003643</td>\n", | |
" <td>P00054042</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>12.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>3029</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>266269</th>\n", | |
" <td>1005012</td>\n", | |
" <td>P00058642</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>15</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>14.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>3769</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>360529</th>\n", | |
" <td>1001503</td>\n", | |
" <td>P00220942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>13.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>3117</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>239326</th>\n", | |
" <td>1000937</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>15</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>11.0</td>\n", | |
" <td>13.0</td>\n", | |
" <td>3862</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>534320</th>\n", | |
" <td>1004271</td>\n", | |
" <td>P00344142</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>1565</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>382999</th>\n", | |
" <td>1004918</td>\n", | |
" <td>P00054042</td>\n", | |
" <td>F</td>\n", | |
" <td>18-25</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>3136</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>421525</th>\n", | |
" <td>1004852</td>\n", | |
" <td>P00286042</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>0</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>2252</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>421526</th>\n", | |
" <td>1004852</td>\n", | |
" <td>P00325542</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>0</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>3041</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>421527</th>\n", | |
" <td>1004852</td>\n", | |
" <td>P00054042</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>0</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>2280</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>357843</th>\n", | |
" <td>1001140</td>\n", | |
" <td>P00327642</td>\n", | |
" <td>F</td>\n", | |
" <td>46-50</td>\n", | |
" <td>2</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>8.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>3800</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>115211</th>\n", | |
" <td>1005786</td>\n", | |
" <td>P00119242</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>6</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>16.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>3011</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>138949</th>\n", | |
" <td>1003490</td>\n", | |
" <td>P00058642</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>0</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>3082</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>478741</th>\n", | |
" <td>1001717</td>\n", | |
" <td>P00058642</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>6</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>6.0</td>\n", | |
" <td>3007</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>57997</th>\n", | |
" <td>1002945</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>15.0</td>\n", | |
" <td>18.0</td>\n", | |
" <td>2324</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>249091</th>\n", | |
" <td>1002348</td>\n", | |
" <td>P00344042</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1510</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>397412</th>\n", | |
" <td>1001181</td>\n", | |
" <td>P00037442</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>7</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>18.0</td>\n", | |
" <td>12.0</td>\n", | |
" <td>2274</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>193495</th>\n", | |
" <td>1005880</td>\n", | |
" <td>P00344142</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>1</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>16.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>1570</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>499138</th>\n", | |
" <td>1004852</td>\n", | |
" <td>P00281242</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>0</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>2367</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16263</th>\n", | |
" <td>1002496</td>\n", | |
" <td>P00327642</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>14.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>2347</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>349842</th>\n", | |
" <td>1005880</td>\n", | |
" <td>P00313742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>1</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>11.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>1645</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>115157</th>\n", | |
" <td>1005775</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>11</td>\n", | |
" <td>A</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>17.0</td>\n", | |
" <td>6.0</td>\n", | |
" <td>3771</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>499206</th>\n", | |
" <td>1004867</td>\n", | |
" <td>P00117542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>16</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>5.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>3763</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>335200</th>\n", | |
" <td>1003635</td>\n", | |
" <td>P00327642</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>3837</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>397461</th>\n", | |
" <td>1001182</td>\n", | |
" <td>P00313742</td>\n", | |
" <td>M</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>B</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>11.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>3141</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>312626</th>\n", | |
" <td>1000169</td>\n", | |
" <td>P00313742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>3113</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>352980</th>\n", | |
" <td>1000352</td>\n", | |
" <td>P00037442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>4</td>\n", | |
" <td>A</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>3119</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>239271</th>\n", | |
" <td>1000929</td>\n", | |
" <td>P00271542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>15</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>11.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>3105</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>460422</th>\n", | |
" <td>1004869</td>\n", | |
" <td>P00068442</td>\n", | |
" <td>F</td>\n", | |
" <td>46-50</td>\n", | |
" <td>6</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>1</td>\n", | |
" <td>18</td>\n", | |
" <td>8.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>3751</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>492245</th>\n", | |
" <td>1003810</td>\n", | |
" <td>P00119242</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>7</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>18</td>\n", | |
" <td>8.0</td>\n", | |
" <td>9.0</td>\n", | |
" <td>3888</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>537577 rows × 12 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"133832 1002649 P00114942 M 26-35 16 C \n", | |
"205338 1001676 P00155442 M 18-25 4 B \n", | |
"205336 1001676 P00251642 M 18-25 4 B \n", | |
"429787 1000163 P00222942 M 18-25 4 A \n", | |
"429786 1000163 P00184442 M 18-25 4 A \n", | |
"429785 1000163 P00030842 M 18-25 4 A \n", | |
"429784 1000163 P00182742 M 18-25 4 A \n", | |
"330703 1002986 P00016342 M 26-35 4 A \n", | |
"429783 1000163 P00354042 M 18-25 4 A \n", | |
"429782 1000163 P00237542 M 18-25 4 A \n", | |
"429781 1000163 P00025442 M 18-25 4 A \n", | |
"429773 1000162 P00334242 F 18-25 4 C \n", | |
"429772 1000162 P00182742 F 18-25 4 C \n", | |
"107002 1004458 P00070342 F 26-35 4 B \n", | |
"205330 1001675 P00143642 F 26-35 6 B \n", | |
"106995 1004458 P00338442 F 26-35 4 B \n", | |
"330702 1002986 P0098342 M 26-35 4 A \n", | |
"106984 1004457 P00114942 M 36-45 14 C \n", | |
"205341 1001676 P00221442 M 18-25 4 B \n", | |
"429814 1000169 P00080342 M 26-35 7 B \n", | |
"330700 1002986 P00193542 M 26-35 4 A \n", | |
"106965 1004454 P00184442 M 26-35 20 C \n", | |
"106968 1004454 P00112642 M 26-35 20 C \n", | |
"429804 1000166 P00345642 M 18-25 4 B \n", | |
"106970 1004454 P00031842 M 26-35 20 C \n", | |
"429801 1000165 P00293842 M 18-25 16 A \n", | |
"106972 1004454 P00173842 M 26-35 20 C \n", | |
"106973 1004455 P00080342 M 36-45 7 B \n", | |
"429799 1000165 P00201442 M 18-25 16 A \n", | |
"106976 1004455 P00183442 M 36-45 7 B \n", | |
"... ... ... ... ... ... ... \n", | |
"38029 1005848 P00117542 M 51-55 20 A \n", | |
"335244 1003643 P00054042 M 26-35 17 C \n", | |
"266269 1005012 P00058642 M 51-55 15 A \n", | |
"360529 1001503 P00220942 M 26-35 12 A \n", | |
"239326 1000937 P00117542 M 26-35 15 A \n", | |
"534320 1004271 P00344142 M 36-45 7 B \n", | |
"382999 1004918 P00054042 F 18-25 12 C \n", | |
"421525 1004852 P00286042 M 36-45 0 C \n", | |
"421526 1004852 P00325542 M 36-45 0 C \n", | |
"421527 1004852 P00054042 M 36-45 0 C \n", | |
"357843 1001140 P00327642 F 46-50 2 B \n", | |
"115211 1005786 P00119242 M 26-35 6 B \n", | |
"138949 1003490 P00058642 M 26-35 0 C \n", | |
"478741 1001717 P00058642 F 51-55 6 B \n", | |
"57997 1002945 P00117542 F 36-45 17 C \n", | |
"249091 1002348 P00344042 M 51-55 12 C \n", | |
"397412 1001181 P00037442 M 36-45 7 A \n", | |
"193495 1005880 P00344142 M 26-35 1 A \n", | |
"499138 1004852 P00281242 M 36-45 0 C \n", | |
"16263 1002496 P00327642 M 51-55 1 B \n", | |
"349842 1005880 P00313742 M 26-35 1 A \n", | |
"115157 1005775 P00117542 M 26-35 11 A \n", | |
"499206 1004867 P00117542 M 26-35 16 A \n", | |
"335200 1003635 P00327642 M 51-55 1 C \n", | |
"397461 1001182 P00313742 M 0-17 10 B \n", | |
"312626 1000169 P00313742 M 26-35 7 B \n", | |
"352980 1000352 P00037442 M 18-25 4 A \n", | |
"239271 1000929 P00271542 M 26-35 15 A \n", | |
"460422 1004869 P00068442 F 46-50 6 C \n", | |
"492245 1003810 P00119242 M 51-55 7 C \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"133832 2 0 1 \n", | |
"205338 4+ 0 1 \n", | |
"205336 4+ 0 1 \n", | |
"429787 1 0 1 \n", | |
"429786 1 0 1 \n", | |
"429785 1 0 1 \n", | |
"429784 1 0 1 \n", | |
"330703 2 1 1 \n", | |
"429783 1 0 1 \n", | |
"429782 1 0 1 \n", | |
"429781 1 0 1 \n", | |
"429773 4+ 0 1 \n", | |
"429772 4+ 0 1 \n", | |
"107002 1 0 1 \n", | |
"205330 1 1 1 \n", | |
"106995 1 0 1 \n", | |
"330702 2 1 1 \n", | |
"106984 1 0 1 \n", | |
"205341 4+ 0 1 \n", | |
"429814 3 0 1 \n", | |
"330700 2 1 1 \n", | |
"106965 1 0 1 \n", | |
"106968 1 0 1 \n", | |
"429804 1 1 1 \n", | |
"106970 1 0 1 \n", | |
"429801 1 0 1 \n", | |
"106972 1 0 1 \n", | |
"106973 1 0 1 \n", | |
"429799 1 0 1 \n", | |
"106976 1 0 1 \n", | |
"... ... ... ... \n", | |
"38029 0 1 18 \n", | |
"335244 1 0 18 \n", | |
"266269 1 1 18 \n", | |
"360529 2 0 18 \n", | |
"239326 2 1 18 \n", | |
"534320 2 0 18 \n", | |
"382999 1 0 18 \n", | |
"421525 3 1 18 \n", | |
"421526 3 1 18 \n", | |
"421527 3 1 18 \n", | |
"357843 2 1 18 \n", | |
"115211 1 1 18 \n", | |
"138949 4+ 1 18 \n", | |
"478741 1 0 18 \n", | |
"57997 4+ 1 18 \n", | |
"249091 4+ 1 18 \n", | |
"397412 3 1 18 \n", | |
"193495 1 1 18 \n", | |
"499138 3 1 18 \n", | |
"16263 1 0 18 \n", | |
"349842 1 1 18 \n", | |
"115157 4+ 0 18 \n", | |
"499206 3 0 18 \n", | |
"335200 1 0 18 \n", | |
"397461 3 0 18 \n", | |
"312626 3 0 18 \n", | |
"352980 0 0 18 \n", | |
"239271 1 0 18 \n", | |
"460422 3 1 18 \n", | |
"492245 1 0 18 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"133832 15.0 16.0 19479 \n", | |
"205338 11.0 15.0 7872 \n", | |
"205336 2.0 4.0 4466 \n", | |
"429787 2.0 8.0 4430 \n", | |
"429786 6.0 8.0 3912 \n", | |
"429785 2.0 15.0 8133 \n", | |
"429784 2.0 14.0 11673 \n", | |
"330703 2.0 8.0 11598 \n", | |
"429783 6.0 16.0 3923 \n", | |
"429782 15.0 16.0 19097 \n", | |
"429781 2.0 9.0 8284 \n", | |
"429773 8.0 14.0 19102 \n", | |
"429772 2.0 14.0 15187 \n", | |
"107002 2.0 14.0 11804 \n", | |
"205330 2.0 6.0 4282 \n", | |
"106995 16.0 9.0 15772 \n", | |
"330702 2.0 5.0 7844 \n", | |
"106984 15.0 16.0 19409 \n", | |
"205341 2.0 5.0 15408 \n", | |
"429814 6.0 8.0 15447 \n", | |
"330700 2.0 5.0 11829 \n", | |
"106965 6.0 8.0 11439 \n", | |
"106968 2.0 5.0 15795 \n", | |
"429804 15.0 16.0 7958 \n", | |
"106970 5.0 12.0 4216 \n", | |
"429801 2.0 9.0 15666 \n", | |
"106972 2.0 15.0 4044 \n", | |
"106973 6.0 8.0 15249 \n", | |
"429799 6.0 8.0 15915 \n", | |
"106976 2.0 8.0 12010 \n", | |
"... ... ... ... \n", | |
"38029 16.0 8.0 3857 \n", | |
"335244 12.0 15.0 3029 \n", | |
"266269 14.0 9.0 3769 \n", | |
"360529 13.0 16.0 3117 \n", | |
"239326 11.0 13.0 3862 \n", | |
"534320 17.0 9.0 1565 \n", | |
"382999 14.0 15.0 3136 \n", | |
"421525 2.0 14.0 2252 \n", | |
"421526 2.0 14.0 3041 \n", | |
"421527 2.0 14.0 2280 \n", | |
"357843 8.0 17.0 3800 \n", | |
"115211 16.0 5.0 3011 \n", | |
"138949 17.0 16.0 3082 \n", | |
"478741 17.0 6.0 3007 \n", | |
"57997 15.0 18.0 2324 \n", | |
"249091 8.0 14.0 1510 \n", | |
"397412 18.0 12.0 2274 \n", | |
"193495 16.0 16.0 1570 \n", | |
"499138 6.0 8.0 2367 \n", | |
"16263 14.0 4.0 2347 \n", | |
"349842 11.0 15.0 1645 \n", | |
"115157 17.0 6.0 3771 \n", | |
"499206 5.0 14.0 3763 \n", | |
"335200 2.0 14.0 3837 \n", | |
"397461 11.0 14.0 3141 \n", | |
"312626 2.0 15.0 3113 \n", | |
"352980 2.0 5.0 3119 \n", | |
"239271 11.0 16.0 3105 \n", | |
"460422 8.0 15.0 3751 \n", | |
"492245 8.0 9.0 3888 \n", | |
"\n", | |
"[537577 rows x 12 columns]" | |
] | |
}, | |
"execution_count": 40, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df2.sort_values(by=['Product_Category_1'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Sort by multiple columns\n", | |
"\n", | |
"\n", | |
"`df2.sort_values(by=['Product_Category_1', 'Product_Category_2'])`\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"#### Sort in descending order\n", | |
"\n", | |
"\n", | |
"`df2.sort_values(by='Product_Category_1', ascending=False)`\n" | |
] | |
}, | |
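{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The two sorting calls above can be tried on a small frame. The following is a minimal sketch: the frame `toy` and its columns `cat1` and `cat2` are made up for illustration and are not part of the dataset.\n", | |
"\n", | |
"```python\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Hypothetical toy data, used only to illustrate sort_values().\n", | |
"toy = pd.DataFrame({'cat1': [2, 1, 2, 1], 'cat2': [5.0, 8.0, 3.0, 6.0]})\n", | |
"\n", | |
"# Sort by several columns: ties in cat1 are broken by cat2.\n", | |
"by_two = toy.sort_values(by=['cat1', 'cat2'])\n", | |
"\n", | |
"# Sort by a single column in descending order.\n", | |
"desc = toy.sort_values(by='cat1', ascending=False)\n", | |
"\n", | |
"print(by_two)\n", | |
"print(desc)\n", | |
"```\n", | |
"\n", | |
"`sort_values()` returns a new DataFrame and leaves `toy` unchanged." | |
] | |
}, | |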
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 17. Categorical data in pandas\n", | |
"\n", | |
"\n", | |
"We can check the data types of variables in the dataset with the following command:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID int64\n", | |
"Product_ID object\n", | |
"Gender object\n", | |
"Age object\n", | |
"Occupation int64\n", | |
"City_Category object\n", | |
"Stay_In_Current_City_Years object\n", | |
"Marital_Status int64\n", | |
"Product_Category_1 int64\n", | |
"Product_Category_2 float64\n", | |
"Product_Category_3 float64\n", | |
"Purchase int64\n", | |
"dtype: object" | |
] | |
}, | |
"execution_count": 41, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3 = df.copy()\n", | |
"\n", | |
"df3.dtypes" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that our dataset has 5 categorical variables. They are **Product_ID**, **Gender**, **Age**, **City_Category** and\n", | |
"**Stay_In_Current_City_Years**. They have data types as **object**.\n", | |
"\n", | |
"Now, I will explore these categorical variables." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Description of categorical data\n", | |
"\n", | |
"\n", | |
"The **describe()** method on categorical data will produce similar output to a Series or DataFrame of type string." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 42, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"count 537577\n", | |
"unique 2\n", | |
"top M\n", | |
"freq 405380\n", | |
"Name: Gender, dtype: object" | |
] | |
}, | |
"execution_count": 42, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['Gender'].describe()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `Gender` category has 537577 counts, 2 unique values and frequency of top value M is 405380." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"count 537577\n", | |
"unique 7\n", | |
"top 26-35\n", | |
"freq 214690\n", | |
"Name: Age, dtype: object" | |
] | |
}, | |
"execution_count": 43, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['Age'].describe()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"There are 7 unique categories in `Age` variable. The most frequent category is `26-35` with frequency count of 214690." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"count 537577\n", | |
"unique 3\n", | |
"top B\n", | |
"freq 226493\n", | |
"Name: City_Category, dtype: object" | |
] | |
}, | |
"execution_count": 44, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['City_Category'].describe()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"There are 3 unique categories in `City_Category` variable. The most frequent category is `B` with frequency count of 226493." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Working with categorical data\n", | |
"\n", | |
"\n", | |
"Categorical data has a categories and a ordered property, which list their possible values and whether the ordering matters or not. These properties are exposed as `s.cat.categories` and `s.cat.ordered`. \n", | |
"\n", | |
"If we don't manually specify categories and ordering, they are inferred from the passed arguments.\n", | |
"\n", | |
"\n", | |
"`s.cat.categories`\n", | |
"\n", | |
"`s.cat.ordered`\n", | |
"\n", | |
"where `s` is a series object." | |
] | |
}, | |
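{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The two properties can be inspected as follows. This is a minimal sketch on made-up values (the Series `s` and `ordered_s` are illustrative, not columns of the dataset); it shows categories being inferred and then being stated explicitly.\n", | |
"\n", | |
"```python\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Hypothetical values; converting to the category dtype enables the .cat accessor.\n", | |
"s = pd.Series(['B', 'A', 'C', 'A']).astype('category')\n", | |
"\n", | |
"# Categories are inferred from the data; no ordering is assumed by default.\n", | |
"print(list(s.cat.categories))\n", | |
"print(s.cat.ordered)\n", | |
"\n", | |
"# Categories and ordering can also be specified explicitly.\n", | |
"levels = pd.Categorical(['low', 'high', 'low'],\n", | |
"                        categories=['low', 'mid', 'high'],\n", | |
"                        ordered=True)\n", | |
"ordered_s = pd.Series(levels)\n", | |
"print(list(ordered_s.cat.categories))\n", | |
"print(ordered_s.cat.ordered)\n", | |
"```\n", | |
"\n", | |
"Note that `mid` is a valid category of `ordered_s` even though no value uses it." | |
] | |
}, | |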
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Unique values in categorical data\n", | |
"\n", | |
"\n", | |
"We can get the unique values in a series object by **unique()** method. It returns categories in the order of appearance, \n", | |
"and it only includes values that are actually present." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 45, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array(['F', 'M'], dtype=object)" | |
] | |
}, | |
"execution_count": 45, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['Gender'].unique()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 46, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array(['0-17', '55+', '26-35', '46-50', '51-55', '36-45', '18-25'],\n", | |
" dtype=object)" | |
] | |
}, | |
"execution_count": 46, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['Age'].unique()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Rename categories\n", | |
"\n", | |
"\n", | |
"Renaming categories is done by assigning new values to the `Series.cat.categories` property or by using the `rename_categories()` method.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Append new categories\n", | |
"\n", | |
"\n", | |
"Appending categories can be done by using the `add_categories()` method.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Remove categories\n", | |
"\n", | |
"\n", | |
"Removing categories can be done by using the `remove_categories()` method. Values which are removed are replaced by np.nan.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Setting categories\n", | |
"\n", | |
"\n", | |
"If we want to remove and add new categories in one step (which has some speed advantage), or simply set the categories to a predefined scale, we can use `set_categories()` method.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Reordering categories\n", | |
"\n", | |
"\n", | |
"Reordering the categories is possible via the `Categorical.reorder_categories()` and the `Categorical.set_categories()` methods. \n", | |
"\n", | |
"\n", | |
"\n", | |
"### Operations on categorical data \n", | |
"\n", | |
"\n", | |
"There are several operations like `Series.min()`, `Series.max()`, `Series.median()` and `Series.mode()` which are possible with categorical data. " | |
] | |
}, | |
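{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The category methods described above can be sketched on a small Series. The values below are made up for illustration; every method returns a new Series and leaves `s` unchanged.\n", | |
"\n", | |
"```python\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Hypothetical categorical Series.\n", | |
"s = pd.Series(['A', 'B', 'C', 'A'], dtype='category')\n", | |
"\n", | |
"# Rename: map old category labels to new ones.\n", | |
"renamed = s.cat.rename_categories({'A': 'a', 'B': 'b', 'C': 'c'})\n", | |
"\n", | |
"# Append: 'D' becomes a valid category although no value uses it yet.\n", | |
"added = s.cat.add_categories(['D'])\n", | |
"\n", | |
"# Remove: values that were 'C' become NaN.\n", | |
"removed = s.cat.remove_categories(['C'])\n", | |
"\n", | |
"# Set: add and remove categories in one step.\n", | |
"reset = s.cat.set_categories(['C', 'B', 'A', 'D'])\n", | |
"\n", | |
"# Reorder: same categories, new order, here marked as ordered.\n", | |
"reordered = s.cat.reorder_categories(['C', 'B', 'A'], ordered=True)\n", | |
"\n", | |
"print(list(renamed))\n", | |
"print(removed.isna().sum())\n", | |
"print(reordered.min(), reordered.max())\n", | |
"```\n", | |
"\n", | |
"With the ordering `C < B < A`, `reordered.min()` is `C` and `reordered.max()` is `A`." | |
] | |
}, | |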
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Frequency counts of categorical data\n", | |
"\n", | |
"\n", | |
"Series methods like `Series.value_counts()` will return the frequency counts of the categories present in the series." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 47, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"M 405380\n", | |
"F 132197\n", | |
"Name: Gender, dtype: int64" | |
] | |
}, | |
"execution_count": 47, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['Gender'].value_counts()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 48, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"B 226493\n", | |
"C 166446\n", | |
"A 144638\n", | |
"Name: City_Category, dtype: int64" | |
] | |
}, | |
"execution_count": 48, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['City_Category'].value_counts()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`Series.value_counts()` will return the frequency counts of the categories in descending order. To get the categories in \n", | |
"ascending order we should set `ascending=True` as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 49, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"F 132197\n", | |
"M 405380\n", | |
"Name: Gender, dtype: int64" | |
] | |
}, | |
"execution_count": 49, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['Gender'].value_counts(ascending=True)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 50, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"A 144638\n", | |
"C 166446\n", | |
"B 226493\n", | |
"Name: City_Category, dtype: int64" | |
] | |
}, | |
"execution_count": 50, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df3['City_Category'].value_counts(ascending=True)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 18. Basic functionality in pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Series basic functionality\n", | |
"\n", | |
"\n", | |
"\n", | |
"The following table lists the important attributes or methods in Series basic functionality.\n", | |
"\n", | |
"\n", | |
"\n", | |
"- **axes** - Returns a list of the row axis labels\n", | |
"\n", | |
"\n", | |
"- **dtype** - Returns the dtype of the object.\n", | |
"\n", | |
"\n", | |
"- **empty** - Returns True if series is empty.\n", | |
"\n", | |
"\n", | |
"- **ndim** - Returns the number of dimensions of the underlying data, by definition 1.\n", | |
"\n", | |
"\n", | |
"- **size** - Returns the number of elements in the underlying data.\n", | |
"\n", | |
"\n", | |
"- **values** - Returns the Series as ndarray.\n", | |
"\n", | |
"\n", | |
"- **head()** - Returns the first n rows.\n", | |
"\n", | |
"\n", | |
"- **tail()** - Returns the last n rows.\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Dataframe basic functionality\n", | |
"\n", | |
"\n", | |
"\n", | |
"The following tables lists the important attributes or methods in Dataframe basic functionality.\n", | |
"\n", | |
"\n", | |
"\n", | |
"- **T** - Transposes rows and columns.\n", | |
"\n", | |
"\n", | |
"- **axes** - Returns a list with the row axis labels and column axis labels as the only members.\n", | |
"\n", | |
"\n", | |
"- **dtypes** - Returns the dtypes in this object.\n", | |
"\n", | |
"\n", | |
"- **empty** - True if NDFrame is entirely empty [no items]; if any of the axes are of length 0.\n", | |
"\n", | |
"\n", | |
"- **ndim** - Number of axes / array dimensions.\n", | |
"\n", | |
"\n", | |
"- **shape** - Returns a tuple representing the dimensionality of the Dataframe.\n", | |
"\n", | |
"\n", | |
"- **size** - Number of elements in the NDFrame.\n", | |
"\n", | |
"\n", | |
"- **values** - Numpy representation of NDFrame.\n", | |
"\n", | |
"\n", | |
"\n", | |
"- **head()** - Returns the first n rows.\n", | |
"\n", | |
"\n", | |
"- **tail()** - Returns last n rows." | |
] | |
}, | |
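{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The attributes and methods listed above can be tried on small objects. This is a minimal sketch; the Series `s` and the DataFrame `frame` are made up for illustration.\n", | |
"\n", | |
"```python\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Hypothetical Series.\n", | |
"s = pd.Series([10, 20, 30, 40])\n", | |
"print(s.axes)                  # list holding the row index\n", | |
"print(s.dtype, s.empty, s.ndim, s.size)\n", | |
"print(s.values)                # underlying NumPy array\n", | |
"print(s.head(2))               # first 2 rows\n", | |
"print(s.tail(2))               # last 2 rows\n", | |
"\n", | |
"# Hypothetical DataFrame with one numeric and one string column.\n", | |
"frame = pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})\n", | |
"print(frame.T)                 # rows and columns swapped\n", | |
"print(frame.axes)              # [row index, column index]\n", | |
"print(frame.dtypes)            # one dtype per column\n", | |
"print(frame.empty, frame.ndim, frame.shape, frame.size)\n", | |
"print(frame.head(2))\n", | |
"```\n", | |
"\n", | |
"For a DataFrame, `size` is the number of rows multiplied by the number of columns." | |
] | |
}, | |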
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 19. Descriptive statistics in pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"There exists a large number of methods for computing descriptive statistics and other related operations on Series, DataFrame, and Panel. Most of these are aggregations (hence producing a lower-dimensional result) like sum(), mean(), and quantile(), but some of them, like cumsum() and cumprod(), produce an object of the same size. Generally speaking, these methods take an axis argument, just like ndarray.{sum, std, …}, but the axis can be specified by name or integer.\n", | |
"\n", | |
"\n", | |
"- Series: no axis argument needed.\n", | |
"\n", | |
"\n", | |
"- DataFrame: “index” (axis=0, default), “columns” (axis=1).\n", | |
"\n", | |
"\n", | |
"- Panel: “items” (axis=0), “major” (axis=1, default), “minor” (axis=2).\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Functions and description\n", | |
"\n", | |
"\n", | |
"The following table list down the important functions under Descriptive Statistics in Python Pandas. \n", | |
"\n", | |
"\n", | |
"\n", | |
"- 1 **count()** -\tNumber of non-null observations\n", | |
"\n", | |
"\n", | |
"- 2\t **sum()**\t - Sum of values\n", | |
"\n", | |
"\n", | |
"- 3\t **mean()** -\tMean of values\n", | |
"\n", | |
"\n", | |
"- 4\t **median()** -\tMedian of values\n", | |
"\n", | |
"\n", | |
"- 5\t **mode()** -\tMode of values\n", | |
"\n", | |
"\n", | |
"- 6\t **std()** -\tStandard deviation of the values\n", | |
"\n", | |
"\n", | |
"- 7\t **min()** -\tMinimum value\n", | |
"\n", | |
"\n", | |
"- 8\t **max()** -\tMaximum value\n", | |
"\n", | |
"\n", | |
"- 9\t **abs()** -\tAbsolute value\n", | |
"\n", | |
"\n", | |
"- 10 **prod()** -\tProduct of values\n", | |
"\n", | |
"\n", | |
"- 11 **cumsum()** -\tCumulative sum\n", | |
"\n", | |
"\n", | |
"- 12 **cumprod()** - Cumulative product\n", | |
"\n", | |
"\n", | |
"\n", | |
"The dataframe is a heterogeneous data structure. So, the different column values have different data types. Generic operations don't work with all functions.\n", | |
"\n", | |
"\n", | |
"Functions like **sum()**, **cumsum()** work with both numeric and character (or) string data elements without any error. \n", | |
"In practice, character aggregations are never used generally. These functions do not throw any exception.\n", | |
"\n", | |
"\n", | |
"Functions like **abs()**, **cumprod()** throw exception when the dataframe contains character or string data because such operations cannot be performed.\n" | |
] | |
}, | |
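{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The difference between aggregations and same-size results, and the role of the axis argument, can be sketched as follows. The frame `nums` is made up for illustration.\n", | |
"\n", | |
"```python\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Hypothetical numeric data.\n", | |
"nums = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10.0, 20.0, 30.0, 40.0]})\n", | |
"\n", | |
"# Aggregation down the rows: one value per column.\n", | |
"col_means = nums.mean(axis=0)\n", | |
"\n", | |
"# Aggregation across the columns: one value per row (axis given by name).\n", | |
"row_sums = nums.sum(axis='columns')\n", | |
"\n", | |
"# cumsum() is not an aggregation: the result has the same length as the input.\n", | |
"running = nums['a'].cumsum()\n", | |
"\n", | |
"print(col_means)\n", | |
"print(row_sums)\n", | |
"print(running)\n", | |
"\n", | |
"# abs() is not defined for strings, so this call raises (typically a TypeError).\n", | |
"try:\n", | |
"    pd.Series(['x', 'y']).abs()\n", | |
"except Exception as err:\n", | |
"    print('abs() failed on string data:', type(err).__name__)\n", | |
"```" | |
] | |
}, | |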
{ | |
"cell_type": "code", | |
"execution_count": 51, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"User_ID 1006040\n", | |
"Product_ID P0099942\n", | |
"Gender M\n", | |
"Age 55+\n", | |
"Occupation 20\n", | |
"City_Category C\n", | |
"Stay_In_Current_City_Years 4+\n", | |
"Marital_Status 1\n", | |
"Product_Category_1 18\n", | |
"Product_Category_2 18\n", | |
"Product_Category_3 18\n", | |
"Purchase 23961\n", | |
"dtype: object" | |
] | |
}, | |
"execution_count": 51, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df4=df.copy()\n", | |
"\n", | |
"df4.max(0)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Summarizing data\n", | |
"\n", | |
"\n", | |
"\n", | |
"The **describe()** function computes the summary statistics of the numerical columns in the dataframe.\n", | |
"\n", | |
"\n", | |
"\n", | |
"This function gives the mean, std and IQR values. It excludes the character columns and gives summary about numeric columns. \n", | |
"It includes the argument which is used to pass necessary information regarding what columns need to be considered for summarizing. It takes the list of values; by default, 'number'.\n", | |
"\n", | |
"\n", | |
"- object − Summarizes string columns\n", | |
"\n", | |
"\n", | |
"- number − Summarizes numeric columns\n", | |
"\n", | |
"\n", | |
"- all − Summarizes all columns together" | |
] | |
}, | |
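{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The effect of the `include` argument can be sketched on a small mixed frame. The frame `mixed` and its columns are made up for illustration; the string column is created with an explicit `object` dtype so that `include=['object']` selects it.\n", | |
"\n", | |
"```python\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Hypothetical frame with one numeric and one string column.\n", | |
"mixed = pd.DataFrame({\n", | |
"    'price': [10, 20, 30],\n", | |
"    'city': pd.Series(['A', 'B', 'A'], dtype=object),\n", | |
"})\n", | |
"\n", | |
"num_summary = mixed.describe()                    # numeric columns only (default)\n", | |
"obj_summary = mixed.describe(include=['object'])  # string columns only\n", | |
"all_summary = mixed.describe(include='all')       # every column\n", | |
"\n", | |
"print(num_summary)\n", | |
"print(obj_summary)\n", | |
"print(all_summary)\n", | |
"```\n", | |
"\n", | |
"With `include='all'`, statistics that do not apply to a column (for example `mean` for `city`) are shown as NaN." | |
] | |
}, | |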
{ | |
"cell_type": "code", | |
"execution_count": 52, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>count</th>\n", | |
" <td>5.375770e+05</td>\n", | |
" <td>537577.00000</td>\n", | |
" <td>537577.000000</td>\n", | |
" <td>537577.000000</td>\n", | |
" <td>537577.000000</td>\n", | |
" <td>537577.000000</td>\n", | |
" <td>537577.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>mean</th>\n", | |
" <td>1.002992e+06</td>\n", | |
" <td>8.08271</td>\n", | |
" <td>0.408797</td>\n", | |
" <td>5.295546</td>\n", | |
" <td>9.922686</td>\n", | |
" <td>12.665430</td>\n", | |
" <td>9333.859853</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>std</th>\n", | |
" <td>1.714393e+03</td>\n", | |
" <td>6.52412</td>\n", | |
" <td>0.491612</td>\n", | |
" <td>3.750701</td>\n", | |
" <td>5.022222</td>\n", | |
" <td>4.127122</td>\n", | |
" <td>4981.022133</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>min</th>\n", | |
" <td>1.000001e+06</td>\n", | |
" <td>0.00000</td>\n", | |
" <td>0.000000</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>2.000000</td>\n", | |
" <td>3.000000</td>\n", | |
" <td>185.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25%</th>\n", | |
" <td>1.001495e+06</td>\n", | |
" <td>2.00000</td>\n", | |
" <td>0.000000</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>5.000000</td>\n", | |
" <td>9.000000</td>\n", | |
" <td>5866.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>50%</th>\n", | |
" <td>1.003031e+06</td>\n", | |
" <td>7.00000</td>\n", | |
" <td>0.000000</td>\n", | |
" <td>5.000000</td>\n", | |
" <td>9.000000</td>\n", | |
" <td>14.000000</td>\n", | |
" <td>8062.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>75%</th>\n", | |
" <td>1.004417e+06</td>\n", | |
" <td>14.00000</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>8.000000</td>\n", | |
" <td>15.000000</td>\n", | |
" <td>16.000000</td>\n", | |
" <td>12073.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>max</th>\n", | |
" <td>1.006040e+06</td>\n", | |
" <td>20.00000</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>18.000000</td>\n", | |
" <td>18.000000</td>\n", | |
" <td>18.000000</td>\n", | |
" <td>23961.000000</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Occupation Marital_Status Product_Category_1 \\\n", | |
"count 5.375770e+05 537577.00000 537577.000000 537577.000000 \n", | |
"mean 1.002992e+06 8.08271 0.408797 5.295546 \n", | |
"std 1.714393e+03 6.52412 0.491612 3.750701 \n", | |
"min 1.000001e+06 0.00000 0.000000 1.000000 \n", | |
"25% 1.001495e+06 2.00000 0.000000 1.000000 \n", | |
"50% 1.003031e+06 7.00000 0.000000 5.000000 \n", | |
"75% 1.004417e+06 14.00000 1.000000 8.000000 \n", | |
"max 1.006040e+06 20.00000 1.000000 18.000000 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"count 537577.000000 537577.000000 537577.000000 \n", | |
"mean 9.922686 12.665430 9333.859853 \n", | |
"std 5.022222 4.127122 4981.022133 \n", | |
"min 2.000000 3.000000 185.000000 \n", | |
"25% 5.000000 9.000000 5866.000000 \n", | |
"50% 9.000000 14.000000 8062.000000 \n", | |
"75% 15.000000 16.000000 12073.000000 \n", | |
"max 18.000000 18.000000 23961.000000 " | |
] | |
}, | |
"execution_count": 52, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df4.describe()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 20. Statistical functions in pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"Statistical functions help us to understand and analyze the behavior of data. In this section, I will discuss few statistical functions, which we can apply on Pandas objects.\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Percent_change\n", | |
"\n", | |
"\n", | |
"Series, datFrames and panel, all have the function **pct_change()**. This function compares every element with its prior element and computes the change percentage.\n", | |
"\n", | |
"By default, the pct_change() operates on columns; if you want to apply the same row wise, then use axis=1() argument.\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Covariance\n", | |
"\n", | |
"\n", | |
"Covariance is applied on series data. The series object has a method **cov()** to compute covariance between series objects. NA values will be excluded automatically.\n", | |
"\n", | |
"\n", | |
"**Series.cov()** can be used to compute covariance between series (excluding missing values).\n", | |
"\n", | |
"\n", | |
"Analogously, **dataFrame.cov()** to compute pairwise covariances among the series in the dataFrame, also excluding NA/null values.\n" | |
] | |
}, | |
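{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Both functions can be sketched on small, made-up data. All names and values below are illustrative and are not taken from the dataset.\n", | |
"\n", | |
"```python\n", | |
"import pandas as pd\n", | |
"\n", | |
"# pct_change(): fractional change relative to the previous element.\n", | |
"prices = pd.Series([100.0, 110.0, 99.0])\n", | |
"change = prices.pct_change()          # the first element has no predecessor, so it is NaN\n", | |
"\n", | |
"# Row-wise change across the columns of a frame.\n", | |
"wide = pd.DataFrame({'q1': [100.0, 50.0], 'q2': [150.0, 25.0]})\n", | |
"row_change = wide.pct_change(axis=1)\n", | |
"\n", | |
"# Covariance between two Series, and the pairwise covariance matrix.\n", | |
"a = pd.Series([1.0, 2.0, 3.0, 4.0])\n", | |
"b = pd.Series([2.0, 4.0, 6.0, 8.0])\n", | |
"cov_ab = a.cov(b)\n", | |
"cov_matrix = pd.DataFrame({'a': a, 'b': b}).cov()\n", | |
"\n", | |
"print(change)\n", | |
"print(row_change)\n", | |
"print(cov_ab)\n", | |
"print(cov_matrix)\n", | |
"```\n", | |
"\n", | |
"The diagonal of `cov_matrix` holds the variance of each column." | |
] | |
}, | |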
{ | |
"cell_type": "code", | |
"execution_count": 53, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>User_ID</th>\n", | |
" <td>2.939142e+06</td>\n", | |
" <td>-257.522212</td>\n", | |
" <td>15.787429</td>\n", | |
" <td>23.708294</td>\n", | |
" <td>14.679685</td>\n", | |
" <td>14.455050</td>\n", | |
" <td>4.602301e+04</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Occupation</th>\n", | |
" <td>-2.575222e+02</td>\n", | |
" <td>42.564139</td>\n", | |
" <td>0.079192</td>\n", | |
" <td>-0.198560</td>\n", | |
" <td>-0.008768</td>\n", | |
" <td>0.112611</td>\n", | |
" <td>6.858232e+02</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Marital_Status</th>\n", | |
" <td>1.578743e+01</td>\n", | |
" <td>0.079192</td>\n", | |
" <td>0.241683</td>\n", | |
" <td>0.037884</td>\n", | |
" <td>0.037090</td>\n", | |
" <td>0.025665</td>\n", | |
" <td>3.159307e-01</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Product_Category_1</th>\n", | |
" <td>2.370829e+01</td>\n", | |
" <td>-0.198560</td>\n", | |
" <td>0.037884</td>\n", | |
" <td>14.067758</td>\n", | |
" <td>6.823287</td>\n", | |
" <td>0.988208</td>\n", | |
" <td>-5.868580e+03</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Product_Category_2</th>\n", | |
" <td>1.467968e+01</td>\n", | |
" <td>-0.008768</td>\n", | |
" <td>0.037090</td>\n", | |
" <td>6.823287</td>\n", | |
" <td>25.222716</td>\n", | |
" <td>4.828654</td>\n", | |
" <td>-3.899162e+03</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Product_Category_3</th>\n", | |
" <td>1.445505e+01</td>\n", | |
" <td>0.112611</td>\n", | |
" <td>0.025665</td>\n", | |
" <td>0.988208</td>\n", | |
" <td>4.828654</td>\n", | |
" <td>17.033138</td>\n", | |
" <td>5.542700e+01</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Purchase</th>\n", | |
" <td>4.602301e+04</td>\n", | |
" <td>685.823205</td>\n", | |
" <td>0.315931</td>\n", | |
" <td>-5868.580224</td>\n", | |
" <td>-3899.162103</td>\n", | |
" <td>55.427003</td>\n", | |
" <td>2.481058e+07</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Occupation Marital_Status \\\n", | |
"User_ID 2.939142e+06 -257.522212 15.787429 \n", | |
"Occupation -2.575222e+02 42.564139 0.079192 \n", | |
"Marital_Status 1.578743e+01 0.079192 0.241683 \n", | |
"Product_Category_1 2.370829e+01 -0.198560 0.037884 \n", | |
"Product_Category_2 1.467968e+01 -0.008768 0.037090 \n", | |
"Product_Category_3 1.445505e+01 0.112611 0.025665 \n", | |
"Purchase 4.602301e+04 685.823205 0.315931 \n", | |
"\n", | |
" Product_Category_1 Product_Category_2 \\\n", | |
"User_ID 23.708294 14.679685 \n", | |
"Occupation -0.198560 -0.008768 \n", | |
"Marital_Status 0.037884 0.037090 \n", | |
"Product_Category_1 14.067758 6.823287 \n", | |
"Product_Category_2 6.823287 25.222716 \n", | |
"Product_Category_3 0.988208 4.828654 \n", | |
"Purchase -5868.580224 -3899.162103 \n", | |
"\n", | |
" Product_Category_3 Purchase \n", | |
"User_ID 14.455050 4.602301e+04 \n", | |
"Occupation 0.112611 6.858232e+02 \n", | |
"Marital_Status 0.025665 3.159307e-01 \n", | |
"Product_Category_1 0.988208 -5.868580e+03 \n", | |
"Product_Category_2 4.828654 -3.899162e+03 \n", | |
"Product_Category_3 17.033138 5.542700e+01 \n", | |
"Purchase 55.427003 2.481058e+07 " | |
] | |
}, | |
"execution_count": 53, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df5=df.copy()\n", | |
"\n", | |
"\n", | |
"# view the covariance\n", | |
"\n", | |
"df5.cov()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Correlation\n", | |
"\n", | |
"\n", | |
"**Correlation** shows the linear relationship between any two array of values (series). There are multiple methods to compute the correlation. These methods are listed below:-\n", | |
"\n", | |
"\n", | |
"\n", | |
"**Method name** \t **Description**\n", | |
"\n", | |
"\n", | |
"- pearson (default)\t- Standard correlation coefficient\n", | |
"\n", | |
"\n", | |
"- kendall - Kendall Tau correlation coefficient\n", | |
"\n", | |
"\n", | |
"- spearman\t - Spearman rank correlation coefficient\n", | |
"\n", | |
"\n", | |
"\n", | |
"All of these are currently computed using pairwise complete observations.\n", | |
"\n", | |
"\n", | |
"Any non-numeric columns will be automatically excluded from the correlation calculation." | |
] | |
}, | |
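{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `method` argument can be sketched on a small numeric frame. The frame `data` is made up for illustration: `y` is an exact multiple of `x`, while `z` tends to decrease as `x` increases.\n", | |
"\n", | |
"```python\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Hypothetical numeric data.\n", | |
"data = pd.DataFrame({\n", | |
"    'x': [1.0, 2.0, 3.0, 4.0, 5.0],\n", | |
"    'y': [2.0, 4.0, 6.0, 8.0, 10.0],\n", | |
"    'z': [5.0, 3.0, 4.0, 1.0, 2.0],\n", | |
"})\n", | |
"\n", | |
"pearson = data.corr()                     # default method: pearson\n", | |
"spearman = data.corr(method='spearman')   # rank-based correlation\n", | |
"xy = data['x'].corr(data['y'])            # correlation between two Series\n", | |
"\n", | |
"print(pearson)\n", | |
"print(spearman)\n", | |
"print(xy)\n", | |
"```\n", | |
"\n", | |
"Both calls return a symmetric matrix with ones on the diagonal." | |
] | |
}, | |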
{ | |
"cell_type": "code", | |
"execution_count": 54, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>User_ID</th>\n", | |
" <td>1.000000</td>\n", | |
" <td>-0.023024</td>\n", | |
" <td>0.018732</td>\n", | |
" <td>0.003687</td>\n", | |
" <td>0.001705</td>\n", | |
" <td>0.002043</td>\n", | |
" <td>0.005389</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Occupation</th>\n", | |
" <td>-0.023024</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.024691</td>\n", | |
" <td>-0.008114</td>\n", | |
" <td>-0.000268</td>\n", | |
" <td>0.004182</td>\n", | |
" <td>0.021104</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Marital_Status</th>\n", | |
" <td>0.018732</td>\n", | |
" <td>0.024691</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.020546</td>\n", | |
" <td>0.015022</td>\n", | |
" <td>0.012650</td>\n", | |
" <td>0.000129</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Product_Category_1</th>\n", | |
" <td>0.003687</td>\n", | |
" <td>-0.008114</td>\n", | |
" <td>0.020546</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.362231</td>\n", | |
" <td>0.063839</td>\n", | |
" <td>-0.314125</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Product_Category_2</th>\n", | |
" <td>0.001705</td>\n", | |
" <td>-0.000268</td>\n", | |
" <td>0.015022</td>\n", | |
" <td>0.362231</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.232961</td>\n", | |
" <td>-0.155868</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Product_Category_3</th>\n", | |
" <td>0.002043</td>\n", | |
" <td>0.004182</td>\n", | |
" <td>0.012650</td>\n", | |
" <td>0.063839</td>\n", | |
" <td>0.232961</td>\n", | |
" <td>1.000000</td>\n", | |
" <td>0.002696</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Purchase</th>\n", | |
" <td>0.005389</td>\n", | |
" <td>0.021104</td>\n", | |
" <td>0.000129</td>\n", | |
" <td>-0.314125</td>\n", | |
" <td>-0.155868</td>\n", | |
" <td>0.002696</td>\n", | |
" <td>1.000000</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Occupation Marital_Status Product_Category_1 \\\n", | |
"User_ID 1.000000 -0.023024 0.018732 0.003687 \n", | |
"Occupation -0.023024 1.000000 0.024691 -0.008114 \n", | |
"Marital_Status 0.018732 0.024691 1.000000 0.020546 \n", | |
"Product_Category_1 0.003687 -0.008114 0.020546 1.000000 \n", | |
"Product_Category_2 0.001705 -0.000268 0.015022 0.362231 \n", | |
"Product_Category_3 0.002043 0.004182 0.012650 0.063839 \n", | |
"Purchase 0.005389 0.021104 0.000129 -0.314125 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"User_ID 0.001705 0.002043 0.005389 \n", | |
"Occupation -0.000268 0.004182 0.021104 \n", | |
"Marital_Status 0.015022 0.012650 0.000129 \n", | |
"Product_Category_1 0.362231 0.063839 -0.314125 \n", | |
"Product_Category_2 1.000000 0.232961 -0.155868 \n", | |
"Product_Category_3 0.232961 1.000000 0.002696 \n", | |
"Purchase -0.155868 0.002696 1.000000 " | |
] | |
}, | |
"execution_count": 54, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# view the correlation\n", | |
"\n", | |
"df5.corr()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Data Ranking\n", | |
"\n", | |
"\n", | |
"Data Ranking produces ranking for each element in the array of elements. In case of ties, assigns the mean rank. \n", | |
"\n", | |
"\n", | |
"The **rank()** method produces a data ranking with ties being assigned the mean of the ranks (by default) for the group.\n", | |
"\n", | |
"\n", | |
"The **rank()** is also a dataframe method and can rank either the rows (axis=0) or the columns (axis=1). NaN values are excluded from the ranking.\n", | |
"\n", | |
"\n", | |
"It optionally takes a parameter ascending which true by default. If it is set to false, data is ranked in descending order, with larger values assigned a smaller rank.\n", | |
"\n", | |
"\n", | |
"The **rank()** supports different tie-breaking methods, specified with the method parameter as follows:-\n", | |
"\n", | |
"\n", | |
"- **average** - average rank of tied group\n", | |
"\n", | |
"\n", | |
"- **min** - lowest rank in the group\n", | |
"\n", | |
"\n", | |
"- **max** - highest rank in the group\n", | |
"\n", | |
"\n", | |
"- **first** - ranks assigned in the order they appear in the array" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 55, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>7.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>7.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>7.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>7.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.5</td>\n", | |
" <td>4.5</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>7.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.5</td>\n", | |
" <td>3.5</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>7.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>7.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>1.5</td>\n", | |
" <td>1.5</td>\n", | |
" <td>4.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>7.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>1.5</td>\n", | |
" <td>1.5</td>\n", | |
" <td>4.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>7.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>1.5</td>\n", | |
" <td>1.5</td>\n", | |
" <td>4.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>7.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>7.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>7.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>7.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>7.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1.5</td>\n", | |
" <td>1.5</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>7.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>7.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>7.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>7.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>7.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>7.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>1.5</td>\n", | |
" <td>1.5</td>\n", | |
" <td>3.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>7.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>7.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>7.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>7.0</td>\n", | |
" <td>3.0</td>\n", | |
" <td>1.0</td>\n", | |
" <td>2.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>7.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>1.5</td>\n", | |
" <td>1.5</td>\n", | |
" <td>3.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>6.0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Occupation Marital_Status Product_Category_1 \\\n", | |
"0 7.0 4.0 1.0 2.0 \n", | |
"1 7.0 4.0 1.0 2.0 \n", | |
"2 7.0 3.0 1.0 4.0 \n", | |
"3 7.0 2.0 1.0 3.0 \n", | |
"4 7.0 5.0 1.0 2.0 \n", | |
"5 7.0 5.0 1.0 2.0 \n", | |
"6 7.0 3.0 1.5 1.5 \n", | |
"7 7.0 3.0 1.5 1.5 \n", | |
"8 7.0 3.0 1.5 1.5 \n", | |
"9 7.0 5.0 1.0 2.0 \n", | |
"10 7.0 5.0 1.0 2.0 \n", | |
"11 7.0 5.0 1.0 2.0 \n", | |
"12 7.0 5.0 1.0 2.0 \n", | |
"13 7.0 5.0 1.5 1.5 \n", | |
"14 7.0 4.0 1.0 2.0 \n", | |
"15 7.0 4.0 1.0 2.0 \n", | |
"16 7.0 5.0 1.0 2.0 \n", | |
"17 7.0 4.0 1.0 3.0 \n", | |
"18 7.0 2.0 2.0 2.0 \n", | |
"19 7.0 4.0 1.5 1.5 \n", | |
"20 7.0 3.0 1.0 2.0 \n", | |
"21 7.0 3.0 1.0 2.0 \n", | |
"22 7.0 3.0 1.0 2.0 \n", | |
"23 7.0 3.0 1.0 2.0 \n", | |
"24 7.0 4.0 1.5 1.5 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"0 3.0 5.0 6.0 \n", | |
"1 3.0 5.0 6.0 \n", | |
"2 2.0 5.0 6.0 \n", | |
"3 4.5 4.5 6.0 \n", | |
"4 3.5 3.5 6.0 \n", | |
"5 3.0 4.0 6.0 \n", | |
"6 4.0 5.0 6.0 \n", | |
"7 4.0 5.0 6.0 \n", | |
"8 4.0 5.0 6.0 \n", | |
"9 3.0 4.0 6.0 \n", | |
"10 3.0 4.0 6.0 \n", | |
"11 3.0 4.0 6.0 \n", | |
"12 3.0 4.0 6.0 \n", | |
"13 3.0 4.0 6.0 \n", | |
"14 3.0 5.0 6.0 \n", | |
"15 3.0 5.0 6.0 \n", | |
"16 3.0 4.0 6.0 \n", | |
"17 5.0 2.0 6.0 \n", | |
"18 4.0 5.0 6.0 \n", | |
"19 3.0 5.0 6.0 \n", | |
"20 4.0 5.0 6.0 \n", | |
"21 4.0 5.0 6.0 \n", | |
"22 4.0 5.0 6.0 \n", | |
"23 4.0 5.0 6.0 \n", | |
"24 3.0 5.0 6.0 " | |
] | |
}, | |
"execution_count": 55, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# view the top 25 rows of ranked dataframe\n", | |
"\n", | |
"df5.rank(1).head(25)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Common statistical functions\n", | |
"\n", | |
"\n", | |
"There are a number of common statistical functions. These are listed below:-\n", | |
"\n", | |
"\n", | |
"\n", | |
"**Method** - **Description**\n", | |
"\n", | |
"\n", | |
"- **count()** - Number of non-null observations\n", | |
"\n", | |
"\n", | |
"- **sum()** - Sum of values\n", | |
"\n", | |
"\n", | |
"- **mean()** - Mean of values\n", | |
"\n", | |
"\n", | |
"- **median()** - Arithmetic median of values\n", | |
"\n", | |
"\n", | |
"- **min()** -\tMinimum\n", | |
"\n", | |
"\n", | |
"- **max()** - \tMaximum\n", | |
"\n", | |
"\n", | |
"- **std()** -\tStandard deviation\n", | |
"\n", | |
"\n", | |
"- **var()** -\tVariance\n", | |
"\n", | |
"\n", | |
"- **skew()** -\tSkewness\n", | |
"\n", | |
"\n", | |
"- **kurt()** -\tKurtosis\n", | |
"\n", | |
"\n", | |
"- **quantile()** -\tQuantile\n", | |
"\n", | |
"\n", | |
"- **apply()** -\tGeneric apply\n", | |
"\n", | |
"\n", | |
"- **cov()** -\tCovariance\n", | |
"\n", | |
"\n", | |
"- **corr()** -\tCorrelation\n", | |
"\n", | |
"\n", | |
"\n", | |
"The **apply()** function takes an extra **func** argument and performs generic rolling computations. The **func** argument should be a single function that produces a single value from an ndarray input." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 21. Window functions in pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"For working with numerical data, Pandas provide few variants like **rolling**, **expanding** and **exponentially moving weights** for window statistics. \n", | |
"\n", | |
"\n", | |
"Among these are **count**, **sum**, **mean**, **median**, **correlation**, **variance**, **covariance**, **standard deviation**, **skewness** and **kurtosis**.\n", | |
"\n", | |
"\n", | |
"The **rolling()** and **expanding()** functions can be used directly from DataFrameGroupBy objects.\n", | |
"\n", | |
"\n", | |
"In this section, we work with rolling, expanding and exponentially weighted data through the corresponding objects, **Rolling**, **Expanding** and **EWM**." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### .rolling() function\n", | |
"\n", | |
"\n", | |
"This function can be applied on a series of data. Specify the **window=n** argument and apply the appropriate statistical function on top of it.\n", | |
"\n", | |
"\n", | |
"`df6=df.copy()`\n", | |
"\n", | |
"\n", | |
"`df6.rolling(window=3).mean()`\n", | |
"\n", | |
"\n", | |
"Since the window size is 3, for first two elements there are nulls and from third the value will be the average of the n, n-1 and n-2 elements. We can also apply various functions.\n", | |
"\n", | |
"\n", | |
"\n", | |
"### .expanding() function\n", | |
"\n", | |
"\n", | |
"This function can be applied on a series of data. We specify the **min_periods=n** argument and apply the appropriate statistical function on top of it.\n", | |
"\n", | |
"\n", | |
"`df6.expanding(min_periods=3).mean()`\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### .ewm() function\n", | |
"\n", | |
"\n", | |
"**ewm** is applied on a series of data. We have to specify any of the com, span, halflife argument and apply the appropriate statistical function on top of it. It assigns the weights exponentially.\n", | |
"\n", | |
"\n", | |
"`df6.ewm(com=0.5).mean()`\n", | |
"\n", | |
"\n", | |
"\n", | |
"Window functions are used in finding the trends within the data graphically by smoothing the curve. If there is a lot of variation in the data, then we can apply window functions to smooth out the curve or the trend.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 21. Aggregations in pandas\n", | |
"\n", | |
"\n", | |
"Once the rolling, expanding and ewm objects are created, several methods are available to perform aggregations on data.\n", | |
"\n", | |
"\n", | |
"### Apply aggregation on a whole dataframe\n", | |
"\n", | |
"\n", | |
"`df6=df.copy`\n", | |
"\n", | |
"\n", | |
"`df6.aggregate(np.sum)`\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Apply aggregation on a single column of a dataframe\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 56, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"5017668378" | |
] | |
}, | |
"execution_count": 56, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df6=df.copy()\n", | |
"\n", | |
"\n", | |
"df6['Purchase'].aggregate(np.sum)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Apply multiple functions on a single column of a dataframe" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"sum 5.017668e+09\n", | |
"mean 9.333860e+03\n", | |
"Name: Purchase, dtype: float64" | |
] | |
}, | |
"execution_count": 57, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df6['Purchase'].aggregate([np.sum, np.mean])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Apply aggregation on multiple columns of a dataframe" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 58, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Product_Category_1 5.295546\n", | |
"Product_Category_2 9.922686\n", | |
"Product_Category_3 12.665430\n", | |
"dtype: float64" | |
] | |
}, | |
"execution_count": 58, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df6[['Product_Category_1', 'Product_Category_2', 'Product_Category_3']].aggregate(np.mean)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Apply multiple functions on multiple columns of a dataframe" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>sum</th>\n", | |
" <td>2.846764e+06</td>\n", | |
" <td>5.334208e+06</td>\n", | |
" <td>6.808644e+06</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>mean</th>\n", | |
" <td>5.295546e+00</td>\n", | |
" <td>9.922686e+00</td>\n", | |
" <td>1.266543e+01</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Product_Category_1 Product_Category_2 Product_Category_3\n", | |
"sum 2.846764e+06 5.334208e+06 6.808644e+06\n", | |
"mean 5.295546e+00 9.922686e+00 1.266543e+01" | |
] | |
}, | |
"execution_count": 59, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df6[['Product_Category_1', 'Product_Category_2', 'Product_Category_3']].aggregate([np.sum, np.mean])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Apply different functions to different columns of a dataframe" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 60, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Product_Category_1 2.846764e+06\n", | |
"Product_Category_2 9.922686e+00\n", | |
"dtype: float64" | |
] | |
}, | |
"execution_count": 60, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df6.aggregate({'Product_Category_1' : np.sum ,'Product_Category_2' : np.mean})" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 23. Iteration in pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"The behavior of basic iteration over Pandas objects depends on the type. When iterating over a Series, it is regarded as \n", | |
"array-like, and basic iteration produces the values. Other data structures, like DataFrame and Panel, follow the **dict-like** convention of iterating over the **keys** of the objects.\n", | |
"\n", | |
"\n", | |
"\n", | |
"Iterating a dataframe gives column names.\n", | |
"\n", | |
"\n", | |
"\n", | |
"To iterate over the rows of the DataFrame, we can use the following functions −\n", | |
"\n", | |
"\n", | |
"\n", | |
"- **iteritems()** − to iterate over the (key,value) pairs\n", | |
"\n", | |
"\n", | |
"- **iterrows()** − iterate over the rows as (index,series) pairs\n", | |
"\n", | |
"\n", | |
"- **itertuples()** − iterate over the rows as namedtuples" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 24. Function application in pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"There are three important methods that enable us to apply our own or another library's functions to pandas objects. These methods differentiate on their scope of usage. These functions expect to operate on an entire dataframe, row- or column-wise\n", | |
"operation, or element wise operation. These methods are described below:-\n", | |
"\n", | |
"\n", | |
"- Table wise Function Application: **pipe()**\n", | |
"\n", | |
"\n", | |
"\n", | |
"- Row or Column Wise Function Application: **apply()**\n", | |
"\n", | |
"\n", | |
"\n", | |
"- Element wise Function Application: **applymap()**\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Table-wise Function Application:pipe()\n", | |
"\n", | |
"\n", | |
"Custom operations can be performed by passing the function and the appropriate number of parameters as pipe arguments. Thus, operation is performed on the whole DataFrame.\n", | |
"\n", | |
"\n", | |
"For example, if we want to add a value 10 to all the elements in the DataFrame. Then, we can make use of **pipe()** function \n", | |
"as follows:-\n", | |
"\n", | |
"\n", | |
"`def addten(x1,x2):`\n", | |
"\n", | |
"\n", | |
" `return x1+x2`\n", | |
" \n", | |
"\n", | |
"`df7=df.copy()` \n", | |
"\n", | |
"\n", | |
"`df7.pipe(addten,10)`" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Row or Column Wise Function Application: apply()\n", | |
"\n", | |
"\n", | |
"Arbitrary functions can be applied along the axes of a DataFrame or Panel using the **apply()** method. It takes an optional axis argument. By default, the operation performs column wise, taking each column as an array-like.\n", | |
"\n", | |
"\n", | |
"`df7.apply(np.mean)`\n", | |
"\n", | |
"\n", | |
"By passing axis parameter, operations can be performed row wise.\n", | |
"\n", | |
"\n", | |
"`df7.apply(np.mean,axis=1)`\n", | |
"\n", | |
"\n", | |
"`df.apply(lambda x: x.max() - x.min())`" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Element Wise Function Application: applymap()\n", | |
"\n", | |
"\n", | |
"\n", | |
"The methods **applymap()** on dataframe and analogously **map()** on series accept any Python function. It takes a single value and returns a single value.\n", | |
"\n", | |
"\n", | |
"`df7.applymap(lambda x:x*100)`\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 25. Pandas GroupBy operations\n", | |
"\n", | |
"\n", | |
"A groupby operation involves one of the following operations on the original object. They are as follows :−\n", | |
"\n", | |
"\n", | |
"- **Splitting** the Object\n", | |
"\n", | |
"\n", | |
"- **Applying** a function\n", | |
"\n", | |
"\n", | |
"- **Combining** the results\n", | |
"\n", | |
"\n", | |
"\n", | |
"The split step is the most straightforward out of these. In many situations, we may wish to split the data set into groups \n", | |
"and perform operations on those groups.\n", | |
"\n", | |
"\n", | |
"In the apply functionality, we can perform the following operations :−\n", | |
"\n", | |
"\n", | |
"\n", | |
"- **Aggregation** − compute a summary statistic (or statistics) for each group. Some examples are :- \n", | |
"\n", | |
" - Compute group sums or means. \n", | |
" \n", | |
" - Compute group sizes / counts.\n", | |
"\n", | |
"\n", | |
"\n", | |
"- **Transformation** − perform some group-specific computations and return a like-indexed object. Some examples are :-\n", | |
"\n", | |
" - Standardize data (zscore) within a group.\n", | |
" \n", | |
" - Filling NAs within groups with a value derived from each group.\n", | |
"\n", | |
"\n", | |
"\n", | |
"- **Filtration** − discarding the data with some condition. Some examples are :-\n", | |
"\n", | |
" - Discard data that belongs to groups with only a few members.\n", | |
" \n", | |
" - Filter out data based on the group sum or mean.\n", | |
" \n", | |
" \n", | |
" \n", | |
"- Some combination of the above: **GroupBy** will examine the results of the apply step and try to return a sensibly combined result if it doesn't fit into either of the above two categories." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Split Data into Groups\n", | |
"\n", | |
"\n", | |
"Pandas object can be split into any of their objects. There are multiple ways to split an object as follows :-\n", | |
"\n", | |
"\n", | |
"- obj.groupby('key')\n", | |
"\n", | |
"\n", | |
"- obj.groupby(['key1','key2'])\n", | |
"\n", | |
"\n", | |
"- obj.groupby(key,axis=1)\n", | |
"\n", | |
"\n", | |
"The following example illustrates the idea:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 61, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x00000088672ECFD0>" | |
] | |
}, | |
"execution_count": 61, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df8=df.copy()\n", | |
"\n", | |
"df8.groupby('Gender')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 62, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'F': Int64Index([ 0, 1, 2, 3, 14, 15, 16, 17,\n", | |
" 29, 30,\n", | |
" ...\n", | |
" 537467, 537468, 537469, 537470, 537471, 537472, 537473, 537474,\n", | |
" 537475, 537476],\n", | |
" dtype='int64', length=132197),\n", | |
" 'M': Int64Index([ 4, 5, 6, 7, 8, 9, 10, 11,\n", | |
" 12, 13,\n", | |
" ...\n", | |
" 537567, 537568, 537569, 537570, 537571, 537572, 537573, 537574,\n", | |
" 537575, 537576],\n", | |
" dtype='int64', length=405380)}" | |
] | |
}, | |
"execution_count": 62, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# view groups of Gender column\n", | |
"\n", | |
"df8.groupby('Gender').groups" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Group by with multiple columns\n", | |
"\n", | |
"\n", | |
"`df8.groupby(['Gender', 'Age']).groups`\n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Iterate through groups\n", | |
"\n", | |
"\n", | |
"With the groupby object in hand, we can iterate through the object similar to itertools.obj.\n", | |
"\n", | |
"\n", | |
"`df8_grouped = df8.groupby('Gender')`\n", | |
"\n", | |
"\n", | |
"`for Age, Occupation in df8_grouped:`\n", | |
"\n", | |
" `print Age`\n", | |
" \n", | |
" `print Occupation`\n", | |
" \n", | |
" \n", | |
" \n", | |
" \n", | |
" \n", | |
"### Select a group with get_group() method\n", | |
"\n", | |
"\n", | |
"Using the **get_group()** method, we can select a single group.\n", | |
"\n", | |
"\n", | |
"`df8_grouped = df8.groupby('City_Category')`\n", | |
"\n", | |
"\n", | |
"`print(df8_grouped.get_group('A')`" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Aggregation functions with groupby\n", | |
"\n", | |
"\n", | |
"An aggregation function returns a single aggregated value for each group. Once the group by object is created, several aggregation operations can be performed on the grouped data as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 63, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Gender</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>F</th>\n", | |
" <td>132605172667</td>\n", | |
" <td>891361</td>\n", | |
" <td>55223</td>\n", | |
" <td>739701</td>\n", | |
" <td>1329034.0</td>\n", | |
" <td>1658809.0</td>\n", | |
" <td>1164624021</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>M</th>\n", | |
" <td>406580175483</td>\n", | |
" <td>3453718</td>\n", | |
" <td>164537</td>\n", | |
" <td>2107063</td>\n", | |
" <td>4005174.0</td>\n", | |
" <td>5149835.0</td>\n", | |
" <td>3853044357</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Occupation Marital_Status Product_Category_1 \\\n", | |
"Gender \n", | |
"F 132605172667 891361 55223 739701 \n", | |
"M 406580175483 3453718 164537 2107063 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"Gender \n", | |
"F 1329034.0 1658809.0 1164624021 \n", | |
"M 4005174.0 5149835.0 3853044357 " | |
] | |
}, | |
"execution_count": 63, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# apply aggregation function sum with groupby\n", | |
"\n", | |
"df8.groupby('Gender').sum()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 64, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Gender</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>F</th>\n", | |
" <td>132605172667</td>\n", | |
" <td>891361</td>\n", | |
" <td>55223</td>\n", | |
" <td>739701</td>\n", | |
" <td>1329034.0</td>\n", | |
" <td>1658809.0</td>\n", | |
" <td>1164624021</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>M</th>\n", | |
" <td>406580175483</td>\n", | |
" <td>3453718</td>\n", | |
" <td>164537</td>\n", | |
" <td>2107063</td>\n", | |
" <td>4005174.0</td>\n", | |
" <td>5149835.0</td>\n", | |
" <td>3853044357</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Occupation Marital_Status Product_Category_1 \\\n", | |
"Gender \n", | |
"F 132605172667 891361 55223 739701 \n", | |
"M 406580175483 3453718 164537 2107063 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"Gender \n", | |
"F 1329034.0 1658809.0 1164624021 \n", | |
"M 4005174.0 5149835.0 3853044357 " | |
] | |
}, | |
"execution_count": 64, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# alternative way to apply aggregation function sum\n", | |
"\n", | |
"df8.groupby('Gender').agg(np.sum)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Another way to see the size of each group is by applying the **size()** function as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 65, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" User_ID Product_ID Age Occupation City_Category \\\n", | |
"Gender \n", | |
"F 132197 132197 132197 132197 132197 \n", | |
"M 405380 405380 405380 405380 405380 \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"Gender \n", | |
"F 132197 132197 132197 \n", | |
"M 405380 405380 405380 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"Gender \n", | |
"F 132197.0 132197.0 132197 \n", | |
"M 405380.0 405380.0 405380 \n" | |
] | |
} | |
], | |
"source": [ | |
"# attribute access in python pandas\n", | |
"\n", | |
"df8_grouped = df8.groupby('Gender')\n", | |
"\n", | |
"print(df8_grouped.agg(np.size))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Applying multiple aggregation functions at once\n", | |
"\n", | |
"\n", | |
"With grouped Series, you can also pass a list or dict of functions to do aggregation with, and generate DataFrame as output as \n", | |
"follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 66, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>sum</th>\n", | |
" <th>mean</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Gender</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>F</th>\n", | |
" <td>1164624021</td>\n", | |
" <td>8809.761349</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>M</th>\n", | |
" <td>3853044357</td>\n", | |
" <td>9504.771713</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" sum mean\n", | |
"Gender \n", | |
"F 1164624021 8809.761349\n", | |
"M 3853044357 9504.771713" | |
] | |
}, | |
"execution_count": 66, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df8.groupby('Gender')['Purchase'].agg([np.sum, np.mean])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Transformations\n", | |
"\n", | |
"\n", | |
"Transformation on a group or a column returns an object that is indexed the same size of that is being grouped. \n", | |
"Thus, the transform should return a result that is the same size as that of a group chunk." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 67, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0 -0.931414\n", | |
"1 13.534512\n", | |
"2 -15.647263\n", | |
"3 -16.420332\n", | |
"4 -3.040496\n", | |
"Name: Purchase, dtype: float64\n" | |
] | |
} | |
], | |
"source": [ | |
"df9=df.copy()\n", | |
"\n", | |
"\n", | |
"score = lambda x: (x - x.mean()) / x.std()*10\n", | |
"\n", | |
"\n", | |
"print(df9.groupby('Gender')['Purchase'].transform(score).head(5))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Filtration\n", | |
"\n", | |
"\n", | |
"Filtration filters the data on a defined criteria and returns the subset of data. The **filter()** function is used to filter the data." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 68, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Stay_In_Current_City_Years</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" <th>Purchase</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00069042</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>3</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8370</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00248942</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15200</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00087842</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>6.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1422</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00085442</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>2</td>\n", | |
" <td>0</td>\n", | |
" <td>12</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1057</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>1000002</td>\n", | |
" <td>P00285442</td>\n", | |
" <td>M</td>\n", | |
" <td>55+</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7969</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>1000003</td>\n", | |
" <td>P00193542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>15</td>\n", | |
" <td>A</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15227</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P00184942</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>19215</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P00346142</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>15854</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P0097242</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>2</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>15686</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00274942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>7871</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00251242</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>5254</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00014542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>3957</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00031342</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>11.0</td>\n", | |
" <td>17.0</td>\n", | |
" <td>6073</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00145042</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>15665</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00231342</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5378</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00190242</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>4</td>\n", | |
" <td>5.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>2079</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P0096642</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>2</td>\n", | |
" <td>3.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>13055</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>1000006</td>\n", | |
" <td>P00058442</td>\n", | |
" <td>F</td>\n", | |
" <td>51-55</td>\n", | |
" <td>9</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>4.0</td>\n", | |
" <td>8851</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>1000007</td>\n", | |
" <td>P00036842</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>14.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>11788</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00249542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>19614</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00220442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>8584</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00156442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>9872</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00213742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>9743</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00214442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>5982</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>1000008</td>\n", | |
" <td>P00303442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>12</td>\n", | |
" <td>C</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11927</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00135742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>6</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>16662</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>26</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00039942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5887</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>27</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00161442</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>6973</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>28</th>\n", | |
" <td>1000009</td>\n", | |
" <td>P00078742</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>17</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5391</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>29</th>\n", | |
" <td>1000010</td>\n", | |
" <td>P00085942</td>\n", | |
" <td>F</td>\n", | |
" <td>36-45</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>4+</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>4.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16352</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>...</th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537547</th>\n", | |
" <td>1004733</td>\n", | |
" <td>P00244042</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>18</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>11543</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537548</th>\n", | |
" <td>1004734</td>\n", | |
" <td>P00111042</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>15</td>\n", | |
" <td>2.0</td>\n", | |
" <td>15.0</td>\n", | |
" <td>20924</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537549</th>\n", | |
" <td>1004734</td>\n", | |
" <td>P00345842</td>\n", | |
" <td>M</td>\n", | |
" <td>51-55</td>\n", | |
" <td>1</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>13082</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537550</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00278242</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>8.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11658</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537551</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00313442</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>6.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>6863</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537552</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P0098642</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>6</td>\n", | |
" <td>8.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16415</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537553</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00119342</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>10</td>\n", | |
" <td>13.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>18526</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537554</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00114042</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7099</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537555</th>\n", | |
" <td>1004735</td>\n", | |
" <td>P00135142</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>3</td>\n", | |
" <td>C</td>\n", | |
" <td>3</td>\n", | |
" <td>0</td>\n", | |
" <td>13</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>578</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537556</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00194542</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>2183</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537557</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00175242</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>2</td>\n", | |
" <td>14.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>12724</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537558</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00101942</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>17.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7796</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537559</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00109142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>17.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7770</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537560</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00084842</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>5940</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537561</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00078142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8.0</td>\n", | |
" <td>7834</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537562</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00146742</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>13.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>11508</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537563</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00154642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>13.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>6074</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537564</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00117442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7084</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537565</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00051142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7934</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537566</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00048742</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5350</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537567</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00157542</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>8</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>1994</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537568</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00250642</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>11</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>5930</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537569</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00023142</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>5</td>\n", | |
" <td>14.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>7042</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537570</th>\n", | |
" <td>1004736</td>\n", | |
" <td>P00162442</td>\n", | |
" <td>M</td>\n", | |
" <td>18-25</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>16.0</td>\n", | |
" <td>14.0</td>\n", | |
" <td>15491</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537571</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00221442</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>11852</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537572</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00193542</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>2.0</td>\n", | |
" <td>5.0</td>\n", | |
" <td>11664</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537573</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00111142</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>19196</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537574</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00345942</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>8</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>8043</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537575</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00285842</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>15.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>7172</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>537576</th>\n", | |
" <td>1004737</td>\n", | |
" <td>P00118242</td>\n", | |
" <td>M</td>\n", | |
" <td>36-45</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>5</td>\n", | |
" <td>8.0</td>\n", | |
" <td>16.0</td>\n", | |
" <td>6875</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>537577 rows × 12 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category \\\n", | |
"0 1000001 P00069042 F 0-17 10 A \n", | |
"1 1000001 P00248942 F 0-17 10 A \n", | |
"2 1000001 P00087842 F 0-17 10 A \n", | |
"3 1000001 P00085442 F 0-17 10 A \n", | |
"4 1000002 P00285442 M 55+ 16 C \n", | |
"5 1000003 P00193542 M 26-35 15 A \n", | |
"6 1000004 P00184942 M 46-50 7 B \n", | |
"7 1000004 P00346142 M 46-50 7 B \n", | |
"8 1000004 P0097242 M 46-50 7 B \n", | |
"9 1000005 P00274942 M 26-35 20 A \n", | |
"10 1000005 P00251242 M 26-35 20 A \n", | |
"11 1000005 P00014542 M 26-35 20 A \n", | |
"12 1000005 P00031342 M 26-35 20 A \n", | |
"13 1000005 P00145042 M 26-35 20 A \n", | |
"14 1000006 P00231342 F 51-55 9 A \n", | |
"15 1000006 P00190242 F 51-55 9 A \n", | |
"16 1000006 P0096642 F 51-55 9 A \n", | |
"17 1000006 P00058442 F 51-55 9 A \n", | |
"18 1000007 P00036842 M 36-45 1 B \n", | |
"19 1000008 P00249542 M 26-35 12 C \n", | |
"20 1000008 P00220442 M 26-35 12 C \n", | |
"21 1000008 P00156442 M 26-35 12 C \n", | |
"22 1000008 P00213742 M 26-35 12 C \n", | |
"23 1000008 P00214442 M 26-35 12 C \n", | |
"24 1000008 P00303442 M 26-35 12 C \n", | |
"25 1000009 P00135742 M 26-35 17 C \n", | |
"26 1000009 P00039942 M 26-35 17 C \n", | |
"27 1000009 P00161442 M 26-35 17 C \n", | |
"28 1000009 P00078742 M 26-35 17 C \n", | |
"29 1000010 P00085942 F 36-45 1 B \n", | |
"... ... ... ... ... ... ... \n", | |
"537547 1004733 P00244042 M 18-25 18 C \n", | |
"537548 1004734 P00111042 M 51-55 1 B \n", | |
"537549 1004734 P00345842 M 51-55 1 B \n", | |
"537550 1004735 P00278242 M 46-50 3 C \n", | |
"537551 1004735 P00313442 M 46-50 3 C \n", | |
"537552 1004735 P0098642 M 46-50 3 C \n", | |
"537553 1004735 P00119342 M 46-50 3 C \n", | |
"537554 1004735 P00114042 M 46-50 3 C \n", | |
"537555 1004735 P00135142 M 46-50 3 C \n", | |
"537556 1004736 P00194542 M 18-25 20 A \n", | |
"537557 1004736 P00175242 M 18-25 20 A \n", | |
"537558 1004736 P00101942 M 18-25 20 A \n", | |
"537559 1004736 P00109142 M 18-25 20 A \n", | |
"537560 1004736 P00084842 M 18-25 20 A \n", | |
"537561 1004736 P00078142 M 18-25 20 A \n", | |
"537562 1004736 P00146742 M 18-25 20 A \n", | |
"537563 1004736 P00154642 M 18-25 20 A \n", | |
"537564 1004736 P00117442 M 18-25 20 A \n", | |
"537565 1004736 P00051142 M 18-25 20 A \n", | |
"537566 1004736 P00048742 M 18-25 20 A \n", | |
"537567 1004736 P00157542 M 18-25 20 A \n", | |
"537568 1004736 P00250642 M 18-25 20 A \n", | |
"537569 1004736 P00023142 M 18-25 20 A \n", | |
"537570 1004736 P00162442 M 18-25 20 A \n", | |
"537571 1004737 P00221442 M 36-45 16 C \n", | |
"537572 1004737 P00193542 M 36-45 16 C \n", | |
"537573 1004737 P00111142 M 36-45 16 C \n", | |
"537574 1004737 P00345942 M 36-45 16 C \n", | |
"537575 1004737 P00285842 M 36-45 16 C \n", | |
"537576 1004737 P00118242 M 36-45 16 C \n", | |
"\n", | |
" Stay_In_Current_City_Years Marital_Status Product_Category_1 \\\n", | |
"0 2 0 3 \n", | |
"1 2 0 1 \n", | |
"2 2 0 12 \n", | |
"3 2 0 12 \n", | |
"4 4+ 0 8 \n", | |
"5 3 0 1 \n", | |
"6 2 1 1 \n", | |
"7 2 1 1 \n", | |
"8 2 1 1 \n", | |
"9 1 1 8 \n", | |
"10 1 1 5 \n", | |
"11 1 1 8 \n", | |
"12 1 1 8 \n", | |
"13 1 1 1 \n", | |
"14 1 0 5 \n", | |
"15 1 0 4 \n", | |
"16 1 0 2 \n", | |
"17 1 0 5 \n", | |
"18 1 1 1 \n", | |
"19 4+ 1 1 \n", | |
"20 4+ 1 5 \n", | |
"21 4+ 1 8 \n", | |
"22 4+ 1 8 \n", | |
"23 4+ 1 8 \n", | |
"24 4+ 1 1 \n", | |
"25 0 0 6 \n", | |
"26 0 0 8 \n", | |
"27 0 0 5 \n", | |
"28 0 0 5 \n", | |
"29 4+ 1 2 \n", | |
"... ... ... ... \n", | |
"537547 1 0 1 \n", | |
"537548 1 1 15 \n", | |
"537549 1 1 2 \n", | |
"537550 3 0 1 \n", | |
"537551 3 0 5 \n", | |
"537552 3 0 6 \n", | |
"537553 3 0 10 \n", | |
"537554 3 0 5 \n", | |
"537555 3 0 13 \n", | |
"537556 1 1 8 \n", | |
"537557 1 1 2 \n", | |
"537558 1 1 8 \n", | |
"537559 1 1 8 \n", | |
"537560 1 1 8 \n", | |
"537561 1 1 8 \n", | |
"537562 1 1 1 \n", | |
"537563 1 1 8 \n", | |
"537564 1 1 5 \n", | |
"537565 1 1 8 \n", | |
"537566 1 1 5 \n", | |
"537567 1 1 8 \n", | |
"537568 1 1 11 \n", | |
"537569 1 1 5 \n", | |
"537570 1 1 1 \n", | |
"537571 1 0 1 \n", | |
"537572 1 0 1 \n", | |
"537573 1 0 1 \n", | |
"537574 1 0 8 \n", | |
"537575 1 0 5 \n", | |
"537576 1 0 5 \n", | |
"\n", | |
" Product_Category_2 Product_Category_3 Purchase \n", | |
"0 6.0 14.0 8370 \n", | |
"1 6.0 14.0 15200 \n", | |
"2 6.0 14.0 1422 \n", | |
"3 14.0 14.0 1057 \n", | |
"4 14.0 14.0 7969 \n", | |
"5 2.0 14.0 15227 \n", | |
"6 8.0 17.0 19215 \n", | |
"7 15.0 17.0 15854 \n", | |
"8 16.0 17.0 15686 \n", | |
"9 16.0 17.0 7871 \n", | |
"10 11.0 17.0 5254 \n", | |
"11 11.0 17.0 3957 \n", | |
"12 11.0 17.0 6073 \n", | |
"13 2.0 5.0 15665 \n", | |
"14 8.0 14.0 5378 \n", | |
"15 5.0 14.0 2079 \n", | |
"16 3.0 4.0 13055 \n", | |
"17 14.0 4.0 8851 \n", | |
"18 14.0 16.0 11788 \n", | |
"19 5.0 15.0 19614 \n", | |
"20 14.0 15.0 8584 \n", | |
"21 14.0 15.0 9872 \n", | |
"22 14.0 15.0 9743 \n", | |
"23 14.0 15.0 5982 \n", | |
"24 8.0 14.0 11927 \n", | |
"25 8.0 14.0 16662 \n", | |
"26 8.0 14.0 5887 \n", | |
"27 14.0 14.0 6973 \n", | |
"28 8.0 14.0 5391 \n", | |
"29 4.0 8.0 16352 \n", | |
"... ... ... ... \n", | |
"537547 2.0 15.0 11543 \n", | |
"537548 2.0 15.0 20924 \n", | |
"537549 8.0 14.0 13082 \n", | |
"537550 8.0 14.0 11658 \n", | |
"537551 6.0 8.0 6863 \n", | |
"537552 8.0 8.0 16415 \n", | |
"537553 13.0 8.0 18526 \n", | |
"537554 14.0 8.0 7099 \n", | |
"537555 16.0 8.0 578 \n", | |
"537556 14.0 8.0 2183 \n", | |
"537557 14.0 8.0 12724 \n", | |
"537558 17.0 8.0 7796 \n", | |
"537559 17.0 8.0 7770 \n", | |
"537560 16.0 8.0 5940 \n", | |
"537561 16.0 8.0 7834 \n", | |
"537562 13.0 14.0 11508 \n", | |
"537563 13.0 14.0 6074 \n", | |
"537564 14.0 14.0 7084 \n", | |
"537565 14.0 14.0 7934 \n", | |
"537566 14.0 14.0 5350 \n", | |
"537567 14.0 14.0 1994 \n", | |
"537568 14.0 14.0 5930 \n", | |
"537569 14.0 14.0 7042 \n", | |
"537570 16.0 14.0 15491 \n", | |
"537571 2.0 5.0 11852 \n", | |
"537572 2.0 5.0 11664 \n", | |
"537573 15.0 16.0 19196 \n", | |
"537574 15.0 16.0 8043 \n", | |
"537575 15.0 16.0 7172 \n", | |
"537576 8.0 16.0 6875 \n", | |
"\n", | |
"[537577 rows x 12 columns]" | |
] | |
}, | |
"execution_count": 68, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df10=df.copy()\n", | |
"\n", | |
"\n", | |
"df10.groupby('Gender').filter(lambda x: len(x) > 4)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 26. Pandas merging and joining\n", | |
"\n", | |
"\n", | |
"Pandas has full-featured, high-performance in-memory join operations that are very similar to the joins of relational (SQL) databases. These methods perform significantly better than other open source implementations such as base::merge.data.frame in R, owing to careful algorithmic design and the internal layout of the data in a DataFrame.\n", | |
"\n", | |
"\n", | |
"Pandas provides a single function, **merge**, as the entry point for all standard database join operations between DataFrame objects.\n", | |
"\n", | |
"\n", | |
"The syntax of the merge function is as follows:-\n", | |
"\n", | |
"\n", | |
"\n", | |
"`pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)`\n", | |
"\n", | |
"\n", | |
"\n", | |
"The description of the parameters used is as follows:-\n", | |
"\n", | |
"\n", | |
"- **left** − A DataFrame object.\n", | |
"\n", | |
"\n", | |
"- **right** − Another DataFrame object.\n", | |
"\n", | |
"\n", | |
"- **on** − Columns (names) to join on. Must be found in both the left and right DataFrame objects.\n", | |
"\n", | |
"\n", | |
"- **left_on** − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.\n", | |
"\n", | |
"\n", | |
"- **right_on** − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.\n", | |
"\n", | |
"\n", | |
"- **left_index** − If True, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.\n", | |
"\n", | |
"\n", | |
"- **right_index** − Same usage as left_index for the right DataFrame.\n", | |
"\n", | |
"\n", | |
"- **how** − One of 'left', 'right', 'outer', 'inner'. Defaults to 'inner'.\n", | |
"\n", | |
"\n", | |
"- **sort** − Sort the result DataFrame by the join keys in lexicographical order. Defaults to False, in which case the order of the join keys depends on the join type (the **how** argument).\n", | |
"\n", | |
"\n", | |
"Now, I will create two different DataFrames and perform the merging operations on them as follows:-" | |
] | |
}, | |
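The **left_on**, **right_on** and **right_index** parameters described above are easiest to see on frames whose key columns are named differently. The following is only a minimal sketch: the `players` and `scores` frames and their column names are hypothetical and are not part of the dataset used in this project.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
players = pd.DataFrame({
    'player_id': [1, 2, 3],
    'Name': ['Rohit', 'Dhawan', 'Virat']})

scores = pd.DataFrame({
    'pid': [1, 2, 4],
    'runs': [120, 85, 40]})

# Join on differently named key columns; both key columns are kept.
by_columns = pd.merge(players, scores, left_on='player_id', right_on='pid')
print(by_columns)

# Join a key column on the left against the index on the right.
scores_indexed = scores.set_index('pid')
by_index = pd.merge(players, scores_indexed,
                    left_on='player_id', right_index=True)
print(by_index)
```

Both calls use the default inner join, so only the keys 1 and 2, which occur on both sides, survive.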
{ | |
"cell_type": "code", | |
"execution_count": 69, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" id Name subject_id\n", | |
"0 1 Rohit sub1\n", | |
"1 2 Dhawan sub2\n", | |
"2 3 Virat sub4\n", | |
"3 4 Dhoni sub6\n", | |
"4 5 Kedar sub5\n", | |
" id Name subject_id\n", | |
"0 1 Kumar sub2\n", | |
"1 2 Bumrah sub4\n", | |
"2 3 Shami sub3\n", | |
"3 4 Kuldeep sub6\n", | |
"4 5 Chahal sub5\n" | |
] | |
} | |
], | |
"source": [ | |
"# let's create two dataframes\n", | |
"\n", | |
"batsmen = pd.DataFrame({\n", | |
" 'id':[1,2,3,4,5],\n", | |
" 'Name': ['Rohit', 'Dhawan', 'Virat', 'Dhoni', 'Kedar'],\n", | |
" 'subject_id':['sub1','sub2','sub4','sub6','sub5']})\n", | |
"\n", | |
"bowler = pd.DataFrame(\n", | |
" {'id':[1,2,3,4,5],\n", | |
" 'Name': ['Kumar', 'Bumrah', 'Shami', 'Kuldeep', 'Chahal'],\n", | |
" 'subject_id':['sub2','sub4','sub3','sub6','sub5']})\n", | |
"\n", | |
"\n", | |
"print(batsmen)\n", | |
"\n", | |
"\n", | |
"print(bowler)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 70, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id</th>\n", | |
" <th>Name_x</th>\n", | |
" <th>subject_id_x</th>\n", | |
" <th>Name_y</th>\n", | |
" <th>subject_id_y</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Rohit</td>\n", | |
" <td>sub1</td>\n", | |
" <td>Kumar</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" <td>Bumrah</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" <td>Shami</td>\n", | |
" <td>sub3</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" <td>Kuldeep</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" <td>Chahal</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id Name_x subject_id_x Name_y subject_id_y\n", | |
"0 1 Rohit sub1 Kumar sub2\n", | |
"1 2 Dhawan sub2 Bumrah sub4\n", | |
"2 3 Virat sub4 Shami sub3\n", | |
"3 4 Dhoni sub6 Kuldeep sub6\n", | |
"4 5 Kedar sub5 Chahal sub5" | |
] | |
}, | |
"execution_count": 70, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# merge two dataframes on a key\n", | |
"\n", | |
"pd.merge(batsmen, bowler, on='id')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 71, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id</th>\n", | |
" <th>Name_x</th>\n", | |
" <th>subject_id</th>\n", | |
" <th>Name_y</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" <td>Kuldeep</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" <td>Chahal</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id Name_x subject_id Name_y\n", | |
"0 4 Dhoni sub6 Kuldeep\n", | |
"1 5 Kedar sub5 Chahal" | |
] | |
}, | |
"execution_count": 71, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# merge two dataframes on multiple keys\n", | |
"\n", | |
"pd.merge(batsmen, bowler, on=['id', 'subject_id'])" | |
] | |
}, | |
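The `_x` and `_y` endings in the column names above are the default suffixes that merge adds when both frames contain a non-key column with the same name. As a sketch, using the same `batsmen` and `bowler` frames, the **suffixes** argument of `pd.merge` can supply more descriptive endings (the suffix strings here are my own choice):

```python
import pandas as pd

batsmen = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Rohit', 'Dhawan', 'Virat', 'Dhoni', 'Kedar'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})

bowler = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Kumar', 'Bumrah', 'Shami', 'Kuldeep', 'Chahal'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

# Replace the default '_x' / '_y' suffixes with descriptive ones.
merged = pd.merge(batsmen, bowler, on='id', suffixes=('_batsman', '_bowler'))
print(merged.columns.tolist())
```

Only the overlapping non-key columns are renamed; the key column `id` keeps its name.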
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Merge using 'how' argument\n", | |
"\n", | |
"\n", | |
"\n", | |
"The **how** argument to merge specifies which keys are included in the resulting table. If a key appears in only one of the two tables, the columns coming from the other table are filled with **NaN** for that row. Because NaN is a floating-point value, an integer column such as id may then be displayed as float (for example 1.0).\n", | |
"\n", | |
"\n", | |
"Here is a summary of the how options and their SQL equivalent names:-\n", | |
"\n", | |
"\n", | |
"\n", | |
"| Merge method | SQL equivalent | Description |\n", | |
"|---|---|---|\n", | |
"| left | LEFT OUTER JOIN | Use keys from the left object |\n", | |
"| right | RIGHT OUTER JOIN | Use keys from the right object |\n", | |
"| outer | FULL OUTER JOIN | Use the union of keys |\n", | |
"| inner | INNER JOIN | Use the intersection of keys |" | |
] | |
}, | |
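To check which table each key came from, `pd.merge` also accepts **indicator=True**, which adds a `_merge` column with the values `left_only`, `right_only` or `both`. This is a small sketch that assumes the `batsmen` and `bowler` frames defined earlier:

```python
import pandas as pd

batsmen = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Rohit', 'Dhawan', 'Virat', 'Dhoni', 'Kedar'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})

bowler = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Kumar', 'Bumrah', 'Shami', 'Kuldeep', 'Chahal'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
outer = pd.merge(batsmen, bowler, on='subject_id', how='outer', indicator=True)
print(outer[['subject_id', '_merge']])

# Keys present only in the left frame (an "anti-join").
left_only = outer.loc[outer['_merge'] == 'left_only', 'subject_id'].tolist()
print(left_only)
```

Filtering on `_merge` is a convenient way to find the keys that an inner join would drop.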
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id_x</th>\n", | |
" <th>Name_x</th>\n", | |
" <th>subject_id</th>\n", | |
" <th>id_y</th>\n", | |
" <th>Name_y</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Rohit</td>\n", | |
" <td>sub1</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" <td>1.0</td>\n", | |
" <td>Kumar</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" <td>2.0</td>\n", | |
" <td>Bumrah</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" <td>4.0</td>\n", | |
" <td>Kuldeep</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" <td>5.0</td>\n", | |
" <td>Chahal</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id_x Name_x subject_id id_y Name_y\n", | |
"0 1 Rohit sub1 NaN NaN\n", | |
"1 2 Dhawan sub2 1.0 Kumar\n", | |
"2 3 Virat sub4 2.0 Bumrah\n", | |
"3 4 Dhoni sub6 4.0 Kuldeep\n", | |
"4 5 Kedar sub5 5.0 Chahal" | |
] | |
}, | |
"execution_count": 72, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# left join\n", | |
"\n", | |
"pd.merge(batsmen, bowler, on='subject_id', how='left')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 73, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id_x</th>\n", | |
" <th>Name_x</th>\n", | |
" <th>subject_id</th>\n", | |
" <th>id_y</th>\n", | |
" <th>Name_y</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>2.0</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" <td>1</td>\n", | |
" <td>Kumar</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>3.0</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" <td>2</td>\n", | |
" <td>Bumrah</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>4.0</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" <td>4</td>\n", | |
" <td>Kuldeep</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>5.0</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" <td>5</td>\n", | |
" <td>Chahal</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>sub3</td>\n", | |
" <td>3</td>\n", | |
" <td>Shami</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id_x Name_x subject_id id_y Name_y\n", | |
"0 2.0 Dhawan sub2 1 Kumar\n", | |
"1 3.0 Virat sub4 2 Bumrah\n", | |
"2 4.0 Dhoni sub6 4 Kuldeep\n", | |
"3 5.0 Kedar sub5 5 Chahal\n", | |
"4 NaN NaN sub3 3 Shami" | |
] | |
}, | |
"execution_count": 73, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# right join\n", | |
"\n", | |
"pd.merge(batsmen, bowler, on='subject_id', how='right')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 74, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id_x</th>\n", | |
" <th>Name_x</th>\n", | |
" <th>subject_id</th>\n", | |
" <th>id_y</th>\n", | |
" <th>Name_y</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1.0</td>\n", | |
" <td>Rohit</td>\n", | |
" <td>sub1</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2.0</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" <td>1.0</td>\n", | |
" <td>Kumar</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3.0</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" <td>2.0</td>\n", | |
" <td>Bumrah</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4.0</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" <td>4.0</td>\n", | |
" <td>Kuldeep</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5.0</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" <td>5.0</td>\n", | |
" <td>Chahal</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" <td>sub3</td>\n", | |
" <td>3.0</td>\n", | |
" <td>Shami</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id_x Name_x subject_id id_y Name_y\n", | |
"0 1.0 Rohit sub1 NaN NaN\n", | |
"1 2.0 Dhawan sub2 1.0 Kumar\n", | |
"2 3.0 Virat sub4 2.0 Bumrah\n", | |
"3 4.0 Dhoni sub6 4.0 Kuldeep\n", | |
"4 5.0 Kedar sub5 5.0 Chahal\n", | |
"5 NaN NaN sub3 3.0 Shami" | |
] | |
}, | |
"execution_count": 74, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# outer join\n", | |
"\n", | |
"pd.merge(batsmen, bowler, on='subject_id', how='outer')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 75, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id_x</th>\n", | |
" <th>Name_x</th>\n", | |
" <th>subject_id</th>\n", | |
" <th>id_y</th>\n", | |
" <th>Name_y</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>2</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" <td>1</td>\n", | |
" <td>Kumar</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>3</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" <td>2</td>\n", | |
" <td>Bumrah</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" <td>4</td>\n", | |
" <td>Kuldeep</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" <td>5</td>\n", | |
" <td>Chahal</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id_x Name_x subject_id id_y Name_y\n", | |
"0 2 Dhawan sub2 1 Kumar\n", | |
"1 3 Virat sub4 2 Bumrah\n", | |
"2 4 Dhoni sub6 4 Kuldeep\n", | |
"3 5 Kedar sub5 5 Chahal" | |
] | |
}, | |
"execution_count": 75, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# inner join\n", | |
"\n", | |
"pd.merge(batsmen, bowler, on='subject_id', how='inner')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 27. Pandas concatenation operation\n", | |
"\n", | |
"\n", | |
"\n", | |
"Pandas provides various facilities for easily combining Series, DataFrame, and Panel objects.\n", | |
"\n", | |
"\n", | |
"The **concat()** function does all of the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.\n", | |
"\n", | |
"\n", | |
"The syntax of the **concat()** function is as follows:-\n", | |
"\n", | |
"\n", | |
"\n", | |
"`pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)`\n", | |
"\n", | |
"\n", | |
"\n", | |
"The description of the arguments is as follows:-\n", | |
"\n", | |
"\n", | |
"\n", | |
"- **objs** − This is a sequence or mapping of Series, DataFrame, or Panel objects.\n", | |
"\n", | |
"\n", | |
"- **axis** − {0, 1, ...}, default 0. This is the axis to concatenate along.\n", | |
"\n", | |
"\n", | |
"- **join** − {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis (or axes): 'outer' takes their union and 'inner' their intersection.\n", | |
"\n", | |
"\n", | |
"- **ignore_index** − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.\n", | |
"\n", | |
"\n", | |
"- **join_axes** − A list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic.\n", | |
"\n", | |
"\n", | |
"- **keys** − sequence, default None. Construct a hierarchical index using the passed keys as the outermost level. If multiple levels are passed, the keys should be tuples.\n", | |
"\n", | |
"\n", | |
"- **levels** − list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.\n", | |
"\n", | |
"\n", | |
"- **names** − list, default None. Names for the levels in the resulting hierarchical index.\n", | |
"\n", | |
"\n", | |
"- **verify_integrity** − boolean, default False. Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation.\n", | |
"\n", | |
"\n", | |
"- **copy** − boolean, default True. If False, do not copy data unnecessarily.\n", | |
"\n", | |
"\n", | |
"Now, I will create two dataframes and concatenate them:-" | |
] | |
}, | |
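The effect of the **join** argument only becomes visible when the inputs do not share all of their columns. The sketch below uses two small hypothetical frames (`first` and `second`, with made-up `runs` and `wickets` columns) to compare the two settings:

```python
import pandas as pd

# Hypothetical frames that share only some columns.
first = pd.DataFrame({'id': [1, 2], 'Name': ['Rohit', 'Dhawan'], 'runs': [120, 85]})
second = pd.DataFrame({'id': [3, 4], 'Name': ['Kumar', 'Bumrah'], 'wickets': [3, 2]})

# join='outer' (default): union of columns, missing cells become NaN.
outer = pd.concat([first, second], join='outer', ignore_index=True)
print(outer)

# join='inner': only the columns present in every input are kept.
inner = pd.concat([first, second], join='inner', ignore_index=True)
print(inner)
```

With `join='inner'` the `runs` and `wickets` columns are dropped, because each of them exists in only one of the inputs.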
{ | |
"cell_type": "code", | |
"execution_count": 76, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" id Name subject_id\n", | |
"0 1 Rohit sub1\n", | |
"1 2 Dhawan sub2\n", | |
"2 3 Virat sub4\n", | |
"3 4 Dhoni sub6\n", | |
"4 5 Kedar sub5\n", | |
" id Name subject_id\n", | |
"0 1 Kumar sub2\n", | |
"1 2 Bumrah sub4\n", | |
"2 3 Shami sub3\n", | |
"3 4 Kuldeep sub6\n", | |
"4 5 Chahal sub5\n" | |
] | |
} | |
], | |
"source": [ | |
"# let's create two dataframes\n", | |
"\n", | |
"batsmen = pd.DataFrame({\n", | |
" 'id':[1,2,3,4,5],\n", | |
" 'Name': ['Rohit', 'Dhawan', 'Virat', 'Dhoni', 'Kedar'],\n", | |
" 'subject_id':['sub1','sub2','sub4','sub6','sub5']})\n", | |
"\n", | |
"bowler = pd.DataFrame(\n", | |
" {'id':[1,2,3,4,5],\n", | |
" 'Name': ['Kumar', 'Bumrah', 'Shami', 'Kuldeep', 'Chahal'],\n", | |
" 'subject_id':['sub2','sub4','sub3','sub6','sub5']})\n", | |
"\n", | |
"\n", | |
"print(batsmen)\n", | |
"\n", | |
"\n", | |
"print(bowler)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 77, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id</th>\n", | |
" <th>Name</th>\n", | |
" <th>subject_id</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Rohit</td>\n", | |
" <td>sub1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Kumar</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Bumrah</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Shami</td>\n", | |
" <td>sub3</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Kuldeep</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Chahal</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id Name subject_id\n", | |
"0 1 Rohit sub1\n", | |
"1 2 Dhawan sub2\n", | |
"2 3 Virat sub4\n", | |
"3 4 Dhoni sub6\n", | |
"4 5 Kedar sub5\n", | |
"0 1 Kumar sub2\n", | |
"1 2 Bumrah sub4\n", | |
"2 3 Shami sub3\n", | |
"3 4 Kuldeep sub6\n", | |
"4 5 Chahal sub5" | |
] | |
}, | |
"execution_count": 77, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# concatenate the dataframes\n", | |
"\n", | |
"\n", | |
"team=[batsmen, bowler]\n", | |
"\n", | |
"pd.concat(team)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 78, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th>id</th>\n", | |
" <th>Name</th>\n", | |
" <th>subject_id</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th rowspan=\"5\" valign=\"top\">x</th>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Rohit</td>\n", | |
" <td>sub1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th rowspan=\"5\" valign=\"top\">y</th>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Kumar</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Bumrah</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Shami</td>\n", | |
" <td>sub3</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Kuldeep</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Chahal</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id Name subject_id\n", | |
"x 0 1 Rohit sub1\n", | |
" 1 2 Dhawan sub2\n", | |
" 2 3 Virat sub4\n", | |
" 3 4 Dhoni sub6\n", | |
" 4 5 Kedar sub5\n", | |
"y 0 1 Kumar sub2\n", | |
" 1 2 Bumrah sub4\n", | |
" 2 3 Shami sub3\n", | |
" 3 4 Kuldeep sub6\n", | |
" 4 5 Chahal sub5" | |
] | |
}, | |
"execution_count": 78, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# associate keys with the dataframes\n", | |
"\n", | |
"pd.concat(team, keys=['x', 'y'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In the plain concatenation above, the row labels 0 to 4 each appear twice, once for every input dataframe. Passing **keys** keeps these labels and only adds an outer level that tells the two blocks apart.\n", | |
"\n", | |
"If the resultant object should get a fresh index of its own, we can set the **ignore_index** option to True as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 79, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id</th>\n", | |
" <th>Name</th>\n", | |
" <th>subject_id</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Rohit</td>\n", | |
" <td>sub1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>1</td>\n", | |
" <td>Kumar</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>2</td>\n", | |
" <td>Bumrah</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>3</td>\n", | |
" <td>Shami</td>\n", | |
" <td>sub3</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>4</td>\n", | |
" <td>Kuldeep</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>5</td>\n", | |
" <td>Chahal</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id Name subject_id\n", | |
"0 1 Rohit sub1\n", | |
"1 2 Dhawan sub2\n", | |
"2 3 Virat sub4\n", | |
"3 4 Dhoni sub6\n", | |
"4 5 Kedar sub5\n", | |
"5 1 Kumar sub2\n", | |
"6 2 Bumrah sub4\n", | |
"7 3 Shami sub3\n", | |
"8 4 Kuldeep sub6\n", | |
"9 5 Chahal sub5" | |
] | |
}, | |
"execution_count": 79, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"pd.concat(team, keys=['x', 'y'], ignore_index=True)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that the result gets a new index running from 0 to 9. The keys are discarded as well, because ignore_index=True replaces the index they would have labelled." | |
] | |
}, | |
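The difference between **keys**, a plain concatenation and **verify_integrity** can be checked directly on the index of the result. This is a minimal sketch on shortened, two-row versions of the `batsmen` and `bowler` frames (the shortening is my own simplification):

```python
import pandas as pd

batsmen = pd.DataFrame({'id': [1, 2], 'Name': ['Rohit', 'Dhawan']})
bowler = pd.DataFrame({'id': [1, 2], 'Name': ['Kumar', 'Bumrah']})

# keys builds a two-level index: (key, original row label).
labelled = pd.concat([batsmen, bowler], keys=['x', 'y'])
print(labelled.loc['y'])          # rows that came from the second frame
print(labelled.index.is_unique)

# Without keys the row labels 0 and 1 each appear twice.
plain = pd.concat([batsmen, bowler])
print(plain.index.is_unique)

# verify_integrity=True raises instead of silently duplicating labels.
try:
    pd.concat([batsmen, bowler], verify_integrity=True)
except ValueError as err:
    print('duplicate labels rejected:', err)
```

Keeping the keys is useful when the origin of each row matters later, while `ignore_index=True` is the simpler choice when it does not.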
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If two objects are concatenated along axis=1, the columns of the second object are placed to the right of those of the first, with rows aligned on the index, as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 80, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id</th>\n", | |
" <th>Name</th>\n", | |
" <th>subject_id</th>\n", | |
" <th>id</th>\n", | |
" <th>Name</th>\n", | |
" <th>subject_id</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Rohit</td>\n", | |
" <td>sub1</td>\n", | |
" <td>1</td>\n", | |
" <td>Kumar</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" <td>2</td>\n", | |
" <td>Bumrah</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" <td>3</td>\n", | |
" <td>Shami</td>\n", | |
" <td>sub3</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" <td>4</td>\n", | |
" <td>Kuldeep</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" <td>5</td>\n", | |
" <td>Chahal</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id Name subject_id id Name subject_id\n", | |
"0 1 Rohit sub1 1 Kumar sub2\n", | |
"1 2 Dhawan sub2 2 Bumrah sub4\n", | |
"2 3 Virat sub4 3 Shami sub3\n", | |
"3 4 Dhoni sub6 4 Kuldeep sub6\n", | |
"4 5 Kedar sub5 5 Chahal sub5" | |
] | |
}, | |
"execution_count": 80, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"pd.concat(team, axis=1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Concatenating using append\n", | |
"\n", | |
"\n", | |
"A useful shortcut to concat are the append instance methods on Series and DataFrame. These methods actually predated concat. They concatenate along axis=0, namely the index as follows:−" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 81, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id</th>\n", | |
" <th>Name</th>\n", | |
" <th>subject_id</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Rohit</td>\n", | |
" <td>sub1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Dhawan</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Virat</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Dhoni</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Kedar</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Kumar</td>\n", | |
" <td>sub2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>Bumrah</td>\n", | |
" <td>sub4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Shami</td>\n", | |
" <td>sub3</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Kuldeep</td>\n", | |
" <td>sub6</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Chahal</td>\n", | |
" <td>sub5</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" id Name subject_id\n", | |
"0 1 Rohit sub1\n", | |
"1 2 Dhawan sub2\n", | |
"2 3 Virat sub4\n", | |
"3 4 Dhoni sub6\n", | |
"4 5 Kedar sub5\n", | |
"0 1 Kumar sub2\n", | |
"1 2 Bumrah sub4\n", | |
"2 3 Shami sub3\n", | |
"3 4 Kuldeep sub6\n", | |
"4 5 Chahal sub5" | |
] | |
}, | |
"execution_count": 81, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"batsmen.append(bowler)" | |
] | |
}, | |
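On pandas 2.0 and later, where `DataFrame.append` is no longer available, the same row-wise concatenation can be written with `pd.concat`. The following is a minimal sketch: the `batsmen` and `bowler` frames are rebuilt here from the output shown above so that the snippet is self-contained, and the `ignore_index=True` variant is an addition for illustration only.

```python
import pandas as pd

# Frames rebuilt from the output displayed above (self-contained copy)
batsmen = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Name": ["Rohit", "Dhawan", "Virat", "Dhoni", "Kedar"],
    "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"],
})
bowler = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Name": ["Kumar", "Bumrah", "Shami", "Kuldeep", "Chahal"],
    "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"],
})

# Same result as batsmen.append(bowler): the original row labels 0..4 are repeated
stacked = pd.concat([batsmen, bowler])

# ignore_index=True discards the old labels and renumbers the rows 0..9
renumbered = pd.concat([batsmen, bowler], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Keeping the repeated labels is sometimes useful for tracing a row back to its source frame; renumbering is usually safer when the result is later indexed with `.loc`.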
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 28. Reshaping by melt and pivot\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Melt creates wide-to-long format dataframe\n", | |
"\n", | |
"\n", | |
"\n", | |
"When we take a closer look at our original dataframe, we can see that our dataset is not in the tidy data format.\n", | |
"\n", | |
"The columns `Product_Category_1`, `Product_Category_2` and `Product_Category_3` contain values of product_category rather than variables. We should reorganize our dataframe into tidy data format.\n", | |
"\n", | |
"The **melt()** function is useful to convert a DataFrame from **wide-to-long** format where one or more columns are identifier variables, while all other columns are considered measured variables. The measured variables are then \"unpivoted\" to the row axis, leaving non-identifier columns, \"variable\" and \"value\". The names of those columns can be customized by supplying the var_name and value_name parameters.\n", | |
"\n", | |
"We can convert our dataset into long data format using the **melt()** function as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 82, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Index(['User_ID', 'Product_ID', 'Gender', 'Age', 'Occupation', 'City_Category',\n", | |
" 'Stay_In_Current_City_Years', 'Marital_Status', 'Product_Category_1',\n", | |
" 'Product_Category_2', 'Product_Category_3', 'Purchase'],\n", | |
" dtype='object')" | |
] | |
}, | |
"execution_count": 82, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df11=df.copy()\n", | |
"\n", | |
"df11.columns" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 83, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>User_ID</th>\n", | |
" <th>Product_ID</th>\n", | |
" <th>Gender</th>\n", | |
" <th>Age</th>\n", | |
" <th>Occupation</th>\n", | |
" <th>City_Category</th>\n", | |
" <th>Marital_Status</th>\n", | |
" <th>Purchase</th>\n", | |
" <th>Product_Category</th>\n", | |
" <th>Amount</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00069042</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>0</td>\n", | |
" <td>8370</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>3.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00248942</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>0</td>\n", | |
" <td>15200</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00087842</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>0</td>\n", | |
" <td>1422</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>12.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>1000001</td>\n", | |
" <td>P00085442</td>\n", | |
" <td>F</td>\n", | |
" <td>0-17</td>\n", | |
" <td>10</td>\n", | |
" <td>A</td>\n", | |
" <td>0</td>\n", | |
" <td>1057</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>12.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>1000002</td>\n", | |
" <td>P00285442</td>\n", | |
" <td>M</td>\n", | |
" <td>55+</td>\n", | |
" <td>16</td>\n", | |
" <td>C</td>\n", | |
" <td>0</td>\n", | |
" <td>7969</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>8.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>1000003</td>\n", | |
" <td>P00193542</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>15</td>\n", | |
" <td>A</td>\n", | |
" <td>0</td>\n", | |
" <td>15227</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P00184942</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>19215</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P00346142</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>15854</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>1000004</td>\n", | |
" <td>P0097242</td>\n", | |
" <td>M</td>\n", | |
" <td>46-50</td>\n", | |
" <td>7</td>\n", | |
" <td>B</td>\n", | |
" <td>1</td>\n", | |
" <td>15686</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>1000005</td>\n", | |
" <td>P00274942</td>\n", | |
" <td>M</td>\n", | |
" <td>26-35</td>\n", | |
" <td>20</td>\n", | |
" <td>A</td>\n", | |
" <td>1</td>\n", | |
" <td>7871</td>\n", | |
" <td>Product_Category_1</td>\n", | |
" <td>8.0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" User_ID Product_ID Gender Age Occupation City_Category Marital_Status \\\n", | |
"0 1000001 P00069042 F 0-17 10 A 0 \n", | |
"1 1000001 P00248942 F 0-17 10 A 0 \n", | |
"2 1000001 P00087842 F 0-17 10 A 0 \n", | |
"3 1000001 P00085442 F 0-17 10 A 0 \n", | |
"4 1000002 P00285442 M 55+ 16 C 0 \n", | |
"5 1000003 P00193542 M 26-35 15 A 0 \n", | |
"6 1000004 P00184942 M 46-50 7 B 1 \n", | |
"7 1000004 P00346142 M 46-50 7 B 1 \n", | |
"8 1000004 P0097242 M 46-50 7 B 1 \n", | |
"9 1000005 P00274942 M 26-35 20 A 1 \n", | |
"\n", | |
" Purchase Product_Category Amount \n", | |
"0 8370 Product_Category_1 3.0 \n", | |
"1 15200 Product_Category_1 1.0 \n", | |
"2 1422 Product_Category_1 12.0 \n", | |
"3 1057 Product_Category_1 12.0 \n", | |
"4 7969 Product_Category_1 8.0 \n", | |
"5 15227 Product_Category_1 1.0 \n", | |
"6 19215 Product_Category_1 1.0 \n", | |
"7 15854 Product_Category_1 1.0 \n", | |
"8 15686 Product_Category_1 1.0 \n", | |
"9 7871 Product_Category_1 8.0 " | |
] | |
}, | |
"execution_count": 83, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df12=(pd.melt(frame=df11, id_vars=['User_ID','Product_ID', 'Gender','Age','Occupation','City_Category',\n", | |
" 'Marital_Status','Purchase'], \n", | |
" value_vars=['Product_Category_1','Product_Category_2','Product_Category_3'], \n", | |
" var_name='Product_Category', value_name='Amount'))\n", | |
"\n", | |
"df12.head(10)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Pivot creates long-to-wide format dataframe\n", | |
"\n", | |
"\n", | |
"I have melt three columns `Product_Category_1`, `Product_Category_2` and `Product_Category_3` into a single column named\n", | |
"`Product_Category` with **melt()** function. So, I have converted the above dataframe from wide to long format.\n", | |
"\n", | |
"\n", | |
"Now, I will convert the above column `Product_Category` from long to wide format with **pivot()** function. \n", | |
"**pivot()** function takes 3 arguments with the following names - index, columns, and values. As a value for each of these parameters we need to specify a column name in the original table. Then the **pivot()** function will create a new table, \n", | |
"whose row and column indices are the unique values of the respective parameters. The cell values of the new table are taken \n", | |
"from column given as the values parameter.\n", | |
"\n", | |
"\n", | |
"This is illustrated below:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 84, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th>Product_Category</th>\n", | |
" <th>Product_Category_1</th>\n", | |
" <th>Product_Category_2</th>\n", | |
" <th>Product_Category_3</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>3.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>12.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>12.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>8.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>8.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>5.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>8.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>8.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>5.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>4.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>2.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>5.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>5.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>8.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>8.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>8.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>24</th>\n", | |
" <td>1.0</td>\n", | |
" <td>NaN</td>\n", | |
" <td>NaN</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
"Product_Category Product_Category_1 Product_Category_2 Product_Category_3\n", | |
"0 3.0 NaN NaN\n", | |
"1 1.0 NaN NaN\n", | |
"2 12.0 NaN NaN\n", | |
"3 12.0 NaN NaN\n", | |
"4 8.0 NaN NaN\n", | |
"5 1.0 NaN NaN\n", | |
"6 1.0 NaN NaN\n", | |
"7 1.0 NaN NaN\n", | |
"8 1.0 NaN NaN\n", | |
"9 8.0 NaN NaN\n", | |
"10 5.0 NaN NaN\n", | |
"11 8.0 NaN NaN\n", | |
"12 8.0 NaN NaN\n", | |
"13 1.0 NaN NaN\n", | |
"14 5.0 NaN NaN\n", | |
"15 4.0 NaN NaN\n", | |
"16 2.0 NaN NaN\n", | |
"17 5.0 NaN NaN\n", | |
"18 1.0 NaN NaN\n", | |
"19 1.0 NaN NaN\n", | |
"20 5.0 NaN NaN\n", | |
"21 8.0 NaN NaN\n", | |
"22 8.0 NaN NaN\n", | |
"23 8.0 NaN NaN\n", | |
"24 1.0 NaN NaN" | |
] | |
}, | |
"execution_count": 84, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df13=df12[['Product_Category', 'Amount']]\n", | |
"\n", | |
"df14=df13.pivot(index=None, columns='Product_Category', values='Amount')\n", | |
"\n", | |
"df14.head(25)" | |
] | |
}, | |
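To see melt and pivot as true inverses, it helps to pivot on an identifier column instead of the default row index. The sketch below uses a small hypothetical frame in which `User_ID` is unique per row; that is an assumption made for illustration and does not hold for the full dataset, where a user can appear many times.

```python
import pandas as pd

# Hypothetical toy data: one row per user, two category columns
wide = pd.DataFrame({
    "User_ID": [1, 2, 3],
    "Product_Category_1": [3.0, 1.0, 12.0],
    "Product_Category_2": [6.0, 2.0, 14.0],
})

# Wide -> long: one row per (user, category) pair
long_df = pd.melt(wide, id_vars=["User_ID"],
                  var_name="Product_Category", value_name="Amount")

# Long -> wide: User_ID becomes the row index, categories become columns again
back = long_df.pivot(index="User_ID", columns="Product_Category", values="Amount")

print(long_df)
print(back)
```

With a unique key as the index, every cell of the pivoted table is filled, so no NaN values are introduced by the round trip.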
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Reshaping with pivot_table function\n", | |
"\n", | |
"\n", | |
"Before calling the pivot() function, we need to ensure that our dataset does not have rows with duplicate values for the specified columns. If there are duplicate entries for rows in the dataset, the pivot() function, will throw a value error.\n", | |
"\n", | |
"In this case, the **pivot_table()** method comes to rescue. It works like pivot, but it aggregates the values from rows with duplicate entries for the specified columns. The syntax of the pivot_table() function is given below:-\n", | |
"\n", | |
"\n", | |
"`df.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, \n", | |
"margins=False, dropna=True, margins_name='All')`" | |
] | |
}, | |
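The difference between pivot() and pivot_table() described above can be sketched on a small example. The `purchases` frame and its values are hypothetical and chosen only so that one (City_Category, Gender) combination occurs twice.

```python
import pandas as pd

# Hypothetical data: the combination ("A", "F") appears in two rows
purchases = pd.DataFrame({
    "City_Category": ["A", "A", "A", "B", "B"],
    "Gender": ["F", "F", "M", "F", "M"],
    "Purchase": [100, 300, 500, 200, 400],
})

# pivot() cannot decide which of the duplicate rows to keep, so it raises
try:
    purchases.pivot(index="City_Category", columns="Gender", values="Purchase")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead (here with the mean)
table = purchases.pivot_table(values="Purchase", index="City_Category",
                              columns="Gender", aggfunc="mean")

print(pivot_failed)
print(table)
```

Passing a different `aggfunc`, such as `"sum"` or `"count"`, changes how the duplicate rows are combined.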
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 29. Reshaping by stacking and unstacking\n", | |
"\n", | |
"\n", | |
"There are two other methods called **stack()** and **unstack()** which closely resemble the **pivot()** method. These methods are designed to work together with multiindex objects. The functionality of these methods is described below:-\n", | |
"\n", | |
"\n", | |
"\n", | |
"### Stacking\n", | |
"\n", | |
"\n", | |
"Stacking means \"pivot\" a level of the (possibly hierarchical) column labels, returning a DataFrame with an index with a new inner-most level of row labels. So. stacking a dataframe means moving or pivoting the innermost column index to become the innermost row index. \n", | |
"\n", | |
"\n", | |
"It return a reshaped dataframe or series having a multi-level index with one or more new inner-most levels compared to the current dataframe. The new inner-most levels are created by pivoting the columns of the current dataframe.\n", | |
"\n", | |
"\n", | |
"\n", | |
"- if the columns have a single level, the output is a Series.\n", | |
"\n", | |
"\n", | |
"- if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.\n", | |
"\n", | |
"\n", | |
"In this case, we look at a dataframe with single level hierarchical indices on both axes. Stacking takes the most-inner column index (height, weight), makes it the most inner row index and reshuffles the cell values accordingly. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 85, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead tr th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr>\n", | |
" <th></th>\n", | |
" <th colspan=\"2\" halign=\"left\">weight</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th></th>\n", | |
" <th>kg</th>\n", | |
" <th>pounds</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>husband</th>\n", | |
" <td>75</td>\n", | |
" <td>165</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>wife</th>\n", | |
" <td>60</td>\n", | |
" <td>132</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" weight \n", | |
" kg pounds\n", | |
"husband 75 165\n", | |
"wife 60 132" | |
] | |
}, | |
"execution_count": 85, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"cols=pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'pounds')])\n", | |
"\n", | |
"df15=pd.DataFrame([[75,165], [60, 132]],\n", | |
" index=['husband', 'wife'],\n", | |
" columns=cols)\n", | |
"\n", | |
"df15" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 86, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th>weight</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">husband</th>\n", | |
" <th>kg</th>\n", | |
" <td>75</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>pounds</th>\n", | |
" <td>165</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th rowspan=\"2\" valign=\"top\">wife</th>\n", | |
" <th>kg</th>\n", | |
" <td>60</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>pounds</th>\n", | |
" <td>132</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" weight\n", | |
"husband kg 75\n", | |
" pounds 165\n", | |
"wife kg 60\n", | |
" pounds 132" | |
] | |
}, | |
"execution_count": 86, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df16=df15.stack()\n", | |
"\n", | |
"df16" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Unstacking\n", | |
"\n", | |
"\n", | |
"It is the inverse operation of stacking. It means \"pivot\" a level of the (possibly hierarchical) row index to the column axis, producing a reshaped dataframe with a new inner-most level of column labels.\n", | |
"\n", | |
"\n", | |
"I will convert the stacked dataframe df16 back to original form as follows:-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 87, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead tr th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr>\n", | |
" <th></th>\n", | |
" <th colspan=\"2\" halign=\"left\">weight</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th></th>\n", | |
" <th>kg</th>\n", | |
" <th>pounds</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>husband</th>\n", | |
" <td>75</td>\n", | |
" <td>165</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>wife</th>\n", | |
" <td>60</td>\n", | |
" <td>132</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" weight \n", | |
" kg pounds\n", | |
"husband 75 165\n", | |
"wife 60 132" | |
] | |
}, | |
"execution_count": 87, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df16.unstack()" | |
] | |
}, | |
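The single-level case mentioned under Stacking, where the result is a Series, can be sketched as follows. The `measurements` frame and its values are hypothetical; the `level=0` call is included only to show that unstack() can move a row level other than the innermost one.

```python
import pandas as pd

# Hypothetical frame with ordinary (single-level) columns
measurements = pd.DataFrame({"height": [180, 165], "weight": [75, 60]},
                            index=["husband", "wife"])

# Single-level columns: stack() returns a Series with a two-level row index
stacked = measurements.stack()

# unstack() moves the innermost row level back to the columns by default
restored = stacked.unstack()

# level=0 moves the outer row level (the person) to the columns instead
by_person = stacked.unstack(level=0)

print(type(stacked).__name__)
print(restored)
print(by_person)
```

Here `by_person` has the measurement names as rows and the people as columns, which is the transpose of the original layout.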
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 30. Options and customization with pandas\n", | |
"\n", | |
"\n", | |
"\n", | |
"Pandas provide API to customize some aspects of its behavior. In most cases, we would like to adjust the display related options.\n", | |
"\n", | |
"\n", | |
"The API is composed of five relevant functions. They are as follows :−\n", | |
"\n", | |
"\n", | |
"- 1. **get_option()**\n", | |
"\n", | |
"\n", | |
"- 2. **set_option()**\n", | |
"\n", | |
"\n", | |
"- 3. **reset_option()**\n", | |
"\n", | |
"\n", | |
"- 4. **describe_option()**\n", | |
"\n", | |
"\n", | |
"- 5. **option_context()**\n", | |
"\n", | |
"\n", | |
"Let us now understand how the functions operate." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 1. get_option(param)\n", | |
"\n", | |
"\n", | |
"\n", | |
"**get_option()** takes a single parameter and returns the value as given in the output below −" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 88, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"60" | |
] | |
}, | |
"execution_count": 88, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# display maximum rows\n", | |
"\n", | |
"pd.get_option(\"display.max_rows\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 89, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"20" | |
] | |
}, | |
"execution_count": 89, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# display maximum columns\n", | |
"\n", | |
"pd.get_option(\"display.max_columns\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 2. set_option(param,value)\n", | |
"\n", | |
"\n", | |
"**set_option()** takes two arguments and sets the value to the parameter as shown below −" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 90, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"80" | |
] | |
}, | |
"execution_count": 90, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# set maximum rows\n", | |
"\n", | |
"pd.set_option(\"display.max_rows\", 80)\n", | |
"\n", | |
"pd.get_option(\"display.max_rows\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 91, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"30" | |
] | |
}, | |
"execution_count": 91, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# set maximum columns\n", | |
"\n", | |
"pd.set_option(\"display.max_columns\", 30)\n", | |
"\n", | |
"pd.get_option(\"display.max_columns\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 3. reset_option(param)\n", | |
"\n", | |
"\n", | |
"**reset_option()** takes an argument and sets the value back to the default value." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 92, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"60" | |
] | |
}, | |
"execution_count": 92, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# display maximum rows\n", | |
"\n", | |
"pd.reset_option(\"display.max_rows\")\n", | |
"\n", | |
"pd.get_option(\"display.max_rows\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 93, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"20" | |
] | |
}, | |
"execution_count": 93, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# display maximum columns\n", | |
"\n", | |
"pd.reset_option(\"display.max_columns\")\n", | |
"\n", | |
"pd.get_option(\"display.max_columns\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 4. describe_option(param)\n", | |
"\n", | |
"\n", | |
"**describe_option()** prints the description of the argument." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 94, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"display.max_rows : int\n", | |
" If max_rows is exceeded, switch to truncate view. Depending on\n", | |
" `large_repr`, objects are either centrally truncated or printed as\n", | |
" a summary view. 'None' value means unlimited.\n", | |
"\n", | |
" In case python/IPython is running in a terminal and `large_repr`\n", | |
" equals 'truncate' this can be set to 0 and pandas will auto-detect\n", | |
" the height of the terminal and print a truncated object which fits\n", | |
" the screen height. The IPython notebook, IPython qtconsole, or\n", | |
" IDLE do not run in a terminal and hence it is not possible to do\n", | |
" correct auto-detection.\n", | |
" [default: 60] [currently: 60]\n", | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"# description of the display maximum rows parameter\n", | |
"\n", | |
"pd.describe_option(\"display.max_rows\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 5. option_context()\n", | |
"\n", | |
"\n", | |
"**option_context()** context manager is used to set the option in with statement temporarily. Option values are restored automatically when you exit with block." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 95, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"10\n", | |
"10\n" | |
] | |
} | |
], | |
"source": [ | |
"# set the parameter value with option_context\n", | |
"\n", | |
"with pd.option_context(\"display.max_rows\",10):\n", | |
" print(pd.get_option(\"display.max_rows\"))\n", | |
" print(pd.get_option(\"display.max_rows\"))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"There is a difference between the first and the second print statements. The first statement prints the value set by **option_context()** which is temporary within the with context itself. After the with context, the second print statement prints the configured value." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This concludes our discussion on Pandas and its data analysis tools." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.0" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |