@hygull
Last active November 20, 2018 08:25
pandas, data science, groupby(), loc
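This is a compact alternative for the counting tasks (1-4, 9 and 10) in the notebook below. It is a sketch, not the author's method. It assumes `DataScience_Practice.xlsx` contains exactly the 32 rows displayed in the notebook output, so it types those rows in by hand instead of reading the file. `groupby(...).size()` returns the row count of every group in one call. Boolean masks combined with `.loc` replace the chained `get_group()` calls.

```python
import pandas as pd

# Assumption: DataScience_Practice.xlsx holds exactly the 32 rows shown in the
# notebook output, so they are typed in here instead of read from the file.
columns = ["Country", "City", "Education", "Income", "Gender"]
rows = [
    ("United States", "New York", "High School", 100000, "Female"),
    ("United States", "New York", "High School", 105000, "Male"),
    ("United States", "New York", "High School", 112000, "Female"),
    ("United States", "New York", "BA", 150000, "Male"),
    ("United States", "Las Vegas", "BA", 155000, "Female"),
    ("United States", "Las Vegas", "BA", 160000, "Male"),
    ("United States", "Las Vegas", "PHD", 190000, "Female"),
    ("United States", "Miami", "PHD", 205521, "Male"),
    ("United States", "Miami", "PHD", 210050, "Female"),
    ("United Kingdome", "London", "High School", 100000, "Female"),
    ("United Kingdome", "London", "High School", 105000, "Male"),
    ("United Kingdome", "London", "High School", 112000, "Female"),
    ("United Kingdome", "London", "BA", 150000, "Male"),
    ("United Kingdome", "London", "BA", 155000, "Female"),
    ("United Kingdome", "London", "BA", 160000, "Male"),
    ("United Kingdome", "Manchester", "PHD", 190000, "Female"),
    ("United Kingdome", "Manchester", "PHD", 205521, "Male"),
    ("United Kingdome", "Manchester", "PHD", 210050, "Female"),
    ("United Kingdome", "Manchester", "High School", 100000, "Female"),
    ("United Kingdome", "Liverpol", "High School", 105000, "Male"),
    ("United Kingdome", "Liverpol", "PHD", 112000, "Female"),
    ("United Kingdome", "Liverpol", "PHD", 150000, "Male"),
    ("United Kingdome", "Birmingham", "PHD", 155000, "Female"),
    ("United Kingdome", "Birmingham", "BA", 160000, "Male"),
    ("France", "Paris", "High School", 100000, "Female"),
    ("France", "Nice", "High School", 105000, "Male"),
    ("France", "Paris", "BA", 112000, "Female"),
    ("France", "Nice", "BA", 150000, "Male"),
    ("France", "Paris", "BA", 155000, "Female"),
    ("France", "Nice", "PHD", 160000, "Male"),
    ("France", "Paris", "PHD", 190000, "Female"),
    ("France", "Nice", "PHD", 205521, "Male"),
]
df = pd.DataFrame(rows, columns=columns)

# Q1: groupby().size() gives the row count of every country in one call
per_country = df.groupby("Country").size()
us_count = int(per_country["United States"])

# Q2: grouping by two keys gives a MultiIndex, looked up with a tuple
per_city = df.groupby(["Country", "City"]).size()
nice_count = int(per_city[("France", "Nice")])

# Q3 and Q4: boolean masks; summing a boolean Series counts the True values
is_ba = df["Education"] == "BA"
ba_count = int(is_ba.sum())
ba_us_count = int((is_ba & (df["Country"] == "United States")).sum())

# Q9: female PHD holders
female_phd = int(((df["Education"] == "PHD") & (df["Gender"] == "Female")).sum())

# Q10: .loc with a row mask and a column list keeps only the requested columns
paris_hs = df.loc[(df["City"] == "Paris") & (df["Education"] == "High School"),
                  ["City", "Education"]]

print(us_count, nice_count, ba_count, ba_us_count, female_phd, len(paris_hs))
# 9 4 10 3 7 1
```

`groupby(...).size()` counts rows per group. `DataFrame.size`, used in the notebook, counts cells instead. That is why the notebook reports 45 rather than 9 for the United States subset.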
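Tasks 5, 6 and 12 can be sketched the same way. The five rows below are a hypothetical subset: the PHD rows at indices 7, 8, 16, 17 and 31 of the table displayed in the notebook, not the full dataset. The names generated by the loop are invented placeholders in the `A Z` style the task suggests. The typo fix uses `.loc[mask, column] = value`, so only the matching cells of **Country** are updated.

```python
import pandas as pd

# Hypothetical subset: rows 7, 8, 16, 17 and 31 of the notebook's table.
df = pd.DataFrame(
    {
        "Country": ["United States", "United States", "United Kingdome",
                    "United Kingdome", "France"],
        "City": ["Miami", "Miami", "Manchester", "Manchester", "Nice"],
        "Education": ["PHD"] * 5,
        "Income": [205521, 210050, 205521, 210050, 205521],
        "Gender": ["Male", "Female", "Male", "Female", "Male"],
    }
)

# Fullname column built with a for loop, as the task asks.
# The names are made-up placeholders: "A Z", "B Y", "C X", ...
fullnames = []
for i in range(len(df)):
    first = chr(ord("A") + i)
    last = chr(ord("Z") - i)
    fullnames.append(f"{first} {last}")
df["Fullname"] = fullnames

# Q5: maximum income among PHD holders
phd = df.loc[df["Education"] == "PHD"]
max_income = phd["Income"].max()

# Q6: everyone who earns that maximum (ties are kept)
top_earners = phd.loc[phd["Income"] == max_income, "Fullname"].tolist()

# Q12: fix the typo only in the matching cells of the Country column
df.loc[df["Country"] == "United Kingdome", "Country"] = "United Kingdom"

print(max_income, top_earners)  # 210050 ['B Y', 'D W']
```

Each row is compared against the maximum rather than using `idxmax()`, which returns only the first match. As a result, both 210050 earners are returned.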
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tasks\n",
"\n",
"+ https://pandas.pydata.org/pandas-docs/stable/groupby.html\n",
"\n",
"+ https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html\n",
"\n",
"> Use `groupby()` and `loc[a:b, c:d]` to answer the following questions.\n",
"\n",
"1. How many people belong to the **United States**?\n",
"\n",
"2. How many people belong to **France** and live in the city of **Nice**?\n",
"\n",
"3. Find the total number of **BA** degree holders in the table.\n",
"\n",
"4. Find the total number of **BA** degree holders from the **United States**.\n",
"\n",
"> Add one more column named **Fullname** to the existing DataFrame using a `for` loop (values such as `A Z`, `B T`, `P M`, etc.).\n",
"\n",
"5. Find the **maximum** salary/income of the **PHD** holders in the table.\n",
"\n",
"6. Who earns that maximum salary/income? Display their full name(s).\n",
"\n",
"7. Find the number of people with a salary equal to 10000.\n",
"\n",
"8. List all people with an income equal to 10000.\n",
"\n",
"9. Find the number of female **PHD** holders.\n",
"\n",
"10. Find the total number of high school students from Paris.\n",
"\n",
"11. Who are they? Display only their names along with **City** and **Education** (3 columns in total).\n",
"\n",
"12. In the **Country** column, change **United Kingdome** to **United Kingdom** (the original value is a typo)."
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"# Import pandas\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>High School</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>United Kingdome</td>\n",
" <td>Liverpol</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>United Kingdome</td>\n",
" <td>Liverpol</td>\n",
" <td>PHD</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>United Kingdome</td>\n",
" <td>Liverpol</td>\n",
" <td>PHD</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>United Kingdome</td>\n",
" <td>Birmingham</td>\n",
" <td>PHD</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>United Kingdome</td>\n",
" <td>Birmingham</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"0 United States New York High School 100000 Female\n",
"1 United States New York High School 105000 Male\n",
"2 United States New York High School 112000 Female\n",
"3 United States New York BA 150000 Male\n",
"4 United States Las Vegas BA 155000 Female\n",
"5 United States Las Vegas BA 160000 Male\n",
"6 United States Las Vegas PHD 190000 Female\n",
"7 United States Miami PHD 205521 Male\n",
"8 United States Miami PHD 210050 Female\n",
"9 United Kingdome London High School 100000 Female\n",
"10 United Kingdome London High School 105000 Male\n",
"11 United Kingdome London High School 112000 Female\n",
"12 United Kingdome London BA 150000 Male\n",
"13 United Kingdome London BA 155000 Female\n",
"14 United Kingdome London BA 160000 Male\n",
"15 United Kingdome Manchester PHD 190000 Female\n",
"16 United Kingdome Manchester PHD 205521 Male\n",
"17 United Kingdome Manchester PHD 210050 Female\n",
"18 United Kingdome Manchester High School 100000 Female\n",
"19 United Kingdome Liverpol High School 105000 Male\n",
"20 United Kingdome Liverpol PHD 112000 Female\n",
"21 United Kingdome Liverpol PHD 150000 Male\n",
"22 United Kingdome Birmingham PHD 155000 Female\n",
"23 United Kingdome Birmingham BA 160000 Male\n",
"24 France Paris High School 100000 Female\n",
"25 France Nice High School 105000 Male\n",
"26 France Paris BA 112000 Female\n",
"27 France Nice BA 150000 Male\n",
"28 France Paris BA 155000 Female\n",
"29 France Nice PHD 160000 Male\n",
"30 France Paris PHD 190000 Female\n",
"31 France Nice PHD 205521 Male"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_excel(\"DataScience_Practice.xlsx\")  # Load the practice dataset\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 1. How many people belong to the **United States**?"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000000000A7A05F8>"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grouped = df.groupby([\"Country\"])\n",
"grouped"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'France': Int64Index([24, 25, 26, 27, 28, 29, 30, 31], dtype='int64'),\n",
" 'United Kingdome': Int64Index([9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], dtype='int64'),\n",
" 'United States': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')}"
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# All groups\n",
"\n",
"grouped.groups"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"0 United States New York High School 100000 Female\n",
"1 United States New York High School 105000 Male\n",
"2 United States New York High School 112000 Female\n",
"3 United States New York BA 150000 Male\n",
"4 United States Las Vegas BA 155000 Female\n",
"5 United States Las Vegas BA 160000 Male\n",
"6 United States Las Vegas PHD 190000 Female\n",
"7 United States Miami PHD 205521 Male\n",
"8 United States Miami PHD 210050 Female"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_df = grouped.get_group(\"United States\")\n",
"united_states_df"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"45"
]
},
"execution_count": 110,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_df.size  # Total number of cells (rows x columns = 9 x 5), not the number of rows"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(9, 5)"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_df.shape # row, col"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_df.ndim  # Number of axes (2 for a DataFrame)"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(united_states_df)"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(9, 5)"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get shape: (rows, columns)\n",
"united_states_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"9"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Answer to Q1: the number of rows in the United States group\n",
"united_states_df.shape[0]"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"45"
]
},
"execution_count": 116,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_df.size"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('France', 'Nice'): Int64Index([25, 27, 29, 31], dtype='int64'),\n",
" ('France', 'Paris'): Int64Index([24, 26, 28, 30], dtype='int64'),\n",
" ('United Kingdome', 'Birmingham'): Int64Index([22, 23], dtype='int64'),\n",
" ('United Kingdome', 'Liverpol'): Int64Index([19, 20, 21], dtype='int64'),\n",
" ('United Kingdome',\n",
" 'London'): Int64Index([9, 10, 11, 12, 13, 14], dtype='int64'),\n",
" ('United Kingdome',\n",
" 'Manchester'): Int64Index([15, 16, 17, 18], dtype='int64'),\n",
" ('United States', 'Las Vegas'): Int64Index([4, 5, 6], dtype='int64'),\n",
" ('United States', 'Miami'): Int64Index([7, 8], dtype='int64'),\n",
" ('United States', 'New York'): Int64Index([0, 1, 2, 3], dtype='int64')}"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"country_city_grouped = df.groupby([\"Country\", \"City\"])\n",
"country_city_grouped.groups"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"25 France Nice High School 105000 Male\n",
"27 France Nice BA 150000 Male\n",
"29 France Nice PHD 160000 Male\n",
"31 France Nice PHD 205521 Male"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"country_city_grouped.get_group(('France', 'Nice'))"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"24 France Paris High School 100000 Female\n",
"26 France Paris BA 112000 Female\n",
"28 France Paris BA 155000 Female\n",
"30 France Paris PHD 190000 Female"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"country_city_grouped.get_group(('France', 'Paris'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 2. How many people belong to **France** and live in the city of **Nice**?"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"24 France Paris High School 100000 Female\n",
"25 France Nice High School 105000 Male\n",
"26 France Paris BA 112000 Female\n",
"27 France Nice BA 150000 Male\n",
"28 France Paris BA 155000 Female\n",
"29 France Nice PHD 160000 Male\n",
"30 France Paris PHD 190000 Female\n",
"31 France Nice PHD 205521 Male"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"france_df = grouped.get_group(\"France\") \n",
"france_df"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000000000A7C0828>"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"france_city_grouped = france_df.groupby([\"City\"])\n",
"france_city_grouped"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Nice': Int64Index([25, 27, 29, 31], dtype='int64'),\n",
" 'Paris': Int64Index([24, 26, 28, 30], dtype='int64')}"
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"france_city_grouped.groups"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"25 France Nice High School 105000 Male\n",
"27 France Nice BA 150000 Male\n",
"29 France Nice PHD 160000 Male\n",
"31 France Nice PHD 205521 Male"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"france_nice_city_df = france_city_grouped.get_group(\"Nice\")\n",
"france_nice_city_df"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"france_nice_city_df.shape[0]"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"25 France Nice High School 105000 Male\n",
"27 France Nice BA 150000 Male\n",
"29 France Nice PHD 160000 Male\n",
"31 France Nice PHD 205521 Male"
]
},
"execution_count": 125,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The same result as a single chained statement\n",
"df.groupby([\"Country\"]).get_group(\"France\").groupby([\"City\"]).get_group(\"Nice\")"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"620521"
]
},
"execution_count": 126,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Total income of all the students who belong to Nice city in France\n",
"df.groupby([\"Country\"]).get_group(\"France\").groupby([\"City\"]).get_group(\"Nice\").loc[:, \"Income\"].sum()"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"29 France Nice PHD 160000 Male\n",
"31 France Nice PHD 205521 Male"
]
},
"execution_count": 127,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Going further with Q2: PHD holders from Nice, France\n",
"df.groupby([\"Country\", \"City\", \"Education\"]).get_group((\"France\", \"Nice\", \"PHD\")) # .loc[:, \"Education\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 3. Find the total number of **BA** degree holders in the table."
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000000000A7C0C50>"
]
},
"execution_count": 128,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"education_grouped = df.groupby(\"Education\")\n",
"education_grouped"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'BA': Int64Index([3, 4, 5, 12, 13, 14, 23, 26, 27, 28], dtype='int64'),\n",
" 'High School': Int64Index([0, 1, 2, 9, 10, 11, 18, 19, 24, 25], dtype='int64'),\n",
" 'PHD': Int64Index([6, 7, 8, 15, 16, 17, 20, 21, 22, 29, 30, 31], dtype='int64')}"
]
},
"execution_count": 129,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"education_grouped.groups"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>United Kingdome</td>\n",
" <td>Birmingham</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"3 United States New York BA 150000 Male\n",
"4 United States Las Vegas BA 155000 Female\n",
"5 United States Las Vegas BA 160000 Male\n",
"12 United Kingdome London BA 150000 Male\n",
"13 United Kingdome London BA 155000 Female\n",
"14 United Kingdome London BA 160000 Male\n",
"23 United Kingdome Birmingham BA 160000 Male\n",
"26 France Paris BA 112000 Female\n",
"27 France Nice BA 150000 Male\n",
"28 France Paris BA 155000 Female"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ba_students_df = education_grouped.get_group(\"BA\")\n",
"ba_students_df"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ba_students_df.shape[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 4. Find the total number of **BA** degree holders from the **United States**."
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 A\n",
"1 B\n",
"2 C\n",
"3 D\n",
"4 E\n",
"5 F\n",
"6 G\n",
"7 H\n",
"8 I\n",
"9 J\n",
"10 K\n",
"11 L\n",
"12 M\n",
"13 N\n",
"14 O\n",
"15 P\n",
"16 Q\n",
"17 R\n",
"18 S\n",
"19 T\n",
"20 U\n",
"21 V\n",
"22 W\n",
"23 X\n",
"24 Y\n",
"25 Z\n",
"26 a\n",
"27 b\n",
"28 c\n",
"29 d\n",
"30 e\n",
"31 f\n",
"dtype: object"
]
},
"execution_count": 132,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# One single-letter name per row: A-Z, then a, b, ... for the remaining rows\n",
"rows = df.shape[0]\n",
"fullnames = [chr(i) for i in range(65, 65 + 26)] + [chr(j) for j in range(97, 97 + rows - 26)]\n",
"\n",
"s = pd.Series(fullnames)\n",
"s"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {},
"outputs": [],
"source": [
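"# df[\"Fullname\"] = s  # DataFrame.append() has no axis argument (and was removed in pandas 2.0); assigning the Series adds the column"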
]
},
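{
"cell_type": "markdown",
"metadata": {},
"source": [
"The task list asks for a **Fullname** column built with a for loop (values like `A Z`, `B T`). A minimal sketch, assuming a hypothetical naming scheme (first initial counting up from A, last initial counting down from Z) and a hypothetical 4-row `sample_df` standing in for `df`. Assigning a list with one entry per row to a new column label adds that column in place."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import string\n",
"import pandas as pd\n",
"\n",
"# Hypothetical 4-row frame standing in for df\n",
"sample_df = pd.DataFrame({\"Country\": [\"United States\", \"United States\", \"France\", \"France\"]})\n",
"\n",
"letters = string.ascii_uppercase\n",
"fullnames = []\n",
"for i in range(len(sample_df)):\n",
"    # Hypothetical scheme: first initial counts up from A, last initial counts down from Z\n",
"    first = letters[i % 26]\n",
"    last = letters[25 - i % 26]\n",
"    fullnames.append(f\"{first} {last}\")\n",
"\n",
"# A list with one entry per row can be assigned directly as a new column\n",
"sample_df[\"Fullname\"] = fullnames\n",
"sample_df"
]
},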
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"0 United States New York High School 100000 Female\n",
"1 United States New York High School 105000 Male\n",
"2 United States New York High School 112000 Female\n",
"3 United States New York BA 150000 Male\n",
"4 United States Las Vegas BA 155000 Female\n",
"5 United States Las Vegas BA 160000 Male\n",
"6 United States Las Vegas PHD 190000 Female\n",
"7 United States Miami PHD 205521 Male\n",
"8 United States Miami PHD 210050 Female"
]
},
"execution_count": 134,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_df"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000000000A7DF320>"
]
},
"execution_count": 135,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_ba_grouped = united_states_df.groupby(\"Education\")\n",
"united_states_ba_grouped"
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'BA': Int64Index([3, 4, 5], dtype='int64'),\n",
" 'High School': Int64Index([0, 1, 2], dtype='int64'),\n",
" 'PHD': Int64Index([6, 7, 8], dtype='int64')}"
]
},
"execution_count": 136,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_ba_grouped.groups"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"3 United States New York BA 150000 Male\n",
"4 United States Las Vegas BA 155000 Female\n",
"5 United States Las Vegas BA 160000 Male"
]
},
"execution_count": 137,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"united_states_ba_df = united_states_ba_grouped.get_group(\"BA\")\n",
"united_states_ba_df"
]
},
{
"cell_type": "code",
"execution_count": 138,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 138,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Finally\n",
"united_states_ba_df.shape[0]"
]
},
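{
"cell_type": "markdown",
"metadata": {},
"source": [
"Task 4 can also be answered in one step by grouping on two columns; each group key then becomes a `(Country, Education)` tuple. A sketch on a hypothetical `sample_df` holding the **Country** and **Education** values of rows 0-5 and 9-14 above, with a boolean-mask version for comparison:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample: Country/Education values of rows 0-5 and 9-14\n",
"sample_df = pd.DataFrame({\n",
"    \"Country\": [\"United States\"] * 6 + [\"United Kingdome\"] * 6,\n",
"    \"Education\": ([\"High School\"] * 3 + [\"BA\"] * 3) * 2,\n",
"})\n",
"\n",
"# Group on two keys and pick one (Country, Education) group\n",
"grouped = sample_df.groupby([\"Country\", \"Education\"])\n",
"us_ba_count = grouped.get_group((\"United States\", \"BA\")).shape[0]\n",
"\n",
"# Boolean-mask alternative: combine conditions with & (the parentheses are required)\n",
"mask = (sample_df[\"Country\"] == \"United States\") & (sample_df[\"Education\"] == \"BA\")\n",
"us_ba_count_mask = int(mask.sum())\n",
"print(us_ba_count, us_ba_count_mask)\n",
"\n",
"# Counts for every (Country, Education) pair at once\n",
"grouped.size()"
]
},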
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 5. Find the **maximum** salary/income of PHD students available in the table.\n"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'BA': Int64Index([3, 4, 5, 12, 13, 14, 23, 26, 27, 28], dtype='int64'),\n",
" 'High School': Int64Index([0, 1, 2, 9, 10, 11, 18, 19, 24, 25], dtype='int64'),\n",
" 'PHD': Int64Index([6, 7, 8, 15, 16, 17, 20, 21, 22, 29, 30, 31], dtype='int64')}"
]
},
"execution_count": 139,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"education_grouped.groups"
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>United Kingdome</td>\n",
" <td>Liverpol</td>\n",
" <td>PHD</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>United Kingdome</td>\n",
" <td>Liverpol</td>\n",
" <td>PHD</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>United Kingdome</td>\n",
" <td>Birmingham</td>\n",
" <td>PHD</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"6 United States Las Vegas PHD 190000 Female\n",
"7 United States Miami PHD 205521 Male\n",
"8 United States Miami PHD 210050 Female\n",
"15 United Kingdome Manchester PHD 190000 Female\n",
"16 United Kingdome Manchester PHD 205521 Male\n",
"17 United Kingdome Manchester PHD 210050 Female\n",
"20 United Kingdome Liverpol PHD 112000 Female\n",
"21 United Kingdome Liverpol PHD 150000 Male\n",
"22 United Kingdome Birmingham PHD 155000 Female\n",
"29 France Nice PHD 160000 Male\n",
"30 France Paris PHD 190000 Female\n",
"31 France Nice PHD 205521 Male"
]
},
"execution_count": 140,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"phd_holders_df = education_grouped.get_group(\"PHD\")\n",
"phd_holders_df"
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"210050"
]
},
"execution_count": 141,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Finally\n",
"phd_holders_max_income = phd_holders_df[\"Income\"].max()\n",
"phd_holders_max_income"
]
},
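{
"cell_type": "markdown",
"metadata": {},
"source": [
"`groupby(...)[column].max()` returns the maximum of one column for every group in one call, so the PHD maximum becomes a single lookup. A sketch on a hypothetical `sample_df` with the **Education** and **Income** values of rows 0-8 above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample: Education/Income values of rows 0-8\n",
"sample_df = pd.DataFrame({\n",
"    \"Education\": [\"High School\"] * 3 + [\"BA\"] * 3 + [\"PHD\"] * 3,\n",
"    \"Income\": [100000, 105000, 112000, 150000, 155000, 160000, 190000, 205521, 210050],\n",
"})\n",
"\n",
"# Maximum Income per Education group\n",
"max_income_by_education = sample_df.groupby(\"Education\")[\"Income\"].max()\n",
"print(max_income_by_education)\n",
"\n",
"phd_max_income = max_income_by_education[\"PHD\"]\n",
"phd_max_income"
]
},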
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 6. Who earns the maximum salary/income? There may be more than one person. `Display fullname(s)`.\n"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender\n",
"8 United States Miami PHD 210050 Female\n",
"17 United Kingdome Manchester PHD 210050 Female"
]
},
"execution_count": 142,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"phd_holders_df.loc[phd_holders_df[\"Income\"] == phd_holders_max_income]"
]
},
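{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rows 8 and 17 tie at 210050, which is why the boolean mask above returns two rows. `Series.idxmax()` would report only the first matching label, so the mask is the safer choice when ties matter. Once a **Fullname** column exists, `phd_holders_df.loc[mask, \"Fullname\"]` (with `mask = phd_holders_df[\"Income\"] == phd_holders_max_income`) would list just the names. A sketch on a hypothetical Series holding the PHD incomes of rows 6-8 and 15-17, indexed by their original row labels:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample: PHD incomes of rows 6-8 and 15-17, keeping their row labels\n",
"phd_income = pd.Series(\n",
"    [190000, 205521, 210050, 190000, 205521, 210050],\n",
"    index=[6, 7, 8, 15, 16, 17],\n",
"    name=\"Income\",\n",
")\n",
"\n",
"max_income = phd_income.max()\n",
"\n",
"# Boolean mask keeps every label tied for the maximum\n",
"tied_labels = phd_income[phd_income == max_income].index.tolist()\n",
"\n",
"# idxmax() returns only the label of the first maximum\n",
"first_label = phd_income.idxmax()\n",
"print(tied_labels, first_label)"
]
},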
{
"cell_type": "code",
"execution_count": 143,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"32"
]
},
"execution_count": 143,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape[0]"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2018-09-01', '2018-09-02', '2018-09-03', '2018-09-04',\n",
" '2018-09-05', '2018-09-06', '2018-09-07', '2018-09-08',\n",
" '2018-09-09', '2018-09-10', '2018-09-11', '2018-09-12',\n",
" '2018-09-13', '2018-09-14', '2018-09-15', '2018-09-16',\n",
" '2018-09-17', '2018-09-18', '2018-09-19', '2018-09-20',\n",
" '2018-09-21', '2018-09-22', '2018-09-23', '2018-09-24',\n",
" '2018-09-25', '2018-09-26', '2018-09-27', '2018-09-28',\n",
" '2018-09-29', '2018-09-30', '2018-10-01', '2018-10-02'],\n",
" dtype='datetime64[ns]', freq='D')"
]
},
"execution_count": 144,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dri = pd.date_range(\"20180901\", periods=df.shape[0], freq=\"d\")\n",
"dri"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.indexes.datetimes.DatetimeIndex"
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(dri)"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"DataScience_Practice_before.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>City</th>\n",
" <th>Education</th>\n",
" <th>Income</th>\n",
" <th>Gender</th>\n",
" <th>date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>High School</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>United States</td>\n",
" <td>New York</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-04</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>United States</td>\n",
" <td>Las Vegas</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" <td>2018-09-08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>United States</td>\n",
" <td>Miami</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" <td>2018-09-09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>High School</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>United Kingdome</td>\n",
" <td>London</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" <td>2018-09-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>PHD</td>\n",
" <td>210050</td>\n",
" <td>Female</td>\n",
" <td>2018-09-18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>United Kingdome</td>\n",
" <td>Manchester</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>United Kingdome</td>\n",
" <td>Liverpol</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>United Kingdome</td>\n",
" <td>Liverpol</td>\n",
" <td>PHD</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>United Kingdome</td>\n",
" <td>Liverpol</td>\n",
" <td>PHD</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>United Kingdome</td>\n",
" <td>Birmingham</td>\n",
" <td>PHD</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-23</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>United Kingdome</td>\n",
" <td>Birmingham</td>\n",
" <td>BA</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>High School</td>\n",
" <td>100000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>High School</td>\n",
" <td>105000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>112000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>BA</td>\n",
" <td>150000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>BA</td>\n",
" <td>155000</td>\n",
" <td>Female</td>\n",
" <td>2018-09-29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>160000</td>\n",
" <td>Male</td>\n",
" <td>2018-09-30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>France</td>\n",
" <td>Paris</td>\n",
" <td>PHD</td>\n",
" <td>190000</td>\n",
" <td>Female</td>\n",
" <td>2018-10-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>France</td>\n",
" <td>Nice</td>\n",
" <td>PHD</td>\n",
" <td>205521</td>\n",
" <td>Male</td>\n",
" <td>2018-10-02</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country City Education Income Gender date\n",
"0 United States New York High School 100000 Female 2018-09-01\n",
"1 United States New York High School 105000 Male 2018-09-02\n",
"2 United States New York High School 112000 Female 2018-09-03\n",
"3 United States New York BA 150000 Male 2018-09-04\n",
"4 United States Las Vegas BA 155000 Female 2018-09-05\n",
"5 United States Las Vegas BA 160000 Male 2018-09-06\n",
"6 United States Las Vegas PHD 190000 Female 2018-09-07\n",
"7 United States Miami PHD 205521 Male 2018-09-08\n",
"8 United States Miami PHD 210050 Female 2018-09-09\n",
"9 United Kingdome London High School 100000 Female 2018-09-10\n",
"10 United Kingdome London High School 105000 Male 2018-09-11\n",
"11 United Kingdome London High School 112000 Female 2018-09-12\n",
"12 United Kingdome London BA 150000 Male 2018-09-13\n",
"13 United Kingdome London BA 155000 Female 2018-09-14\n",
"14 United Kingdome London BA 160000 Male 2018-09-15\n",
"15 United Kingdome Manchester PHD 190000 Female 2018-09-16\n",
"16 United Kingdome Manchester PHD 205521 Male 2018-09-17\n",
"17 United Kingdome Manchester PHD 210050 Female 2018-09-18\n",
"18 United Kingdome Manchester High School 100000 Female 2018-09-19\n",
"19 United Kingdome Liverpol High School 105000 Male 2018-09-20\n",
"20 United Kingdome Liverpol PHD 112000 Female 2018-09-21\n",
"21 United Kingdome Liverpol PHD 150000 Male 2018-09-22\n",
"22 United Kingdome Birmingham PHD 155000 Female 2018-09-23\n",
"23 United Kingdome Birmingham BA 160000 Male 2018-09-24\n",
"24 France Paris High School 100000 Female 2018-09-25\n",
"25 France Nice High School 105000 Male 2018-09-26\n",
"26 France Paris BA 112000 Female 2018-09-27\n",
"27 France Nice BA 150000 Male 2018-09-28\n",
"28 France Paris BA 155000 Female 2018-09-29\n",
"29 France Nice PHD 160000 Male 2018-09-30\n",
"30 France Paris PHD 190000 Female 2018-10-01\n",
"31 France Nice PHD 205521 Male 2018-10-02"
]
},
"execution_count": 146,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"date\"] = pd.to_datetime(dri)\n",
"df"
]
},
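{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pd.date_range()` already returns a `DatetimeIndex` (see `type(dri)` above), so wrapping it in `pd.to_datetime()` is not needed: the assigned column gets a datetime dtype either way, and the `.dt` accessor then exposes the parts of each date. A sketch on a hypothetical 3-row `sample_df`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical 3-row frame standing in for df\n",
"sample_df = pd.DataFrame({\"Income\": [100000, 105000, 112000]})\n",
"\n",
"# Assign the DatetimeIndex directly; its length must match the number of rows\n",
"sample_df[\"date\"] = pd.date_range(\"20180901\", periods=len(sample_df), freq=\"D\")\n",
"print(sample_df.dtypes)\n",
"\n",
"# .dt gives vectorised access to the parts of each Timestamp\n",
"day_numbers = sample_df[\"date\"].dt.day.tolist()\n",
"day_numbers"
]
},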
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas._libs.tslibs.timestamps.Timestamp"
]
},
"execution_count": 147,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df[\"date\"][0])"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0xa7efcf8>"
]
},
"execution_count": 148,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df[\"Income\"].plot.line()  # FOCUS POINT: control the labels on the X and Y axes"
]
},
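{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the focus point in the cell above: `DataFrame.plot.line()` returns a matplotlib `Axes`, whose `set_xlabel()`, `set_ylabel()` and `set_title()` methods control the labels, and passing `x=\"date\"` puts the dates on the X axis instead of the row numbers. A sketch on a hypothetical 3-row `sample_df`, assuming matplotlib is installed (the original `plot.line()` call already requires it):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample: first three dates and incomes of df\n",
"sample_df = pd.DataFrame({\n",
"    \"date\": pd.date_range(\"20180901\", periods=3, freq=\"D\"),\n",
"    \"Income\": [100000, 105000, 112000],\n",
"})\n",
"\n",
"# Plot Income against the date column; the returned Axes controls the labels\n",
"ax = sample_df.plot.line(x=\"date\", y=\"Income\")\n",
"ax.set_xlabel(\"Date\")\n",
"ax.set_ylabel(\"Income\")\n",
"ax.set_title(\"Income by date\")"
]
},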
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"DataScience_Practice_after.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Country City Education Income Gender date
0 United States New York High School 100000 Female 2018-09-01
1 United States New York High School 105000 Male 2018-09-02
2 United States New York High School 112000 Female 2018-09-03
3 United States New York BA 150000 Male 2018-09-04
4 United States Las Vegas BA 155000 Female 2018-09-05
5 United States Las Vegas BA 160000 Male 2018-09-06
6 United States Las Vegas PHD 190000 Female 2018-09-07
7 United States Miami PHD 205521 Male 2018-09-08
8 United States Miami PHD 210050 Female 2018-09-09
9 United Kingdome London High School 100000 Female 2018-09-10
10 United Kingdome London High School 105000 Male 2018-09-11
11 United Kingdome London High School 112000 Female 2018-09-12
12 United Kingdome London BA 150000 Male 2018-09-13
13 United Kingdome London BA 155000 Female 2018-09-14
14 United Kingdome London BA 160000 Male 2018-09-15
15 United Kingdome Manchester PHD 190000 Female 2018-09-16
16 United Kingdome Manchester PHD 205521 Male 2018-09-17
17 United Kingdome Manchester PHD 210050 Female 2018-09-18
18 United Kingdome Manchester High School 100000 Female 2018-09-19
19 United Kingdome Liverpol High School 105000 Male 2018-09-20
20 United Kingdome Liverpol PHD 112000 Female 2018-09-21
21 United Kingdome Liverpol PHD 150000 Male 2018-09-22
22 United Kingdome Birmingham PHD 155000 Female 2018-09-23
23 United Kingdome Birmingham BA 160000 Male 2018-09-24
24 France Paris High School 100000 Female 2018-09-25
25 France Nice High School 105000 Male 2018-09-26
26 France Paris BA 112000 Female 2018-09-27
27 France Nice BA 150000 Male 2018-09-28
28 France Paris BA 155000 Female 2018-09-29
29 France Nice PHD 160000 Male 2018-09-30
30 France Paris PHD 190000 Female 2018-10-01
31 France Nice PHD 205521 Male 2018-10-02
Country City Education Income Gender
0 United States New York High School 100000 Female
1 United States New York High School 105000 Male
2 United States New York High School 112000 Female
3 United States New York BA 150000 Male
4 United States Las Vegas BA 155000 Female
5 United States Las Vegas BA 160000 Male
6 United States Las Vegas PHD 190000 Female
7 United States Miami PHD 205521 Male
8 United States Miami PHD 210050 Female
9 United Kingdome London High School 100000 Female
10 United Kingdome London High School 105000 Male
11 United Kingdome London High School 112000 Female
12 United Kingdome London BA 150000 Male
13 United Kingdome London BA 155000 Female
14 United Kingdome London BA 160000 Male
15 United Kingdome Manchester PHD 190000 Female
16 United Kingdome Manchester PHD 205521 Male
17 United Kingdome Manchester PHD 210050 Female
18 United Kingdome Manchester High School 100000 Female
19 United Kingdome Liverpol High School 105000 Male
20 United Kingdome Liverpol PHD 112000 Female
21 United Kingdome Liverpol PHD 150000 Male
22 United Kingdome Birmingham PHD 155000 Female
23 United Kingdome Birmingham BA 160000 Male
24 France Paris High School 100000 Female
25 France Nice High School 105000 Male
26 France Paris BA 112000 Female
27 France Nice BA 150000 Male
28 France Paris BA 155000 Female
29 France Nice PHD 160000 Male
30 France Paris PHD 190000 Female
31 France Nice PHD 205521 Male