Skip to content

Instantly share code, notes, and snippets.

@uysalserkan
Created June 3, 2021 13:32
Show Gist options
  • Save uysalserkan/e9a5f5a301a69b9b48b22debb5fbabbd to your computer and use it in GitHub Desktop.
Save uysalserkan/e9a5f5a301a69b9b48b22debb5fbabbd to your computer and use it in GitHub Desktop.
Assignment 2
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "optimum-nutrition",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 125,
"id": "rocky-envelope",
"metadata": {},
"outputs": [],
"source": [
"data = pd.read_csv('country_vaccination_stats.csv')"
]
},
{
"cell_type": "code",
"execution_count": 126,
"id": "single-palace",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>date</th>\n",
" <th>daily_vaccinations</th>\n",
" <th>vaccines</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Argentina</td>\n",
" <td>12/29/2020</td>\n",
" <td>NaN</td>\n",
" <td>Sputnik V</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Argentina</td>\n",
" <td>12/30/2020</td>\n",
" <td>15656.0</td>\n",
" <td>Sputnik V</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Argentina</td>\n",
" <td>12/31/2020</td>\n",
" <td>15656.0</td>\n",
" <td>Sputnik V</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Argentina</td>\n",
" <td>1/1/2021</td>\n",
" <td>11070.0</td>\n",
" <td>Sputnik V</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Argentina</td>\n",
" <td>1/2/2021</td>\n",
" <td>8776.0</td>\n",
" <td>Sputnik V</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1497</th>\n",
" <td>Wales</td>\n",
" <td>1/20/2021</td>\n",
" <td>11105.0</td>\n",
" <td>Oxford/AstraZeneca, Pfizer/BioNTech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1498</th>\n",
" <td>Wales</td>\n",
" <td>1/21/2021</td>\n",
" <td>12318.0</td>\n",
" <td>Oxford/AstraZeneca, Pfizer/BioNTech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1499</th>\n",
" <td>Wales</td>\n",
" <td>1/22/2021</td>\n",
" <td>15148.0</td>\n",
" <td>Oxford/AstraZeneca, Pfizer/BioNTech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1500</th>\n",
" <td>Wales</td>\n",
" <td>1/23/2021</td>\n",
" <td>17371.0</td>\n",
" <td>Oxford/AstraZeneca, Pfizer/BioNTech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1501</th>\n",
" <td>Wales</td>\n",
" <td>1/24/2021</td>\n",
" <td>17063.0</td>\n",
" <td>Oxford/AstraZeneca, Pfizer/BioNTech</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1502 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" country date daily_vaccinations \\\n",
"0 Argentina 12/29/2020 NaN \n",
"1 Argentina 12/30/2020 15656.0 \n",
"2 Argentina 12/31/2020 15656.0 \n",
"3 Argentina 1/1/2021 11070.0 \n",
"4 Argentina 1/2/2021 8776.0 \n",
"... ... ... ... \n",
"1497 Wales 1/20/2021 11105.0 \n",
"1498 Wales 1/21/2021 12318.0 \n",
"1499 Wales 1/22/2021 15148.0 \n",
"1500 Wales 1/23/2021 17371.0 \n",
"1501 Wales 1/24/2021 17063.0 \n",
"\n",
" vaccines \n",
"0 Sputnik V \n",
"1 Sputnik V \n",
"2 Sputnik V \n",
"3 Sputnik V \n",
"4 Sputnik V \n",
"... ... \n",
"1497 Oxford/AstraZeneca, Pfizer/BioNTech \n",
"1498 Oxford/AstraZeneca, Pfizer/BioNTech \n",
"1499 Oxford/AstraZeneca, Pfizer/BioNTech \n",
"1500 Oxford/AstraZeneca, Pfizer/BioNTech \n",
"1501 Oxford/AstraZeneca, Pfizer/BioNTech \n",
"\n",
"[1502 rows x 4 columns]"
]
},
"execution_count": 126,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "markdown",
"id": "thrown-tulsa",
"metadata": {},
"source": [
"### Question 1\n",
"\n",
"Code Implementation Task: Implement code to fill the missing data (impute) in daily_vaccinations column per country with the minimum daily vaccination number of relevant countries. \n",
"Note: If a country does not have any valid vaccination number yet, fill it with “0” (zero). "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "connected-strength",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>date</th>\n",
" <th>daily_vaccinations</th>\n",
" <th>vaccines</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>930</th>\n",
" <td>Northern Ireland</td>\n",
" <td>12/27/2020</td>\n",
" <td>2135.0</td>\n",
" <td>Oxford/AstraZeneca, Pfizer/BioNTech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1389</th>\n",
" <td>United Kingdom</td>\n",
" <td>12/23/2020</td>\n",
" <td>46423.0</td>\n",
" <td>Oxford/AstraZeneca, Pfizer/BioNTech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>373</th>\n",
" <td>Denmark</td>\n",
" <td>1/15/2021</td>\n",
" <td>7052.0</td>\n",
" <td>Moderna, Pfizer/BioNTech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>486</th>\n",
" <td>France</td>\n",
" <td>1/13/2021</td>\n",
" <td>32738.0</td>\n",
" <td>Pfizer/BioNTech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1094</th>\n",
" <td>Romania</td>\n",
" <td>1/12/2021</td>\n",
" <td>14120.0</td>\n",
" <td>Pfizer/BioNTech</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country date daily_vaccinations \\\n",
"930 Northern Ireland 12/27/2020 2135.0 \n",
"1389 United Kingdom 12/23/2020 46423.0 \n",
"373 Denmark 1/15/2021 7052.0 \n",
"486 France 1/13/2021 32738.0 \n",
"1094 Romania 1/12/2021 14120.0 \n",
"\n",
" vaccines \n",
"930 Oxford/AstraZeneca, Pfizer/BioNTech \n",
"1389 Oxford/AstraZeneca, Pfizer/BioNTech \n",
"373 Moderna, Pfizer/BioNTech \n",
"486 Pfizer/BioNTech \n",
"1094 Pfizer/BioNTech "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.sample(5)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "departmental-jimmy",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1502 entries, 0 to 1501\n",
"Data columns (total 4 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 country 1502 non-null object \n",
" 1 date 1502 non-null object \n",
" 2 daily_vaccinations 1442 non-null float64\n",
" 3 vaccines 1502 non-null object \n",
"dtypes: float64(1), object(3)\n",
"memory usage: 47.1+ KB\n"
]
}
],
"source": [
"data.info()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "institutional-guinea",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"country False\n",
"date False\n",
"daily_vaccinations True\n",
"vaccines False\n",
"dtype: bool"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.isnull().any()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "champion-chair",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Argentina', 'Austria', 'Bahrain', 'Belgium', 'Brazil', 'Bulgaria',\n",
" 'Canada', 'Chile', 'China', 'Costa Rica', 'Croatia', 'Cyprus',\n",
" 'Czechia', 'Denmark', 'Ecuador', 'England', 'Estonia', 'Finland',\n",
" 'France', 'Germany', 'Gibraltar', 'Greece', 'Hungary', 'Iceland',\n",
" 'India', 'Indonesia', 'Ireland', 'Isle of Man', 'Israel', 'Italy',\n",
" 'Kuwait', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Mexico',\n",
" 'Netherlands', 'Northern Ireland', 'Norway', 'Oman', 'Panama',\n",
" 'Poland', 'Portugal', 'Romania', 'Russia', 'Saudi Arabia',\n",
" 'Scotland', 'Serbia', 'Seychelles', 'Singapore', 'Slovakia',\n",
" 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',\n",
" 'United Arab Emirates', 'United Kingdom', 'United States', 'Wales'],\n",
" dtype=object)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data['country'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 127,
"id": "usual-memphis",
"metadata": {},
"outputs": [],
"source": [
"countries = data['country'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 128,
"id": "optical-mainstream",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Argentina ülkesindeki minimum aşılama: 6483.0\n",
"Austria ülkesindeki minimum aşılama: 3368.0\n",
"Bahrain ülkesindeki minimum aşılama: 943.0\n",
"Belgium ülkesindeki minimum aşılama: 1.0\n",
"Brazil ülkesindeki minimum aşılama: 112.0\n",
"Bulgaria ülkesindeki minimum aşılama: 472.0\n",
"Canada ülkesindeki minimum aşılama: 866.0\n",
"Chile ülkesindeki minimum aşılama: 51.0\n",
"China ülkesindeki minimum aşılama: 187500.0\n",
"Costa Rica ülkesindeki minimum aşılama: 240.0\n",
"Croatia ülkesindeki minimum aşılama: 989.0\n",
"Cyprus ülkesindeki minimum aşılama: 534.0\n",
"Czechia ülkesindeki minimum aşılama: 1822.0\n",
"Denmark ülkesindeki minimum aşılama: 1994.0\n",
"Ecuador ülkesindeki minimum aşılama: 108.0\n",
"England ülkesindeki minimum aşılama: 37625.0\n",
"Estonia ülkesindeki minimum aşılama: 309.0\n",
"Finland ülkesindeki minimum aşılama: 459.0\n",
"France ülkesindeki minimum aşılama: 5000.0\n",
"Germany ülkesindeki minimum aşılama: 19721.0\n",
"Gibraltar ülkesindeki minimum aşılama: 520.0\n",
"Greece ülkesindeki minimum aşılama: 549.0\n",
"Hungary ülkesindeki minimum aşılama: 1339.0\n",
"Iceland ülkesindeki minimum aşılama: 63.0\n",
"India ülkesindeki minimum aşılama: 112150.0\n",
"Indonesia ülkesindeki minimum aşılama: 11823.0\n",
"Ireland ülkesindeki minimum aşılama: 550.0\n",
"Isle of Man ülkesindeki minimum aşılama: 90.0\n",
"Israel ülkesindeki minimum aşılama: 6887.0\n",
"Italy ülkesindeki minimum aşılama: 926.0\n",
"Kuwait ülkesinde her hangi bir aşılama olmamıştır..\n",
"Latvia ülkesindeki minimum aşılama: 338.0\n",
"Lithuania ülkesindeki minimum aşılama: 311.0\n",
"Luxembourg ülkesindeki minimum aşılama: 62.0\n",
"Malta ülkesindeki minimum aşılama: 259.0\n",
"Mexico ülkesindeki minimum aşılama: 1300.0\n",
"Netherlands ülkesindeki minimum aşılama: 4000.0\n",
"Northern Ireland ülkesindeki minimum aşılama: 1563.0\n",
"Norway ülkesindeki minimum aşılama: 224.0\n",
"Oman ülkesindeki minimum aşılama: 817.0\n",
"Panama ülkesindeki minimum aşılama: 6.0\n",
"Poland ülkesindeki minimum aşılama: 4300.0\n",
"Portugal ülkesindeki minimum aşılama: 2791.0\n",
"Romania ülkesindeki minimum aşılama: 712.0\n",
"Russia ülkesindeki minimum aşılama: 3357.0\n",
"Saudi Arabia ülkesindeki minimum aşılama: 14153.0\n",
"Scotland ülkesindeki minimum aşılama: 3047.0\n",
"Serbia ülkesindeki minimum aşılama: 1150.0\n",
"Seychelles ülkesindeki minimum aşılama: 500.0\n",
"Singapore ülkesindeki minimum aşılama: 2800.0\n",
"Slovakia ülkesindeki minimum aşılama: 425.0\n",
"Slovenia ülkesindeki minimum aşılama: 1270.0\n",
"Spain ülkesindeki minimum aşılama: 41496.0\n",
"Sweden ülkesindeki minimum aşılama: 3859.0\n",
"Switzerland ülkesindeki minimum aşılama: 3000.0\n",
"Turkey ülkesindeki minimum aşılama: 67192.0\n",
"United Arab Emirates ülkesindeki minimum aşılama: 30698.0\n",
"United Kingdom ülkesindeki minimum aşılama: 46423.0\n",
"United States ülkesindeki minimum aşılama: 57909.0\n",
"Wales ülkesindeki minimum aşılama: 1763.0\n"
]
}
],
"source": [
"for c in countries:\n",
" min_vac = data[data['country']==c].daily_vaccinations.min()\n",
" if np.isnan(min_vac):\n",
" print(c, \" ülkesinde her hangi bir aşılama olmamıştır..\")\n",
" values = {'daily_vaccinations': 0}\n",
" else:\n",
" print(c, \"ülkesindeki minimum aşılama: \", min_vac)\n",
" values = {'daily_vaccinations': min_vac}\n",
" data[data['country'] == c] = data[data['country'] == c].fillna(value=values)\n"
]
},
{
"cell_type": "code",
"execution_count": 129,
"id": "occasional-shoulder",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1502 entries, 0 to 1501\n",
"Data columns (total 4 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 country 1502 non-null object \n",
" 1 date 1502 non-null object \n",
" 2 daily_vaccinations 1502 non-null float64\n",
" 3 vaccines 1502 non-null object \n",
"dtypes: float64(1), object(3)\n",
"memory usage: 47.1+ KB\n"
]
}
],
"source": [
"data.info()"
]
},
{
"cell_type": "markdown",
"id": "surprising-swimming",
"metadata": {},
"source": [
"### Question 2\n",
"\n",
"Code Implementation Task: Implement code to list the top-3 countries with highest median daily vaccination numbers by considering missing values imputed version of dataset."
]
},
{
"cell_type": "code",
"execution_count": 320,
"id": "proper-organic",
"metadata": {},
"outputs": [],
"source": [
"first_three_median = {}\n",
"for c in countries[0:3]:\n",
" first_three_median[data[data['country']==c].median()[0]]=c"
]
},
{
"cell_type": "code",
"execution_count": 321,
"id": "collectible-hybrid",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{11645.0: 'Argentina', 7850.0: 'Austria', 3662.0: 'Bahrain'}"
]
},
"execution_count": 321,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_three_median"
]
},
{
"cell_type": "code",
"execution_count": 325,
"id": "primary-qatar",
"metadata": {},
"outputs": [],
"source": [
"for c in countries[3:]:\n",
" if data[data['country']==c].median()[0] > min(first_three_median.keys()):\n",
" first_three_median.pop(min(first_three_median.keys()))\n",
" first_three_median[data[data['country']==c].median()[0]]=c"
]
},
{
"cell_type": "code",
"execution_count": 326,
"id": "departmental-loading",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{276786.0: 'China', 173922.0: 'India', 378253.0: 'United States'}"
]
},
"execution_count": 326,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_three_median"
]
},
{
"cell_type": "markdown",
"id": "middle-portfolio",
"metadata": {},
"source": [
"### Question 3\n",
"\n",
"What is the number of total vaccinations done on 1/6/2021 (MM/DD/YYYY) by considering missing values imputed version of dataset?"
]
},
{
"cell_type": "code",
"execution_count": 277,
"id": "thrown-armor",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8 8173.0\n",
"29 3368.0\n",
"62 1594.0\n",
"86 736.0\n",
"121 472.0\n",
"164 14887.0\n",
"197 272.0\n",
"239 455357.0\n",
"267 763.0\n",
"287 1334.0\n",
"307 534.0\n",
"334 2439.0\n",
"364 8525.0\n",
"403 98984.0\n",
"431 589.0\n",
"457 839.0\n",
"479 8500.0\n",
"509 42015.0\n",
"552 2680.0\n",
"581 3959.0\n",
"607 63.0\n",
"658 1624.0\n",
"700 114634.0\n",
"730 45417.0\n",
"760 338.0\n",
"792 592.0\n",
"819 62.0\n",
"847 304.0\n",
"879 4027.0\n",
"899 4000.0\n",
"940 3954.0\n",
"969 1772.0\n",
"997 1280.0\n",
"1031 17723.0\n",
"1060 4416.0\n",
"1088 7555.0\n",
"1130 39532.0\n",
"1138 14153.0\n",
"1174 4858.0\n",
"1240 3309.0\n",
"1259 2110.0\n",
"1280 45248.0\n",
"1310 5165.0\n",
"1336 3000.0\n",
"1366 30698.0\n",
"1403 111155.0\n",
"1439 358887.0\n",
"1483 3359.0\n",
"Name: daily_vaccinations, dtype: float64"
]
},
"execution_count": 277,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[data['date']==\"1/6/2021\"].daily_vaccinations"
]
},
{
"cell_type": "code",
"execution_count": 279,
"id": "manual-showcase",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total Vac: 1485255.0\n"
]
}
],
"source": [
"print(\"Total Vac: \",data[data['date']==\"1/6/2021\"].daily_vaccinations.sum())"
]
},
{
"cell_type": "markdown",
"id": "legal-priority",
"metadata": {},
"source": [
"### Question 4\n",
"\n",
"Code Implementation Task: If this list would be a database table, please provide SQL query to fill in the missing daily vaccination numbers with discrete median of country as similar to question a. \n",
"Please provide the link to your code as answer to this question. \n",
"Note: This time SQL equivalent is requested, and imputation value is median of each country, not minimum. Please remember filling countries with zero if they do not have any valid daily_vaccination records like Kuwait."
]
},
{
"cell_type": "markdown",
"id": "mediterranean-shift",
"metadata": {},
"source": [
"##### SQL\n",
"\n",
"```mysql\n",
" UPDATE table_name\n",
" SET daily_vaccination = 0\n",
" WHERE avg(daily_vaccination) IS NaN\n",
"```\n",
"\n",
"```mysql\n",
" UPDATE table_name\n",
" SET daily_vaccination = median(daily_vaccination)\n",
" WHERE daily_vaccination IS NaN\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "representative-cargo",
"metadata": {},
"source": [
"*Serkan UYSAL*"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.5 64-bit",
"language": "python",
"name": "python38564bit9b13fc8a8c5449f8ad604cc2782fb3a5"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment