Last active
September 7, 2021 12:09
-
-
Save larsyencken/bebc86cc82496dc1ebf4d2617fb49217 to your computer and use it in GitHub Desktop.
OWID: Jupyter notebook to download full data dump of the WHO Global Health Observatory's API
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "1bb8c4a3-c162-40a0-964c-ed9268cee03b", | |
"metadata": {}, | |
"source": [ | |
"# WHO GHO data download" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "c3b359e4-1e8a-4a19-a52b-65983530c7c4", | |
"metadata": {}, | |
"source": [ | |
"_Fetch the full data as a local dump._" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "6261dbbe-f0cc-4333-a03b-8482e405647d", | |
"metadata": {}, | |
"source": [ | |
"## Summary" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b9433b3d-ab59-4883-9dc2-731ddeccf530", | |
"metadata": {}, | |
"source": [ | |
"This notebook walks the GHO API to discover all dimensions and indicators, downloads every indicator into a `who/` folder, then zips it to `who.zip`.\n", | |
"\n", | |
"We also save some metadata:\n", | |
"\n", | |
"- Dimensions: `who/_dimensions.csv`\n", | |
"- Dimension values: `who/_dim_values.csv`\n", | |
"- Indicators: `who/_indicators.csv`\n", | |
"\n", | |
"https://www.who.int/data/gho/info/gho-odata-api" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e455c8fc-333f-45e2-bf21-1e27e1c6ba9d", | |
"metadata": {}, | |
"source": [ | |
"## Preamble" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "f78f4c53-b59b-4319-86c1-1b052e3ecacc", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import requests" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "7b0f3a03-7e94-462b-a668-bf89d30df4c0", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from glob import glob\n", | |
"from os import path" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "ecb094ed-a3ad-4b87-8103-2a670933736e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"pd.set_option('display.max_rows', 50)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "2967ef48-95eb-422f-a2c0-6959ad0e16c8", | |
"metadata": {}, | |
"source": [ | |
"## Dimensions" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "fa4d22cb-f226-4d3a-b98e-38649ac6e0eb", | |
"metadata": {}, | |
"source": [ | |
"A dimension for the WHO is a domain for variables (e.g. `COUNTRY`)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "372a8ed3-3ddc-4ca1-adaf-9b95c172847c", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Title</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>Code</th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>ADVERTISINGTYPE</th>\n", | |
" <td>SUBSTANCE_ABUSE_ADVERTISING_TYPES</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>AGEGROUP</th>\n", | |
" <td>Age Group</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>ALCOHOLTYPE</th>\n", | |
" <td>Beverage Types</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>AMRGLASSCATEGORY</th>\n", | |
" <td>AMR GLASS Category</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>ARCHIVE</th>\n", | |
" <td>Archive date</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>AWARENESSACTIVITYTYPE</th>\n", | |
" <td>SUBSTANCE_ABUSE_AWARENESS_ACTIVITY_TYPES</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>BACGROUP</th>\n", | |
" <td>SUBSTANCE_ABUSE_BAC_GROUPS</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>BEVERAGETYPE</th>\n", | |
" <td>SUBSTANCE_ABUSE_BEVERAGE_TYPES</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>CAREPATIENT</th>\n", | |
" <td>Patient type</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>CARESECTOR</th>\n", | |
" <td>Care sector</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Title\n", | |
"Code \n", | |
"ADVERTISINGTYPE SUBSTANCE_ABUSE_ADVERTISING_TYPES\n", | |
"AGEGROUP Age Group\n", | |
"ALCOHOLTYPE Beverage Types\n", | |
"AMRGLASSCATEGORY AMR GLASS Category\n", | |
"ARCHIVE Archive date\n", | |
"AWARENESSACTIVITYTYPE SUBSTANCE_ABUSE_AWARENESS_ACTIVITY_TYPES\n", | |
"BACGROUP SUBSTANCE_ABUSE_BAC_GROUPS\n", | |
"BEVERAGETYPE SUBSTANCE_ABUSE_BEVERAGE_TYPES\n", | |
"CAREPATIENT Patient type\n", | |
"CARESECTOR Care sector" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dims_url = \"https://ghoapi.azureedge.net/api/Dimension\"\n", | |
"\n", | |
"resp = requests.get(dims_url)\n", | |
"dims_json = resp.json()\n", | |
"\n", | |
"dims = pd.DataFrame.from_records(dims_json['value']).set_index('Code')\n", | |
"dims.head(10)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3836bd09-fd10-4c17-9971-464aecb1393f", | |
"metadata": {}, | |
"source": [ | |
"Can we get the values for each dimension?" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b50091df-0b79-4bdd-9797-360905ff6158", | |
"metadata": {}, | |
"source": [ | |
"## Dimension values" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "0476a71c-c661-489a-b509-6ece277d7224", | |
"metadata": {}, | |
"source": [ | |
"A dimension value is a particular value for a dimension. E.g. \"Australia\" for \"COUNTRY\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"id": "3b26e7f8-308f-4b31-acf2-f17e659d5900", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"dim_url = 'https://ghoapi.azureedge.net/api/DIMENSION/{code}/DimensionValues'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "1f739d9f-2826-49ff-908d-c26a41f25a94", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"dim_values_frames = []\n", | |
"for code in dims.index:\n", | |
" url = dim_url.format(code=code)\n", | |
" value_json = requests.get(url).json()['value']\n", | |
" value_df = pd.DataFrame.from_records(value_json)\n", | |
" dim_values_frames.append(value_df)\n", | |
" \n", | |
"dim_values = pd.concat(dim_values_frames)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "fc06d73a-fd3a-4e75-b8b1-fa184efbf178", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Code</th>\n", | |
" <th>Title</th>\n", | |
" <th>Dimension</th>\n", | |
" <th>ParentDimension</th>\n", | |
" <th>ParentCode</th>\n", | |
" <th>ParentTitle</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>SA_BEER_ADS</td>\n", | |
" <td>Beer Ads</td>\n", | |
" <td>ADVERTISINGTYPE</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>SA_SPIRITS_ADS</td>\n", | |
" <td>Spirits Ads</td>\n", | |
" <td>ADVERTISINGTYPE</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>SA_WINE_ADS</td>\n", | |
" <td>Wine Ads</td>\n", | |
" <td>ADVERTISINGTYPE</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>AGE0-29</td>\n", | |
" <td>0-29 years</td>\n", | |
" <td>AGEGROUP</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>AGE0-4</td>\n", | |
" <td>0-4 years</td>\n", | |
" <td>AGEGROUP</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Code Title Dimension ParentDimension ParentCode \\\n", | |
"0 SA_BEER_ADS Beer Ads ADVERTISINGTYPE None None \n", | |
"1 SA_SPIRITS_ADS Spirits Ads ADVERTISINGTYPE None None \n", | |
"2 SA_WINE_ADS Wine Ads ADVERTISINGTYPE None None \n", | |
"0 AGE0-29 0-29 years AGEGROUP None None \n", | |
"1 AGE0-4 0-4 years AGEGROUP None None \n", | |
"\n", | |
" ParentTitle \n", | |
"0 None \n", | |
"1 None \n", | |
"2 None \n", | |
"0 None \n", | |
"1 None " | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dim_values.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "2151b60c-c7fc-4861-bdfb-33d2bbcd3478", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"12703" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(dim_values)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"id": "1f4bc0d7-acef-4631-81db-23a1f6638c71", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Dimension\n", | |
"ADVERTISINGTYPE 3\n", | |
"AGEGROUP 80\n", | |
"ALCOHOLTYPE 5\n", | |
"AMRGLASSCATEGORY 25\n", | |
"ARCHIVE 1\n", | |
" ... \n", | |
"WEALTHDECILE 10\n", | |
"WEALTHQUINTILE 5\n", | |
"WHOINCOMEREGION 11\n", | |
"WORLDBANKINCOMEGROUP 8\n", | |
"YEAR 275\n", | |
"Length: 85, dtype: int64" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dim_values.groupby('Dimension').size()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "dd0a0682-b86c-4019-8dfd-0877570c8202", | |
"metadata": {}, | |
"source": [ | |
"The `GHO` dimension contains all the indicators." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"id": "13fbbcfe-23df-4b47-9487-bda2b623e24c", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Code</th>\n", | |
" <th>Title</th>\n", | |
" <th>Dimension</th>\n", | |
" <th>ParentDimension</th>\n", | |
" <th>ParentCode</th>\n", | |
" <th>ParentTitle</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>Adult_curr_cig_smoking</td>\n", | |
" <td>Prevalence of current cigarette smoking among ...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Adult_curr_e-cig</td>\n", | |
" <td>Prevalence of current e-cigarette use among ad...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Adult_curr_smokeless</td>\n", | |
" <td>Prevalence of current smokeless tobacco use am...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>Adult_curr_tob_smoking</td>\n", | |
" <td>Prevalence of current tobacco smoking among ad...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>Adult_curr_tob_use</td>\n", | |
" <td>Prevalence of current tobacco use among adults...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>Adult_daily_cig_smoking</td>\n", | |
" <td>Prevalence of daily cigarette smoking among ad...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>Adult_daily_e-cig</td>\n", | |
" <td>Prevalence of daily e-cigarette use among adul...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>Adult_daily_smokeless</td>\n", | |
" <td>Prevalence of daily smokeless tobacco use amon...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>Adult_daily_tob_smoking</td>\n", | |
" <td>Prevalence of daily tobacco smoking among adul...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>Adult_daily_tob_use</td>\n", | |
" <td>Prevalence of daily tobacco use among adults (%)</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>Adult_nonsmoked_age_grp</td>\n", | |
" <td>Age range of latest adult prevalence survey (n...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>Adult_nonsmoked_national</td>\n", | |
" <td>Representation of latest adult prevalence surv...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>Adult_nonsmoked_svy_title</td>\n", | |
" <td>Name of latest adult prevalence survey (nonsmo...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>Adult_nonsmoked_svy_yr</td>\n", | |
" <td>Year of latest adult prevalence survey (nonsmo...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>Adult_survey_title</td>\n", | |
" <td>Title of most recent survey of adults</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>Adult_svy_age_range</td>\n", | |
" <td>Age range of most recent survey of adults</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>Adult_svy_national</td>\n", | |
" <td>Representativeness of most recent survey of ad...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>Adult_svy_yr</td>\n", | |
" <td>Year of most recent survey of adults</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>AIR_1</td>\n", | |
" <td>Ambient air pollution attributable deaths</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>AIR_10</td>\n", | |
" <td>Ambient air pollution attributable DALYs per ...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Code \\\n", | |
"0 Adult_curr_cig_smoking \n", | |
"1 Adult_curr_e-cig \n", | |
"2 Adult_curr_smokeless \n", | |
"3 Adult_curr_tob_smoking \n", | |
"4 Adult_curr_tob_use \n", | |
"5 Adult_daily_cig_smoking \n", | |
"6 Adult_daily_e-cig \n", | |
"7 Adult_daily_smokeless \n", | |
"8 Adult_daily_tob_smoking \n", | |
"9 Adult_daily_tob_use \n", | |
"10 Adult_nonsmoked_age_grp \n", | |
"11 Adult_nonsmoked_national \n", | |
"12 Adult_nonsmoked_svy_title \n", | |
"13 Adult_nonsmoked_svy_yr \n", | |
"14 Adult_survey_title \n", | |
"15 Adult_svy_age_range \n", | |
"16 Adult_svy_national \n", | |
"17 Adult_svy_yr \n", | |
"18 AIR_1 \n", | |
"19 AIR_10 \n", | |
"\n", | |
" Title Dimension \\\n", | |
"0 Prevalence of current cigarette smoking among ... GHO \n", | |
"1 Prevalence of current e-cigarette use among ad... GHO \n", | |
"2 Prevalence of current smokeless tobacco use am... GHO \n", | |
"3 Prevalence of current tobacco smoking among ad... GHO \n", | |
"4 Prevalence of current tobacco use among adults... GHO \n", | |
"5 Prevalence of daily cigarette smoking among ad... GHO \n", | |
"6 Prevalence of daily e-cigarette use among adul... GHO \n", | |
"7 Prevalence of daily smokeless tobacco use amon... GHO \n", | |
"8 Prevalence of daily tobacco smoking among adul... GHO \n", | |
"9 Prevalence of daily tobacco use among adults (%) GHO \n", | |
"10 Age range of latest adult prevalence survey (n... GHO \n", | |
"11 Representation of latest adult prevalence surv... GHO \n", | |
"12 Name of latest adult prevalence survey (nonsmo... GHO \n", | |
"13 Year of latest adult prevalence survey (nonsmo... GHO \n", | |
"14 Title of most recent survey of adults GHO \n", | |
"15 Age range of most recent survey of adults GHO \n", | |
"16 Representativeness of most recent survey of ad... GHO \n", | |
"17 Year of most recent survey of adults GHO \n", | |
"18 Ambient air pollution attributable deaths GHO \n", | |
"19 Ambient air pollution attributable DALYs per ... GHO \n", | |
"\n", | |
" ParentDimension ParentCode ParentTitle \n", | |
"0 None None None \n", | |
"1 None None None \n", | |
"2 None None None \n", | |
"3 None None None \n", | |
"4 None None None \n", | |
"5 None None None \n", | |
"6 None None None \n", | |
"7 None None None \n", | |
"8 None None None \n", | |
"9 None None None \n", | |
"10 None None None \n", | |
"11 None None None \n", | |
"12 None None None \n", | |
"13 None None None \n", | |
"14 None None None \n", | |
"15 None None None \n", | |
"16 None None None \n", | |
"17 None None None \n", | |
"18 None None None \n", | |
"19 None None None " | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dim_values[dim_values.Dimension == 'GHO'].head(20)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b6bd764e-c2c4-4e07-92da-8c7c50c3f54d", | |
"metadata": {}, | |
"source": [ | |
"Hypothesis: it looks like the values for the the `GHO` dimension are the indicator codes." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "d2beee42-3b03-4b7f-8b5e-4616fa8f8fcf", | |
"metadata": {}, | |
"source": [ | |
"## Indicators" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "7c584598-6c80-4d9b-bac6-20b2a85ae181", | |
"metadata": {}, | |
"source": [ | |
"Indicators are our metrics. Let's fetch the whole list of them." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"id": "337ec7c0-d892-44f7-a947-a4f4e937f94b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"indicator_url = 'https://ghoapi.azureedge.net/api/Indicator'\n", | |
"\n", | |
"ind_json = requests.get(indicator_url).json()['value']\n", | |
"ind = pd.DataFrame.from_records(ind_json).set_index('IndicatorCode')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"id": "640e899b-2ad6-48ea-9225-8e78ed44cf50", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>IndicatorName</th>\n", | |
" <th>Language</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>IndicatorCode</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>AIR_1</th>\n", | |
" <td>Ambient air pollution attributable deaths</td>\n", | |
" <td>EN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>AIR_10</th>\n", | |
" <td>Ambient air pollution attributable DALYs per ...</td>\n", | |
" <td>EN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>AIR_11</th>\n", | |
" <td>Household air pollution attributable deaths</td>\n", | |
" <td>EN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>AIR_12</th>\n", | |
" <td>Household air pollution attributable deaths in...</td>\n", | |
" <td>EN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>AIR_13</th>\n", | |
" <td>Household air pollution attributable deaths pe...</td>\n", | |
" <td>EN</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" IndicatorName Language\n", | |
"IndicatorCode \n", | |
"AIR_1 Ambient air pollution attributable deaths EN\n", | |
"AIR_10 Ambient air pollution attributable DALYs per ... EN\n", | |
"AIR_11 Household air pollution attributable deaths EN\n", | |
"AIR_12 Household air pollution attributable deaths in... EN\n", | |
"AIR_13 Household air pollution attributable deaths pe... EN" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ind.head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "44952502-ca98-43f9-85ce-e6d711ee93a3", | |
"metadata": {}, | |
"source": [ | |
"Do we have the same number here as listed in `dim_values` under `GHO`?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"id": "a4d10405-86be-45a6-ab9e-a110e2726867", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"2366" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(ind)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"id": "eeb4eae2-d2b9-438d-8526-472041c8a3bd", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"2380" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(dim_values[dim_values.Dimension == 'GHO'])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"id": "061f8341-6866-4283-b3bd-768e2bd9543a", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"bonus_14 = set(\n", | |
" # the codes we get from one api\n", | |
" dim_values[dim_values.Dimension == 'GHO'].Code\n", | |
").difference(\n", | |
" # the codes from the indicators api\n", | |
" set(ind.index)\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"id": "e9ee835f-5a21-47b3-bd0b-15e4f3916bbe", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>Code</th>\n", | |
" <th>Title</th>\n", | |
" <th>Dimension</th>\n", | |
" <th>ParentDimension</th>\n", | |
" <th>ParentCode</th>\n", | |
" <th>ParentTitle</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1040</th>\n", | |
" <td>NLIS_NU_CA_010</td>\n", | |
" <td>Median urinary iodine concentration in childre...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1041</th>\n", | |
" <td>NLIS_NU_CA_017</td>\n", | |
" <td>NLIS: Population using improved drinking-water...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1042</th>\n", | |
" <td>NLIS_NU_CA_018</td>\n", | |
" <td>NLIS: Population using improved sanitation fac...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1043</th>\n", | |
" <td>NLIS_NU_CA_023</td>\n", | |
" <td>Nutrition Governance score</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1044</th>\n", | |
" <td>NLIS_NU_CA_044</td>\n", | |
" <td>International Code on Marketing of Breast-milk...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1045</th>\n", | |
" <td>NLIS_NU_CA_046</td>\n", | |
" <td>GDP per capita annual growth rate (%)</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1046</th>\n", | |
" <td>NLIS_NU_CA_048</td>\n", | |
" <td>Any antenatal iron supplementation (%)</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1047</th>\n", | |
" <td>NLIS_NU_CA_060</td>\n", | |
" <td>NLIS: Children aged 6-59 months receiving vita...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1048</th>\n", | |
" <td>NLIS_NU_CA_061</td>\n", | |
" <td>NLIS: Children aged 6-59 months receiving vita...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1049</th>\n", | |
" <td>NLIS_NU_CA_078</td>\n", | |
" <td>Subclinical vitamin A deficiency in preschool-...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1050</th>\n", | |
" <td>NLIS_NU_CA_079</td>\n", | |
" <td>Clinical vitamin A deficiency in women (histor...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1051</th>\n", | |
" <td>NLIS_NU_CA_084</td>\n", | |
" <td>Population below the international poverty lin...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1052</th>\n", | |
" <td>NLIS_NU_CA_106</td>\n", | |
" <td>Births in baby-friendly facilities (%)</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1053</th>\n", | |
" <td>NLIS_NU_CA_108</td>\n", | |
" <td>Maternity protection: Compliance with internat...</td>\n", | |
" <td>GHO</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" Code Title \\\n", | |
"1040 NLIS_NU_CA_010 Median urinary iodine concentration in childre... \n", | |
"1041 NLIS_NU_CA_017 NLIS: Population using improved drinking-water... \n", | |
"1042 NLIS_NU_CA_018 NLIS: Population using improved sanitation fac... \n", | |
"1043 NLIS_NU_CA_023 Nutrition Governance score \n", | |
"1044 NLIS_NU_CA_044 International Code on Marketing of Breast-milk... \n", | |
"1045 NLIS_NU_CA_046 GDP per capita annual growth rate (%) \n", | |
"1046 NLIS_NU_CA_048 Any antenatal iron supplementation (%) \n", | |
"1047 NLIS_NU_CA_060 NLIS: Children aged 6-59 months receiving vita... \n", | |
"1048 NLIS_NU_CA_061 NLIS: Children aged 6-59 months receiving vita... \n", | |
"1049 NLIS_NU_CA_078 Subclinical vitamin A deficiency in preschool-... \n", | |
"1050 NLIS_NU_CA_079 Clinical vitamin A deficiency in women (histor... \n", | |
"1051 NLIS_NU_CA_084 Population below the international poverty lin... \n", | |
"1052 NLIS_NU_CA_106 Births in baby-friendly facilities (%) \n", | |
"1053 NLIS_NU_CA_108 Maternity protection: Compliance with internat... \n", | |
"\n", | |
" Dimension ParentDimension ParentCode ParentTitle \n", | |
"1040 GHO None None None \n", | |
"1041 GHO None None None \n", | |
"1042 GHO None None None \n", | |
"1043 GHO None None None \n", | |
"1044 GHO None None None \n", | |
"1045 GHO None None None \n", | |
"1046 GHO None None None \n", | |
"1047 GHO None None None \n", | |
"1048 GHO None None None \n", | |
"1049 GHO None None None \n", | |
"1050 GHO None None None \n", | |
"1051 GHO None None None \n", | |
"1052 GHO None None None \n", | |
"1053 GHO None None None " | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dim_values[dim_values.Code.isin(bonus_14)]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "05e975c5-fcaf-4591-92ff-97a519fe6abe", | |
"metadata": {}, | |
"source": [ | |
"Looks totally random.\n", | |
"\n", | |
"Hypothesis: these are indicators they don't have data for." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "d8283ec6-c22a-4a17-9f87-7447a1548e7a", | |
"metadata": {}, | |
"source": [ | |
"## Dump data" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "01177e24-6ff2-46ae-af40-1bad10b5095a", | |
"metadata": {}, | |
"source": [ | |
"For each indicator, fetch all the data and save it to CSV." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"id": "fcff1dbf-2214-4f72-8547-552b98576482", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"data_url = 'https://ghoapi.azureedge.net/api/{code}'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"id": "21c99f28-28a9-442d-bc3c-932180aae09e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"!mkdir -p who_gho" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "71fd05a3-4b0a-4eb5-aaa3-ae8a5ea9b3e4", | |
"metadata": {}, | |
"source": [ | |
"### Indicators" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"id": "9e748355-356f-4214-8066-90efd4579f11", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"url = data_url.format(code='WHOSIS_000001')\n", | |
"\n", | |
"def get_ind_data(code):\n", | |
" url = data_url.format(code=code)\n", | |
" data_json = requests.get(url).json()\n", | |
" data_df = pd.DataFrame.from_records(data_json['value']).set_index('Id')\n", | |
" return data_df" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"id": "ff654ec2-15bd-4416-8dd1-c7381269c780", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................\n" | |
] | |
} | |
], | |
"source": [ | |
"for ind_code in ind.index:\n", | |
" print('.', end='', flush=True)\n", | |
" data = get_ind_data(ind_code)\n", | |
" data.to_csv(f'who_gho/{ind_code}.csv')\n", | |
"print()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"id": "e8cbf424-7241-45d4-a1b8-d4255480104a", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1.3G\twho_gho\n" | |
] | |
} | |
], | |
"source": [ | |
"!du -hs who_gho" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "d76329c2-1f65-4a65-ab45-e78c85205f5a", | |
"metadata": {}, | |
"source": [ | |
"### Metadata" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"id": "a720049c-09f5-4ff6-855d-4c470703950e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"ind.to_csv('who_gho/_indicators.csv')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"id": "b2b14b4c-f932-4617-8496-f219eb019d6b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"dims.to_csv('who_gho/_dimensions.csv')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"id": "106edff4-8ad9-4dc5-b677-da3fc3949115", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"dim_values.to_csv('who_gho/_dim_values.csv')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"id": "8d814ac5-6e1f-4af9-8821-51664c2b138c", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"!zip -q -r -9 who_gho.zip who_gho" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.9.4" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment