@pletchm
Last active August 25, 2024 12:58
Xarray Example Cheatsheet
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cheatsheet Outline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Common Xarray-related Imports\n",
"2. Creating Xarray DataArrays\n",
"3. Converting from Xarray to Pandas\n",
"4. Converting from Pandas to Xarray\n",
"5. Reading and writing Xarrays to netCDF files\n",
"6. Slicing and dicing data\n",
"7. Changing values\n",
"8. Data Reduction\n",
"9. Vectorized operations\n",
"10. Changing and adding coordinates/Expanding or broadcasting dimensions\n",
"11. Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Common Xarray-related Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import xarray as xr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Creating Xarray DataArrays\n",
"There are two kinds of data structures in Xarray: DataArrays and Datasets. We'll start with DataArrays, because a Dataset is essentially a dict-like collection of DataArrays. Datasets are also needed less often."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"da = xr.DataArray(\n",
" data=np.random.random([2, 3, 11, 100]),\n",
" dims=[\"sex_id\", \"age_group_id\", \"year_id\", \"draw\"],\n",
" coords={\n",
" \"sex_id\": [1, 2],\n",
" \"age_group_id\": [11, 12, 13],\n",
" \"year_id\": range(1990, 2000+1),\n",
" \"draw\": range(100),\n",
" },\n",
" name=\"fake_thing\"\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
"array([[[[0.996779, ..., 0.193241],\n",
" ...,\n",
" [0.50371 , ..., 0.654273]],\n",
"\n",
" ...,\n",
"\n",
" [[0.227811, ..., 0.912856],\n",
" ...,\n",
" [0.24642 , ..., 0.581184]]],\n",
"\n",
"\n",
" [[[0.072334, ..., 0.684663],\n",
" ...,\n",
" [0.628984, ..., 0.358811]],\n",
"\n",
" ...,\n",
"\n",
" [[0.491698, ..., 0.876439],\n",
" ...,\n",
" [0.829525, ..., 0.611719]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"da"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Converting from Xarray to Pandas"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age_group_id</th>\n",
" <th>sex_id</th>\n",
" <th>year_id</th>\n",
" <th>draw_0</th>\n",
" <th>draw_1</th>\n",
" <th>draw_2</th>\n",
" <th>draw_3</th>\n",
" <th>draw_4</th>\n",
" <th>draw_5</th>\n",
" <th>draw_6</th>\n",
" <th>...</th>\n",
" <th>draw_90</th>\n",
" <th>draw_91</th>\n",
" <th>draw_92</th>\n",
" <th>draw_93</th>\n",
" <th>draw_94</th>\n",
" <th>draw_95</th>\n",
" <th>draw_96</th>\n",
" <th>draw_97</th>\n",
" <th>draw_98</th>\n",
" <th>draw_99</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1990</td>\n",
" <td>0.996779</td>\n",
" <td>0.761632</td>\n",
" <td>0.350849</td>\n",
" <td>0.750393</td>\n",
" <td>0.433888</td>\n",
" <td>0.764425</td>\n",
" <td>0.122375</td>\n",
" <td>...</td>\n",
" <td>0.225819</td>\n",
" <td>0.836557</td>\n",
" <td>0.885162</td>\n",
" <td>0.222884</td>\n",
" <td>0.641429</td>\n",
" <td>0.393851</td>\n",
" <td>0.381577</td>\n",
" <td>0.294711</td>\n",
" <td>0.650573</td>\n",
" <td>0.193241</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1991</td>\n",
" <td>0.786725</td>\n",
" <td>0.014690</td>\n",
" <td>0.184935</td>\n",
" <td>0.269309</td>\n",
" <td>0.493112</td>\n",
" <td>0.365666</td>\n",
" <td>0.573797</td>\n",
" <td>...</td>\n",
" <td>0.196058</td>\n",
" <td>0.190651</td>\n",
" <td>0.266525</td>\n",
" <td>0.453888</td>\n",
" <td>0.333859</td>\n",
" <td>0.377547</td>\n",
" <td>0.304548</td>\n",
" <td>0.035076</td>\n",
" <td>0.905141</td>\n",
" <td>0.262088</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1992</td>\n",
" <td>0.851437</td>\n",
" <td>0.367362</td>\n",
" <td>0.736778</td>\n",
" <td>0.500674</td>\n",
" <td>0.885498</td>\n",
" <td>0.350236</td>\n",
" <td>0.837336</td>\n",
" <td>...</td>\n",
" <td>0.526494</td>\n",
" <td>0.398270</td>\n",
" <td>0.609992</td>\n",
" <td>0.480893</td>\n",
" <td>0.261509</td>\n",
" <td>0.537468</td>\n",
" <td>0.326550</td>\n",
" <td>0.393128</td>\n",
" <td>0.236991</td>\n",
" <td>0.239981</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1993</td>\n",
" <td>0.431025</td>\n",
" <td>0.786596</td>\n",
" <td>0.385705</td>\n",
" <td>0.140987</td>\n",
" <td>0.742205</td>\n",
" <td>0.380742</td>\n",
" <td>0.247266</td>\n",
" <td>...</td>\n",
" <td>0.811025</td>\n",
" <td>0.964106</td>\n",
" <td>0.484327</td>\n",
" <td>0.387248</td>\n",
" <td>0.862704</td>\n",
" <td>0.320871</td>\n",
" <td>0.288251</td>\n",
" <td>0.752603</td>\n",
" <td>0.482269</td>\n",
" <td>0.423913</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1994</td>\n",
" <td>0.715613</td>\n",
" <td>0.937093</td>\n",
" <td>0.276558</td>\n",
" <td>0.155267</td>\n",
" <td>0.892415</td>\n",
" <td>0.782576</td>\n",
" <td>0.620654</td>\n",
" <td>...</td>\n",
" <td>0.038820</td>\n",
" <td>0.025020</td>\n",
" <td>0.422900</td>\n",
" <td>0.139842</td>\n",
" <td>0.229250</td>\n",
" <td>0.092306</td>\n",
" <td>0.262763</td>\n",
" <td>0.009972</td>\n",
" <td>0.457518</td>\n",
" <td>0.653466</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 103 columns</p>\n",
"</div>"
],
"text/plain": [
" age_group_id sex_id year_id draw_0 draw_1 draw_2 draw_3 \\\n",
"0 11 1 1990 0.996779 0.761632 0.350849 0.750393 \n",
"1 11 1 1991 0.786725 0.014690 0.184935 0.269309 \n",
"2 11 1 1992 0.851437 0.367362 0.736778 0.500674 \n",
"3 11 1 1993 0.431025 0.786596 0.385705 0.140987 \n",
"4 11 1 1994 0.715613 0.937093 0.276558 0.155267 \n",
"\n",
" draw_4 draw_5 draw_6 ... draw_90 draw_91 draw_92 draw_93 \\\n",
"0 0.433888 0.764425 0.122375 ... 0.225819 0.836557 0.885162 0.222884 \n",
"1 0.493112 0.365666 0.573797 ... 0.196058 0.190651 0.266525 0.453888 \n",
"2 0.885498 0.350236 0.837336 ... 0.526494 0.398270 0.609992 0.480893 \n",
"3 0.742205 0.380742 0.247266 ... 0.811025 0.964106 0.484327 0.387248 \n",
"4 0.892415 0.782576 0.620654 ... 0.038820 0.025020 0.422900 0.139842 \n",
"\n",
" draw_94 draw_95 draw_96 draw_97 draw_98 draw_99 \n",
"0 0.641429 0.393851 0.381577 0.294711 0.650573 0.193241 \n",
"1 0.333859 0.377547 0.304548 0.035076 0.905141 0.262088 \n",
"2 0.261509 0.537468 0.326550 0.393128 0.236991 0.239981 \n",
"3 0.862704 0.320871 0.288251 0.752603 0.482269 0.423913 \n",
"4 0.229250 0.092306 0.262763 0.009972 0.457518 0.653466 \n",
"\n",
"[5 rows x 103 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"draw_cols = [\"draw_{}\".format(i) for i in range(100)]\n",
"da_draw_dim = da.assign_coords(draw=draw_cols)\n",
"ds = da_draw_dim.to_dataset(dim=\"draw\")\n",
"df = ds.to_dataframe().reset_index()\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
"array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
" 0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n",
" [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
" 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n",
" [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
" 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n",
" [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
" 0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n",
" [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
" 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"da.mean(\"draw\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sex_id</th>\n",
" <th>age_group_id</th>\n",
" <th>year_id</th>\n",
" <th>mean</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>1990</td>\n",
" <td>0.515046</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>1991</td>\n",
" <td>0.482924</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>1992</td>\n",
" <td>0.519905</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>1993</td>\n",
" <td>0.461850</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>1994</td>\n",
" <td>0.465647</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sex_id age_group_id year_id mean\n",
"0 1 11 1990 0.515046\n",
"1 1 11 1991 0.482924\n",
"2 1 11 1992 0.519905\n",
"3 1 11 1993 0.461850\n",
"4 1 11 1994 0.465647"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_df = da.mean(\"draw\").rename(\"mean\").to_dataframe().reset_index()\n",
"mean_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Converting from Pandas to Xarray"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'mean' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
"array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
" 0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n",
" [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
" 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n",
" [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
" 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n",
" [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
" 0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n",
" [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
" 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da = mean_df.set_index([\"sex_id\", \"age_group_id\", \"year_id\"]).to_xarray()[\"mean\"]\n",
"mean_da"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age_group_id</th>\n",
" <th>sex_id</th>\n",
" <th>year_id</th>\n",
" <th>draw_0</th>\n",
" <th>draw_1</th>\n",
" <th>draw_2</th>\n",
" <th>draw_3</th>\n",
" <th>draw_4</th>\n",
" <th>draw_5</th>\n",
" <th>draw_6</th>\n",
" <th>...</th>\n",
" <th>draw_90</th>\n",
" <th>draw_91</th>\n",
" <th>draw_92</th>\n",
" <th>draw_93</th>\n",
" <th>draw_94</th>\n",
" <th>draw_95</th>\n",
" <th>draw_96</th>\n",
" <th>draw_97</th>\n",
" <th>draw_98</th>\n",
" <th>draw_99</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1990</td>\n",
" <td>0.996779</td>\n",
" <td>0.761632</td>\n",
" <td>0.350849</td>\n",
" <td>0.750393</td>\n",
" <td>0.433888</td>\n",
" <td>0.764425</td>\n",
" <td>0.122375</td>\n",
" <td>...</td>\n",
" <td>0.225819</td>\n",
" <td>0.836557</td>\n",
" <td>0.885162</td>\n",
" <td>0.222884</td>\n",
" <td>0.641429</td>\n",
" <td>0.393851</td>\n",
" <td>0.381577</td>\n",
" <td>0.294711</td>\n",
" <td>0.650573</td>\n",
" <td>0.193241</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1991</td>\n",
" <td>0.786725</td>\n",
" <td>0.014690</td>\n",
" <td>0.184935</td>\n",
" <td>0.269309</td>\n",
" <td>0.493112</td>\n",
" <td>0.365666</td>\n",
" <td>0.573797</td>\n",
" <td>...</td>\n",
" <td>0.196058</td>\n",
" <td>0.190651</td>\n",
" <td>0.266525</td>\n",
" <td>0.453888</td>\n",
" <td>0.333859</td>\n",
" <td>0.377547</td>\n",
" <td>0.304548</td>\n",
" <td>0.035076</td>\n",
" <td>0.905141</td>\n",
" <td>0.262088</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1992</td>\n",
" <td>0.851437</td>\n",
" <td>0.367362</td>\n",
" <td>0.736778</td>\n",
" <td>0.500674</td>\n",
" <td>0.885498</td>\n",
" <td>0.350236</td>\n",
" <td>0.837336</td>\n",
" <td>...</td>\n",
" <td>0.526494</td>\n",
" <td>0.398270</td>\n",
" <td>0.609992</td>\n",
" <td>0.480893</td>\n",
" <td>0.261509</td>\n",
" <td>0.537468</td>\n",
" <td>0.326550</td>\n",
" <td>0.393128</td>\n",
" <td>0.236991</td>\n",
" <td>0.239981</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1993</td>\n",
" <td>0.431025</td>\n",
" <td>0.786596</td>\n",
" <td>0.385705</td>\n",
" <td>0.140987</td>\n",
" <td>0.742205</td>\n",
" <td>0.380742</td>\n",
" <td>0.247266</td>\n",
" <td>...</td>\n",
" <td>0.811025</td>\n",
" <td>0.964106</td>\n",
" <td>0.484327</td>\n",
" <td>0.387248</td>\n",
" <td>0.862704</td>\n",
" <td>0.320871</td>\n",
" <td>0.288251</td>\n",
" <td>0.752603</td>\n",
" <td>0.482269</td>\n",
" <td>0.423913</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1994</td>\n",
" <td>0.715613</td>\n",
" <td>0.937093</td>\n",
" <td>0.276558</td>\n",
" <td>0.155267</td>\n",
" <td>0.892415</td>\n",
" <td>0.782576</td>\n",
" <td>0.620654</td>\n",
" <td>...</td>\n",
" <td>0.038820</td>\n",
" <td>0.025020</td>\n",
" <td>0.422900</td>\n",
" <td>0.139842</td>\n",
" <td>0.229250</td>\n",
" <td>0.092306</td>\n",
" <td>0.262763</td>\n",
" <td>0.009972</td>\n",
" <td>0.457518</td>\n",
" <td>0.653466</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 103 columns</p>\n",
"</div>"
],
"text/plain": [
" age_group_id sex_id year_id draw_0 draw_1 draw_2 draw_3 \\\n",
"0 11 1 1990 0.996779 0.761632 0.350849 0.750393 \n",
"1 11 1 1991 0.786725 0.014690 0.184935 0.269309 \n",
"2 11 1 1992 0.851437 0.367362 0.736778 0.500674 \n",
"3 11 1 1993 0.431025 0.786596 0.385705 0.140987 \n",
"4 11 1 1994 0.715613 0.937093 0.276558 0.155267 \n",
"\n",
" draw_4 draw_5 draw_6 ... draw_90 draw_91 draw_92 draw_93 \\\n",
"0 0.433888 0.764425 0.122375 ... 0.225819 0.836557 0.885162 0.222884 \n",
"1 0.493112 0.365666 0.573797 ... 0.196058 0.190651 0.266525 0.453888 \n",
"2 0.885498 0.350236 0.837336 ... 0.526494 0.398270 0.609992 0.480893 \n",
"3 0.742205 0.380742 0.247266 ... 0.811025 0.964106 0.484327 0.387248 \n",
"4 0.892415 0.782576 0.620654 ... 0.038820 0.025020 0.422900 0.139842 \n",
"\n",
" draw_94 draw_95 draw_96 draw_97 draw_98 draw_99 \n",
"0 0.641429 0.393851 0.381577 0.294711 0.650573 0.193241 \n",
"1 0.333859 0.377547 0.304548 0.035076 0.905141 0.262088 \n",
"2 0.261509 0.537468 0.326550 0.393128 0.236991 0.239981 \n",
"3 0.862704 0.320871 0.288251 0.752603 0.482269 0.423913 \n",
"4 0.229250 0.092306 0.262763 0.009972 0.457518 0.653466 \n",
"\n",
"[5 rows x 103 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
"array([[[[0.996779, ..., 0.193241],\n",
" ...,\n",
" [0.50371 , ..., 0.654273]],\n",
"\n",
" ...,\n",
"\n",
" [[0.227811, ..., 0.912856],\n",
" ...,\n",
" [0.24642 , ..., 0.581184]]],\n",
"\n",
"\n",
" [[[0.072334, ..., 0.684663],\n",
" ...,\n",
" [0.628984, ..., 0.358811]],\n",
"\n",
" ...,\n",
"\n",
" [[0.491698, ..., 0.876439],\n",
" ...,\n",
" [0.829525, ..., 0.611719]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"back_to_da = pd.wide_to_long(\n",
" df, stubnames=\"draw_\", i=[\"sex_id\", \"age_group_id\", \"year_id\"], j=\"draw\").to_xarray()[\"draw_\"].rename(\"fake_thing\")\n",
"back_to_da"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"back_to_da.identical(da)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5. Reading and writing Xarrays to netCDF files\n",
"netCDF files can also store metadata as attributes (`attrs`)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"da.attrs[\"metric\"] = \"rate\"\n",
"da.attrs[\"author\"] = \"Me\"\n",
"da.to_netcdf(\"data.nc\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
"array([[[[0.996779, ..., 0.193241],\n",
" ...,\n",
" [0.50371 , ..., 0.654273]],\n",
"\n",
" ...,\n",
"\n",
" [[0.227811, ..., 0.912856],\n",
" ...,\n",
" [0.24642 , ..., 0.581184]]],\n",
"\n",
"\n",
" [[[0.072334, ..., 0.684663],\n",
" ...,\n",
" [0.628984, ..., 0.358811]],\n",
"\n",
" ...,\n",
"\n",
" [[0.491698, ..., 0.876439],\n",
" ...,\n",
" [0.829525, ..., 0.611719]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
"Attributes:\n",
" metric: rate\n",
" author: Me"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"da"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And of course, we use Unix/Linux-based systems here, so file extensions are really just a way for us humans to quickly determine\n",
"what file type a file claims to be. There is no restriction, rule, or utility attached to them from the computer's perspective."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"da.to_netcdf(\"data.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"da.to_netcdf(\"data.nc_martin\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
"array([[[[0.996779, ..., 0.193241],\n",
" ...,\n",
" [0.50371 , ..., 0.654273]],\n",
"\n",
" ...,\n",
"\n",
" [[0.227811, ..., 0.912856],\n",
" ...,\n",
" [0.24642 , ..., 0.581184]]],\n",
"\n",
"\n",
" [[[0.072334, ..., 0.684663],\n",
" ...,\n",
" [0.628984, ..., 0.358811]],\n",
"\n",
" ...,\n",
"\n",
" [[0.491698, ..., 0.876439],\n",
" ...,\n",
" [0.829525, ..., 0.611719]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
"Attributes:\n",
" metric: rate\n",
" author: Me"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"read_da1 = xr.open_dataarray(\"data.nc\")\n",
"read_da1"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
"array([[[[0.996779, ..., 0.193241],\n",
" ...,\n",
" [0.50371 , ..., 0.654273]],\n",
"\n",
" ...,\n",
"\n",
" [[0.227811, ..., 0.912856],\n",
" ...,\n",
" [0.24642 , ..., 0.581184]]],\n",
"\n",
"\n",
" [[[0.072334, ..., 0.684663],\n",
" ...,\n",
" [0.628984, ..., 0.358811]],\n",
"\n",
" ...,\n",
"\n",
" [[0.491698, ..., 0.876439],\n",
" ...,\n",
" [0.829525, ..., 0.611719]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
"Attributes:\n",
" metric: rate\n",
" author: Me"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"read_da2 = xr.open_dataarray(\"data.csv\")\n",
"read_da2"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
"array([[[[0.996779, ..., 0.193241],\n",
" ...,\n",
" [0.50371 , ..., 0.654273]],\n",
"\n",
" ...,\n",
"\n",
" [[0.227811, ..., 0.912856],\n",
" ...,\n",
" [0.24642 , ..., 0.581184]]],\n",
"\n",
"\n",
" [[[0.072334, ..., 0.684663],\n",
" ...,\n",
" [0.628984, ..., 0.358811]],\n",
"\n",
" ...,\n",
"\n",
" [[0.491698, ..., 0.876439],\n",
" ...,\n",
" [0.829525, ..., 0.611719]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
"Attributes:\n",
" metric: rate\n",
" author: Me"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"read_da3 = xr.open_dataarray(\"data.nc_martin\")\n",
"read_da3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I mention this for two reasons:\n",
"\n",
"1) If you're used to saving ``.csv`` files or something else, be careful and get into the habit of saving netCDF files with the ``.nc`` file extension.\n",
"2) Sometimes we do save netCDF files with something other than a plain ``.nc`` extension. For example, in our risk attributable pipeline we save files partitioned over cause, draw, and year, so we decided it was useful to name files in the following format: ``{acause}.ncdraw:range(100, 200)year_id:2017)``"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6. Slicing and dicing data"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"mean_da = da.mean(\"draw\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 2)>\n",
"array([[[0.438642, 0.484231]],\n",
"\n",
" [[0.482908, 0.469676]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11\n",
" * year_id (year_id) int64 1995 1996"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.sel(year_id=[1995, 1996], age_group_id=[11])"
]
},
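{
"cell_type": "markdown",
"metadata": {},
"source": [
"Point (scalar) coordinates versus single-coordinate dimensions: selecting with a list (`sex_id=[2]`) keeps the dimension with length 1, while selecting with a scalar (`sex_id=2`) removes the dimension and leaves a scalar coordinate. Passing `drop=True` discards those scalar coordinates as well."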
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 1, age_group_id: 1, year_id: 11)>\n",
"array([[[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 2\n",
" * age_group_id (age_group_id) int64 11\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.sel(sex_id=[2], age_group_id=[11])"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
"array([0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908, 0.469676,\n",
" 0.505307, 0.528676, 0.498105, 0.496416])\n",
"Coordinates:\n",
" sex_id int64 2\n",
" age_group_id int64 11\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.sel(sex_id=2, age_group_id=11)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
"array([0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908, 0.469676,\n",
" 0.505307, 0.528676, 0.498105, 0.496416])\n",
"Coordinates:\n",
" * year_id (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.sel(sex_id=2, age_group_id=11, drop=True)"
]
},
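{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of moving between point coordinates and length-1 dimensions, using a small throwaway array (`demo` and its values are made up for illustration and are not part of the data above). `squeeze` turns a length-1 dimension into a scalar coordinate, and `expand_dims` promotes a scalar coordinate back to a length-1 dimension."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Hypothetical toy array, not part of the cheatsheet's data.\n",
"demo = xr.DataArray(\n",
"    np.arange(3.0).reshape(1, 3),\n",
"    dims=[\"age_group_id\", \"year_id\"],\n",
"    coords={\"age_group_id\": [11], \"year_id\": [1990, 1991, 1992]},\n",
")\n",
"\n",
"# Length-1 dimension -> scalar (point) coordinate.\n",
"as_point = demo.squeeze(\"age_group_id\")\n",
"print(as_point.dims)\n",
"\n",
"# Scalar coordinate -> length-1 dimension again.\n",
"as_dim = as_point.expand_dims(\"age_group_id\")\n",
"print(as_dim.dims)"
]
},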
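{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning:** you'll want to avoid the following. In the output shown here, mixing a list selection and a scalar selection with `drop=True` in a single call also strips the coordinate labels from `year_id` (note `Dimensions without coordinates: year_id`)."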
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n",
"array([[0.438642, 0.484231],\n",
" [0.482908, 0.469676]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
"Dimensions without coordinates: year_id"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.sel(year_id=[1995, 1996], age_group_id=11, drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead, do one of these, both of which keep the `year_id` coordinate:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n",
"array([[0.438642, 0.484231],\n",
" [0.482908, 0.469676]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * year_id (year_id) int64 1995 1996"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.sel(year_id=[1995, 1996], age_group_id=11).drop_vars(\"age_group_id\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n",
"array([[0.438642, 0.484231],\n",
" [0.482908, 0.469676]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * year_id (year_id) int64 1995 1996"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.sel(year_id=[1995, 1996]).sel(age_group_id=11, drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of slicing to specific coords, you can also exclude specific coords:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n",
"array([[[0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
" 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n",
"\n",
" [[0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
" 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.drop_sel(age_group_id=[11, 12])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 7. Changing values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another way to slice data is with the `.loc[]` indexer, but I recommend only using it when you're actually\n",
"_changing_ values for a given slice of the data. Unlike a `.sel()` call, `.loc[]` can appear on the left-hand side\n",
"of an assignment, and that assignment modifies the original data in place. Work on a copy, as below, if you need\n",
"to keep the original values."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"mean_da_cp = mean_da.copy()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n",
"array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
" 0.484231, 0.543931, 0.512703, 0.495674, 0.490514]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da_cp.sel(age_group_id=[11])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next operation can't be done, because Python does not allow assigning to the result of a method call:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"ename": "SyntaxError",
"evalue": "can't assign to function call (<ipython-input-29-815bf6d51d25>, line 1)",
"output_type": "error",
"traceback": [
"\u001b[0;36m File \u001b[0;32m\"<ipython-input-29-815bf6d51d25>\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m mean_da_cp.sel(age_group_id=[11]) = mean_da_cp.sel(age_group_id=[11]) + 100\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m can't assign to function call\n"
]
}
],
"source": [
"mean_da_cp.sel(age_group_id=[11]) = mean_da_cp.sel(age_group_id=[11]) + 100"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using ``.loc``, which does support assignment:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"mean_da_cp.loc[dict(age_group_id=[11])] = mean_da_cp.loc[dict(age_group_id=[11])] + 200"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n",
"array([[[200.515046, 200.482924, 200.519905, 200.46185 , 200.465647,\n",
" 200.438642, 200.484231, 200.543931, 200.512703, 200.495674,\n",
" 200.490514]],\n",
"\n",
" [[200.480634, 200.501324, 200.48249 , 200.489984, 200.434139,\n",
" 200.482908, 200.469676, 200.505307, 200.528676, 200.498105,\n",
" 200.496416]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da_cp.sel(age_group_id=[11])"
]
},
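{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of two ways to change values by label, using a toy array (``toy_da`` and its values are made up for illustration, not taken from the data above). ``.loc`` assigns in place, while ``.where`` returns a new array and leaves the original untouched."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Toy stand-in array (hypothetical values), all zeros.\n",
"toy_da = xr.DataArray(\n",
"    np.zeros((2, 3)),\n",
"    dims=[\"sex_id\", \"age_group_id\"],\n",
"    coords={\"sex_id\": [1, 2], \"age_group_id\": [11, 12, 13]},\n",
")\n",
"\n",
"# In place: .loc accepts a dict of labels on the left-hand side.\n",
"toy_da.loc[dict(age_group_id=[11])] = toy_da.loc[dict(age_group_id=[11])] + 200\n",
"\n",
"# Out of place: keep values where the condition holds, use toy_da + 100 elsewhere.\n",
"shifted_da = toy_da.where(toy_da.age_group_id != 12, toy_da + 100)"
]
},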
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 8. Data Reduction\n",
"\n",
"Reductions such as ``sum``, ``mean``, ``quantile``, and ``prod`` collapse one or more dimensions.\n",
"\n",
"Operations such as ``diff``, ``cumsum``, and ``cumprod`` run along a dimension without collapsing it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Summing over dimensions"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
"array([[[51.504646, 48.292353, 51.99048 , 46.18502 , 46.564747, 43.864225,\n",
" 48.423078, 54.393145, 51.270341, 49.567356, 49.051368],\n",
" [47.937805, 48.381889, 53.578497, 55.738359, 49.353499, 50.416803,\n",
" 48.22734 , 47.17919 , 50.830201, 46.291025, 48.116702],\n",
" [50.16666 , 52.051829, 52.148661, 51.15924 , 55.499361, 52.081173,\n",
" 51.564503, 57.55673 , 48.786959, 47.284275, 54.801283]],\n",
"\n",
" [[48.063404, 50.13239 , 48.248955, 48.998371, 43.413857, 48.290827,\n",
" 46.967581, 50.530699, 52.867631, 49.810484, 49.641557],\n",
" [47.680474, 53.340312, 53.079849, 54.45405 , 51.181648, 48.059735,\n",
" 49.768662, 48.72965 , 47.350284, 47.70728 , 45.04015 ],\n",
" [50.742754, 50.526813, 47.339263, 50.463871, 49.240083, 47.83956 ,\n",
" 45.524688, 45.940104, 55.508125, 50.005635, 52.937063]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"da.sum(\"draw\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Taking means"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (age_group_id: 3, year_id: 11)>\n",
"array([[0.49784 , 0.492124, 0.501197, 0.475917, 0.449893, 0.460775, 0.476953,\n",
" 0.524619, 0.52069 , 0.496889, 0.493465],\n",
" [0.478091, 0.508611, 0.533292, 0.550962, 0.502676, 0.492383, 0.48998 ,\n",
" 0.479544, 0.490902, 0.469992, 0.465784],\n",
" [0.504547, 0.512893, 0.49744 , 0.508116, 0.523697, 0.499604, 0.485446,\n",
" 0.517484, 0.521475, 0.48645 , 0.538692]])\n",
"Coordinates:\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"da.mean([\"draw\", \"sex_id\"])"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' ()>\n",
"array(3289.684552)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"da.sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Taking quantiles"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (quantile: 2, sex_id: 2, age_group_id: 3, year_id: 11)>\n",
"array([[[[0.946091, 0.947006, 0.989351, 0.938826, 0.957233, 0.961054,\n",
" 0.962361, 0.9847 , 0.991939, 0.992401, 0.931928],\n",
" [0.96409 , 0.989095, 0.971059, 0.932266, 0.992066, 0.995731,\n",
" 0.961511, 0.95729 , 0.929298, 0.971658, 0.961813],\n",
" [0.985317, 0.95646 , 0.971972, 0.984797, 0.965313, 0.9662 ,\n",
" 0.967894, 0.966558, 0.983227, 0.975196, 0.969915]],\n",
"\n",
" [[0.968146, 0.976099, 0.977238, 0.909349, 0.948866, 0.948609,\n",
" 0.951318, 0.960428, 0.949223, 0.983426, 0.994721],\n",
" [0.953117, 0.944793, 0.99642 , 0.971521, 0.951413, 0.926127,\n",
" 0.945339, 0.983831, 0.976795, 0.989499, 0.965297],\n",
" [0.978185, 0.975026, 0.967635, 0.991123, 0.945305, 0.965985,\n",
" 0.937724, 0.966592, 0.944686, 0.947914, 0.969859]]],\n",
"\n",
"\n",
" [[[0.035937, 0.032809, 0.036074, 0.054938, 0.020817, 0.016693,\n",
" 0.010495, 0.052086, 0.032586, 0.063762, 0.043832],\n",
" [0.040247, 0.042332, 0.038767, 0.022039, 0.02663 , 0.073381,\n",
" 0.016937, 0.019657, 0.063141, 0.015542, 0.033979],\n",
" [0.019053, 0.023236, 0.041205, 0.024052, 0.025952, 0.046415,\n",
" 0.013963, 0.082439, 0.039138, 0.028172, 0.087198]],\n",
"\n",
" [[0.022962, 0.039559, 0.045023, 0.056532, 0.038189, 0.013773,\n",
" 0.024106, 0.024428, 0.078114, 0.020636, 0.034975],\n",
" [0.053146, 0.026986, 0.03948 , 0.03409 , 0.055288, 0.030514,\n",
" 0.04947 , 0.028957, 0.035929, 0.0153 , 0.021199],\n",
" [0.040169, 0.051678, 0.023015, 0.046181, 0.02122 , 0.017725,\n",
" 0.034586, 0.014844, 0.019773, 0.030668, 0.030164]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * quantile (quantile) float64 0.975 0.025"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"da.quantile([0.975, 0.025], dim=\"draw\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cumulative sums and products"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
"array([0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642, 0.484231,\n",
" 0.543931, 0.512703, 0.495674, 0.490514])\n",
"Coordinates:\n",
" * year_id (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_slice_da = mean_da.sel(sex_id=1, age_group_id=11, drop=True)\n",
"data_slice_da"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
"array([0.515046, 0.99797 , 1.517875, 1.979725, 2.445372, 2.884015, 3.368245,\n",
" 3.912177, 4.42488 , 4.920554, 5.411068])\n",
"Coordinates:\n",
" * year_id (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_slice_da.cumsum(\"year_id\")"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
"array([5.150465e-01, 2.487281e-01, 1.293149e-01, 5.972412e-02, 2.781038e-02,\n",
" 1.219881e-02, 5.907039e-03, 3.213024e-03, 1.647328e-03, 8.165372e-04,\n",
" 4.005227e-04])\n",
"Coordinates:\n",
" * year_id (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_slice_da.cumprod(\"year_id\")"
]
},
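{
"cell_type": "markdown",
"metadata": {},
"source": [
"The section heading also mentions ``diff``. A small sketch on a toy array (``toy_da`` and its values are made up): by default ``diff`` labels each difference with the upper coordinate, so the result is one element shorter along that dimension."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import xarray as xr\n",
"\n",
"# Toy yearly series (hypothetical values).\n",
"toy_da = xr.DataArray(\n",
"    [1.0, 3.0, 6.0],\n",
"    dims=[\"year_id\"],\n",
"    coords={\"year_id\": [1990, 1991, 1992]},\n",
")\n",
"\n",
"# Year-over-year change; the first year has no predecessor and is dropped.\n",
"year_diff_da = toy_da.diff(\"year_id\")"
]
},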
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 9. Vectorized operations\n",
"\n",
"Adding, multiplying, dividing two or more arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Xarray lines up all of the dimensions and coordinates for you when performing arithmetic between two or more arrays.\n",
"If you line things up yourself beforehand, computation will be faster (more on that later)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The operands don't have to share dimensions: data is automatically broadcast along dimensions that an operand lacks. For example,\n",
"below, the right operand has no ``draw`` dimension, so the value of each of its slices is applied to every draw of the corresponding slice of the left operand.\n",
"\n",
"That is, ``mean_da.sel(age_group_id=11, sex_id=2, year_id=1996)`` is added to\n",
"``da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=i)`` for every ``i`` in ``[0, 99]``."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"result_da = da + mean_da  # other operators (-, *, /) work the same way"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (draw: 3)>\n",
"array([0.956272, 0.969894, 1.22502 ])\n",
"Coordinates:\n",
" sex_id int64 2\n",
" age_group_id int64 11\n",
" year_id int64 1996\n",
" * draw (draw) int64 0 1 2"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result_da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=[0, 1, 2])"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (draw: 3)>\n",
"array([0.486596, 0.500218, 0.755344])\n",
"Coordinates:\n",
" sex_id int64 2\n",
" age_group_id int64 11\n",
" year_id int64 1996\n",
" * draw (draw) int64 0 1 2\n",
"Attributes:\n",
" metric: rate\n",
" author: Me"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=[0, 1, 2])"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' ()>\n",
"array(0.469676)\n",
"Coordinates:\n",
" sex_id int64 2\n",
" age_group_id int64 11\n",
" year_id int64 1996"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.sel(age_group_id=11, sex_id=2, year_id=1996)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, xarray also takes the intersection (an inner join) of the operands' coordinates:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n",
"array([[[1.003333, 1.041037, 1.042973, 1.023185, 1.109987, 1.041623,\n",
" 1.03129 , 1.151135, 0.975739, 0.945686, 1.096026]],\n",
"\n",
" [[1.014855, 1.010536, 0.946785, 1.009277, 0.984802, 0.956791,\n",
" 0.910494, 0.918802, 1.110162, 1.000113, 1.058741]]])\n",
"Coordinates:\n",
" * age_group_id (age_group_id) int64 13\n",
" * sex_id (sex_id) int64 1 2\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"3 * mean_da - mean_da.sel(age_group_id=[13])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want the union of the coordinates instead, set ``arithmetic_join=\"outer\"``. The output will then have NaNs where the operands\n",
"don't line up (i.e. where they don't have the same coordinates on a dimension they share)."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
"array([[[ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan],\n",
" [ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan],\n",
" [1.003333, 1.041037, 1.042973, 1.023185, 1.109987, 1.041623,\n",
" 1.03129 , 1.151135, 0.975739, 0.945686, 1.096026]],\n",
"\n",
" [[ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan],\n",
" [ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan],\n",
" [1.014855, 1.010536, 0.946785, 1.009277, 0.984802, 0.956791,\n",
" 0.910494, 0.918802, 1.110162, 1.000113, 1.058741]]])\n",
"Coordinates:\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * sex_id (sex_id) int64 1 2\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with xr.set_options(arithmetic_join=\"outer\"):\n",
" result = 3 * mean_da - mean_da.sel(age_group_id=[13])\n",
"result"
]
},
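{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative to the global option, the operands can be aligned explicitly with ``xr.align`` before the arithmetic. This sketch uses toy arrays (hypothetical names and values) and fills the gaps with 0 rather than NaN. Whether 0 is a sensible fill value depends on your data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Toy operands (hypothetical values): one full array and a one-age-group subset.\n",
"toy_full_da = xr.DataArray(\n",
"    np.ones((2, 3)),\n",
"    dims=[\"sex_id\", \"age_group_id\"],\n",
"    coords={\"sex_id\": [1, 2], \"age_group_id\": [11, 12, 13]},\n",
")\n",
"toy_subset_da = toy_full_da.sel(age_group_id=[13])\n",
"\n",
"# Default arithmetic uses an inner join: only age group 13 survives.\n",
"inner_result = 3 * toy_full_da - toy_subset_da\n",
"\n",
"# Explicit outer join; missing entries are filled with 0 instead of NaN.\n",
"left_da, right_da = xr.align(toy_full_da, toy_subset_da, join=\"outer\", fill_value=0)\n",
"outer_result = 3 * left_da - right_da"
]
},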
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note another subtle feature of xarray used just above: applying a scalar to the data. Even this simple task is quite a bit more work in pandas, because you don't want to add 1000 to your metadata as well as your data."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
"array([[[1000.515046, 1000.482924, 1000.519905, 1000.46185 , 1000.465647,\n",
" 1000.438642, 1000.484231, 1000.543931, 1000.512703, 1000.495674,\n",
" 1000.490514],\n",
" [1000.479378, 1000.483819, 1000.535785, 1000.557384, 1000.493535,\n",
" 1000.504168, 1000.482273, 1000.471792, 1000.508302, 1000.46291 ,\n",
" 1000.481167],\n",
" [1000.501667, 1000.520518, 1000.521487, 1000.511592, 1000.554994,\n",
" 1000.520812, 1000.515645, 1000.575567, 1000.48787 , 1000.472843,\n",
" 1000.548013]],\n",
"\n",
" [[1000.480634, 1000.501324, 1000.48249 , 1000.489984, 1000.434139,\n",
" 1000.482908, 1000.469676, 1000.505307, 1000.528676, 1000.498105,\n",
" 1000.496416],\n",
" [1000.476805, 1000.533403, 1000.530798, 1000.54454 , 1000.511816,\n",
" 1000.480597, 1000.497687, 1000.487296, 1000.473503, 1000.477073,\n",
" 1000.450402],\n",
" [1000.507428, 1000.505268, 1000.473393, 1000.504639, 1000.492401,\n",
" 1000.478396, 1000.455247, 1000.459401, 1000.555081, 1000.500056,\n",
" 1000.529371]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da + 1000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 10. Changing and adding coordinates/Expanding or broadcasting dimensions"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"broadcasted_da = mean_da.expand_dims(draw=range(5), location_id=[102, 6])"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (draw: 5, location_id: 2)>\n",
"array([[0.482908, 0.482908],\n",
" [0.482908, 0.482908],\n",
" [0.482908, 0.482908],\n",
" [0.482908, 0.482908],\n",
" [0.482908, 0.482908]])\n",
"Coordinates:\n",
" * draw (draw) int64 0 1 2 3 4\n",
" * location_id (location_id) int64 102 6\n",
" sex_id int64 2\n",
" age_group_id int64 11\n",
" year_id int64 1995"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"broadcasted_da.sel(age_group_id=11, sex_id=2, year_id=1995)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This cheatsheet tries to limit use of our internal FHS code, but we do have a really nice, fast tool for one task: expanding an existing dimension to include new coordinates."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"from fbd_core.etl import expand_dimensions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note the difference between broadcasting and expanding:\n",
"* **Broadcasting** here means altering the array to include dimensions it didn't previously have, where each coordinate on a given new dimension points to an identical slice.\n",
"* **Expanding** here means altering the array to include coordinates it didn't previously have, on a dimension that already existed."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 5, year_id: 11, scenario: 3)>\n",
"array([[[[0.515046, ..., 0.515046],\n",
" ...,\n",
" [0.490514, ..., 0.490514]],\n",
"\n",
" ...,\n",
"\n",
" [[ nan, ..., nan],\n",
" ...,\n",
" [ nan, ..., nan]]],\n",
"\n",
"\n",
" [[[0.480634, ..., 0.480634],\n",
" ...,\n",
" [0.496416, ..., 0.496416]],\n",
"\n",
" ...,\n",
"\n",
" [[ nan, ..., nan],\n",
" ...,\n",
" [ nan, ..., nan]]]])\n",
"Coordinates:\n",
" * age_group_id (age_group_id) int64 11 12 13 14 15\n",
" * sex_id (sex_id) int64 1 2\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * scenario (scenario) int64 0 1 -1"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"expand_dimensions(mean_da, age_group_id=[14, 15], scenario=[0, 1, -1])"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 5, year_id: 11, scenario: 3)>\n",
"array([[[[5.150465e-01, ..., 5.150465e-01],\n",
" ...,\n",
" [4.905137e-01, ..., 4.905137e-01]],\n",
"\n",
" ...,\n",
"\n",
" [[9.999000e+03, ..., 9.999000e+03],\n",
" ...,\n",
" [9.999000e+03, ..., 9.999000e+03]]],\n",
"\n",
"\n",
" [[[4.806340e-01, ..., 4.806340e-01],\n",
" ...,\n",
" [4.964156e-01, ..., 4.964156e-01]],\n",
"\n",
" ...,\n",
"\n",
" [[9.999000e+03, ..., 9.999000e+03],\n",
" ...,\n",
" [9.999000e+03, ..., 9.999000e+03]]]])\n",
"Coordinates:\n",
" * age_group_id (age_group_id) int64 11 12 13 14 15\n",
" * sex_id (sex_id) int64 1 2\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * scenario (scenario) int64 0 1 -1"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"expand_dimensions(mean_da, age_group_id=[14, 15], scenario=[0, 1, -1], fill_value=9999)"
]
},
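{
"cell_type": "markdown",
"metadata": {},
"source": [
"``fbd_core`` is internal, so the calls above won't run elsewhere. As a rough approximation with public xarray methods, assuming ``expand_dimensions`` behaves as its output above suggests: ``reindex`` adds new coordinates to an existing dimension, and ``expand_dims`` adds a new dimension. ``toy_da`` and its values are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Toy stand-in array (hypothetical values), all ones.\n",
"toy_da = xr.DataArray(\n",
"    np.ones((2, 3)),\n",
"    dims=[\"sex_id\", \"age_group_id\"],\n",
"    coords={\"sex_id\": [1, 2], \"age_group_id\": [11, 12, 13]},\n",
")\n",
"\n",
"# Expanding: reindex onto old + new labels, filling the new slots.\n",
"new_ages = list(toy_da.age_group_id.values) + [14, 15]\n",
"expanded_da = toy_da.reindex(age_group_id=new_ages, fill_value=9999)\n",
"\n",
"# Broadcasting: every scenario gets an identical slice.\n",
"# Note that expand_dims puts the new dimension first, not last.\n",
"approx_da = expanded_da.expand_dims(scenario=[0, 1, -1])"
]
},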
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Changing coordinate or dimension labels"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"renamed_da = mean_da.rename({\"sex_id\": \"sex_name\", \"age_group_id\": \"age_group_name\"})"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_name: 2, age_group_name: 3, year_id: 11)>\n",
"array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
" 0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n",
" [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
" 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n",
" [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
" 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n",
" [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
" 0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n",
" [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
" 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
"Coordinates:\n",
" * sex_name (sex_name) <U6 'Male' 'Female'\n",
" * age_group_name (age_group_name) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"renamed_da.assign_coords(sex_name=[\"Male\", \"Female\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Concatenating multiple DataArrays into one DataArray"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"future_da = xr.DataArray(\n",
" data=np.random.random([2, 3, 4]),\n",
" dims=[\"sex_id\", \"age_group_id\", \"year_id\"],\n",
" coords={\n",
" \"sex_id\": [1, 2],\n",
" \"age_group_id\": [11, 12, 13],\n",
" \"year_id\": range(2001, 2005),\n",
" },\n",
" name=\"fake_thing\"\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n",
"array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
" 0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n",
" 0.001631, 0.115671, 0.44467 ],\n",
" [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
" 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696 ,\n",
" 0.813101, 0.781387, 0.394234],\n",
" [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
" 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n",
" 0.491746, 0.697288, 0.206781]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n",
" 0.925087, 0.867205, 0.221227],\n",
" [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
" 0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n",
" 0.63895 , 0.727136, 0.085397],\n",
" [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
" 0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n",
" 0.217386, 0.575926, 0.349022]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xr.concat([mean_da, future_da], dim=\"year_id\")"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n",
"array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
" 0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n",
" 0.001631, 0.115671, 0.44467 ],\n",
" [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
" 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696 ,\n",
" 0.813101, 0.781387, 0.394234],\n",
" [ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan, 0.811391,\n",
" 0.491746, 0.697288, 0.206781]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n",
" 0.925087, 0.867205, 0.221227],\n",
" [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
" 0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n",
" 0.63895 , 0.727136, 0.085397],\n",
" [ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan, 0.610459,\n",
" 0.217386, 0.575926, 0.349022]]])\n",
"Coordinates:\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * sex_id (sex_id) int64 1 2\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xr.concat([mean_da.sel(age_group_id=[11, 12]), future_da], dim=\"year_id\")"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"summary_da = xr.concat([\n",
" da.mean(\"draw\").assign_coords(summary_val=\"mean\"),\n",
" da.quantile([0.975, 0.025], \"draw\").rename({\"quantile\": \"summary_val\"}).assign_coords(summary_val=[\"upper\", \"lower\"])\n",
" ], dim=\"summary_val\")"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, summary_val: 3)>\n",
"array([[[[0.515046, 0.946091, 0.035937],\n",
" [0.482924, 0.947006, 0.032809],\n",
" [0.519905, 0.989351, 0.036074],\n",
" [0.46185 , 0.938826, 0.054938],\n",
" [0.465647, 0.957233, 0.020817],\n",
" [0.438642, 0.961054, 0.016693],\n",
" [0.484231, 0.962361, 0.010495],\n",
" [0.543931, 0.9847 , 0.052086],\n",
" [0.512703, 0.991939, 0.032586],\n",
" [0.495674, 0.992401, 0.063762],\n",
" [0.490514, 0.931928, 0.043832]],\n",
"\n",
" [[0.479378, 0.96409 , 0.040247],\n",
" [0.483819, 0.989095, 0.042332],\n",
" [0.535785, 0.971059, 0.038767],\n",
" [0.557384, 0.932266, 0.022039],\n",
" [0.493535, 0.992066, 0.02663 ],\n",
" [0.504168, 0.995731, 0.073381],\n",
" [0.482273, 0.961511, 0.016937],\n",
" [0.471792, 0.95729 , 0.019657],\n",
" [0.508302, 0.929298, 0.063141],\n",
" [0.46291 , 0.971658, 0.015542],\n",
" [0.481167, 0.961813, 0.033979]],\n",
"\n",
" [[0.501667, 0.985317, 0.019053],\n",
" [0.520518, 0.95646 , 0.023236],\n",
" [0.521487, 0.971972, 0.041205],\n",
" [0.511592, 0.984797, 0.024052],\n",
" [0.554994, 0.965313, 0.025952],\n",
" [0.520812, 0.9662 , 0.046415],\n",
" [0.515645, 0.967894, 0.013963],\n",
" [0.575567, 0.966558, 0.082439],\n",
" [0.48787 , 0.983227, 0.039138],\n",
" [0.472843, 0.975196, 0.028172],\n",
" [0.548013, 0.969915, 0.087198]]],\n",
"\n",
"\n",
" [[[0.480634, 0.968146, 0.022962],\n",
" [0.501324, 0.976099, 0.039559],\n",
" [0.48249 , 0.977238, 0.045023],\n",
" [0.489984, 0.909349, 0.056532],\n",
" [0.434139, 0.948866, 0.038189],\n",
" [0.482908, 0.948609, 0.013773],\n",
" [0.469676, 0.951318, 0.024106],\n",
" [0.505307, 0.960428, 0.024428],\n",
" [0.528676, 0.949223, 0.078114],\n",
" [0.498105, 0.983426, 0.020636],\n",
" [0.496416, 0.994721, 0.034975]],\n",
"\n",
" [[0.476805, 0.953117, 0.053146],\n",
" [0.533403, 0.944793, 0.026986],\n",
" [0.530798, 0.99642 , 0.03948 ],\n",
" [0.54454 , 0.971521, 0.03409 ],\n",
" [0.511816, 0.951413, 0.055288],\n",
" [0.480597, 0.926127, 0.030514],\n",
" [0.497687, 0.945339, 0.04947 ],\n",
" [0.487296, 0.983831, 0.028957],\n",
" [0.473503, 0.976795, 0.035929],\n",
" [0.477073, 0.989499, 0.0153 ],\n",
" [0.450402, 0.965297, 0.021199]],\n",
"\n",
" [[0.507428, 0.978185, 0.040169],\n",
" [0.505268, 0.975026, 0.051678],\n",
" [0.473393, 0.967635, 0.023015],\n",
" [0.504639, 0.991123, 0.046181],\n",
" [0.492401, 0.945305, 0.02122 ],\n",
" [0.478396, 0.965985, 0.017725],\n",
" [0.455247, 0.937724, 0.034586],\n",
" [0.459401, 0.966592, 0.014844],\n",
" [0.555081, 0.944686, 0.019773],\n",
" [0.500056, 0.947914, 0.030668],\n",
" [0.529371, 0.969859, 0.030164]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * summary_val (summary_val) <U5 'mean' 'upper' 'lower'"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"summary_da"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"``combine_first`` is another tool, which is similar but has a nice additional feature: if the array you're appending has\n",
"data for coords that already exist in the original array, then only the original array's data will be used for those coords."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n",
"array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
" 0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n",
" 0.001631, 0.115671, 0.44467 ],\n",
" [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
" 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696 ,\n",
" 0.813101, 0.781387, 0.394234],\n",
" [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
" 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n",
" 0.491746, 0.697288, 0.206781]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n",
" 0.925087, 0.867205, 0.221227],\n",
" [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
" 0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n",
" 0.63895 , 0.727136, 0.085397],\n",
" [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
" 0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n",
" 0.217386, 0.575926, 0.349022]]])\n",
"Coordinates:\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.combine_first(future_da)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 6)>\n",
"array([[[-5.000000e+00, -5.000000e+00, 9.976917e-01, 1.631312e-03,\n",
" 1.156705e-01, 4.446702e-01],\n",
" [-5.000000e+00, -5.000000e+00, 8.696002e-01, 8.131013e-01,\n",
" 7.813870e-01, 3.942344e-01],\n",
" [-5.000000e+00, -5.000000e+00, 8.113905e-01, 4.917461e-01,\n",
" 6.972879e-01, 2.067813e-01]],\n",
"\n",
" [[-5.000000e+00, -5.000000e+00, 2.771139e-01, 9.250868e-01,\n",
" 8.672051e-01, 2.212266e-01],\n",
" [-5.000000e+00, -5.000000e+00, 5.082614e-01, 6.389501e-01,\n",
" 7.271363e-01, 8.539651e-02],\n",
" [-5.000000e+00, -5.000000e+00, 6.104589e-01, 2.173860e-01,\n",
" 5.759261e-01, 3.490221e-01]]])\n",
"Coordinates:\n",
" * year_id (year_id) int64 1999 2000 2001 2002 2003 2004\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"future_with_past = expand_dimensions(future_da, year_id=[1999, 2000], fill_value=-5)\n",
"future_with_past"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n",
"array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
" 0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n",
" 0.001631, 0.115671, 0.44467 ],\n",
" [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
" 0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696 ,\n",
" 0.813101, 0.781387, 0.394234],\n",
" [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
" 0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n",
" 0.491746, 0.697288, 0.206781]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n",
" 0.925087, 0.867205, 0.221227],\n",
" [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
" 0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n",
" 0.63895 , 0.727136, 0.085397],\n",
" [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
" 0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n",
" 0.217386, 0.575926, 0.349022]]])\n",
"Coordinates:\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_da.combine_first(future_with_past)"
]
},
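{
"cell_type": "markdown",
"metadata": {},
"source": [
"The precedence rule is easier to see on a tiny example. This is a sketch with made-up one-dimensional arrays (the ``toy_`` names and values are hypothetical): both arrays have a value for 2000, and the calling array's value wins."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import xarray as xr\n",
"\n",
"# Two toy series that overlap in the year 2000 (hypothetical values).\n",
"toy_past_da = xr.DataArray(\n",
"    [1.0, 2.0], dims=[\"year_id\"], coords={\"year_id\": [1999, 2000]}\n",
")\n",
"toy_future_da = xr.DataArray(\n",
"    [-5.0, 3.0], dims=[\"year_id\"], coords={\"year_id\": [2000, 2001]}\n",
")\n",
"\n",
"# Values from toy_past_da take priority; toy_future_da only fills the gaps.\n",
"toy_combined_da = toy_past_da.combine_first(toy_future_da)"
]
},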
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 11. Datasets\n",
"\n",
"What they are, when they're useful, and how to make them.\n",
"\n",
"A Dataset is a collection of DataArrays, which makes it very useful in some situations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A typical use case is when you want to keep several related data variables in one data structure even though they have different dimensions. For example, you want to store SDI and mortality in a dataset together, but mortality has age-group and sex dimensions, while SDI does not. They do, however, share year and location as dimensions."
]
},
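{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that use case, built directly with ``xr.Dataset``. The variable names, coordinates, and constant values are all made up for illustration, and the age-group dimension is left out to keep it short."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Two variables with different dimensions that share location and year.\n",
"toy_ds = xr.Dataset(\n",
"    data_vars={\n",
"        \"sdi\": ([\"location_id\", \"year_id\"], np.full((2, 2), 0.5)),\n",
"        \"mortality\": ([\"location_id\", \"year_id\", \"sex_id\"], np.full((2, 2, 2), 0.01)),\n",
"    },\n",
"    coords={\"location_id\": [6, 102], \"year_id\": [1990, 1991], \"sex_id\": [1, 2]},\n",
")\n",
"\n",
"# Each variable comes back out as a DataArray with only its own dimensions.\n",
"sdi_da = toy_ds[\"sdi\"]"
]
},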
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Merging two or more DataArrays into one Dataset (note that the DataArrays must be named):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = xr.merge([mean_da.rename(\"mean\"), da.rename(\"draws\")])"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (age_group_id: 3, sex_id: 2, year_id: 11)\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
"Data variables:\n",
" draw_0 (sex_id, age_group_id, year_id) float64 0.9968 ... 0.8295\n",
" draw_1 (sex_id, age_group_id, year_id) float64 0.7616 ... 0.01835\n",
" draw_2 (sex_id, age_group_id, year_id) float64 0.3508 ... 0.6153\n",
" draw_3 (sex_id, age_group_id, year_id) float64 0.7504 ... 0.6143\n",
" draw_4 (sex_id, age_group_id, year_id) float64 0.4339 ... 0.4588\n",
" draw_5 (sex_id, age_group_id, year_id) float64 0.7644 ... 0.995\n",
" draw_6 (sex_id, age_group_id, year_id) float64 0.1224 ... 0.0544\n",
" draw_7 (sex_id, age_group_id, year_id) float64 0.5721 ... 0.6756\n",
" draw_8 (sex_id, age_group_id, year_id) float64 0.6604 ... 0.04322\n",
" draw_9 (sex_id, age_group_id, year_id) float64 0.7618 ... 0.6469\n",
" draw_10 (sex_id, age_group_id, year_id) float64 0.319 ... 0.9751\n",
" draw_11 (sex_id, age_group_id, year_id) float64 0.524 ... 0.7128\n",
" draw_12 (sex_id, age_group_id, year_id) float64 0.6512 ... 0.3474\n",
" draw_13 (sex_id, age_group_id, year_id) float64 0.562 0.619 ... 0.5368\n",
" draw_14 (sex_id, age_group_id, year_id) float64 0.2113 ... 0.5242\n",
" draw_15 (sex_id, age_group_id, year_id) float64 0.4212 ... 0.7537\n",
" draw_16 (sex_id, age_group_id, year_id) float64 0.9361 ... 0.6975\n",
" draw_17 (sex_id, age_group_id, year_id) float64 0.2602 ... 0.4517\n",
" draw_18 (sex_id, age_group_id, year_id) float64 0.9366 ... 0.5681\n",
" draw_19 (sex_id, age_group_id, year_id) float64 0.2167 ... 0.6634\n",
" draw_20 (sex_id, age_group_id, year_id) float64 0.4639 ... 0.1377\n",
" draw_21 (sex_id, age_group_id, year_id) float64 0.8358 ... 0.9574\n",
" draw_22 (sex_id, age_group_id, year_id) float64 0.8781 ... 0.8257\n",
" draw_23 (sex_id, age_group_id, year_id) float64 0.646 ... 0.7914\n",
" draw_24 (sex_id, age_group_id, year_id) float64 0.7519 ... 0.9622\n",
" draw_25 (sex_id, age_group_id, year_id) float64 0.1052 ... 0.411\n",
" draw_26 (sex_id, age_group_id, year_id) float64 0.453 ... 0.6014\n",
" draw_27 (sex_id, age_group_id, year_id) float64 0.97 0.3091 ... 0.6086\n",
" draw_28 (sex_id, age_group_id, year_id) float64 0.4998 ... 0.4429\n",
" draw_29 (sex_id, age_group_id, year_id) float64 0.05933 ... 0.06421\n",
" draw_30 (sex_id, age_group_id, year_id) float64 0.4804 ... 0.06488\n",
" draw_31 (sex_id, age_group_id, year_id) float64 0.8592 ... 0.8677\n",
" draw_32 (sex_id, age_group_id, year_id) float64 0.7863 ... 0.3912\n",
" draw_33 (sex_id, age_group_id, year_id) float64 0.8053 ... 0.6144\n",
" draw_34 (sex_id, age_group_id, year_id) float64 0.4348 ... 0.4366\n",
" draw_35 (sex_id, age_group_id, year_id) float64 0.06214 ... 0.9133\n",
" draw_36 (sex_id, age_group_id, year_id) float64 0.2246 ... 0.3688\n",
" draw_37 (sex_id, age_group_id, year_id) float64 0.8678 ... 0.09394\n",
" draw_38 (sex_id, age_group_id, year_id) float64 0.07461 ... 0.3763\n",
" draw_39 (sex_id, age_group_id, year_id) float64 0.07531 ... 0.3003\n",
" draw_40 (sex_id, age_group_id, year_id) float64 0.693 ... 0.9037\n",
" draw_41 (sex_id, age_group_id, year_id) float64 0.1258 ... 0.0871\n",
" draw_42 (sex_id, age_group_id, year_id) float64 0.8201 ... 0.1401\n",
" draw_43 (sex_id, age_group_id, year_id) float64 0.0862 ... 0.09171\n",
" draw_44 (sex_id, age_group_id, year_id) float64 0.0325 ... 0.8637\n",
" draw_45 (sex_id, age_group_id, year_id) float64 0.891 ... 0.7595\n",
" draw_46 (sex_id, age_group_id, year_id) float64 0.7296 ... 0.4517\n",
" draw_47 (sex_id, age_group_id, year_id) float64 0.2217 ... 0.8892\n",
" draw_48 (sex_id, age_group_id, year_id) float64 0.9126 ... 0.7594\n",
" draw_49 (sex_id, age_group_id, year_id) float64 0.2288 ... 0.07483\n",
" draw_50 (sex_id, age_group_id, year_id) float64 0.04443 ... 0.04533\n",
" draw_51 (sex_id, age_group_id, year_id) float64 0.9039 ... 0.2432\n",
" draw_52 (sex_id, age_group_id, year_id) float64 0.7614 ... 0.3947\n",
" draw_53 (sex_id, age_group_id, year_id) float64 0.4993 ... 0.6609\n",
" draw_54 (sex_id, age_group_id, year_id) float64 0.2617 ... 0.5335\n",
" draw_55 (sex_id, age_group_id, year_id) float64 0.4381 ... 0.233\n",
" draw_56 (sex_id, age_group_id, year_id) float64 0.8289 ... 0.3777\n",
" draw_57 (sex_id, age_group_id, year_id) float64 0.891 0.321 ... 0.5784\n",
" draw_58 (sex_id, age_group_id, year_id) float64 0.4284 ... 0.9219\n",
" draw_59 (sex_id, age_group_id, year_id) float64 0.8465 ... 0.3912\n",
" draw_60 (sex_id, age_group_id, year_id) float64 0.355 ... 0.9802\n",
" draw_61 (sex_id, age_group_id, year_id) float64 0.02395 ... 0.8892\n",
" draw_62 (sex_id, age_group_id, year_id) float64 0.2704 ... 0.5783\n",
" draw_63 (sex_id, age_group_id, year_id) float64 0.5877 ... 0.8256\n",
" draw_64 (sex_id, age_group_id, year_id) float64 0.8006 ... 0.3911\n",
" draw_65 (sex_id, age_group_id, year_id) float64 0.9209 ... 0.964\n",
" draw_66 (sex_id, age_group_id, year_id) float64 0.8785 ... 0.1937\n",
" draw_67 (sex_id, age_group_id, year_id) float64 0.6067 ... 0.9608\n",
" draw_68 (sex_id, age_group_id, year_id) float64 0.03497 ... 0.3151\n",
" draw_69 (sex_id, age_group_id, year_id) float64 0.5407 ... 0.8746\n",
" draw_70 (sex_id, age_group_id, year_id) float64 0.2832 ... 0.5551\n",
" draw_71 (sex_id, age_group_id, year_id) float64 0.2203 ... 0.117\n",
" draw_72 (sex_id, age_group_id, year_id) float64 0.7725 ... 0.6808\n",
" draw_73 (sex_id, age_group_id, year_id) float64 0.6947 ... 0.1008\n",
" draw_74 (sex_id, age_group_id, year_id) float64 0.6666 ... 0.6387\n",
" draw_75 (sex_id, age_group_id, year_id) float64 0.5149 ... 0.005526\n",
" draw_76 (sex_id, age_group_id, year_id) float64 0.9547 ... 0.7345\n",
" draw_77 (sex_id, age_group_id, year_id) float64 0.3364 ... 0.005635\n",
" draw_78 (sex_id, age_group_id, year_id) float64 0.572 ... 0.8498\n",
" draw_79 (sex_id, age_group_id, year_id) float64 0.4541 ... 0.646\n",
" draw_80 (sex_id, age_group_id, year_id) float64 0.2224 ... 0.9263\n",
" draw_81 (sex_id, age_group_id, year_id) float64 0.1243 ... 0.4103\n",
" draw_82 (sex_id, age_group_id, year_id) float64 0.7503 ... 0.1586\n",
" draw_83 (sex_id, age_group_id, year_id) float64 0.1514 ... 0.8157\n",
" draw_84 (sex_id, age_group_id, year_id) float64 0.7995 ... 0.05216\n",
" draw_85 (sex_id, age_group_id, year_id) float64 0.4792 ... 0.424\n",
" draw_86 (sex_id, age_group_id, year_id) float64 0.09267 ... 0.6082\n",
" draw_87 (sex_id, age_group_id, year_id) float64 0.5104 ... 0.6742\n",
" draw_88 (sex_id, age_group_id, year_id) float64 0.9314 ... 0.3405\n",
" draw_89 (sex_id, age_group_id, year_id) float64 0.037 ... 0.8594\n",
" draw_90 (sex_id, age_group_id, year_id) float64 0.2258 ... 0.835\n",
" draw_91 (sex_id, age_group_id, year_id) float64 0.8366 ... 0.4034\n",
" draw_92 (sex_id, age_group_id, year_id) float64 0.8852 ... 0.8596\n",
" draw_93 (sex_id, age_group_id, year_id) float64 0.2229 ... 0.9087\n",
" draw_94 (sex_id, age_group_id, year_id) float64 0.6414 ... 0.6106\n",
" draw_95 (sex_id, age_group_id, year_id) float64 0.3939 ... 0.2564\n",
" draw_96 (sex_id, age_group_id, year_id) float64 0.3816 ... 0.3988\n",
" draw_97 (sex_id, age_group_id, year_id) float64 0.2947 ... 0.2535\n",
" draw_98 (sex_id, age_group_id, year_id) float64 0.6506 ... 0.4103\n",
" draw_99 (sex_id, age_group_id, year_id) float64 0.1932 ... 0.6117"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is similar to merging in pandas, except that `xr.merge` defaults to an outer join, which is the union of the coordinates; positions with no data are filled with NaN.\n",
"Alternatively, pass `join=\"inner\"` to keep only the intersection of the coordinates."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (age_group_id: 3, draw: 100, sex_id: 1, year_id: 11)\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
"Data variables:\n",
" mean (sex_id, age_group_id, year_id) float64 0.4806 ... 0.5294\n",
" draws (sex_id, age_group_id, year_id, draw) float64 0.07233 ... 0.6117"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"only_females_ds = xr.merge([mean_da.rename(\"mean\").sel(sex_id=[2]), da.rename(\"draws\")], join=\"inner\")\n",
"only_females_ds"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (age_group_id: 3, draw: 100, sex_id: 2, year_id: 11)\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
"Data variables:\n",
" mean (sex_id, age_group_id, year_id) float64 nan nan ... 0.5294\n",
" draws (sex_id, age_group_id, year_id, draw) float64 0.9968 ... 0.6117"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_nans_da = xr.merge([mean_da.rename(\"mean\").sel(sex_id=[2]), da.rename(\"draws\")], join=\"outer\")\n",
"with_nans_da"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'mean' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
"array([[[ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan],\n",
" [ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan],\n",
" [ nan, nan, nan, nan, nan, nan,\n",
" nan, nan, nan, nan, nan]],\n",
"\n",
" [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
" 0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n",
" [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
" 0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n",
" [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
" 0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_nans_da[\"mean\"]"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray 'draws' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
"array([[[[0.996779, ..., 0.193241],\n",
" ...,\n",
" [0.50371 , ..., 0.654273]],\n",
"\n",
" ...,\n",
"\n",
" [[0.227811, ..., 0.912856],\n",
" ...,\n",
" [0.24642 , ..., 0.581184]]],\n",
"\n",
"\n",
" [[[0.072334, ..., 0.684663],\n",
" ...,\n",
" [0.628984, ..., 0.358811]],\n",
"\n",
" ...,\n",
"\n",
" [[0.491698, ..., 0.876439],\n",
" ...,\n",
" [0.829525, ..., 0.611719]]]])\n",
"Coordinates:\n",
" * sex_id (sex_id) int64 1 2\n",
" * age_group_id (age_group_id) int64 11 12 13\n",
" * year_id (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
" * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
"Attributes:\n",
" metric: rate\n",
" author: Me"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_nans_da[\"draws\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
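
The use case described in section 11 (SDI and mortality sharing year and location, with mortality also split by age group and sex) can be sketched as follows. This is a minimal illustration only: the coordinate values, the `location_id` dimension name, and the random data are assumptions made for the example and do not come from the notebook.

```python
import numpy as np
import xarray as xr

# Hypothetical coordinates, chosen only for illustration.
years = [1990, 1991, 1992]
locations = [101, 102]

rng = np.random.default_rng(0)

# SDI varies only by location and year.
sdi = xr.DataArray(
    rng.random((2, 3)),
    dims=["location_id", "year_id"],
    coords={"location_id": locations, "year_id": years},
    name="sdi",
)

# Mortality additionally varies by age group and sex.
mortality = xr.DataArray(
    rng.random((2, 3, 3, 2)),
    dims=["location_id", "year_id", "age_group_id", "sex_id"],
    coords={
        "location_id": locations,
        "year_id": years,
        "age_group_id": [11, 12, 13],
        "sex_id": [1, 2],
    },
    name="mortality",
)

# Both arrays are named, so they can be merged into one Dataset.
ds = xr.merge([sdi, mortality])

# Each variable keeps its own dimensions inside the Dataset.
print(ds["sdi"].dims)
print(ds["mortality"].dims)

# Arithmetic between the variables broadcasts over the shared dimensions.
ratio = ds["mortality"] / ds["sdi"]
print(ratio.dims)
```

Keeping both variables in one Dataset means a single `.sel(year_id=...)` or `.to_netcdf(...)` call applies to all of them at once.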
@shaoxiuma

Hello, thanks for sharing. Can you indicate how to install the fbd_core package? This command does not work: conda install fbd_core

@pletchm
Author

pletchm commented Apr 27, 2023

Hello, thanks for sharing. Can you indicate how to install the fbd_core package? This command does not work: conda install fbd_core

fbd_core is an internal library, but here's the code for the expand_dimensions() function:

from collections import OrderedDict

import numpy as np
import xarray as xr


def expand_dimensions(data, fill_value=np.nan, **new_coords):
    """
    Expand (or add if it doesn't yet exist) the data array to fill in new
    coordinates across multiple dimensions.

    If a dimension doesn't exist in the dataarray yet, then the result will be
    `data`, broadcasted across this dimension.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, b=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 3, b: 5)>
    array([[ 1.,  1.,  1.,  1.,  1.],
           [ 2.,  2.,  2.,  2.,  2.],
           [ 3.,  3.,  3.,  3.,  3.]])
    Coordinates:
      * a        (a) int64 0 1 2
      * b        (b) int64 1 2 3 4 5

    Or, if a keyword in `new_coords` names a dimension that already exists in
    `data`, then any coordinate values not yet in that dimension will be
    added, and the values at those new coordinates will be `fill_value`.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, fill_value=0, a=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 6)>
    array([ 1.,  2.,  3.,  0.,  0.,  0.])
    Coordinates:
      * a        (a) int64 0 1 2 3 4 5

    Args:
        data (xarray.DataArray):
            Data that needs dimensions expanded.
        fill_value (scalar, optional):
            If expanding new coords this is the value of the new datum.
            Defaults to `np.nan`.
        **new_coords (list[int | str]):
            The keywords are arbitrary dimensions and the values are
            coordinates of those dimensions that the data will include after it
            has been expanded.
    Returns:
        xarray.DataArray:
            Data that had its dimensions expanded to include the new
            coordinates.
    """
    ordered_coord_dict = OrderedDict(new_coords)
    shape_da = xr.DataArray(
        np.zeros(list(map(len, ordered_coord_dict.values()))),
        coords=ordered_coord_dict,
        dims=ordered_coord_dict.keys())
    expanded_data = xr.broadcast(data, shape_da)[0].fillna(fill_value)
    return expanded_data
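
For anyone who prefers not to copy the helper, roughly the same two behaviours can be approximated with public xarray methods. The sketch below is an assumption about how one might do that, not how fbd_core does it: it uses `DataArray.expand_dims` to repeat the data along a new dimension and `DataArray.reindex` with `fill_value` to add labels to an existing dimension.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="a", coords={"a": [0, 1, 2]})

# Case 1: "b" is a new dimension, so the data is repeated along it.
broadcasted = da.expand_dims(b=[1, 2, 3, 4, 5])
print(broadcasted.dims, broadcasted.shape)

# Case 2: "a" already exists, so take the union of old and new labels
# and give the labels that had no data a chosen fill value.
all_a = np.union1d(da["a"].values, [1, 2, 3, 4, 5])
extended = da.reindex(a=all_a, fill_value=0)
print(extended["a"].values)
print(extended.values)
```

The dimension order of the result may differ from what `expand_dimensions()` returns, so call `.transpose(...)` afterwards if the order matters.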
