@arbennett
Created April 1, 2022 19:00
{
"cells": [
{
"cell_type": "markdown",
"id": "ff3348ab-85f8-4d5b-89f1-8d77128b08c9",
"metadata": {},
"source": [
"# Example of training and XGBoost regressor on a spatiotemporal dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "blind-graph",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
}
],
"source": [
"%pylab inline\n",
"import xarray as xr\n",
"import pandas as pd\n",
"import xgboost as xgb\n",
"import dask.array as da\n",
"import dask.distributed\n",
"from glob import glob"
]
},
{
"cell_type": "markdown",
"id": "98ece1c0-aece-4f13-8340-dc7ca3cd5cc0",
"metadata": {},
"source": [
"# Setting up the target data\n",
"\n",
"Here the goal was to predict the VIC-modeled soil moisture percentile for some simulations run in the Columbia river basin. The percentiles were pre-computed in a different piece of code, but we open them up via `xarray`, then grab some particular timestamp that we want to predict, and then finally flatten the latitudes and longitudes down to a flat vector which becomes the DataFrame's index."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "92127b1e-de71-4133-bcce-65c7e78bcbc9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>smpercentile</th>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>41.21875</th>\n",
" <th>-116.21875</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">41.28125</th>\n",
" <th>-116.28125</th>\n",
" <td>3.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-116.21875</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-116.15625</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-116.09375</th>\n",
" <td>3.333333</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" smpercentile\n",
"lat lon \n",
"41.21875 -116.21875 0.000000\n",
"41.28125 -116.28125 3.333333\n",
" -116.21875 0.000000\n",
" -116.15625 0.000000\n",
" -116.09375 3.333333"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"files = glob('./output/post_processing/cdf_results/us/vic-metdata_smpercentile_*')\n",
"dates = pd.DatetimeIndex(sorted([f.split('_')[-1].split('.')[0] for f in files]))\n",
"\n",
"smpercentile = xr.concat([\n",
" xr.open_dataset(f).expand_dims({'time': 1})\n",
" for i, f in enumerate(files)\n",
"], dim='time')\n",
"smpercentile['time'] = dates\n",
"\n",
"time_sample_idx = 24\n",
"percentiles = smpercentile.isel(time=time_sample_idx)['smpercentile']\n",
"time = percentiles['time'].values[()]\n",
"\n",
"\n",
"target_ds = percentiles.stack(z=['lat', 'lon']).dropna(dim='z').drop('time')\n",
"target_df = target_ds.to_dataframe()[['smpercentile']]\n",
"target_df.head()"
]
},
{
"cell_type": "markdown",
"id": "467983cc-6fae-43d9-a857-c0b7aa323328",
"metadata": {},
"source": [
"# Setting up the input data\n",
"\n",
"First we open up the raw data which contains the meteorological forcing data, VIC parameters, and some domain information. \n",
"As inputs to the XGBoost regressor we will use the soil density, daily minimum and maximum temperatures, daily precipitation, elevation, and the annual average precipitation. 14 days of antecedent values for the temperature and precipitation will be included as inputs. These are all merged together into a single `xarray` dataset, which is printed out below..."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d8cc83a7-dde2-447c-aacc-1e86851220a6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div><svg style=\"position: absolute; width: 0; height: 0; overflow: hidden\">\n",
"<defs>\n",
"<symbol id=\"icon-database\" viewBox=\"0 0 32 32\">\n",
"<path d=\"M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z\"></path>\n",
"<path d=\"M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z\"></path>\n",
"<path d=\"M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z\"></path>\n",
"</symbol>\n",
"<symbol id=\"icon-file-text2\" viewBox=\"0 0 32 32\">\n",
"<path d=\"M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z\"></path>\n",
"<path d=\"M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
"<path d=\"M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
"<path d=\"M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
"</symbol>\n",
"</defs>\n",
"</svg>\n",
"<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.\n",
" *\n",
" */\n",
"\n",
":root {\n",
" --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));\n",
" --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));\n",
" --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));\n",
" --xr-border-color: var(--jp-border-color2, #e0e0e0);\n",
" --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);\n",
" --xr-background-color: var(--jp-layout-color0, white);\n",
" --xr-background-color-row-even: var(--jp-layout-color1, white);\n",
" --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);\n",
"}\n",
"\n",
"html[theme=dark],\n",
"body.vscode-dark {\n",
" --xr-font-color0: rgba(255, 255, 255, 1);\n",
" --xr-font-color2: rgba(255, 255, 255, 0.54);\n",
" --xr-font-color3: rgba(255, 255, 255, 0.38);\n",
" --xr-border-color: #1F1F1F;\n",
" --xr-disabled-color: #515151;\n",
" --xr-background-color: #111111;\n",
" --xr-background-color-row-even: #111111;\n",
" --xr-background-color-row-odd: #313131;\n",
"}\n",
"\n",
".xr-wrap {\n",
" display: block;\n",
" min-width: 300px;\n",
" max-width: 700px;\n",
"}\n",
"\n",
".xr-text-repr-fallback {\n",
" /* fallback to plain text repr when CSS is not injected (untrusted notebook) */\n",
" display: none;\n",
"}\n",
"\n",
".xr-header {\n",
" padding-top: 6px;\n",
" padding-bottom: 6px;\n",
" margin-bottom: 4px;\n",
" border-bottom: solid 1px var(--xr-border-color);\n",
"}\n",
"\n",
".xr-header > div,\n",
".xr-header > ul {\n",
" display: inline;\n",
" margin-top: 0;\n",
" margin-bottom: 0;\n",
"}\n",
"\n",
".xr-obj-type,\n",
".xr-array-name {\n",
" margin-left: 2px;\n",
" margin-right: 10px;\n",
"}\n",
"\n",
".xr-obj-type {\n",
" color: var(--xr-font-color2);\n",
"}\n",
"\n",
".xr-sections {\n",
" padding-left: 0 !important;\n",
" display: grid;\n",
" grid-template-columns: 150px auto auto 1fr 20px 20px;\n",
"}\n",
"\n",
".xr-section-item {\n",
" display: contents;\n",
"}\n",
"\n",
".xr-section-item input {\n",
" display: none;\n",
"}\n",
"\n",
".xr-section-item input + label {\n",
" color: var(--xr-disabled-color);\n",
"}\n",
"\n",
".xr-section-item input:enabled + label {\n",
" cursor: pointer;\n",
" color: var(--xr-font-color2);\n",
"}\n",
"\n",
".xr-section-item input:enabled + label:hover {\n",
" color: var(--xr-font-color0);\n",
"}\n",
"\n",
".xr-section-summary {\n",
" grid-column: 1;\n",
" color: var(--xr-font-color2);\n",
" font-weight: 500;\n",
"}\n",
"\n",
".xr-section-summary > span {\n",
" display: inline-block;\n",
" padding-left: 0.5em;\n",
"}\n",
"\n",
".xr-section-summary-in:disabled + label {\n",
" color: var(--xr-font-color2);\n",
"}\n",
"\n",
".xr-section-summary-in + label:before {\n",
" display: inline-block;\n",
" content: '►';\n",
" font-size: 11px;\n",
" width: 15px;\n",
" text-align: center;\n",
"}\n",
"\n",
".xr-section-summary-in:disabled + label:before {\n",
" color: var(--xr-disabled-color);\n",
"}\n",
"\n",
".xr-section-summary-in:checked + label:before {\n",
" content: '▼';\n",
"}\n",
"\n",
".xr-section-summary-in:checked + label > span {\n",
" display: none;\n",
"}\n",
"\n",
".xr-section-summary,\n",
".xr-section-inline-details {\n",
" padding-top: 4px;\n",
" padding-bottom: 4px;\n",
"}\n",
"\n",
".xr-section-inline-details {\n",
" grid-column: 2 / -1;\n",
"}\n",
"\n",
".xr-section-details {\n",
" display: none;\n",
" grid-column: 1 / -1;\n",
" margin-bottom: 5px;\n",
"}\n",
"\n",
".xr-section-summary-in:checked ~ .xr-section-details {\n",
" display: contents;\n",
"}\n",
"\n",
".xr-array-wrap {\n",
" grid-column: 1 / -1;\n",
" display: grid;\n",
" grid-template-columns: 20px auto;\n",
"}\n",
"\n",
".xr-array-wrap > label {\n",
" grid-column: 1;\n",
" vertical-align: top;\n",
"}\n",
"\n",
".xr-preview {\n",
" color: var(--xr-font-color3);\n",
"}\n",
"\n",
".xr-array-preview,\n",
".xr-array-data {\n",
" padding: 0 5px !important;\n",
" grid-column: 2;\n",
"}\n",
"\n",
".xr-array-data,\n",
".xr-array-in:checked ~ .xr-array-preview {\n",
" display: none;\n",
"}\n",
"\n",
".xr-array-in:checked ~ .xr-array-data,\n",
".xr-array-preview {\n",
" display: inline-block;\n",
"}\n",
"\n",
".xr-dim-list {\n",
" display: inline-block !important;\n",
" list-style: none;\n",
" padding: 0 !important;\n",
" margin: 0;\n",
"}\n",
"\n",
".xr-dim-list li {\n",
" display: inline-block;\n",
" padding: 0;\n",
" margin: 0;\n",
"}\n",
"\n",
".xr-dim-list:before {\n",
" content: '(';\n",
"}\n",
"\n",
".xr-dim-list:after {\n",
" content: ')';\n",
"}\n",
"\n",
".xr-dim-list li:not(:last-child):after {\n",
" content: ',';\n",
" padding-right: 5px;\n",
"}\n",
"\n",
".xr-has-index {\n",
" font-weight: bold;\n",
"}\n",
"\n",
".xr-var-list,\n",
".xr-var-item {\n",
" display: contents;\n",
"}\n",
"\n",
".xr-var-item > div,\n",
".xr-var-item label,\n",
".xr-var-item > .xr-var-name span {\n",
" background-color: var(--xr-background-color-row-even);\n",
" margin-bottom: 0;\n",
"}\n",
"\n",
".xr-var-item > .xr-var-name:hover span {\n",
" padding-right: 5px;\n",
"}\n",
"\n",
".xr-var-list > li:nth-child(odd) > div,\n",
".xr-var-list > li:nth-child(odd) > label,\n",
".xr-var-list > li:nth-child(odd) > .xr-var-name span {\n",
" background-color: var(--xr-background-color-row-odd);\n",
"}\n",
"\n",
".xr-var-name {\n",
" grid-column: 1;\n",
"}\n",
"\n",
".xr-var-dims {\n",
" grid-column: 2;\n",
"}\n",
"\n",
".xr-var-dtype {\n",
" grid-column: 3;\n",
" text-align: right;\n",
" color: var(--xr-font-color2);\n",
"}\n",
"\n",
".xr-var-preview {\n",
" grid-column: 4;\n",
"}\n",
"\n",
".xr-var-name,\n",
".xr-var-dims,\n",
".xr-var-dtype,\n",
".xr-preview,\n",
".xr-attrs dt {\n",
" white-space: nowrap;\n",
" overflow: hidden;\n",
" text-overflow: ellipsis;\n",
" padding-right: 10px;\n",
"}\n",
"\n",
".xr-var-name:hover,\n",
".xr-var-dims:hover,\n",
".xr-var-dtype:hover,\n",
".xr-attrs dt:hover {\n",
" overflow: visible;\n",
" width: auto;\n",
" z-index: 1;\n",
"}\n",
"\n",
".xr-var-attrs,\n",
".xr-var-data {\n",
" display: none;\n",
" background-color: var(--xr-background-color) !important;\n",
" padding-bottom: 5px !important;\n",
"}\n",
"\n",
".xr-var-attrs-in:checked ~ .xr-var-attrs,\n",
".xr-var-data-in:checked ~ .xr-var-data {\n",
" display: block;\n",
"}\n",
"\n",
".xr-var-data > table {\n",
" float: right;\n",
"}\n",
"\n",
".xr-var-name span,\n",
".xr-var-data,\n",
".xr-attrs {\n",
" padding-left: 25px !important;\n",
"}\n",
"\n",
".xr-attrs,\n",
".xr-var-attrs,\n",
".xr-var-data {\n",
" grid-column: 1 / -1;\n",
"}\n",
"\n",
"dl.xr-attrs {\n",
" padding: 0;\n",
" margin: 0;\n",
" display: grid;\n",
" grid-template-columns: 125px auto;\n",
"}\n",
"\n",
".xr-attrs dt, dd {\n",
" padding: 0;\n",
" margin: 0;\n",
" float: left;\n",
" padding-right: 10px;\n",
" width: auto;\n",
"}\n",
"\n",
".xr-attrs dt {\n",
" font-weight: normal;\n",
" grid-column: 1;\n",
"}\n",
"\n",
".xr-attrs dt:hover span {\n",
" display: inline-block;\n",
" background: var(--xr-background-color);\n",
" padding-right: 10px;\n",
"}\n",
"\n",
".xr-attrs dd {\n",
" grid-column: 2;\n",
" white-space: pre-wrap;\n",
" word-break: break-all;\n",
"}\n",
"\n",
".xr-icon-database,\n",
".xr-icon-file-text2 {\n",
" display: inline-block;\n",
" vertical-align: middle;\n",
" width: 1em;\n",
" height: 1.5em !important;\n",
" stroke-width: 0;\n",
" stroke: currentColor;\n",
" fill: currentColor;\n",
"}\n",
"</style><pre class='xr-text-repr-fallback'>&lt;xarray.Dataset&gt;\n",
"Dimensions: (nlayer: 3, time: 14, z: 20810)\n",
"Coordinates:\n",
" * time (time) datetime64[ns] 2021-01-16 2021-01-17 ... 2021-01-29\n",
" * z (z) MultiIndex\n",
" - lat (z) float64 41.22 41.28 41.28 41.28 ... 49.09 49.09 49.09\n",
" - lon (z) float64 -116.2 -116.3 -116.2 ... -114.3 -114.2 -114.2\n",
"Dimensions without coordinates: nlayer\n",
"Data variables:\n",
" soil_density (nlayer, z) float64 2.62e+03 2.62e+03 ... 2.62e+03 2.62e+03\n",
" t_min (time, z) float32 dask.array&lt;chunksize=(14, 20810), meta=np.ndarray&gt;\n",
" t_max (time, z) float32 dask.array&lt;chunksize=(14, 20810), meta=np.ndarray&gt;\n",
" prec (time, z) float32 dask.array&lt;chunksize=(14, 20810), meta=np.ndarray&gt;\n",
" elev (z) float64 1.817e+03 1.936e+03 ... 1.791e+03 1.969e+03\n",
" annual_prec (z) float64 458.9 466.3 288.1 288.1 ... 1.407e+03 540.8 540.8</pre><div class='xr-wrap' hidden><div class='xr-header'><div class='xr-obj-type'>xarray.Dataset</div></div><ul class='xr-sections'><li class='xr-section-item'><input id='section-e6f9efa5-b1d6-4627-a285-ebea87332087' class='xr-section-summary-in' type='checkbox' disabled ><label for='section-e6f9efa5-b1d6-4627-a285-ebea87332087' class='xr-section-summary' title='Expand/collapse section'>Dimensions:</label><div class='xr-section-inline-details'><ul class='xr-dim-list'><li><span>nlayer</span>: 3</li><li><span class='xr-has-index'>time</span>: 14</li><li><span class='xr-has-index'>z</span>: 20810</li></ul></div><div class='xr-section-details'></div></li><li class='xr-section-item'><input id='section-05150a79-82bb-44bf-9d37-674374f1f8d6' class='xr-section-summary-in' type='checkbox' checked><label for='section-05150a79-82bb-44bf-9d37-674374f1f8d6' class='xr-section-summary' >Coordinates: <span>(2)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><ul class='xr-var-list'><li class='xr-var-item'><div class='xr-var-name'><span class='xr-has-index'>time</span></div><div class='xr-var-dims'>(time)</div><div class='xr-var-dtype'>datetime64[ns]</div><div class='xr-var-preview xr-preview'>2021-01-16 ... 
2021-01-29</div><input id='attrs-7e66605f-e1ab-4c2a-a1c1-75fefe54e118' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-7e66605f-e1ab-4c2a-a1c1-75fefe54e118' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-c8c12fbe-dbfe-4edb-a7a1-0f505e86f635' class='xr-var-data-in' type='checkbox'><label for='data-c8c12fbe-dbfe-4edb-a7a1-0f505e86f635' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>standard_name :</span></dt><dd>time</dd><dt><span>long_name :</span></dt><dd>time</dd><dt><span>axis :</span></dt><dd>T</dd></dl></div><div class='xr-var-data'><pre>array([&#x27;2021-01-16T00:00:00.000000000&#x27;, &#x27;2021-01-17T00:00:00.000000000&#x27;,\n",
" &#x27;2021-01-18T00:00:00.000000000&#x27;, &#x27;2021-01-19T00:00:00.000000000&#x27;,\n",
" &#x27;2021-01-20T00:00:00.000000000&#x27;, &#x27;2021-01-21T00:00:00.000000000&#x27;,\n",
" &#x27;2021-01-22T00:00:00.000000000&#x27;, &#x27;2021-01-23T00:00:00.000000000&#x27;,\n",
" &#x27;2021-01-24T00:00:00.000000000&#x27;, &#x27;2021-01-25T00:00:00.000000000&#x27;,\n",
" &#x27;2021-01-26T00:00:00.000000000&#x27;, &#x27;2021-01-27T00:00:00.000000000&#x27;,\n",
" &#x27;2021-01-28T00:00:00.000000000&#x27;, &#x27;2021-01-29T00:00:00.000000000&#x27;],\n",
" dtype=&#x27;datetime64[ns]&#x27;)</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span class='xr-has-index'>z</span></div><div class='xr-var-dims'>(z)</div><div class='xr-var-dtype'>MultiIndex</div><div class='xr-var-preview xr-preview'>(lat, lon)</div><input id='attrs-fef095a5-450a-436e-82aa-274cf8c83456' class='xr-var-attrs-in' type='checkbox' disabled><label for='attrs-fef095a5-450a-436e-82aa-274cf8c83456' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-25e5a6b4-0bff-4409-8578-8f1de505c026' class='xr-var-data-in' type='checkbox'><label for='data-25e5a6b4-0bff-4409-8578-8f1de505c026' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'></dl></div><div class='xr-var-data'><pre>array([(41.21875, -116.21875), (41.28125, -116.28125), (41.28125, -116.21875),\n",
" ..., (49.09375, -114.28125), (49.09375, -114.21875),\n",
" (49.09375, -114.15625)], dtype=object)</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span>lat</span></div><div class='xr-var-dims'>(z)</div><div class='xr-var-dtype'>float64</div><div class='xr-var-preview xr-preview'>41.22 41.28 41.28 ... 49.09 49.09</div><input id='attrs-7a3ef87f-fed1-46b9-8f8e-29fca02c4f88' class='xr-var-attrs-in' type='checkbox' disabled><label for='attrs-7a3ef87f-fed1-46b9-8f8e-29fca02c4f88' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-64da73f5-1683-419d-b74b-de206030bec7' class='xr-var-data-in' type='checkbox'><label for='data-64da73f5-1683-419d-b74b-de206030bec7' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'></dl></div><div class='xr-var-data'><pre>array([41.21875, 41.28125, 41.28125, ..., 49.09375, 49.09375, 49.09375])</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span>lon</span></div><div class='xr-var-dims'>(z)</div><div class='xr-var-dtype'>float64</div><div class='xr-var-preview xr-preview'>-116.2 -116.3 ... -114.2 -114.2</div><input id='attrs-863738e5-4a1f-49a7-ad44-e9b11fa6fc5b' class='xr-var-attrs-in' type='checkbox' disabled><label for='attrs-863738e5-4a1f-49a7-ad44-e9b11fa6fc5b' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-fede86c8-6906-4378-9002-1b4f77d132c7' class='xr-var-data-in' type='checkbox'><label for='data-fede86c8-6906-4378-9002-1b4f77d132c7' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'></dl></div><div class='xr-var-data'><pre>array([-116.21875, -116.28125, -116.21875, ..., -114.28125, -114.21875,\n",
" -114.15625])</pre></div></li></ul></div></li><li class='xr-section-item'><input id='section-c84729b7-758f-43ed-a154-0c29ee6c94a7' class='xr-section-summary-in' type='checkbox' checked><label for='section-c84729b7-758f-43ed-a154-0c29ee6c94a7' class='xr-section-summary' >Data variables: <span>(6)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><ul class='xr-var-list'><li class='xr-var-item'><div class='xr-var-name'><span>soil_density</span></div><div class='xr-var-dims'>(nlayer, z)</div><div class='xr-var-dtype'>float64</div><div class='xr-var-preview xr-preview'>2.62e+03 2.62e+03 ... 2.62e+03</div><input id='attrs-e1120720-7a16-48cb-a4e9-fa7d7bac90e1' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-e1120720-7a16-48cb-a4e9-fa7d7bac90e1' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-b9cedf48-b922-4f49-b763-eea0f6d14250' class='xr-var-data-in' type='checkbox'><label for='data-b9cedf48-b922-4f49-b763-eea0f6d14250' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>units :</span></dt><dd>kg/m3</dd><dt><span>description :</span></dt><dd>Soil particle density, normally 2685 kg/m3</dd><dt><span>long_name :</span></dt><dd>soil_density</dd></dl></div><div class='xr-var-data'><pre>array([[2620.28 , 2620.28 , 2642.96 , ..., 2619.1001, 2620.28 ,\n",
" 2620.28 ],\n",
" [2620.28 , 2620.28 , 2642.96 , ..., 2619.1001, 2620.28 ,\n",
" 2620.28 ],\n",
" [2620.28 , 2620.28 , 2642.96 , ..., 2619.1001, 2620.28 ,\n",
" 2620.28 ]])</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span>t_min</span></div><div class='xr-var-dims'>(time, z)</div><div class='xr-var-dtype'>float32</div><div class='xr-var-preview xr-preview'>dask.array&lt;chunksize=(14, 20810), meta=np.ndarray&gt;</div><input id='attrs-3415bc6e-b006-454d-b4b1-9c1c4da27515' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-3415bc6e-b006-454d-b4b1-9c1c4da27515' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-a176cef8-5977-46b6-85fd-4cf30da9d581' class='xr-var-data-in' type='checkbox'><label for='data-a176cef8-5977-46b6-85fd-4cf30da9d581' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>standard_name :</span></dt><dd>tmmn</dd><dt><span>long_name :</span></dt><dd>tmmn</dd><dt><span>units :</span></dt><dd>K</dd><dt><span>description :</span></dt><dd>Daily Minimum Temperature (2m)</dd><dt><span>dimensions :</span></dt><dd>lon lat time</dd><dt><span>coordinate_system :</span></dt><dd>WGS84,EPSG:4326</dd></dl></div><div class='xr-var-data'><table>\n",
"<tr>\n",
"<td>\n",
"<table>\n",
" <thead>\n",
" <tr><td> </td><th> Array </th><th> Chunk </th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th> Bytes </th><td> 1.17 MB </td> <td> 1.17 MB </td></tr>\n",
" <tr><th> Shape </th><td> (14, 20810) </td> <td> (14, 20810) </td></tr>\n",
" <tr><th> Count </th><td> 19 Tasks </td><td> 1 Chunks </td></tr>\n",
" <tr><th> Type </th><td> float32 </td><td> numpy.ndarray </td></tr>\n",
" </tbody>\n",
"</table>\n",
"</td>\n",
"<td>\n",
"<svg width=\"170\" height=\"75\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
"\n",
" <!-- Horizontal lines -->\n",
" <line x1=\"0\" y1=\"0\" x2=\"120\" y2=\"0\" style=\"stroke-width:2\" />\n",
" <line x1=\"0\" y1=\"25\" x2=\"120\" y2=\"25\" style=\"stroke-width:2\" />\n",
"\n",
" <!-- Vertical lines -->\n",
" <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"25\" style=\"stroke-width:2\" />\n",
" <line x1=\"120\" y1=\"0\" x2=\"120\" y2=\"25\" style=\"stroke-width:2\" />\n",
"\n",
" <!-- Colored Rectangle -->\n",
" <polygon points=\"0.0,0.0 120.0,0.0 120.0,25.412616514582485 0.0,25.412616514582485\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
"\n",
" <!-- Text -->\n",
" <text x=\"60.000000\" y=\"45.412617\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >20810</text>\n",
" <text x=\"140.000000\" y=\"12.706308\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,140.000000,12.706308)\">14</text>\n",
"</svg>\n",
"</td>\n",
"</tr>\n",
"</table></div></li><li class='xr-var-item'><div class='xr-var-name'><span>t_max</span></div><div class='xr-var-dims'>(time, z)</div><div class='xr-var-dtype'>float32</div><div class='xr-var-preview xr-preview'>dask.array&lt;chunksize=(14, 20810), meta=np.ndarray&gt;</div><input id='attrs-6e4b6c9b-da96-4eff-99ac-592511ddf7af' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-6e4b6c9b-da96-4eff-99ac-592511ddf7af' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-3d84daba-2c53-4c10-bef4-a01873cc0d30' class='xr-var-data-in' type='checkbox'><label for='data-3d84daba-2c53-4c10-bef4-a01873cc0d30' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>standard_name :</span></dt><dd>tmmx</dd><dt><span>long_name :</span></dt><dd>tmmx</dd><dt><span>units :</span></dt><dd>K</dd><dt><span>description :</span></dt><dd>Daily Maximum Temperature (2m)</dd><dt><span>dimensions :</span></dt><dd>lon lat time</dd><dt><span>coordinate_system :</span></dt><dd>WGS84,EPSG:4326</dd></dl></div><div class='xr-var-data'><table>\n",
"<tr>\n",
"<td>\n",
"<table>\n",
" <thead>\n",
" <tr><td> </td><th> Array </th><th> Chunk </th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th> Bytes </th><td> 1.17 MB </td> <td> 1.17 MB </td></tr>\n",
" <tr><th> Shape </th><td> (14, 20810) </td> <td> (14, 20810) </td></tr>\n",
" <tr><th> Count </th><td> 19 Tasks </td><td> 1 Chunks </td></tr>\n",
" <tr><th> Type </th><td> float32 </td><td> numpy.ndarray </td></tr>\n",
" </tbody>\n",
"</table>\n",
"</td>\n",
"<td>\n",
"<svg width=\"170\" height=\"75\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
"\n",
" <!-- Horizontal lines -->\n",
" <line x1=\"0\" y1=\"0\" x2=\"120\" y2=\"0\" style=\"stroke-width:2\" />\n",
" <line x1=\"0\" y1=\"25\" x2=\"120\" y2=\"25\" style=\"stroke-width:2\" />\n",
"\n",
" <!-- Vertical lines -->\n",
" <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"25\" style=\"stroke-width:2\" />\n",
" <line x1=\"120\" y1=\"0\" x2=\"120\" y2=\"25\" style=\"stroke-width:2\" />\n",
"\n",
" <!-- Colored Rectangle -->\n",
" <polygon points=\"0.0,0.0 120.0,0.0 120.0,25.412616514582485 0.0,25.412616514582485\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
"\n",
" <!-- Text -->\n",
" <text x=\"60.000000\" y=\"45.412617\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >20810</text>\n",
" <text x=\"140.000000\" y=\"12.706308\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,140.000000,12.706308)\">14</text>\n",
"</svg>\n",
"</td>\n",
"</tr>\n",
"</table></div></li><li class='xr-var-item'><div class='xr-var-name'><span>prec</span></div><div class='xr-var-dims'>(time, z)</div><div class='xr-var-dtype'>float32</div><div class='xr-var-preview xr-preview'>dask.array&lt;chunksize=(14, 20810), meta=np.ndarray&gt;</div><input id='attrs-d43674c1-b21b-49b1-92bc-a0c9ac1a7ba8' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-d43674c1-b21b-49b1-92bc-a0c9ac1a7ba8' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-1f525852-f605-4e5b-8528-6c3f48f82ad1' class='xr-var-data-in' type='checkbox'><label for='data-1f525852-f605-4e5b-8528-6c3f48f82ad1' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>standard_name :</span></dt><dd>pr</dd><dt><span>long_name :</span></dt><dd>pr</dd><dt><span>units :</span></dt><dd>mm</dd><dt><span>description :</span></dt><dd>Daily Accumulated Precipitation</dd><dt><span>dimensions :</span></dt><dd>lon lat time</dd><dt><span>coordinate_system :</span></dt><dd>WGS84,EPSG:4326</dd></dl></div><div class='xr-var-data'><table>\n",
"<tr>\n",
"<td>\n",
"<table>\n",
" <thead>\n",
" <tr><td> </td><th> Array </th><th> Chunk </th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th> Bytes </th><td> 1.17 MB </td> <td> 1.17 MB </td></tr>\n",
" <tr><th> Shape </th><td> (14, 20810) </td> <td> (14, 20810) </td></tr>\n",
" <tr><th> Count </th><td> 19 Tasks </td><td> 1 Chunks </td></tr>\n",
" <tr><th> Type </th><td> float32 </td><td> numpy.ndarray </td></tr>\n",
" </tbody>\n",
"</table>\n",
"</td>\n",
"<td>\n",
"<svg width=\"170\" height=\"75\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
"\n",
" <!-- Horizontal lines -->\n",
" <line x1=\"0\" y1=\"0\" x2=\"120\" y2=\"0\" style=\"stroke-width:2\" />\n",
" <line x1=\"0\" y1=\"25\" x2=\"120\" y2=\"25\" style=\"stroke-width:2\" />\n",
"\n",
" <!-- Vertical lines -->\n",
" <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"25\" style=\"stroke-width:2\" />\n",
" <line x1=\"120\" y1=\"0\" x2=\"120\" y2=\"25\" style=\"stroke-width:2\" />\n",
"\n",
" <!-- Colored Rectangle -->\n",
" <polygon points=\"0.0,0.0 120.0,0.0 120.0,25.412616514582485 0.0,25.412616514582485\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
"\n",
" <!-- Text -->\n",
" <text x=\"60.000000\" y=\"45.412617\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >20810</text>\n",
" <text x=\"140.000000\" y=\"12.706308\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,140.000000,12.706308)\">14</text>\n",
"</svg>\n",
"</td>\n",
"</tr>\n",
"</table></div></li><li class='xr-var-item'><div class='xr-var-name'><span>elev</span></div><div class='xr-var-dims'>(z)</div><div class='xr-var-dtype'>float64</div><div class='xr-var-preview xr-preview'>1.817e+03 1.936e+03 ... 1.969e+03</div><input id='attrs-3cab6fb5-3c78-4acb-89ef-975518000c96' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-3cab6fb5-3c78-4acb-89ef-975518000c96' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-0abbb0a3-076d-41d8-b3f1-a9ec8b3b4a9f' class='xr-var-data-in' type='checkbox'><label for='data-0abbb0a3-076d-41d8-b3f1-a9ec8b3b4a9f' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>units :</span></dt><dd>m</dd><dt><span>description :</span></dt><dd>Average elevation of grid cell</dd><dt><span>long_name :</span></dt><dd>elev</dd></dl></div><div class='xr-var-data'><pre>array([1816.9 , 1935.8 , 1794.9 , ..., 1926.0928, 1790.6862,\n",
" 1969.4869])</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span>annual_prec</span></div><div class='xr-var-dims'>(z)</div><div class='xr-var-dtype'>float64</div><div class='xr-var-preview xr-preview'>458.9 466.3 288.1 ... 540.8 540.8</div><input id='attrs-17590be0-da46-4ede-869b-91b634fc1b68' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-17590be0-da46-4ede-869b-91b634fc1b68' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-f299fbe3-7f6a-431a-9f5d-5dae1aaa1f29' class='xr-var-data-in' type='checkbox'><label for='data-f299fbe3-7f6a-431a-9f5d-5dae1aaa1f29' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>units :</span></dt><dd>mm</dd><dt><span>description :</span></dt><dd>Average annual precipitation.</dd><dt><span>long_name :</span></dt><dd>annual_prec</dd></dl></div><div class='xr-var-data'><pre>array([ 458.894, 466.341, 288.096, ..., 1406.53 , 540.784, 540.784])</pre></div></li></ul></div></li><li class='xr-section-item'><input id='section-0de313cd-1137-49ea-afef-3fc8ae68be0e' class='xr-section-summary-in' type='checkbox' disabled ><label for='section-0de313cd-1137-49ea-afef-3fc8ae68be0e' class='xr-section-summary' title='Expand/collapse section'>Attributes: <span>(0)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><dl class='xr-attrs'></dl></div></li></ul></div></div>"
],
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (nlayer: 3, time: 14, z: 20810)\n",
"Coordinates:\n",
" * time (time) datetime64[ns] 2021-01-16 2021-01-17 ... 2021-01-29\n",
" * z (z) MultiIndex\n",
" - lat (z) float64 41.22 41.28 41.28 41.28 ... 49.09 49.09 49.09\n",
" - lon (z) float64 -116.2 -116.3 -116.2 ... -114.3 -114.2 -114.2\n",
"Dimensions without coordinates: nlayer\n",
"Data variables:\n",
" soil_density (nlayer, z) float64 2.62e+03 2.62e+03 ... 2.62e+03 2.62e+03\n",
" t_min (time, z) float32 dask.array<chunksize=(14, 20810), meta=np.ndarray>\n",
" t_max (time, z) float32 dask.array<chunksize=(14, 20810), meta=np.ndarray>\n",
" prec (time, z) float32 dask.array<chunksize=(14, 20810), meta=np.ndarray>\n",
" elev (z) float64 1.817e+03 1.936e+03 ... 1.791e+03 1.969e+03\n",
" annual_prec (z) float64 458.9 466.3 288.1 288.1 ... 1.407e+03 540.8 540.8"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ant_days = 14\n",
"ts = slice(time - pd.Timedelta(f'{ant_days-1}D'), time)\n",
"\n",
"input_vars = ['soil_density', 't_min', 't_max', 'prec', 'elev', 'annual_prec']\n",
"domain = xr.open_dataset('./input/domain.vic.us_columbia0.0625deg.20180117.nc')\n",
"params = xr.open_dataset('./input/us_columbia_0.0625deg.vic_5.0.0_parameters.nc')\n",
"met = xr.open_mfdataset(['./input/metGrid.nc', './input/metGridstate.nc'])\n",
"input_ds = xr.merge([domain, params, met])\n",
"input_ds = input_ds[input_vars]\n",
"input_ds = input_ds.sel(time=ts).stack(z=['lat', 'lon']).sel(z=target_ds['z'])\n",
"input_ds"
]
},
{
"cell_type": "markdown",
"id": "5cbd36e7-9832-45fd-a698-13efa1f93403",
"metadata": {},
"source": [
"### Setting up the input data continued\n",
"\n",
"Now, with a nicely preprocessed dataset we need to morph this into a tabular dataframe so that XGBoost knows how to handle it. To do so I make a column for each of the soil layers `soil_density` values, labeling them `soil_density_i` where `i` is the index of the depth layer. Then, for each of the other input variables I pull them out into columns. The variables where there is time dependence have each of the antecedent days (up to 14) pulled out separately into columns which are labeled `varname_timestep`. This is all concatenated together, giving a total of 47 input features. Explicitly they are:\n",
"\n",
"* 1 for elevation\n",
"* 1 for annual mean precipitation\n",
"* 14 for daily minimum temperature\n",
"* 14 for daily maximum temperature\n",
"* 14 for daily total precipitation\n",
"* 3 for soil layer density"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "25789f2b-d73c-4d23-abdd-4953d5c182db",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>t_min_0</th>\n",
" <th>t_min_1</th>\n",
" <th>t_min_2</th>\n",
" <th>t_min_3</th>\n",
" <th>t_min_4</th>\n",
" <th>t_min_5</th>\n",
" <th>t_min_6</th>\n",
" <th>t_min_7</th>\n",
" <th>t_min_8</th>\n",
" <th>t_min_9</th>\n",
" <th>...</th>\n",
" <th>prec_9</th>\n",
" <th>prec_10</th>\n",
" <th>prec_11</th>\n",
" <th>prec_12</th>\n",
" <th>prec_13</th>\n",
" <th>elev</th>\n",
" <th>annual_prec</th>\n",
" <th>soil_density_0</th>\n",
" <th>soil_density_1</th>\n",
" <th>soil_density_2</th>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>41.21875</th>\n",
" <th>-116.21875</th>\n",
" <td>-5.400009</td>\n",
" <td>-5.699997</td>\n",
" <td>-6.400009</td>\n",
" <td>-9.900009</td>\n",
" <td>-10.600006</td>\n",
" <td>-6.100006</td>\n",
" <td>-3.699997</td>\n",
" <td>-9.199997</td>\n",
" <td>-11.500000</td>\n",
" <td>-11.699997</td>\n",
" <td>...</td>\n",
" <td>1.2</td>\n",
" <td>0.7</td>\n",
" <td>5.8</td>\n",
" <td>3.6</td>\n",
" <td>3.2</td>\n",
" <td>1816.9000</td>\n",
" <td>458.894</td>\n",
" <td>2620.28</td>\n",
" <td>2620.28</td>\n",
" <td>2620.28</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">41.28125</th>\n",
" <th>-116.28125</th>\n",
" <td>-5.000000</td>\n",
" <td>-4.900009</td>\n",
" <td>-7.800003</td>\n",
" <td>-9.800003</td>\n",
" <td>-9.400009</td>\n",
" <td>-4.900009</td>\n",
" <td>-4.500000</td>\n",
" <td>-13.000000</td>\n",
" <td>-14.800003</td>\n",
" <td>-12.300003</td>\n",
" <td>...</td>\n",
" <td>1.6</td>\n",
" <td>0.6</td>\n",
" <td>5.5</td>\n",
" <td>3.7</td>\n",
" <td>3.3</td>\n",
" <td>1935.8000</td>\n",
" <td>466.341</td>\n",
" <td>2620.28</td>\n",
" <td>2620.28</td>\n",
" <td>2620.28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-116.21875</th>\n",
" <td>-5.199997</td>\n",
" <td>-5.600006</td>\n",
" <td>-6.699997</td>\n",
" <td>-9.800003</td>\n",
" <td>-10.400009</td>\n",
" <td>-5.699997</td>\n",
" <td>-3.900009</td>\n",
" <td>-10.199997</td>\n",
" <td>-12.300003</td>\n",
" <td>-11.500000</td>\n",
" <td>...</td>\n",
" <td>1.3</td>\n",
" <td>0.7</td>\n",
" <td>5.6</td>\n",
" <td>3.1</td>\n",
" <td>1.7</td>\n",
" <td>1794.9000</td>\n",
" <td>288.096</td>\n",
" <td>2642.96</td>\n",
" <td>2642.96</td>\n",
" <td>2642.96</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-116.15625</th>\n",
" <td>-6.100006</td>\n",
" <td>-6.500000</td>\n",
" <td>-5.600006</td>\n",
" <td>-10.400009</td>\n",
" <td>-11.699997</td>\n",
" <td>-7.199997</td>\n",
" <td>-3.800003</td>\n",
" <td>-8.000000</td>\n",
" <td>-10.500000</td>\n",
" <td>-12.300003</td>\n",
" <td>...</td>\n",
" <td>0.6</td>\n",
" <td>0.7</td>\n",
" <td>5.7</td>\n",
" <td>3.3</td>\n",
" <td>3.2</td>\n",
" <td>1775.0000</td>\n",
" <td>288.096</td>\n",
" <td>2642.96</td>\n",
" <td>2642.96</td>\n",
" <td>2642.96</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-116.09375</th>\n",
" <td>-6.400009</td>\n",
" <td>-6.400009</td>\n",
" <td>-7.000000</td>\n",
" <td>-10.600006</td>\n",
" <td>-10.699997</td>\n",
" <td>-6.600006</td>\n",
" <td>-4.500000</td>\n",
" <td>-10.600006</td>\n",
" <td>-13.100006</td>\n",
" <td>-13.000000</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.6</td>\n",
" <td>4.4</td>\n",
" <td>3.0</td>\n",
" <td>4.6</td>\n",
" <td>1971.8766</td>\n",
" <td>461.644</td>\n",
" <td>2620.28</td>\n",
" <td>2620.28</td>\n",
" <td>2620.28</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 47 columns</p>\n",
"</div>"
],
"text/plain": [
" t_min_0 t_min_1 t_min_2 t_min_3 t_min_4 \\\n",
"lat lon \n",
"41.21875 -116.21875 -5.400009 -5.699997 -6.400009 -9.900009 -10.600006 \n",
"41.28125 -116.28125 -5.000000 -4.900009 -7.800003 -9.800003 -9.400009 \n",
" -116.21875 -5.199997 -5.600006 -6.699997 -9.800003 -10.400009 \n",
" -116.15625 -6.100006 -6.500000 -5.600006 -10.400009 -11.699997 \n",
" -116.09375 -6.400009 -6.400009 -7.000000 -10.600006 -10.699997 \n",
"\n",
" t_min_5 t_min_6 t_min_7 t_min_8 t_min_9 ... \\\n",
"lat lon ... \n",
"41.21875 -116.21875 -6.100006 -3.699997 -9.199997 -11.500000 -11.699997 ... \n",
"41.28125 -116.28125 -4.900009 -4.500000 -13.000000 -14.800003 -12.300003 ... \n",
" -116.21875 -5.699997 -3.900009 -10.199997 -12.300003 -11.500000 ... \n",
" -116.15625 -7.199997 -3.800003 -8.000000 -10.500000 -12.300003 ... \n",
" -116.09375 -6.600006 -4.500000 -10.600006 -13.100006 -13.000000 ... \n",
"\n",
" prec_9 prec_10 prec_11 prec_12 prec_13 elev \\\n",
"lat lon \n",
"41.21875 -116.21875 1.2 0.7 5.8 3.6 3.2 1816.9000 \n",
"41.28125 -116.28125 1.6 0.6 5.5 3.7 3.3 1935.8000 \n",
" -116.21875 1.3 0.7 5.6 3.1 1.7 1794.9000 \n",
" -116.15625 0.6 0.7 5.7 3.3 3.2 1775.0000 \n",
" -116.09375 0.0 0.6 4.4 3.0 4.6 1971.8766 \n",
"\n",
" annual_prec soil_density_0 soil_density_1 \\\n",
"lat lon \n",
"41.21875 -116.21875 458.894 2620.28 2620.28 \n",
"41.28125 -116.28125 466.341 2620.28 2620.28 \n",
" -116.21875 288.096 2642.96 2642.96 \n",
" -116.15625 288.096 2642.96 2642.96 \n",
" -116.09375 461.644 2620.28 2620.28 \n",
"\n",
" soil_density_2 \n",
"lat lon \n",
"41.21875 -116.21875 2620.28 \n",
"41.28125 -116.28125 2620.28 \n",
" -116.21875 2642.96 \n",
" -116.15625 2642.96 \n",
" -116.09375 2620.28 \n",
"\n",
"[5 rows x 47 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"layer_var = input_ds['soil_density']\n",
"unraveled = []\n",
"for i in range(len(layer_var['nlayer'])):\n",
" unraveled.append(layer_var.isel(nlayer=i).rename( f'{layer_var.name}_{i}')) \n",
"unraveled = xr.merge(unraveled).drop('z')\n",
"soil_layer_dfs = [pd.DataFrame(unraveled[v].values.T) for v in unraveled.variables]\n",
"\n",
"for v, df in zip(unraveled.variables, soil_layer_dfs):\n",
" df.columns = [v]\n",
" df.index = target_df.index\n",
"\n",
"df_list = []\n",
"for v in input_vars:\n",
" if v == 'soil_density':\n",
" continue\n",
" var_df = pd.DataFrame(input_ds[v].values.T)\n",
" if 'time' in input_ds[v].dims:\n",
" var_df.columns = [f'{v}_{i}' for i in range(ant_days)]\n",
" else:\n",
" var_df.columns = [v]\n",
" var_df.index = target_df.index\n",
" df_list.append(var_df)\n",
"\n",
"df_list = [*df_list, *soil_layer_dfs]\n",
"input_df = pd.concat(df_list, axis=1)\n",
"input_df.head()"
]
},
{
"cell_type": "markdown",
"id": "ff1b76d5-b1fe-408a-bd37-a94060620842",
"metadata": {},
"source": [
"# Splitting out the test/validation data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "522c72fb-c318-41c6-a524-04e7b5c38b02",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training dataset has shapes: (16648, 47) (16648, 1)\n",
"Validation dataset has shapes: (4162, 47) (4162, 1)\n"
]
}
],
"source": [
"idx_randomizer = np.arange(len(input_df))\n",
"np.random.shuffle(idx_randomizer)\n",
"valid_frac = 0.2\n",
"valid_idx = int((1-valid_frac) * len(idx_randomizer))\n",
"test_idx, valid_idx = idx_randomizer[0:valid_idx], idx_randomizer[valid_idx:]\n",
"\n",
"X_train = input_df.iloc[test_idx]\n",
"y_train = target_df.iloc[test_idx]\n",
"\n",
"X_valid = input_df.iloc[valid_idx]\n",
"y_valid = target_df.iloc[valid_idx]\n",
"print('Training dataset has shapes: ', X_train.shape, y_train.shape)\n",
"print('Validation dataset has shapes: ', X_valid.shape, y_valid.shape)"
]
},
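{
"cell_type": "markdown",
"id": "sketch-seeded-split",
"metadata": {},
"source": [
"The split above shuffles with NumPy's global random state and never sets a seed, so the training and validation rows differ between runs. Below is a minimal sketch of a reproducible variant, assuming NumPy 1.17 or newer for `np.random.default_rng`. The helper name `split_indices` and the seed value are illustrative and not part of the original notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def split_indices(n_rows, valid_frac=0.2, seed=42):\n",
"    # Hypothetical helper: shuffle row positions with a seeded generator\n",
"    rng = np.random.default_rng(seed)\n",
"    shuffled = rng.permutation(n_rows)\n",
"    # Everything before split_at trains the model; the remainder validates it\n",
"    split_at = int((1 - valid_frac) * n_rows)\n",
"    return shuffled[:split_at], shuffled[split_at:]\n",
"\n",
"\n",
"# 20810 is the number of grid cells along the z dimension above\n",
"train_rows, valid_rows = split_indices(20810)\n",
"print(len(train_rows), len(valid_rows))\n",
"```\n",
"\n",
"The returned arrays can be passed to `input_df.iloc[...]` and `target_df.iloc[...]` in the same way as the shuffled indices above. Because the two slices come from one permutation, no row lands in both sets."
]
},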
{
"cell_type": "markdown",
"id": "f23ea507-3fa0-4d57-9af5-2f625ee2f92c",
"metadata": {},
"source": [
"# Create the XGBoost regressor, and train it"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "hydraulic-slope",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
" colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
" importance_type='gain', interaction_constraints='',\n",
" learning_rate=0.03, max_delta_step=0, max_depth=6,\n",
" min_child_weight=1, missing=nan, monotone_constraints='()',\n",
" n_estimators=10000, n_jobs=12, num_parallel_tree=1, random_state=0,\n",
" reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,\n",
" tree_method='exact', validate_parameters=1, verbosity=None)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = xgb.XGBRegressor(\n",
" objective='reg:squarederror', \n",
" max_depth=6, \n",
" n_estimators=10000, \n",
" n_jobs=12, \n",
" learning_rate=0.03\n",
")\n",
"model.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"id": "46094d67-fb5d-4b30-a4d2-6e7b40e6f7ef",
"metadata": {},
"source": [
"# Make a prediction and simple plot!"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "worst-violin",
"metadata": {},
"outputs": [],
"source": [
"y_hat_xgb = model.predict(X_valid)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "spectacular-grave",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7fbed2e9aa58>]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(y_valid, y_hat_xgb, alpha=0.3)\n",
"plt.xlabel('VIC Soil Moisture Percentile')\n",
"plt.ylabel('XGBoost Predicted Soil Moisture Percentile')\n",
"plt.plot([0, 100], [0,100], color='black')"
]
},
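{
"cell_type": "markdown",
"id": "sketch-validation-scores",
"metadata": {},
"source": [
"The scatter plot gives a qualitative picture of the fit. A couple of summary numbers make it easier to compare runs. Below is a minimal sketch that computes the root-mean-square error and the coefficient of determination with plain NumPy. The helper name `regression_scores` and the toy values are illustrative and are not results from this notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def regression_scores(y_true, y_pred):\n",
"    # Flatten both inputs so an (n, 1) DataFrame and an (n,) array line up\n",
"    y_true = np.asarray(y_true, dtype=float).ravel()\n",
"    y_pred = np.asarray(y_pred, dtype=float).ravel()\n",
"    resid = y_true - y_pred\n",
"    rmse = float(np.sqrt(np.mean(resid ** 2)))\n",
"    # R^2 = 1 - SS_res / SS_tot (undefined when y_true is constant)\n",
"    ss_tot = np.sum((y_true - y_true.mean()) ** 2)\n",
"    r2 = float(1.0 - np.sum(resid ** 2) / ss_tot)\n",
"    return rmse, r2\n",
"\n",
"\n",
"# Toy values for illustration only\n",
"rmse, r2 = regression_scores([10.0, 50.0, 90.0], [12.0, 48.0, 91.0])\n",
"print(f'RMSE: {rmse:.2f}  R^2: {r2:.3f}')\n",
"```\n",
"\n",
"In this notebook the call would be `regression_scores(y_valid, y_hat_xgb)`. The flattening step matters here because `y_valid` is a single-column DataFrame while `model.predict` returns a 1-D array."
]
},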
{
"cell_type": "code",
"execution_count": null,
"id": "alert-horse",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "all",
"language": "python",
"name": "all"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}