Last active
May 17, 2023 11:44
-
-
Save KMarkert/bdb4596518ff0093aff6e116aa24c127 to your computer and use it in GitHub Desktop.
Example notebook on how to execute BigQuery jobs through a notebook for analyzing the National Water Model data
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "bbd692f1", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Copyright 2023 Google LLC\n", | |
"#\n", | |
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n", | |
"# you may not use this file except in compliance with the License.\n", | |
"# You may obtain a copy of the License at\n", | |
"#\n", | |
"# https://www.apache.org/licenses/LICENSE-2.0\n", | |
"#\n", | |
"# Unless required by applicable law or agreed to in writing, software\n", | |
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n", | |
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", | |
"# See the License for the specific language governing permissions and\n", | |
"# limitations under the License." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e92c5ad5-eab6-4392-a093-7637b60e6550", | |
"metadata": {}, | |
"source": [ | |
"# National Water Model on Big Query\n", | |
"\n", | |
"<table align=\"left\">\n", | |
" <td>\n", | |
" <a href=https://colab.research.google.com/gist/KMarkert/bdb4596518ff0093aff6e116aa24c127/nwm_bq_flood_analysis.ipynb>\n", | |
" <img src=https://cloud.google.com/ml-engine/images/colab-logo-32px.png alt=\"Colab logo\">\n", | |
" Run in Colab\n", | |
" </a>\n", | |
" </td>\n", | |
" <td>\n", | |
" <a href=https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://gist.githubusercontent.com/KMarkert/bdb4596518ff0093aff6e116aa24c127/raw/ef533b7748af9f5b9e234818260a65183797c0ba/nwm_bq_flood_analysis.ipynb>\n", | |
" <img src=https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32 alt=\\\"Vertex AI logo\\\">\n", | |
" Open in Vertex AI Workbench\n", | |
" </a>\n", | |
" </td>\n", | |
"</table>\n", | |
"<br/><br/><br/>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "0b0f3b36", | |
"metadata": {}, | |
"source": [ | |
"## Overview\n", | |
"This tutorial demonstrates how to use the BigQuery Python client library to select National Water Model (NWM) data and use for analysis.\n", | |
"\n", | |
"### Objective\n", | |
"\n", | |
"In this tutorial, you learn how to use the `Big Query Python Client` to request NWM data and analyze for a river reach along the Sacramento River in California. \n", | |
"\n", | |
"The steps performed include:\n", | |
"* Run a job to get the historical (2018 to present) streamflow data from the analysis_assim dataset\n", | |
"* Access gauge observations and compare historical data with the observations\n", | |
"* Derive flood levels from historical data\n", | |
"* Run a job to get a forecast for the March 2023 flooding and compare with observations\n", | |
"* Plot different forecasted streamflow statistics against observed data\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "f2ea2fee-acfb-4979-a21c-b1dfefa714bb", | |
"metadata": {}, | |
"source": [ | |
"## Installation\n", | |
"\n", | |
"Install the following packages required to execute this notebook.\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "25ab9885", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# install the USGS package for acquiring gauge data\n", | |
"!pip install --upgrade dataretrieval --quiet" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "fa6da301-6dd2-4989-b2c2-ec5adad4f8e2", | |
"metadata": { | |
"tags": [] | |
}, | |
"source": [ | |
"## Authenticate your Google Cloud account\n", | |
"**If you are using Vertex AI Workbench Notebook**, your environment is already authenticated. Skip this step.\n", | |
"\n", | |
"**If you are using Colab**, run the cell below and follow the instructions when prompted to authenticate your account via oAuth." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "2ac80acb-1e82-41a7-a96a-7cd74d291bf0", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# If you are running this notebook in Colab, run this cell and follow the\n", | |
"# instructions to authenticate your GCP account. This allows for you to \n", | |
"# submit Big Query jobs and access the NWM dataset.\n", | |
"\n", | |
"import os\n", | |
"import sys\n", | |
"\n", | |
"# If on Vertex AI Workbench, then don't execute this code\n", | |
"IS_COLAB = \"google.colab\" in sys.modules\n", | |
"if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", | |
" \"DL_ANACONDA_HOME\"\n", | |
"):\n", | |
" if \"google.colab\" in sys.modules:\n", | |
" from google.colab import auth as google_auth\n", | |
"\n", | |
" google_auth.authenticate_user()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b2e6ea92", | |
"metadata": {}, | |
"source": [ | |
"## Import libraries" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "74ab1db1-9a4b-4f97-b03c-72cbfd8d7e72", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"from scipy import stats\n", | |
"import dataretrieval.nwis as nwis\n", | |
"\n", | |
"import plotly.express as px\n", | |
"import plotly.graph_objects as go\n", | |
"\n", | |
"from google.cloud.bigquery import Client, QueryJobConfig" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "9b34ae03", | |
"metadata": {}, | |
"source": [ | |
"## Visualize streamflow data\n", | |
"\n", | |
"In this section you will request data using Big Query for a reach for the `analysis_assim` dataset and visualize the time series" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "ec6171c0-d88d-41ad-91a5-8b9aa0a96539", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# initialize the Big Query client for submitting jobs\n", | |
"client = Client()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "98bc752b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# define the reach feature id to run the queries for\n", | |
"nwm_fid = 15039097" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "155100d7-e063-45f9-9ff3-0be413ec98f8", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# define the BQ query\n", | |
"query = f\"\"\"\n", | |
"SELECT\n", | |
" reference_time,\n", | |
" time,\n", | |
" streamflow\n", | |
"FROM\n", | |
" `ciroh-water-demo.national_water_model_demo.channel_rt_analysis_assim`\n", | |
"WHERE\n", | |
" feature_id = {nwm_fid} \n", | |
" AND forecast_offset = 1\n", | |
"ORDER BY\n", | |
" time\n", | |
"\"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "e945d5a3-0a88-4a71-a0db-2d1ed844de2c", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# submit the BQ job and load as a pandas dataframe\n", | |
"job = client.query(query)\n", | |
"df = job.to_dataframe()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "dd996174-3133-4bc1-844d-61a6917015da", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# inspect the resulting dataframe\n", | |
"df.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "ae8e8dc6-b96a-4862-b2ef-41f86d6171c1", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# plot the time series\n", | |
"fig = go.Figure()\n", | |
"fig.add_trace(go.Scatter(x=df[\"time\"], y=df[\"streamflow\"], name='NWM analysis_assim'))\n", | |
"fig.update_layout(\n", | |
" title=\"NWM analysis_assim time series\",\n", | |
" yaxis_title = 'streamflow'\n", | |
")\n", | |
"fig.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "111c0948-f724-41c8-9b56-ec670136cef1", | |
"metadata": {}, | |
"source": [ | |
"## Compare data with observed gauge\n", | |
"\n", | |
"It is common practice to visualize the simulated streamflow with the observed data to compare how well the model performs. Here you will request the observed data from the USGS and plot with the `analysis_assim` time series" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "376f69f8-555e-4ab9-9078-93563f52ee3e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# specify the USGS site code for which we want data.\n", | |
"# this site corresponds with the NWM feature id we are using\n", | |
"site = '11425500'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "7fac0338-b080-47e7-ab4b-2d7653edb924", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# get the daily values for the last 40 years from the site with the parameter code 00060 (discharge)\n", | |
"gauge = nwis.get_record(sites=site, service='dv', start='1980-01-01',parameterCd='00060')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "f4355969-6857-4c51-8937-49efce8175e7", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# convert the cubic feet per second to cubic meters per second\n", | |
"# and add a new column for cms discharge\n", | |
"gauge['streamflow'] = gauge['00060_Mean'] * 0.02832" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "91ecd516-23b1-469e-a57a-b498a99e2d95", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# select the time frame from the NWM data\n", | |
"gauge_sel = gauge.loc[gauge.index >= '2018-09-01']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "e325caf6-7d7c-49f3-930c-a88a52821ec7", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# plot both the observed and simulated time series\n", | |
"fig = go.Figure()\n", | |
"fig.add_trace(go.Scatter(x=gauge_sel.index, y=gauge_sel['streamflow'],\n", | |
" name='USGS observed', line=dict(color='black')))\n", | |
"fig.add_trace(go.Scatter(x=df['time'], y=df['streamflow'],\n", | |
" name='NWM simulated'))\n", | |
"fig.update_layout(hovermode=\"x\")\n", | |
"fig.update_layout(\n", | |
" title=\"NWM analysis_assim and observed time series\",\n", | |
" yaxis_title = 'streamflow'\n", | |
")\n", | |
"\n", | |
"fig.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "256e6b54-7c00-4c54-9f0d-9667f9c43465", | |
"metadata": {}, | |
"source": [ | |
"## Derive flood levels from historical data\n", | |
"\n", | |
"Often flood indicators are given by stage height to determine a warning level and when there will be overbank flooding. Another approach is to estimate flood levels through statistical distribution of historic values. In this case, there is no stage height information so you will derive the return periods by fitting a distribution and estimating the streamflow values that correspond to the return period probabilities." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "ad8a48bd-8933-4d61-aea2-81faa59e73f5", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# define the distribution model we will use to estimate flood levels\n", | |
"# in this case we will use a Gamma Distribution\n", | |
"distribution_model = stats.gamma" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "231ec4db-672e-4c52-aaac-8c9661aa21d9", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# fit the model to the data and get the parameters\n", | |
"parameters = distribution_model.fit(gauge['streamflow'])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "c23458ad-dcf9-4bf7-bd94-10ff634aa845", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# calculate the PDF curve\n", | |
"x = np.arange(0,gauge['streamflow'].max(),10)\n", | |
"y = distribution_model.pdf(x, *parameters)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "e91b280a-20cb-4baa-a2a3-30aca6a96252", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# plot the histogram from the data and estimated distribution from the model\n", | |
"fig = go.Figure(data=[go.Histogram(x=gauge['streamflow'], \n", | |
" histnorm='probability density',\n", | |
" name='Streamflow distribution')])\n", | |
"fig.add_trace(go.Scatter(x=x, y=y,\n", | |
" name='Fit distribution'))\n", | |
"fig.update_layout(hovermode=\"x\")\n", | |
"fig.update_layout(\n", | |
" title=\"Observed histogram and fitted model\",\n", | |
" yaxis_title = 'probability density'\n", | |
")\n", | |
"\n", | |
"fig.show()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "a762941c-1dc4-470f-b11c-fc5fb63615ec", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# get the flood levels for 10, 25, 50, and 100-year floods from the distribution model\n", | |
"flood_levels = dict()\n", | |
"for i in [10, 25, 50, 100]:\n", | |
" flood_levels[i] = distribution_model.ppf(1-(1/i), *parameters)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "61b53abb-091f-460e-8be5-43fe9bdec16f", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"flood_levels" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "7c2294d1-bb2d-4f86-a9b1-c4f84f9ec020", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# plot both the simulated time series with flood levels\n", | |
"\n", | |
"# define colors for different flood levels\n", | |
"colors = ['yellow', 'orange', 'red', 'magenta']\n", | |
"\n", | |
"# plot the data\n", | |
"fig = go.Figure()\n", | |
"fig.add_trace(go.Scatter(x=df['time'], y=df['streamflow'],\n", | |
" name='NWM simulated'))\n", | |
"for i,(k,v) in enumerate(flood_levels.items()):\n", | |
" fig.add_hline(y=v, name=f'{k}-year flood level',line=dict(color=colors[i]))\n", | |
"\n", | |
"fig.update_layout(hovermode=\"x\")\n", | |
"fig.update_layout(\n", | |
" title=\"NWM analysis_assim flood levels\",\n", | |
" yaxis_title = 'streamflow'\n", | |
")\n", | |
"\n", | |
"fig.show()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "94b340dc-5ff6-419a-a69d-461ce16ba505", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Calculate number of days streamflow has been above a flood level\n", | |
"# Note: this is total days above level, not within theresholds\n", | |
"for k,v in flood_levels.items():\n", | |
" n_exceedance = int(sum(df['streamflow'] > v) / 24)\n", | |
" print(f'Exceeded {k}-year flood level {n_exceedance} days')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "1f747cfe-69d9-4292-ab14-0d504dd6c78b", | |
"metadata": {}, | |
"source": [ | |
"## Analysis for flooding event\n", | |
"\n", | |
"The Sacramento River recently experienced flooding due to high amounts of precipitation for an extended period. Here you will request the data for a reach a couple of days ahead of a flood event in March 2023. These forecast data will be compared with observed values for the event." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "1c4ee7b8-e767-411e-b2b7-c90bae67efce", | |
"metadata": {}, | |
"source": [ | |
"### Visualize multiple ensembles\n", | |
"This first example for inspecting multiple ensembles produced for a single forecast initialization" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "5b81174d-95b7-46e6-801e-14d87bb3e73f", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# specify the forecast datetime of interest\n", | |
"# format: YYYY-MM-dd HH:mm:ss\n", | |
"forecast_date = '2023-03-05'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "4677e6a3-3e9a-4d23-a54b-33e370133e5c", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# define the query to get the forecast\n", | |
"query = f\"\"\"\n", | |
"SELECT\n", | |
" reference_time,\n", | |
" time,\n", | |
" streamflow,\n", | |
" ensemble\n", | |
"FROM\n", | |
" `ciroh-water-demo.national_water_model_demo.channel_rt_long_range`\n", | |
"WHERE\n", | |
" feature_id = {nwm_fid} \n", | |
" AND reference_time = '{forecast_date}'\n", | |
"ORDER BY\n", | |
" time\n", | |
"\"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "ddbc7b23-8541-42e9-a7a9-fba9441884bd", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# submit the BQ job and load as a pandas dataframe\n", | |
"job = client.query(query)\n", | |
"df_forecast = job.to_dataframe()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "17b8747f-145c-40e2-814d-a515c1afa93f", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# select the observed data\n", | |
"gauge_obs = gauge.loc[(gauge.index >= forecast_date) & (gauge.index <= str(df_forecast['time'].iloc[-1]))]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "24b23de3-5bde-420b-8884-b1e42722845a", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# plot the forecasts with the observed streamflow\n", | |
"fig = px.line(df_forecast, x='time', y=\"streamflow\", color='ensemble')\n", | |
"fig.add_trace(go.Scatter(x=gauge_obs.index, y=gauge_obs['streamflow'],\n", | |
" name='Observation',line=dict(color='black',dash='dash')))\n", | |
"fig.update_layout(hovermode=\"x\")\n", | |
"fig.update_layout(\n", | |
" title=f\"{forecast_date} forecast and observed streamflow\",\n", | |
")\n", | |
"\n", | |
"fig.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "676a68af-d536-41b0-b8bf-92861f2d4dba", | |
"metadata": {}, | |
"source": [ | |
"### Calculate ensemble statistics\n", | |
"\n", | |
"Next, ensemble statistics will be calculated using Big Query for the same forecast time from earlier. It is common to not use the ensembles individually but all together to get an idea of uncertainty." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "2a82c861-ce09-4b06-b449-76ddaaf86c49", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# define the query to get the forecast ensemble stats\n", | |
"query = f\"\"\"\n", | |
"SELECT\n", | |
" reference_time,\n", | |
" time,\n", | |
" AVG(streamflow) as ensemble_mean,\n", | |
" MIN(streamflow) as ensemble_min,\n", | |
" MAX(streamflow) as ensemble_max\n", | |
"FROM\n", | |
" `ciroh-water-demo.national_water_model_demo.channel_rt_long_range`\n", | |
"WHERE\n", | |
" feature_id = {nwm_fid} \n", | |
" AND reference_time = '{forecast_date}'\n", | |
"GROUP BY\n", | |
" time, reference_time\n", | |
"ORDER BY\n", | |
" time\n", | |
"\"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "a6ebac27-4b9a-42e4-94d9-04d4d447bdc4", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# submit the BQ job and load as a pandas dataframe\n", | |
"job = client.query(query)\n", | |
"df_ensemble_stats = job.to_dataframe()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "20118dbf-6533-4267-8ce5-8e8afa344010", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# plot the forecasted ensembles with the flood levels\n", | |
"colors = ['yellow','orange','red','magenta']\n", | |
"fig = go.Figure()\n", | |
"\n", | |
"fig.add_trace(go.Scatter(x=df_ensemble_stats['time'], y=df_ensemble_stats['ensemble_min'], line_color='#636EFA',name=\"Ensemble min\",showlegend=False))\n", | |
"fig.add_trace(go.Scatter(x=df_ensemble_stats['time'], y=df_ensemble_stats['ensemble_max'], fill='tonexty', line_color='#636EFA',name=\"Ensemble spread\"))\n", | |
"fig.add_trace(go.Scatter(x=df_ensemble_stats['time'], y=df_ensemble_stats['ensemble_mean'], line_color='darkblue',name=\"Ensemble mean\"))\n", | |
"\n", | |
"\n", | |
"fig.add_trace(go.Scatter(x=gauge_obs.index, y=gauge_obs['streamflow'],\n", | |
" name='Observation',line=dict(color='black',dash='dash')))\n", | |
" \n", | |
"fig.update_layout(\n", | |
" title=f\"{forecast_date} forecast ensemble stats\",\n", | |
" yaxis_title = 'streamflow'\n", | |
")\n", | |
"fig.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4715d689-89d4-4e1e-a4b8-c91a68823909", | |
"metadata": {}, | |
"source": [ | |
"You can see from the figure that the model for this case over predicts the streamflow and misses the timing of flood. The NWM data includes multiple forecast initializations, let's compare different initilizations leading up to the flood with the observations to see if it improves over time." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "14d859e7-8c54-4c5e-9ccb-9876c1777980", | |
"metadata": {}, | |
"source": [ | |
"### Multiple forecast initialization\n", | |
"\n", | |
"As seen in the previous example, that specific forecast may be off so we can inspect the different forecasts for a given ensemble leading up to the flood event to see if forecasts improve. Here you will calculate ensemble statistics for every forecast initialization leading up to the flood and visualize." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "eae69ff2-5631-470c-a6c9-04de6759aecc", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# define the query to get the forecast\n", | |
"query = f\"\"\"\n", | |
"SELECT\n", | |
" time,\n", | |
" AVG(streamflow) as ensemble_mean,\n", | |
" MIN(streamflow) as ensemble_min,\n", | |
" MAX(streamflow) as ensemble_max,\n", | |
" CAST(reference_time as STRING) as reference_time\n", | |
"FROM\n", | |
" `ciroh-water-demo.national_water_model_demo.channel_rt_long_range`\n", | |
"WHERE\n", | |
" feature_id = {nwm_fid} \n", | |
" AND reference_time >= '2023-02-20'\n", | |
" AND reference_time < '2023-03-10'\n", | |
"GROUP BY\n", | |
" time, reference_time\n", | |
"ORDER BY\n", | |
" time\n", | |
"\"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "de20c9f7-dc10-4488-a375-32c35ffb3151", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# submit the BQ job and load as a pandas dataframe\n", | |
"job = client.query(query)\n", | |
"df_forecast_time = job.to_dataframe()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "16bd63c8-608f-47f7-8bb3-7f850fa4ab19", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# plot the forecasts with the observed streamflow\n", | |
"\n", | |
"# get a color scale to visualize different initialization times\n", | |
"myscale = px.colors.sample_colorscale(\n", | |
" colorscale=px.colors.sequential.Turbo,\n", | |
" samplepoints=len(df_forecast_time['reference_time'].unique()),\n", | |
" low=0.0,\n", | |
" high=1.0,\n", | |
" colortype=\"rgb\",\n", | |
")\n", | |
"\n", | |
"# plot the data\n", | |
"fig = px.line(df_forecast_time, x='time', y=\"ensemble_mean\", color='reference_time',\n", | |
" color_discrete_sequence=myscale)\n", | |
"fig.add_trace(go.Scatter(x=gauge_obs.index, y=gauge_obs['streamflow'],\n", | |
" name='Observation',line=dict(color='black',dash='dash')))\n", | |
"fig.update_layout(hovermode=\"x\")\n", | |
"fig.update_layout(\n", | |
" title=f\"Forecast initilization ensemble means\",\n", | |
")\n", | |
"\n", | |
"fig.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "8c1c8b62-be49-495b-b574-1d042928a19d", | |
"metadata": {}, | |
"source": [ | |
"Seems like all forecasts leading up to the flood missed the timing by a few days and overpredicted the amount of flooding, this paricular example shows that the `long_range` forecasts could be improved for this case. However, these data provide interesting insights nonetheless as to how forecast accuracy may evolve over time." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "5fac5fe2-554b-4ad3-88f3-cb174e8ebbd2", | |
"metadata": {}, | |
"source": [ | |
"## Conclusion\n", | |
"\n", | |
"In this notebook, we explored accessing and using the National Water Model data that in on Big Query. You submitted multiple queries to access historical data as well as forecast data. You analyzed historical data with estimated a flood levels. Furthermore, you accessed and compared forecast data for a recent flood event using multiple queries to look at ensembles and different forecast initializations." | |
] | |
} | |
], | |
"metadata": { | |
"environment": { | |
"kernel": "python3", | |
"name": "common-cpu.m108", | |
"type": "gcloud", | |
"uri": "gcr.io/deeplearning-platform-release/base-cpu:m108" | |
}, | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.12" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment