We are going to use xarray instead of ScentificPython to generate netcdf file. The reason of doing so is that ScientificPython is obsolete.
Anaconda: https://www.continuum.io/Anaconda-Overview
To load Anaconda on our cluster:
module load anaconda
You need to have a name for this conda environment. I am using pylab here but you can use whatever you want.
conda create -n pylab python=3.4
After that, enter this conda environment.
source activate pylab
Your shell command should look like this: (pylab) [[email protected] ~]$
We are going to use the newer packages to handle netcdf
file.
conda install scipy xarray pandas matplotlib spyder
press y
when asked
Load the Python IDE spyder
spyder
A new window should pop up. The left side is the editor and the lower right area is the Python
console. The results including plots will be shown immediately in the console. Let's do a spinning test:
- Go to https://matplotlib.org/gallery.html
- Click one of the plots to see the souce code
- Copy the source code to the spyder editor
- Execute the code and observe the output in the console
In this step, we will convert the VIC output from the ascii to the netcdf file fotmat. We will use xarray, a relatively newer and advanced package, to handle netCDF files.
import os
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
#%% input section
# netcdf file name you want to save; it must wnd with the file extension '.nc' or '.nc4'
ncfile = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.nc'
# path of the VIC output directory that contains flux*s
output_dir = '/work/nres879/ougengxin/EXAMPLE/output_txt'
#%% convert function
# columns in the VIC output file
var_names = ['PREC', 'ET', 'RO', 'BF', 'TEMP', 'SM1', 'SM2', 'SM3', 'SWE']
def vic_arcii_nc(flux_dir):
# obtaining latitudes and longitudes lists
lat = []; lon = []
for f in os.listdir(flux_dir):
if f.startswith('fluxes_'):
latlon = f.split('_')
lat.append(float(latlon[-2]))
lon.append(float(latlon[-1]))
lats = list(set(lat))
lons = list(set(lon))
lons.sort()
lats.sort()
# obtain time
df = pd.read_table(os.path.join(flux_dir, f), header=None)
times = [pd.Timestamp(int(r[0]), int(r[1]), int(r[2])) for i, r in df.iterrows()]
# create the netcdf with dimensions
nc = xr.Dataset()
nc['time'] = times
nc['lon'] = lons
nc['lat'] = lats
dims = ['time', 'lat', 'lon']
shapes = [nc.dims[k] for k in dims]
# create empty variables
for v in var_names:
nc[v] = dims, np.full(shapes, np.nan)
# read data
nvar = len(var_names)
for f in os.listdir(flux_dir):
if f.startswith('fluxes_'):
latlon = f.split('_')
df = pd.read_table(os.path.join(flux_dir, f), header=None)
for i in range(nvar):
nc[var_names[i]].loc[:, float(latlon[-2]), float(latlon[-1])] = \
df.iloc[:, 3 + i].values
return nc
#%% save the netcdf file
nc = vic_arcii_nc(output_dir)
nc.to_netcdf(ncfile)
This script may require substantial computational resources. It is recommended to run by the job scheduler.
#!/bin/sh
#SBATCH --time=99:00:00 # Run time in hh:mm:ss
#SBATCH --job-name=txt2nc
module load anaconda
source activate pylab
python /work/nres879/yourname/txt2nc.py
import os
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
#%% input section
# netcdf file name you saved
ncfile = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.nc'
nc = xr.open_dataset(ncfile)
#%% make plots
plot_var = 'ET'
plot_time = '1950-01-01'
nc[plot_var].sel(time=plot_time).plot()
#%% save plots
plot_file = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.png'
plot_var = 'ET'
plot_time = '1950-01-01'
fig, ax = plt.subplots()
nc[plot_var].sel(time=plot_time).plot(ax=ax)
fig.savefig(plot_file)
Say we want to do aggregation of the first three day data.
date1 = '1950-01-01'
date2 = '1950-01-02'
date3 = '1950-01-03'
nc_3day_sum = nc[plot_var].sel(time=date1) + nc[plot_var].sel(time=date2) + nc[plot_var].sel(time=date3)
nc_3day_mean = nc_3day_sum / 3.
Select the temperature in the summer (Jun, Jul, Aug) of 1950
nc_summer = nc['TEMP'].sel(time=slice('1950-06-01','1950-08-31'))
nc_summer
excecise: subset your model area to center the half of the original model area
To calculate the mean precip of the whole basin:
nc_basin_mean = nc[plot_var].mean(dim=['lat','lon'])
nc_basin_mean.plot()
FUNCTIONS: all any argmax argmin max mean median min prod sum std var
Smart way to do Step 7.1:
nc_et_weekly_sum_all = nc[plot_var].resample(freq='W', dim='time', how='sum')
nc_et_weekly_mean_all = nc[plot_var].resample(freq='W', dim='time', how='mean')
Other frequency you can use:
- ‘AS’: year start
- ‘QS-DEC’: quarterly, starting on December 1
- ‘MS’: month start
- ‘D’: day
- ‘H’: hour
- ‘Min’: minute
- ‘A’: annual
The following script made a plot showing the mean monthly (12 month) plot_var
between plot_time1
and plot_time2
.
import os
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import calendar
#%% input section
#%% input section
ncfile = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.nc'
plot_var = 'PREC' # precipitation
plot_time1 = '1950-01-01'
plot_time2 = '1955-12-31'
plot_file = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_12prec.png'
nc = xr.open_dataset(ncfile)
nc_prec = nc[plot_var].sel(time=slice(plot_time1, plot_time2))
nc_prec.time.values = nc_prec['time.month'].values
nc_prec = nc_prec.groupby('time', squeeze=False).mean(dim='time')
f = nc_prec.rename({'time':'month'}).plot(x='lon', y='lat', col='month', col_wrap=4, figsize=(12,8))
f.fig.savefig(plot_file)
#%% time series and trend line
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
ncfile = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.nc'
plot_var = 'PREC' # precipitation
plot_time1 = '1950-01-01'
plot_time2 = '1955-12-31'
plot_file = '/work/nres879/ougengxin/EXAMPLE/output_nc/time_series_trend.png'
nc = xr.open_dataset(ncfile)
nc_prec = nc[plot_var].sel(time=slice(plot_time1, plot_time2)).mean(dim=['lat', 'lon'])
nc_prec = nc_prec.resample(freq='M', dim='time', how='sum')
x = range(0, nc_prec.shape[0])
coeff = np.polyfit(x, nc_prec.values, 1)
y = x * coeff[0] + coeff[1]
fig, ax = plt.subplots(figsize=(8,5))
nc_prec.plot(ax=ax)
ax.plot(nc_prec.time.values, y, label='Trend')
fig.savefig(plot_file)
Plot the time series of monthly precip, ET, runoff and baseflow of the basin, respectively
hint:
- calculate the basin mean by
nc.mean()
- calculate the monthly sum by
nc.resample()
Calculate the annual means of precip, ET, total soil moisture, runoff, baseflow, swe
Calculate error = precip - ET - runoff - baseflow
Plot the annual error
hint:
- aggregate to annual (resample with 'A' freq)
- calculate the mean like
nc.mean(dim='time')
- calculate the error of the annual water balance