Skip to content

Instantly share code, notes, and snippets.

@ougx
Last active April 28, 2017 00:53
Show Gist options
  • Save ougx/53e14953d2572912e00df30a5c61afb8 to your computer and use it in GitHub Desktop.
Save ougx/53e14953d2572912e00df30a5c61afb8 to your computer and use it in GitHub Desktop.
Procedure to convert VIC output from the ascii file format to the netcdf file format

We are going to use xarray instead of ScentificPython to generate netcdf file. The reason of doing so is that ScientificPython is obsolete.

Step 1. Activate Anaconda

Anaconda: https://www.continuum.io/Anaconda-Overview

To load Anaconda on our cluster:

module load anaconda

Step 2. Create a conda environment

You need to have a name for this conda environment. I am using pylab here but you can use whatever you want.

conda create -n pylab python=3.4

After that, enter this conda environment.

source activate pylab

Your shell command should look like this: (pylab) [[email protected] ~]$

We are going to use the newer packages to handle netcdf file.

conda install scipy xarray pandas matplotlib spyder

press y when asked

Step 3. Write your own python script

Load the Python IDE spyder

spyder

A new window should pop up. The left side is the editor and the lower right area is the Python console. The results including plots will be shown immediately in the console. Let's do a spinning test:

  1. Go to https://matplotlib.org/gallery.html
  2. Click one of the plots to see the souce code
  3. Copy the source code to the spyder editor
  4. Execute the code and observe the output in the console

Step 4. Convert ascii output to netcdf

In this step, we will convert the VIC output from the ascii to the netcdf file fotmat. We will use xarray, a relatively newer and advanced package, to handle netCDF files.

import os
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt

#%% input section

# netcdf file name you want to save; it must wnd with the file extension '.nc' or '.nc4'
ncfile = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.nc'

# path of the VIC output directory that contains flux*s
output_dir = '/work/nres879/ougengxin/EXAMPLE/output_txt'


#%% convert function

# columns in the VIC output file
var_names = ['PREC', 'ET', 'RO', 'BF', 'TEMP', 'SM1', 'SM2', 'SM3', 'SWE']

def vic_arcii_nc(flux_dir):
    # obtaining latitudes and longitudes lists 
    lat = []; lon = []
    for f in os.listdir(flux_dir):
        if f.startswith('fluxes_'):
            latlon = f.split('_')
            lat.append(float(latlon[-2]))
            lon.append(float(latlon[-1]))
    
    lats = list(set(lat))
    lons = list(set(lon))
    lons.sort()
    lats.sort()
    
    # obtain time
    df = pd.read_table(os.path.join(flux_dir, f), header=None)
    times = [pd.Timestamp(int(r[0]), int(r[1]), int(r[2])) for i, r in df.iterrows()]
    
    # create the netcdf with dimensions
    nc = xr.Dataset()    
    nc['time'] = times
    nc['lon'] = lons
    nc['lat'] = lats
    dims = ['time', 'lat', 'lon']
    shapes = [nc.dims[k] for k in dims]
    
    # create empty variables      
    for v in var_names:
        nc[v] = dims, np.full(shapes, np.nan)

    # read data
    nvar = len(var_names)
    for f in os.listdir(flux_dir):
        if f.startswith('fluxes_'):
            latlon = f.split('_')
            df = pd.read_table(os.path.join(flux_dir, f), header=None)
            for i in range(nvar):
                nc[var_names[i]].loc[:, float(latlon[-2]), float(latlon[-1])] = \ 
                    df.iloc[:, 3 + i].values   
    
    return nc

#%% save the netcdf file
nc = vic_arcii_nc(output_dir)
nc.to_netcdf(ncfile)

This script may require substantial computational resources. It is recommended to run by the job scheduler.

#!/bin/sh
#SBATCH --time=99:00:00          # Run time in hh:mm:ss
#SBATCH --job-name=txt2nc
 
module load anaconda
source activate pylab
python /work/nres879/yourname/txt2nc.py

Step 5. Make plot

import os
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt

#%% input section
# netcdf file name you saved
ncfile = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.nc'
nc = xr.open_dataset(ncfile)

#%% make plots
plot_var = 'ET'
plot_time = '1950-01-01'
nc[plot_var].sel(time=plot_time).plot()

Step 6. Save plots

#%% save plots
plot_file = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.png'
plot_var = 'ET'
plot_time = '1950-01-01'
fig, ax = plt.subplots()
nc[plot_var].sel(time=plot_time).plot(ax=ax)
fig.savefig(plot_file)

Step 7. Some common task with netcdf

Step 7.1. Maths

Say we want to do aggregation of the first three day data.

date1 = '1950-01-01'
date2 = '1950-01-02'
date3 = '1950-01-03'

nc_3day_sum = nc[plot_var].sel(time=date1) + nc[plot_var].sel(time=date2) + nc[plot_var].sel(time=date3) 
nc_3day_mean = nc_3day_sum / 3.
7.2 Subsetting

Select the temperature in the summer (Jun, Jul, Aug) of 1950

nc_summer = nc['TEMP'].sel(time=slice('1950-06-01','1950-08-31'))
nc_summer

excecise: subset your model area to center the half of the original model area

7.3 Aggregation A

To calculate the mean precip of the whole basin:

nc_basin_mean = nc[plot_var].mean(dim=['lat','lon'])
nc_basin_mean.plot()

FUNCTIONS: all any argmax argmin max mean median min prod sum std var

7.4 Aggregation B

Smart way to do Step 7.1:

nc_et_weekly_sum_all = nc[plot_var].resample(freq='W', dim='time', how='sum')
nc_et_weekly_mean_all = nc[plot_var].resample(freq='W', dim='time', how='mean')

Other frequency you can use:

  • ‘AS’: year start
  • ‘QS-DEC’: quarterly, starting on December 1
  • ‘MS’: month start
  • ‘D’: day
  • ‘H’: hour
  • ‘Min’: minute
  • ‘A’: annual
7.5 Plot the monthly precip of each month

The following script made a plot showing the mean monthly (12 month) plot_var between plot_time1 and plot_time2.

import os
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import calendar
#%% input section
#%% input section
ncfile = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.nc'
plot_var = 'PREC'  # precipitation
plot_time1 = '1950-01-01'
plot_time2 = '1955-12-31'
plot_file = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_12prec.png'

nc = xr.open_dataset(ncfile)
nc_prec = nc[plot_var].sel(time=slice(plot_time1, plot_time2))
nc_prec.time.values = nc_prec['time.month'].values
nc_prec = nc_prec.groupby('time', squeeze=False).mean(dim='time')

f = nc_prec.rename({'time':'month'}).plot(x='lon', y='lat', col='month', col_wrap=4, figsize=(12,8))
f.fig.savefig(plot_file) 
7.6 Plot time series and add trend line
#%% time series and trend line
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
ncfile = '/work/nres879/ougengxin/EXAMPLE/output_nc/vic_flux.nc'
plot_var = 'PREC'  # precipitation
plot_time1 = '1950-01-01'
plot_time2 = '1955-12-31'
plot_file = '/work/nres879/ougengxin/EXAMPLE/output_nc/time_series_trend.png'


nc = xr.open_dataset(ncfile)
nc_prec = nc[plot_var].sel(time=slice(plot_time1, plot_time2)).mean(dim=['lat', 'lon'])
nc_prec = nc_prec.resample(freq='M', dim='time', how='sum')

x = range(0, nc_prec.shape[0])
coeff = np.polyfit(x, nc_prec.values, 1)

y = x * coeff[0] + coeff[1]
fig, ax = plt.subplots(figsize=(8,5))
nc_prec.plot(ax=ax)
ax.plot(nc_prec.time.values, y, label='Trend')
fig.savefig(plot_file) 

Homework:

Undergrad

Plot the time series of monthly precip, ET, runoff and baseflow of the basin, respectively

hint:

  1. calculate the basin mean by nc.mean()
  2. calculate the monthly sum by nc.resample()
Grad

Calculate the annual means of precip, ET, total soil moisture, runoff, baseflow, swe

Calculate error = precip - ET - runoff - baseflow

Plot the annual error

hint:

  1. aggregate to annual (resample with 'A' freq)
  2. calculate the mean like nc.mean(dim='time')
  3. calculate the error of the annual water balance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment