Cosmopy: clip and regrid COSMO_REA6 wind data to python numpy

Keywords: Cosmo_rea6 DWD wind-data python numpy

Purpose: DWD, the German Meteorological Service, has wind data for Europe on a 6 x 6 km grid, hourly from 1995 to August 2019, at heights 10, 100 .. 200 m. These come in ~2 GB monthly files like
https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/converted/hourly/2D/WS_100m.2D.201908.nc4
-- windspeed data at 100 m for August 2019.

Cosmopy is a bare-bones package to

  1. clip out small areas of these files, e.g. Bayern, to numpy .npz files
  2. regrid the staggered Cosmo grid to a WGS84 uniform lat-lon grid, e.g. 0.05° x 0.05°.

Its design point: small programs with lots of print statements and a verbose= option. Of course, if you're happy with a particular package, be happy. But if you value clarity above raw speed, and like small solid packages and numpy, read on.

Gallery

(image: 22apr-cap-Obb-grid05-150m-2014-E-141-4200)

(image: 17oct-avwind-BY-WS_160m 2D 2014)

(image: 25oct-wind-cap-BY-3site-150m-2014-E-141-4200)

The flow: download, clip, regrid

Files are generated by 3 shell scripts, wget-cosmo, clip-cosmo and regrid-cosmo, which call python; see their help texts, clip-cosmo -h etc. After downloading with wget-cosmo, clip the gigabyte files to the few percent you want to look at; then, if need be, regrid to a uniform WGS84 grid. Example files in this flow:

wget-cosmo:   $cosmodata/raw/WS_150m.2D.201401.nc4    -- monthly 01 .. 12, 1.5 GB each
clip-cosmo:   $cosmodata/BY-clip/WS_150m.2D.2014.npz  -- numpy savez: flexible, can memmap
regrid-cosmo: $cosmodata/BY-grid/WS_150m.2D.2014.npz  -- 0.05° x 0.05° grid to nearest cosmo

Here BY is an area name, see "Parameters" below.
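
If you just want to peek at a clipped file, plain numpy is enough. A minimal sketch -- the key names inside the .npz are my assumption, check npz.files for the real ones:

    import numpy as np

    npz = np.load( "BY-clip/WS_150m.2D.2014.npz" )   # a file from the flow above
    print( "arrays in the npz:", npz.files )         # the real key names
    WS = npz["WS"]       # assumed key: a (time, lat, lon) windspeed cube
    print( WS.shape, WS.dtype )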

The COSMO_REA6 grid

If I understand correctly, the data in opendata.dwd.de/climate_environment/REA/COSMO_REA6/.../*.nc4 are on a staggered grid, about 6 x 6 km. The 2d RLAT RLON in those files are in an odd rotated coordinate system which, as far as I know, has no CRS / proj string. Ignore them, and use only the RLAT RLON in .../constant/COSMO_REA6_CONST_withOUTsponge.nc .

To clip a particular N S W E box out of the gigabyte COSMO_REA6 files, use straight numpy, see clip2d.py and the sketch after the mask below. For example, 48.5° .. 48° N x 11.25° .. 12° E:

mask: 83 points in (10, 10)
[[0 0 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 1]
 [0 1 1 1 1 1 1 1 1 1]
 [0 1 1 1 0 0 0 0 0 0]]
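
A minimal sketch of such a box clip in straight numpy, assuming RLAT RLON are the 2d grids from the constants file and WS a (time, lat, lon) cube; clip2d.py is the real thing:

    import numpy as np

    def clipbox( RLAT, RLON, north, south, west, east ):
        """ -> jslice, kslice: the smallest row / column ranges
            covering all gridpoints inside the N S W E box """
        inbox = ((south <= RLAT) & (RLAT <= north)
               & (west <= RLON) & (RLON <= east))   # 2d 0/1 mask, like the printout above
        jj, kk = np.where( inbox )
        return slice( jj.min(), jj.max() + 1 ), slice( kk.min(), kk.max() + 1 )

    # jslice, kslice = clipbox( RLAT, RLON, 48.5, 48, 11.25, 12 )
    # WSclip = WS[:, jslice, kslice]   # clips the whole timeseries cube at once

The 0s in the corners of the mask are, presumably, the rotated staggered grid not lining up with the lat-lon box; the slice takes the bounding rectangle.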

To find the j,k with RLAT[j,k], RLON[j,k] nearest, say, 48.137 N, 11.575 E, use a KDTree; see kdtree.py and near_latlon.py, sketched below. "Nearest" means nearest in the Cosmo staggered grid, in the Euclidean metric (cos-shrink of longitude ?). For example,

wgslat wgslon cosmoj cosmok cosmolat cosmolon
    48     12    378    443   47.973   11.969

Then wind_speed[:,378,443] in a (720, 824, 848) (time, lat, lon) array is the timeseries at the Cosmo gridpoint nearest 48° N, 12° E.
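
A sketch of that lookup with scipy's cKDTree, assuming plain Euclidean distance in (lat, lon) degrees, no cos-shrink; kdtree.py and near_latlon.py are the real thing:

    import numpy as np
    from scipy.spatial import cKDTree

    def near_latlon( RLAT, RLON, lat, lon ):
        """ -> j, k of the staggered gridpoint nearest lat, lon
            (build the tree once and keep it, if querying many points) """
        tree = cKDTree( np.column_stack([ RLAT.ravel(), RLON.ravel() ]))
        dist, flat = tree.query( [lat, lon] )
        return np.unravel_index( flat, RLAT.shape )

    # j, k = near_latlon( RLAT, RLON, 48, 12 )   # cf. 378, 443 in the table above
    # timeseries = wind_speed[:, j, k]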

Interpolation

Power from windmills is highly sensitive to peak windspeeds. Interpolating timeseries usually smooths them and lowers peaks, which would bias windpower estimates low -- we don't want that. For example, consider two hourly timeseries at points 6 km apart, with a 6 km/h wind:

6  0  6  0 ... m/s: some windpower
0  6  0  6 ...
3  3  3  3 ... interpolate, average: 0 windpower

(Wind turbines generate no electricity below about 3 m/s.)
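
A toy numpy check of this example, assuming a 3 m/s cut-in and a crude v³ power curve above it:

    import numpy as np

    a   = np.array([ 6., 0, 6, 0 ])    # m/s at one gridpoint
    b   = np.array([ 0., 6, 0, 6 ])    # 6 km downwind, one hour later
    avg = (a + b) / 2                  # the interpolated midpoint: 3 3 3 3
    power = lambda v: np.where( v > 3, v**3, 0 )   # toy curve: 0 below cut-in
    print( power(a).sum(), power(b).sum(), power(avg).sum() )   # 432 432 0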

I regrid the Cosmo staggered grid to WGS84 with a KDTree, keeping each timeseries unchanged:
wgs84 gridpoint ⟶ the nearest staggered gridpoint ⟶ the windspeed timeseries there.
Yes, nearest-neighbor is blocky, and several points on the uniform WGS84 grid will get the same Cosmo timeseries; a sketch follows. Comments welcome.
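
A minimal sketch of that regrid, same assumed names as above; regrid-cosmo is the real thing:

    import numpy as np
    from scipy.spatial import cKDTree

    def regrid_nearest( WS, RLAT, RLON, lats, lons ):
        """ WS: (time, lat, lon) on the staggered grid
            lats, lons: 1d, e.g. np.arange( 47, 49, .05 )
            -> (time, len(lats), len(lons)) on the uniform WGS84 grid """
        tree = cKDTree( np.column_stack([ RLAT.ravel(), RLON.ravel() ]))
        glat, glon = np.meshgrid( lats, lons, indexing="ij" )
        _, flat = tree.query( np.column_stack([ glat.ravel(), glon.ravel() ]))
        j, k = np.unravel_index( flat, RLAT.shape )
        return WS[:, j, k].reshape( len(WS), len(lats), len(lons) )

Each uniform gridpoint just copies the timeseries of its nearest staggered neighbor, so no peak is smoothed away.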

Files: xarray ⟶ Netcdf4 ⟶ hdf5 vs. numpy .npz

Netcdf4 files are, afaik, hdf5 files, which can be really messy: h5stat .../cosmo/...nc4 (or a walk with h5py) shows a multilevel tree with compressed blocks -- complicated and slow. Xarray, dask, zarr etc. can't even see that complicated hdf5 way down below, so they cannot possibly speed it up.

What I really want is a clean path to numpy -- solid, with a well-defined algebra of primitives (slicing, views, memmap). A file tree of .npz files, each < 1 GB, is for me simpler and faster than messing with xarray etc.; ymmv. See
"numpy indexes small amounts of data 1000x faster than xarray" (2019)
Cyrille Rossant, "Moving away from HDF5" (2016)
my little testbench to time various paths numpy <--> xarray <--> read/write
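
One concrete piece of that numpy algebra: bare .npy files can be memmapped, so a slice reads only the bytes it touches. (Memmapping arrays inside an uncompressed .npz takes more work, zip member offsets; this sketch shows the bare-.npy case.)

    import numpy as np

    WS = np.zeros( (720, 100, 100), dtype=np.float32 )   # stand-in cube
    np.save( "WS_150m.npy", WS )

    mm = np.load( "WS_150m.npy", mmap_mode="r" )   # reads only the header
    print( mm[:, 50, 50].mean() )                  # touches one timeseries only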

Notes

Choice of software depends on what you really want to do, what you know / who you know, and learning time. Last but not least, it's a matter of taste:

Less is more
More is more

Comments welcome

cheers
— denis-bz-py t-online.de updated 2024-04-23
