@mdsumner
Last active February 27, 2025 14:09

convert to Zarr and zip

Interested in using GDAL to convert a NetCDF file to zipped Zarr, then open it with xarray or stream it through the warper API.

gdalmdimtranslate  /vsicurl/https://projects.pawsey.org.au/idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc \
   abc.zarr -of ZARR

cd abc.zarr
zip ../abc.zarr.zip . -r
cd ..
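
The zip step above can also be sketched in Python with the standard library. This is a minimal sketch, not part of the original workflow: `zip_zarr_store` is a hypothetical helper name, and it assumes the store is a plain directory such as `abc.zarr`. Like `cd abc.zarr; zip ../abc.zarr.zip . -r`, it writes entries relative to the store root, so the Zarr metadata sits at the top level of the archive.

```python
import zipfile
from pathlib import Path


def zip_zarr_store(store_dir, zip_path):
    """Zip the contents of a Zarr directory with entries at the archive root.

    Hypothetical helper (not from the gist). The zip file should live
    outside store_dir so it is not picked up while walking the directory.
    """
    store = Path(store_dir)
    # ZIP_DEFLATED mirrors the default behaviour of the `zip` command.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(store.rglob("*")):
            if path.is_file():
                # Names are relative to the store root, e.g. "sst/.zarray".
                zf.write(path, path.relative_to(store).as_posix())
    return zip_path


# Usage, assuming abc.zarr exists in the working directory:
# zip_zarr_store("abc.zarr", "abc.zarr.zip")
```

Keeping entry names relative to the store root is the point of the `cd` in the shell version; zipping from the parent directory would nest everything under an `abc.zarr/` prefix.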

## now we need the classic-2D forms of each variable for rasterio
gdalinfo ZARR:/vsizip/abc.zarr.zip
Driver: Zarr/Zarr
Files: none associated
Size is 512, 512
Subdatasets:
  SUBDATASET_1_NAME=ZARR:"/vsizip/abc.zarr.zip":/lat
  SUBDATASET_1_DESC=Array /lat
  SUBDATASET_2_NAME=ZARR:"/vsizip/abc.zarr.zip":/lon
  SUBDATASET_2_DESC=Array /lon
  SUBDATASET_3_NAME=ZARR:"/vsizip/abc.zarr.zip":/time
  SUBDATASET_3_DESC=Array /time
  SUBDATASET_4_NAME=ZARR:"/vsizip/abc.zarr.zip":/zlev
  SUBDATASET_4_DESC=Array /zlev
  SUBDATASET_5_NAME=ZARR:"/vsizip/abc.zarr.zip":/anom
  SUBDATASET_5_DESC=Array /anom
  SUBDATASET_6_NAME=ZARR:"/vsizip/abc.zarr.zip":/err
  SUBDATASET_6_DESC=Array /err
  SUBDATASET_7_NAME=ZARR:"/vsizip/abc.zarr.zip":/ice
  SUBDATASET_7_DESC=Array /ice
  SUBDATASET_8_NAME=ZARR:"/vsizip/abc.zarr.zip":/sst
  SUBDATASET_8_DESC=Array /sst
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)
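
The subdataset names in this listing follow one pattern, so they can be generated rather than copied. A small sketch, where `zarr_subdataset` is a hypothetical helper name and the variable list is simply taken from the output above:

```python
def zarr_subdataset(zip_path, array_name):
    """Build a GDAL Zarr subdataset name for an array inside a zipped store.

    Hypothetical helper: it only reproduces the naming pattern printed by
    gdalinfo above, e.g. ZARR:"/vsizip/abc.zarr.zip":/sst
    """
    return f'ZARR:"/vsizip/{zip_path}":/{array_name.lstrip("/")}'


# The four data variables listed by gdalinfo (coordinate arrays omitted).
data_variables = ["anom", "err", "ice", "sst"]
subdatasets = {name: zarr_subdataset("abc.zarr.zip", name) for name in data_variables}
```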
## we can open these directly in rioxarray/rasterio
import rioxarray
rioxarray.open_rasterio('ZARR:"/vsizip/abc.zarr.zip":/sst')
<xarray.DataArray (band: 1, y: 720, x: 1440)> Size: 2MB
[1036800 values with dtype=int16]
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 12kB 0.125 0.375 0.625 0.875 ... 359.4 359.6 359.9
  * y            (y) float64 6kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
    spatial_ref  int64 8B 0
Attributes: (12/13)
    long_name:       Daily sea surface temperature
    valid_max:       4500
    valid_min:       -300
    DIM_time_INDEX:  0
    DIM_time_VALUE:  1339
    DIM_time_UNIT:   days since 1978-01-01 12:00:00
    ...              ...
    DIM_zlev_VALUE:  0
    DIM_zlev_UNIT:   meters
    _FillValue:      -999
    scale_factor:    0.009999999776482582
    add_offset:      0.0
    units:           Celsius
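
The array comes back as packed int16, and the time step is only recorded in the `DIM_time_*` attributes. The following is a sketch of how those attributes can be interpreted by hand, using the standard library only. Both function names are hypothetical, and it assumes the usual CF convention `value = raw * scale_factor + add_offset` with `_FillValue` marking missing cells. In practice `rioxarray.open_rasterio(..., mask_and_scale=True)` can apply the scaling and masking for you.

```python
from datetime import datetime, timedelta


def decode_days_since(value, unit):
    """Turn a 'days since YYYY-MM-DD HH:MM:SS' offset into a datetime."""
    prefix = "days since "
    if not unit.startswith(prefix):
        raise ValueError(f"unsupported time unit: {unit!r}")
    epoch = datetime.strptime(unit[len(prefix):], "%Y-%m-%d %H:%M:%S")
    return epoch + timedelta(days=value)


def unpack_value(raw, scale_factor, add_offset, fill_value):
    """Apply CF-style unpacking to one raw value; None marks missing data."""
    if raw == fill_value:
        return None
    return raw * scale_factor + add_offset


# Values copied from the attributes shown above.
timestamp = decode_days_since(1339, "days since 1978-01-01 12:00:00")
warmest_valid = unpack_value(4500, 0.009999999776482582, 0.0, -999)
```

With the attributes above, `timestamp` is 1981-09-01 12:00:00, which matches the date in the file name, and `valid_max` of 4500 unpacks to roughly 45 degrees Celsius.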

note about the netcdf

/vsicurl/https://projects.pawsey.org.au/idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc

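The /vsicurl path above points to a hosted copy of the NetCDF file. To read from the original source instead, swap in the original endpoint/bucket and supply your Earthdata credentials through the GDAL configuration options "GDAL_HTTP_HEADERS" or "GDAL_HTTP_HEADER_FILE" (for /vsicurl access), or through whatever Earthdata credential mechanism your other software uses.

i.e. the original source: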

https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc
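
One way to set "GDAL_HTTP_HEADER_FILE" from Python is sketched below. This is an assumption-laden example rather than a tested recipe for this server: `use_header_file` is a hypothetical helper, the bearer-token `Authorization` header is assumed to be what the endpoint expects, and the token value is a placeholder. The header file is plain text with one `Key: value` line per header, and the environment variable has to be set before GDAL (or rasterio) opens the /vsicurl path.

```python
import os
from pathlib import Path


def use_header_file(token, header_path="gdal_headers.txt"):
    """Write an HTTP header file and point GDAL at it via the environment.

    Hypothetical helper. Assumes a bearer token is an accepted credential
    for the endpoint being read.
    """
    path = Path(header_path)
    # One "Key: value" line per header.
    path.write_text(f"Authorization: Bearer {token}\n")
    # GDAL reads configuration options from the environment when unset elsewhere.
    os.environ["GDAL_HTTP_HEADER_FILE"] = str(path.resolve())
    return path


# Usage with a placeholder token (replace with your own):
# use_header_file("<your-earthdata-token>")
```

A header file keeps the token out of shell history and command lines, which is the main reason to prefer it over passing "GDAL_HTTP_HEADERS" inline.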