Michael Sumner (mdsumner)

  • Integrated Digital East Antarctica, Australian Antarctic Division
  • Hobart, Australia

Warning: this requires a very recent GDAL, built from source on 2025-03-21.

gdal raster pipeline ! \
    read /vsicurl/https://projects.pawsey.org.au/idea-gebco-tif/GEBCO_2024.tif ! \
    reproject --resolution 1000,1000 --bbox 140,-45,150,-33 --bbox-crs "EPSG:4326" --dst-crs "https://spatialreference.org/ref/epsg/3111" ! \
    write gebco_tas_vic.gdalg.json
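The .gdalg.json output is a serialized pipeline rather than pixels; assuming the GDALG driver is available in the same recent build, GDAL evaluates the steps when the file is opened, so it behaves like any other dataset. A sketch from R with terra:

library(terra)
## opening the .gdalg.json evaluates the pipeline steps lazily
r <- rast("gebco_tas_vic.gdalg.json")
## materialize to an actual file if needed
writeRaster(r, "gebco_tas_vic.tif")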
library(terra)
## toy raster with a handful of integer classes
r <- rast(matrix(sample(1:5, 60, replace = TRUE), 5))

## build one logical (0/1) layer per unique value
uv <- unique(r)[, 1L, drop = TRUE]
l <- setNames(vector("list", length(uv)), sprintf("l=%i", uv))
for (i in seq_along(uv)) {
  l[[i]] <- r == uv[i]
}
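terra also has a one-liner that does essentially this, if I'm reading segregate() right:

## multi-layer 0/1 raster, one layer per unique value
s <- segregate(r)
names(s)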

With R, this is just proving to myself that I can write all the local-path references.

Run on 128 cores on HPC, it has done about 3200 of these files so far (it's still going; there are 8000+ days since 2002-07-01). Virtualizing netcdf is insanely efficient, but the details of how to auth those remote links are still beyond me (which is what I'm working towards understanding). The scratch path below on my lustre storage would be replaced by the base url and its child date structure:

https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1

They look like this here without the shard structure.
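The substitution itself is just a prefix swap; a sketch, where the scratch prefix and the filename are hypothetical stand-ins for the real layout:

base    <- "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1"
scratch <- "/scratch/project/mur"   ## hypothetical lustre path
f <- file.path(scratch, "20020701090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc")
## replace the local prefix with the remote base url
sub(scratch, base, f, fixed = TRUE)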

library(terra)
## a simple line geometry (three vertices)
a <- vect(cbind(c(0, 1, 2), c(2, 3, 4)), type = "lines")

## imagine we have an independently specified raster, and this line falls in it somewhere
r <- extend(rast(a, res = 0.05), 1)

## cell numbers touched by the line
(cell <- cells(r, a))

## so we can do things like look up the values or coordinates of those cells
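For example (assuming cells() returns a matrix with a "cell" column for line geometries):

## raster values under the line
vals <- r[cell[, "cell"]]
## and the centre coordinates of those cells
xy <- xyFromCell(r, cell[, "cell"])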

On this site there are a lot of Zarr datasets, ARCO (analysis-ready, cloud-optimized).

https://catalog.leap.columbia.edu/

There is example code that shows how to open them in xarray. We can do this in R by loading up the GDAL API via the Python osgeo.gdal package.

The new version of reticulate makes this easy to do.
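A minimal sketch of that route; the store URL is a placeholder and the ZARR: connection string assumes GDAL's Zarr driver:

library(reticulate)
gdal <- import("osgeo.gdal")
gdal$UseExceptions()

## hypothetical ARCO store from the catalog
dsn <- 'ZARR:"/vsicurl/https://example.org/some-arco-store.zarr"'
ds <- gdal$Open(dsn)
## list the arrays exposed as subdatasets
ds$GetMetadata("SUBDATASETS")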

Create a 2x-resolution GEBCO Zarr. I don't think this is well parallelized, maybe because rioxarray doesn't distribute the reads amongst multiple open datasets?

import xarray

dsn = "/vsicurl/https://projects.pawsey.org.au/idea-gebco-tif/GEBCO_2024.tif"
## GDAL's vrt:// protocol upsamples to 200% in each dimension on read
ds = xarray.open_dataset(f'vrt://{dsn}?outsize=200%,200%', engine="rasterio", chunks={"y": 1024, "x": 1024})
ds.to_zarr("gebco2x.zarr", zarr_format=3)
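As a sanity check on the output, the store should open through GDAL's Zarr driver (Zarr v3 support assumes a recent enough GDAL build):

library(terra)
## inspect the store; a specific array can be addressed as ZARR:"gebco2x.zarr":/<array>
describe('ZARR:"gebco2x.zarr"')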
https://eopf-public.s3.sbg.perf.cloud.ovh.net/eoproducts/S02MSIL1C_20230629T063559_0000_A064_T3A5.zarr.zip