On this site is a lot of ZARR datasets, ARCO (analysis ready cloud optimized).
https://catalog.leap.columbia.edu/
There are example codes that show how to open in xarray. We can do this is in R by loading up the GDAL api via the Python osgeo.gdal package.
The new version of reticulate makes this easy to do.
reticulate::py_require("gdal")
gdal <- reticulate::import("osgeo.gdal")
Now this ZARR store is just a URL but we need some special syntax to use it in GDAL, essentially declare as ZARR, and carefully quote around the url with "/vsicurl/".
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/AGDC-feedstock/AGCD.zarr
We use gdal MULTIDIM mode to open it as a group of n-dimensional arrays.
dsn <- "ZARR:\"/vsicurl/https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/AGDC-feedstock/AGCD.zarr\""
ds <- gdal$OpenEx(dsn, gdal$OF_MULTIDIM_RASTER)
We need the root group and then can ask what's in there.
rg$GetMDArrayNames()
##[1] "lat" "lon" "time" "precip" "tmax" "tmin" "vapourpres"
Let's go for tmax, and find out what it is.
tmax <- rg$OpenMDArrayFromFullname("/tmax")
ivar <- lapply(tmax$GetDimensions(), \(.x) .x$GetIndexingVariable()$ReadAsArray())
lapply(ivar, range)
[[1]]
[1] 44194.38 62090.38
[[2]]
[1] -44.5 -10.0
[[3]]
[1] 112.00 156.25
It's (probably) time, lat, lon.
(the metadata is there but we save that for another time)
Get one slice in time.
t1 <- tmax$GetView("[100,:,:]")$ReadAsArray() ## time step 101 (all lats, all lons)
image(ivar[[3]], ivar[[2]], t(t1), asp = 1/cos(mean(ivar[[2]]) * pi/180), col = hcl.colors(56))
The values look ok, no offset scale to do.
range(t1)
[1] 8.865234 39.677734
tmax$GetNoDataValue()
[1] NaN
tmax$GetOffset() ## None
tmax$GetScale() ## None