Skip to content

Instantly share code, notes, and snippets.

View mdsumner's full-sized avatar

Michael Sumner mdsumner

  • Integrated Digital East Antarctica, Australian Antarctic Division
  • Hobart, Australia
View GitHub Profile

Create a 2x resolution GEBCO Zarr, I don't think this is well parallelized maybe because rioxarray doesn't distribute the reads amongst multiple open datasets?

import xarray
dsn = "/vsicurl/https://projects.pawsey.org.au/idea-gebco-tif/GEBCO_2024.tif"
ds = xarray.open_dataset(f'vrt://{dsn}?outsize=200%,200%', engine  = "rasterio", chunks = {"y":1024, "x": 1024})
ds.to_zarr("gebco2x.zarr", zarr_format = 3)
https://eopf-public.s3.sbg.perf.cloud.ovh.net/eoproducts/S02MSIL1C_20230629T063559_0000_A064_T3A5.zarr.zip
library(reticulate)
py_require("gdal")
gdal <- import("osgeo.gdal")
gdal$UseExceptions()

f <- "/vsigs/gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3"
Sys.setenv("GS_NO_SIGN_REQUEST" = "YES")
AWS_REGION='region-is-not-set' al repo create AustralianAntarcticDivision/myrepo
⠴ Creating repo AustralianAntarcticDivision/myrepo...
✓ Creating repo AustralianAntarcticDivision/myrepo...failed
  [31m×[0m error listing objects in object store dispatch failure
  [31m│[0m
  [31m│[0m context:
  [31m│[0m    0: icechunk::repository::create

https://rstats.me/@mdsumner/114103904777240036

I've used HDF5 here because NETCDF can't do this on windows. On linux probably best to use "NETCDF:{dsn}:time".

(r <- terra::rast("HDF5:\"/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc\"://time"))
class       : SpatRaster 
dimensions  : 1, 1, 1  (nrow, ncol, nlyr)
resolution : 1, 1 (x, y)

convert to Zarr and zip

Interested in using GDAL to convert to zipped Zarr and then open as xarray, or stream through the warper api.

gdalmdimtranslate  /vsicurl/https://projects.pawsey.org.au/idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc \
   abc.zarr -of ZARR

cd abc.zarr
cellcentre.offset <- function(x) {
  ex <- terra::ext(x)
  res <- terra::res(x)
  halfcell <- res/2
  cco <- c(ex[1], ex[4]) + halfcell * c(1, -1)
  cco
}
service <- file.path(
  "https://s3-us-west-2.amazonaws.com", 
  "mrlc"
)
layers <- paste0(
  service, 
  "/Annual_NLCD_LndCov_", 
  years, 
  "_CU_C1V0.tif"
dsn <- "/vsicurl/https://projects.pawsey.org.au/idea-gebco-tif/GEBCO_2024.tif"

library(terra)
r <- rast(ext(-1, 1, -1, 1) * 6378137 * 1.9, res = 25000, crs = "+proj=spilhaus")
geb <- project(rast(dsn), r, by_util = TRUE)
@mdsumner
mdsumner / zarr.md
Last active February 17, 2025 23:55

with stars (uses GDAL MULTIDIM)

## note we have to inner-quote the string (I think ZARR driver needs some attention here)
 dsn <- "ZARR:\"/vsicurl/https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1\""
library(stars)
read_mdim(dsn, proxy = TRUE, bounds = FALSE, variable = "analysed_sst")
stars_proxy object with 1 attribute in 1 file(s):
$analysed_sst