@mdsumner
Last active March 5, 2025 14:46

Zarr

zarrs

A real Rust library for Zarr, including virtualization support, Icechunk integration and v3.

GDAL

GDAL has its own internal Zarr v2 and v3 implementation. It works, but:

  • we also have the flip between GDAL classic and multidim mode
  • classic mode is 2D with bands (unrolled from higher dimensions when present, and no 1D vars)
  • multidim is truly n-dimensional, but doesn't have the same reprojection power as 2D mode
  • not exercised much; I personally found some pretty fundamental bugs (fixed immediately, but it could do with more eyes on it)
  • doesn't know about virtualization, but see https://lists.osgeo.org/pipermail/gdal-dev/2024-July/059256.html
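
To make the classic-mode "unrolling" concrete, here is a minimal sketch in plain Python (not GDAL code). It assumes the last two dimensions are the raster y/x axes and that the leading dimensions are flattened into 1-based band numbers with the last leading dimension varying fastest; that ordering, and the function name `unroll_to_bands`, are assumptions for illustration only.

```python
from itertools import product


def unroll_to_bands(shape):
    """Map 1-based band numbers to index tuples over the leading dimensions.

    The last two entries of `shape` are treated as the 2D raster (y, x).
    A 1D shape has no raster plane and is rejected, which mirrors the
    "no 1D vars" limitation of classic mode.
    """
    if len(shape) < 2:
        raise ValueError("classic mode needs at least two dimensions")
    *extra, ny, nx = shape
    bands = {}
    # product() varies the last leading dimension fastest (an assumption here)
    for band, idx in enumerate(product(*(range(n) for n in extra)), start=1):
        bands[band] = idx
    return bands, (ny, nx)


# A hypothetical (time=3, level=2, y=180, x=360) array becomes 6 bands of 180 x 360
bands, raster_shape = unroll_to_bands((3, 2, 180, 360))
print(len(bands), raster_shape)  # 6 (180, 360)
print(bands[1], bands[6])        # (0, 0) (2, 1)
```

The band number alone no longer says which time step or level a slice came from, which is the information multidim mode keeps.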

R

There are several existing partial pathways for Zarr in R:

  • {sf} (for {stars}) has GDAL bindings and some support for multidim mode, but it gets not much attention (Carl Boettiger and me)
  • {gdalraster} is a real API for GDAL, but only for classic mode for now (I have started on multidim support for it)
  • {pizzarr} is an R-only package (no C); David Blodgett is a big supporter and has written netcdf-like wrappers
  • {Rarr} is on Bioconductor (which is a CRAN sibling)
  • nczarr: we can use {RNetCDF} or {ncdf4}, but cross-platform support is patchy and unstable; {tidync} and {stars} (read_ncdf) will already work this way without any changes (WIP, let's explore, but see https://gist.github.com/mdsumner/492d2a98bffc6de5974a96f50a0b75f2)

{pizzarr} and {Rarr} have these compression tools (with some limitations on settings):

  • zlib/gzip
  • bzip2
  • blosc
  • LZMA
  • LZ4
  • Zstd
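
As a rough illustration of what a codec does to a chunk, the sketch below round-trips one chunk of bytes through the three codecs in the list that also ship with the Python standard library (zlib, bzip2, LZMA). It is only an analogy in Python, not how {pizzarr} or {Rarr} do it; blosc, LZ4 and Zstd are left out because they typically need third-party packages. The chunk contents are made up.

```python
import bz2
import lzma
import struct
import zlib

# A made-up chunk: 1000 little-endian int32 values (4000 bytes uncompressed)
chunk = struct.pack("<1000i", *range(1000))

# Codec name -> (compress, decompress), standard-library functions only
codecs = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

sizes = {}
for name, (encode, decode) in codecs.items():
    packed = encode(chunk)
    # Every codec must give back exactly the bytes that went in
    if decode(packed) != chunk:
        raise RuntimeError(f"{name} round trip failed")
    sizes[name] = len(packed)

print(len(chunk), sizes)
```

A reader has to pick the decoder named in the array metadata, so a package can only open the arrays whose codec it can actually find, whether native or from an extension.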

A fundamental issue in R is its narrow type support: Byte (raw), Int32 (integer and logical) and Float64 (numeric). The external package {bit64} provides Int64.
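
The practical consequence shows up when a stored type has no exact home in that set. The sketch below uses Python integers and floats (Python's float is an IEEE 754 double, the same representation as R's numeric) to show two cases: an unsigned 32-bit value that overflows Int32 but survives in Float64, and a 64-bit integer that does not survive in Float64. The variable names are just for illustration.

```python
# Case 1: the largest uint32 does not fit in Int32, but Float64 holds it exactly
uint32_max = 2**32 - 1
fits_int32 = -(2**31) <= uint32_max <= 2**31 - 1
exact_as_double = int(float(uint32_max)) == uint32_max
print(fits_int32, exact_as_double)  # False True

# Case 2: Float64 has a 53-bit significand, so not every int64 is representable
big = 2**53 + 1
roundtrip = int(float(big))
print(big, roundtrip)  # 9007199254740993 9007199254740992
```

So widening to Float64 is a safe fallback for the smaller integer types, but 64-bit integers need a dedicated type such as the one {bit64} provides.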

None of the R packages support virtualization (kerchunk, VirtualiZarr), but the nczarr approach must support some of it; technically OPeNDAP is dmr++, so it can't be too far off. The biggest gap in netcdf is being able to read remote stores, and having it built to support that.
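
For readers new to the idea, virtualization here means describing chunks as byte ranges inside existing files instead of copying them into a new store. The sketch below assumes the kerchunk-style convention of mapping a chunk key to `[path, offset, length]`; the file contents, the keys and the helper `read_ref` are made up for illustration, and a local temporary file stands in for a remote object.

```python
import os
import tempfile


def read_ref(refs, key):
    """Resolve one reference: open the target file and read the byte range."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A stand-in for an existing data file: a 6-byte header, then two 7-byte chunks
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"HEADERchunk-0chunk-1")
    path = f.name

# Hypothetical reference table: chunk key -> [path, offset, length]
refs = {
    "var/0": [path, 6, 7],
    "var/1": [path, 13, 7],
}

first = read_ref(refs, "var/0")
second = read_ref(refs, "var/1")
os.remove(path)

print(first, second)  # b'chunk-0' b'chunk-1'
```

Against a remote store the seek-and-read becomes a ranged request, which is why remote read support in the underlying library matters so much.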

R itself just got native Zstd compression: https://github.com/wch/r-source/commit/7e16093f2c107d4965e0ebfaeea50865062df54d

I need to look at how pizzarr and Rarr do it, but generally the compression tools are scattered: some native, some in extension packages.

We could go a long way with Zarr in R itself, but it would be like the Python landscape, and it's probably time to push behind something fundamental like Rust-zarrs.
