
@mdsumner
Last active March 13, 2025 02:04

Create a 2x resolution GEBCO Zarr. I don't think this step is well parallelized, possibly because rioxarray doesn't distribute the reads across multiple open datasets.

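import xarray  # engine="rasterio" is provided by the rioxarray package

dsn = "/vsicurl/https://projects.pawsey.org.au/idea-gebco-tif/GEBCO_2024.tif"
# the vrt:// connection with outsize=200%,200% doubles the grid in both dimensions
ds = xarray.open_dataset(f"vrt://{dsn}?outsize=200%,200%", engine="rasterio", chunks={"y": 1024, "x": 1024})
ds.to_zarr("gebco2x.zarr", zarr_format=3)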
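
As a rough sizing aid for this write, the sketch below counts the chunks it produces. It assumes the 172800 x 86400 size and Int16 type that gdalinfo reports for the result further down; the function name is only illustrative.

```python
import math

def chunk_grid(width, height, chunk=1024, bytes_per_value=2):
    """Return (chunks_x, chunks_y, total_chunks, bytes_per_full_chunk)."""
    nx = math.ceil(width / chunk)   # partial chunks at the edge still count
    ny = math.ceil(height / chunk)
    return nx, ny, nx * ny, chunk * chunk * bytes_per_value

nx, ny, total, chunk_bytes = chunk_grid(172800, 86400)
print(nx, ny, total, chunk_bytes)
```

That is a little over fourteen thousand chunks of 2 MiB each before compression, so there is plenty of independent work available if the reads can be spread out.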

Investigate

gdalinfo ZARR:"gebco2x.zarr/":/elevation -nomd
Driver: Zarr/Zarr
Files: gebco2x.zarr/elevation/zarr.json
Size is 172800, 86400
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.002083333333333,-0.002083333333333)
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
Band 1 Block=1024x1024 Type=Int16, ColorInterp=Undefined
  NoData Value=0
  Unit Type: m
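
A quick cross-check of those numbers, as a sketch: it assumes a global 360 x 180 degree extent (as in the corner coordinates above) and the 200% scaling from the open call. The helper name is hypothetical.

```python
def doubled_grid(out_width, out_height, extent_x=360.0, extent_y=180.0, scale=2):
    """Return the implied source grid size and the output pixel size in degrees."""
    src = (out_width // scale, out_height // scale)
    pixel = (extent_x / out_width, extent_y / out_height)
    return src, pixel

src, pixel = doubled_grid(172800, 86400)
print(src)                # implied size of the source grid
print(pixel)              # pixel size in degrees
print(pixel[0] * 3600)    # pixel size in arc-seconds
```

The pixel size works out to 1/480 of a degree (7.5 arc-seconds), half of GEBCO's 15 arc-second spacing, which agrees with the gdalinfo report.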

Now convert that to COG

  • it is important to make sure ALL_CPUS is not an overestimate, e.g. on my HPC ALL_CPUS reports 2x the CPU count (I will investigate)
  • I set -wm so that there is about 500 MB for each CPU
  • I had to set BIGTIFF=YES
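
One way to avoid relying on ALL_CPUS is to pick the thread count explicitly. The sketch below is an assumption-laden helper, not anything GDAL provides: it prefers SLURM_CPUS_PER_TASK, falls back to the process CPU affinity on Linux, and only then to the node-wide count. The 500 MB per CPU figure is the rule of thumb from the list above.

```python
import os

def usable_cpus(env=None):
    """Best guess at the CPUs this job may use (can be fewer than the node has)."""
    env = os.environ if env is None else env
    if env.get("SLURM_CPUS_PER_TASK"):
        return int(env["SLURM_CPUS_PER_TASK"])
    if hasattr(os, "sched_getaffinity"):   # Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1             # node-wide count, may overestimate

def warp_memory_mb(cpus, mb_per_cpu=500):
    """Value for gdalwarp -wm, in MB, at a fixed budget per CPU."""
    return cpus * mb_per_cpu

cpus = usable_cpus()
print(cpus, warp_memory_mb(cpus))
```

For comparison, the command below uses -wm 60000 with 128 threads, which is a bit under 500 MB per thread.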

The COG driver determines the overviews:


GDAL: GDALOpen(test.tif.ovr.tmp, this=0x559f85e42f80) succeeds as GTiff.

GTiff: Opened 43200x21600 overview.
GTiff: Opened 21600x10800 overview.
GTiff: Opened 10800x5400 overview.
GTiff: Opened 5400x2700 overview.
GTiff: Opened 2700x1350 overview.
GTiff: Opened 1350x675 overview.
GTiff: Opened 675x337 overview.
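
The sizes in that list can be reproduced by integer halving of the 172800 x 86400 grid until both dimensions are at or below the block size. This is a sketch that happens to match the log with a threshold of 1024 (the BLOCKSIZE used below); it is an assumption about the stopping rule, not a statement of the driver's exact logic.

```python
def halved_sizes(width, height, threshold=1024):
    """Halve (with integer division) until both dimensions fit the threshold."""
    sizes = []
    while width > threshold or height > threshold:
        width, height = width // 2, height // 2
        sizes.append((width, height))
    return sizes

sizes = halved_sizes(172800, 86400)
for w, h in sizes:
    print(f"{w}x{h}")
```

The first computed level, 86400x43200, does not appear in the log. Presumably it is the main image of the temporary overview file and the seven listed entries are the levels below it.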
cd $MYSCRATCH
gdalwarp --debug on --config GDAL_NUM_THREADS $SLURM_CPUS_PER_TASK ZARR:"gebco2x.zarr/":/elevation test.tif -co COMPRESS=ZSTD -of COG -co BLOCKSIZE=1024 -multi -wm 60000 -co BIGTIFF=YES 

that proceeds:


<snip>

... lots of blocks ensue ...

GTIFF: Waiting for worker job to finish handling block 16
20GDAL: GDALClose(test.tif.ovr.tmp, this=0x559f85e42f80)
COG: Generating overviews of the imagery: end
COG: Generating final product: start
GTiff: Using up to 128 threads for compression/decompression
GDAL: GDALOpen(test.tif.ovr.tmp, this=0x559f85e489d0) succeeds as GTiff.
GTiff: ScanDirectories()
GTiff: Opened 43200x21600 overview.
GTiff: Opened 21600x10800 overview.
GTiff: Opened 10800x5400 overview.
GTiff: Opened 5400x2700 overview.
GTiff: Opened 2700x1350 overview.
GTiff: Opened 1350x675 overview.
GTiff: Opened 675x337 overview.
GTiff: File being created as a BigTIFF.
GTiff: Using up to 128 threads for compression/decompression
GTiff: ScanDirectories()
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
...30...40...50...60...70...80...90...GDAL: GDALClose(test.tif.ovr.tmp, this=0x559f85e489d0)
COG: Generating final product: end
GDAL: GDALClose(ZARR:gebco2x.zarr/:/elevation, this=0x559f85a72ff0)
GDAL: GDALClose(test.tif, this=0x559f8869e180)
GDAL: GDALClose(ZARR:gebco2x.zarr/:/elevation, this=0x559f85aa82c0)
100 - done in 00:07:25.

Here's the job efficiency; using 128 CPUs was definitely overkill

State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 256
CPU Utilized: 00:31:10
CPU Efficiency: 1.59% of 1-08:42:40 core-walltime
Job Wall-clock time: 00:07:40
Memory Utilized: 30.25 GB
Memory Efficiency: 13.15% of 230.00 GB
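
For reference, the CPU efficiency figure is CPU time divided by wall-clock time multiplied by the core count. A small sketch that reproduces the number above from the reported values (helper names are illustrative):

```python
def to_seconds(stamp):
    """Parse a Slurm-style [D-]HH:MM:SS duration into seconds."""
    days = 0
    if "-" in stamp:
        d, stamp = stamp.split("-", 1)
        days = int(d)
    h, m, s = (int(p) for p in stamp.split(":"))
    return ((days * 24 + h) * 60 + m) * 60 + s

def cpu_efficiency(cpu_used, wall, cores):
    """Percentage of the available core-seconds that were actually used."""
    return 100.0 * to_seconds(cpu_used) / (to_seconds(wall) * cores)

print(to_seconds("1-08:42:40"), 256 * to_seconds("00:07:40"))
print(round(cpu_efficiency("00:31:10", "00:07:40", 256), 2))
```

Note the denominator uses the 256 cores reported per node, not the 128 threads GDAL was given.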
@mdsumner
Author

I ran it again with less memory and fewer CPUs; it took 9:40

#SBATCH --cpus-per-task=60
#SBATCH --mem=60Gb

here's the debug log

CPL: Loading configuration from /home/mdsumner/.gdal/gdalrc
CPL: Ignoring configuration option GDAL_CACHEMAX=50% from configuration file as it is already set as an environment variable
GDAL: GDALOpen(ZARR:gebco2x.zarr/:/elevation, this=0x55e2aff0c2c0) succeeds as Zarr.
GDAL: GDALDefaultOverviews::OverviewScan()
GDAL: GDAL_CACHEMAX = 30719 MB
GDAL: GDALDefaultOverviews::OverviewScan()
GDAL: GDALOpen(ZARR:gebco2x.zarr/:/elevation, this=0x55e2afed6ff0) succeeds as Zarr.
GDAL: GDALDefaultOverviews::OverviewScan()
COG: Generating overviews of the imagery: start
GTiff: File being created as a BigTIFF.
GTiff: Using up to 60 threads for compression/decompression
GDAL: GDALOpen(test.tif.ovr.tmp, this=0x55e2b02a6f80) succeeds as GTiff.
GTiff: ScanDirectories()
GTiff: Opened 43200x21600 overview.
GTiff: Opened 21600x10800 overview.
GTiff: Opened 10800x5400 overview.
GTiff: Opened 5400x2700 overview.
GTiff: Opened 2700x1350 overview.
GTiff: Opened 1350x675 overview.
GTiff: Opened 675x337 overview.

<snip> GTIFF: Waiting for worker job to finish handling block ... </snip>


20GDAL: GDALClose(test.tif.ovr.tmp, this=0x55e2b02a6f80)
COG: Generating overviews of the imagery: end
COG: Generating final product: start
GTiff: Using up to 60 threads for compression/decompression
GDAL: GDALOpen(test.tif.ovr.tmp, this=0x55e2b02a6f10) succeeds as GTiff.
GTiff: ScanDirectories()
GTiff: Opened 43200x21600 overview.
GTiff: Opened 21600x10800 overview.
GTiff: Opened 10800x5400 overview.
GTiff: Opened 5400x2700 overview.
GTiff: Opened 2700x1350 overview.
GTiff: Opened 1350x675 overview.
GTiff: Opened 675x337 overview.
GTiff: File being created as a BigTIFF.
GTiff: Using up to 60 threads for compression/decompression
GTiff: ScanDirectories()
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55e2b0937fc0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55e2b0937fc0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55e2b0937fc0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55e2b0937fc0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55e2b0937fc0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55e2b0937fc0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55e2b0937fc0) creation.
...30...40...50...60...70...80...90...GDAL: GDALClose(test.tif.ovr.tmp, this=0x55e2b02a6f10)
COG: Generating final product: end
GDAL: GDALClose(ZARR:gebco2x.zarr/:/elevation, this=0x55e2afed6ff0)
GDAL: GDALClose(test.tif, this=0x55e2b029dfc0)
GDAL: GDALClose(ZARR:gebco2x.zarr/:/elevation, this=0x55e2aff0c2c0)
100 - done in 00:09:40. 
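
The GDAL_CACHEMAX = 30719 MB line in this log is within 1 MB of half the 60 GB allocation, if that allocation is read as 60 x 1024 MB. The sketch below only checks that arithmetic; it assumes the environment variable holds a percentage such as 50%, which the log does not show.

```python
def cache_mb_from_percent(total_gib, percent):
    """MB corresponding to a percentage of a memory allocation given in GiB."""
    return int(total_gib * 1024 * percent / 100)

expected = cache_mb_from_percent(60, 50)
print(expected, expected - 30719)
```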

and the seff

Nodes: 1
Cores per node: 120
CPU Utilized: 01:09:13
CPU Efficiency: 5.81% of 19:52:00 core-walltime
Job Wall-clock time: 00:09:56
Memory Utilized: 20.99 GB
Memory Efficiency: 34.99% of 60.00 GB
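
Another way to read the two seff reports is the average number of busy cores, i.e. CPU time divided by wall-clock time. A sketch using only the reported values:

```python
def mean_busy_cores(cpu_seconds, wall_seconds):
    """Average number of cores kept busy over the whole job."""
    return cpu_seconds / wall_seconds

runs = {
    "128 threads": (31 * 60 + 10, 7 * 60 + 40),              # 00:31:10 CPU, 00:07:40 wall
    "60 threads": (1 * 3600 + 9 * 60 + 13, 9 * 60 + 56),     # 01:09:13 CPU, 00:09:56 wall
}
for label, (cpu, wall) in runs.items():
    print(label, round(mean_busy_cores(cpu, wall), 1))
```

Both runs average only a handful of busy cores, which may mean that much of the pipeline is serial or waiting on I/O rather than limited by the thread count.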

@mdsumner
Author

fwiw, it doesn't hurt to set ALL_CPUS too high (this run used up to 256 threads and finished in 7:06)

CPL: Loading configuration from /home/mdsumner/.gdal/gdalrc
CPL: Ignoring configuration option GDAL_CACHEMAX=50% from configuration file as it is already set as an environment variable
GDAL: GDALOpen(ZARR:gebco2x.zarr/:/elevation, this=0x55edf6dce2d0) succeeds as Zarr.
GDAL: GDALDefaultOverviews::OverviewScan()
GDAL: GDAL_CACHEMAX = 117759 MB
GDAL: GDALDefaultOverviews::OverviewScan()
GDAL: GDALOpen(ZARR:gebco2x.zarr/:/elevation, this=0x55edf6d99000) succeeds as Zarr.
GDAL: GDALDefaultOverviews::OverviewScan()
COG: Generating overviews of the imagery: start
GTiff: File being created as a BigTIFF.
GTiff: Using up to 256 threads for compression/decompression
GDAL: GDALOpen(test.tif.ovr.tmp, this=0x55edf7168f90) succeeds as GTiff.
GTiff: ScanDirectories()
GTiff: Opened 43200x21600 overview.
GTiff: Opened 21600x10800 overview.
GTiff: Opened 10800x5400 overview.
GTiff: Opened 5400x2700 overview.
GTiff: Opened 2700x1350 overview.
GTiff: Opened 1350x675 overview.
GTiff: Opened 675x337 overview.
0...10...20GDAL: GDALClose(test.tif.ovr.tmp, this=0x55edf7168f90)
COG: Generating overviews of the imagery: end
COG: Generating final product: start
GTiff: Using up to 256 threads for compression/decompression
GDAL: GDALOpen(test.tif.ovr.tmp, this=0x55edf716a6c0) succeeds as GTiff.
GTiff: ScanDirectories()
GTiff: Opened 43200x21600 overview.
GTiff: Opened 21600x10800 overview.
GTiff: Opened 10800x5400 overview.
GTiff: Opened 5400x2700 overview.
GTiff: Opened 2700x1350 overview.
GTiff: Opened 1350x675 overview.
GTiff: Opened 675x337 overview.
GTiff: File being created as a BigTIFF.
GTiff: Using up to 256 threads for compression/decompression
GTiff: ScanDirectories()
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55edf8ad3b70) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55edf8ad3b70) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55edf8ad3b70) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55edf8ad3b70) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55edf8ad3b70) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55edf8ad3b70) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x55edf8ad3b70) creation.
...30...40...50...60...70...80...90...GDAL: GDALClose(test.tif.ovr.tmp, this=0x55edf716a6c0)
COG: Generating final product: end
GDAL: GDALClose(ZARR:gebco2x.zarr/:/elevation, this=0x55edf6d99000)
GDAL: GDALClose(test.tif, this=0x55edf7201fe0)
GDAL: GDALClose(ZARR:gebco2x.zarr/:/elevation, this=0x55edf6dce2d0)
100 - done in 00:07:06.
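
Putting the three gdalwarp timings side by side, as a sketch: the thread counts and elapsed times are taken from the debug logs above, and the runs also differ in memory and cache size, so this is not a controlled comparison.

```python
def hms_to_seconds(stamp):
    """Parse HH:MM:SS into seconds."""
    h, m, s = (int(p) for p in stamp.split(":"))
    return h * 3600 + m * 60 + s

# (threads reported by GTiff, "done in" time from the log)
runs = [(60, "00:09:40"), (128, "00:07:25"), (256, "00:07:06")]
baseline = hms_to_seconds(runs[0][1])
for threads, elapsed in runs:
    secs = hms_to_seconds(elapsed)
    print(threads, secs, round(baseline / secs, 2))
```

Going from 60 to 256 threads shortens the run by roughly a third at most, so the extra threads neither hurt nor help much.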
