Create a 2x resolution GEBCO Zarr. This step doesn't seem to be well parallelized, maybe because rioxarray doesn't distribute the reads amongst multiple open dataset handles?
import xarray

# GEBCO 2024 GeoTIFF, read over HTTP via GDAL's /vsicurl/
dsn = "/vsicurl/https://projects.pawsey.org.au/idea-gebco-tif/GEBCO_2024.tif"
# vrt:// upsamples to 200% in each dimension on the fly; dask chunks match the intended Zarr block size
ds = xarray.open_dataset(f"vrt://{dsn}?outsize=200%,200%", engine="rasterio", chunks={"y": 1024, "x": 1024})
ds.to_zarr("gebco2x.zarr", zarr_format=3)
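One thing to try for the read side: rioxarray can be told not to funnel every read through a single shared dataset lock, which should let the dask chunks read concurrently. A sketch of that variant, writing to a separate store name; the lock=False behaviour is my understanding of rioxarray's dask support and worth benchmarking rather than taking on faith:

import rioxarray

dsn = "/vsicurl/https://projects.pawsey.org.au/idea-gebco-tif/GEBCO_2024.tif"

# lock=False asks rioxarray to use a file handle per worker/chunk instead of
# serializing all reads through one shared lock.
da = rioxarray.open_rasterio(
    f"vrt://{dsn}?outsize=200%,200%",
    chunks={"y": 1024, "x": 1024},
    lock=False,
)

# Single band, so drop the band dimension and name the variable to match
# the array the store above ended up with.
da.squeeze("band", drop=True).to_dataset(name="elevation").to_zarr(
    "gebco2x_parallel.zarr", zarr_format=3
)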
Investigate
gdalinfo ZARR:"gebco2x.zarr/":/elevation -nomd
Driver: Zarr/Zarr
Files: gebco2x.zarr/elevation/zarr.json
Size is 172800, 86400
Coordinate System is:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.002083333333333,-0.002083333333333)
Corner Coordinates:
Upper Left (-180.0000000, 90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
Upper Right ( 180.0000000, 90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
Center ( 0.0000000, 0.0000000) ( 0d 0' 0.01"E, 0d 0' 0.01"N)
Band 1 Block=1024x1024 Type=Int16, ColorInterp=Undefined
  NoData Value=0
  Unit Type: m
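The same inspection can be done from Python via the GDAL bindings, using the same ZARR: connection string; a minimal sketch:

from osgeo import gdal

gdal.UseExceptions()

# Open the elevation array inside the Zarr store through GDAL's Zarr driver.
ds = gdal.Open('ZARR:"gebco2x.zarr/":/elevation')

print(ds.RasterXSize, ds.RasterYSize)  # expect 172800 86400
print(ds.GetGeoTransform())            # origin -180, 90 and ~0.00208 degree pixels
print(gdal.Info(ds))                   # same report as gdalinfo above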
Now convert that to COG
- important to make sure ALL_CPUS is not an overestimate, e.g. on my HPC ALL_CPUS reports 2x the CPUs actually allocated to the job (I will investigate)
- I make sure -wm works out to roughly 500MB per CPU (see the sketch after this list)
- had to set BIGTIFF=YES (the full-resolution Int16 grid is about 30GB uncompressed, well past the 4GB classic TIFF limit)
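A quick sanity check of both numbers before launching the warp; this is just my reasoning in Python, nothing GDAL-specific, and the 500MB-per-CPU figure is the rule of thumb from the list above:

import os

# CPUs that GDAL's ALL_CPUS would see (every core on the node)...
reported = os.cpu_count()
# ...versus CPUs actually granted to this job by SLURM / cgroups.
granted = len(os.sched_getaffinity(0))
slurm = int(os.environ.get("SLURM_CPUS_PER_TASK", granted))
print(f"cpu_count={reported} affinity={granted} SLURM_CPUS_PER_TASK={slurm}")

# Warp memory budget: ~500MB per allocated CPU, passed to gdalwarp as -wm.
print(f"-wm {500 * slurm}")  # 128 CPUs -> 64000 (I used 60000 below)

One caveat I still need to verify against the gdalwarp docs for my GDAL version: my recollection is that a bare -wm value of 10000 or more is treated as bytes rather than megabytes (newer releases accept an explicit suffix, e.g. -wm 64000MB), in which case 60000 would be a far smaller buffer than intended.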
The COG driver determines the overviews:
GDAL: GDALOpen(test.tif.ovr.tmp, this=0x559f85e42f80) succeeds as GTiff.
GTiff: Opened 43200x21600 overview.
GTiff: Opened 21600x10800 overview.
GTiff: Opened 10800x5400 overview.
GTiff: Opened 5400x2700 overview.
GTiff: Opened 2700x1350 overview.
GTiff: Opened 1350x675 overview.
GTiff: Opened 675x337 overview.
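That pyramid is consistent with repeatedly halving the 172800x86400 grid until an overview fits inside a single 1024x1024 block. The stopping rule below is my assumption about the COG driver's default, but it reproduces the levels in the log, down to the 675x337 smallest one:

# Expected overview pyramid for a 172800x86400 grid with 1024x1024 blocks:
# halve until both dimensions fit within one block.
w, h, block = 172800, 86400, 1024
levels = []
while w > block or h > block:
    w, h = w // 2, h // 2
    levels.append(f"{w}x{h}")
print(levels)
# ['86400x43200', '43200x21600', ..., '1350x675', '675x337']

The log lists seven of those eight levels inside test.tif.ovr.tmp; I assume the missing 86400x43200 level is the base IFD of that temporary overview file rather than a skipped level.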
cd $MYSCRATCH
gdalwarp --debug on --config GDAL_NUM_THREADS $SLURM_CPUS_PER_TASK ZARR:"gebco2x.zarr/":/elevation test.tif -co COMPRESS=ZSTD -of COG -co BLOCKSIZE=1024 -multi -wm 60000 -co BIGTIFF=YES
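For reference, roughly the same invocation can be driven from Python through gdal.Warp; a sketch only, with keyword arguments mirroring the CLI flags above (warpMemoryLimit is simply forwarded to -wm):

from osgeo import gdal

gdal.UseExceptions()
# Equivalent of --config GDAL_NUM_THREADS on the command line;
# 128 matches what $SLURM_CPUS_PER_TASK resolved to in the run above.
gdal.SetConfigOption("GDAL_NUM_THREADS", "128")

gdal.Warp(
    "test.tif",
    'ZARR:"gebco2x.zarr/":/elevation',
    format="COG",
    creationOptions=["COMPRESS=ZSTD", "BLOCKSIZE=1024", "BIGTIFF=YES"],
    multithread=True,       # -multi
    warpMemoryLimit=60000,  # forwarded to -wm
)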
The gdalwarp run proceeds:
COG: Generating overviews of the imagery: end
COG: Generating final product: start
GTiff: Using up to 128 threads for compression/decompression
GDAL: GDALOpen(test.tif.ovr.tmp, this=0x559f85e489d0) succeeds as GTiff.
GTiff: ScanDirectories()
GTiff: Opened 43200x21600 overview.
GTiff: Opened 21600x10800 overview.
GTiff: Opened 10800x5400 overview.
GTiff: Opened 5400x2700 overview.
GTiff: Opened 2700x1350 overview.
GTiff: Opened 1350x675 overview.
GTiff: Opened 675x337 overview.
GTiff: File being created as a BigTIFF.
GTiff: Using up to 128 threads for compression/decompression
GTiff: ScanDirectories()
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
GDAL: GDALOverviewDataset(test.tif.ovr.tmp, this=0x559f85eae5f0) creation.
<snip>
... lots of blocks ensue ...
GTIFF: Waiting for worker job to finish handling block 16
20GDAL: GDALClose(test.tif.ovr.tmp, this=0x559f85e42f80)
...30...40...50...60...70...80...90...GDAL: GDALClose(test.tif.ovr.tmp, this=0x559f85e489d0)
COG: Generating final product: end
GDAL: GDALClose(ZARR:gebco2x.zarr/:/elevation, this=0x559f85a72ff0)
GDAL: GDALClose(test.tif, this=0x559f8869e180)
GDAL: GDALClose(ZARR:gebco2x.zarr/:/elevation, this=0x559f85aa82c0)
100 - done in 00:07:25.
Here's the job efficiency; using 128 CPUs was definitely overkill:
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 256
CPU Utilized: 00:31:10
CPU Efficiency: 1.59% of 1-08:42:40 core-walltime
Job Wall-clock time: 00:07:40
Memory Utilized: 30.25 GB
Memory Efficiency: 13.15% of 230.00 GB
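The 1.59% figure is partly an accounting effect: seff charges the whole 256-core node for the full wall-clock time, even though gdalwarp was only given 128 threads. The arithmetic reproduces it exactly:

# Reproduce seff's CPU efficiency from the numbers above.
cpu_utilized_s = 31 * 60 + 10          # 00:31:10 of CPU time actually used
wallclock_s    = 7 * 60 + 40           # 00:07:40 wall-clock
cores          = 256                   # whole node charged, not the 128 threads used
core_walltime_s = cores * wallclock_s  # 117760 s = 1-08:42:40
print(f"{cpu_utilized_s / core_walltime_s:.2%}")  # 1.59%

Even allowing for that, 1870s of CPU over 460s of wall time is an average of only about four busy cores, which matches the sense that 128 CPUs was overkill.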
I ran it again with less memory and fewer CPUs; it took 9:40.
Here's the debug log
and the seff output