This document outlines the process for creating Cloud Optimised GeoTIFFs (COGs) suitable for hosting in services such as AWS S3. COGs enable more efficient workflows, such as fast access from Functions as a Service (e.g. AWS Lambda) or consumption in desktop GIS clients (e.g. QGIS). For more details on COGs, please see https://www.cogeo.org/in-depth.html
First, create the virtual mosaic from the directory of tiles, ensuring that an alpha band is created in the VRT to set transparency where there is no source raster.
gdalbuildvrt -addalpha mosaic.vrt *.tif
gdal_translate -b 1 -b 2 -b 3 -mask 4 mosaic.vrt rgbmask.vrt
Create a BigTIFF with lossless compression to avoid quality loss. Use all available CPU cores (the DEFLATE compression method can use multi-threading). The GeoTIFF has an internal 1-bit mask band to provide transparency for parts of the mosaic extent that contain no source data.
gdal_translate \
-b 1 -b 2 -b 3 -mask 4 \
-of GTiff \
-co BIGTIFF=YES \
-co TILED=YES \
-co COMPRESS=DEFLATE \
-co PREDICTOR=2 \
-co NUM_THREADS=ALL_CPUS \
--config GDAL_CACHEMAX 4096 \
-co ALPHA=YES \
--config GDAL_TIFF_INTERNAL_MASK YES \
mosaic.vrt output.tif
Create overviews for the mosaic.
Note: gdaladdo has a known issue where generating multiple overviews in the same TIFF file is slow and causes TIFF directory thrashing. The libtiff library has to go back and forth between multiple internal TIFF images, loading and unloading the TIFF indexes each time. For a huge file, this involves a lot of I/O. The workaround, which suits the COG case especially well, is to generate each overview level in its own file by cascading calls to gdaladdo. See https://trac.osgeo.org/gdal/ticket/5067#comment:2 for more info.
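OVERVIEW=output.tif
# Each pass adds one factor-2 overview to the previous file as an external .ovr,
# so the loop value only counts levels (2, 4, ... 512 relative to output.tif).
for LEVEL in 2 4 8 16 32 64 128 256 512
do
gdaladdo \
--config GDAL_CACHEMAX 4096 \
--config COMPRESS_OVERVIEW DEFLATE \
-ro \
-r average \
"$OVERVIEW" 2
# A shell assignment must not have spaces around "="
OVERVIEW="${OVERVIEW}.ovr"
done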
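To make the cascade easier to follow, here is a minimal sketch that lists the sidecar files the loop leaves on disk. The helper name `cascade_overview_files` is hypothetical and does not call GDAL; it only reproduces the naming pattern from the loop, where each pass appends `.ovr` to the previous file name.

```shell
# Hypothetical helper: print the external overview files produced by
# cascading gdaladdo calls, one name per line.
#   $1 = base file name, $2 = number of overview levels
cascade_overview_files() {
  local name="$1"
  local levels="$2"
  local i=0
  while [ "$i" -lt "$levels" ]; do
    # Each pass writes its overview next to the previous file
    name="${name}.ovr"
    echo "$name"
    i=$((i + 1))
  done
}

# First three levels for the file used in this document
cascade_overview_files output.tif 3
# prints:
#   output.tif.ovr
#   output.tif.ovr.ovr
#   output.tif.ovr.ovr.ovr
```

With the nine levels used above, the same call with `9` lists every file that should exist before the final step, which can be handy for a sanity check or for cleaning up afterwards.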
Create the COG, applying final JPEG compression and copying and compressing the previously generated overviews, with the IFD (Image File Directory) index placed in the header of the file so that it can be fetched efficiently via cloud web APIs. The GeoTIFF is created with internal tiles of 256x256 for the main resolution and 128x128 for the overviews.
NOTES:
- When compressing with JPEG, multi-threading cannot be used.
- Increasing the block size can reduce the size of the IFD, but larger blocks can cause more bytes to be pulled for random access if the compression rate is not high. Going from the default of 256 to 512 will reduce the index by a factor of 4. The size of the TIFF index arrays, for each pyramid level, is: 2 * ceil(xsize / blockxsize) * ceil(ysize / blockysize) * 8 bytes. Because we use an internal mask, this value has to be multiplied by 2.
gdal_translate \
-of GTiff \
-co BIGTIFF=YES \
-co TILED=YES \
-co BLOCKXSIZE=256 \
-co BLOCKYSIZE=256 \
-co COMPRESS=JPEG \
-co JPEG_QUALITY=85 \
-co PHOTOMETRIC=YCBCR \
-co COPY_SRC_OVERVIEWS=YES \
-co ALPHA=YES \
--config GDAL_TIFF_INTERNAL_MASK YES \
--config GDAL_TIFF_OVR_BLOCKSIZE 128 \
--config GDAL_CACHEMAX 4096 \
output.tif output_cogs.tif
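As a worked example of the index-size formula in the notes, the following sketch evaluates it for one pyramid level. The function name `tiff_index_bytes` and the 102400 x 102400 pixel raster are assumptions chosen for illustration, not values from a real mosaic.

```shell
# Hypothetical helper: estimate the TIFF index size in bytes for one level.
#   2 * ceil(xsize / blocksize) * ceil(ysize / blocksize) * 8
# doubled when an internal mask is present (the default here).
#   $1 = xsize, $2 = ysize, $3 = block size, $4 = 1 if masked (default), 0 if not
tiff_index_bytes() {
  local xsize="$1"
  local ysize="$2"
  local block="$3"
  local has_mask="${4:-1}"
  # Integer ceiling division: (a + b - 1) / b
  local tiles_x=$(( (xsize + block - 1) / block ))
  local tiles_y=$(( (ysize + block - 1) / block ))
  local bytes=$(( 2 * tiles_x * tiles_y * 8 ))
  if [ "$has_mask" -eq 1 ]; then
    bytes=$(( bytes * 2 ))
  fi
  echo "$bytes"
}

tiff_index_bytes 102400 102400 256   # prints 5120000
tiff_index_bytes 102400 102400 512   # prints 1280000
```

For this assumed raster, moving from 256 to 512 pixel blocks shrinks the full-resolution index from about 5.1 MB to about 1.3 MB, which matches the factor of 4 stated in the notes.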
Check that the following script reports no errors or warnings:
python validate_cloud_optimized_geotiff.py output_cogs.tif
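When this check runs inside a batch job, it can help to turn the report into an exit status. The sketch below is only one possible approach: `check_cog_report` is a hypothetical helper, and it assumes the validator mentions the words "warning" or "error" whenever it finds a problem, which should be confirmed against the version of the script in use.

```shell
# Hypothetical helper: read a validation report on stdin and fail if any
# line mentions a warning or an error (case-insensitive match).
# Assumption: the validator uses those words when it reports problems.
check_cog_report() {
  if grep -qiE 'warning|error'; then
    return 1
  fi
  return 0
}

# Example usage (requires the validation script and GDAL's Python bindings):
#   python validate_cloud_optimized_geotiff.py output_cogs.tif 2>&1 | check_cog_report \
#     || echo "output_cogs.tif needs attention"
```

A plain text match is crude, so reading the full report remains the more reliable check.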
Hi,
I'm wondering about this command line:
gdal_translate -b 1 -b 2 -b 3 -mask 4 mosaic.vrt rgbmask.vrt
as the resulting VRT is not used afterwards.
Could you clarify, please?