GeoParquet is an experimental standard for storing geospatial data in the Parquet format. Because Parquet's columnar architecture allows for efficient reads over HTTP, it is considered an initial attempt at a "Cloud-optimized" vector format.
While it lacks optimized spatial reads, its ability to filter on a small fraction of a feature's fields makes it a good fit for storing and querying STAC Items. A STAC Item inherits from GeoJSON, so it has a spatial component, but it can also store a large number of metadata fields, many of which are redundant and rarely useful to query. GeoParquet lets us query for features with simple filter requirements like "all images with a low cloud cover percentage".
The first attempt at converting the full Open Data Catalog to GeoParquet is at:
s3://maxar-opendata/events/maxar-opendata.parquet
or
https://maxar-opendata.s3.amazonaws.com/events/maxar-opendata.parquet
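Before querying, it helps to see what the file contains. A minimal sketch using ogrinfo, assuming a GDAL build with the Parquet driver and that the public bucket allows unsigned requests:
AWS_NO_SIGN_REQUEST=YES ogrinfo -so -al /vsis3/maxar-opendata/events/maxar-opendata.parquet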
GDAL/OGR can now read GeoParquet and run SQL against it, so you can query it much like a STAC API.
Note: Some fields can't be handled by OGR, so you'll get some warnings.
Get all the tiles covering Turkey with low cloud cover:
ogr2ogr turkey_earthquake.geojson /vsis3/maxar-opendata/events/maxar-opendata.parquet -f GeoJSON \
-sql 'SELECT * FROM maxar_opendata WHERE "tile:clouds_percent" < 10' \
-spat 35 35 38 38
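The same filter can also be written back out as GeoParquet instead of GeoJSON, which keeps the result cloud-friendly. A sketch of that variant, assuming a GDAL build (3.5+) that includes the Parquet driver:
ogr2ogr turkey_earthquake.parquet /vsis3/maxar-opendata/events/maxar-opendata.parquet -f Parquet \
-sql 'SELECT * FROM maxar_opendata WHERE "tile:clouds_percent" < 10' \
-spat 35 35 38 38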
DuckDB gives you a fast SQL engine on top of Parquet files. While it doesn't "officially" handle spatial queries yet (https://github.com/duckdblabs/duckdb_spatial), we can use the quadkey identifiers and the LIKE operator as a simple spatial query system. The zone and quadkey prefix "36/120022" roughly covers the Turkey earthquake area.
To read from S3, first install and load the httpfs extension:
INSTALL httpfs;
LOAD httpfs;
Get the GeoTIFF URLs for the Turkey earthquake with low cloud cover:
LOAD httpfs;
SELECT assets.visual.href
FROM read_parquet('https://maxar-opendata.s3.amazonaws.com/events/maxar-opendata.parquet')
WHERE
"id" like '36/120022%'
AND "tile:clouds_percent" < 10;