The SpatioTemporal Asset Catalog specification is a simple specification being designed to make open geospatial assets searchable and crawlable. It aims to realise the dream of enabling users to search for imagery and other assets across multiple providers.
The STAC spec starts by placing an emphasis on static web resources, not web services, even though web services are part of the plan. The fundamental part of the spec is a set of JSON schemas for managing collections of geospatial data, along with detailed metadata about individual datasets.
The group working on STAC want to embrace a pragmatic approach to linked data by also focusing on HTML with standard web links. This format is very well supported by current search engines, and should enable people to share links to individual datasets, as well as using a search engine to find datasets by ID numbers.
There is also a Web Service component of STAC, to enable search, syndication and other more advanced features. At the moment there are some basic ideas around searching for datasets, with more advanced details still being decided.
The STAC specification is made up of three components used to describe spatial data. They are Catalogs, Items and Assets.
The top-level component in STAC is a Catalog. A Catalog contains either sub-/Catalogs/ or Items. Each Catalog should be relatively small, i.e. the JSON representation should be less than 1 MB (as a rough guideline).
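As a rough illustration of the size guideline, the sketch below builds a small Catalog document and checks its serialised size. The field names (`name`, `description`, `links`, and the `rel` values) and all URLs are assumptions for illustration only, loosely modelled on early STAC drafts, and are not taken from a specific release of the spec.

```python
import json

# Hypothetical field names and URLs; check the STAC version you target
# before relying on any of them.
catalog = {
    "name": "landsat-8",
    "description": "Example catalog of Landsat 8 scenes",
    "links": [
        {"rel": "self", "href": "https://example.com/landsat-8/catalog.json"},
        {"rel": "child", "href": "path-090/catalog.json"},
        {"rel": "item", "href": "path-090/row-084/scene-0001.json"},
    ],
}

MAX_CATALOG_BYTES = 1_000_000  # the rough 1 MB guideline


def catalog_size_ok(doc, limit=MAX_CATALOG_BYTES):
    """Return True if the serialised catalog stays under the size guideline."""
    return len(json.dumps(doc).encode("utf-8")) < limit
```

A check like this could run in an exporter to decide when a Catalog needs splitting into sub-Catalogs.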
An Item is very similar to an ODC Dataset. An Item could be one Landsat Scene made up of multiple bands. An Item contains one or more Assets, which include links to the actual data. An Item should contain information mostly relevant to search and discovery, with more detailed metadata split out elsewhere, for example into Asset metadata.
An Item is a JSON object in the format of a GeoJSON Feature describing its bounds, along with some extra metadata.
An Asset provides details on how to access the actual data. Eventually every Catalog and Item needs to point to at least one Asset. Even if the data isn’t publicly available to everyone, there should still be a link to where the data is.
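To make the Item and Asset descriptions concrete, here is a minimal sketch of an Item as a GeoJSON Feature carrying two Assets. The scene ID, URLs, asset keys and the `datetime` property are made up for illustration; the exact required fields depend on the STAC version in use.

```python
import json

# Illustrative values only; the ID, URLs and band keys are invented.
item = {
    "type": "Feature",
    "id": "example-landsat-scene",
    "bbox": [148.0, -36.0, 150.0, -34.0],  # [west, south, east, north]
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [148.0, -36.0], [150.0, -36.0], [150.0, -34.0],
            [148.0, -34.0], [148.0, -36.0],
        ]],
    },
    "properties": {"datetime": "2018-05-01T00:00:00Z"},
    "links": [{"rel": "self", "href": "https://example.com/scene.json"}],
    "assets": {
        "red": {"href": "https://example.com/scene_B4.tif"},
        "nir": {"href": "https://example.com/scene_B5.tif"},
    },
}


def asset_hrefs(stac_item):
    """Return the data links of an Item, keyed by asset name."""
    return {name: asset["href"] for name, asset in stac_item["assets"].items()}


print(json.dumps(asset_hrefs(item), indent=2))
```

Because the Item is ordinary GeoJSON, existing GeoJSON tools can read its geometry without knowing anything about STAC.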
STAC is intended to be openly extensible, with a small set of required fields. There is work towards having a set of profiles and extensions for more specific use cases like Earth Observation data. As of May 2018 there are specifications for two extensions.
A set of fields that are relevant to searching for imagery and other types of observations of Earth.
Enables providers to share data like the band information without having to repeat it in every single Item returned. A record can link to its collection and smart clients can ‘merge’ the information to create a complete record.
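The ‘merge’ step a smart client might perform can be sketched as follows. This is an assumption about one reasonable behaviour (values on the Item override values from the collection), not something the extension is known to mandate; the `properties` layout and all names are hypothetical.

```python
import copy


def merge_collection(item, collection):
    """Return a copy of `item` with shared collection properties filled in.

    Values already present on the Item win over the collection's values.
    """
    merged = copy.deepcopy(item)
    props = copy.deepcopy(collection.get("properties", {}))
    props.update(merged.get("properties", {}))
    merged["properties"] = props
    return merged


# Hypothetical collection holding the information shared by every Item.
collection = {
    "name": "example-collection",
    "properties": {"platform": "landsat-8", "bands": ["red", "nir"]},
}
item = {"id": "scene-1", "properties": {"datetime": "2018-05-01T00:00:00Z"}}

complete = merge_collection(item, collection)
```

The original Item is left untouched, so the short form can still be served as-is.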
Open Data Cube was defined to be flexible in the structure of Datasets it works with. It uses a metadata type definition to know how to interpret different parts of a JSON document. The default metadata type used for almost everything is called eo. Its use is so widespread now that it can be taken as the ODC Dataset structure.
Some coding work will be required to export our metadata in a STAC structure. We will need to:
- [ ] Work out a mapping from our eo metadata type to a STAC Item
In parallel:
- [ ] Implement an exporter to write our metadata as sidecar JSON files
- [ ] Provide a STAC JSON view of datasets through cubedash
- [ ] Decide on how to break up our large Product collections into smaller pieces to export as STAC Catalogs.
- [ ] Provide a STAC Catalog view through cubedash
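A first cut of the mapping and sidecar export tasks above might look like the sketch below. It assumes an eo-style document with `id`, `extent.center_dt`, `extent.coord` corner points and `image.bands` paths; the output field choices and the `.stac.json` suffix are made up for illustration, and a real mapping would need to cover many more fields.

```python
import json
from pathlib import Path


def eo_to_stac_item(eo_doc):
    """Map an eo-style dataset document to a STAC-like Item dict (a sketch)."""
    coord = eo_doc["extent"]["coord"]
    # Counter-clockwise order: upper-left, lower-left, lower-right, upper-right.
    corners = [coord[k] for k in ("ul", "ll", "lr", "ur")]
    lons = [c["lon"] for c in corners]
    lats = [c["lat"] for c in corners]
    ring = [[c["lon"], c["lat"]] for c in corners]
    ring.append(ring[0])  # GeoJSON rings must be closed
    return {
        "type": "Feature",
        "id": eo_doc["id"],
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"datetime": eo_doc["extent"]["center_dt"]},
        "assets": {
            name: {"href": band["path"]}
            for name, band in eo_doc["image"]["bands"].items()
        },
    }


def write_sidecar(eo_doc, dataset_path):
    """Write the Item next to the dataset, using a made-up '.stac.json' suffix."""
    out = Path(dataset_path).with_suffix(".stac.json")
    out.write_text(json.dumps(eo_to_stac_item(eo_doc), indent=2))
    return out


# Invented example document in the assumed eo layout.
eo_doc = {
    "id": "00000000-0000-0000-0000-000000000000",
    "extent": {
        "center_dt": "2018-05-01T00:00:00",
        "coord": {
            "ul": {"lon": 148.0, "lat": -34.0},
            "ur": {"lon": 150.0, "lat": -34.0},
            "lr": {"lon": 150.0, "lat": -36.0},
            "ll": {"lon": 148.0, "lat": -36.0},
        },
    },
    "image": {"bands": {"red": {"path": "scene_B4.tif"}}},
}

stac_item = eo_to_stac_item(eo_doc)
```

The same `eo_to_stac_item` function could back both the sidecar exporter and a JSON view in cubedash, so the mapping only needs to be maintained in one place.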
To make use of STAC structured metadata, it should be possible to write a new metadata type definition allowing ODC to index STAC documents verbatim, without having to do any format conversion.
The ODC Index includes a very flexible (but poorly documented) ability to efficiently store and search arbitrarily structured JSON documents representing spatial resources.
There may be some other changes required to make it work. There is work in progress now to improve the support for indexing datasets from URLs, not only from local paths. It may also be necessary to improve the metadata type usage in ODC, but only with relatively straightforward fixes, e.g.:
From @jez 2018-04-27:
We could almost index their datasets directly (adding a metadata_type for their json structure). The bounding box as-an-array may trip it up, as ours is usually a dict (we might need a new field type? unless array indicies already work?)
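The bounding box concern can be illustrated in plain Python. The resolver below is not ODC's code; it is a hypothetical stand-in showing the difference between an offset made only of dict keys and one that needs an integer array index. Whether ODC's own offset handling accepts integer steps is exactly the open question raised in the quote.

```python
def resolve_offset(doc, offset):
    """Walk `doc` along `offset`, a list of dict keys and/or list indices."""
    value = doc
    for step in offset:
        value = value[step]
    return value


# Invented documents: a STAC-like Item with an array bbox, and an
# eo-like document with dict-based corner coordinates.
stac_item = {"id": "scene-1", "bbox": [148.0, -36.0, 150.0, -34.0]}
eo_doc = {"extent": {"coord": {"ul": {"lon": 148.0, "lat": -34.0}}}}

# Dict-style offset, in the style of the eo layout.
west_from_eo = resolve_offset(eo_doc, ["extent", "coord", "ul", "lon"])
# Array-style offset, as a STAC bbox would need.
west_from_stac = resolve_offset(stac_item, ["bbox", 0])
```

If integer steps turn out not to be supported, a new field type that understands the four-element bbox array would be the alternative suggested in the quote.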
STAC contributors consider keeping metadata in files alongside the data much safer and more reliable than relying on a database or Elasticsearch index being maintained to keep all metadata safe.
Moves with the files.
Easy to read without special tools.
Stacked NetCDF makes this difficult!
Multiple systems can use the same point of truth.
The high level proposal for STAC sounds excellent. It also sounds like some of the big players are on board and collaborating, and hard work is being done to keep the specification minimal.
It’s still very early days, and so there are plenty of incompatible examples floating around, and lots and lots of details still to be determined. There has been some excellent progress and consensus so far, but I’m concerned about how far that will go.
The plans for syndicating catalogs and indexes, and keeping everything up to date, have a very long way to go. There are many existing and complicated standards, and coming up with something new is fraught with danger. See: Atom, RSS, OAI-PMH, WebSub (formerly PubSubHubbub).
We should definitely follow and contribute to the discussions.
We should start planning and making some changes to ODC to support STAC’s direction.
From its homepage:
Swagger is the world’s largest framework of API developer tools for the OpenAPI Specification(OAS), enabling development across the entire API lifecycle, from design and documentation, to test and deployment.
From the OpenAPI specification at swagger.io:
The OpenAPI specification (formerly known as the Swagger Specification) is a powerful definition format to describe RESTful APIs. The specification creates a RESTful interface for easily developing and consuming an API by effectively mapping all the resources and operations associated with it. It’s easy-to-learn, language agnostic, and both human and machine readable.
The important central part is a simple specification for a feature that is able to be statically hosted somewhere.
https://github.com/radiantearth/stac-spec/blob/master/api-spec/wfs-stac.md
- Radiant Earth
- a relatively new non-profit funded by Gates Foundation and Omidyar Network, helping to bring about more open geospatial data for positive global impact, and improved decision-making
- Open Geospatial Consortium
- OGC
- Chris Holmes
- part of Planet Labs, Radiant Earth and Open Geospatial Consortium[fn:1]
[fn:1] See https://medium.com/@cholmes/radiant-earth-fellow-3a8b959f473
- First Official release of SpatioTemporal Asset Catalog Spec - Chris Holmes
- Spatial Data on the Web Best Practices - W3 spec
- The potential of spatiotemporal asset catalogs - Chris Holmes
- Static SpatioTemporal Asset Catalogs in Depth - Chris Holmes
- https://gitter.im/SpatioTemporal-Asset-Catalog/Lobby
- pystac
- https://github.com/awslabs/landsat-on-aws Not part of STAC, but a very similar concept.