@omad
Last active May 15, 2018 22:35
SpatioTemporal Asset Catalogs and the Open Data Cube

SpatioTemporal Asset Catalogs and ODC

What is STAC?

The SpatioTemporal Asset Catalog (STAC) specification is a simple specification, still being designed, that aims to make open geospatial assets searchable and crawlable. Its goal is to let users search for imagery and other assets across multiple providers.

The STAC spec starts by placing an emphasis on static web resources, not web services, even though web services are part of the plan. The fundamental part of the spec is a set of JSON schemas for managing collections of geospatial data, along with detailed metadata about individual datasets.

The group working on STAC want to embrace a pragmatic approach to linked data by also focusing on HTML with standard web links. This format is very well supported by current search engines, and should enable people to share links to individual datasets, as well as using a search engine to find datasets by ID numbers.

There is also a Web Service component of STAC, to enable search, syndication and other more advanced features. At the moment there are some basic ideas around searching for datasets, with more advanced details still being decided.

The Components of STAC

The STAC specification is made up of three components used to describe spatial data. They are Catalogs, Items and Assets.

Catalog

The top level component in STAC is a Catalog. A Catalog contains either sub-Catalogs or Items. Each Catalog should be relatively small, i.e. the JSON representation should be less than 1 MB (as a rough guideline).
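As a rough illustration of the size guideline, the following sketch measures the serialised size of a catalog document. The key names in the example catalog ("name", "links", "rel", "href") are assumptions made for illustration rather than fields confirmed from the spec, and the 1,000,000 byte limit is just one reading of "1 MB".

```python
import json

# Rough guideline from the text: keep each Catalog's JSON under about 1 MB.
MAX_CATALOG_BYTES = 1_000_000  # assumed interpretation of "1 MB"


def catalog_size_bytes(catalog: dict) -> int:
    """Return the size of the catalog when serialised as UTF-8 JSON."""
    return len(json.dumps(catalog).encode("utf-8"))


def is_small_enough(catalog: dict, limit: int = MAX_CATALOG_BYTES) -> bool:
    """True if the serialised catalog fits within the size guideline."""
    return catalog_size_bytes(catalog) <= limit


# Hypothetical catalog: the key names are illustrative, not taken from the spec.
example_catalog = {
    "name": "landsat-8",
    "description": "Example catalog of Landsat 8 scenes",
    "links": [
        {"rel": "child", "href": "2018/catalog.json"},
        {"rel": "item", "href": "scene-001.json"},
    ],
}
```

A catalog that fails this check would be a candidate for splitting into sub-Catalogs.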

Item

An Item is very similar to an ODC Dataset. An Item could be one Landsat Scene made up of multiple bands. An Item contains one or more Assets, which include links to the actual data. An Item should contain information mostly relevant to search and discovery, with more detailed metadata split out elsewhere, such as in Asset metadata.

An Item is a JSON object in the format of a GeoJSON Feature describing its bounds, along with some extra metadata.
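To make the shape concrete, here is a minimal sketch of an Item built as a GeoJSON Feature. The "type", "bbox" and "geometry" members follow GeoJSON itself. The "datetime" property and the "assets" layout are assumptions based on early drafts of the spec, and the identifier and URL are made up.

```python
import json


def make_item(item_id: str, bbox: tuple, datetime_iso: str, assets: dict) -> dict:
    """Build a GeoJSON Feature describing one dataset (a sketch, not the spec)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            # Exterior ring, counter-clockwise, closed by repeating the first point.
            "coordinates": [[
                [west, south],
                [east, south],
                [east, north],
                [west, north],
                [west, south],
            ]],
        },
        # Assumed property name for the acquisition time.
        "properties": {"datetime": datetime_iso},
        # Assumed layout: asset name mapped to a link to the actual data.
        "assets": assets,
    }


# Hypothetical Landsat-style scene with a single band.
example_item = make_item(
    "scene-001",
    (148.0, -36.0, 149.0, -35.0),
    "2018-05-15T00:00:00Z",
    {"red": {"href": "https://example.com/scene-001/red.tif"}},
)

if __name__ == "__main__":
    print(json.dumps(example_item, indent=2))
```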

Asset

An Asset provides details on how to access the actual data. Eventually every Catalog and Item needs to point to at least one Asset. Even if the data isn’t publicly available to everyone, there should still be a link to where the data is.

Profiles and Extensions

STAC is intended to be openly extensible, with a small set of required fields. There is work towards having a set of profiles and extensions for more specific use cases like Earth Observation data. As of May 2018 there are specifications for two extensions.

Earth Observation (EO): A set of fields that are relevant to searching for imagery and other types of observations of the Earth.

Collection: Enables providers to share data such as band information without having to repeat it in every single Item returned. A record can link to its collection and smart clients can ‘merge’ the information to create a complete record.
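The ‘merge’ a smart client performs could be as simple as the sketch below. It assumes, purely for illustration, that both the collection and the Item keep their shared fields under a "properties" key, and that values on the Item take precedence over values inherited from the collection. Neither assumption is confirmed by the spec.

```python
def merge_collection(item: dict, collection: dict) -> dict:
    """Return a complete record: collection fields overlaid with the Item's own."""
    merged = dict(item)  # shallow copy so the input Item is not modified
    properties = dict(collection.get("properties", {}))
    # Item-level values win over collection-level defaults (an assumption).
    properties.update(item.get("properties", {}))
    merged["properties"] = properties
    return merged


# Hypothetical collection holding band information shared by every Item.
collection = {"properties": {"bands": ["red", "green", "blue"], "platform": "landsat-8"}}
item = {"id": "scene-001", "properties": {"datetime": "2018-05-15T00:00:00Z"}}

complete = merge_collection(item, collection)
```

The Item stays small on disk, while the merged record carries everything a client needs.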

What can ODC do to support STAC

Open Data Cube was designed to be flexible in the structure of Datasets it works with. It uses a metadata type definition to know how to interpret different parts of a JSON document. The default metadata type, used for almost everything, is called eo. Its use is now so widespread that it can be taken as the ODC Dataset structure.

Exporting STAC compatible data

Some coding work will be required to export our metadata in a STAC structure. We will need to:

  • [ ] Work out a mapping from our eo metadata type to a STAC Item

In parallel:

  • [ ] Implement an exporter to write our metadata as sidecar JSON files
  • [ ] Provide a STAC JSON view of datasets through cubedash
  • [ ] Decide on how to break up our large Product collections into smaller pieces to export as STAC Catalogs.
  • [ ] Provide a STAC Catalog view through cubedash
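The first task above, mapping our eo metadata type to a STAC Item, might look something like the following minimal sketch. It assumes an eo document with corner coordinates under extent.coord, a centre time under extent.center_dt and band paths under image.bands. The output field names ("datetime", "assets", "href") are assumptions about the STAC side, not a settled mapping.

```python
def eo_to_stac_item(eo_doc: dict) -> dict:
    """Convert an eo-style dataset document to a STAC-like GeoJSON Feature."""
    coord = eo_doc["extent"]["coord"]
    # Walk the corners counter-clockwise: upper-left, lower-left, lower-right, upper-right.
    corners = [coord[name] for name in ("ul", "ll", "lr", "ur")]
    lons = [corner["lon"] for corner in corners]
    lats = [corner["lat"] for corner in corners]
    ring = [[corner["lon"], corner["lat"]] for corner in corners]
    ring.append(ring[0])  # close the polygon ring

    return {
        "type": "Feature",
        "id": str(eo_doc["id"]),
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"datetime": eo_doc["extent"]["center_dt"]},
        # One asset per band, linking to the band's file.
        "assets": {
            name: {"href": band["path"]}
            for name, band in eo_doc["image"]["bands"].items()
        },
    }


# Hypothetical eo document, trimmed to the fields the sketch reads.
example_eo = {
    "id": "00000000-0000-0000-0000-000000000001",
    "extent": {
        "center_dt": "2018-05-15T00:00:00Z",
        "coord": {
            "ul": {"lon": 148.0, "lat": -35.0},
            "ur": {"lon": 149.0, "lat": -35.0},
            "ll": {"lon": 148.0, "lat": -36.0},
            "lr": {"lon": 149.0, "lat": -36.0},
        },
    },
    "image": {"bands": {"red": {"path": "scene_red.tif"}}},
}

example_stac = eo_to_stac_item(example_eo)
```

Fields such as lineage and grid_spatial are ignored here. Deciding where they belong is part of the mapping work.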

Utilising STAC compatible data

To make use of STAC structured metadata, it should be possible to write a new metadata type definition allowing ODC to index STAC documents verbatim, without having to do any format conversion.

The ODC Index includes a very flexible (but poorly documented) ability to efficiently store and search arbitrarily structured JSON documents representing spatial resources.

There may be some other changes required to make it work. Work is in progress to improve support for indexing datasets from URLs, not only from local paths. It may also be necessary to improve the metadata type usage in ODC, but only with relatively straightforward fixes. For example:

From @jez 2018-04-27:

We could almost index their datasets directly (adding a metadata_type for their json structure). The bounding box as-an-array may trip it up, as ours is usually a dict (we might need a new field type? unless array indicies already work?)
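To illustrate the bounding box concern, the sketch below resolves an offset path in which a step may be either a dict key or an array index. This is not ODC's actual offset handling, and whether ODC accepts integer steps is exactly the open question above. The function names are hypothetical.

```python
from typing import Any, Sequence, Union

Step = Union[str, int]


def get_offset(doc: Any, offset: Sequence[Step]) -> Any:
    """Follow a path of dict keys and list indices into a JSON document."""
    value = doc
    for step in offset:
        # A str step indexes a dict, an int step indexes an array.
        value = value[step]
    return value


def bbox_to_ranges(item: dict) -> dict:
    """Turn a [west, south, east, north] array into lon/lat (min, max) ranges."""
    lon = (get_offset(item, ["bbox", 0]), get_offset(item, ["bbox", 2]))
    lat = (get_offset(item, ["bbox", 1]), get_offset(item, ["bbox", 3]))
    return {"lon": lon, "lat": lat}


# Hypothetical STAC-style document with the bounding box as an array.
stac_doc = {"id": "scene-001", "bbox": [148.0, -36.0, 149.0, -35.0]}
ranges = bbox_to_ranges(stac_doc)
```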

Consider moving to STAC structured sidecar JSON for storing metadata

STAC contributors consider sidecar files much safer and more reliable than relying on a database or Elasticsearch index being maintained to keep all metadata safe.

  • Moves with the files.
  • Easy to read without special tools.
  • Stacked NetCDF makes this difficult!
  • Multiple systems can use the same point of truth.
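A sidecar approach could be sketched as follows. The ".stac.json" suffix and the function names are hypothetical choices for illustration, not an agreed convention.

```python
import json
from pathlib import Path


def sidecar_path(data_path: Path) -> Path:
    """Place the metadata next to the data file (hypothetical naming scheme)."""
    return data_path.with_name(data_path.name + ".stac.json")


def write_sidecar(data_path: Path, item: dict) -> Path:
    """Write the Item as human-readable JSON alongside the data file."""
    target = sidecar_path(data_path)
    target.write_text(json.dumps(item, indent=2), encoding="utf-8")
    return target


def read_sidecar(data_path: Path) -> dict:
    """Load the metadata back; any system can do this without a database."""
    return json.loads(sidecar_path(data_path).read_text(encoding="utf-8"))
```

Because the metadata file sits beside the data, copying or moving the directory carries both along. A single stacked NetCDF file holding many datasets has no such one-to-one pairing, which is where the difficulty noted above comes from.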

Participating in the Standards Process

Summary

The high level proposal for STAC sounds excellent. It also sounds like some of the big players are on board and collaborating, and hard work is being done to keep the specification minimal.

It’s still very early days, and so there are plenty of incompatible examples floating around, and lots and lots of details still to be determined. There has been some excellent progress and consensus so far, but I’m concerned about how far that will go.

The plans for syndicating catalogs and indexes, and keeping everything up to date, have a very long way to go. There are many existing and complicated standards, and coming up with something new is fraught with danger. See: Atom, RSS, OAI-PMH, WebSub (formerly PubSubHubbub)

We should definitely follow and contribute to the discussions.

We should start planning and making some changes to ODC to support STAC’s direction.

Appendix 1 - The STAC Service API

What is Swagger?

From its homepage:

Swagger is the world’s largest framework of API developer tools for the OpenAPI Specification (OAS), enabling development across the entire API lifecycle, from design and documentation, to test and deployment.

What is OpenAPI?

From the OpenAPI specification at swagger.io.

The OpenAPI specification (formerly known as the Swagger Specification) is a powerful definition format to describe RESTful APIs. The specification creates a RESTful interface for easily developing and consuming an API by effectively mapping all the resources and operations associated with it. It’s easy-to-learn, language agnostic, and both human and machine readable.

What do we need to know about WFS 3?

The important central part is a simple specification for a feature that can be statically hosted somewhere.

https://github.com/radiantearth/stac-spec/blob/master/api-spec/wfs-stac.md

References

Organisations

Radiant Earth
A relatively new non-profit funded by the Gates Foundation and Omidyar Network, helping to bring about more open geospatial data for positive global impact and improved decision-making
Open Geospatial Consortium
OGC

People

Chris Holmes
part of Planet Labs, Radiant Earth and Open Geospatial Consortium[fn:1]

[fn:1] See https://medium.com/@cholmes/radiant-earth-fellow-3a8b959f473

Documentation

Code and Implementations
