Proposal F

Allow querying of registry schemas.

Description

As the number of different kinds of references increases, it becomes more important to be able to query them intelligently. This proposal is based on Proposal B, but could generally be applied to any JSON structure.

Links

Description	Link
GitHub issue where this was first proposed	View
Sample code of a working proof-of-concept	View
Blog post which proves the point	View

Modifications

JSON Schema

Given the new `reference` field proposed in PROPOSAL_B:

{
  "mediaType": "icecream/scoops.vanilla.v1.json",
  "size": 2345,
  "digest": "sha256:b2b2b2...",
  "reference": { // <----- New field
    "mediaType": "icecream/cone.v1.json",
    "size": 1234,
    "digest": "sha256:a1a1a1..."
  }
}

We expose a new query language to filter either a single listing or the references across the entire registry.

Registry HTTP API

We start with the example from PROPOSAL_B:

GET /v2/<name>/manifests/<ref>/references
GET /v2/products/cones/manifests/neapolitan/references

generally yields:

{
  "manifests": [
    <descriptor1>,
    <descriptor2>,
    ...
  ]
}

or specifically yields:

{
  "manifests": [
    {
      "mediaType": "icecream/scoops.vanilla.v1.json",
      "size": 2345,
      "digest": "sha256:b2b2b2..."
    },
    {
      "mediaType": "icecream/scoops.chocolate.v1.json",
      "size": 2345,
      "digest": "sha256:c3c3c3..."
    },
    {
      "mediaType": "icecream/scoops.strawberry.v1.json",
      "size": 2345,
      "digest": "sha256:d4d4d4..."
    }
  ]
}
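To make the endpoint concrete, here is a minimal sketch of how a registry might build this listing, assuming it stores descriptors carrying the PROPOSAL_B `reference` field and matches them to the requested manifest by digest. It also assumes, for illustration only, that the `neapolitan` tag resolves to the cone digest `sha256:a1a1a1...`. The `list_references` helper and the extra non-matching descriptor are hypothetical.

```python
def list_references(descriptors, subject_digest):
    """Build the /references response body for the manifest `subject_digest`.

    A descriptor is listed when its `reference` field (from PROPOSAL_B)
    points at `subject_digest`. The `reference` field itself is dropped from
    the listed entries, matching the example response.
    """
    manifests = []
    for desc in descriptors:
        ref = desc.get("reference")
        if ref is not None and ref.get("digest") == subject_digest:
            manifests.append({k: v for k, v in desc.items() if k != "reference"})
    return {"manifests": manifests}


# Hypothetical stored descriptors: three scoops reference the cone (assumed to
# be what "neapolitan" resolves to) and one references a different cone.
CONE = {"mediaType": "icecream/cone.v1.json", "size": 1234, "digest": "sha256:a1a1a1..."}
OTHER_CONE = {"mediaType": "icecream/cone.v1.json", "size": 1234, "digest": "sha256:f6f6f6..."}
stored = [
    {"mediaType": "icecream/scoops.vanilla.v1.json", "size": 2345,
     "digest": "sha256:b2b2b2...", "reference": CONE},
    {"mediaType": "icecream/scoops.chocolate.v1.json", "size": 2345,
     "digest": "sha256:c3c3c3...", "reference": CONE},
    {"mediaType": "icecream/scoops.strawberry.v1.json", "size": 2345,
     "digest": "sha256:d4d4d4...", "reference": CONE},
    {"mediaType": "icecream/scoops.mint.v1.json", "size": 2345,
     "digest": "sha256:e5e5e5...", "reference": OTHER_CONE},
]

body = list_references(stored, "sha256:a1a1a1...")
print(body)  # the vanilla, chocolate and strawberry descriptors
```

A real registry would answer this from an index keyed by the referenced digest rather than scanning every descriptor.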

We can then filter this listing with a query, which is especially useful when the result is long:

GET /v2/products/cones/manifests/neapolitan/references?q_manifests__mediatype__contains=vanilla

{
  "manifests": [
    {
      "mediaType": "icecream/scoops.vanilla.v1.json",
      "size": 2345,
      "digest": "sha256:b2b2b2..."
    }
  ]
}
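As a rough sketch of what the registry does for this request, assuming descriptors are held as Python dicts: the lower-case query attribute `mediatype` is matched against descriptor keys case-insensitively, and `contains` is a case-insensitive substring test. The function name `filter_contains` is hypothetical.

```python
def filter_contains(descriptors, attribute, needle):
    """Keep descriptors whose `attribute` value contains `needle`.

    Both the key lookup and the substring test are case-insensitive, since
    query attributes arrive in lower case (e.g. "mediatype" for "mediaType").
    """
    return [
        desc
        for desc in descriptors
        if any(
            key.lower() == attribute and needle.lower() in str(value).lower()
            for key, value in desc.items()
        )
    ]


listing = {
    "manifests": [
        {"mediaType": "icecream/scoops.vanilla.v1.json", "size": 2345, "digest": "sha256:b2b2b2..."},
        {"mediaType": "icecream/scoops.chocolate.v1.json", "size": 2345, "digest": "sha256:c3c3c3..."},
        {"mediaType": "icecream/scoops.strawberry.v1.json", "size": 2345, "digest": "sha256:d4d4d4..."},
    ]
}

# Equivalent of ?q_manifests__mediatype__contains=vanilla
filtered = {"manifests": filter_contains(listing["manifests"], "mediatype", "vanilla")}
print(filtered)  # only the sha256:b2b2b2... descriptor remains
```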

However, the query becomes even more useful one level up, across the whole registry.

For example, find all references whose media type contains "chocolate":

GET /v2/_oci/references?q_manifests__mediatype__contains=chocolate

This registry-wide query is likely to be more expensive to serve. A registry implementation would probably want to provide pagination, rate limiting to prevent abuse, and some kind of indexing. No changes to the existing schema are required, since the query only filters content the API already provides; the attributes and structure of the current schema drive the query string.
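One way a registry could bound the cost of the registry-wide query is marker-based pagination, similar in spirit to the `n` and `last` parameters the OCI distribution spec already uses for tag listing. The sketch below assumes results are ordered by digest and that the client passes back the last digest it received; the `paginate` helper is hypothetical, and indexing and rate limiting are left out.

```python
def paginate(descriptors, n, last=None):
    """Return (page, next_last) for marker-based pagination.

    Results are ordered by digest. `last` is the final digest of the previous
    page (None for the first page); `next_last` is the marker for the next
    request, or None when no results remain.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    ordered = sorted(descriptors, key=lambda d: d["digest"])
    if last is not None:
        ordered = [d for d in ordered if d["digest"] > last]
    page = ordered[:n]
    next_last = page[-1]["digest"] if len(ordered) > n else None
    return page, next_last


# Hypothetical registry-wide matches, reduced to their digests.
matches = [{"digest": "sha256:c3c3c3..."}, {"digest": "sha256:b2b2b2..."}, {"digest": "sha256:d4d4d4..."}]

page1, marker = paginate(matches, n=2)                # b2..., c3...; marker is c3...
page2, marker2 = paginate(matches, n=2, last=marker)  # d4...; marker2 is None
```

Using the digest as the marker keeps pages stable even if the registry holds no per-client cursor state.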

Query Format

The query string always starts with `q_`, and its pieces are separated by double underscores (`__`):

GET ?q_<attribute>__<nested-attribute>__<comparison>=vanilla
GET ?q_manifests__mediatype__contains=vanilla

This mirrors the field lookups of the Django QuerySet API (e.g., `mediatype__contains`). All letters in the query parameter should be lower case (and are transformed to lower case if not), and all queries are case insensitive. Comparison values can be strings or numbers (e.g., for a size or version). The following comparison values are supported:

name	description	example
contains	match values that include the query value	`?q_manifests__mediatype__contains=vanilla`
notcontains	match values that don't include the query value	`?q_manifests__mediatype__notcontains=vanilla`
eq	match values exactly (case insensitive)	`?q_manifests__mediatype__eq=chocolate`
ne	match values that are not an exact match	`?q_manifests__mediatype__ne=strawberry`
gt	match values greater than the query value	`?q_manifests__size__gt=400`
lt	match values less than the query value	`?q_manifests__size__lt=600`
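The query format and comparison table can be read together as a small parser plus a table of operators. This is a minimal sketch, not a reference implementation: the `Query` type and `parse_query_param` are hypothetical names, resolving the attribute path against the JSON document is left out, and `gt`/`lt` assume both sides convert with `float()` (dotted versions such as `1.2.3` would need their own comparator).

```python
from __future__ import annotations

from dataclasses import dataclass

# Operators from the comparison table. Text comparisons are case-insensitive;
# gt/lt convert both sides with float() (an assumption, see above).
COMPARATORS = {
    "contains": lambda actual, expected: expected.lower() in str(actual).lower(),
    "notcontains": lambda actual, expected: expected.lower() not in str(actual).lower(),
    "eq": lambda actual, expected: str(actual).lower() == expected.lower(),
    "ne": lambda actual, expected: str(actual).lower() != expected.lower(),
    "gt": lambda actual, expected: float(actual) > float(expected),
    "lt": lambda actual, expected: float(actual) < float(expected),
}


@dataclass
class Query:
    path: list[str]   # top-level attribute followed by any nested attributes
    comparison: str   # a key of COMPARATORS
    value: str        # raw value from the query string

    def matches(self, actual) -> bool:
        """Apply this query's comparison to an attribute value."""
        return COMPARATORS[self.comparison](actual, self.value)


def parse_query_param(key: str, value: str) -> Query:
    """Split q_<attribute>__<nested-attribute>__<comparison> into its pieces."""
    key = key.lower()  # query keys are normalized to lower case
    if not key.startswith("q_"):
        raise ValueError(f"not a query parameter: {key!r}")
    *path, comparison = key[len("q_"):].split("__")
    if not path or comparison not in COMPARATORS:
        raise ValueError(f"unsupported query: {key!r}")
    return Query(path=path, comparison=comparison, value=value)


q = parse_query_param("q_manifests__size__gt", "400")
print(q.path, q.comparison, q.matches(2345))  # ['manifests', 'size'] gt True
```

Rejecting unknown comparisons at parse time lets the registry return an error before touching any stored data.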

Registry behavior

If a query is not valid for any item in the data structure, the registry should return an error response with a meaningful message.
If a query is valid for only some items, the remaining items can be ignored.
Thus, an empty response with 200 OK means the query ran successfully but found no matches.
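These rules could be implemented along the following lines. The sketch assumes a single top-level attribute per descriptor and takes the comparison as a predicate function; an item counts as "not valid" when it lacks the attribute or the predicate raises on its value. The `run_query` name and the `INVALID_QUERY` error code are hypothetical; the `errors` list shape follows the error envelope the OCI distribution spec already uses.

```python
from typing import Any, Callable

_MISSING = object()  # sentinel: attribute absent from a descriptor


def run_query(descriptors: list, attribute: str,
              predicate: Callable[[Any], bool]) -> tuple:
    """Apply `predicate` to each descriptor's `attribute`; return (status, body)."""
    matches = []
    valid_for_any = False
    for desc in descriptors:
        # Query attributes are lower case, so look keys up case-insensitively.
        value = next((v for k, v in desc.items() if k.lower() == attribute), _MISSING)
        if value is _MISSING:
            continue  # attribute absent: the query is not valid for this item
        try:
            matched = predicate(value)
        except (TypeError, ValueError):
            continue  # e.g. a numeric comparison against a non-numeric value
        valid_for_any = True
        if matched:
            matches.append(desc)
    if descriptors and not valid_for_any:
        # Not valid for any item: report an error with a meaningful message.
        return 400, {"errors": [{
            "code": "INVALID_QUERY",  # hypothetical error code
            "message": f"query on '{attribute}' does not apply to any item",
        }]}
    # Valid for at least one item (or nothing to search): 200 OK, possibly empty.
    return 200, {"manifests": matches}


scoops = [
    {"mediaType": "icecream/scoops.vanilla.v1.json", "size": 2345, "digest": "sha256:b2b2b2..."},
    {"mediaType": "icecream/scoops.chocolate.v1.json", "size": 2345, "digest": "sha256:c3c3c3..."},
]
status, body = run_query(scoops, "mediatype", lambda v: "mint" in v.lower())
# status == 200, body == {"manifests": []}: the query ran, nothing matched
```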

Requirements

The requirements stated for the working group are specific only to a digest or tag. This proposal extends filtering and search to any attribute in the JSON provided by the API, and allows for custom filtering. The closest matches are:

As a user, I want to query the registry for stored objects that reference a container image filtering by type (e.g., Signature, SBOM, etc.) or by annotation (I want to see all signatures from this identity)
As a tool writer, I would like to be able to efficiently query artifacts of different types attached to a given digest
As a tool writer, I would like to be able to query for a specific artifact based on artifact type and other user defined annotations on the artifact

The following use cases are loosely related, if "most updated" can be derived from a search query:

As a user, I want to identify the most updated artifact in a registry
As a user, I would like to be able to map monotonically increasing product versions to container images so I have an idea of deployment progression

jdolitsky/PROPOSAL_F.md