Multi-architecture support in content manifest schema version 2

The new generic distributed content manifest format (schema version 2) [see distribution/distribution#62] is approaching final approval. Some discussion has already happened in-line there, but it seems reasonable to break out the various pieces that would be required to implement a multi-arch Docker image solution on top of the flexible manifest format being proposed.

As a starting point, it is useful to discuss the intended use case for multi-architecture images in the Docker platform. The following requirements summarize the expected capabilities of the engine + registry + storage/retrieval format to be implemented.

Requirements

A specific repository name:tag manifest will need to contain the information required to access and retrieve content for multiple architectures.
- For example, docker pull ubuntu:15.04 on a POWER system should be able to pull the POWER-specific image layers that comprise an Ubuntu 15.04 image.
Not every name:tag manifest will have information for more than one architecture (e.g. the default amd64/linux images of today).
docker push of a name:tag manifest should create a manifest that is {tagged|labeled|marked} appropriately for its arch/os settings (already encoded in Docker's layer/image metadata).
- When manifests for other architectures are pushed under the same name:tag, a "merge-like" capability should be provided to create a "super-manifest" that references each of the architecture-specific manifests available for that name:tag.
- The push activity should not need to be coordinated across architectures: I may create an amd64/linux manifest today, and a ppc64le/linux manifest next week. A GET for that repository manifest path from my registry in week 1 should respond with one manifest supporting architecture/os pair amd64/linux, and in week 2 the same GET should respond with a super-manifest referencing the pushed amd64/linux and ppc64le/linux content manifests.
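
The uncoordinated push behaviour in the last requirement can be sketched roughly as follows. This is only an illustration in Python, not a proposed registry implementation: TagStore, push, and get are hypothetical names, the store is an in-memory dict, and manifest references are treated as opaque strings.

```python
class TagStore:
    """Hypothetical in-memory stand-in for a registry's name:tag index."""

    def __init__(self):
        # "name:tag" -> {"arch/os": manifest reference}
        self._tags = {}

    def push(self, name_tag, platform, manifest_ref):
        # Each push only adds or replaces the entry for its own platform,
        # so pushes for different architectures need no coordination.
        self._tags.setdefault(name_tag, {})[platform] = manifest_ref

    def get(self, name_tag):
        entries = self._tags[name_tag]
        if len(entries) == 1:
            # Only one architecture so far: serve that manifest directly.
            platform, ref = next(iter(entries.items()))
            return {"kind": "single", "platform": platform, "manifest": ref}
        # Two or more architectures: serve a super-manifest that
        # references each architecture-specific manifest.
        return {"kind": "combined", "manifests": dict(sorted(entries.items()))}


store = TagStore()
store.push("ubuntu:15.04", "amd64/linux", "ref-amd64")      # week 1
week1 = store.get("ubuntu:15.04")
store.push("ubuntu:15.04", "ppc64le/linux", "ref-ppc64le")  # week 2
week2 = store.get("ubuntu:15.04")
```

Here week1 describes a single amd64/linux manifest, while week2 is a super-manifest listing both platforms, with no change to the path the client requests.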

Potential implementation

Josh Hawn has provided a potential idea for this super-manifest/"fat manifest" using the capabilities already provided for in the generic manifest schema v2 referenced above. See his comment here: distribution/distribution#62 (comment)

Specifically, each entry in a manifest's list of dependencies has a mediaType. His example manifest references a v1 image spec/config as mediaType: application/vnd.docker.container.image.params.v1+json and the layers associated with it as mediaType: application/vnd.docker.container.image.layer+x-gtar. In the same way, there could be a mediaType called application/vnd.docker.container.image.combined+json which has content like:

{
    "linux/amd64": "docker.com/library/ubuntu@fa7139b345c6f88c8329e68d15864baf7d2b907ed435e3996ba983ab0ebcb7d1",
    "linux/arm": "docker.com/library/ubuntu@e457c078b68d843c8e050b454936a600a0c5d133c094e42bb4fcda68f072e976",
    "linux/ppc64le": "docker.com/library/ubuntu@7ccf96608588961b1e8e4cc3d7ec02906efb144b5b092f89c4292c74d5d1cde8"
}

Given the flexibility of the manifest format, these entries become a layer of indirection: a client algorithmically chooses the os/arch combination it requires (or reports to the end user that the requested content does not exist for their os/arch) and follows the pointer to get the manifest whose dependencies contain the actual image params and layer data for that architecture.
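
A minimal sketch of that client-side selection, in Python for illustration only. The function names select_manifest and local_platform and the ARCH_ALIASES table are assumptions made for this example; only the combined content itself is taken from the example above, and the real daemon would determine its os/arch in its own way.

```python
import platform

# Content of the combined manifest from the example above.
COMBINED = {
    "linux/amd64": "docker.com/library/ubuntu@fa7139b345c6f88c8329e68d15864baf7d2b907ed435e3996ba983ab0ebcb7d1",
    "linux/arm": "docker.com/library/ubuntu@e457c078b68d843c8e050b454936a600a0c5d133c094e42bb4fcda68f072e976",
    "linux/ppc64le": "docker.com/library/ubuntu@7ccf96608588961b1e8e4cc3d7ec02906efb144b5b092f89c4292c74d5d1cde8",
}

# Hypothetical mapping from Python's machine names to the arch names above.
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "armv7l": "arm"}


def local_platform():
    """Return an "os/arch" key for the machine running this code."""
    machine = platform.machine().lower()
    return f"{platform.system().lower()}/{ARCH_ALIASES.get(machine, machine)}"


def select_manifest(combined, wanted):
    """Follow the pointer for the wanted os/arch, or explain what exists."""
    if wanted in combined:
        return combined[wanted]
    available = ", ".join(sorted(combined)) or "none"
    raise LookupError(f"no manifest for {wanted}; available: {available}")


# A client on a POWER system would resolve the ppc64le pointer:
power_ref = select_manifest(COMBINED, "linux/ppc64le")
```

A real client would then fetch the manifest behind the returned reference and pull its dependencies as usual. The LookupError path corresponds to telling the end user that no content exists for their os/arch.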

Implementation steps

Distribution PR#62 needs to be finalized and agreed to by all parties; this is underway as we speak.
Need agreement across the Distribution and Engine projects on the combined mediaType for fat/super-manifest content.
The Docker daemon (or dist command?) pull algorithm needs to properly handle combined mediaType redirection based on the os/arch of the daemon.
The Docker daemon (or dist command?) push implementation needs the capability to create a super-manifest when a second os/architecture of the same name:tag image is pushed to a v2 schema-supporting registry.
Is any coordination required with swarm, machine, or compose regarding the super-manifest?

Future: Orchestration clients may need a way to query/specify/hint os/arch requests. Use case: docker machine is used to create Docker engines across a multitude of systems, including those of different architectures. When workloads are deployed using swarm, and there are multiple architectures supported by the underlying machines and supported by the images used in the workloads being deployed, who will decide which architecture to use? Should there be a hinting/policy mechanism allowed to prefer POWER, for example, for certain images or workloads, and prefer amd64 for others?
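
One possible shape for such a hint, sketched in Python purely to make the question concrete. Nothing here is an existing swarm or engine feature: choose_platform and its parameters are hypothetical, and the fallback rule (alphabetical order) is an arbitrary placeholder for whatever default the orchestrator would actually use.

```python
def choose_platform(image_platforms, node_platforms, preferences=()):
    """Pick an os/arch that both the image and the cluster support.

    preferences is an ordered list of os/arch strings acting as a hint;
    the first one that is actually usable wins.
    """
    candidates = set(image_platforms) & set(node_platforms)
    if not candidates:
        return None  # nothing in the cluster can run this image
    for preferred in preferences:
        if preferred in candidates:
            return preferred
    # No applicable hint: fall back to a deterministic (arbitrary) choice.
    return sorted(candidates)[0]


image = ["linux/amd64", "linux/ppc64le"]
cluster = ["linux/ppc64le", "linux/amd64"]
hinted = choose_platform(image, cluster, preferences=["linux/ppc64le"])
unhinted = choose_platform(image, cluster)
```

With the hint, the workload lands on POWER; without it, the placeholder default selects linux/amd64. Whether the hint belongs to the image, the workload, or the cluster is exactly the open question.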

@duglin @estesp The straw man in distribution/distribution#62 (comment) suggests placing the architecture dispatching in the "application" field, rather than the "dependencies" field. The gist should clarify that.

Is that because it's still TBD, or because the registry API already supports the daemon specifying the arch/os on pushes/pulls?

@duglin The goal of this proposal is to avoid coupling runtime specifics, such as arch/os, with the registry API, which is focused on content distribution. There is room for such indexing in a search API, focused on discoverability, but this is outside the use case of the primary registry API.

@estesp This proposal is looking good. It compiles a lot of disparate information into a single place. It might be a good idea to work out an example of combining a group of manifests. I have a feeling that this should be an explicit operation, with its own command, rather than something that happens automatically. Perhaps, something like this:

$ docker pack --multiarch <digest0> <digest1> ... <digestN>
<fat manifest digest>

The resulting digest could be tagged and pushed like any other. Running it would dispatch to the correct manifest.
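
A rough sketch of what such an explicit pack step might compute, written in Python for illustration. The docker pack command above is itself only a suggestion, and pack_multiarch is a hypothetical name. The sketch assumes the os/arch of each input manifest is already known and passed in as a mapping, that the fat manifest body is canonicalized as sorted-key JSON, and that the digest is a SHA-256 over that body; none of these details are settled by the proposal.

```python
import hashlib
import json


def pack_multiarch(entries):
    """Build a fat manifest body from {"os/arch": manifest digest}.

    Returns (body_bytes, fat_manifest_digest).
    """
    if not entries:
        raise ValueError("at least one manifest is required")
    # Sorted keys and fixed separators make the body, and therefore the
    # digest, independent of the order in which digests were supplied.
    body = json.dumps(entries, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return body, "sha256:" + hashlib.sha256(body).hexdigest()


body, fat_digest = pack_multiarch({
    "linux/amd64": "sha256:aaa",    # placeholder digests, not real content
    "linux/ppc64le": "sha256:bbb",
})
```

Because the body is content-addressed, packing the same set of manifests twice yields the same fat manifest digest, which fits with tagging and pushing it like any other.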

estesp/Multi-arch Implementation.md

estesp commented Feb 7, 2015

duglin commented Mar 2, 2015

stevvooe commented Mar 5, 2015