TL;DR
- More secure foundation for referencing images and layers
- New distribution manifest and pull features
- Upgrading old images will include a migration step
- A migration tool exists in order to minimize migration time
This post describes the upcoming changes to the way Docker Engine stores images and filesystem data in containers. These changes reach users starting with v1.10. (FIXME: link to download rc1)
Starting with v1.10, we are completely changing the way we address image data on disk. Previously, every image and layer used a randomly assigned UUID. We have now opted for a content-addressable method and use an ID that is based on a secure hash of the image and layer data.
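To make the idea concrete, here is a minimal sketch of deriving an ID from content instead of assigning it at random. It assumes SHA256 as the hash and a `sha256:` prefix for readability; the function name `content_id` and the sample byte strings are illustrative, and the sketch does not reproduce the exact bytes Docker hashes for images and layers.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return an ID derived purely from the bytes (hypothetical helper)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


layer_a = b"example layer contents"
layer_b = b"example layer contents"
layer_c = b"different layer contents"

# Identical bytes always map to the same ID, wherever they were produced.
same = content_id(layer_a) == content_id(layer_b)
# Different bytes map to a different ID.
different = content_id(layer_a) != content_id(layer_c)
```

Because the ID is a function of the data, two parties that hold the same bytes arrive at the same ID without coordinating.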
The new method gives users more security, provides a built-in way to avoid ID collisions, and guarantees data integrity after pull and push or load and save. It also improves layer sharing by allowing many images to freely share their layers even if they didn't come from the same build.
Addressing images by their content also lets us more easily detect whether something has already been downloaded. Because we have separated images and layers, you don't have to pull the configurations for every image that was part of the original build chain, and we don't need to create layers for build instructions that didn't modify the filesystem.
Content addressability is also a foundation that makes the new distribution features possible. The image pull and push code has been reworked to use a download/upload manager concept that should make push and pull much more stable and mitigate any issues with parallel requests. The download manager also brings retries on failed downloads and better prioritization for concurrent downloads.
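The integrity check, layer sharing, and "already downloaded" detection described above can be sketched with a toy store keyed by content digest. Everything here (`LayerStore`, `digest`, the sample layers) is hypothetical and kept in memory; it illustrates the principle, not how Docker implements its storage.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


class LayerStore:
    """Toy in-memory store keyed by content digest (not Docker's code)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def has(self, layer_id: str) -> bool:
        return layer_id in self._blobs

    def put(self, expected_id: str, data: bytes) -> None:
        # Reject data whose hash does not match the ID it was requested under.
        actual = digest(data)
        if actual != expected_id:
            raise ValueError(f"digest mismatch: expected {expected_id}, got {actual}")
        self._blobs[expected_id] = data

    def missing(self, layer_ids: List[str]) -> List[str]:
        # Only layers that are not already present need to be downloaded.
        return [i for i in layer_ids if not self.has(i)]


store = LayerStore()
base = b"base filesystem layer"
app = b"application layer"
store.put(digest(base), base)

# A second image that reuses the base layer only needs the layer it lacks,
# even though it did not come from the same build.
wanted = [digest(base), digest(app)]
to_fetch = store.missing(wanted)
```

Here `to_fetch` contains only the digest of `app`, and a `put` with tampered bytes raises instead of silently storing corrupt data.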
We are also introducing a new manifest format that is built on top of the content addressable base. It directly references the content addressable image configuration and layer checksums. The new format also has support for a manifest list that can be used for targeting multiple architectures/platforms. Moving to the new format should be completely transparent. If the registry supports the new format, Docker will start using it on pushes. The registry itself has a converter that makes the new manifests also available to old clients.
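To show what "directly references the content addressable image configuration and layer checksums" can look like, here is a sketch that assembles a manifest-shaped document. The field names and media type strings follow the registry's Image Manifest Version 2, Schema 2 layout as best understood here and should be checked against the registry specification; the helper names and sample bytes are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def descriptor(media_type: str, data: bytes) -> Dict[str, Any]:
    # A reference to a blob: its type, size, and content digest.
    return {"mediaType": media_type, "size": len(data), "digest": digest(data)}


def build_manifest(config_json: bytes, layer_blobs: List[bytes]) -> Dict[str, Any]:
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
        "config": descriptor(
            "application/vnd.docker.container.image.v1+json", config_json),
        "layers": [
            descriptor("application/vnd.docker.image.rootfs.diff.tar.gzip", blob)
            for blob in layer_blobs
        ],
    }


# Placeholder bytes standing in for a real image config and compressed layers.
config = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
layers = [b"compressed layer one", b"compressed layer two"]

manifest = build_manifest(config, layers)
manifest_text = json.dumps(manifest, indent=2)
```

Every reference in the manifest is a digest, so a client can verify each blob it pulls against the manifest that named it.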
To make your current images accessible to the new model we have to migrate them to content addressable storage. This means calculating the secure checksums for your current data.
All your current images, tags and containers are automatically migrated to the new foundation the first time you start the updated Docker daemon. Before loading your containers, the Docker daemon will calculate all needed checksums for your current data, and after it has completed, all your images and tags will have brand new secure IDs.
While this is simple, calculating SHA256 checksums for your files can take time if you have lots of image data. On average you should assume that the migrator can process data at a speed of 100 MB/s. During this time your Docker daemon won’t be ready to respond to requests.
If you can accept this one-time hit, then upgrading Docker Engine and restarting the daemon will transparently take care of migrating your images. However, if you want to minimize your daemon's downtime, we have prepared a separate migration utility that you can use while your old daemon is still running.
This program will find all your current images and calculate the checksums for them. After you upgrade and restart your daemon, the checksum data will already be there and Docker can start using it. You can get it from https://github.com/docker/v1.10-migrator/releases.
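As a rough planning aid, the average throughput figure above turns into a downtime estimate with simple division. The sketch assumes 1 MB = 10^6 bytes and a constant processing speed, neither of which is guaranteed on real hardware; the function name is illustrative.

```python
def estimate_migration_seconds(total_bytes: int,
                               throughput_bytes_per_s: int = 100 * 10**6) -> float:
    """Estimate migration time from data size and an assumed average speed."""
    return total_bytes / throughput_bytes_per_s


# Example: 50 GB of image data at the assumed 100 MB/s average.
seconds = estimate_migration_seconds(50 * 10**9)
minutes = seconds / 60
```

For 50 GB this gives 500 seconds, or a little over 8 minutes during which the daemon would not respond to requests.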
The migration tool can also be run as a Docker image. While running the migrator image, you need to expose your Docker data directory to the container. If you use the default path, the command looks like `docker run --rm -v /var/lib/docker:/var/lib/docker docker-1.10-migrator`. If you use the devicemapper storage driver, you also need to include `--privileged` to give the tool access to your storage devices.
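The two invocations described above differ only in the `--privileged` flag. The small helper below assembles the argument list for the default data directory so the flag placement is explicit; the function name is hypothetical, and the sketch only builds the command without running it.

```python
import shlex
from typing import List


def migrator_command(privileged: bool = False) -> List[str]:
    """Build the docker run arguments for the migrator image (default path only)."""
    cmd = ["docker", "run", "--rm"]
    if privileged:
        # Needed with the devicemapper storage driver, per the text above.
        cmd.append("--privileged")
    cmd += ["-v", "/var/lib/docker:/var/lib/docker", "docker-1.10-migrator"]
    return cmd


default_cmd = shlex.join(migrator_command())
devicemapper_cmd = shlex.join(migrator_command(privileged=True))
```

The resulting strings can be pasted into a shell, or the list form can be passed to `subprocess.run` if you prefer to script the migration.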