Anonymous docker/rpm assembly tool

Code exists in a branch of rpm-ostree:

https://github.com/cgwalters/rpm-ostree/commits/libhif-next-compose-dockerimages

Executive summary

A Dockerfile is an imperative format that allows doing anything. As a consequence, the tooling around it (docker itself, rel-eng tools) cannot tell what a build step actually depends on, and so is unable to optimize accurately.

Like rpm-ostree, this tool creates a declarative and rigorous binding, in this case between Docker images and RPM packages alone.

This allows a number of optimizations, critical bug fixes, and user experience improvements for the subset of users who want to make Docker containers out of pure RPM packages. It also makes release engineering for RPM-based distributions much simpler.

Future of this effort

Write in Python (probably) because hawkey/librepo have bindings
Would like to transfer the ideas
To be picked up by one of: dnf/packaging, RCM, Dock?
Extend to support config? See coreos/rpm-ostree#96

Example

{
  "repos": ["fedora-rawhide"],

  "images": {

    "baseimage": {
      "packages": ["yum"]
    },

    "freeipa": {
      "packages": ["freeipa"],
      "entrypoint": "/usr/libexec/freeipa-server",
      "expose": "80"
    },

    "nodejs": {
      "packages": ["nodejs"]
    },

    "django": {
      "packages": ["django"]
    }
  }
}
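
As a rough illustration of how a description like the one above could be consumed, here is a minimal Python sketch. It is not the code in the branch: the function name load_images and its validation rules are assumptions made for this example, and only the "repos", "images" and "packages" keys come from the example itself.

```python
import json

# Same shape as the example above, trimmed to two images.
DESCRIPTION = """
{
  "repos": ["fedora-rawhide"],
  "images": {
    "baseimage": {"packages": ["yum"]},
    "freeipa": {
      "packages": ["freeipa"],
      "entrypoint": "/usr/libexec/freeipa-server",
      "expose": "80"
    }
  }
}
"""


def load_images(text):
    """Parse a description and return {image name: sorted package list}.

    Hypothetical helper: the checks below are assumptions, not rules
    enforced by the real tool.
    """
    config = json.loads(text)
    if not config.get("repos"):
        raise ValueError("at least one repo is required")
    images = {}
    for name, spec in config["images"].items():
        packages = spec.get("packages")
        if not packages:
            raise ValueError("image %r lists no packages" % name)
        images[name] = sorted(packages)
    return images


images = load_images(DESCRIPTION)
```

Because the description is plain data, a check like this can run before any package is downloaded.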

What about non-RPM content?

Derive from an image created this way using a regular Dockerfile FROM directive. That's a nice part about Docker - images generated in different ways interoperate via simple layering.
Create a binary-only RPM (really, it's not hard!)

Plus: Always-enabled releng RPM-image generation tool

One thing that is quite easy for this tool to achieve is to regenerate an image if and only if its set of input RPMs changes. It can do that without an expensive re-install + throw away, by doing a dependency resolution and comparing the result with the image's current package set.
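
A minimal sketch of that comparison, assuming the dependency resolution has already produced a list of package identifiers. The names input_fingerprint and needs_rebuild are hypothetical, and the package strings below are made up for illustration.

```python
import hashlib


def input_fingerprint(resolved_packages):
    """Hash a resolved package set; ordering and duplicates do not matter."""
    digest = hashlib.sha256()
    for package in sorted(set(resolved_packages)):
        digest.update(package.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def needs_rebuild(resolved_packages, previous_fingerprint):
    """True when the resolved inputs differ from those of the last build."""
    return input_fingerprint(resolved_packages) != previous_fingerprint


# Fingerprint recorded when the image was last generated (made-up packages).
last_build = input_fingerprint(["bash-4.3-1.x86_64", "yum-3.4-1.noarch"])

# Same set in a different order: nothing to do.
unchanged = needs_rebuild(["yum-3.4-1.noarch", "bash-4.3-1.x86_64"], last_build)

# One package was updated in the repository: regenerate.
changed = needs_rebuild(["bash-4.3-2.x86_64", "yum-3.4-1.noarch"], last_build)
```

Only the resolution step touches the repositories, so the check stays cheap enough to run often.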

Because of that, rather than "Docker builds" being something that humans log in and initiate by hand, this could become part of a background infrastructure that ensures the Docker images for apps are always up to date.

(And of course, once we enable upstream git -> RPM, things get more exciting...)

Plus: accurate caching

Have you ever had to choose between this

RUN yum update #nocache20150110.0

or using docker build --no-cache? Right. This tool only downloads metadata for the specified repositories once, and supports caching it on the host where it can be reused by multiple generations of containers.
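
The host-side cache can be pictured roughly as follows. This is only a sketch: cached_metadata and the injected fetch callable are hypothetical names, real repository metadata would also need an expiry check, and the actual download would go through something like librepo rather than the fake function used here.

```python
import os
import tempfile


def cached_metadata(cache_dir, repo, fetch):
    """Return metadata for repo, calling fetch(repo) only on a cache miss."""
    path = os.path.join(cache_dir, repo + ".metadata")
    if not os.path.exists(path):
        data = fetch(repo)
        with open(path, "wb") as handle:
            handle.write(data)
    with open(path, "rb") as handle:
        return handle.read()


# Stand-in for a real download; records how often it is called.
calls = []


def fake_fetch(repo):
    calls.append(repo)
    return b"metadata for " + repo.encode("utf-8")


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_metadata(cache_dir, "fedora-rawhide", fake_fetch)
    second = cached_metadata(cache_dir, "fedora-rawhide", fake_fetch)
```

Both image generations see the same bytes, and the download happens once.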

Can still generate a "traditional" base image

What we currently call the "base image" here could also simply fall out of this tool by specifying an image which contains yum.

Of course, it's also easy to generate images without yum inside them at all. That leads to much more minimal containers, and dovetails with the "automatic updates" model above: you treat images as immutable on the client.

Plus: automatic layering computation

For app authors that maintain multiple apps, it's not uncommon for them to have a "midlayer" tier of say their favorite logging library or whatever. With Docker they have to manually re-create the layering that's already implicit in the RPM dependency set.

This tool can simply compute a consistently optimal chain of layers from the RPM dependencies alone, without repeating things.
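
To make the idea concrete, here is a small sketch of the simplest case: one shared layer computed as the intersection of every image's dependency closure. The dependency map is invented for the example and the function names are hypothetical. A real implementation would get the closures from the depsolver and could build a deeper chain than the single shared layer shown here.

```python
def closure(package, requires):
    """Return the package plus everything it transitively requires."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(requires.get(current, []))
    return seen


def split_layers(images, requires):
    """Split images into one shared layer and a per-image remainder."""
    closures = {
        name: set().union(*(closure(p, requires) for p in packages))
        for name, packages in images.items()
    }
    if not closures:
        return set(), {}
    shared = set.intersection(*closures.values())
    return shared, {name: pkgs - shared for name, pkgs in closures.items()}


# Made-up dependency data, not real package metadata.
REQUIRES = {
    "nodejs": ["glibc", "openssl"],
    "django": ["python", "glibc"],
    "python": ["glibc", "openssl"],
    "openssl": ["glibc"],
}

shared, per_image = split_layers(
    {"nodejs": ["nodejs"], "django": ["django"]}, REQUIRES
)
```

Here glibc and openssl end up in the shared layer, and nobody had to write a "midlayer" description by hand.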

OpenShift Dockerfiles

Ideally, can replace:

cgwalters/gist:73072640f9f19fdc205f