Right now, the builder is very tightly coupled with Docker core. The builder has the following roles:
- process the build context
- parse the Dockerfile into s-expressions
- evaluate individual s-expressions, which may involve the following jobs:
  - persist configuration, e.g. ENV, EXPOSE, VOLUME, etc.
  - import external data into a new layer, e.g. ADD, COPY, etc.
  - run extra commands that create a new layer, e.g. RUN, etc.
- The current builder is not extensible. The only interface to the builder right now is the Dockerfile, which in many cases is clunky, hard to extend, and hard to deprecate.
- It is hard to guarantee backward compatibility between different versions of the Dockerfile format and/or to deprecate old Dockerfile instructions. Ideally a Dockerfile should always build, even with a newer version of Docker and/or the builder.
The semantics of processing the build context and parsing the Dockerfile should be separated from Docker core. Docker core should only be concerned with processing the build layers (ideally via a set of defined APIs). For example, a set of Dockerfile instructions could be translated into API calls as follows (all the calls below are only examples and may not be correct at all):
```
--> POST /build
47c0                  # build session id

FROM golang:1.3.1
--> POST /images/create?fromImage=golang:1.3.1
asdf
--> POST /build/47c0/create?fromImage=asdf
31f1                  # container id
--> POST /build/47c0/commit/31f1
31f0                  # layer id

ADD . /go/src/app
--> POST /build/47c0/add?fromImage=31f0&path=/go/src/app
<tar stream>
f50c                  # container id
--> POST /build/47c0/commit/f50c
f41c                  # layer id

BUILD /go/bin         # nested build
--> POST /containers?fromImage=f41c
f401
--> POST /containers/f401/copy?resource=/go/bin
</go/bin tar stream>  # saved somewhere else so we can reuse it later
d405
# no commit here because we don't want a cache entry

FROM scratch
--> POST /images/create?fromImage=scratch
scratch
--> POST /build/47c0/add?fromImage=scratch&path=/go/bin
</go/bin tar stream>  # previously extracted /go/bin
d789

ADD dnsdock /
--> POST /build/47c0/add?fromImage=d789&path=/
<tar stream>
e234                  # container id
--> POST /build/47c0/commit/e234
qwer                  # layer id

ENV CGO_ENABLED 0
--> POST /images/create?fromImage=qwer&change="ENV CGO_ENABLED 0"
wert                  # layer id
```
By separating the core API from the builder, we also separate the semantics of building (processing the build context and the Dockerfile) from the core. A user can write their own build script that does not depend on a Dockerfile at all, but instead processes the build context programmatically and calls the build API directly. For example, we would be able to write an Ansible builder that builds Ansible playbooks into a container by writing a custom wrapper around Ansible connections.
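As an illustration only, here is a minimal Go sketch of such a programmatic builder driving the hypothetical /build endpoints from the translation example above. The endpoint paths, query parameters, and returned identifiers are assumptions taken from that example, not existing Remote API calls; error handling is elided for brevity.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// post issues a request against the (hypothetical) build API and returns the
// identifier the endpoint is assumed to answer with (session, container or
// layer id in the translation example above).
func post(url string, body io.Reader) (string, error) {
	resp, err := http.Post(url, "application/x-tar", body)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	id, err := io.ReadAll(resp.Body)
	return strings.TrimSpace(string(id)), err
}

func main() {
	api := "http://localhost:2375" // assumes the daemon listens on a TCP socket

	// Start a build session (hypothetical endpoint from the example above).
	session, err := post(api+"/build", nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Import the base image, create a build container from it and commit it,
	// mirroring the FROM step of the translation example.
	img, _ := post(api+"/images/create?fromImage=golang:1.3.1", nil)
	ctr, _ := post(api+"/build/"+session+"/create?fromImage="+img, nil)
	layer, _ := post(api+"/build/"+session+"/commit/"+ctr, nil)

	// Stream the build context into a new layer, mirroring ADD . /go/src/app.
	context, err := os.Open("context.tar") // tar of the build context, prepared by the client
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer context.Close()
	ctr, _ = post(api+"/build/"+session+"/add?fromImage="+layer+"&path=/go/src/app", context)
	layer, _ = post(api+"/build/"+session+"/commit/"+ctr, nil)

	fmt.Println("final layer:", layer)
}
```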
This also means that as long as a client has access to Docker's API endpoint, it can behave as a builder, which opens up several possibilities for decoupling build-file semantics from the core. For example, we could have a builder for Dockerfile v1 that would be run as:
```
# this starts builderv1, which parses the Dockerfile v1 inside /buildcontext
# and translates it into API calls
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp/buildcontext:/buildcontext \
    docker/builderv1 /bin/builder
```
while a Dockerfile v2 builder would be run as:
```
# this starts builderv2, which parses the Dockerfile v2 inside /buildcontext
# and translates it into API calls
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp/buildcontext:/buildcontext \
    docker/builderv2 /bin/builder
```
`docker build -t image .` would then be translated into the corresponding `docker run` after the client has processed the build context into /tmp/buildcontext.
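To make that translation concrete, here is a rough Go sketch of how a `docker build` front end could dispatch to such a builder image. The image name docker/builderv1 and the /buildcontext mount convention come from the examples above; how the context directory is prepared and how the builder version is selected are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runBuilder dispatches the actual build to a builder image over `docker run`,
// mounting the daemon socket and the already-prepared build context.
func runBuilder(builderImage, contextDir string) error {
	cmd := exec.Command("docker", "run",
		"-v", "/var/run/docker.sock:/var/run/docker.sock",
		"-v", contextDir+":/buildcontext",
		builderImage, "/bin/builder")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// In a real front end the context would be copied/streamed here and the
	// builder version picked from the Dockerfile; both are hard-coded in this sketch.
	if err := runBuilder("docker/builderv1", "/tmp/buildcontext"); err != nil {
		fmt.Fprintln(os.Stderr, "build failed:", err)
		os.Exit(1)
	}
}
```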
In the same spirit, a custom builder could look like this (taken from https://gist.github.com/tonistiigi/c7b539c2a1a0568020c6):
```
# this starts a custom builder that processes the build context inside /buildcontext
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp/buildcontext:/buildcontext \
    nitrousio/builder /bin/builder
```
```
> cat /bin/builder
#!/usr/bin/env nodejs
var docker = require('docker-builder')
var container = docker.New()
container.copy('/context/foo', '/bar')
container.commit() // also sets itself to next
container.env('FOO', 'bar')
container.workdir('/foo/bar')
container.commit()
docker.tag(container)
```
NOTE: However, the core API also depends on some of the Dockerfile instructions for changing image configuration (see commit --change, or import --change), so we might also have to maintain the (latest?) parser inside the core.
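For illustration, a minimal Go sketch of driving the commit endpoint with a Dockerfile-style change instruction, similar to `docker commit --change`. The container id and daemon address are placeholders, and the exact query parameters should be checked against the Remote API version in use.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// Commit container 31f1 (placeholder id) while applying a Dockerfile-style
	// configuration change; the change instruction is parsed daemon-side.
	q := url.Values{}
	q.Set("container", "31f1")
	q.Set("changes", "ENV CGO_ENABLED 0")

	resp, err := http.Post("http://localhost:2375/commit?"+q.Encode(), "application/json", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```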
Are so many new Remote API endpoints even needed? I would have thought that after `docker commit --change` and an scp-compatible `docker cp` we would have all the use cases covered. To keep the maintenance workload under control we should avoid creating the same functionality in many places and instead try to inherit it from where it is already implemented. Still, I think this is a better approach than the "separate binary for everything" proposed by @erikh. Even in that case, those binaries would just connect to some outer service, meaning they can be bypassed and new builder libraries can be created (like the Node one in the example). We should see this as an inevitable thing that will happen and design only for the API. These binaries should be just one implementation on top of that.
I still think that for creating 99% of images, the current Dockerfile format is the best solution. But I am in favor of having a lower-level implementation and making the current Dockerfile-based building a separate tool on top of that (at least when I don't take the loss of observability into account), as well as any other alternative tool for building Docker images.
It is important that the build process happens inside a container, like in your `docker run` examples. This should (somehow) be a hard requirement, to make sure images are always easy to rebuild by anyone. In my own write-up I suggested a shebang + comment for this.

One difficulty with this approach is caching. The build script will always need to run through (this is fast anyway), but the API needs to be designed so that every step that adds a new layer has cacheable criteria (like it does at the moment). Turning everything into a single layer based on the script checksum should not be acceptable. Another, and more complicated, problem is the cleanup of a cancelled build. This has very bad UX in the current implementation as well, but there it could at least be fixed quite easily. Here it is more complicated, because if we just cancel at some random point, things could end up in a very broken state.
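To sketch what per-step cacheable criteria could look like (this is an assumption about one possible design, not how the daemon works today): each layer-producing API call could be keyed by its parent layer, the instruction itself, and a checksum of any imported content, so an unchanged step in a re-run script resolves from cache instead of being re-executed.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

// cacheKey derives a cache key for one layer-producing build step from the
// parent layer id, the instruction text, and a digest of any imported content.
// An unchanged (parent, instruction, content) triple can then be mapped to the
// previously built layer.
func cacheKey(parentLayer, instruction string, content io.Reader) (string, error) {
	h := sha256.New()
	io.WriteString(h, parentLayer)
	io.WriteString(h, "\x00")
	io.WriteString(h, instruction)
	io.WriteString(h, "\x00")
	if content != nil {
		if _, err := io.Copy(h, content); err != nil {
			return "", err
		}
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
	// Example: an ADD-like step importing the build context tar on top of layer 31f0.
	context, err := os.Open("context.tar")
	if err != nil {
		panic(err)
	}
	defer context.Close()

	key, _ := cacheKey("31f0", "ADD . /go/src/app", context)
	fmt.Println("cache key:", key)
}
```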
Following the nested builds example: it wasn't quite clear to me, but I guess the general idea is to have one endpoint for downloading a tar from a container and another for starting a new layer hierarchy. One of the key points of nested builds is that layers should be maintained; I do not want to use it just for static binaries. To me, the assumption that the best environment for building an image is also the best environment for running it is wrong and needs to be addressed.
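A rough Go sketch of that two-endpoint reading of nested builds, reusing the hypothetical endpoints from the translation example above (the paths, parameters, and session handling are all assumptions):

```go
package main

import (
	"fmt"
	"net/http"
)

// nestedBuild copies an artifact tree out of a build container and adds it as
// the first layer of a fresh layer hierarchy, mirroring the BUILD + FROM scratch
// steps of the translation example. All endpoints here are the hypothetical ones
// from that example, not the current Remote API.
func nestedBuild(api, session, buildContainer, resource, baseImage string) error {
	// 1. Download the artifact (e.g. /go/bin) from the build container as a tar stream.
	resp, err := http.Post(api+"/containers/"+buildContainer+"/copy?resource="+resource, "application/json", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// 2. Start a new layer hierarchy from the base image (e.g. scratch) and add
	//    the extracted tar as its content, keeping the original build layers around.
	_, err = http.Post(api+"/build/"+session+"/add?fromImage="+baseImage+"&path="+resource,
		"application/x-tar", resp.Body)
	return err
}

func main() {
	// Placeholder ids taken from the translation example above.
	if err := nestedBuild("http://localhost:2375", "47c0", "f401", "/go/bin", "scratch"); err != nil {
		fmt.Println("nested build failed:", err)
	}
}
```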