Multi-stage builds in docker

Having spent a bit of time working to dockerize an API, I wanted to share one of my learnings in case it’s helpful or interesting to others.

Granting access to private git repositories

The first issue I ran into was how to authenticate when fetching gems we host privately on GitHub. We do this quite a bit throughout our repos.

So I needed a way to give the docker image build access to those repositories so that bundler could run properly. Because we're building for production deployment, I first decided to remove HTTPS repository references (https://github.com/ldock/) and replace them with SSH URLs (git@github.com:ldock/). But this introduced the challenge of getting the private key into the image securely. I needed an authorized SSH key on the image while dependencies were being installed (bundle install), but I did not want to leave the key on the file system after the dependencies were installed.
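To make the problem concrete, here is a rough sketch of the naive single-stage approach. It is illustrative only: `id_rsa` is a hypothetical key file placed in the build context, and the `/app` working directory is an assumption. Deleting the key in a later step does not help, because each instruction produces its own image layer and the earlier layer still holds the file.

```dockerfile
# Naive single-stage build: the key ends up baked into an image layer.
FROM ruby:2.5.0
WORKDIR /app

# id_rsa is a hypothetical key file placed in the build context.
COPY id_rsa /root/.ssh/id_rsa
RUN chmod 600 /root/.ssh/id_rsa \
    && ssh-keyscan github.com >> /root/.ssh/known_hosts

COPY Gemfile Gemfile.lock ./
RUN bundle install

# Removes the key from the final filesystem view only; the COPY layer
# above still contains it and ships with the image.
RUN rm /root/.ssh/id_rsa
```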

Enter multi-stage docker builds...

Multi-stage docker builds were introduced to help keep docker image sizes down. Apparently this is a huge challenge for heavily containerized services when deploying at scale. Here’s how they work:

Every time you see a FROM statement in a dockerfile a new “stage” is introduced.

Here are some examples you’ll see in ldock dockerfiles:

FROM teamldock/ruby:2.4.1-6c8a537
FROM ruby:2.3.3-slim
FROM heroku/cedar:14

Most dockerfiles start with a FROM statement declaring which base image the build starts from. If multiple FROM statements appear in a dockerfile, the build is considered to be “multi-stage”.

The beauty of multiple stages is that you can start your build with a fat image that has helpful dependencies like git, bash, or bundler (or whatever), then introduce a smaller image (alpine) for the final stage. Only the last stage ends up in the finished image; older stages are discarded. The magic is in docker’s ability to copy artifacts between stages. Here’s an example of how you can do this:

Example

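FROM ruby:2.5.0 AS dependency_stage
WORKDIR /app
# bundle install needs the Gemfile and lockfile to be present in the image
COPY Gemfile Gemfile.lock ./
RUN bundle install

# Alpine Linux is much smaller than most distribution base images (~5MB)
FROM ruby:2.5.0-alpine
# The official ruby images install gems under /usr/local/bundle (GEM_HOME)
COPY --from=dependency_stage /usr/local/bundle /usr/local/bundle

CMD ["rails", "server"]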

Explainer

Quick review of what happens in this dockerfile ☝️…

1. The build starts its first stage on top of an official ruby docker image (it has Ruby 2.5.0 installed)
2. `bundle install` runs
3. A new build stage begins on an alpine ruby image (which is much smaller than the image from the first stage)
4. The gems installed in the first stage (during step 2) are copied to the alpine image
5. rails server starts when a container is run from the final image

…

Wrap up

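SO! Getting back to the original task of giving the API image access to our private gems… multi-stage builds afford a way to install the dependencies on the image without including private SSH keys in the final container. The key only ever exists in the first stage, and that stage is discarded once the gems have been copied out of it.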
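Putting the pieces together, a minimal sketch of the idea might look like the following. It is not necessarily the exact dockerfile we ended up with: the `SSH_PRIVATE_KEY` build argument, the `/app` directory, and the `id_rsa` file name are all assumptions made for illustration, and it relies on the official ruby image installing gems under /usr/local/bundle.

```dockerfile
# Stage 1: has the key and installs the private gems. Discarded after the build.
FROM ruby:2.5.0 AS dependency_stage

# Hypothetical build argument carrying the contents of the private key.
ARG SSH_PRIVATE_KEY

WORKDIR /app
COPY Gemfile Gemfile.lock ./

# Write the key, trust GitHub's host key, install the gems, then remove the key.
RUN mkdir -p /root/.ssh \
    && printf '%s\n' "$SSH_PRIVATE_KEY" > /root/.ssh/id_rsa \
    && chmod 600 /root/.ssh/id_rsa \
    && ssh-keyscan github.com >> /root/.ssh/known_hosts \
    && bundle install \
    && rm /root/.ssh/id_rsa

# Stage 2: the final image. It never sees the key or the build argument.
FROM ruby:2.5.0-alpine
WORKDIR /app
COPY --from=dependency_stage /usr/local/bundle /usr/local/bundle
COPY . .

CMD ["rails", "server"]
```

A build would then be started with something like `docker build --build-arg SSH_PRIVATE_KEY="$(cat ~/.ssh/id_rsa)" .`. Two caveats apply. Build arguments can show up in `docker history` for the stage that used them, so this only holds as long as the first stage is never pushed or shipped. And gems with native extensions compiled in the Debian-based first stage may not load on alpine, in which case a slim Debian-based tag for the final stage is a safer choice.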

References:

AWS / Docker Questions 1
https://docs.docker.com/develop/develop-images/multistage-build/
https://vsupalov.com/build-docker-image-clone-private-repo-ssh-key/#multi-stage-builds

yohanmishkin/doc.md
