📷 Image: Rusted Rail Car by Wendy Bayer
In this post, you will learn how to craft your Dockerfile specifically for your application to reduce size, cost, and risk. It covers Docker Best Practices, uses Multi-stage builds, and avoids the "Treating Docker containers as Virtual Machines" Docker anti-pattern.
Back in the "old days" before Docker and containerization, developers were usually stuck making their application run on the machine (i.e. computer) that they were given. Generally, these machines were managed by an entirely different group of people with different concerns, priorities, and managers.
In an "enterprise" organization with lots of machines, this group made these machines the same or very similar. This made operating, administrating, and maintaining (e.g. patching and updating) all these machines possible. This meant that regardless of your application or its needs, the operating system, its version, and the installed operating system packages including your programming language and its version were already set. Even with the introduction of Hypervisors (e.g. VMs), this model has not really changed.
In that world, the developer and the application had to be the ones to adapt to the machine. The application and its packages had to work with the operating system and packages already on the machine. Frequently, the application's language version and even the libraries could not be updated until the machine's operating system was updated. This could result in security vulnerabilities and slower time to market for new features.
❌ This is making the application fit the machine
Now with Docker and containerization, your application is packaged along with its operating environment or machine and you get to choose and build the exact right machine for your application.
✨ This is making the machine fit the application
Unfortunately, many still choose to build their application images from a common "pre-built" base image that includes all the packages and libraries for all the applications. This is making the application fit the machine, or Treating Docker containers as Virtual Machines, which is considered a Docker anti-pattern.
The benefit of this approach is probably a faster build time, but the costs are coupling the application(s) to this shared image, larger image sizes that slow uploads and downloads, and a larger security threat surface, especially if the image also includes development and test packages and libraries.
What defines the right machine for an application?
- Smaller image sizes mean faster uploads and downloads and require less storage
- Minimum operating system packages and libraries mean smaller images, fewer potential conflicts with your application's libraries, and a smaller security threat surface
- Different machines for the specific application environment:
  - A Build Machine for building your application and/or its libraries (i.e. build environment)
  - A Deployment Machine for running your application (i.e. production runtime environment)
  - A Development Machine for developing and testing your application (i.e. development environment)
This is engineering the minimum machines needed for...
- Running your application in Production
- Container-native development locally and in Continuous Integration (CI) ...
This is also based on Dockerfile and image building best practices.
🙇 📖 These are my sources, references, and guides...
- Ruby-specific, BUT: Best practices when writing a Dockerfile for a Ruby application is what got me started on this path; it is for Ruby/Rails applications, BUT it is pretty comprehensive on Dockerfile best practices
- Google Architecture Guide: Best practices for building containers
- Red Hat Guide: 10 tips for writing secure, maintainable Dockerfiles
- Docker Guide: General best practices for writing Dockerfiles
- Docker Anti-Patterns: when you are considering “what to do”, it is prudent to consider “what not to do”
This is a summary of Dockerfile and image building best practices...
- Only one application with a single ephemeral lifecycle, a single concern per image/container
  - Stateless per 12-factor app processes (self-healing is stopping/starting)
  - Don't install dependency systems such as databases, browsers, etc.
- Use small pinned official (Docker Hub) base image without full OS (packages)
  - Minimize overall image size (speeds up builds and faster downloads and uploads)
  - Install only necessary packages, defining dependencies and reducing clutter, security threats, and maintenance
  - Don't install unnecessary packages or tools
- Make image reproducible by having a transparent, self-contained, and self-documenting Dockerfile
  - Pin your application dependencies
  - Only self-contained idempotent operations
  - Don't use external scripts or systems
- Use Multilayer/Multistage Builds
  - Minimize overall image size
  - Combine separate production, test, and development build processing and images into a single Dockerfile with same base image
  - Reduce installed dependencies per layer
  - Isolate secrets
  - Don't install development or test dependencies in your production image
- Run your production application with lowest possible privileges (Don't run as root)
Here is an example Dockerfile for a simple Ruby/Rails API that demonstrates these principles and practices.
📖 The Dockerfile reference
- Use small pinned official base image without full OS (packages)
- Use Multilayer/Multistage Builds
This approach uses a single base image in your Dockerfile, but you will reference it at least twice: once for your builder stage and once for your final deployment runtime image.
⚙️ If you want some flexibility in specifying your base image (mostly for versions), you can use the Docker ARG instruction, whose default value can be overridden with build arguments.
# --- Base Image ---
ARG BASE_IMAGE=ruby:3.2-slim-bookworm
FROM ${BASE_IMAGE} AS ruby-base
This Ruby base image uses the Debian operating system (the "bookworm" release) with the "slim" variant, which has minimal packages.
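Overriding the base image at build time could look something like the sketch below. Only BASE_IMAGE comes from the ARG above; the ruby:3.3-slim-bookworm tag, the my_application image name, and the build_image helper with its DRY_RUN switch are assumptions made for illustration.

```shell
# Hypothetical newer tag: confirm it exists on Docker Hub before relying on it.
BASE_IMAGE="ruby:3.3-slim-bookworm"

# Hypothetical helper: builds the image with the overridden base image.
# When DRY_RUN is set to a non-empty value, the command is printed instead of run.
build_image() {
  ${DRY_RUN:+echo} docker build \
    --build-arg "BASE_IMAGE=${BASE_IMAGE}" \
    -t my_application .
}
```

Calling DRY_RUN=1 build_image prints the command for inspection, and calling build_image on its own runs it. Without a --target, docker build builds the last stage in the Dockerfile, which is the deploy stage if the stages are ordered as in this post.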
- Make image reproducible by having a transparent, self-contained, and self-documenting Dockerfile
- Install only necessary packages, defining dependencies and reducing clutter, security threats, and maintenance
- Use Multilayer/Multistage Builds
You often have to build your application or at least its libraries. Building requires special packages (usually the gcc compiler) that minimal base images do not have. You will need to add them on top of your base image, which makes them explicit.
You generally don't need these builder packages in your runtime deployment image, which is why you use a separate stage here.
You may have two or more builder stages: one for your deployment runtime application and/or libraries and one for your development and testing libraries.
#--- Base Builder Stage ---
FROM ruby-base AS base-builder
# Use the same version of Bundler in the Gemfile.lock
ARG BUNDLER_VERSION=2.5.6
ENV BUNDLER_VERSION=${BUNDLER_VERSION}
# Install base build packages needed for both devenv and deploy builders
ARG BASE_BUILD_PACKAGES='build-essential libpq-dev'
RUN apt-get update \
&& apt-get -y dist-upgrade \
&& apt-get -y install ${BASE_BUILD_PACKAGES} \
&& rm -rf /var/lib/apt/lists/* \
# Update RubyGems (the gem command) to the latest version
&& gem update --system \
# Install the pinned Bundler version
&& gem install bundler:${BUNDLER_VERSION}
# Copy Gemfiles
WORKDIR /app
COPY Gemfile Gemfile.lock ./
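One way to keep BUNDLER_VERSION in step with Gemfile.lock is to read the version from the lockfile and pass it as a build argument. This is a minimal sketch, assuming the lockfile contains the usual BUNDLED WITH section; the bundler_version_from_lock helper name is hypothetical.

```shell
# Hypothetical helper: print the Bundler version recorded in a Gemfile.lock.
# Assumes the lockfile has a "BUNDLED WITH" line followed by the indented
# version number on the next line.
# Usage: bundler_version_from_lock [path/to/Gemfile.lock]
bundler_version_from_lock() {
  awk '/^BUNDLED WITH/ { getline; gsub(/[ \t\r]/, ""); print; exit }' "${1:-Gemfile.lock}"
}
```

The result could then be passed to the build, for example with --build-arg BUNDLER_VERSION="$(bundler_version_from_lock)", so the default in the Dockerfile only serves as a fallback.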
- Make image reproducible by having a transparent, self-contained, and self-documenting Dockerfile
- Use Multilayer/Multistage Builds
- Don't install development or test dependencies in your production image
Here is an example of the application needing a different set of packages and application libraries for its development and test operations.
⚙️ This example also adds multiple platforms to the Gemfile.lock so that the libraries can be built for multi-architecture images.
#--- Dev Environment Builder Stage ---
FROM base-builder AS devenv-builder
ARG DEV_BUILD_PACKAGES='postgresql-client'
ARG DEVENV_PACKAGES='git curl'
ARG DEVENV_QOL_PACKAGES='vim'
ARG BUNDLER_PATH=/usr/local/bundle
# Install dev environment specific build packages
RUN apt-get update \
&& apt-get -y dist-upgrade \
&& apt-get -y install ${DEV_BUILD_PACKAGES} ${DEVENV_PACKAGES} ${DEVENV_QOL_PACKAGES} \
&& rm -rf /var/lib/apt/lists/* \
# Add support for multiple platforms
&& bundle lock --add-platform ruby \
&& bundle lock --add-platform x86_64-linux \
&& bundle lock --add-platform aarch64-linux \
# Install app dependencies
&& bundle install \
# Remove unneeded files (cached *.gem, *.o, *.c)
&& rm -rf ${BUNDLER_PATH}/cache/*.gem \
&& find ${BUNDLER_PATH}/gems/ -name '*.[co]' -delete
- Use Multilayer/Multistage Builds
👮 In order to run any tests, etc. in a container-based CI/CD, you will need this image.
You want your development environment to contain your application's build environment and test environment because building and testing are part of development.
With only Docker and your favorite editor or Integrated Development Environment (IDE) installed on your computer, you can volume mount your application's source code and have a container-native development environment. Make your changes in your native editor and run them in your container with your terminal window.
⚙️ This example assumes that you will volume mount your application's source as /app. Otherwise, you can copy it into your image and volume mount over it.
⚙️ This example uses a CMD instruction rather than an ENTRYPOINT, so the default command is easy to override with a specific command. For example, in CI you can override it to run your tests, your linting, your security scans, etc. Otherwise, this assumes it is being run with an interactive terminal session (e.g. -it) for local development.
# --- Dev Environment Image ---
FROM devenv-builder AS devenv
WORKDIR /app
# Start devenv in (command line) shell
CMD ["bash"]
To build this development environment image, you specify the devenv stage as the build --target...
docker build --target devenv -t my_application-dev .
To run the development environment interactively...
docker run -it --rm -v "$(pwd)":/app -p 3000:3000 my_application-dev
To use the development environment to run the tests (e.g. in CI, where no terminal is attached, so -it is dropped and no port needs to be published)...
docker run --rm -v "$(pwd)":/app my_application-dev bundle exec rspec
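The interactive and CI invocations differ mainly in whether a terminal is attached. A small wrapper, sketched here for a POSIX shell, can add -it only when a terminal is present. The devenv helper name and its DRY_RUN switch are hypothetical; the image name and the /app mount come from the commands above.

```shell
# Hypothetical helper: run a command in the development environment image.
# Adds -it only when stdin and stdout are terminals, so the same call works
# locally and in CI. With DRY_RUN set, the command is printed instead of run.
devenv() {
  tty_flags=""
  if [ -t 0 ] && [ -t 1 ]; then
    tty_flags="-it"
  fi
  # ${tty_flags} is intentionally unquoted so an empty value adds no argument.
  ${DRY_RUN:+echo} docker run ${tty_flags} --rm \
    -v "$(pwd):/app" my_application-dev "$@"
}
```

With this in place, devenv with no arguments starts the image's default command (the bash shell), while devenv bundle exec rspec runs the tests.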
- Make image reproducible by having a transparent, self-contained, and self-documenting Dockerfile
- Use Multilayer/Multistage Builds
- Don't install development or test dependencies in your production image
The deployment builder stage exists only to build the deployment application and/or its libraries.
Here, any development and test libraries are excluded when building your app and/or its libraries.
#--- Deploy Builder Stage ---
FROM base-builder AS deploy-builder
ARG BUNDLER_PATH=/usr/local/bundle
RUN bundle config set --local without 'development:test' \
# Add support for multiple platforms
&& bundle lock --add-platform ruby \
&& bundle lock --add-platform x86_64-linux \
&& bundle lock --add-platform aarch64-linux \
&& bundle install \
# Remove unneeded files (cached *.gem, *.o, *.c)
&& rm -rf ${BUNDLER_PATH}/cache/*.gem \
&& find ${BUNDLER_PATH}/gems/ -name '*.[co]' -delete \
# Configure bundler to lock to Gemfile.lock
&& bundle config --global frozen 1
- Make image reproducible by having a transparent, self-contained, and self-documenting Dockerfile
- Use Multilayer/Multistage Builds
- Don't install development or test dependencies in your production image
- Don't run as root
This is your deployment image (or candidate) to be deployed in production.
To ensure the minimal image size and threat surface, it starts from the base image, adds only the necessary runtime operating system packages, and copies over the built libraries from the deployment builder (i.e. deploy-builder) along with the app (or its source).
🔒 You should add a restricted-access user and group for running your app so that your container is not running as root. How you add this user and group is specific to your base image, but you can use the Dockerfile USER instruction to run your container as this user.
When you copy your app and its libraries, you must change the file/directory ownership to your app's user.
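Unfortunately, the Docker ARG instruction "goes out of scope at the end of the build stage where it was defined." This is why BUNDLER_VERSION is defined here and in the base builder stage. An alternative is to declare the ARG once before the first FROM and re-declare it without a value (ARG BUNDLER_VERSION) in each stage that needs it.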
#--- Deploy Image ---
FROM ruby-base AS deploy
# Use the same version of Bundler in the Gemfile.lock
ARG BUNDLER_VERSION=2.5.6
ENV BUNDLER_VERSION=${BUNDLER_VERSION}
# Install runtime packages
ARG RUNTIME_PACKAGES='postgresql-client'
# Update package info since this is from base image not builder
RUN apt-get update \
&& apt-get -y dist-upgrade \
&& apt-get -y install ${RUNTIME_PACKAGES} \
&& rm -rf /var/lib/apt/lists/*
# Add user for running app
RUN adduser --disabled-password --gecos '' deployer
USER deployer
WORKDIR /app
# Copy the built gems directory from builder layer
COPY --from=deploy-builder --chown=deployer /usr/local/bundle/ /usr/local/bundle/
# Copy the app source
COPY --chown=deployer . /app/
# Run the server with any required setup
CMD ["./entrypoint.sh"]
Because this image starts FROM the base image (and a stage can only start from one image), you cannot use a "prebuilt", "cached", or "machine" image to speed up your image builds. This is why the builder stages are necessary.
No good comes without a cost.
The benefits of this approach are...
- Smaller image sizes for faster upload/download
- Smaller threat surface in production
- Reproducible, self-contained, and idempotent images
- Container-native development environment
The drawbacks of this approach are...
- Duplication in Dockerfiles, especially across repositories
- Slower build times since there is no "cached" or prebuilt "machine" base image with installed operating system packages or libraries
- Must build two images: deployment and development environment
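Building both images can be scripted so that the last drawback costs one command. The sketch below assumes the stage names used in this post (devenv and deploy); the build_all helper, its DRY_RUN switch, and the my_application tag for the deployment image are hypothetical.

```shell
# Hypothetical helper: build the development and deployment images in turn.
# With DRY_RUN set, the commands are printed instead of run.
build_all() {
  for target in devenv deploy; do
    case "$target" in
      devenv) tag="my_application-dev" ;;
      deploy) tag="my_application" ;;  # assumed tag for the deployment image
    esac
    # Stop at the first failed build.
    ${DRY_RUN:+echo} docker build --target "$target" -t "$tag" . || return 1
  done
}
```

Because both targets start from the same base and base builder stages, the second build can reuse the cached layers for those shared stages.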