Craft your Dockerfile specifically for your application to reduce size, cost, and risk

Dockerfile: Build the Right Machine for Your App

📷 Image: Rusted Rail Car by Wendy Bayer

In this post, you will learn how to craft a Dockerfile specifically for your application to reduce size, cost, and risk. It covers Docker best practices, uses multi-stage builds, and avoids the Docker anti-pattern of treating containers as virtual machines.

Before Containerization

Back in the "old days" before Docker and containerization, developers were usually stuck making their application run on the machine (i.e. computer) that they were given. Generally, these machines were managed by an entirely different group of people with different concerns, priorities, and managers.

In an "enterprise" organization with lots of machines, this group kept the machines the same or very similar, which made operating, administering, and maintaining (e.g. patching and updating) all of them possible. As a result, regardless of your application or its needs, the operating system, its version, and the installed operating system packages, including your programming language and its version, were already set. Even with the introduction of hypervisors and virtual machines (VMs), this model did not really change.

In that world, the developer and the application had to be the ones to adapt to the machine. The application and its packages had to work with the operating system and packages already on the machine. Frequently, the application's language version and even the libraries could not be updated until the machine's operating system was updated. This could result in security vulnerabilities and slower time to market for new features.

This is making the application fit the machine

Containerized Applications: The Right Tool for the Job

Now with Docker and containerization, your application is packaged along with its operating environment or machine and you get to choose and build the exact right machine for your application.

This is making the machine fit the application

Unfortunately, many still choose to build their application images from a common "pre-built" base image that includes all the packages and libraries for all the applications. This is making the application fit the machine again, also known as Treating Docker containers as Virtual Machines, which is considered a Docker anti-pattern.

The benefit of this approach is probably a faster build time. The costs are the coupling of the shared image to the application(s), larger image sizes that slow uploads and downloads, and a larger security threat surface, especially if the image also includes development and test packages and libraries.

Defining the Right Machine for Your Application

What defines the right machine for an application?

  • Smaller image sizes, which mean faster uploads and downloads and less storage

  • Minimal operating system packages and libraries, which mean smaller images, fewer potential conflicts with your application's libraries, and a smaller security threat surface

  • Different machines for the specific application environment:

    • A Build Machine for building your application and/or its libraries (i.e. build environment)
    • A Deployment Machine for running your application (i.e. production runtime environment)
    • A Development Machine for developing and testing your application (i.e. development environment)

This is engineering the minimum machines needed for...

  1. Running your application in Production
  2. Container-native development locally and in Continuous Integration (CI) ...

This is also based on Dockerfile and image building best practices.
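The machines listed above map naturally onto the stages of one multi-stage Dockerfile. The outline below is only a sketch: it borrows the base image and stage names from the example later in this post and deliberately leaves out every package and library installation step.

```dockerfile
# Outline only: stage names and base image are taken from the full example
# later in this post; the installation steps are intentionally omitted.

# Shared, pinned base image
FROM ruby:3.2-slim-bookworm AS ruby-base

# Build machine: compilers and headers live here, never in production
FROM ruby-base AS base-builder

# Development machine: build tools plus development and test libraries
FROM base-builder AS devenv

# Build machine for the production libraries only
FROM base-builder AS deploy-builder

# Deployment machine: starts again from the small base image and
# copies in only what the builder produced
FROM ruby-base AS deploy
COPY --from=deploy-builder /usr/local/bundle/ /usr/local/bundle/
```

Each machine is then selected at build time with docker build --target followed by the stage name.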

Dockerfile Best Practices

🙇 📖 These are my sources, references, and guides...

This is a summary of Dockerfile and image building best practices...

  • Only one application with a single ephemeral lifecycle, a single concern per image/container

    • Stateless per 12-factor app processes (self-healing is stopping/starting)
    • Don't install dependency systems such as databases, browsers, etc.
  • Use small pinned official (Docker Hub) base image without full OS (packages)

    • Minimize overall image size (speeds up builds and faster downloads and uploads)
    • Install only necessary packages, defining dependencies and reducing clutter, security threats, and maintenance
    • Don't install unnecessary packages or tools
  • Make Image reproducible by having transparent, self-contained, and self-documenting Dockerfile

    • Pin your application dependencies
    • Only self-contained idempotent operations
    • Don't use external scripts or systems
  • Use Multilayer/Multistage Builds

    • Minimize overall image size
    • Combine separate production, test, and development build processing and images into a single Dockerfile with same base image
    • Reduce installed dependencies per layer
    • Isolate secrets
    • Don't install development or test dependencies in your production image
  • Run your production application with lowest possible privileges (Don't run as root)
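As one possible illustration of the "Isolate secrets" item, BuildKit can mount a secret for a single RUN instruction so that it is not written to an image layer. This is a minimal sketch under stated assumptions: the private gem host gems.example.com and the secret id gem_token are hypothetical, the gems are assumed to need no extra OS build packages, and the build must run with BuildKit enabled.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: gems.example.com and the secret id "gem_token" are hypothetical.
FROM ruby:3.2-slim-bookworm AS deploy-builder

WORKDIR /app
COPY Gemfile Gemfile.lock ./

# The secret is mounted at /run/secrets/gem_token for this RUN instruction only
# and is not stored in any image layer.
RUN --mount=type=secret,id=gem_token \
    BUNDLE_GEMS__EXAMPLE__COM="$(cat /run/secrets/gem_token)" \
    bundle install
```

The secret would be supplied at build time, for example with docker build --secret id=gem_token,src=gem_token.txt . where gem_token.txt is likewise a hypothetical local file.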

An Example Dockerfile

Here is an example Dockerfile for a simple Ruby/Rails API that demonstrates these principles and practices.

📖 The Dockerfile reference

The Base Image

  • Use small pinned official base image without full OS (packages)
  • Use Multilayer/Multistage Builds

A Dockerfile can contain more than one FROM instruction, and this approach uses a single base image that you reference at least twice: once for your builder stage and once for your final deployment runtime image.

⚙️ If you want some flexibility in specifying your base image (mostly for versions), you can use the Docker ARG instruction, whose default can be overridden with a build argument (--build-arg).

# --- Base Image ---
ARG BASE_IMAGE=ruby:3.2-slim-bookworm
FROM ${BASE_IMAGE} AS ruby-base

This Ruby base image uses the Debian operating system in its "slim" variant, which has minimal packages.
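To make the override concrete, here is a small sketch of how the ARG default and a build argument interact. The stage names follow this post; the alternative tag in the comments is only an illustration.

```dockerfile
# Default base image; a --build-arg BASE_IMAGE=... on the command line replaces it
ARG BASE_IMAGE=ruby:3.2-slim-bookworm
FROM ${BASE_IMAGE} AS ruby-base

# Later stages refer to the stage name, so an override applies to all of them
FROM ruby-base AS base-builder
FROM ruby-base AS deploy

# Example builds (the 3.3 tag is only an illustration):
#   docker build .
#   docker build --build-arg BASE_IMAGE=ruby:3.3-slim-bookworm .
```

Declaring the ARG before the first FROM is what allows it to be used in the FROM line itself.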

The (Base) Builder

  • Make image reproducible by having transparent, self-contained, and self-documenting Dockerfile
  • Install only necessary packages, defining dependencies and reducing clutter, security threats, and maintenance
  • Use Multilayer/Multistage Builds

You often have to build your application or at least its libraries. Building requires special packages (usually the gcc compiler) that minimal base images do not have. You will need to add them on top of your base image, which makes them explicit.

You generally don't need these builder packages in your runtime deployment image, which is why you use a separate stage here.

You may have two or more builder stages: one for your deployment runtime application and/or libraries and one for your development and testing libraries.

#--- Base Builder Stage ---
FROM ruby-base AS base-builder

# Use the same version of Bundler as in the Gemfile.lock
# (example value; match the BUNDLED WITH entry in your Gemfile.lock)
ARG BUNDLER_VERSION=2.4.22

# Install base build packages needed for both devenv and deploy builders
ARG BASE_BUILD_PACKAGES='build-essential libpq-dev'

RUN apt-get update \
  && apt-get -y dist-upgrade \
  && apt-get -y install ${BASE_BUILD_PACKAGES} \
  && rm -rf /var/lib/apt/lists/* \
  # Update gem command to latest
  && gem update --system \
  # Install the pinned Bundler version
  && gem install bundler:${BUNDLER_VERSION}

# Copy Gemfiles
COPY Gemfile Gemfile.lock ./

The Development Environment Builder

  • Make image reproducible by having transparent, self-contained, and self-documenting Dockerfile
  • Use Multilayer/Multistage Builds
  • Don't install development or test dependencies in your production image

Here is an example of the application needing a different set of packages and application libraries for its development and test operations.

⚙️ This example happens to configure a multi-platform build of the libraries to support multiarchitecture image builds.

#--- Dev Environment Builder Stage ---
FROM base-builder AS devenv-builder

ARG DEV_BUILD_PACKAGES='postgresql-client'
ARG BUNDLER_PATH=/usr/local/bundle

# Install dev environment specific build packages
RUN apt-get update \
  && apt-get -y dist-upgrade \
  && apt-get -y install ${DEV_BUILD_PACKAGES} \
  && rm -rf /var/lib/apt/lists/* \
  # Add support for multiple platforms
  && bundle lock --add-platform ruby \
  && bundle lock --add-platform x86_64-linux \
  && bundle lock --add-platform aarch64-linux \
  # Install app dependencies
  && bundle install \
  # Remove unneeded files (cached *.gem, *.o, *.c)
  && rm -rf ${BUNDLER_PATH}/cache/*.gem \
  && find ${BUNDLER_PATH}/gems/ -name '*.[co]' -delete

The Development Environment Image

  • Use Multilayer/Multistage Builds

👮 In order to run any tests, etc. in a container-based CI/CD, you will need this image.

You want your development environment to contain your application's build environment and test environment because building and testing are part of development.

With only Docker and your favorite editor or Integrated Development Environment (IDE) installed on your computer, you can volume mount your application's source code and have a container-native development environment. Make your changes in your native editor and run them in your container with your terminal window.

⚙️ This example assumes that you will volume mount your application's source as /app. Otherwise you can copy it into your image and volume mount over it.

⚙️ This example uses CMD rather than ENTRYPOINT to make it more general and easy to override to run specific commands. For example in CI, you can override it to run your tests, your linting, your security scans, etc.

Otherwise this assumes it is being run with an interactive terminal session (e.g. -it) for local development.

# --- Dev Environment Image ---
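FROM devenv-builder AS devenv

# Work from the mount point of the application source (assumed to be /app, as noted above)
WORKDIR /app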

# Start devenv in (command line) shell
CMD ["bash"]

To build this development environment image, you specify the devenv stage as the build --target...

docker build --target devenv -t my_application-dev .

To run the development environment interactively...

docker run -it --rm -v $(pwd):/app -p 3000:3000 my_application-dev

To use the development environment to run the tests (e.g. in CI, where there is usually no interactive terminal, so -it is omitted)...

docker run --rm -v "$(pwd)":/app my_application-dev bundle exec rspec

The Deployment Builder

  • Make image reproducible by having transparent, self-contained, and self-documenting Dockerfile
  • Use Multilayer/Multistage Builds
  • Don't install development or test dependencies in your production image

The deployment builder stage exists only to build the deployment application and/or libraries.

Here, any development and test libraries are excluded when building your app and/or its libraries.

#--- Deploy Builder Stage ---
FROM base-builder AS deploy-builder

ARG BUNDLER_PATH=/usr/local/bundle

RUN bundle config set --local without 'development:test' \
    # Add support for multiple platforms
    && bundle lock --add-platform ruby \
    && bundle lock --add-platform x86_64-linux \
    && bundle lock --add-platform aarch64-linux \
    && bundle install \
    # Remove unneeded files (cached *.gem, *.o, *.c)
    && rm -rf ${BUNDLER_PATH}/cache/*.gem \
    && find ${BUNDLER_PATH}/gems/ -name '*.[co]' -delete \
    # Configure bundler to lock to Gemfile.lock
    && bundle config --global frozen 1

The Deployment (Production) Image

  • Make image reproducible by having transparent, self-contained, and self-documenting Dockerfile
  • Use Multilayer/Multistage Builds
  • Don't install development or test dependencies in your production image
  • Don't run as root

This is your deployment image (or candidate) to be deployed in production.

To ensure the minimal image size and threat surface, it starts from the base image, adds only the necessary runtime operating system packages, and copies over the built libraries and app (or its source) from the deployment builder (i.e. deploy-builder).

🔒 You should add a restricted-access user and group for running your app so that your container is not running as root. How you add this user and group is specific to your base image, but you can use the Dockerfile USER instruction to run your container as this user.

When you copy your app and its libraries, you must change the file/directory ownership to your app's user.

Unfortunately, the Docker ARG instruction "goes out of scope at the end of the build stage where it was defined." This is why BUNDLER_VERSION is defined here and in the base builder stage.

#--- Deploy Image ---
FROM ruby-base AS deploy

# Use the same version of Bundler as in the Gemfile.lock
# (example value; match the BUNDLED WITH entry in your Gemfile.lock)
ARG BUNDLER_VERSION=2.4.22

# Install runtime packages
ARG RUNTIME_PACKAGES='postgresql-client'
# Update package info since this is from base image not builder
RUN apt-get update \
  && apt-get -y dist-upgrade \
  && apt-get -y install ${RUNTIME_PACKAGES} \
  && rm -rf /var/lib/apt/lists/*

# Add user for running app
RUN adduser --disabled-password --gecos '' deployer
USER deployer


# Copy the built gems directory from builder layer
COPY --from=deploy-builder --chown=deployer /usr/local/bundle/ /usr/local/bundle/

# Copy the app source
COPY --chown=deployer . /app/

# Run the server with any required setup
CMD ["./"]
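The ARG scoping rule quoted above means a default written in two stages can drift apart. One way to keep the default in a single place, sketched here with an example version value that is not taken from the original, is to declare the ARG once before the first FROM and redeclare it without a value in each stage that needs it.

```dockerfile
# Sketch only: the version value is an example, not taken from the original.
# Declared once in the global scope, before the first FROM
ARG BUNDLER_VERSION=2.4.22

FROM ruby:3.2-slim-bookworm AS base-builder
# Redeclare without a value to bring the global default into this stage
ARG BUNDLER_VERSION
RUN gem install bundler:${BUNDLER_VERSION}

FROM ruby:3.2-slim-bookworm AS deploy
# Each stage that needs the value redeclares it in the same way
ARG BUNDLER_VERSION
RUN gem install bundler:${BUNDLER_VERSION}
```

A --build-arg BUNDLER_VERSION=... passed to docker build then applies to both stages.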

Because this stage starts FROM the base image (and a stage can only have one parent image), you cannot also start it from a "prebuilt", "cached", or "machine" image to speed up your image builds. This is why the builder stages are necessary.

The Ramifications

No good comes without a cost.

The benefits of this approach are...

  • Smaller image sizes for faster upload/download
  • Smaller threat surface in production
  • Reproducible, self-contained, and idempotent images
  • Container-native development environment

The drawbacks of this approach are...

  • Duplication in Dockerfiles especially across repositories
  • Slower build times since there is no "cached" or prebuilt "machine" base image with installed operating system packages or libraries
  • Must build two images: deployment and development environment
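The slower build times can be softened somewhat by Docker's layer cache, which the example already leans on by copying the Gemfiles before anything else. The sketch below isolates that ordering; it assumes the gems need no extra OS build packages and that the build runs where earlier layers are still cached.

```dockerfile
# Sketch only: assumes the gems need no extra OS build packages.
FROM ruby:3.2-slim-bookworm AS deploy-builder
WORKDIR /app

# Changes rarely: this layer and the bundle install below are reused from
# cache as long as Gemfile and Gemfile.lock are unchanged
COPY Gemfile Gemfile.lock ./
RUN bundle install

# Changes often: only the instructions from here on are rebuilt after a source edit
COPY . .
```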
