Whichever route you take to implementing containers, you’ll want to steer clear of common pitfalls that can undermine the efficiency of your Docker stack.
The beauty of containers—and an advantage of containers over virtual machines—is that it is easy to make multiple containers interact with one another in order to compose a complete application. There is no need to run a full application inside a single container. Instead, break your application down as much as possible into discrete services, and distribute services across multiple containers. This maximizes flexibility and reliability.
It is possible to install a complete Linux operating system inside a container. In most cases, however, this is not necessary. If your goal is to host just a single application or part of an application in the container, you need to install only the essential pieces inside the container. Installing an operating system on top of those pieces adds unnecessary overhead.
To make the very most of containers, you want each container to be as lean as possible. This maximizes performance and minimizes security risks. For this reason, avoid running services that are not strictly necessary. For example, unless it is absolutely essential to have an SSH service running inside the container—which is probably not the case because there are other ways to log in to a container, such as with the docker exec call—don’t include an SSH service.
Use the “builder pattern”. Docker Multistage builds in Docker CE 17.05+. In a single dockerfile, you have multiple FROM
stages. The ephemeral builder stage container will be discarded so that the final runtime container image will be as lean as possible.
Pros:
- Builds are faster
- Need less storage
- Cold starts (image pull) are faster
- Potentially less attack surface
Cons:
- Less tooling inside container
- “Non-standard” environment
You can run containers without security and monitoring solutions in place. But if you want to use containers securely and efficiently, you’ll need to include these additional components in your stack.
Static Analysis of Containers:
This is a mistake because anyone with access to the registry in which the container image is stored, or to the running container, can read the information. Instead, store sensitive data on a secure filesystem to which the container can connect. In most scenarios, this filesystem would exist on the container host or be available over the network.
As noted earlier, any data stored inside a running container will disappear forever when that container shuts down, unless it is exported to a persistent storage location that is external to the container. That means you need to be sure to have such a persistent storage solution in place for any logs or other container data that you want to store permanently.
A container registry is designed to do one thing and one thing only: store container images. Although container registries offer features that might make them attractive as all-purpose repositories for storing other types of data, doing so is a mistake. Vine learned this the hard way in 2016, when the company was using a container registry to store sensitive source code, which was inadvertently exposed to the public.
Every time you write something to container's filesystem, it activates copy on write strategy.
A new storage layer is created using a storage driver (devicemapper, overlayfs or others). In case of active usage, it can put a lot of load on storage drivers, especially in case of Devicemapper or BTRFS.
Make sure your containers write data only to volumes. You can use tmpfs
for small (as tmpfs stores everything in memory) temporary files:
apiVersion: v1
kind: Pod
metadata:
name: test-pd
spec:
containers:
- image: busybox
name: test-container
volumeMounts:
- mountPath: /tmp
name: tempdir
volumes:
- name: tempdir
emptyDir: {}
Use tini or dumb-init (see below for why)
Dumb-init vs Tini vs Gosu vs Su-Exec
PID 1 is special in unix, and so omitting an init system often leads to incorrect handling of processes and signals, and can result in problems such as containers which can't be gracefully stopped, or leaking containers which should have been destroyed.
In Linux, processes in a PID namespace form a tree with each process having a parent process. Only one process at the root of the tree doesn't really have a parent. This is the "init" process, which has PID 1.
Processes can start other processes using the fork and exec syscalls. When they do this, the new process' parent is the process that called the fork syscall. fork is used to start another copy of the running process and exec is used to start different process. Each process has an entry in the OS process table. This records info about the process' state and exit code. When a child process has finished running, its process table entry remains until the parent process has retrieved its exit code using the wait syscall. This is called "reaping" zombie processes.
Zombie processes are processes that have stopped running but their process table entry still exists because the parent process hasn't retrieved it via the wait syscall. Technically each process that terminates is a zombie for a very short period of time but they could live for longer.
Something like tini or dumb-init can be used if you have a process that spawns new processes but doesn't have good signal handlers implemented to catch child signals and stop your child if your process should be stopped etc.
Bash scripts for example do NOT handle and emit signals properly.
Dumb-init supports signal rewriting and Tini doesn't, but Tini supports subreapers and they don't.
Similarly, 'su' and 'sudo' have very strange and often annoying TTY and signal-forwarding behavior. They're also somewhat complex to setup and use (especially in the case of sudo), which allows for a great deal of expressivity, but falls flat if all you need is "run this specific application as this specific user and get out of the pipeline". gosu is a tool for simpler, lighterweight behavior. su-exec is a very minimal re-write of gosu in C, making for a much smaller binary, and is available in the main Alpine package repository.
Gosu (or su-exec) is useful inside containers for processes that are root, but don't want to be. While creating an image, the dockerfile USER is ideal. After the image is created, gosu (or su-exec) is more useful as part of a container initialization when you can no longer change users between run commands in your Dockerfile.
Again, after the image is created, something like gosu allows you to drop root permissions at the end of your entrypoint inside of a container. You may initially need root access to do some initialization steps (fixing uid's, host mounted volume permissions, etc). Then once initialized, you run the final service without root privileges and as pid 1 to handle signals cleanly. For instance, you may trap for instance SIGHUP for reloading the process as you would normally achieve via systemctl reload or such.
Just wanted to point out there are native (and smaller) alternatives to gosu and su-exec.
In debian there is setuidgid.
Busybox su does exactly what su-exec does so there is no need for su-exec in Alpine (unless they fix busybox su), and you can also use setuidgid