IOTstack is a framework for running arbitrary collections of Docker containers using docker compose. The canonical example is the "MING" stack (Mosquitto, InfluxDB, Node-RED and Grafana).
From time to time, someone asks a question like this:
I have read that Docker's "volume mounts" are recommended and/or better than "bind mounts". Why does IOTstack use "bind mounts"?
Unless you understand the differences between Docker's mount types, it's difficult to see why you would want to choose one type over the other. This gist attempts to explain the differences between Docker bind and volume mounts, and then answers the question of why IOTstack uses Docker bind mounts exclusively.
To set the scene, we are going to explore the entries in the bottom row of Figure 1:
*Figure 1: Terminology* (image not reproduced here)
First, a quick refresher on Linux bind mounts. These are normally just called "bind mounts" and I'm only using "Linux" as an adjective to disambiguate these from Docker bind mounts. By the time you reach the end of this gist, you will realise that this is a distinction without a difference because all the Docker mount types are actually implemented as "bind mounts". For now, please just take it on faith that using Linux and Docker as adjectives will assist the discussion.
At its simplest, a Linux bind mount is what you get by running:
$ sudo mount --bind «mountpoint» «target»
Try this experiment:
$ mkdir one two
$ touch one/this-is-one two/this-is-two
$ tree --noreport one two
The response from tree will be:
one
└── this-is-one
two
└── this-is-two
To summarise: you have two directories, each containing a file named for its parent directory.
Now let's create a Linux bind mount:
$ sudo mount --bind one two
$ tree --noreport one two
one
└── this-is-one
two
└── this-is-one
What has happened is that both the one and two directories have the same content. When a directory is used as the «target» of a mount --bind operation, its own content becomes unavailable.
You can tell if a directory is being used as a mount point with the mountpoint command:
$ mountpoint one
one is not a mountpoint
$ mountpoint two
two is a mountpoint
If the mountpoint command reports that a directory is a mount point and you want to know which file system it points to, you can use the findmnt command:
$ findmnt --mountpoint two
TARGET SOURCE FSTYPE OPTIONS
/home/pi/two /dev/sda1[/home/pi/one] ext4 rw,relatime,errors=remount-ro
Although the original content of a directory which is being used as a «target» is unavailable, it is not deleted. As soon as you remove the mount, the original content becomes available again:
$ sudo umount two
$ tree --noreport one two
one
└── this-is-one
two
└── this-is-two
The eagle-eyed reader will probably have spotted some inconsistencies in the terminology associated with mount points. Many examples [1, 2, 3] you find on the web read like this:
- Create a directory to act as a mount point; then
- Run the mount command to mount an existing file system at the mount point.
This wording implies that the directory created in step 1 is the mount point, while the file system it points to doesn't really have a conceptual name. It's just "a file system" you want mounted "at the mount point".
The inline help and man pages for Linux commands take differing views:
- The mount command refers to «mountpoint» (a reference to an existing file system) and «target» (the directory where the existing file system should be mounted). This is an inversion of the usual meaning of "target". Instead of a «target» being something to be pointed at, it is the «target» which is doing the job of pointing at the existing file system.
- The mountpoint command inverts the terminology established by the mount command. If you pass to the mountpoint command what the mount command calls a:
  - «target» directory, it reports "is a mountpoint"
  - «mountpoint» directory, it reports "is not a mountpoint"
- The findmnt command is consistent with mount in its use of "TARGET" but uses "SOURCE" for what mount calls a «mountpoint».
- The usage for the umount command describes the parameter as either a «source» or «directory» but, in each case, it is actually referring to what both mount and findmnt call the «target».
Confused? You're not alone! Unfortunately, there are no useful heuristics for any of this. You just have to learn by rote.
Now that you've understood how Linux bind mounts are constructed, let's turn to Docker.
A Docker container is best visualised as a small self-contained computer, complete with its own operating system. All file-system operations performed by processes running inside a container are confined to the container (like a sandbox). By default, any changes made to a container’s internal file-system are lost when the container terminates (sort of like a RAM disk evaporating when a computer shuts down).
If changes made to a container’s internal file-system need to persist across container recreations then you need to tell Docker which parts of the container's internal file-system should persist. You can do this via one or more uses of:
- the -v flag on the docker run command; or
- an entry in a volumes: clause in the container's service definition in your docker-compose.yml.
IOTstack is a docker compose environment so I'm going to focus on volumes: clauses. The basic syntax is:
volumes:
- «externalReference»:«internalPath»
- ...
Each line in a volumes: clause describes a mapping between an absolute path inside the container (the «internalPath») and a storage location outside the container (the «externalReference»).
Strictly speaking, a one-line mapping is known as "short syntax". There is also a multi-line form which lets you set more options. See volumes.
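For comparison, here is a sketch of the same kind of mapping written in long syntax. The field names follow the compose specification; the paths are placeholders, not part of any real IOTstack service definition:

```yaml
volumes:
  - type: bind               # a Docker bind mount (cf. "volume")
    source: /path/to/external
    target: /path/to/internal
    read_only: false         # optional; defaults to read-write
```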
Each mapping between a path inside a container and a storage location outside the container is usually referred to as a "mount". Docker supports multiple types of mounts but I am going to focus on these:
- Docker bind mount
- named volume mount
- anonymous volume mount
Docker bind mounts are defined like this (short syntax):
---
services:
example:
container_name: example
...
volumes:
- /path/to/external:/path/to/internal
In words, the internal (container) path:
/path/to/internal
is persisted at the external (host) path:
/path/to/external
Internal paths are always absolute. External paths can either be absolute or relative to the directory containing the compose file. For example, IOTstack uses this convention:
- ./volumes/«containerName»/dir:/path/to/dir
The leading . implies "the path to the directory containing docker-compose.yml" so, in most cases, the left hand side will expand to:
~/IOTstack/volumes/«containerName»/dir
Here's a practical example using Mosquitto's service definition:
mosquitto:
container_name: mosquitto
...
volumes:
- ./volumes/mosquitto/config:/mosquitto/config
- ./volumes/mosquitto/data:/mosquitto/data
- ./volumes/mosquitto/log:/mosquitto/log
- ./volumes/mosquitto/pwfile:/mosquitto/pwfile
It should be apparent that the four internal (container) directories (config, data, log and pwfile) are going to be persisted in the following external (host) directory structure:
/home/pi
└── IOTstack
└── volumes
└── mosquitto
├── config
├── data
├── log
└── pwfile
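If you prefer not to let Docker create those external directories with root ownership (see the steps below), you can pre-create the tree yourself before bringing up the stack. A minimal sketch, assuming the standard ~/IOTstack location (whether a given container is happy with user-owned directories depends on the container):

```shell
# pre-create Mosquitto's persistent store so the directories exist
# before Docker would otherwise create them with root ownership
mkdir -p ~/IOTstack/volumes/mosquitto/{config,data,log,pwfile}
```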
When you ask docker compose to start a container, it performs the following steps for each Docker bind mount:
1. If the external path does not exist, it is created with root ownership using the equivalent of:

       $ sudo mkdir -p /path/to/external

2. If the internal path does not exist, it is created using the type of the external path as a guide. For example:

   - if the external path is a directory:

         $ docker exec «container» mkdir -p /path/to/internal

   - if the external path is a file:

         $ docker exec «container» touch /path/to/internal

3. A Linux bind mount association is created between the two:

       $ docker exec «container» mount --bind /path/to/external /path/to/internal
You can’t actually run that last command yourself. It’s only intended to give you an idea about how the Linux bind mount association is formed.
Go back and focus on step 1 for a moment because it contains an important implication. If the external path does not exist, Docker always creates a directory. Docker does that without having any regard to whether the internal path exists and, if so, whether it is a file rather than a directory. Taken together, steps 1 and 2 mean that there are three situations where a Docker bind mount can fail and docker compose will refuse to launch the container:
- The external path is missing while the internal path is a file.
- The external path is a directory while the internal path is a file. This is actually the same situation as the missing external path, where docker compose automatically creates a directory.
- The external path is a file while the internal path is a directory.
Every other combination of external and internal paths will succeed in at least creating the Docker bind mount. However, if a process running inside the container is handed a file when it is expecting a directory, or vice versa, the process will usually abort and send the container into a restart loop.
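One way to catch a file-vs-directory mismatch before it sends a container into a restart loop is a pre-flight check before running docker compose up. This is purely an illustrative sketch; the paths in the usage comments are hypothetical, not part of IOTstack:

```shell
#!/usr/bin/env bash
# pre-flight sanity check: confirm that an external path exists and
# is of the expected type ("dir" or "file") before the container
# that maps it is launched
check_external() {
   local path="$1" expected="$2"
   case "$expected" in
      dir)  [ -d "$path" ] || { echo "WARNING: $path should be a directory" ; return 1 ; } ;;
      file) [ -f "$path" ] || { echo "WARNING: $path should be a file" ; return 1 ; } ;;
   esac
   return 0
}

# hypothetical usage before "docker compose up -d":
#    check_external ./volumes/example/config dir
#    check_external ./volumes/example/settings.conf file
```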
Having said all that, you can use Docker bind mount syntax to map files into containers and the web is full of service definitions where that is done. One very common (and almost always totally unnecessary) example is:
volumes:
- /etc/timezone:/etc/timezone
Unless something has gone seriously wrong with your host system, the external path /etc/timezone is pretty much guaranteed to exist. docker compose will note that the external path leads to a file and, providing the internal path either does not exist or exists as a file, will create a Linux bind mount for a file.
And that brings me to the material point. If you want to use Docker bind mount syntax to map files into a container then you must take responsibility for guaranteeing that the external file path exists every time the container launches. docker compose has no mechanism for automating this. You have to do it yourself.
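A small launch wrapper is one way to provide that guarantee. A sketch (the file path is purely illustrative):

```shell
#!/usr/bin/env bash
# guarantee that an externally-mapped file exists before the
# container launches; an empty file satisfies the bind mount
EXTERNAL_FILE="./volumes/example/example.conf"   # hypothetical path
mkdir -p "$(dirname "$EXTERNAL_FILE")"
[ -f "$EXTERNAL_FILE" ] || touch "$EXTERNAL_FILE"
# now safe to run:  docker compose up -d
```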
There is one other implication. Refer back to the earlier discussion on Linux bind mounts and how a side-effect of the mount command means any content of the «target» directory becomes unavailable for the duration of the mount. The same happens with containers. If a container's image includes pre-existing content at an internal path which is the subject of a mount, that content becomes unavailable for the duration of the mount (ie the life of the container).
In their simplest form, named volume mounts are defined like this (short syntax):
---
services:
example:
volumes:
- «handle»:/path/to/internal
volumes:
«handle»:
In words, the absolute path inside the example container at:
/path/to/internal
is persisted at an external (host) location which is managed by Docker. The volume «handle» is considered to be local in scope to the compose file. However, named volume mounts actually have a global scope, so Docker constructs a volume name for each named volume mount by concatenating the project name with the «handle», as in:
«project»_«handle»
The project name is the all-lower-case representation of the name of the directory containing the compose file. For example, given both of the following:
- the directory structure:

      ~/IOTstack/
      └── docker-compose.yml

- this top-level volumes definition within the compose file:

      volumes:
        mydatabase:
then the resulting volume name would be:
iotstack_mydatabase
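You can predict the name Docker will construct. A rough sketch of the rule (lower-cased directory name plus the handle; Docker's actual normalisation also strips characters that are invalid in volume names):

```shell
# derive the fully-qualified volume name for a «handle», given the
# directory containing the compose file (the project name is the
# lower-cased directory name)
volume_name() {
   local compose_dir="$1" handle="$2"
   local project
   project="$(basename "$compose_dir" | tr '[:upper:]' '[:lower:]')"
   echo "${project}_${handle}"
}

volume_name /home/pi/IOTstack mydatabase   # prints iotstack_mydatabase
```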
Alternatively, you can supply a name for your volume via a name: clause:
volumes:
«handle»:
name: «volumeName»
In this case, the volume is given the «volumeName» you supply and you are responsible for managing any name-space collisions.
In the remainder of this discussion, «name» encompasses both:

- «project»_«handle»; and
- «volumeName».
When you ask docker compose to start a container, it performs the following steps for each named volume mount:
1. If the named volume mount does not exist then:

   - a storage location is created with root ownership. On Linux, it is the equivalent of:

         $ sudo mkdir -p /var/lib/docker/volumes/«name»/_data

     That path is then wrapped with other metadata which you can inspect like this:

         $ docker volume ls
         DRIVER    VOLUME NAME
         local     «name»
         $ docker volume inspect «name»
         [
             {
                 "CreatedAt": "«timestamp»",
                 "Driver": "local",
                 "Labels": {
                     "com.docker.compose.project": "«project»",
                     "com.docker.compose.version": "x.y.z",
                     "com.docker.compose.volume": "«handle»"
                 },
                 "Mountpoint": "/var/lib/docker/volumes/«name»/_data",
                 "Name": "«name»",
                 "Options": null,
                 "Scope": "local"
             }
         ]

   - if the internal path exists, its content is copied to the external storage location. The equivalent command would be:

         $ docker cp -a «container»:/path/to/internal /var/lib/docker/volumes/«name»/_data

     Key points:

     - This copying only happens when a named volume mount has just been created. It is a one-time-only initialisation!
     - At the time of writing (July 2024), the Dockerfile documentation says:

       > Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.

       This does not seem to be correct. My testing shows that content added in a Dockerfile, either after the VOLUME statement or in the absence of any VOLUME statement, persists into volume mounts.
2. If the internal path does not exist, it is created using the type of the external path as a guide. Because step 1 always creates a directory, the internal path will also always be a directory, so the equivalent command would be:

       $ docker exec «container» mkdir -p /path/to/internal

   I know of no way to use named volume mount syntax to map a file into a container (nor any reason why you might want to do that).
3. A Linux bind mount association is created between the two:

       $ docker exec «container» mount --bind /var/lib/docker/volumes/«name»/_data /path/to/internal
As before, you can’t actually run that command yourself. It’s only intended to give you an idea about how the mount association is formed.
Note:

- If /path/to/internal is a file, the mount will fail and the container will not start.
Anonymous volume mounts are the result of the following preconditions:
- In the Dockerfile used to build the image from which the container is instantiated, /path/to/internal was the subject of a VOLUME statement. It does not matter whether the directory exists or has any content.
- In the compose file, /path/to/internal is not mentioned on the right hand side of a volumes: clause within the container's service definition.
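To make the first precondition concrete, here is a minimal Dockerfile sketch (the path is a placeholder, not from any real image):

```dockerfile
FROM alpine
# declare a volume: if a compose file later maps /path/to/internal,
# the mapping wins; if it does not, Docker creates an anonymous
# volume for this path when the container is instantiated
VOLUME /path/to/internal
```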
When you launch the container, docker compose treats the situation as though you had defined it like this (short syntax):
---
services:
example:
volumes:
- «64-byte-hexadecimal-string»:/path/to/internal
volumes:
«64-byte-hexadecimal-string»:
name: "«64-byte-hexadecimal-string»"
You can identify anonymous volume mounts by their long random hexadecimal names. For example:
$ docker volume ls
DRIVER VOLUME NAME
local e4adda851450b4faf14ee1d6288857ccfed3a418da27189c1202181ee9394f85
Anonymous volume mounts are mostly treated the same as named volume mounts and are initialised in the same manner. The main difference concerns long-term persistence. An anonymous volume mount:
- persists across container restarts, re-creations and system reboots; but
- does not persist across a container down/up.
The latter condition means you can wind up with dangling volumes. Those can be removed by running:
$ docker volume prune -f
If you spot anonymous volumes on your system, it can be tricky to figure out which container(s) they are coming from. Sometimes, inspecting the content of the volume may be revealing but, if not, the following script may help:
#!/usr/bin/env bash
SCRIPT="$(basename "$0")" # assumes "show_declared_volumes.sh"
# no arguments supported
if [ $# -gt 0 ] ; then
echo -e "\nUsage: $SCRIPT"
echo -e "\nReturns declared volumes for all images known to the host. Takes no arguments."
exit 1
fi
# iterate all images on this system
curl -s --unix-socket /var/run/docker.sock http://localhost/images/json \
| jq -r .[].RepoTags[0] \
| while read IMAGE ; do
echo "Declared VOLUMES for image \"$IMAGE\""
docker image inspect "$IMAGE" \
| jq .[].Config.Volumes \
| sed -e "s/^/ /"
echo ""
done
That script iterates over all of the Docker images on your host. Where an image declares at least one VOLUME, the script displays the internal paths as an array, otherwise reporting null.
It is then a matter of looking through your compose file to find which declared VOLUME isn't mentioned in a volumes: clause of its corresponding service definition.
You can use the mountpoint and findmnt commands to explore how paths inside containers are associated with persistent stores outside the container. These are the patterns you can expect to see in the SOURCE column:
- Docker bind mount:

      /dev/«partition»[«/path/to/external»]

- named volume mount:

      /dev/«partition»[/var/lib/docker/volumes/«volumeName»/_data]
      /dev/«partition»[/var/lib/docker/volumes/«project»_«handle»/_data]

- anonymous volume mount:

      /dev/«partition»[/var/lib/docker/volumes/«64-byte-hexadecimal-string»/_data]
One fact should be apparent: when everything is said and done, the various types of Docker mounts discussed in this gist (named, anonymous, bind) are all implemented as Linux bind mounts. Or, to put it slightly differently:
- A volume mount (named or anonymous) is just a Linux bind mount where Docker chooses the storage location; and
- A Docker bind mount is just a Linux bind mount where you choose the storage location.
Now, apply that knowledge to how Docker wires-up internal and external paths. A Linux bind mount implies that any content at the internal path (the «target») becomes unavailable for the duration of the mount which, for a container, is from "up" to "down". The only wrinkle is the extra step for volume mounts where Docker copies the content of the internal path to the external path before it creates the Linux bind mount. This is summarised in Table 1:
*Table 1: Docker Volume vs Bind Mounts* (image not reproduced here)
This copying only occurs when the volume mount is first created. If anything critical to the operation of a container is somehow removed from the scope of a volume mount, Docker does not repair the damage the next time the container is launched. There are only two ways in which missing content can be repaired:
- Erase the container's persistent store to obtain a clean slate; or
- The container must take responsibility for self-repair.
It is also implied that erasing the persistent store to let Docker re-copy the initial content only works for volume mounts, because the copy step shown in Table 1 does not occur for Docker bind mounts.
However, adding simple "self-repair" code to a container can actually take care of properly initialising and self-repairing both volume mounts and Docker bind mounts. Here are some practical examples:
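A minimal sketch of the self-repair idea, as it might appear in a container's entrypoint script (the store and defaults locations, and the file name, are hypothetical):

```shell
#!/usr/bin/env bash
# self-repair sketch for a container entrypoint: if an expected file
# is missing from the persistent store, restore it from defaults
# baked into the image; this works for both volume mounts and
# Docker bind mounts because it runs on every launch
self_repair() {
   local store="$1" defaults="$2" file="$3"
   mkdir -p "$store"
   if [ ! -f "$store/$file" ] && [ -f "$defaults/$file" ] ; then
      echo "restoring missing $file from defaults"
      cp "$defaults/$file" "$store/$file"
   fi
}

# in a real entrypoint, a call like:
#    self_repair /config /defaults service.conf
# would be followed by handing control to the real process:
#    exec "$@"
```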
One other furphy (an Australian colloquialism for a false rumour) that springs up in the context of discussions on Docker mounts is that volume mounts somehow support sharing between containers while Docker bind mounts don't.
Utter claptrap!
You've just learned that everything boils down to Linux bind mounts. Those are used all over the place in Linux. To believe that there is something somehow different about processes running inside containers which somehow precludes them from having shared, parallel access to the directories and files in a file system is akin to magical thinking. There is nothing special about processes running inside Docker containers. They are just processes. They turn up in the ps list on the host.
To be clear, there is nothing wrong with volume mounts. They have features not found in Docker bind mounts, such as the ability to use network shares for persistent storage. It's equally true to say that anything you can do with a volume mount can also be done with a Docker bind mount; you just need to do the setup work yourself instead of letting Docker take care of the details.
Here's the problem. The way IOTstack is set up, everything lives under ~/IOTstack. That means that, at a pinch, you can perform a backup by doing this:
$ cd ~/IOTstack
$ docker compose down
$ touch ../mybackup.tar.gz
$ sudo tar -czf ../mybackup.tar.gz .
$ docker compose up -d
Armed with that backup file, you can restore like this:
$ sudo rm -rf ~/IOTstack
$ mkdir ~/IOTstack
$ cd ~/IOTstack
$ sudo tar -x --same-owner -z -f ../mybackup.tar.gz
$ docker compose up -d
As soon as you bring volume mounts into the equation, that strategy no longer works. It's true that you can iterate through docker volume ls, find the path to each mount-point folder, and then include those in a backup. But getting things into a backup is only half the story. To perform a restore, you also need to create and populate those volumes and you need to do it before the container comes up.
Docker creates the volumes but it leaves the "populate" step to you.
In essence, if you care about the restore side of backup-and-restore, you have to do all the work of figuring out a solution. And you need a separate "solution" for every container you run.
Docker bind mounts are simple. The data is where you can see it. It can be backed-up and restored and you can easily prove that the totality of persistent stores survive the round trip.
What's not to like?
And that's why IOTstack uses Docker bind mounts. Exclusively.
To mis-quote Chester James Carville Jr.: "It's the restore, stupid!"