In this short experiment, we are going to verify how containers really work. We'll check that containers are nothing else than Linux processes running on a host machine. These processes are isolated from the host machine and from each other by Linux namespaces, and they also have their resources constrained/limited by control groups, also known as cgroups - a Linux kernel feature that allows processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored.
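Namespaces and cgroups are plain kernel features, so you can poke at them directly, without Docker. A minimal sketch, assuming a host with util-linux installed (it provides lsns and unshare):
vagrant@node1:~$ sudo lsns --type pid                                           # list the PID namespaces that already exist on the host
vagrant@node1:~$ sudo unshare --fork --pid --mount --uts --ipc --net /bin/sh    # start a shell in a fresh set of namespaces, no Docker involved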
Note: To follow along with this experiment you need to run the commands on a Linux machine. If you are on Mac or Windows you can use Vagrant to quickly spin up an Ubuntu machine. In this gist I share a Vagrantfile that will start an Ubuntu VM and install Docker for you.
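If you already have Vagrant installed and that Vagrantfile in your current directory, bringing the machine up should be roughly this (the machine name node1 is an assumption here; it has to match whatever the Vagrantfile defines):
$ vagrant up          # create and provision the Ubuntu VM with Docker
$ vagrant ssh node1   # open a shell on the VM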
Let's start by creating a container on the host machine and connecting to its shell:
vagrant@node1:~$ docker run -it --rm --name my-container busybox /bin/sh
Now, from inside the container, let's create a file and start a process:
$ echo > my_file_xpto
$ sleep 1000
^Z
[1]+ Stopped sleep 1000
$ ps
PID USER TIME COMMAND
1 root 0:00 /bin/sh
7 root 0:00 sleep 1000
8 root 0:00 ps
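Notice that the shell sees itself as PID 1: it is living in its own PID namespace. You can record that namespace's identity from inside the container (a small check, assuming the readlink applet that busybox ships):
$ readlink /proc/$$/ns/pid   # prints pid:[<inode>], the identifier of this PID namespace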
In a different terminal window, connect to the host machine and check that the process running "inside" the container is visible from the host, but with a different PID. I'm quoting "inside" because, as we will see, there isn't really an inside.
vagrant@node1:~$ ps -C sleep
PID TTY TIME CMD
15962 pts/0 00:00:00 sleep
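From the host we can go further and confirm that this is literally the same process, just viewed from a different PID namespace. A sketch, using the host PID that ps printed above (15962 here; yours will differ):
vagrant@node1:~$ sudo readlink /proc/15962/ns/pid   # should match the pid:[<inode>] seen from inside the container
vagrant@node1:~$ sudo readlink /proc/$$/ns/pid      # the host shell lives in a different PID namespace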
OK, now let's see if we can find, from the host, the file we created in the container:
vagrant@node1:~$ sudo find / -name my_file_xpto
/var/lib/docker/overlay2/caeb1bdf150b03733d7920def5866ec1dcfc38fe61176a89eca218e3bb088f2e/merged/my_file_xpto
/var/lib/docker/overlay2/caeb1bdf150b03733d7920def5866ec1dcfc38fe61176a89eca218e3bb088f2e/diff/my_file_xpto
This folder structure can be different depending on your Docker version and storage driver. But what is really interesting to note is that, from the host, we can see the processes running within the container and we can also see the container's file system.
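Docker itself exposes both of these host-side views. A couple of convenient commands (the GraphDriver field below assumes the overlay2 storage driver, as in the find output above):
vagrant@node1:~$ docker top my-container                                                  # the container's processes listed as regular host processes, with host PIDs
vagrant@node1:~$ docker inspect my-container --format '{{.GraphDriver.Data.MergedDir}}'   # the same merged directory we located with find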
Now let's finally check that the container is nothing else than a process running on the host machine. For that we will retrieve the container's PID and, using nsenter, enter the same set of namespaces that Docker configured for that process.
vagrant@node1:~$ docker inspect my-container --format {{.State.Pid}}
15782
vagrant@node1:~$ sudo nsenter --target 15782 --mount --uts --ipc --net --pid /bin/sh
$
It gives us a shell within the container. We can now check and interact with its file system and network, execute new processes, etc. Compare these outputs with the host machine and with the container we launched with docker run.
$ ls -l
drwxr-xr-x 2 root root 12288 May 13 02:35 bin
drwxr-xr-x 5 root root 360 May 17 02:33 dev
drwxr-xr-x 1 root root 4096 May 17 02:33 etc
drwxr-xr-x 2 nobody nogroup 4096 May 13 02:35 home
-rw-r--r-- 1 root root 1 May 17 03:34 my_file_xpto
dr-xr-xr-x 112 root root 0 May 17 02:33 proc
drwx------ 1 root root 4096 May 17 03:34 root
dr-xr-xr-x 13 root root 0 May 17 02:33 sys
drwxrwxrwt 2 root root 4096 May 13 02:35 tmp
drwxr-xr-x 3 root root 4096 May 13 02:35 usr
drwxr-xr-x 4 root root 4096 May 13 02:35 var
$ ps
PID USER TIME COMMAND
1 root 0:00 /bin/sh
7 root 0:00 sleep 1000
13 root 0:00 /bin/sh
15 root 0:00 ps
$ hostname
93245b7721ac
$ ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:AC:11:00:02
inet addr:172.17.0.2 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:28 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2096 (2.0 KiB) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
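For comparison, run the equivalent commands on the host itself (outside nsenter): the hostname, process list and network interfaces you get back are the host's, not the container's:
vagrant@node1:~$ hostname        # node1, not the container ID
vagrant@node1:~$ ps -e | wc -l   # far more processes than the handful the container sees
vagrant@node1:~$ ip addr         # the host's interfaces, including the docker0 bridge, instead of the container's eth0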
Now let's take a quick look at cgroups. Let's start the container again, this time limiting its memory (you need to exit all the running container terminals first).
vagrant@node1:~$ docker run -it --rm --memory 500m --name my-container busybox /bin/sh
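From a second host terminal, a quick way to confirm the limit should be docker stats, which reports usage against the configured limit:
vagrant@node1:~$ docker stats my-container --no-stream   # the MEM USAGE / LIMIT column reflects the 500MiB cap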
We can see the cgroup settings for this container from the host:
vagrant@node1:~$ docker inspect my-container --format {{.Id}}
d0278d99b66ae7582d8975c8058ba7cc239ff422f392e0ec5df1c5de9ae370cc
vagrant@node1:~$ cat /sys/fs/cgroup/memory/docker/d0278d99b66ae7582d8975c8058ba7cc239ff422f392e0ec5df1c5de9ae370cc/memory.limit_in_bytes | numfmt --to=iec-i
500Mi
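The path above assumes a cgroup v1 host, which is what this setup uses. On a distribution running cgroup v2 with the systemd cgroup driver, the equivalent value would typically live somewhere like /sys/fs/cgroup/system.slice/docker-<container-id>.scope/memory.max:
vagrant@node1:~$ cat /sys/fs/cgroup/system.slice/docker-$(docker inspect my-container --format {{.Id}}).scope/memory.max | numfmt --to=iec-i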
Now, from inside the container, let's check the memory:
$ free -m
total used free shared buff/cache available
Mem: 985 250 139 5 594 650
Swap: 979 0 979
Why doesn't it show the 500m value we defined when we started the container? And what's going to happen when the container reaches that limit? I won't answer those questions 😄 I'll leave them for you to check here.
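If you want a starting point, the limit itself is still visible from inside the container via the cgroup filesystem (a sketch assuming the cgroup v1 layout used above), while free reads /proc/meminfo, which is not namespaced:
$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # the limit in bytes, regardless of what free reports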
Recommended reading