Part of collection: Hyper-converged Homelab with Proxmox
After struggling for some days, and since I really needed this to work (ignoring the "it can't be done" vibe everywhere), I managed to get Docker to run reliably in privileged Debian 12 LXC Containers on Proxmox 8
(Unfortunately, I couldn't get anything to work in unprivileged LXC Containers)
There are NO modifications required on the Proxmox host or in the /etc/pve/lxc/xxx.conf file; everything is done on the Docker Swarm host. So the only obvious candidates that could break this setup are future Docker Engine updates!
My hosts are Debian 12 LXC containers, installed via tteck's Proxmox VE Helper Scripts:
bash -c "$(wget -qLO - https://github.com/tteck/Proxmox/raw/main/ct/debian.sh)"
Docker info shows I'm using overlay2, which is the recommended storage driver for Debian. This storage driver requires XFS or EXT4 as the backing file system.
docker info | grep -A 7 "Storage Driver:"
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
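Docker info already reports the backing filesystem above, but you can also check it directly inside the container; a quick sanity check, assuming Docker uses the default /var/lib/docker data root:
df -T /var/lib/docker
# The Type column should show ext4 (or xfs) for overlay2 to work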
As Neuer_User pointed out, running the Docker containers unprivileged on a privileged LXC seems the best compromise to run the containers in a relatively secure way.
To do so, add a daemon.json on the Docker Servers that are part of the Swarm.
mkdir /etc/docker
nano /etc/docker/daemon.json
{
"userns-remap": "root"
}
And reboot the Docker Host.
(This moves everything below /var/lib/docker/ to the folder /var/lib/docker/0.0/, so existing workloads disappear; hence it's a step to do before installing Docker!)
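Before rebooting, you can make sure the file is valid JSON (a malformed daemon.json will later prevent dockerd from starting); a quick check, assuming python3 is available in the container:
python3 -m json.tool /etc/docker/daemon.json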
The get-docker.sh script is the most convenient way to quickly install the latest Docker-CE release!
curl -fsSL https://get.docker.com -o get-docker.sh
chmod +x get-docker.sh
./get-docker.sh
Without this step, the next step(s) fail!
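After the installation finishes, it's worth checking the engine version and that the userns-remap from daemon.json is in effect; a quick check (the exact output varies per Docker version):
docker --version
docker info --format '{{.SecurityOptions}}'
# Should include name=userns; the remapped data root shows up as /var/lib/docker/0.0/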
# Manager Node
docker swarm init
# Add Node (run the full join command printed by 'docker swarm init')
docker swarm join --token <some-very-long-token> <manager-ip>:2377
# Display Join token again
docker swarm join-token worker
docker swarm join-token manager
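On a manager node you can then verify that all nodes joined the swarm:
docker node ls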
For Docker in LXC to work, the only thing needed is to execute the following on the Docker Swarm servers:
nsenter --net=/run/docker/netns/ingress_sbox sysctl -w net.ipv4.ip_forward=1
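If the nsenter command complains about a missing namespace, the ingress network probably isn't up yet; the ingress_sbox namespace only exists once the node has joined a swarm, and you can list what Docker created with:
ls /run/docker/netns/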
This doesn't survive reboots, so I created a oneshot systemd service for it, to make sure the setting is applied again after each reboot.
First, we need a Bash script to be executed by the service.
nano /usr/local/bin/ipforward.sh
#!/bin/bash
# Enable IPv4 forwarding inside Docker's ingress_sbox network namespace
nsenter --net=/run/docker/netns/ingress_sbox sysctl -w net.ipv4.ip_forward=1
chmod +x /usr/local/bin/ipforward.sh
This service is of the type oneshot; during startup it waits for docker.service to be started, and then another 10 seconds for run-docker-netns-ingress_sbox.mount to be loaded. Only after that can net.ipv4.ip_forward=1 be applied.
nano /etc/systemd/system/ingress-sbox-ipforward.service
[Unit]
Description = Set net.ipv4.ip_forward for ingress_sbox namespace
After = docker.service
Wants = docker.service
[Service]
Type = oneshot
RemainAfterExit = yes
ExecStartPre = /bin/sleep 10
ExecStart = /usr/local/bin/ipforward.sh
[Install]
WantedBy = multi-user.target
systemctl daemon-reload
systemctl enable ingress-sbox-ipforward.service
systemctl start ingress-sbox-ipforward.service
systemctl status ingress-sbox-ipforward.service
Without net.ipv4.ip_forward set to 1, the Ingress Networking of the Docker Swarm is not active, so it's important to verify that the value was applied successfully.
systemctl status ingress-sbox-ipforward.service | grep ipforward.sh
# Or in a script via:
current_value=$(nsenter --net=/run/docker/netns/ingress_sbox sysctl -n net.ipv4.ip_forward)
echo $current_value
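If you want an extra guard (for instance when Docker recreates the ingress namespace after a service update), here is a minimal sketch that re-applies the setting only when needed, reusing the ipforward.sh script from above:
current_value=$(nsenter --net=/run/docker/netns/ingress_sbox sysctl -n net.ipv4.ip_forward)
if [ "$current_value" != "1" ]; then
    echo "ip_forward not set in ingress_sbox, re-applying"
    /usr/local/bin/ipforward.sh
fi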
(Now, Docker in LXC seems to behave like Docker in a VM.)
To fix this, you need to add a hostname entry for each swarm service; to make it more logical I also add a service_ prefix to the service names.
services:
  service_nginx: # Prefix service_
    image: nginx
    hostname: nginx
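Such a compose file is then deployed as a swarm stack; a minimal example, assuming it is saved as docker-compose.yml and using the placeholder stack name web:
docker stack deploy -c docker-compose.yml web
docker service ls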
- Proxmox Forum Topic about this subject.