title | author | ENVBOX_VERSION | RKE2_VERSION | CNI_USED | OS_AND_KERNEL |
---|---|---|---|---|---|
Coder Kubernetes Docker-in-Docker Setup with RKE2 | joshyorko, [@joshyorko](https://github.com/joshyorko), [email protected] | 0.6.1 | v1.30.6 | Flannel | Ubuntu Focal (20.04), Kernel 5.15.0 |
TLDR:
- Use Rancher MCM to manage RKE2 clusters for DinD workloads.
- Start with Flannel for simplicity.
- Before provisioning, ensure your cloud-config upgrades the kernel and installs libseccomp2.
- RESTART YOUR NODES AFTER THE KERNEL UPGRADE.
- Expose the Coder service: either install a load balancer such as MetalLB and set an IP pool, or use sslip.io with the VIP of the node that Coder is installed on.
- Get the envbox Terraform template from CODER ENV BOX TEMPLATE REPO.
- Pin the envbox version in the Terraform template to a stable version.
- Build the template from scratch by copying and pasting the template from Coder.
- Create a workspace and test.
helm repo add bitnami https://charts.bitnami.com/bitnami --force-update
helm upgrade -i coder-db bitnami/postgresql -n coder --create-namespace --set auth.username=coder --set auth.password=coder --set auth.database=coder --set persistence.size=10Gi
kubectl create secret generic coder-db-url -n coder --from-literal=url="postgres://coder:coder@coder-db-postgresql.coder.svc.cluster.local:5432/coder?sslmode=disable"
helm repo add coder-v2 https://helm.coder.com/v2 --force-update
cat << EOF >> values.yaml
coder:
  env:
    - name: CODER_PG_CONNECTION_URL
      valueFrom:
        secretKeyRef:
          name: coder-db-url
          key: url
    - name: CODER_ACCESS_URL
      value: "http://coder.NODE_OR_LB_IP.sslip.io"
  ingress:
    enable: true
    host: "coder.NODE_OR_LB_IP.sslip.io"
EOF
helm upgrade -i coder coder-v2/coder --namespace coder --values values.yaml
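The `NODE_OR_LB_IP` placeholder in the values above has to be replaced with a real address before the install will be reachable. A minimal sketch of that substitution, assuming an IPv4 address; the helper names `coder_host` and `render_values` are illustrative and not part of Coder or sslip.io:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: names are illustrative, not part of Coder or sslip.io.

# Print the sslip.io hostname for an IPv4 address, e.g. coder.10.0.0.5.sslip.io.
# sslip.io resolves a name that embeds an IP address back to that IP.
coder_host() {
  local ip="$1"
  # Basic shape check only: four dot-separated groups of 1-3 digits.
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  printf 'coder.%s.sslip.io\n' "$ip"
}

# Replace every NODE_OR_LB_IP placeholder read from stdin with the given address.
render_values() {
  local ip="$1"
  coder_host "$ip" >/dev/null || return 1
  sed "s/NODE_OR_LB_IP/${ip}/g"
}

# Example use (10.0.0.5 is a made-up address):
#   render_values 10.0.0.5 < values.yaml > values.rendered.yaml
```

Rendering to a separate file keeps the placeholder version reusable for other nodes or load balancer addresses.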
This document provides a comprehensive guide to setting up a Kubernetes cluster optimized for Docker-in-Docker (DinD) workloads, specifically for deploying Coder workspaces that require Docker access. We use Rancher Multi-Cluster Management (MCM) to configure and manage an RKE2 cluster, test both Flannel and Calico as the CNI, and ensure the environment meets the kernel and libseccomp requirements of tools like Sysbox and Envbox. Additionally, we pin an older, stable version of Envbox rather than using the `latest` tag to ensure greater reliability.
What We Did:
- Leveraged Rancher Multi-Cluster Management to provision and manage RKE2 clusters. This allows a centralized, UI-driven approach to deploy and control multiple RKE2 clusters from a single Rancher interface.
- Initially, we configured an RKE2 cluster running v1.29 with Flannel as the default CNI.
- We plan to later test Calico on the latest Rancher-supported RKE2 version (e.g., v1.30.6 or newer).
Why Rancher MCM?
- Simplifies cluster lifecycle management (provisioning, upgrading, scaling).
- Provides a standardized approach to apply config changes and manage multiple clusters consistently.
- Integrates directly with RKE2, making it easy to switch between CNI options (e.g., Flannel or Calico).
Operating System & Kernel:
- OS: Ubuntu Focal (20.04)
- Kernel: Upgraded to `5.15.0` or newer
  - Meets the `>=5.7` kernel requirement for seccomp notifications.
  - Eliminates the need for shiftfs when using Sysbox.
  - Ensures compatibility with Envbox and Docker-in-Docker scenarios.
Master Nodes:
- 4 CPUs
- 4 GB RAM
- 100 GB bootable disk
Worker Nodes:
- 8 CPUs
- 24 GB RAM
- 100 GB bootable disk
Additional Configuration:
- Verified that `libseccomp2 >= 2.5.0` is installed, ensuring seccomp notification compatibility.
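Both minimums above (kernel `>=5.7` and `libseccomp2 >= 2.5.0`) can be checked on a node before it joins the cluster. The following is a rough sketch rather than the procedure we actually ran; it assumes GNU `sort -V` is available and that the package version has no epoch prefix, and the function names are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; function names are illustrative.

# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Strip a distro suffix such as "-130-generic" or "-1ubuntu1" from a version.
upstream_version() {
  printf '%s\n' "${1%%-*}"
}

# Compare a component's version against its minimum and report the result.
check_min() {
  local name="$1" found min="$3"
  found="$(upstream_version "$2")"
  if version_ge "$found" "$min"; then
    echo "OK   $name $found (>= $min)"
  else
    echo "FAIL $name $found (< $min)"
    return 1
  fi
}

# Example use on a node:
#   check_min kernel "$(uname -r)" 5.7
#   check_min libseccomp2 "$(dpkg-query -W -f='${Version}' libseccomp2)" 2.5.0
```

Comparing only the upstream part of the version keeps the check independent of Ubuntu's packaging suffixes.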
Why Flannel Initially?
- Flannel provides a simple, minimal overlay network with fewer moving parts.
- Ideal starting point for testing RKE2 DinD workloads without complex network policies.
The bulk of the setup for each node happens at provisioning time via cloud-init. This ensures each node:
- Installs the correct kernel for Sysbox/Envbox compatibility.
- Updates packages and installs `libseccomp2` if necessary.
- Prepares the node for joining the RKE2 cluster managed by Rancher MCM.
Example cloud-config (adapt as needed):
#cloud-config
disable_root: false
package_update: true
packages:
  - linux-generic-hwe-20.04 # Ensures kernel >=5.15
  - libseccomp2 # Ensures seccomp compatibility
runcmd:
  - [ apt-get, update ]
  - [ apt-get, install, -y, linux-generic-hwe-20.04, libseccomp2 ]
  - [ update-grub ]
  - [ reboot ]
ssh_pwauth: true
users:
  - name: root
    hashed_passwd: <root-hash>
    lock_passwd: false
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...
  - name: kdlocpanda
    hashed_passwd: <user-hash>
    lock_passwd: false
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...
ssh_authorized_keys:
  - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...
This ensures each newly created node meets kernel requirements and is ready for RKE2 and Sysbox.
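Because a newly installed kernel only takes effect after a reboot, it can help to confirm that a node is actually running the newest kernel it has installed. A small sketch of that comparison, assuming kernels are installed as `/boot/vmlinuz-<release>` (as on Ubuntu) and that GNU `sort -V` is available; the function names are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; function names are illustrative.

# Read /boot/vmlinuz-<release> style paths from stdin (one per line) and
# print the newest kernel release among them.
newest_kernel() {
  sed 's|^.*/vmlinuz-||' | sort -V | tail -n 1
}

# Succeed (exit 0) when the running kernel is older than the newest installed
# one, i.e. the node still needs a reboot.
reboot_pending() {
  local running="$1" newest="$2"
  [ "$running" != "$newest" ] &&
    [ "$(printf '%s\n%s\n' "$running" "$newest" | sort -V | tail -n 1)" = "$newest" ]
}

# Example use on a node:
#   newest="$(ls /boot/vmlinuz-* | newest_kernel)"
#   if reboot_pending "$(uname -r)" "$newest"; then echo "reboot required"; fi
```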
The default Coder templates often reference `envbox:latest`, which can lead to unpredictable behavior if the image updates. Instead, pin a specific stable version:
Example Terraform Snippet (Coder Template):
resource "kubernetes_pod" "main" {
  count = data.coder_workspace.me.start_count
  metadata {
    name      = "coder-${lower(data.coder_workspace_owner.me.name)}-${lower(data.coder_workspace.me.name)}"
    namespace = var.namespace
  }
  spec {
    restart_policy = "Never"
    container {
      name = "dev"
      # We highly recommend pinning this to a specific release of envbox, as the latest tag may change.
      image             = "ghcr.io/coder/envbox:0.6.1"
      image_pull_policy = "Always"
      command           = ["/envbox", "docker"]
      security_context {
        privileged = true
      }
      # ... remaining container settings omitted ...
    }
  }
}
Replace `latest` with a known-good version tested in your environment. This approach prevents breakage due to unexpected image updates and ensures reproducible results.
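To keep an unpinned image from slipping back into a template, a simple check can be run before pushing it. This is only a sketch: the function name is hypothetical, and it assumes the image is referenced literally as `ghcr.io/coder/envbox[:tag]` in the file rather than assembled from variables:

```shell
#!/usr/bin/env bash
# Hypothetical lint helper; the function name is illustrative.

# Succeed only if every envbox image reference in the given file carries an
# explicit tag other than "latest".
envbox_pinned() {
  local file="$1" refs
  # Collect image references such as ghcr.io/coder/envbox:0.6.1
  refs="$(grep -Eo 'ghcr\.io/coder/envbox(:[A-Za-z0-9._-]+)?' "$file")" || {
    echo "no envbox image reference found in $file" >&2
    return 1
  }
  # Reject untagged references and the floating "latest" tag.
  if printf '%s\n' "$refs" | grep -Evq ':.+$' ||
     printf '%s\n' "$refs" | grep -Eq ':latest$'; then
    echo "unpinned envbox image in $file" >&2
    return 1
  fi
}

# Example use (main.tf is an assumed file name):
#   envbox_pinned main.tf && echo "envbox image is pinned"
```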
- Performance: Quick and easy to set up, stable for smaller clusters.
- Compatibility: No significant issues running DinD with Sysbox and pinned Envbox image.
- With the `5.15` kernel and `libseccomp2` in place, Envbox ran smoothly.
- Seccomp notifications worked as intended, allowing system-level containers to run Docker-in-Docker workloads efficiently.
By using Rancher Multi-Cluster Management for RKE2 provisioning and carefully preparing nodes via cloud-init, we established a Kubernetes environment optimized for DinD workloads. Pinning an older Envbox version ensures stability over time, while exploring Flannel vs. Calico networking offers insights into balancing simplicity against advanced network features.
Note: The advanced features of Calico might have played a role in the failures observed during initial Sysbox testing. Eventually, transitioning to a managed Envbox image that had Sysbox and CRI-O preinstalled provided a more stable environment. Initial testing started on older versions of Envbox and worked up to 0.6.1, the most stable version we tested.
This approach provides a reproducible, scalable foundation for deploying Coder workspaces with dependable Docker access in Kubernetes.