title	author	ENVBOX_VERSION	RKE2_VERSION	CNI_USED	OS_AND_KERNEL
Coder Kubernetes Docker-in-Docker Setup with RKE2	joshyorko, [@joshyorko](https://github.com/joshyorko), [email protected]	0.6.1	v1.30.6	Flannel	Ubuntu Focal (20.04), Kernel 5.15.0

Coder Kubernetes Docker-in-Docker Setup with RKE2 (via Rancher Multi-Cluster Management)

TLDR:

Use Rancher MCM to manage RKE2 clusters for DinD workloads.
Start with Flannel for simplicity.
Before provisioning, make sure your cloud-config upgrades the kernel and installs libseccomp2.
RESTART YOUR NODES AFTER THE KERNEL UPGRADE.
Expose the Coder service either through a load balancer such as MetalLB (with an IP pool configured) or through sslip.io, using the VIP of the node where Coder is installed.
Get the Envbox Terraform template from the Coder Envbox template repo.
Pin the Envbox version in the Terraform template to a stable release.
Create a new template from scratch and paste in the contents of the Coder Envbox template.
Create a workspace and test.

From the Clemenko gist (thank you, sir):


helm repo add bitnami https://charts.bitnami.com/bitnami --force-update

helm upgrade -i coder-db bitnami/postgresql -n coder --create-namespace --set auth.username=coder --set auth.password=coder --set auth.database=coder --set persistence.size=10Gi

kubectl create secret generic coder-db-url -n coder --from-literal=url="postgres://coder:coder@coder-db-postgresql.coder.svc.cluster.local:5432/coder?sslmode=disable"

helm repo add coder-v2 https://helm.coder.com/v2 --force-update

cat << EOF > values.yaml
coder:
  env:
    - name: CODER_PG_CONNECTION_URL
      valueFrom:
        secretKeyRef:
          name: coder-db-url
          key: url


    - name: CODER_ACCESS_URL
      value: "http://coder.NODE_OR_LB_IP.sslip.io"

  ingress:
    enable: true
    host: "coder.NODE_OR_LB_IP.sslip.io"
EOF

helm upgrade -i coder coder-v2/coder --namespace coder --values values.yaml
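Once the release is applied, it can help to confirm that the hostname and the rollout line up. The following is a minimal sketch, not part of the original gist: the function names are hypothetical, 192.0.2.10 is a placeholder IP, and it assumes the chart creates a Deployment named "coder" in the "coder" namespace.

```shell
# Build the sslip.io hostname used for CODER_ACCESS_URL and the ingress host
# from a node or load balancer IP.
sslip_host() {
  printf 'coder.%s.sslip.io\n' "$1"
}

# Wait for the Coder rollout, then list what is exposed.
# Assumption: the Deployment is named "coder" in the "coder" namespace.
check_coder() {
  kubectl -n coder rollout status deployment/coder --timeout=300s
  kubectl -n coder get pods,svc,ingress
}
```

For example, `sslip_host 192.0.2.10` prints the value to substitute for `coder.NODE_OR_LB_IP.sslip.io` in values.yaml, and `check_coder` can be run after `helm upgrade` returns.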

Objective

This document describes how to set up a Kubernetes cluster for Docker-in-Docker (DinD) workloads, specifically Coder workspaces that need Docker access. We use Rancher Multi-Cluster Management (MCM) to provision and manage an RKE2 cluster, test both Flannel and Calico as the CNI, and make sure the nodes meet the kernel and libseccomp requirements of Sysbox and Envbox. We also pin an older, stable version of Envbox instead of using the latest tag, for greater reliability.

1. Cluster Setup

1.1. RKE2 via Rancher MCM

What We Did:

Leveraged Rancher Multi-Cluster Management to provision and manage RKE2 clusters. This allows a centralized, UI-driven approach to deploy and control multiple RKE2 clusters from a single Rancher interface.
Initially, we configured an RKE2 cluster running v1.29 with Flannel as the default CNI.
We plan to later test Calico on the latest Rancher-supported RKE2 version (e.g., v1.30.6 or newer).

Why Rancher MCM?

Simplifies cluster lifecycle management (provisioning, upgrading, scaling).
Provides a standardized approach to apply config changes and manage multiple clusters consistently.
Integrates directly with RKE2, making it easy to switch between CNI options (e.g., Flannel or Calico).

1.2. System Requirements

Operating System & Kernel:

OS: Ubuntu Focal (20.04)
Kernel: Upgraded to 5.15.0 or newer
- Meets the >=5.7 kernel requirement for seccomp notifications.
- Eliminates the need for shiftfs when using Sysbox.
- Ensures compatibility with Envbox and Docker-in-Docker scenarios.

Master Nodes:

4 CPUs
4 GB RAM
100 GB bootable disk

Worker Nodes:

8 CPUs
24 GB RAM
100 GB bootable disk

Additional Configuration:

Verified that libseccomp2 >= 2.5.0 is installed, ensuring seccomp notification compatibility.
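The kernel and libseccomp2 minimums above can be checked on a node with a few lines of shell. This is a sketch under assumptions: the function names are ours, version_ge relies on GNU sort's -V option, and check_libseccomp assumes a Debian/Ubuntu node where dpkg is available.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Uses GNU sort -V (version sort): if $2 sorts first, then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the running kernel with the 5.7 minimum stated in this guide.
check_kernel() {
  kernel="$(uname -r)"
  if version_ge "$kernel" "5.7"; then
    echo "kernel $kernel: OK"
  else
    echo "kernel $kernel: too old, need >= 5.7" >&2
    return 1
  fi
}

# Compare the installed libseccomp2 package with the 2.5.0 minimum.
check_libseccomp() {
  installed="$(dpkg-query -W -f='${Version}' libseccomp2)" || return 1
  if dpkg --compare-versions "$installed" ge "2.5.0"; then
    echo "libseccomp2 $installed: OK"
  else
    echo "libseccomp2 $installed: too old, need >= 2.5.0" >&2
    return 1
  fi
}
```

Running `check_kernel && check_libseccomp` on each node after the reboot gives a quick pass/fail before the node takes DinD workloads.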

2. Rationale

Why Flannel Initially?

Flannel provides a simple, minimal overlay network with fewer moving parts.
Ideal starting point for testing RKE2 DinD workloads without complex network policies.

3. Configuration Details

3.1. Cloud-Init for Node Configuration

The bulk of the setup for each node happens at provisioning time via cloud-init. This ensures each node:

Installs the correct kernel for Sysbox/Envbox compatibility.
Updates packages and installs libseccomp2 if necessary.
Prepares the node for joining the RKE2 cluster managed by Rancher MCM.

Example cloud-config (adapt as needed):

#cloud-config
disable_root: false
package_update: true
packages:
  - linux-generic-hwe-20.04  # Ensures kernel >=5.15
  - libseccomp2              # Ensures seccomp compatibility

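runcmd:
  # Repeats the packages list above; harmless, and makes sure both are present.
  - [ apt-get, update ]
  - [ apt-get, install, -y, linux-generic-hwe-20.04, libseccomp2 ]
  - [ update-grub ]
  # Reboot so the node boots into the new kernel.
  - [ reboot ]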

ssh_pwauth: true
users:
  - name: root
    hashed_passwd: <root-hash>
    lock_passwd: false
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...

  - name: kdlocpanda
    hashed_passwd: <user-hash>
    lock_passwd: false
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...

ssh_authorized_keys:
  - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...

This ensures each newly created node meets kernel requirements and is ready for RKE2 and Sysbox.
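The hashed_passwd placeholders (<root-hash>, <user-hash>) in the cloud-config expect a crypt-style password hash. One way to produce one is sketched below, assuming OpenSSL 1.1.1 or newer is installed on your workstation; the hash_password function name is hypothetical, and other tools such as mkpasswd would work as well.

```shell
# Read a password from stdin and print a SHA-512 crypt hash ($6$...),
# suitable for the hashed_passwd field in cloud-config.
# Assumption: OpenSSL 1.1.1+, which provides the -6 option of "openssl passwd".
hash_password() {
  openssl passwd -6 -stdin
}
```

For example, `printf '%s\n' 'your-password' | hash_password` prints a hash that can be pasted in place of the placeholder. The salt is random, so the output differs on every run.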

3.2. Pinning an Older Envbox Version in the Coder Template

The default Coder templates often reference envbox:latest, which can lead to unpredictable behavior if the image updates. Instead, pin a specific stable version:

Example Terraform Snippet (Coder Template):

resource "kubernetes_pod" "main" {
  count = data.coder_workspace.me.start_count

  metadata {
    name      = "coder-${lower(data.coder_workspace_owner.me.name)}-${lower(data.coder_workspace.me.name)}"
    namespace = var.namespace
  }

  spec {
    restart_policy = "Never"

    container {
      name = "dev"
      # We highly recommend pinning this to a specific release of envbox, as the latest tag may change.
      image             = "ghcr.io/coder/envbox:0.6.1"
      image_pull_policy = "Always"
      command           = ["/envbox", "docker"]

      security_context {
        privileged = true
      }

      # ... remainder of the container block as in the upstream template ...
    }
  }
}

Replace latest with a known-good version tested in your environment. This approach prevents breakage due to unexpected image updates and ensures reproducible results.
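To catch an unpinned image before pushing a template, a simple grep over the template file can be enough. This is a sketch: the find_unpinned_envbox name and the main.tf file name are assumptions, and it only looks for the ghcr.io/coder/envbox reference used in the snippet above.

```shell
# Print (with line numbers) envbox image references that use :latest
# or carry no tag at all. Exits 0 when at least one such line is found,
# and non-zero when every reference is pinned.
find_unpinned_envbox() {
  grep -nE 'ghcr\.io/coder/envbox(:latest)?"' "$1"
}
```

For example, `find_unpinned_envbox main.tf && echo "pin the envbox image first"` prints the offending lines and the reminder only when an unpinned reference exists.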

4. Outcome and Observations

4.1. Flannel Observations

Performance: Quick and easy to set up, stable for smaller clusters.
Compatibility: No significant issues running DinD with Sysbox and pinned Envbox image.

4.2. Sysbox and Kernel

With the 5.15 kernel and libseccomp2 in place, Envbox ran smoothly.
Seccomp notifications worked as intended, allowing system-level containers to run Docker-in-Docker workloads efficiently.

Conclusion

By using Rancher Multi-Cluster Management for RKE2 provisioning and carefully preparing nodes via cloud-init, we established a Kubernetes environment suited to DinD workloads. Pinning an older Envbox version keeps workspaces stable and reproducible over time, while comparing Flannel with Calico shows the trade-off between simplicity and advanced network features.

Note: Calico's advanced features may have contributed to the failures we saw during initial Sysbox testing. Switching to an Envbox-managed image with Sysbox and CRI-O preinstalled gave us a more stable environment. We started testing on older Envbox versions and worked up to 0.6.1, the most stable version we tried.

This approach provides a reproducible, scalable foundation for deploying Coder workspaces with dependable Docker access in Kubernetes.
