@dghubble
Last active November 10, 2024 06:07
Running QEMU/KVM and Nested Kubernetes on Bare-Metal Kubernetes

%title: Kubeception
%author: @dghubble

// Youtube: https://www.youtube.com/watch?v=tlUiQa2JYQU

-> Kubeception

-> Experiments with QEMU/KVM on Kubernetes <-

-> Dalton Hubble <-
-> @dghubble <-


-> QEMU/KVM

  • QEMU is an open-source machine emulator and virtualizer
  • Combined with KVM, it runs virtual machines at near-native speed
  • KVM (Kernel-based Virtual Machine) is a kernel feature and kernel module
  • Exposes the /dev/kvm interface so userspace programs can use processor virtualization features

-> Using QEMU/KVM

Typically you'd run QEMU/KVM VMs on a Linux host (laptop, CI, etc.)

  • Container Linux docs on running under QEMU/KVM VMs.
  • Testing CoreOS matchbox

You can also run QEMU/KVM VMs on a bare-metal Kubernetes cluster...


-> DEMO: QEMU/KVM in Alpine

Run a privileged alpine container on a bare-metal Kubernetes cluster.

kubectl create -f alpine/deployment.yaml

Snippet from deployment.yaml

containers:
  - name: alpine
    image: alpine:3.5
    securityContext:
      privileged: true
    command:
      - sh
      - -c
      - "echo Hello; sleep 36000"

-> Privileged

The privileged securityContext maps to Docker's --privileged flag, which gives the container access to the host's device files.

kubectl exec -it alpine-12345 -- /bin/ash

Look at the device files and find that /dev/kvm is available.
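
The check can also be scripted. A minimal sketch, assuming a POSIX shell such as Alpine's ash; the function name kvm_device_ok is made up for illustration.

```shell
# kvm_device_ok [PATH]: succeed only if PATH is a character device that the
# current user can read and write. PATH defaults to /dev/kvm.
# (Hypothetical helper name, not part of any tool.)
kvm_device_ok() {
    dev="${1:-/dev/kvm}"
    [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

if kvm_device_ok; then
    echo "/dev/kvm is usable: QEMU can run with -enable-kvm"
else
    echo "/dev/kvm is not usable: check privileged mode and host virtualization support"
fi
```

Passing a path as the first argument lets the same test be pointed at another device file.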


-> Install QEMU

Let's install qemu-system-x86_64 and a few dependencies.

apk add --update qemu-system-x86_64 bzip2 wget

-> Launch a VM

Download a Container Linux image.

wget https://stable.release.core-os.net/amd64-usr/current/coreos_production_qemu_image.img.bz2

Decompress the bz2 image.

bzip2 -d coreos_production_qemu_image.img.bz2

Start a QEMU/KVM instance.

qemu-system-x86_64 -m 1024 -enable-kvm -hda coreos_production_qemu_image.img -nographic

-> Container Linux "Image"

Build and publish a container image for Container Linux.

Build

  • Install QEMU/KVM
  • Download a Container Linux image
  • Add any tools or utilities

Run

  • Set up desired features for your guest VM
  • Resize image to a desired disk size
  • Launch QEMU/KVM VM with desired cpu/memory
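
The two mechanical Run steps might look like the sketch below. It only prints the commands (a dry run). The function name plan_vm_launch and the example sizes are assumptions, and qemu-img may need to be installed separately from qemu-system-x86_64.

```shell
# plan_vm_launch IMAGE DISK_SIZE MEMORY
# Prints (does not run) the run-time steps: grow the disk image to DISK_SIZE,
# then boot it under QEMU/KVM with MEMORY of RAM.
# (Hypothetical helper; the real image may do this differently.)
plan_vm_launch() {
    image="$1"
    disk_size="$2"
    memory="$3"
    echo "qemu-img resize $image $disk_size"
    echo "qemu-system-x86_64 -m $memory -enable-kvm -hda $image -nographic"
}

plan_vm_launch coreos_production_qemu_image.img 12G 4G
```

Printing instead of executing keeps the sketch safe to try on a machine without QEMU installed.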

-> Tips: Networking

QEMU has a hostfwd option which forwards local ports to guest ports.

hostfwd=[tcp|udp]:[hostaddr]:hostport-[guestaddr]:guestport
           Redirect incoming TCP or UDP connections to the host port hostport to the guest IP
           address guestaddr on guest port guestport.

For example, hostfwd=tcp::2222-:22 will allow you to SSH from host to guest.

ssh -p 2222 localhost
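
hostfwd is an option of QEMU's user-mode networking, so it has to be attached to a network backend. A minimal sketch, assuming a -netdev user backend paired with a virtio-net-pci device; the id net0 and the function name print_qemu_cmd are arbitrary choices. It prints the command rather than running it.

```shell
# print_qemu_cmd IMAGE MEMORY_MB HOSTFWD
# Prints (does not run) a QEMU command line that uses user-mode networking
# with the given hostfwd rule(s). "net0" is an arbitrary backend id.
print_qemu_cmd() {
    image="$1"
    memory="$2"
    hostfwd="$3"
    echo "qemu-system-x86_64 -m $memory -enable-kvm -hda $image -nographic" \
         "-netdev user,id=net0,$hostfwd" \
         "-device virtio-net-pci,netdev=net0"
}

print_qemu_cmd coreos_production_qemu_image.img 1024 "hostfwd=tcp::2222-:22"
```

Several rules can be passed at once by joining them with commas, for example "hostfwd=tcp::2222-:22,hostfwd=tcp::1443-:443".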

-> Provisioning: Container Linux Config

Container Linux accepts Container Linux Configs (indirectly).

  • Declarative YAML file
  • Provisions disks during early boot
    • Create partitions
    • Write files (systemd units, networkd units, configs)
    • Configure users
  • Caveat: Convert to machine-readable Ignition first

-> Example 1

Add an SSH public key for user "core".

passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "ssh-rsa blah"

-> Example 2

Systemd should run the etcd2.service.

systemd:
  units:
    - name: etcd2.service
      enable: true
      dropins:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
            Environment="ETCD_NAME=node0"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=http://127.0.0.1:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://127.0.0.1:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380"
            Environment="ETCD_INITIAL_CLUSTER=node0=http://127.0.0.1:2380"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"

-> Tips: QEMU Firmware Config

QEMU has a fw_cfg option which allows a file to be passed to the guest.

fw_cfg [name=]name,file=file
       Add named fw_cfg entry with contents from file file. The fw_cfg entries are passed
       by QEMU through to the guest.

Container Linux can read from the QEMU firmware config device to get user-data.

-fw_cfg name=opt/com.coreos/config,file="${PWD}/ignition.ign" "$@"

Trick

The container accepts a Container Linux Config, converts it to Ignition, and passes the result into the guest via fw_cfg to configure the VM.

./ct -in-file $CONTAINER_LINUX_CONFIG_FILE -out-file ${PWD}/ignition.ign
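
Put together, the trick is two small steps: convert, then build the fw_cfg argument. A sketch under the assumption that ct sits at ./ct; the CT variable and both function names are hypothetical and only exist so the converter path can be overridden.

```shell
# convert_config IN OUT: convert a Container Linux Config to Ignition with ct.
# CT may point at another ct binary; it defaults to ./ct as on the slide.
convert_config() {
    "${CT:-./ct}" -in-file "$1" -out-file "$2"
}

# fw_cfg_args IGNITION_FILE: print the QEMU arguments that hand the Ignition
# file to a Container Linux guest through the firmware config device.
fw_cfg_args() {
    echo "-fw_cfg name=opt/com.coreos/config,file=$1"
}

# Usage (not executed here, since it needs ct and a config file):
#   convert_config config.yaml "${PWD}/ignition.ign"
fw_cfg_args "${PWD}/ignition.ign"
```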

-> coreos-kvm

Nightly Jenkins pipeline publishes quay.io/dghubble/coreos-kvm.

Example

quay.io/dghubble/coreos-kvm:stable-1353.7.0

Environment Variables

  • CONFIG_FILE - provide a Container Linux Config
  • IGNITION_CONFIG_FILE - provide a raw Ignition Config
  • CLOUD_CONFIG_FILE - provide a Cloud-Config
  • VM_NAME - name of the VM
  • VM_MEMORY - amount of VM RAM (4G)
  • VM_DISK_SIZE - size of VM disk (12G)
  • HOSTFWD - port forwards (hostfwd=tcp::2222-:22)
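
An entrypoint script could interpret these variables roughly as follows. This is a sketch, not the image's actual entrypoint: it treats the values in parentheses as defaults (the slide does not say so explicitly), and the precedence between the three config variables is an assumption.

```shell
# Assumed defaults, taken from the values in parentheses on the slide.
VM_MEMORY="${VM_MEMORY:-4G}"
VM_DISK_SIZE="${VM_DISK_SIZE:-12G}"

# config_kind: report which kind of user-data was provided.
# The precedence order is an assumption for illustration only.
config_kind() {
    if [ -n "${CONFIG_FILE:-}" ]; then
        echo "container-linux-config"
    elif [ -n "${IGNITION_CONFIG_FILE:-}" ]; then
        echo "ignition"
    elif [ -n "${CLOUD_CONFIG_FILE:-}" ]; then
        echo "cloud-config"
    else
        echo "none"
    fi
}

echo "memory=$VM_MEMORY disk=$VM_DISK_SIZE config=$(config_kind)"
```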

-> DEMO: coreos-kvm

Create a "VM pod" with user-data in a ConfigMap.

kubectl create -f configmap.yaml
kubectl create -f deployment.yaml
kubectl create -f service.yaml

Access the Container Linux VM via the service's cluster IP.

kubectl get service coreos-kvm
ssh core@10.3.0.X

                           +-----------+
                           |           |
Service in 10.3.0.0/16     |  Service  |   10.3.0.X:22
                           |           |
                           +-----------+
                                 |
                    +-------------------------+
                    |        Endpoints        | 10.2.0.X:2222
                    +-------------------------+
Pod in 10.2.0.0/16               |
                    +-------------------------+
                    |   coreos-kvm container  | 0.0.0.0:2222 local
                    |         "host"          | port forwards to
                    |  +-------------------+  | guest :22
                    |  |  Container Linux  |  |
                    |  |   QEMU/KVM guest  |  |
                    |  |                   |  |
                    |  +-------------------+  |
                    +-------------------------+
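
The service lookup and the SSH target can be combined in a small helper. A sketch, assuming the guest user is "core" as in the Container Linux Config examples; the KUBECTL variable and the function names are hypothetical.

```shell
# service_cluster_ip NAME: print the cluster IP of the named Service.
# KUBECTL may point at another kubectl binary or wrapper.
service_cluster_ip() {
    "${KUBECTL:-kubectl}" get service "$1" -o jsonpath='{.spec.clusterIP}'
}

# ssh_target NAME: print user@host for SSH into the guest behind the Service.
ssh_target() {
    echo "core@$(service_cluster_ip "$1")"
}

# Usage (not executed here, since it needs a cluster):
#   ssh "$(ssh_target coreos-kvm)"
```

Traffic to port 22 on the cluster IP reaches port 2222 in the pod, which QEMU forwards to port 22 in the guest.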

-> Applications

  • Jenkins executors/workers
  • Docker builds in a clean Container Linux env
  • Arbitrary VMs (QEMU can run almost anything)

-> Kubernetes in a VM

Goal: Single node Kubernetes

  • Write a Kubernetes deployment for a Container Linux QEMU/KVM VM
  • Write a Kubernetes configmap with a Container Linux Config
  • Write a Kubernetes service exposing 22 and 443
  • Add a DNS record resolving to the apiserver (for kubectl)

-> Demo

Create the configmap, deployment, and service.

cd k8s
kubectl create -f configmap.yaml
kubectl create -f deployment.yaml
kubectl create -f service.yaml

Let's take a look at what we've created.


-> Deployment

  • Mounts the Container Linux Config
  • Adds port forwards from 2222 to 22 and 1443 to 443 from host to guest

image: quay.io/dghubble/coreos-kvm:stable-1353.7.0
env:
  - name: HOSTFWD
    value: "hostfwd=tcp::2222-:22,hostfwd=tcp::1443-:443"
  - name: CONFIG_FILE
    value: /userdata/config.yaml
ports:
  - name: apiserver
    containerPort: 1443
  - name: ssh
    containerPort: 2222
volumeMounts:
  - name: config-volume
    mountPath: /userdata

-> Service

  • Expose pod ports 2222 and 1443
  • Assign a fixed service IP (hacky).

kind: Service
metadata:
  name: coreos-k8s
spec:
  clusterIP: 10.3.0.50
  selector:
    name: coreos-k8s
  ports:
    - name: ssh
      port: 22
      targetPort: 2222
    - name: api
      port: 443
      targetPort: 1443
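
Each Service targetPort has to match the host side of a hostfwd rule in the Deployment, and each Service port mirrors the guest side. A small sketch that lists the pairs from a HOSTFWD value so they can be compared by eye; list_forwards is a made-up name, and the parsing only handles rules with empty host and guest addresses, as used in this deck.

```shell
# list_forwards HOSTFWD: print one "hostport guestport" pair per rule.
# Only handles the form hostfwd=tcp::HOSTPORT-:GUESTPORT.
list_forwards() {
    echo "$1" | tr ',' '\n' | while IFS= read -r rule; do
        ports="${rule##*::}"                 # e.g. "2222-:22"
        echo "${ports%%-*} ${ports##*:}"     # e.g. "2222 22"
    done
}

list_forwards "hostfwd=tcp::2222-:22,hostfwd=tcp::1443-:443"
```

The first column should line up with the containerPort and targetPort values, the second with the Service port values.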

-> DNS, TLS, ConfigMap

Add a DNS record resolving to the service IP.

$ dig +short nested-k8s.lab.dghubble.io
10.3.0.50

Generate TLS certificates

./k8s-certgen -s nested-k8s.lab.dghubble.io \
  -m IP.1=10.3.0.1,DNS.1=nested-k8s.lab.dghubble.io

Write a Container Linux Config and place it in a Kubernetes ConfigMap.

  • Add systemd units for etcd, flanneld, and kubelet
  • Add TLS certificates (hacky: should be mounted as secrets into pod and then into guest)
  • Just modify matchbox examples

-> Fingers Crossed

Show the pod running the Container Linux VM.

kubectl get pods
kubectl get service coreos-k8s

Show that the pod is running a single-node Kubernetes inside.

export KUBECONFIG=tls/kubeconfig
kubectl get nodes
kubectl get pods --all-namespaces
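
The guest has to boot and start its Kubernetes components before the nested apiserver answers, so the first attempts are likely to fail. A generic retry helper is one way to wait; this is a sketch, and the attempt count and delay in the usage comment are arbitrary.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: rerun COMMAND until it succeeds,
# giving up with a non-zero status after ATTEMPTS tries.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage (not executed here, since it needs the nested cluster):
#   retry 30 10 kubectl get nodes
retry 3 0 true && echo "command succeeded"
```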

Let's make it weirder?

kubectl scale deployment coreos-k8s --replicas=3

-> Back to Reality

Applications

  • Develop and test federated Kubernetes
  • Provide a (nested) Kubernetes to each developer

Pros and Cons

  • Each "VM pod" is running qemu-system-x86_64 inside, baked into the image
  • Image must provide the features
    • Customizable cpu, memory, and disk size
    • Providing Container Linux configs to guest
    • Mounting volumes into guests from Kubernetes
    • Snapshots, migrations, etc.

Future?

  • rkt has an alternative stage 1 which can use QEMU/KVM
  • kubevirt

mnlipp commented Oct 31, 2023

Contribution to future: https://jdrupes.org/vm-operator/