%title: Kubeception
%author: @dghubble
// Youtube: https://www.youtube.com/watch?v=tlUiQa2JYQU
-> Experiments with QEMU/KVM on Kubernetes <-
-> Dalton Hubble <-
-> @dghubble <-
- QEMU is an open-source machine emulator and virtualizer
- Combined with KVM, it runs virtual machines at near-native speed
- KVM (Kernel-based Virtual Machine) is a kernel feature and kernel module
- Exposes the /dev/kvm interface so userspace programs can use the processor's virtualization features
Typically you'd run QEMU/KVM VMs on a Linux host (laptop, CI, etc.)
- Container Linux docs cover running it in QEMU/KVM VMs.
- Testing CoreOS matchbox
You can also run QEMU/KVM VMs on a bare-metal Kubernetes cluster...
Run a privileged alpine container on a bare-metal Kubernetes cluster.
kubectl create -f alpine/deployment.yaml
Snippet from deployment.yaml
containers:
  - name: alpine
    image: alpine:3.5
    securityContext:
      privileged: true
    command:
      - sh
      - -c
      - "echo Hello; sleep 36000"
The privileged securityContext maps to Docker's --privileged flag, which allows the pod to access the host's device files.
kubectl exec -it alpine-12345 /bin/ash
Look at the device files and find that /dev/kvm is available.
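A quick check from inside the pod (a sketch; it prints one of two messages depending on whether the host exposes KVM):

```shell
# Inside the privileged pod: check whether the host's KVM device is visible.
# -c tests for a character device, which /dev/kvm is when the module is loaded.
if [ -c /dev/kvm ]; then
  echo "KVM available"
else
  echo "KVM not available"
fi
```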
Let's install qemu-system-x86_64 and a few dependencies.
apk add --update qemu-system-x86_64 bzip2 wget
Download a Container Linux image.
wget https://stable.release.core-os.net/amd64-usr/current/coreos_production_qemu_image.img.bz2
Decompress the bz2 image.
bzip2 -d coreos_production_qemu_image.img.bz2
Start a QEMU/KVM instance.
qemu-system-x86_64 -m 1024 -enable-kvm -hda coreos_production_qemu_image.img -nographic
Build and publish a container image for Container Linux.
Build
- Install QEMU/KVM
- Download a Container Linux image
- Add any tools or utilities
Run
- Set up desired features for your guest VM
- Resize image to a desired disk size
- Launch QEMU/KVM VM with desired cpu/memory
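The run steps above can be sketched as a small entrypoint script. The variable names are illustrative assumptions, not the published image's actual interface, and the assembled command is echoed so it can be inspected; replace echo with exec to actually launch:

```shell
#!/bin/sh
# Sketch of a VM launch entrypoint (variable names are assumptions).
VM_MEMORY="${VM_MEMORY:-1024}"
VM_DISK_SIZE="${VM_DISK_SIZE:-12G}"
HOSTFWD="${HOSTFWD:-hostfwd=tcp::2222-:22}"
IMAGE="${IMAGE:-coreos_production_qemu_image.img}"

# Grow the disk image to the desired size, if the image and tool are present.
if [ -f "$IMAGE" ] && command -v qemu-img >/dev/null 2>&1; then
  qemu-img resize "$IMAGE" "$VM_DISK_SIZE"
fi

# Assemble the QEMU command: KVM acceleration, user-mode networking with
# port forwards, and a serial console instead of a graphical display.
echo qemu-system-x86_64 \
  -m "$VM_MEMORY" \
  -enable-kvm \
  -hda "$IMAGE" \
  -netdev "user,id=net0,$HOSTFWD" \
  -device virtio-net,netdev=net0 \
  -nographic
```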
QEMU has a hostfwd option which forwards local ports to guest ports.
hostfwd=[tcp|udp]:[hostaddr]:hostport-[guestaddr]:guestport
           Redirect incoming TCP or UDP connections to the host port hostport to the guest IP
           address guestaddr on guest port guestport.
For example, hostfwd=tcp::2222-:22 will allow you to SSH from host to guest.
ssh -p 2222 localhost
Container Linux accepts Container Linux Configs (indirectly).
- Declarative YAML file
- Provisions disks during early boot
- Create partitions
- Write files (systemd units, networkd units, configs)
- Configure users
 
- Caveat: Convert to machine-readable Ignition first
Add an SSH public key for user "core".
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "ssh-rsa blah"
Tell systemd to run the etcd2.service.
systemd:
  units:
    - name: etcd2.service
      enable: true
      dropins:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
            Environment="ETCD_NAME=node0"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=http://127.0.0.1:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://127.0.0.1:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380"
            Environment="ETCD_INITIAL_CLUSTER=node0=http://127.0.0.1:2380"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"
QEMU has a fw_cfg option which allows a file to be passed to the guest.
-fw_cfg [name=]name,file=file
       Add named fw_cfg entry with contents from file file. The fw_cfg entries are passed
       by QEMU through to the guest.
Container Linux can read from the QEMU firmware config device to get user-data.
-fw_cfg name=opt/com.coreos/config,file="${PWD}/ignition.ign" "$@"
Trick
The container accepts a Container Linux Config and converts it to Ignition, then
passes it into the guest via fw_cfg to configure the VM.
./ct -in-file $CONTAINER_LINUX_CONFIG_FILE -out-file ${PWD}/ignition.ign
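Putting the pieces together, the converted Ignition file rides into the guest on the fw_cfg device. A sketch of the full launch line, echoed here for inspection (drop the echo to launch for real):

```shell
# Sketch: pass the Ignition config to the guest via QEMU's fw_cfg device.
# Container Linux reads the well-known entry name opt/com.coreos/config.
IGNITION="${PWD}/ignition.ign"
echo qemu-system-x86_64 \
  -m 1024 \
  -enable-kvm \
  -hda coreos_production_qemu_image.img \
  -fw_cfg "name=opt/com.coreos/config,file=$IGNITION" \
  -nographic
```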
Nightly Jenkins pipeline publishes quay.io/dghubble/coreos-kvm.
Example
quay.io/dghubble/coreos-kvm:stable-1353.7.0
Environment Variables
- CONFIG_FILE - provide a Container Linux Config
- IGNITION_CONFIG_FILE - provide a raw Ignition Config
- CLOUD_CONFIG_FILE - provide a Cloud-Config
- VM_NAME - name of the VM
- VM_MEMORY - amount of VM RAM (4G)
- VM_DISK_SIZE - size of VM disk (12G)
- HOSTFWD - port forwards (hostfwd=tcp::2222-:22)
Create a "VM pod" with user-data in a ConfigMap.
kubectl create -f configmap.yaml
kubectl create -f deployment.yaml
kubectl create -f service.yaml
Access the Container Linux VM via the service's cluster IP.
kubectl get service coreos-kvm
ssh core@10.3.0.X
                           +-----------+
                           |           |
Service in 10.3.0.0/16     |  Service  |   10.3.0.X:22
                           |           |
                           +-----------+
                                 |
                    +-------------------------+
                    |        Endpoints        | 10.2.0.X:2222
                    +-------------------------+
Pod in 10.2.0.0/16               |
                    +-------------------------+
                    |   coreos-kvm container  | 0.0.0.0:2222 local
                    |         "host"          | port forwards to
                    |  +-------------------+  | guest :22
                    |  |  Container Linux  |  |
                    |  |   QEMU/KVM guest  |  |
                    |  |                   |  |
                    |  +-------------------+  |
                    +-------------------------+
- Jenkins executors/workers
- Docker builds in a clean Container Linux env
- Arbitrary VMs (QEMU can run almost anything)
Goal: Single node Kubernetes
- Write a Kubernetes deployment for a Container Linux QEMU/KVM VM
- Write a Kubernetes configmap with a Container Linux Config
- Write a Kubernetes service exposing 22 and 443
- Add a DNS record resolving to the apiserver (for kubectl)
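The ConfigMap step above might look like the following sketch. The ConfigMap and key names are assumptions chosen to match the deployment's CONFIG_FILE path, and the embedded Container Linux Config is abbreviated:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coreos-k8s
data:
  config.yaml: |
    systemd:
      units:
        - name: kubelet.service
          enable: true
```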
Create the configmap, deployment, and service.
cd k8s
kubectl create -f configmap.yaml
kubectl create -f deployment.yaml
kubectl create -f service.yaml
Let's take a look at what we've created.
- Mounts the Container Linux Config
- Adds host-to-guest port forwards from 2222 to 22 and from 1443 to 443
image: quay.io/dghubble/coreos-kvm:stable-1353.7.0
env:
  - name: HOSTFWD
    value: "hostfwd=tcp::2222-:22,hostfwd=tcp::1443-:443"
  - name: CONFIG_FILE
    value: /userdata/config.yaml
ports:
  - name: apiserver
    containerPort: 1443
  - name: ssh
    containerPort: 2222
volumeMounts:
  - name: config-volume
    mountPath: /userdata
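For the config-volume mount above to resolve, the pod spec also needs a matching volumes entry backed by the ConfigMap (a sketch; the ConfigMap name is an assumption):

```yaml
volumes:
  - name: config-volume
    configMap:
      name: coreos-k8s
```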
- Expose pod ports 2222 and 1443
- Assign a fixed service IP (hacky).
apiVersion: v1
kind: Service
metadata:
  name: coreos-k8s
spec:
  clusterIP: 10.3.0.50
  selector:
    name: coreos-k8s
  ports:
    - name: ssh
      port: 22
      targetPort: 2222
    - name: api
      port: 443
      targetPort: 1443
Add a DNS record resolving to the service IP.
$ dig nested-k8s.lab.dghubble.io
10.3.0.50
Generate TLS certificates
./k8s-certgen -s nested-k8s.lab.dghubble.io \
  -m IP.1=10.3.0.1,DNS.1=nested-k8s.lab.dghubble.io
Write a Container Linux Config and place it in a Kubernetes ConfigMap.
- Add systemd units for etcd, flanneld, and kubelet
- Add TLS certificates (hacky: should be mounted as Secrets into the pod and then into the guest)
- Just modify matchbox examples
Show the pod running the Container Linux VM.
kubectl get pods
kubectl get service coreos-k8s
Show that the pod is running a single-node Kubernetes inside.
export KUBECONFIG=tls/kubeconfig
kubectl get nodes
kubectl get pods --all-namespaces
Let's make it weirder.
kubectl scale deployment coreos-k8s --replicas=3
Applications
- Develop and test federated Kubernetes
- Provide a (nested) Kubernetes to each developer
Pros and Cons
- Each "VM pod" is running qemu-system-x86 inside, baked into the image
- Image must provide the features
- Customizable cpu, memory, and disk size
- Providing Container Linux configs to guest
- Mounting volumes into guests from Kubernetes
- Snapshots, migrations, etc.
 
Future?
- rkt has an alternative stage 1 which can use QEMU/KVM
- kubevirt