%title: Kubeception
%author: @dghubble
-> Experiments with QEMU/KVM on Kubernetes <-
-> Dalton Hubble <-
-> @dghubble <-
- QEMU is an open-source machine emulator and virtualizer
- Combined with KVM, it runs virtual machines at near-native speed
- KVM (Kernel-based Virtual Machine) is a Linux kernel feature, shipped as a kernel module
- Exposes /dev/kvm interface so userspace programs can use processor virtualization features
Typically you'd run QEMU/KVM VMs on a Linux host (laptop, CI, etc.)
- Container Linux docs on running under QEMU/KVM VMs.
- Testing CoreOS matchbox
You can also run QEMU/KVM VMs on a bare-metal Kubernetes cluster...
Run a privileged alpine container on a bare-metal Kubernetes cluster.
kubectl create -f alpine/deployment.yaml
Snippet from deployment.yaml
containers:
  - name: alpine
    image: alpine:3.5
    securityContext:
      privileged: true
    command:
      - sh
      - -c
      - "echo Hello; sleep 36000"
The privileged securityContext maps to Docker's --privileged flag, which lets the container access the host's device files.
kubectl exec -it alpine-12345 -- /bin/ash
Look at the device files and confirm that /dev/kvm is available.
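The check can also be scripted. This is a minimal sketch, assuming a POSIX shell such as the ash shipped in alpine; the function name kvm_status is illustrative and not part of the original demo.

```shell
# Report whether a KVM device node is usable from inside the container.
# kvm_status is a hypothetical helper name; it defaults to /dev/kvm.
kvm_status() {
  dev="${1:-/dev/kvm}"
  # The node must be a character device that we can read and write.
  if [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "available"
  else
    echo "unavailable"
    return 1
  fi
}
```

Calling `kvm_status` with no argument checks /dev/kvm. In a container started without the privileged securityContext it would be expected to report "unavailable".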
Let's install qemu-system-x86_64 and a few dependencies.
apk add --update qemu-system-x86_64 bzip2 wget
Download a Container Linux image.
wget https://stable.release.core-os.net/amd64-usr/current/coreos_production_qemu_image.img.bz2
Decompress the bz2 image.
bzip2 -d coreos_production_qemu_image.img.bz2
Start a QEMU/KVM instance.
qemu-system-x86_64 -m 1024 -enable-kvm -hda coreos_production_qemu_image.img -nographic
Build and publish a container image for Container Linux.
Build
- Install QEMU/KVM
- Download a Container Linux image
- Add any tools or utilities
Run
- Set up desired features for your guest VM
- Resize image to a desired disk size
- Launch QEMU/KVM VM with desired cpu/memory
QEMU has a hostfwd option which forwards host ports to guest ports.
hostfwd=[tcp|udp]:[hostaddr]:hostport-[guestaddr]:guestport
Redirect incoming TCP or UDP connections to the host port hostport to the guest IP
address guestaddr on guest port guestport.
For example, hostfwd=tcp::2222-:22 will allow you to SSH from host to guest.
ssh -p 2222 localhost
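Several forwards can be combined in one user-mode network definition. The helper below is a minimal sketch that builds the -netdev argument from HOST:GUEST port pairs; the function name netdev_arg and the net0 id are illustrative assumptions, not something the original demo uses.

```shell
# Build a "-netdev user" argument with one TCP hostfwd rule per HOST:GUEST pair.
# netdev_arg and the id "net0" are hypothetical names chosen for this sketch.
netdev_arg() {
  arg="user,id=net0"
  for pair in "$@"; do
    host_port="${pair%%:*}"   # text before the first colon
    guest_port="${pair##*:}"  # text after the last colon
    arg="${arg},hostfwd=tcp::${host_port}-:${guest_port}"
  done
  printf '%s\n' "$arg"
}
```

One possible use is `qemu-system-x86_64 -m 1024 -enable-kvm -hda coreos_production_qemu_image.img -nographic -netdev "$(netdev_arg 2222:22)" -device virtio-net-pci,netdev=net0`, where the virtio NIC is an assumed choice of guest network device.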
Container Linux accepts Container Linux Configs (indirectly).
- Declarative YAML file
- Provisions disks during early boot
- Create partitions
- Write files (systemd units, networkd units, configs)
- Configure users
- Caveat: Convert to machine-readable Ignition first
Add an SSH public key for user "core".
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "ssh-rsa blah"
Systemd should run etcd2.service.
systemd:
  units:
    - name: etcd2.service
      enable: true
      dropins:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
            Environment="ETCD_NAME=node0"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=http://127.0.0.1:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://127.0.0.1:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380"
            Environment="ETCD_INITIAL_CLUSTER=node0=http://127.0.0.1:2380"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"
QEMU has a fw_cfg option which allows a file to be passed to the guest.
-fw_cfg [name=]name,file=file
Add named fw_cfg entry with contents from file file. The fw_cfg entries are passed
by QEMU through to the guest.
Container Linux can read from the QEMU firmware config device to get user-data.
-fw_cfg name=opt/com.coreos/config,file="${PWD}/ignition.ign" "$@"
Trick
The container accepts a Container Linux Config and converts it to Ignition. The result is passed into the guest via fw_cfg to configure the VM.
./ct -in-file $CONTAINER_LINUX_CONFIG_FILE -out-file ${PWD}/ignition.ign
Nightly Jenkins pipeline publishes quay.io/dghubble/coreos-kvm.
Example
quay.io/dghubble/coreos-kvm:stable-1353.7.0
Environment Variables
- CONFIG_FILE - provide a Container Linux Config
- IGNITION_CONFIG_FILE - provide a raw Ignition Config
- CLOUD_CONFIG_FILE - provide a Cloud-Config
- VM_NAME - name of the VM
- VM_MEMORY - amount of VM RAM (4G)
- VM_DISK_SIZE - size of VM disk (12G)
- HOSTFWD - port forwards (hostfwd=tcp::2222-:22)
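How an entrypoint might turn these variables into a QEMU command line can be sketched as follows. This is a hypothetical sketch, not the actual script shipped in quay.io/dghubble/coreos-kvm: the function name build_qemu_cmd, the IMAGE variable, the default VM name, and the virtio NIC are all assumptions.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: prints a QEMU command line built from the
# environment variables listed above. Not the image's real entrypoint.
build_qemu_cmd() {
  image="${IMAGE:-coreos_production_qemu_image.img}"  # assumed image path
  cmd="qemu-system-x86_64 -name ${VM_NAME:-coreos} -m ${VM_MEMORY:-4G}"
  cmd="${cmd} -enable-kvm -nographic -hda ${image}"
  # HOSTFWD is expected in the form "hostfwd=tcp::2222-:22[,hostfwd=...]".
  if [ -n "${HOSTFWD:-}" ]; then
    cmd="${cmd} -netdev user,id=net0,${HOSTFWD}"
    cmd="${cmd} -device virtio-net-pci,netdev=net0"
  fi
  # A raw Ignition config is handed to the guest through fw_cfg.
  if [ -n "${IGNITION_CONFIG_FILE:-}" ]; then
    cmd="${cmd} -fw_cfg name=opt/com.coreos/config,file=${IGNITION_CONFIG_FILE}"
  fi
  printf '%s\n' "$cmd"
}
```

Running `VM_MEMORY=2G build_qemu_cmd` prints the command for inspection. A real entrypoint would likely also resize the disk first (for example with `qemu-img resize "$IMAGE" "$VM_DISK_SIZE"`) and then exec the command. Handling of CONFIG_FILE and CLOUD_CONFIG_FILE is left out of this sketch.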
Create a "VM pod" with user-data in a ConfigMap.
kubectl create -f configmap.yaml
kubectl create -f deployment.yaml
kubectl create -f service.yaml
Access the Container Linux VM via the service's cluster IP.
kubectl get service coreos-kvm
ssh core@10.3.0.X
                               +-----------+
                               |           |
Service in 10.3.0.0/16         |  Service  |  10.3.0.X:22
                               |           |
                               +-----------+
                                     |
                        +-------------------------+
                        |        Endpoints        |  10.2.0.X:2222
                        +-------------------------+
Pod in 10.2.0.0/16                   |
                        +-------------------------+
                        | coreos-kvm container    |  0.0.0.0:2222 local
                        | "host"                  |  port forwards to
                        | +-------------------+   |  guest :22
                        | | Container Linux   |   |
                        | | QEMU/KVM guest    |   |
                        | |                   |   |
                        | +-------------------+   |
                        +-------------------------+
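Looking up the service's cluster IP can be scripted rather than read off the kubectl output by hand. A minimal sketch, assuming kubectl is configured for the outer cluster; the helper name service_ip is illustrative.

```shell
# Print the cluster IP of a Kubernetes Service.
# service_ip is a hypothetical helper name used only in this sketch.
service_ip() {
  kubectl get service "$1" -o jsonpath='{.spec.clusterIP}'
}
```

For example, `ssh "core@$(service_ip coreos-kvm)"` would connect through the service to the forwarded guest port, provided the service network is routable from where the command runs (a cluster node or a pod).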
- Jenkins executors/workers
- Docker builds in a clean Container Linux env
- Arbitrary VMs (QEMU can run almost anything)
Goal: Single node Kubernetes
- Write a Kubernetes deployment for a Container Linux QEMU/KVM VM
- Write a Kubernetes configmap with a Container Linux Config
- Write a Kubernetes service exposing 22 and 443
- Add a DNS record resolving to the apiserver (for kubectl)
Create the configmap, deployment, and service.
cd k8s
kubectl create -f configmap.yaml
kubectl create -f deployment.yaml
kubectl create -f service.yaml
Let's take a look at what we've created.
- Mounts the Container Linux Config
- Adds port forwards from 2222 to 22 and 1443 to 443 from host to guest
image: quay.io/dghubble/coreos-kvm:stable-1353.7.0
env:
  - name: HOSTFWD
    value: "hostfwd=tcp::2222-:22,hostfwd=tcp::1443-:443"
  - name: CONFIG_FILE
    value: /userdata/config.yaml
ports:
  - name: apiserver
    containerPort: 1443
  - name: ssh
    containerPort: 2222
volumeMounts:
  - name: config-volume
    mountPath: /userdata
- Expose pod ports 2222 and 1443
- Assign a fixed service IP (hacky).
kind: Service
metadata:
  name: coreos-k8s
spec:
  clusterIP: 10.3.0.50
  selector:
    name: coreos-k8s
  ports:
    - name: ssh
      port: 22
      targetPort: 2222
    - name: api
      port: 443
      targetPort: 1443
Add a DNS record resolving to the service IP.
$ dig nested-k8s.lab.dghubble.io
10.3.0.50
Generate TLS certificates
./k8s-certgen -s nested-k8s.lab.dghubble.io \
-m IP.1=10.3.0.1,DNS.1=nested-k8s.lab.dghubble.io
Write a Container Linux Config and place it in a Kubernetes ConfigMap.
- Add systemd units for etcd, flanneld, and kubelet
- Add TLS certificates (hacky: should be mounted as secrets into the pod and then into the guest)
- Just modify matchbox examples
Show the pod running the Container Linux VM.
kubectl get pods
kubectl get service coreos-k8s
Show that the pod is running a single-node Kubernetes inside.
export KUBECONFIG=tls/kubeconfig
kubectl get nodes
kubectl get pods --all-namespaces
Let's make it weirder.
kubectl scale deployment coreos-k8s --replicas=3
Applications
- Develop and test federated Kubernetes
- Provide a (nested) Kubernetes to each developer
Pros and Cons
- Each "VM pod" is running qemu-system-x86_64 inside, baked into the image
- Image must provide the features
- Customizable cpu, memory, and disk size
- Providing Container Linux configs to guest
- Mounting volumes into guests from Kubernetes
- Snapshots, migrations, etc.
Future?
- rkt has an alternative stage 1 which can use QEMU/KVM
- kubevirt