For this environment, we'll be using these hostname/IP combinations:
helper         = 192.168.110.39
bootstrap      = 192.168.110.60
controlplane-0 = 192.168.110.61
controlplane-1 = 192.168.110.62
controlplane-2 = 192.168.110.63
worker-0       = 192.168.110.65
worker-1       = 192.168.110.66
- Configure libvirt network and storage
Using your tool of choice, configure the libvirt network. You can create a new one or modify the default if desired (an equivalent XML definition is sketched after this list):
- Assign a name, e.g. okd
- Mode = NAT
- IPv4 config
  - Choose any subnet that doesn't overlap with your external network. I'll be using 192.168.110.0/24
  - Disable DHCP
- DNS domain name - do not use your regular domain name, I'm using okd.lan
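If you'd rather define the network from the CLI than a GUI tool, here's a rough sketch of an equivalent XML definition. The bridge name virbr1 is an assumption on my part; it just needs to match the interface used in the resolvectl commands below:

<!-- okd-net.xml -->
<network>
  <name>okd</name>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' delay='0'/>
  <domain name='okd.lan' localOnly='yes'/>
  <!-- omitting the <dhcp> element disables DHCP -->
  <ip address='192.168.110.1' netmask='255.255.255.0'/>
</network>

sudo virsh net-define okd-net.xml
sudo virsh net-start okd
sudo virsh net-autostart okd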
After creating the new libvirt network, we need to inform the local DNS resolver how to find the domain. Fedora 33 uses systemd-resolved, so we need to use resolvectl to configure it.

# change these values to match your environment
# virbr1 = the bridge interface to the libvirt network
# okd.lan = the domain you chose
# 110.168.192.in-addr.arpa = the reverse subnet you're using
sudo resolvectl domain virbr1 '~okd.lan' '~110.168.192.in-addr.arpa'
sudo resolvectl default-route virbr1 false
sudo resolvectl dns virbr1 192.168.110.1

# verify settings
resolvectl domain
resolvectl dns
If needed, create a storage pool for where you'll be storing the VMs. The control plane nodes, in particular, need low-latency storage, e.g. SSD or NVMe. A virsh sketch follows below.
- Assign a name, e.g. okd-images (the examples later in this guide use the pool path /var/lib/libvirt/okd-images)
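If you'd rather create the pool from the CLI, a minimal sketch using virsh, assuming a directory-backed pool at the path the later examples use:

sudo virsh pool-define-as okd-images dir --target /var/lib/libvirt/okd-images
sudo virsh pool-build okd-images
sudo virsh pool-start okd-images
sudo virsh pool-autostart okd-images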
- Create and configure helper node
The helper node will provide DNS and DHCP via dnsmasq, an http server, and load balancing via haproxy. Choose any OS you like; I'll be using Fedora Server. Create the VM (1 CPU, 1 GiB memory), install the OS, and apply updates. Use a static IP address; I'm using 192.168.110.39. SSH to the helper node for the following steps.
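For reference, one way to create the helper VM from the libvirt host with virt-install. This is only a sketch: the VM name, disk size, and Fedora Server ISO path are assumptions you'll want to adjust for your environment:

sudo virt-install \
  --name okd-helper \
  --vcpus 1 \
  --ram 1024 \
  --os-variant fedora33 \
  --disk pool=okd-images,size=20 \
  --network network:okd \
  --graphics vnc \
  --cdrom /var/lib/libvirt/images/Fedora-Server-33-x86_64.iso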
- Install Podman, haproxy, and dnsmasq
dnf -y install podman haproxy dnsmasq
- Configure dnsmasq
cat << EOF > /etc/dnsmasq.d/okd.conf
expand-hosts
domain-needed
domain=okd.lan

# OKD required
address=/api.cluster.okd.lan/192.168.110.39
address=/api-int.cluster.okd.lan/192.168.110.39
address=/.apps.cluster.okd.lan/192.168.110.39

# create node entries
address=/bootstrap.cluster.okd.lan/192.168.110.60
address=/controlplane-0.cluster.okd.lan/192.168.110.61
address=/controlplane-1.cluster.okd.lan/192.168.110.62
address=/controlplane-2.cluster.okd.lan/192.168.110.63
address=/worker-0.cluster.okd.lan/192.168.110.65
address=/worker-1.cluster.okd.lan/192.168.110.66
EOF

# enable and start dnsmasq
systemctl enable --now dnsmasq
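To sanity-check the records once dnsmasq is running, you can query the helper directly. dig isn't installed by the steps above (it's in the bind-utils package), so treat this as an optional check:

dig +short @192.168.110.39 api.cluster.okd.lan
dig +short @192.168.110.39 console-openshift-console.apps.cluster.okd.lan
# both should return 192.168.110.39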
- Configure haproxy
cat << EOF > /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode                    tcp
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          10m
    timeout server          10m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /
    monitor-uri /healthz

frontend openshift-api-server
    bind *:6443
    default_backend openshift-api-server
    option tcplog

backend openshift-api-server
    balance source
    server bootstrap 192.168.110.60:6443 check
    server controlplane0 192.168.110.61:6443 check
    server controlplane1 192.168.110.62:6443 check
    server controlplane2 192.168.110.63:6443 check

frontend machine-config-server
    bind *:22623
    default_backend machine-config-server
    option tcplog

backend machine-config-server
    balance source
    server bootstrap 192.168.110.60:22623 check
    server controlplane0 192.168.110.61:22623 check
    server controlplane1 192.168.110.62:22623 check
    server controlplane2 192.168.110.63:22623 check

frontend ingress-http
    bind *:80
    default_backend ingress-http
    option tcplog

backend ingress-http
    mode http
    balance source
    server controlplane0-http-router 192.168.110.61:80 check
    server controlplane1-http-router 192.168.110.62:80 check
    server controlplane2-http-router 192.168.110.63:80 check
    server worker0-http-router 192.168.110.65:80 check
    server worker1-http-router 192.168.110.66:80 check

frontend ingress-https
    bind *:443
    default_backend ingress-https
    option tcplog

backend ingress-https
    balance source
    server controlplane0-https-router 192.168.110.61:443 check
    server controlplane1-https-router 192.168.110.62:443 check
    server controlplane2-https-router 192.168.110.63:443 check
    server worker0-https-router 192.168.110.65:443 check
    server worker1-https-router 192.168.110.66:443 check
EOF

# enable and start haproxy
systemctl enable --now haproxy
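Once haproxy is running, the stats listener defined above gives a quick health check and a way to watch backend status during the install:

# should print 200 when haproxy is up
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.110.39:9000/healthz
# the stats page is at http://192.168.110.39:9000/ in a browser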
- Configure an http server using podman
The FCOS rootfs image used in this step is here.
# start the httpd container on port 8080
podman run -d \
  --restart=unless-stopped \
  -p 8080:80 \
  -v /var/www/html:/usr/local/apache2/htdocs \
  docker.io/library/httpd:2.4-alpine
- Download and place OKD resources
From the libvirt host, download the following resources:

From the OKD release page on GitHub, the openshift-client and openshift-install packages. Un-gzip and move the binaries to /usr/local/bin:

# download
wget https://github.com/openshift/okd/releases/download/4.6.0-0.okd-2021-02-14-205305/openshift-install-linux-4.6.0-0.okd-2021-02-14-205305.tar.gz
wget https://github.com/openshift/okd/releases/download/4.6.0-0.okd-2021-02-14-205305/openshift-client-linux-4.6.0-0.okd-2021-02-14-205305.tar.gz

# unpack
tar xzf openshift-install-linux-4.6.0-0.okd-2021-02-14-205305.tar.gz
tar xzf openshift-client-linux-4.6.0-0.okd-2021-02-14-205305.tar.gz

# place
sudo mv openshift-install oc kubectl /usr/local/bin
rm README.md
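A quick check that the binaries are in place and on your PATH:

openshift-install version
oc version --client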
From the helper node:
Links for the most recent binaries to download are here.
# organizational directories
mkdir -p /var/www/html/{install,ignition}

# download the kernel image
wget https://builds.coreos.fedoraproject.org/prod/streams/testing-devel/builds/33.20210212.20.1/x86_64/fedora-coreos-33.20210212.20.1-live-kernel-x86_64
mv fedora-coreos-33.20210212.20.1-live-kernel-x86_64 /var/www/html/install/kernel

# download the initramfs image
wget https://builds.coreos.fedoraproject.org/prod/streams/testing-devel/builds/33.20210212.20.1/x86_64/fedora-coreos-33.20210212.20.1-live-initramfs.x86_64.img
mv fedora-coreos-33.20210212.20.1-live-initramfs.x86_64.img /var/www/html/install/initramfs.img

# download the rootfs image
wget https://builds.coreos.fedoraproject.org/prod/streams/testing-devel/builds/33.20210212.20.1/x86_64/fedora-coreos-33.20210212.20.1-live-rootfs.x86_64.img
mv fedora-coreos-33.20210212.20.1-live-rootfs.x86_64.img /var/www/html/install/rootfs.img

# set permissions for all
chmod 444 /var/www/html/install/*
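With the httpd container from the earlier step running, you can confirm the images are reachable at the URLs the nodes will fetch them from:

curl -I http://192.168.110.39:8080/install/kernel
curl -I http://192.168.110.39:8080/install/rootfs.img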
- Create install-config.yaml
From the libvirt host.

Substitute your values for the SSH key and adjust others as needed, e.g. networking.machineNetwork.cidr.

mkdir ~/okd && cd ~/okd

# a real pull secret is not needed for OKD
PULLSECRET='{"auths":{"fake":{"auth": "bar"}}}'

# use the path for your public key
SSHKEY=$(cat ~/.ssh/id_*.pub)

cat << EOF > install-config.yaml
apiVersion: v1
baseDomain: okd.lan
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  creationTimestamp: null
  name: cluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.110.0/24
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
publish: External
pullSecret: '$PULLSECRET'
sshKey: |
  $SSHKEY
EOF
- Create ignition files
From the libvirt host.
This will be done in two phases so that the control plane can be marked unschedulable.
# create a working directory
cd ~/okd && mkdir cluster && cp install-config.yaml cluster/

# generate manifests
openshift-install create manifests --dir=cluster

# set the control plane non-schedulable
sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' cluster/manifests/cluster-scheduler-02-config.yml

# generate ignition files
openshift-install create ignition-configs --dir=cluster

# copy the ignition files to the helper node
scp cluster/*.ign [email protected]:/var/www/html/ignition
You may need to adjust permissions for the files on the helper node so that the containerized web server can access them.
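For example, something like this on the helper node; the chcon relabel is only needed if SELinux is enforcing and blocks the container from reading the files:

chmod 644 /var/www/html/ignition/*.ign
# relabel so the container can read the files (SELinux only)
sudo chcon -R -t container_file_t /var/www/html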
- Create and configure the VMs
From the libvirt host.
# create the disk images, set the directory according to your host
for NODE in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo qemu-img create -f qcow2 /var/lib/libvirt/okd-images/$NODE.qcow2 120G
done

# set permissions
sudo chown qemu:qemu /var/lib/libvirt/okd-images/*
sudo chmod 600 /var/lib/libvirt/okd-images/*

# organization is good
mkdir -p node-configs

wget http://192.168.110.39:8080/install/kernel && sudo mv kernel /var/lib/libvirt/boot/
wget http://192.168.110.39:8080/install/initramfs.img && sudo mv initramfs.img /var/lib/libvirt/boot/
sudo chown qemu:qemu /var/lib/libvirt/boot/kernel
sudo chown qemu:qemu /var/lib/libvirt/boot/initramfs.img

# where to find the install files needed for direct kernel boot of the VMs
KERNEL='/var/lib/libvirt/boot/kernel'
INITRD='/var/lib/libvirt/boot/initramfs.img'
KERNEL_ARGS='coreos.live.rootfs_url=http://192.168.110.39:8080/install/rootfs.img rd.neednet=1 coreos.inst.install_dev=/dev/vda'

# set static IP configuration
IP_STR='ip=192.168.110.NODEIP::192.168.110.1:255.255.255.0:NODENAME.cluster.okd.lan:enp1s0:none nameserver=192.168.110.39'
IP_bootstrap=$(echo $IP_STR | sed 's/NODEIP/60/;s/NODENAME/bootstrap/')
IP_controlplane0=$(echo $IP_STR | sed 's/NODEIP/61/;s/NODENAME/controlplane-0/')
IP_controlplane1=$(echo $IP_STR | sed 's/NODEIP/62/;s/NODENAME/controlplane-1/')
IP_controlplane2=$(echo $IP_STR | sed 's/NODEIP/63/;s/NODENAME/controlplane-2/')
IP_worker0=$(echo $IP_STR | sed 's/NODEIP/65/;s/NODENAME/worker-0/')
IP_worker1=$(echo $IP_STR | sed 's/NODEIP/66/;s/NODENAME/worker-1/')

# bootstrap ignition location
BOOTSTRAP_IGNITION='coreos.inst.ignition_url=http://192.168.110.39:8080/ignition/bootstrap.ign'

# create the bootstrap machine
sudo virt-install \
  --virt-type kvm \
  --ram 12188 \
  --vcpus 4 \
  --os-variant fedora-coreos-stable \
  --disk path=/var/lib/libvirt/okd-images/bootstrap.qcow2,device=disk,bus=virtio,format=qcow2 \
  --noautoconsole \
  --vnc \
  --network network:okd \
  --boot hd,network \
  --install kernel=${KERNEL},initrd=${INITRD},kernel_args_overwrite=yes,kernel_args="${KERNEL_ARGS} ${IP_bootstrap} ${BOOTSTRAP_IGNITION}" \
  --name bootstrap \
  --print-xml 1 > node-configs/bootstrap.xml

# set the ignition location for the control plane nodes
CONTROL_IGNITION='coreos.inst.ignition_url=http://192.168.110.39:8080/ignition/master.ign'

# define the nodes, set values according to your host and desired outcome
for NODE in controlplane-0 controlplane-1 controlplane-2; do
  # jiggery pokery to get the IP string via variable reference
  ipvarname="IP_$(echo $NODE | sed 's/-//')"
  sudo virt-install \
    --virt-type kvm \
    --ram 12188 \
    --vcpus 4 \
    --os-variant fedora-coreos-stable \
    --disk path=/var/lib/libvirt/okd-images/$NODE.qcow2,device=disk,bus=virtio,format=qcow2 \
    --noautoconsole \
    --vnc \
    --network network:okd \
    --boot hd,network \
    --install kernel=${KERNEL},initrd=${INITRD},kernel_args_overwrite=yes,kernel_args="${KERNEL_ARGS} ${!ipvarname} ${CONTROL_IGNITION}" \
    --name $NODE \
    --print-xml 1 > node-configs/$NODE.xml
done

# set the worker ignition location
WORKER_IGNITION='coreos.inst.ignition_url=http://192.168.110.39:8080/ignition/worker.ign'

for NODE in worker-0 worker-1; do
  ipvarname="IP_$(echo $NODE | sed 's/-//')"
  sudo virt-install \
    --virt-type kvm \
    --ram 8192 \
    --vcpus 2 \
    --os-variant fedora-coreos-stable \
    --disk path=/var/lib/libvirt/okd-images/$NODE.qcow2,device=disk,bus=virtio,format=qcow2 \
    --noautoconsole \
    --vnc \
    --network network:okd \
    --boot hd,network \
    --install kernel=${KERNEL},initrd=${INITRD},kernel_args_overwrite=yes,kernel_args="${KERNEL_ARGS} ${!ipvarname} ${WORKER_IGNITION}" \
    --name $NODE \
    --print-xml 1 > node-configs/$NODE.xml
done

# define each of the VMs
for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virsh define node-configs/$VM.xml

  # for some reason libvirt doesn't like the kernel and initrd location, set it forcefully
  sudo virt-xml $VM --edit \
    --xml ./os/kernel=$KERNEL \
    --xml ./os/initrd=$INITRD
done
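At this point the VMs are defined but not yet running; a quick check:

sudo virsh list --all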
- Install FCOS
Now we need to power on each of the VMs and let them boot the first time to install FCOS:
for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virsh start $VM
  # optionally, add a sleep here to not overwhelm the storage
  #sleep 30
done
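If you want to watch a node install FCOS, open its graphical console (the VMs were defined with VNC graphics). virt-viewer is one option, if it's installed on the libvirt host:

sudo virt-viewer bootstrap
# or find the VNC display to connect to manually
sudo virsh vncdisplay bootstrap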
The VMs will start, install FCOS, then power off. After they have powered off, we need to adjust their settings so they boot from the disk as normal:
for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virt-xml $VM --edit \
    --xml ./on_reboot=restart \
    --xml xpath.delete=./os/kernel \
    --xml xpath.delete=./os/initrd \
    --xml xpath.delete=./os/cmdline
done
- Finish deploying
Finally, start the VMs so that OKD can finish deploying.
for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virsh start $VM
done
Monitor the progress and complete the deployment using these commands from the libvirt host.
cd ~/okd

# bootstrap
openshift-install wait-for bootstrap-complete --log-level=debug --dir=cluster

# turn off bootstrap when it's done
sudo virsh destroy bootstrap

# connect to the cluster
export KUBECONFIG=~/okd/cluster/auth/kubeconfig

# approve CSRs
watch -n 5 oc get csr

# when there are two CSRs pending, approve them
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve

# two additional CSRs will be requested shortly after, repeat the commands above to approve them

# when the CSRs are approved, wait for the install to complete
openshift-install wait-for install-complete --log-level=debug --dir=cluster
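Once the install completes, a few quick checks from the libvirt host. The installer writes the web console's kubeadmin password to cluster/auth/kubeadmin-password.

export KUBECONFIG=~/okd/cluster/auth/kubeconfig
oc get nodes
oc get clusteroperators
oc get clusterversion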
Fin.

When you're done with the environment, tear it down by destroying and undefining the VMs and removing their disk images:
for VM in bootstrap controlplane-0 controlplane-1 controlplane-2 worker-0 worker-1; do
  sudo virsh destroy $VM
  sudo virsh undefine --domain $VM
done
sudo rm /var/lib/libvirt/okd-images/bootstrap.qcow2
sudo rm /var/lib/libvirt/okd-images/controlplane*.qcow2
sudo rm /var/lib/libvirt/okd-images/worker*.qcow2
Hi Andrew, thanks for writing the gist up.
I'm trying to follow along but stuck between point 1 & 2.
I've been struggling for weeks and found solutions for several different issues.
I hope the following posts can help someone save time and better understand how it works.
Issue:
When booting up the helper node, dnsmasq fails to start with the error: unknown interface enp1s0
It seems that NetworkManager has not brought the Ethernet interface up in time, so dnsmasq fails to start.
Starting dnsmasq manually after logging in does work.
Solution:
Using 'bind-dynamic' instead of 'bind-interfaces' allows dnsmasq to start even while 'interface=enp1s0' is still down.
As soon as the interface comes up, dnsmasq binds to it and listens on it.
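For example, in my dnsmasq config (these lines are from my setup, not the guide above):

# /etc/dnsmasq.d/okd.conf
interface=enp1s0
bind-dynamic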
Issue:
When booting up the helper node, dnsmasq fails to start because systemd-resolved starts first and already has port 53 in use.
Solution:
Disable the local DNS stub listener from systemd-resolved with the setting 'DNSStubListener=no'.
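For example:

# /etc/systemd/resolved.conf
[Resolve]
DNSStubListener=no

Then restart systemd-resolved (systemctl restart systemd-resolved).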
Issue:
When the host machine is rebooted, the 'okd.lan' domain on the libvirt network can no longer be resolved because the resolver settings are missing.
Solution:
Persist the local DNS resolver settings with a libvirt network hook script.
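A sketch of such a hook, assuming the network is named okd and bridges to virbr1; libvirt calls /etc/libvirt/hooks/network with the network name and operation as its first two arguments:

#!/bin/bash
# /etc/libvirt/hooks/network (make it executable)
if [ "$1" = "okd" ] && [ "$2" = "started" ]; then
  resolvectl domain virbr1 '~okd.lan' '~110.168.192.in-addr.arpa'
  resolvectl default-route virbr1 false
  resolvectl dns virbr1 192.168.110.1
fi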
Issue:
The host machine cannot resolve the 'okd.lan' domain on the libvirt network because it cannot find the resolver.
Solution:
systemd-resolved listens on the loopback interface at 127.0.0.53%lo:53.
Include 'DNS=127.0.0.53' in the systemd.network network configuration.
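For example, something like this; the interface name and the rest of the file are placeholders for my setup:

# /etc/systemd/network/20-wired.network
[Match]
Name=enp1s0

[Network]
DHCP=yes
DNS=127.0.0.53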