| Author |
|---|
| Mateusz Loskot (aka mloskot) |
As a follow-up to the SIG-Windows weekly meeting of Jan 2, 2024, this is my brainstorm of ideas and issues about the state and future of SWDT. It turned out less structured and systematic than I wished, so apologies for the chaos of thoughts, but it is a brainstorm, or rather, a braindump.
The initial promise of SWDT, a batteries-included local cluster integrating Windows worker nodes on a variety of host operating systems, although very attractive, turned out to be difficult to fulfill in a manner that is both reliably usable and maintainable.
The goals of SWDT need to be clarified and redefined so that they are achievable.
- Pre-built images used for node VM-s, especially Windows ones, impose a significant maintenance burden and quickly become outdated.
- The content of pre-built images is not easy to swap for the latest releases of Kubernetes components.
- Convenience tools like Vagrant require pre-built images and come with their own limitations and bugs, which may become frustrating blockers.
- VirtualBox on Windows has a long history of performance issues and conflicts with other hypervisors, like the Windows-native Hyper-V, or even with WSL.
- Despite the use of pre-built images, the current implementation is still complex and hacky, based on numerous undocumented scripts and fragile flows. It also requires make, which is neither Windows-native nor portable.
- Although the project is supposed to be intrinsically quite simple and high-level, its overall unfriendliness to maintenance makes it unattractive to contributors. It is worth acknowledging that SWDT is an auxiliary project rather than a product.
- The end-user documentation is too long and fiddly. It does not quite reflect the stated goal, quote: "Our goal is to make Windows ridiculously easy to contribute to, play with".
SWDT as a collection of current "Kubernetes on Windows" knowledge and practices, typically scattered across numerous resources, as well as a user-friendly and configurable application that applies them locally, on a Linux or Windows host, for development, testing and learning purposes.
SWDT as a facade masking the complexity of Kubernetes documentation and tools for the basic purpose of running a cluster with Windows-based workloads.
Although there is a multitude of Kubernetes distributions available to run a local cluster, the niche for a simple, minimal, bare Kubernetes on Linux and Windows seems to remain vacant. That is, vanilla Kubernetes without any amazing magic like kind does, but plain kubeadm joining virtual or physical machines as nodes.
Ku(bare)netes! [tm] ;)
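To illustrate how little "plain kubeadm" involves, here is a minimal Go sketch that only assembles the two kubeadm command lines such a tool would run. It assumes the control plane is initialised with --apiserver-advertise-address and --pod-network-cidr, and that the join token and CA certificate hash are obtained beforehand (for example from kubeadm token create --print-join-command on the control plane). The function names are hypothetical and nothing is executed.

```go
package main

import (
	"fmt"
	"strings"
)

// InitArgs returns the argument list for `kubeadm init` on the Linux
// control-plane node. advertiseAddr is the node's static IP, podCIDR is the
// pod network range expected by the chosen CNI (both assumptions).
func InitArgs(advertiseAddr, podCIDR string) []string {
	return []string{
		"kubeadm", "init",
		"--apiserver-advertise-address=" + advertiseAddr,
		"--pod-network-cidr=" + podCIDR,
	}
}

// JoinArgs returns the argument list for `kubeadm join` on a worker node.
// token and caCertHash are expected to come from the control plane,
// e.g. via `kubeadm token create --print-join-command`.
func JoinArgs(controlPlane string, port int, token, caCertHash string) []string {
	return []string{
		"kubeadm", "join",
		fmt.Sprintf("%s:%d", controlPlane, port),
		"--token", token,
		"--discovery-token-ca-cert-hash", "sha256:" + caCertHash,
	}
}

func main() {
	// Placeholder values for illustration only.
	fmt.Println(strings.Join(InitArgs("192.168.100.10", "10.244.0.0/16"), " "))
	fmt.Println(strings.Join(JoinArgs("192.168.100.10", 6443, "abcdef.0123456789abcdef", "<hash>"), " "))
}
```

Keeping command construction separate from execution makes it easy to run the same arguments over SSH or PowerShell Direct later.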
Give users a command-line tool that can non-interactively:
- Create a Linux VM and install a minimal Linux OS configured to become the Kubernetes master node with the control plane
- Create a Windows VM and install a Windows Server OS configured to become a Kubernetes worker node
- Install a container runtime, CNI and Kubernetes from an official release or a user-specified build
- Initialise the control plane
- Join the Windows node
and that can do it:
- on a Linux or Windows host
- with a host-native (or close to native) virtualisation solution, i.e. KVM/QEMU on Linux and Hyper-V on Windows
- without using any pre-built SWDT-specific images,

and such a CLI is written in Go, so that it runs smoothly on both Linux and Windows hosts.
Major challenges:
- How to non-interactively create VM-s to build cluster nodes?
- How to non-interactively install OS on VM-s to provision cluster nodes?
Especially, how to manage this for the Windows node, as Windows is still not as friendly to non-human operators as Linux is.
libvirt is highly capable, but it is still a Linux-oriented solution, so it would only be usable for managing the Linux node on a Linux host.
Although there is a Microsoft Hyper-V driver available, it does not seem to be battle-tested.
Choosing libvirt comes with the risk of becoming a distraction, due to getting involved in the low-level work of fixing and maintaining the driver, which, however beneficial to the greater community, would run against the SWDT project's own goals.
Microsoft offers plenty of options, but it looks like only the HCS API is feature-complete for managing the VM lifecycle. Additionally, the HCN API would help with the initial setup of the VNet.
This, however, is a low-level option which would increase complexity and skill requirements, which in turn may work against making SWDT a project that is attractive and accessible to new Kubernetes contributors and testers.
Although it is an old-school and tedious solution, it is actually fairly easy to reason about. It may solve the non-interactive OS installation for building the Windows node.
Microsoft offers a VHD (9 GB to download) which is actually a pre-installed OS, so it could potentially solve the OS installation issue.
A powerful high-level solution for day-two provisioning of the Windows VM, regardless of its network configuration, for example, to set up the container runtime, CNI, Kubernetes, etc. This is actually what I used in my experiments in https://github.com/mloskot/swdt-nextgen
My research suggests that PowerShell is the de facto communication channel of choice for managing VM-s from Go applications, for example:
- https://github.com/taliesins/terraform-provider-hyperv/
- https://github.com/hashicorp/packer-plugin-hyperv
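A minimal sketch of the general pattern, assuming PowerShell is invoked non-interactively and results come back as JSON via ConvertTo-Json; the linked projects differ in detail, and the VMInfo type and function names here are hypothetical. The VM state is converted to a string explicitly, because ConvertTo-Json otherwise serialises enum values as numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
)

// VMInfo holds the subset of Get-VM output this sketch cares about.
type VMInfo struct {
	Name  string `json:"Name"`
	State string `json:"State"`
}

// getVMScript lists Hyper-V VMs as JSON, with State rendered as text.
const getVMScript = `Get-VM | Select-Object Name, @{Name='State'; Expression={$_.State.ToString()}} | ConvertTo-Json`

// ParseVMs decodes ConvertTo-Json output. ConvertTo-Json emits a single
// object (not an array) for exactly one VM and nothing for zero VMs,
// so all three shapes are handled.
func ParseVMs(out []byte) ([]VMInfo, error) {
	s := strings.TrimSpace(string(out))
	if s == "" {
		return nil, nil
	}
	if strings.HasPrefix(s, "{") {
		var one VMInfo
		if err := json.Unmarshal([]byte(s), &one); err != nil {
			return nil, err
		}
		return []VMInfo{one}, nil
	}
	var many []VMInfo
	if err := json.Unmarshal([]byte(s), &many); err != nil {
		return nil, err
	}
	return many, nil
}

// ListVMs runs the script through powershell.exe on the Hyper-V host.
func ListVMs() ([]VMInfo, error) {
	out, err := exec.Command("powershell.exe", "-NoProfile", "-NonInteractive", "-Command", getVMScript).Output()
	if err != nil {
		return nil, fmt.Errorf("running Get-VM: %w", err)
	}
	return ParseVMs(out)
}

func main() {
	vms, err := ListVMs()
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, vm := range vms {
		fmt.Println(vm.Name, vm.State)
	}
}
```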
Similarly to PowerShell, SSH can be used for day-two provisioning.
In fact, I have used it in my swdt-nextgen experiments for setting up the Linux node.
I have successfully tested it with Windows too, as an alternative to PowerShell Direct,
but it is more fiddly and fragile than PowerShell (argument handling, escaping,
output manipulation, etc. may become a PITA).
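Part of that fiddliness is quoting: ssh joins the remote command into a single string that the remote shell parses again. A sketch, assuming a POSIX shell on the Linux node and password-less key authentication; on a Windows node running OpenSSH the default remote shell is cmd.exe, whose quoting rules differ, so this helper would not apply there. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// shQuote quotes s for a POSIX shell: wrap it in single quotes and replace
// each embedded ' with '\'' (close quote, escaped quote, reopen quote).
func shQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// SSHArgs builds a non-interactive ssh invocation. BatchMode=yes makes ssh
// fail instead of prompting for a password. Each remote argument is quoted
// so the remote shell reconstructs exactly the intended argv.
func SSHArgs(user, host, keyFile string, remoteArgv ...string) []string {
	quoted := make([]string, len(remoteArgv))
	for i, a := range remoteArgv {
		quoted[i] = shQuote(a)
	}
	return []string{
		"ssh", "-i", keyFile,
		"-o", "BatchMode=yes",
		user + "@" + host,
		strings.Join(quoted, " "),
	}
}

func main() {
	// Hypothetical user, address and key path.
	args := SSHArgs("swdt", "192.168.100.10", "/home/me/.ssh/id_ed25519", "echo", "it's ok")
	fmt.Println(args[len(args)-1])
}
```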
Whatever it can do, I personally would rather fail trying to fix the libvirt driver for Hyper-V :)
I have no idea if and how it could help, but I thought it worth mentioning. Perhaps WSL could be used as the Linux node? The history of WSL networking issues scares me. Regardless of what WSL can do for SWDT, it would be best if SWDT aimed for simple Linux- and Windows-native solutions, VM-s or bare-metal hosts.
If we insist on keeping VM management as one of the features of SWDT, then perhaps we should revisit the use of Vagrant, but instead of relying on a custom image/box, we should ensure SWDT can work with the vanilla Linux and Windows Server images available out there.
The compelling reason to stay with Vagrant is HashiCorp's promise to deliver Vagrant 3.0 in Go. This would open an opportunity to write SWDT-specific plugins in Go, should we discover a need for that. The problem is that HashiCorp seems to be far from delivering Vagrant 3.0.
Vagrant is certainly a unified API and a facade removing lots of virtualization complexity.
All credit for what follows goes to Amim, for his idea of a lean approach to SWDT.
- User creates VM-s however she likes, at least two (one Linux and one Windows), but according to certain well-documented basic requirements that make the VM-s viable as nodes. For example:
  - Minimal installation of Debian or Fedora Linux
  - Minimal installation of Windows Server 2022
  - Virtual switches created
  - Static IP-s assigned
  - SSH servers installed on all nodes
  - Public SSH keys deployed for password-less SSH communication between nodes
  - Password set for the local Administrator user on Windows nodes, in case swdt needs to run PowerShell Direct (i.e. when SSH communication falls short)
  - CNI configuration decided (e.g. pod CIDR)
- User writes the details of the VM-s in the form of a simple YAML file, my-awesome-cluster.yaml.
  Alternatively, in a super-friendly mode, the user runs swdt config create --output my-awesome-cluster.yaml and a beautiful bubbly TUI asks the user a sequence of questions, then generates the YAML ;)
- User runs swdt cluster create --config my-awesome-cluster.yaml
- swdt takes over the nodes and does what is necessary to create the control plane and join the worker node(s):
  - optimises system-level configuration: disables swap, loads kernel modules, enables iptables features, disables Windows Firewall
  - edits hosts files for hostname-based node-to-node communication
  - installs containerd or another CRI runtime specified by the user's configuration and supported by SWDT
  - installs the CNI specified by the user's configuration and supported by SWDT
  - installs Kubernetes
  - runs tests

  Of course, software components like containerd and Kubernetes could also be specified as "build this from source from this tag for me, please".
- swdt offers day-two commands too:
  - swdt get kubeconfig --output ./.kubeconfig
  - swdt get nodes and swdt get pods as convenient shortcuts that do not require a host-local .kubeconfig
  - swdt node stop|start
  - swdt cluster update --config my-new-node-spec-here.yaml, that is, adding new things should be supported, but reconfiguring an existing setup, like the network, should not
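The YAML file from the second step could map to Go types along the lines below, validated against the basic requirements from the first step before swdt touches any node. This is a sketch: the type names, field names and rules are assumptions, and the yaml struct tags only take effect once a YAML decoder (for example gopkg.in/yaml.v3) is added, which is not done here.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// ClusterConfig is a hypothetical shape for my-awesome-cluster.yaml.
type ClusterConfig struct {
	PodCIDR string `yaml:"podCIDR"`
	Nodes   []Node `yaml:"nodes"`
}

// Node describes one user-prepared VM or bare-metal host.
type Node struct {
	Name    string `yaml:"name"`
	OS      string `yaml:"os"`   // "linux" or "windows"
	Role    string `yaml:"role"` // "control-plane" or "worker"
	IP      string `yaml:"ip"`   // static IP assigned by the user
	SSHUser string `yaml:"sshUser"`
}

// Validate checks a few basic requirements: a valid pod CIDR, valid static
// IPs outside the pod CIDR, a Linux control plane and a Windows worker.
func (c ClusterConfig) Validate() error {
	_, podNet, err := net.ParseCIDR(c.PodCIDR)
	if err != nil {
		return fmt.Errorf("invalid podCIDR %q: %w", c.PodCIDR, err)
	}
	var linuxCP, winWorkers int
	for _, n := range c.Nodes {
		ip := net.ParseIP(n.IP)
		if ip == nil {
			return fmt.Errorf("node %q: invalid ip %q", n.Name, n.IP)
		}
		if podNet.Contains(ip) {
			return fmt.Errorf("node %q: ip %s overlaps podCIDR %s", n.Name, ip, podNet)
		}
		switch {
		case n.OS == "linux" && n.Role == "control-plane":
			linuxCP++
		case n.OS == "windows" && n.Role == "worker":
			winWorkers++
		}
	}
	if linuxCP == 0 {
		return errors.New("a Linux control-plane node is required")
	}
	if winWorkers == 0 {
		return errors.New("at least one Windows worker node is required")
	}
	return nil
}

func main() {
	cfg := ClusterConfig{
		PodCIDR: "10.244.0.0/16",
		Nodes: []Node{
			{Name: "swdt-linux", OS: "linux", Role: "control-plane", IP: "192.168.100.10", SSHUser: "swdt"},
			{Name: "swdt-win", OS: "windows", Role: "worker", IP: "192.168.100.20", SSHUser: "Administrator"},
		},
	}
	fmt.Println(cfg.Validate())
}
```

Validating up front keeps the cluster create step from failing halfway through on a typo in the configuration.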
Stretching the dream further, it would be awesome if there were a swdt cluster destroy reverting all the swdt cluster create changes, leaving the VM-s cleaned up.
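One way such a destroy command could work, sketched under the assumption that every change swdt applies is recorded together with a command that undoes it, is a journal reverted last-in, first-out. The Journal and Change types are hypothetical, and the commands in main are illustrative only (for instance, swapoff -a alone is not persistent across reboots, and a real revert of sysctl or firewall settings would need to restore the previous values).

```go
package main

import "fmt"

// Runner executes a command on a node, e.g. over SSH or PowerShell Direct.
type Runner func(node, command string) error

// Change is one modification applied to a node plus the command undoing it.
type Change struct {
	Node, Apply, Revert string
}

// Journal records successfully applied changes so they can be reverted.
type Journal struct {
	applied []Change
}

// Apply runs c.Apply and records c only if it succeeded.
func (j *Journal) Apply(run Runner, c Change) error {
	if err := run(c.Node, c.Apply); err != nil {
		return err
	}
	j.applied = append(j.applied, c)
	return nil
}

// RevertAll undoes recorded changes in reverse order and stops at the first
// failure, keeping the not-yet-reverted changes in the journal.
func (j *Journal) RevertAll(run Runner) error {
	for len(j.applied) > 0 {
		c := j.applied[len(j.applied)-1]
		if err := run(c.Node, c.Revert); err != nil {
			return fmt.Errorf("reverting %q on %s: %w", c.Apply, c.Node, err)
		}
		j.applied = j.applied[:len(j.applied)-1]
	}
	return nil
}

func main() {
	var log []string
	// A fake runner that only records what would be executed.
	run := func(node, cmd string) error {
		log = append(log, node+": "+cmd)
		return nil
	}
	var j Journal
	changes := []Change{
		{"swdt-linux", "swapoff -a", "swapon -a"},
		{"swdt-linux", "modprobe br_netfilter", "modprobe -r br_netfilter"},
		{"swdt-win", "Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled False",
			"Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled True"},
	}
	for _, c := range changes {
		if err := j.Apply(run, c); err != nil {
			fmt.Println(err)
			return
		}
	}
	if err := j.RevertAll(run); err != nil {
		fmt.Println(err)
	}
	for _, l := range log {
		fmt.Println(l)
	}
}
```

Persisting the journal next to the cluster configuration would be needed for destroy to work in a later invocation; that part is not shown.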
If we figure out a simple, usable VM management API, then swdt
could use snapshots/checkpoints,
even if it does not provide complete VM lifecycle management.
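A minimal sketch of how narrow such an API could be: an interface covering only snapshots, which a Hyper-V backend might implement with the Checkpoint-VM and Restore-VMSnapshot cmdlets and a libvirt backend with virsh snapshot-create-as and snapshot-revert. The VMManager interface and the in-memory fake are hypothetical; no real backend is implemented here.

```go
package main

import (
	"errors"
	"fmt"
)

// VMManager is a deliberately small, hypothetical interface: no VM creation,
// only what swdt would need to roll nodes back.
type VMManager interface {
	Snapshot(vm, name string) error
	Restore(vm, name string) error
}

// fakeManager is an in-memory stand-in used to exercise the interface.
type fakeManager struct {
	snapshots map[string][]string
}

func (f *fakeManager) Snapshot(vm, name string) error {
	f.snapshots[vm] = append(f.snapshots[vm], name)
	return nil
}

func (f *fakeManager) Restore(vm, name string) error {
	for _, s := range f.snapshots[vm] {
		if s == name {
			return nil
		}
	}
	return fmt.Errorf("vm %q has no snapshot %q", vm, name)
}

// SnapshotAll takes a named snapshot of every node VM, e.g. before
// swdt cluster create, collecting all failures instead of stopping early.
func SnapshotAll(m VMManager, vms []string, name string) error {
	var errs []error
	for _, vm := range vms {
		if err := m.Snapshot(vm, name); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when errs is empty (Go 1.20+)
}

func main() {
	m := &fakeManager{snapshots: map[string][]string{}}
	if err := SnapshotAll(m, []string{"swdt-linux", "swdt-win"}, "before-cluster-create"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(m.Restore("swdt-win", "before-cluster-create"))
	fmt.Println(m.Restore("swdt-win", "missing"))
}
```

Keeping the interface this small would let bare-metal nodes simply opt out of snapshots while VMs benefit from them.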
The major benefits of the simplified (lean) approach:
- Clears up almost all of the current issues discussed above.
- Avoids over-engineering SWDT.
- Potentially, it may even become virtualization-agnostic and, in the first step above, allow bare-metal hosts as nodes; it is all about networking after all.
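Because the lean approach only needs network access to the nodes, steps such as the hosts-file edit from the cluster create stage work the same on VMs and bare-metal hosts. A sketch of an idempotent edit, assuming swdt owns a block delimited by marker comments so it can be replaced on update and removed on destroy; the marker text and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const (
	beginMarker = "# BEGIN swdt"
	endMarker   = "# END swdt"
)

// HostsPath returns the hosts file location for the node's OS.
func HostsPath(nodeOS string) string {
	if nodeOS == "windows" {
		return `C:\Windows\System32\drivers\etc\hosts`
	}
	return "/etc/hosts"
}

// UpdateHosts replaces (or appends) the swdt-managed block in the hosts file
// content, mapping node names to IPs. Running it twice yields the same text.
func UpdateHosts(content string, nodes map[string]string) string {
	names := make([]string, 0, len(nodes))
	for name := range nodes {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order
	var b strings.Builder
	b.WriteString(beginMarker + "\n")
	for _, n := range names {
		fmt.Fprintf(&b, "%s %s\n", nodes[n], n)
	}
	b.WriteString(endMarker + "\n")
	block := b.String()

	start := strings.Index(content, beginMarker)
	end := strings.Index(content, endMarker)
	if start >= 0 && end > start {
		// Drop the old block, including the line break after the end marker
		// (LF or CRLF, as Windows hosts files may use either).
		rest := content[end+len(endMarker):]
		rest = strings.TrimPrefix(strings.TrimPrefix(rest, "\r"), "\n")
		return content[:start] + block + rest
	}
	if content != "" && !strings.HasSuffix(content, "\n") {
		content += "\n"
	}
	return content + block
}

func main() {
	nodes := map[string]string{"swdt-linux": "192.168.100.10", "swdt-win": "192.168.100.20"}
	fmt.Print(UpdateHosts("127.0.0.1 localhost\n", nodes))
	fmt.Println(HostsPath("windows"))
}
```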
As explained above, SWDT falls into the category of auxiliary projects, so it has slim chances of becoming a rockstar among Kubernetes distributions. However, there are plenty of reasons to make SWDT a well-designed product, useful for tackling real problems and pleasant to work with and contribute to.
It is important that the community involved in SIG-Windows agrees upon common goals and features, so that its members find SWDT usable for their own tasks. Otherwise, the project will become deprecated sooner than it is released.
Copied to kubernetes-sigs/sig-windows-dev-tools#277