Below are some basic notes on how I set up and tested Sidero on a local laptop.
- Read the docs https://www.sidero.dev/docs/v0.4/ and https://www.talos.dev/docs/v0.14/ and get a handle on the concepts.
- Grab the binaries - https://www.sidero.dev/docs/v0.4/getting-started/prereq-cli-tools/. Install talosctl and clusterctl. You probs have kubectl already. Clusterctl is awesome for creating a local Docker cluster for instant testing, and talosctl is what you use to talk to Talos via its API (
talosctl -n <nodeip> dmesg | reboot | logs
)
Don't grab your clusterctl from the Arch AUR, it's not the right version!
- Figure out your desired baremetal structure. From your git it looks like you have a Pi master - welcome to level 1000. I'm running one just fine, but it does increase the complexity/bullshittery. Probs grab some tissues or stand up amd64 nodes first.
- Now to set up DHCP for dual booting. If you do amd64 it's not so bad as it's one config, but with the Pi you need to configure it to be able to send either arm64 or amd64 binaries. https://www.sidero.dev/docs/v0.4/getting-started/prereq-dhcp/ is the document. I had problems here, and I had to mod my UDM Pro's dnsmasq with extra configs that get applied at boot to make it work. The codeblock on that page gives you enough info to mod your config (I can't see your router, so adjust for your setup) - a rough sketch of what it can look like follows this list.
- I used the bootstrapping section a few times to test - https://www.sidero.dev/docs/v0.4/guides/bootstrapping/. Cool use case.
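For reference, a minimal dnsmasq-style sketch of dual-arch netboot config. Assumptions: 192.168.2.61 is the machine running Sidero, and the ipxe.efi / ipxe-arm64.efi / undionly.kpxe filenames follow the prereq-dhcp doc above - treat this as a starting point, not a drop-in config for your router.
# Tag clients by PXE client architecture (DHCP option 93)
dhcp-match=set:bios,option:client-arch,0
dhcp-match=set:efi64,option:client-arch,7
dhcp-match=set:efi-arm64,option:client-arch,11
# iPXE identifies itself with option 175 once it has chainloaded
dhcp-match=set:ipxe,175
# First boot: hand out the matching iPXE binary from Sidero's TFTP server
dhcp-boot=tag:bios,tag:!ipxe,undionly.kpxe,,192.168.2.61
dhcp-boot=tag:efi64,tag:!ipxe,ipxe.efi,,192.168.2.61
dhcp-boot=tag:efi-arm64,tag:!ipxe,ipxe-arm64.efi,,192.168.2.61
# Once inside iPXE: chain to Sidero's HTTP boot script
dhcp-boot=tag:ipxe,http://192.168.2.61:8081/boot.ipxe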
Note if you don't have IPMI, Talos won't reboot nodes after certain steps - it will require some manual power-cycling.
Below are my notes, largely on how I ran my workflow against the 'bootstrapping' link above.
You could create/run Talos in a VM for this if you prefer / have Proxmox etc. available.
To create a Sidero management cluster:
# This will do a cluster with just 1 master node. Set PUBLIC_IP to your computer's IP.
export PUBLIC_IP=192.168.2.61
talosctl cluster create \
  --kubernetes-version 1.22.2 \
  -p 69:69/udp,8081:8081/tcp \
  --workers 0 \
  --endpoint $PUBLIC_IP
# Need to untaint the master else it won't run anything
kubie ctx # Switch to this new cluster's context.
kubectl taint node talos-default-master-1 node-role.kubernetes.io/master:NoSchedule-
# This bootstraps an entire Sidero setup on your testing cluster.
# HOST_NETWORK is critical to make it work, else it doesn't have access to the ports it needs.
# Ensure PUBLIC_IP is in env from above.
SIDERO_CONTROLLER_MANAGER_HOST_NETWORK=true \
SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=$PUBLIC_IP \
clusterctl init -b talos -c talos -i sidero
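A quick sanity check that the provider pods came up (namespaces are the ones clusterctl creates for these providers):
kubectl get pods -A | grep -E 'sidero-system|cabpt-system|cacppt-system|capi-system'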
You can stop here if you are testing with raw Talos/Sidero. From here on it's how I bootstrap Flux to the test cluster, as I use Flux to manage it (like most others do).
kubectl create ns flux-system
# Have to get the SOPS secret into the flux-system namespace before bootstrap
cat ~/.config/sops/age/keys.txt |
kubectl -n flux-system create secret generic sops-age \
--from-file=age.agekey=/dev/stdin
# Change the URL/path below to your Sidero cluster path.
# This is for my cluster at https://github.com/Truxnell/home-cluster (the sidero stuff may still be in the 'development' branch though depending on when this is viewed)
flux install --version=v0.24.0 --export | kubectl apply -f -
kubectl apply -k k8s/clusters/sidero/
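Optionally, watch Flux reconcile with the flux CLI from the bootstrap above:
flux get kustomizations -A
flux get helmreleases -A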
Watch it fly. Or do nothing and fail. Either way, it's time for a coffee or whiskey depending on the time of day.
watch kubectl --context=sidero-demo \
get servers,machines,clusters
Might be worth checking you can curl / TFTP the iPXE files before proceeding:
Info here: https://www.sidero.dev/docs/v0.3/getting-started/expose-services/
You can use atftp on Linux to check TFTP as well.
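A rough check, assuming Sidero is listening on your $PUBLIC_IP (exact paths/filenames are covered in the expose-services doc above):
# HTTP: the iPXE boot script Sidero serves
curl http://$PUBLIC_IP:8081/boot.ipxe
# TFTP: pull an iPXE binary with atftp
atftp --get --remote-file ipxe.efi --local-file /tmp/ipxe.efi $PUBLIC_IP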
Ensure you set your DHCP to boot to the Sidero IP (if following this, your local computer's IP) with the right iPXE filenames. (This bit me for a while with the raspi!) Boot up some nodes; they should hit sidero-controller-manager. If you check the logs you should see them connecting, being sent firmware, and action on the node.
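To follow those logs while a node boots (deployment/container names as used in the patch example further down):
kubectl -n sidero-system logs deploy/sidero-controller-manager -c manager -f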
Info here: https://www.sidero.dev/docs/v0.3/getting-started/import-machines/
The server should appear in the 'servers' listing in kubectl (the metal.sidero.dev Server CRD in Lens) as 'accepted=false, allocated=false, clean=false'.
This will give you a YAML with the server's features - useful if you are using Flux to manage it (i.e. kubectl get server xxx -o yaml > server.yaml and put it in your git repo as a Server).
You have to manually accept the servers before Sidero will consider itself the owner. Once you accept, it will wipe the server and install the requested Talos version on it, ready for commissioning. If the server isn't required by a cluster, it will just bootloop until Sidero allocates it a job.
If you manage the server with Flux you can just add accepted: true to the YAML (spec.accepted: true) and push it; once Flux reconciles, Sidero will see the server is now accepted and will wipe ALL DRIVES, reboot and install.
If you don't have IPMI this is a manual reboot step.
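If you're not using Flux, patching the flag directly is equivalent (the server UUID here is the example one from further down - swap in your own):
kubectl patch server a19dd13e-6889-e146-ec82-1c697aa8a3bc --type merge -p '{"spec":{"accepted":true}}'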
Once it finishes this, kubectl get servers should report it as accepted=true, clean=true and allocated=false (if you haven't set up a cluster yet).
Ref: https://www.sidero.dev/docs/v0.3/getting-started/create-workload/
I stole others' YAML and ran Flux at this point, but basically we need to tell Sidero what machine config we want. It will then choose servers that accommodate that need and get them to join your cluster, bootstrap etcd, etc.
You can generate the YAML with a clusterctl command and the appropriate env vars set, as the above link shows.
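Roughly like the below - this is a sketch from memory of the create-workload doc, so treat the env var names and versions as assumptions and check the link for your Sidero/clusterctl version (older clusterctl used 'clusterctl config cluster' instead of 'generate cluster'):
export CONTROL_PLANE_ENDPOINT=192.168.2.143   # IP/VIP the new cluster's API will live on
export CONTROL_PLANE_SERVERCLASS=masters
export WORKER_SERVERCLASS=workers
export TALOS_VERSION=v0.14
export KUBERNETES_VERSION=v1.22.2
clusterctl generate cluster cluster-0 -i sidero > cluster-0.yaml
kubectl apply -f cluster-0.yaml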
For homelab use, we tend to create labels to mark machines as master/worker, which can then be picked up in the ServerClasses like below:
Master:
---
apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: masters
spec:
  ...
  selector:
    matchLabels:
      master: "true"
Pi-masters (useful to separate these ServerClasses, as arm64 and amd64 are quite different):
---
apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: pi-masters
  namespace: default
spec:
  ...
  selector:
    matchLabels:
      pi-master: "true"
Worker:
---
apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: workers
spec:
  ...
  selector:
    matchExpressions:
      - key: master
        operator: NotIn
        values:
          - "true"
      - key: pi-master
        operator: NotIn
        values:
          - "true"
If you look at our git repos you can see roughly the layout below. Check out the Sidero docs under 'Resource Configuration' for the details.
Cluster: contains the YAML to define a cluster as we want it - CIDR, control plane, # of masters/workers, etc. Can be generated by the clusterctl command as noted in the docs.
Environments: the iPXE environments that define how Sidero handles booting. Contains the initramfs & vmlinuz versions/GitHub links Sidero downloads to serve. One will be configured by default, but you will need to create one for arm64 for the raspi (as the raspi needs arm64-compiled boot files). Note it also contains a talos.config HTTP link that needs to point at your Sidero endpoint.
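Something along these lines for the arm64 case - field names and kernel args are recalled from the Environment docs, and the endpoint IP is the $PUBLIC_IP example from earlier, so verify against your version:
---
apiVersion: metal.sidero.dev/v1alpha1
kind: Environment
metadata:
  name: arm64
spec:
  kernel:
    url: "https://github.com/talos-systems/talos/releases/download/v0.14.0/vmlinuz-arm64"
    sha512: ""   # fill in from the release
    args:
      - console=ttyAMA0
      - talos.platform=metal
      - talos.config=http://192.168.2.61:8081/configdata?uuid=
  initrd:
    url: "https://github.com/talos-systems/talos/releases/download/v0.14.0/initramfs-arm64.xz"
    sha512: ""   # fill in from the release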
ServerClasses: here we define how we want the servers to be set up. We can define master/worker/pi-master classes so they are configured separately.
A few things commonly done (a rough configPatches sketch follows this list):
- Add gracefulshutdown labels to the nodes
- Replace flannel with calico/cilium/etc
- Deploy kube-serving-cert
- NTP settings, eth0 settings, etc
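For example, patches like these can sit in the ServerClass spec (a sketch only - the NTP server IP and the CNI value are assumptions; see the Sidero configPatches and Talos machine-config docs):
spec:
  configPatches:
    # Point nodes at a local NTP server
    - op: replace
      path: /machine/time
      value:
        servers:
          - 192.168.2.1
    # Disable the built-in CNI so calico/cilium can be deployed instead
    - op: replace
      path: /cluster/network/cni
      value:
        name: none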
Servers: here we have the actual server definitions. Sidero will generate these if they don't exist when a server connects for the first time. We tend to grab these and put them in our Flux config, so that when we bootstrap, the server details are set immediately, including the 'accepted' flag.
Grab the YAML by outputting it from kubectl:
❯ kubectl get servers a19dd13e-6889-e146-ec82-1c697aa8a3bc -o yaml
apiVersion: metal.sidero.dev/v1alpha1
kind: Server
metadata:
  creationTimestamp: "2022-02-15T09:32:01Z"
  finalizers:
  - storage.finalizers.server.k8s.io
  generation: 1
  name: a19dd13e-6889-e146-ec82-1c697aa8a3bc
  resourceVersion: "3150179"
  uid: 253702d0-1778-4b11-8632-f10513b88c28
spec:
  accepted: false
  cpu:
    manufacturer: Intel(R) Corporation
    version: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  hostname: 10.8.20.44
  system:
    family: PA
    manufacturer: Intel(R) Client Systems
    productName: NUC11PAHi7
    serialNumber: G6PA125005Z0
    skuNumber: RNUC11PAHi7000
    version: M15513-304
status:
  addresses:
  - address: 10.8.20.44
    type: InternalIP
  power: "on"
This step applies only once the Sidero cluster is running and has provisioned your workload cluster.
To be able to interact with the new cluster you need to grab the talosconfig below, as your Sidero cluster holds your provisioned cluster's details.
kubectl get talosconfig \
-l cluster.x-k8s.io/cluster-name=cluster-0 \
-o yaml -o jsonpath='{.items[0].status.talosConfig}' > management-plane-talosconfig.yaml
I found the file came out with no control plane IP - I had to edit it to add the control plane IP.
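One way to do that without hand-editing (assuming 192.168.2.143 is your new control plane node, as in the kubeconfig command below):
talosctl --talosconfig management-plane-talosconfig.yaml config endpoint 192.168.2.143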
# --talosconfig - specify the talosconfig file to use
# kubeconfig - asks the API for the cluster's kubeconfig
# -n <ip> - replace with your master's IP; most talosctl commands need the node IP specified to run against.
talosctl --talosconfig management-plane-talosconfig.yaml kubeconfig -n 192.168.2.143
Take a deep breath first; this is harder, for reasons described here: https://www.sidero.dev/docs/v0.4/guides/rpi4-as-servers/
TLDR: the raspi needs to network boot twice - once to download firmware to get its 'BIOS' to do anything, then that firmware tells it what its boot settings are (and we set it to net-boot in advance). So it does two netboots before it even hits Sidero.
Others in the k8s-at-home community have a better method in their clusters than what is outlined here.
Check out:
- https://github.com/anthr76/infra/tree/main/clusters/scr1/intergrations/raspberrypi4-uefi
- https://github.com/Truxnell/container-images/tree/main/apps/raspberrypi4-uefi
TLDR:
- Flash an SD card for the RPi with the EEPROM image https://github.com/raspberrypi/rpi-eeprom/releases
- Boot the RPi and set the firmware to network boot first, and ensure you save your changes. This actually saves into the RPI_EFI.fd file on the SD card (including the Pi's serial number, meaning you have to make this file for every bloody Pi).
- Copy this file into your git repo, in a folder named with the serial number. If you set it up like ours, the Dockerfile will create an image with the folders/firmware per Pi in it.
- Below is then how to (manually) patch Sidero with a patch.yaml that runs our image as an initContainer - it just copies the individual RPi firmware into the Sidero container, so it can send the appropriate firmware per Pi based off its serial number.
Assuming the above is done, you can create a patch file; a sample of mine is below.
# patch.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sidero-controller-manager
  namespace: sidero-system
spec:
  template:
    spec:
      volumes:
        - name: tftp-folder
          emptyDir: {}
      initContainers:
        - image: ghcr.io/truxnell/container-images/raspberrypi4-uefi:latest
          imagePullPolicy: Always
          name: tftp-folder-setup
          command:
            - cp
          args:
            - -r
            - /tftp
            - /var/lib/sidero/
          volumeMounts:
            - mountPath: /var/lib/sidero/tftp
              name: tftp-folder
      containers:
        - name: manager
          volumeMounts:
            - mountPath: /var/lib/sidero/tftp
              name: tftp-folder
Then apply the patch to sidero-controller-manager.
# Scale sidero-controller-manager deployment to 0 first!
kubectl -n sidero-system scale deployment sidero-controller-manager --replicas 0
kubectl -n sidero-system patch deployments.apps sidero-controller-manager --patch "$(cat k8s/manifests/sidero-system/sidero-controller-manager/patch.yaml)"
kubectl -n sidero-system scale deployment sidero-controller-manager --replicas 1
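A quick check that the patched deployment came back up:
kubectl -n sidero-system rollout status deployment sidero-controller-manager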
This stops new servers being added while you do maintenance:
kubectl edit cluster
# add paused: true to the spec
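Equivalent without the editor (assuming your cluster is named cluster-0, as elsewhere in these notes):
kubectl patch cluster cluster-0 --type merge -p '{"spec":{"paused":true}}'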
kubectl get machine
NAME                 CLUSTER     AGE   PROVIDERID                                      PHASE     VERSION
cluster-0-cp-c7h7n   cluster-0   8h    sidero://5923cf63-8129-8eac-68ce-1c697a611cfd   Running   v1.21.5
# Delete the machine from the listing above; Sidero then releases and wipes the underlying server
kubectl delete machine cluster-0-cp-c7h7n
This should trigger a remote wipe.
talosctl -n x.x.x.x reset may also help if you get stuck.
Update Sidero via clusterctl upgrade plan. You may need to set the SIDERO_CONTROLLER_MANAGER_HOST_NETWORK / SIDERO_CONTROLLER_MANAGER_API_ENDPOINT variables again to ensure it retains host networking, else you won't be able to connect to the iPXE server again.
> clusterctl upgrade plan
Checking cert-manager version...
Cert-Manager is already up to date
Checking new release availability...
Latest release available for the v1alpha4 API Version of Cluster API (contract):
NAME                    NAMESPACE       TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-talos         cabpt-system    BootstrapProvider        v0.4.3            Already up to date
control-plane-talos     cacppt-system   ControlPlaneProvider     v0.3.1            Already up to date
cluster-api             capi-system     CoreProvider             v0.4.7            Already up to date
infrastructure-sidero   sidero-system   InfrastructureProvider   v0.4.1            Already up to date
You are already up to date!
Latest release available for the v1beta1 API Version of Cluster API (contract):
NAME                    NAMESPACE       TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-talos         cabpt-system    BootstrapProvider        v0.4.3            v0.5.2
control-plane-talos     cacppt-system   ControlPlaneProvider     v0.3.1            v0.4.4
cluster-api             capi-system     CoreProvider             v0.4.7            v1.1.1
infrastructure-sidero   sidero-system   InfrastructureProvider   v0.4.1            v0.5.0
You can now apply the upgrade by executing the following command:
clusterctl upgrade apply --contract v1beta1
Then apply as stated (in this case, clusterctl upgrade apply --contract v1beta1).
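For example, re-exporting the env vars from the init step so the upgraded controller keeps host networking (a sketch - verify the exact upgrade flow against the Sidero release notes):
export PUBLIC_IP=192.168.2.61
SIDERO_CONTROLLER_MANAGER_HOST_NETWORK=true \
SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=$PUBLIC_IP \
clusterctl upgrade apply --contract v1beta1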