Below are some basic notes on how I set up and tested Sidero on a local laptop.
- Read the docs https://www.sidero.dev/docs/v0.4/ and https://www.talos.dev/docs/v0.14/ and get a handle on the concepts.
- Grab the binaries - https://www.sidero.dev/docs/v0.4/getting-started/prereq-cli-tools/. Install talosctl and clusterctl. You probs have kubectl already. Clusterctl is awesome for creating a local Docker cluster for instant testing, and talosctl is what you use to talk to Talos via its API (
talosctl -n <nodeip> dmesg | reboot | logs
)
Don't grab your clusterctl from the Arch AUR, it's not the right version!
- Figure out your desired baremetal structure. From your git it looks like you have a Pi master - welcome to level 1000. I'm running one just fine, but it does increase the complexity/bullshittery. Probs grab some tissues or stand up amd64 nodes first.
- Now to set up DHCP for dual booting. If you do amd64 it's not so bad as it's one config, but with the Pi you need to configure it to be able to send either arm64 or amd64 binaries. https://www.sidero.dev/docs/v0.4/getting-started/prereq-dhcp/ is the document. I had problems here, and I had to mod my UDM Pro's dnsmasq with extra configs that get applied at boot to make it work. The codeblock on that page gives you enough info to mod your config (I can't see your router, so adjust for your setup) - a rough sketch of what it can look like follows this list.
- I used the bootstrapping section a few times to test - https://www.sidero.dev/docs/v0.4/guides/bootstrapping/. Cool use case.
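For reference, a minimal dnsmasq-style sketch of dual-arch netboot config. Assumptions: 192.168.2.61 is the machine running Sidero, and the ipxe.efi / ipxe-arm64.efi / undionly.kpxe filenames follow the prereq-dhcp doc above - treat this as a starting point, not a drop-in config for your router.
# Tag clients by PXE client architecture (DHCP option 93)
dhcp-match=set:bios,option:client-arch,0
dhcp-match=set:efi64,option:client-arch,7
dhcp-match=set:efi-arm64,option:client-arch,11
# iPXE identifies itself with option 175 once it has chainloaded
dhcp-match=set:ipxe,175
# First boot: hand out the matching iPXE binary from Sidero's TFTP server
dhcp-boot=tag:bios,tag:!ipxe,undionly.kpxe,,192.168.2.61
dhcp-boot=tag:efi64,tag:!ipxe,ipxe.efi,,192.168.2.61
dhcp-boot=tag:efi-arm64,tag:!ipxe,ipxe-arm64.efi,,192.168.2.61
# Once inside iPXE: chain to Sidero's HTTP boot script
dhcp-boot=tag:ipxe,http://192.168.2.61:8081/boot.ipxe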
Note if you don't have IPMI, Talos won't reboot nodes after certain steps - it will require some manual power-cycling.
Below are my notes, largely on how I ran my workflow against the 'bootstrapping' link above.
You could create/run Talos in a VM for this if you prefer / have Proxmox etc. available.
To create a Sidero management cluster:
# This will do a cluster with just 1 master node. Set PUBLIC_IP to your computer's IP.
export PUBLIC_IP=192.168.2.61
talosctl cluster create \
  --kubernetes-version 1.22.2 \
  -p 69:69/udp,8081:8081/tcp \
  --workers 0 \
  --endpoint $PUBLIC_IP
# Need to untaint the master else it won't run anything
kubie ctx # Switch to this new cluster's context.
kubectl taint node talos-default-master-1 node-role.kubernetes.io/master:NoSchedule-
# This bootstraps an entire Sidero setup on your testing cluster.
# HOST_NETWORK is critical to make it work, else it doesn't have access to the ports it needs.
# Ensure PUBLIC_IP is in env from above.
SIDERO_CONTROLLER_MANAGER_HOST_NETWORK=true \
SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=$PUBLIC_IP \
clusterctl init -b talos -c talos -i sidero
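A quick sanity check that the provider pods came up (namespaces are the ones clusterctl creates for these providers):
kubectl get pods -A | grep -E 'sidero-system|cabpt-system|cacppt-system|capi-system'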
You can stop here if you are testing with raw Talos/Sidero. From here on it's how I bootstrap Flux to the test cluster, as I use Flux to manage it (like most others do).
kubectl create ns flux-system
# Have to get the SOPS secret into the flux-system namespace before bootstrap
cat ~/.config/sops/age/keys.txt |
kubectl -n flux-system create secret generic sops-age \
--from-file=age.agekey=/dev/stdin
# Change the URL/path below to your Sidero cluster path.
# This is for my cluster at https://github.com/Truxnell/home-cluster (the sidero stuff may still be in the 'development' branch though depending on when this is viewed)
flux install --version=v0.24.0 --export | kubectl apply -f -
kubectl apply -k k8s/clusters/sidero/
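Optionally, watch Flux reconcile with the flux CLI from the bootstrap above:
flux get kustomizations -A
flux get helmreleases -A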
Watch it fly. Or do nothing and fail. Either way, it's time for a coffee or whiskey depending on the time of day.
watch kubectl --context=sidero-demo \
get servers,machines,clusters
Might be worth checking you can curl / TFTP the iPXE files before proceeding:
Info here: https://www.sidero.dev/docs/v0.3/getting-started/expose-services/
You can use atftp on Linux to check TFTP as well.
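A rough check, assuming Sidero is listening on your $PUBLIC_IP (exact paths/filenames are covered in the expose-services doc above):
# HTTP: the iPXE boot script Sidero serves
curl http://$PUBLIC_IP:8081/boot.ipxe
# TFTP: pull an iPXE binary with atftp
atftp --get --remote-file ipxe.efi --local-file /tmp/ipxe.efi $PUBLIC_IP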
Ensure you set your DHCP to boot to the Sidero IP (if following this, your local computer's IP) with the right iPXE filenames. (This bit me for a while with the raspi!) Boot up some nodes; they should hit sidero-controller-manager. If you check the logs you should see them connecting, being sent firmware, and action on the node.
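To follow those logs while a node boots (deployment/container names as used in the patch example further down):
kubectl -n sidero-system logs deploy/sidero-controller-manager -c manager -f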
Info here: https://www.sidero.dev/docs/v0.3/getting-started/import-machines/
The server should appear in the 'servers' listing in kubectl (the metal.sidero.dev Server CRD in Lens) as 'accepted=false, allocated=false, clean=false'.
This will give you a YAML with the server's features - useful if you are using Flux to manage it (i.e. kubectl get server xxx -o yaml > server.yaml and put it in your git repo as a Server).
You have to manually accept the servers before Sidero will consider itself the owner. Once you accept, it will wipe the server and install the requested Talos version on it, ready for commissioning. If the server isn't required by a cluster, it will just bootloop until Sidero allocates it a job.
If you manage the server with Flux you can just add accepted: true to the YAML (spec.accepted: true) and push it; once Flux reconciles, Sidero will see the server is now accepted and will wipe ALL DRIVES, reboot and install.
If you don't have IPMI this is a manual reboot step.
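If you're not using Flux, patching the flag directly is equivalent (the server UUID here is the example one from further down - swap in your own):
kubectl patch server a19dd13e-6889-e146-ec82-1c697aa8a3bc --type merge -p '{"spec":{"accepted":true}}'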
Once it finishes this, kubectl get servers should report it as accepted=true, clean=true and allocated=false (if you haven't set up a cluster yet).
Ref: https://www.sidero.dev/docs/v0.3/getting-started/create-workload/
I stole others' YAML and ran Flux at this point, but basically we need to tell Sidero what machine config we want. It will then choose servers that accommodate that need and get them to join your cluster, bootstrap etcd, etc.
You can generate the YAML with a clusterctl command and the appropriate env vars set, as the above link shows.
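Roughly like the below - this is a sketch from memory of the create-workload doc, so treat the env var names and versions as assumptions and check the link for your Sidero/clusterctl version (older clusterctl used 'clusterctl config cluster' instead of 'generate cluster'):
export CONTROL_PLANE_ENDPOINT=192.168.2.143   # IP/VIP the new cluster's API will live on
export CONTROL_PLANE_SERVERCLASS=masters
export WORKER_SERVERCLASS=workers
export TALOS_VERSION=v0.14
export KUBERNETES_VERSION=v1.22.2
clusterctl generate cluster cluster-0 -i sidero > cluster-0.yaml
kubectl apply -f cluster-0.yaml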
For homelab use, we tend to create labels to mark machines as master/worker, which can then be picked up in the ServerClasses like below:
Master:
---
apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: masters
spec:
  ...
  selector:
    matchLabels:
      master: "true"
Pi-masters (useful to separate these ServerClasses, as arm64 and amd64 are quite different):
---
apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: pi-masters
  namespace: default
spec:
  ...
  selector:
    matchLabels:
      pi-master: "true"
Worker:
---
apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: workers
spec:
  ...
  selector:
    matchExpressions:
      - key: master
        operator: NotIn
        values:
          - "true"
      - key: pi-master
        operator: NotIn
        values:
          - "true"
If you look at our git repos you can see roughly the layout below. Check out the Sidero docs under 'Resource Configuration' for the details.
Cluster: contains the YAML to define a cluster as we want it - CIDR, control plane, # of masters/workers, etc. Can be generated by the clusterctl command as noted in the docs.
Environments: the iPXE environments that define how Sidero handles booting. Contains the initramfs & vmlinuz versions/GitHub links Sidero downloads to serve. One will be configured by default, but you will need to create one for arm64 for the raspi (as the raspi needs arm64-compiled boot files). Note it also contains a talos.config HTTP link that needs to point at your Sidero endpoint.
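Something along these lines for the arm64 case - field names and kernel args are recalled from the Environment docs, and the endpoint IP is the $PUBLIC_IP example from earlier, so verify against your version:
---
apiVersion: metal.sidero.dev/v1alpha1
kind: Environment
metadata:
  name: arm64
spec:
  kernel:
    url: "https://github.com/talos-systems/talos/releases/download/v0.14.0/vmlinuz-arm64"
    sha512: ""   # fill in from the release
    args:
      - console=ttyAMA0
      - talos.platform=metal
      - talos.config=http://192.168.2.61:8081/configdata?uuid=
  initrd:
    url: "https://github.com/talos-systems/talos/releases/download/v0.14.0/initramfs-arm64.xz"
    sha512: ""   # fill in from the release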
ServerClasses: here we define how we want the servers to be set up. We can define master/worker/pi-master classes so they are configured separately.
A few things commonly done (a rough configPatches sketch follows this list):
- Add gracefulshutdown labels to the nodes
- Replace flannel with calico/cilium/etc
- Deploy kube-serving-cert
- NTP settings, eth0 settings, etc
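For example, patches like these can sit in the ServerClass spec (a sketch only - the NTP server IP and the CNI value are assumptions; see the Sidero configPatches and Talos machine-config docs):
spec:
  configPatches:
    # Point nodes at a local NTP server
    - op: replace
      path: /machine/time
      value:
        servers:
          - 192.168.2.1
    # Disable the built-in CNI so calico/cilium can be deployed instead
    - op: replace
      path: /cluster/network/cni
      value:
        name: none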
Servers: here we have the actual server definitions. Sidero will generate these if they don't exist when a server connects for the first time. We tend to grab these and put them in our Flux config, so that when we bootstrap, the server details are set immediately, including the 'accepted' flag.
Grab the YAML by outputting it from kubectl:
❯ kubectl get servers a19dd13e-6889-e146-ec82-1c697aa8a3bc -o yaml
apiVersion: metal.sidero.dev/v1alpha1
kind: Server
metadata:
  creationTimestamp: "2022-02-15T09:32:01Z"
  finalizers:
  - storage.finalizers.server.k8s.io
  generation: 1
  name: a19dd13e-6889-e146-ec82-1c697aa8a3bc
  resourceVersion: "3150179"
  uid: 253702d0-1778-4b11-8632-f10513b88c28
spec:
  accepted: false
  cpu:
    manufacturer: Intel(R) Corporation
    version: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  hostname: 10.8.20.44
  system:
    family: PA
    manufacturer: Intel(R) Client Systems
    productName: NUC11PAHi7
    serialNumber: G6PA125005Z0
    skuNumber: RNUC11PAHi7000
    version: M15513-304
status:
  addresses:
  - address: 10.8.20.44
    type: InternalIP
  power: "on"
This step applies only once the Sidero cluster is running and has provisioned your workload cluster.
To be able to interact with the new cluster you need to grab the talosconfig below, as your Sidero cluster holds your provisioned cluster's details.
kubectl get talosconfig \
-l cluster.x-k8s.io/cluster-name=cluster-0 \
-o yaml -o jsonpath='{.items[0].status.talosConfig}' > management-plane-talosconfig.yaml
I found the file came out with no control plane IP - I had to edit it to add the control plane IP.
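One way to do that without hand-editing (assuming 192.168.2.143 is your new control plane node, as in the kubeconfig command below):
talosctl --talosconfig management-plane-talosconfig.yaml config endpoint 192.168.2.143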
# --talosconfig - specify the talosconfig file to use
# kubeconfig - asks the API for the cluster's kubeconfig
# -n <ip> - replace with your master's IP; most talosctl commands need the node IP specified to run against.
talosctl --talosconfig management-plane-talosconfig.yaml kubeconfig -n 192.168.2.143
Take a deep breath first; this is harder, for reasons described here: https://www.sidero.dev/docs/v0.4/guides/rpi4-as-servers/
TLDR: the raspi needs to network boot twice - once to download firmware to get its 'BIOS' to do anything, then that firmware tells it what its boot settings are (and we set it to net-boot in advance). So it does two netboots before it even hits Sidero.
Others in the k8s-at-home community have a better method in their clusters than what is outlined here.
Check out:
- https://github.com/anthr76/infra/tree/main/clusters/scr1/intergrations/raspberrypi4-uefi
- https://github.com/Truxnell/container-images/tree/main/apps/raspberrypi4-uefi
TLDR:
- Flash an SD card for the RPi with the EEPROM image https://github.com/raspberrypi/rpi-eeprom/releases
- Boot the RPi and set the firmware to network boot first, and ensure you save your changes. This actually saves into the RPI_EFI.fd file on the SD card (including the Pi's serial number, meaning you have to make this file for every bloody Pi).
- Copy this file into your git repo, in a folder named with the serial number. If you set it up like ours, the Dockerfile will create an image with the folders/firmware per Pi in it.
- Below is then how to (manually) patch Sidero with a patch.yaml that runs our image as an initContainer - it just copies the individual RPi firmware into the Sidero container, so it can send the appropriate firmware per Pi based off its serial number.
Assuming the above is done, you can create a patch file; a sample of mine is below.
# patch.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sidero-controller-manager
  namespace: sidero-system
spec:
  template:
    spec:
      volumes:
        - name: tftp-folder
          emptyDir: {}
      initContainers:
        - image: ghcr.io/truxnell/container-images/raspberrypi4-uefi:latest
          imagePullPolicy: Always
          name: tftp-folder-setup
          command:
            - cp
          args:
            - -r
            - /tftp
            - /var/lib/sidero/
          volumeMounts:
            - mountPath: /var/lib/sidero/tftp
              name: tftp-folder
      containers:
        - name: manager
          volumeMounts:
            - mountPath: /var/lib/sidero/tftp
              name: tftp-folder
Then apply the patch to sidero-controller-manager.
# Scale sidero-controller-manager deployment to 0 first!
kubectl -n sidero-system scale deployment sidero-controller-manager --replicas 0
kubectl -n sidero-system patch deployments.apps sidero-controller-manager --patch "$(cat k8s/manifests/sidero-system/sidero-controller-manager/patch.yaml)"
kubectl -n sidero-system scale deployment sidero-controller-manager --replicas 1
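A quick check that the patched deployment came back up:
kubectl -n sidero-system rollout status deployment sidero-controller-manager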
This stops new servers being added while you do maintenance:
kubectl edit cluster
# add paused: true to the spec
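Equivalent without the editor (assuming your cluster is named cluster-0, as elsewhere in these notes):
kubectl patch cluster cluster-0 --type merge -p '{"spec":{"paused":true}}'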
kubectl get machine
NAME                 CLUSTER     AGE   PROVIDERID                                      PHASE     VERSION
cluster-0-cp-c7h7n   cluster-0   8h    sidero://5923cf63-8129-8eac-68ce-1c697a611cfd   Running   v1.21.5
# Delete the machine from the listing above; Sidero then releases and wipes the underlying server
kubectl delete machine cluster-0-cp-c7h7n
This should trigger a remote wipe.
talosctl -n x.x.x.x reset may also help if you get stuck.
Update Sidero via clusterctl upgrade plan. You may need to set the SIDERO_CONTROLLER_MANAGER_HOST_NETWORK / SIDERO_CONTROLLER_MANAGER_API_ENDPOINT variables again to ensure it retains host networking, else you won't be able to connect to the iPXE server again.
> clusterctl upgrade plan
Checking cert-manager version...
Cert-Manager is already up to date
Checking new release availability...
Latest release available for the v1alpha4 API Version of Cluster API (contract):
NAME                    NAMESPACE       TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-talos         cabpt-system    BootstrapProvider        v0.4.3            Already up to date
control-plane-talos     cacppt-system   ControlPlaneProvider     v0.3.1            Already up to date
cluster-api             capi-system     CoreProvider             v0.4.7            Already up to date
infrastructure-sidero   sidero-system   InfrastructureProvider   v0.4.1            Already up to date
You are already up to date!
Latest release available for the v1beta1 API Version of Cluster API (contract):
NAME                    NAMESPACE       TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-talos         cabpt-system    BootstrapProvider        v0.4.3            v0.5.2
control-plane-talos     cacppt-system   ControlPlaneProvider     v0.3.1            v0.4.4
cluster-api             capi-system     CoreProvider             v0.4.7            v1.1.1
infrastructure-sidero   sidero-system   InfrastructureProvider   v0.4.1            v0.5.0
You can now apply the upgrade by executing the following command:
clusterctl upgrade apply --contract v1beta1
Then apply as stated (in this case, clusterctl upgrade apply --contract v1beta1).
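For example, re-exporting the env vars from the init step so the upgraded controller keeps host networking (a sketch - verify the exact upgrade flow against the Sidero release notes):
export PUBLIC_IP=192.168.2.61
SIDERO_CONTROLLER_MANAGER_HOST_NETWORK=true \
SIDERO_CONTROLLER_MANAGER_API_ENDPOINT=$PUBLIC_IP \
clusterctl upgrade apply --contract v1beta1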