@bgulla
Last active November 13, 2024 21:11
RKE2 api-server HA with Kube-VIP

On-Prem RKE2 api-server HA with Kube-VIP

               ,        ,  _______________________________
   ,-----------|'------'|  |                             |
  /.           '-'    |-'  |_____________________________|
 |/|             |    |    
   |   .________.'----'    _______________________________
   |  ||        |  ||      |                             |
   \__|'        \__|'      |_____________________________|

|‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾|
|________________________________________________________|

|‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾|
|________________________________________________________|

What does this accomplish?

On-premise Kubernetes installations cannot take advantage of cloud-native services like dynamic load balancers. To ensure a highly available cluster, you must deploy a solution that keeps the Kubernetes API server reachable in the event of node failure. Traditionally this would be handled by an on-premise load balancer such as a k8s-deployed MetalLB or NGINX, but those solutions do not work in our case because the API server would need to be available to schedule such deployments in the first place... a classic chicken-and-egg problem.

What is Kube-VIP?

The kube-vip project provides high availability and load balancing both inside and outside a Kubernetes cluster.

Learn more at kube-vip.io.

TLDR?

Watch this video by Adrian.

Instructions

Prereqs

To proceed with this guide, you will need the following:

  • A DNS server, or /etc/hosts entries on each node for the node hostnames and the RKE2 master HA hostname
  • firewalld disabled (see the commands just below), or the required ports opened (a port table is in the comments at the end)
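
If firewalld is running on your nodes, a minimal way to disable it (assuming you are comfortable turning the host firewall off entirely) is:

# run as root on every node
systemctl disable --now firewalld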

Assumptions

In this guide, I will be setting up a 3-node HA RKE2 cluster. I use the .lol domain; swap in the domain of your choosing.

Host        Type        IP        Notes
rke2a       VM          10.0.1.2  etcd
rke2b       VM          10.0.1.3  etcd
rke2c       VM          10.0.1.4  etcd
rke2master  Virtual-IP  10.0.1.5  You will define this IP on your own. Make sure that it is not currently allocated to a node (and remove it from DHCP allocation)

If you do not have a DNS server available/configured, the /etc/hosts file on each node will need to include the following.

10.0.1.2 rke2a rke2a.lol
10.0.1.3 rke2b rke2b.lol
10.0.1.4 rke2c rke2c.lol
10.0.1.5 rke2master rke2master.lol

1- Bootstrap the Master (rke2a)

The secret behind on-prem HA is kube-vip. We are going to take their recommended k3s approach and adapt it for RKE2.

Pre-Installation of RKE2

export RKE2_VIP_IP=10.0.1.5 # IMPORTANT: Update this with the IP that you chose.

# create RKE2's self-installing manifest dir
mkdir -p /var/lib/rancher/rke2/server/manifests/

# Install the kube-vip deployment into rke2's self-installing manifest folder
# NOTE: vipInterface must match your node's actual interface name (eth0 here; yours may differ, e.g. ens192)
curl -sL kube-vip.io/k3s | vipAddress=${RKE2_VIP_IP} vipInterface=eth0 sh | sudo tee /var/lib/rancher/rke2/server/manifests/vip.yaml
# Find/Replace all k3s entries to represent rke2
sed -i 's/k3s/rke2/g' /var/lib/rancher/rke2/server/manifests/vip.yaml

# create the rke2 config file
mkdir -p /etc/rancher/rke2
touch /etc/rancher/rke2/config.yaml
echo "tls-san:" >> /etc/rancher/rke2/config.yaml 
echo "  - ${HOSTNAME}.lol" >> /etc/rancher/rke2/config.yaml
echo "  - ${HOSTNAME}" >> /etc/rancher/rke2/config.yaml
echo "  - rke2master.lol" >> /etc/rancher/rke2/config.yaml
echo "  - rke2master" >> /etc/rancher/rke2/config.yaml

## Optional but recommended
# k9s - ncurses-based k8s dashboard
wget https://github.com/derailed/k9s/releases/download/v0.24.2/k9s_Linux_x86_64.tar.gz -O /tmp/k9s.tgz ; cd /tmp; tar zxvf k9s.tgz ; chmod +x ./k9s; mv ./k9s /usr/local/bin

# update path with rke2-binaries
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc ; echo 'export PATH=${PATH}:/var/lib/rancher/rke2/bin' >> ~/.bashrc ; echo 'alias k=kubectl' >> ~/.bashrc ; source ~/.bashrc ;
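
At this point (assuming the hostname of this node is rke2a), /etc/rancher/rke2/config.yaml should look roughly like this:

tls-san:
  - rke2a.lol
  - rke2a
  - rke2master.lol
  - rke2master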

Install RKE2:

curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service
sleep 90 #wait ~90 seconds for rke2 to be ready
kubectl get nodes -o wide # should show as ready
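
Optionally, confirm that kube-vip is running and has claimed the VIP on this node (a quick sanity check; the kube-vip pod name and the eth0 interface are assumptions based on the manifest generated above):

# the kube-vip pod(s) should be Running in kube-system
kubectl -n kube-system get pods -o wide | grep kube-vip
# the VIP should now appear on the node's interface
ip -4 addr show eth0 | grep ${RKE2_VIP_IP}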

Testing the API server on the Virtual-IP (rke2master.lol)

To ensure that the virtual IP is serving the API server, run the following commands:

mkdir -p $HOME/.kube
export VIP=rke2master
sudo cat /etc/rancher/rke2/rke2.yaml | sed 's/127.0.0.1/'$VIP'/g' > $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

KUBECONFIG=~/.kube/config kubectl get nodes -o wide

This kubeconfig points at the virtual IP that kube-vip created for us (10.0.1.5, via the rke2master hostname) and confirms that the RKE2 API is successfully being served on that virtual IP.
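
As an extra sanity check (not in the original write-up), you can probe the API endpoint on the VIP directly; depending on your anonymous-auth settings you will get either a version payload or a 401/403, and either response proves the VIP is answering on 6443:

curl -vk https://rke2master.lol:6443/version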

Adding additional RKE2 master nodes (rke2b, rke2c)

For our cluster to be highly available, we naturally need additional masters.

On each of the additional masters, run the following commands:

# IMPORTANT: replace the following with the value of /var/lib/rancher/rke2/server/token from rke2a
export TOKEN=""

mkdir -p /etc/rancher/rke2
touch /etc/rancher/rke2/config.yaml
echo "token: ${TOKEN}" >> /etc/rancher/rke2/config.yaml
echo "server: https://rke2master.lol:9345" >> /etc/rancher/rke2/config.yaml
echo "tls-san:" >> /etc/rancher/rke2/config.yaml
echo "  - ${HOSTNAME}.lol" >> /etc/rancher/rke2/config.yaml
echo "  - ${HOSTNAME}" >> /etc/rancher/rke2/config.yaml
echo "  - rke2master.lol" >> /etc/rancher/rke2/config.yaml
echo "  - rke2master" >> /etc/rancher/rke2/config.yaml

curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service
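
For reference, the resulting /etc/rancher/rke2/config.yaml on a joining node (here rke2b) should look roughly like this, with the token value redacted:

token: <token from rke2a>
server: https://rke2master.lol:9345
tls-san:
  - rke2b.lol
  - rke2b
  - rke2master.lol
  - rke2master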

That's it! As you can see in the config file above, we reference the virtual IP/hostname as the RKE2 server rather than any specific host's IP/hostname, so the reference stays valid as server nodes come and go.

@MrAmbiG
Copy link

MrAmbiG commented Jul 27, 2022

Hi, thanks for this. If rke2a goes down, what will kube-vip do?

I have 3 master nodes. As long as at least one of the 3 master nodes is alive, your cluster will keep functioning with no downtime.
If rke2a currently holds the VIP and it goes down, another master is elected from the 2 remaining nodes and the floating IP is assigned to the newly elected master. When rke2a comes back, the election process reoccurs to choose a new master. As long as there is at least one master node alive, kube-vip will keep the k8s cluster alive.
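
One quick way to see which node currently holds the VIP (not from this thread; assumes the eth0 interface and the 10.0.1.5 address used in the guide):

# run on each master; only the current holder of the VIP will print a match
ip -4 addr show eth0 | grep 10.0.1.5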

@esakarya
Copy link

esakarya commented Jul 27, 2022

Perfect! You made my day, bro.
If I reinstall rke2a from scratch, what do I have to do? Do I need to follow the same rke2a steps, or the steps for the other nodes?
Also, what exactly is the difference between server and agent nodes?
Do we need any agent nodes?

@MrAmbiG
Copy link

MrAmbiG commented Jul 27, 2022

Perfect! You made my day, bro. If I reinstall rke2a from scratch, what do I have to do? Do I need to follow the same rke2a steps, or the steps for the other nodes? Also, what exactly is the difference between server and agent nodes? Do we need any agent nodes?

Assuming that your rke2a (1st master node) fails (let us say a hardware failure: the motherboard fries, the CPU dies, or the hard disk dies), the other 2 nodes will take over without any manual intervention. When you want to add rke2a back, all you have to do is add it as a new master node (the way you added the 2nd and 3rd master nodes, i.e. supply the token, server address, and tls-san in the config file), as if you were joining the existing master cluster. I myself have not faced this issue, but theoretically, as per the documentation, it should work.
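
A minimal sketch of that rejoin (not from this thread; assumes you remove the stale node object first and reuse the joining steps from the guide):

# on a surviving master: remove the failed rke2a's node object
kubectl delete node rke2a
# on the rebuilt rke2a: recreate /etc/rancher/rke2/config.yaml with token, server, and tls-san
# (same as the rke2b/rke2c steps above), then install and start rke2-server
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service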

@busyboy77
Copy link

I was getting:

Warning FailedCreate 81s (x16 over 4m5s) daemonset-controller Error creating: pods "kube-vip-ds-" is forbidden: error looking up service account kube-system/kube-vip: serviceaccount "kube-vip" not found

and had to deploy the RBAC manifest with

curl -s https://kube-vip.io/manifests/rbac.yaml > /var/lib/rancher/rke2/server/manifests/kube-vip-rbac.yaml

and reboot the first master.
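
If you hit the same error, one way to confirm the RBAC manifest took effect (a quick check, not from the original comment):

# the service account referenced in the error should now exist
kubectl -n kube-system get serviceaccount kube-vip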

@brandtkeller
Copy link

brandtkeller commented Dec 14, 2022

I came across these notes and am curious whether anyone has faced the scenario where additional server nodes are unable to join the cluster due to a connection refused on the RKE2_VIP_IP at port 9345?

Edit for more clarity:

The connectivity through the VIP on port 6443 looks to work without issue. I can deploy my first server node and ensure DNS supports the hostname of the cluster via the VIP. When I go to join additional nodes, they receive a connection refused for the VIP at port 9345. This guide would make me assume that this works as intended here, but there is also the possibility something has changed.

@itoffshore
Copy link

itoffshore commented Aug 18, 2024

I came across these notes and am curious whether anyone has faced the scenario where additional server nodes are unable to join the cluster due to a connection refused on the RKE2_VIP_IP at port 9345?

Firewall ports for RKE2 with Cilium (for a Linux-only k8s cluster, probably the best networking option) 👍

Protocol  Port         Source             Destination        Description
TCP       2379         RKE2 server nodes  RKE2 server nodes  etcd client port
TCP       2380         RKE2 server nodes  RKE2 server nodes  etcd peer port
TCP       2381         RKE2 server nodes  RKE2 server nodes  etcd metrics port
TCP       6443         RKE2 agent nodes   RKE2 server nodes  Kubernetes API
TCP       9345         RKE2 agent nodes   RKE2 server nodes  RKE2 supervisor API
TCP       10250        All RKE2 nodes     All RKE2 nodes     kubelet metrics
TCP       30000-32767  All RKE2 nodes     All RKE2 nodes     NodePort port range
TCP       4240         All RKE2 nodes     All RKE2 nodes     Cilium CNI health checks
TCP       4244         All RKE2 nodes     All RKE2 nodes     Cilium CNI Hubble Observability
UDP       8472         All RKE2 nodes     All RKE2 nodes     Cilium CNI VXLAN
ICMP      8/0          All RKE2 nodes     All RKE2 nodes     Cilium CNI health checks
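
If you want to keep firewalld enabled instead of turning it off, the ports above could be opened with something like the following (a sketch; assumes the default zone and should be adjusted for your environment and node roles):

# run on every node (trim the list on agent-only nodes)
for p in 2379/tcp 2380/tcp 2381/tcp 6443/tcp 9345/tcp 10250/tcp \
         30000-32767/tcp 4240/tcp 4244/tcp 8472/udp; do
  firewall-cmd --permanent --add-port="$p"
done
# allow ICMP for the Cilium health checks
firewall-cmd --permanent --add-rich-rule='rule protocol value="icmp" accept'
firewall-cmd --reload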
