Install Kubeadm Kubernetes With Cilium CNI On Bare Metal Rocky Linux

Install Kubeadm With Cilium CNI

We assume you have Rocky Linux as the operating system on all of your nodes.

At the end you will have set up, out of the box: a kubeadm Kubernetes cluster, the Cilium CNI with Gateway API support, and (optionally) Prometheus, Grafana, and Hubble UI for monitoring.

In this example, we will set up a 2-node cluster:

  • 10.0.0.98 as a Control Plane Node
  • 10.0.0.97 as a Worker Node

Setup Nodes

A node can have one of two roles:

  • Control Plane
  • Worker

For each node, perform the following steps.

Set Hostnames

A node's hostname should be k8s-control-plane-XX or k8s-worker-XX, depending on the role you want it to have.

  • For control plane nodes:

    • k8s-control-plane-01
    • k8s-control-plane-02
    • k8s-control-plane-03
    • and so on...
  • For worker nodes:

    • k8s-worker-01
    • k8s-worker-02
    • k8s-worker-03
    • and so on...

Depending on the role you want the node to have, run the corresponding command:

sudo hostnamectl set-hostname "k8s-control-plane-01" && exec bash

or

sudo hostnamectl set-hostname "k8s-worker-01" && exec bash

Edit /etc/hosts

Edit the /etc/hosts file with the public IPs of the nodes, so the nodes can resolve each other by hostname.

Add all of the nodes' hostnames and IPs to each node's /etc/hosts file.

So in this example:

10.0.0.98 k8s-control-plane-01
10.0.0.97 k8s-worker-01

Configure DNS Records In /etc/resolv.conf

Configure DNS to make sure there is no self loop - the resolver must target a DNS server other than itself:

Disable Automatic Generation Of /etc/resolv.conf From NetworkManager

Create /etc/NetworkManager/conf.d/90-dns-none.conf file:

sudoedit /etc/NetworkManager/conf.d/90-dns-none.conf

with the following content:

[main]
dns=none

Then reload NetworkManager:

sudo systemctl reload NetworkManager

Now your custom edits to /etc/resolv.conf will remain even after sudo systemctl restart NetworkManager (or a reboot).
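
If you want to confirm that NetworkManager picked up the dns=none setting, an optional check (NetworkManager --print-config shows the merged configuration; it should include dns=none):

sudo NetworkManager --print-config | grep dns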

Only on the control-plane nodes - Configure DNS wildcard A Record from the machine to itself

This way the machine can resolve its own wildcard DNS names that you will configure with Gateway API or Ingress.

The /etc/hosts file cannot hold wildcard records.

To overcome this, we use dnsmasq.

See https://stackoverflow.com/a/20446931

sudo dnf install -y dnsmasq

Edit the dnsmasq configurations:

sudo vim /etc/dnsmasq.conf

Make sure the following settings are present, enabled, and set to these values:

  • The listen-address should be 127.0.0.1 plus your public IP, in our case 10.0.0.98.

  • Replace example.com with your own domain name. If you want the wildcard record *.example.com, set the domain to example.com and point it at the public IP.

domain-needed
bogus-priv
interface=lo
bind-interfaces
listen-address=127.0.0.1,10.0.0.98
cache-size=1000
address=/example.com/10.0.0.98
resolv-file=/etc/resolv.dnsmasq
no-poll

Now create /etc/resolv.dnsmasq:

sudo vim /etc/resolv.dnsmasq

Add one or two Google Public DNS nameservers, so our DNS resolver won't loop endlessly back to itself:

nameserver 8.8.8.8
nameserver 8.8.4.4

So it may look something like this. In this example, the top row is our local DNS for resolving the LAN, and the second row is Google DNS:
nameserver 2a06:c701:ffff::1
nameserver 8.8.8.8

Save the file.

Enable the service:

sudo systemctl enable --now dnsmasq.service

Verify that the service is running on port 53 as expected:

sudo netstat -ltnp | grep :53

You should see the following output:

tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      469974/dnsmasq      
tcp        0      0 10.0.0.98:53            0.0.0.0:*               LISTEN      469974/dnsmasq      
tcp6       0      0 ::1:53                  :::*                    LISTEN      469974/dnsmasq    

On each node (control-planes and workers), set the control-planes as nameservers.

Now edit /etc/resolv.conf:

ATTENTION: the /etc/resolv.conf file supports at most 3 nameservers. This is a known limitation that Kubernetes also warns about.

sudo vim /etc/resolv.conf

Set the nameserver to the public IP of a control-plane:

nameserver 10.0.0.98
options edns0

Then run:

sudo systemctl daemon-reload
sudo systemctl restart NetworkManager

Verify that the changes remained:

cat /etc/resolv.conf

Verify that you can resolve a wildcard record for your domain. Assuming your domain is example.com:

nslookup blablabla.example.com

This should resolve successfully.
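
The output should look roughly like this (the exact values depend on your domain and public IP):

Server:         10.0.0.98
Address:        10.0.0.98#53

Name:   blablabla.example.com
Address: 10.0.0.98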

Disable SWAP

sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

Set SELinux Mode As Permissive

sudo setenforce 0
sudo sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/sysconfig/selinux
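
You can optionally verify the current SELinux mode; it should print Permissive:

getenforce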

Setup Firewall Services

Create the following firewall services:

cat << EOF | sudo tee /etc/firewalld/services/vxlan.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>VXLAN</short>
  <description>Virtual Extensible LAN (VXLAN) service</description>
  <port protocol="udp" port="4789"/>
</service>
EOF
cat << EOF | sudo tee /etc/firewalld/services/k8s-control-plane-node.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>Kubernetes Control-Plane Node</short>
  <description>Kubernetes Control-Plane Node services and other required ports</description>
  <include service="kube-control-plane"/>
  <include service="kube-control-plane-secure"/>
  <include service="kube-api"/>
  <include service="bgp"/>
  <include service="vxlan"/>
  <include service="http"/>
  <include service="https"/>
  <include service="dns"/>
</service>
EOF
cat << EOF | sudo tee /etc/firewalld/services/k8s-worker-node.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>Kubernetes Worker Node</short>
  <description>Kubernetes Worker Node services for node communication and other required ports</description>
  <include service="kube-worker"/>
  <include service="bgp"/>
  <include service="vxlan"/>
  <include service="http"/>
  <include service="https"/>
  <include service="dns"/>
</service>
EOF
cat << EOF | sudo tee /etc/firewalld/services/cilium-common-node.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>cilium-common-node</short>
  <description>Cilium common ports for health checks, monitoring, and networking</description>
  
  <!-- Cluster Health Checks -->
  <port protocol="tcp" port="4240"/>
  
  <!-- Hubble Server -->
  <port protocol="tcp" port="4244"/>
  
  <!-- Hubble Relay -->
  <port protocol="tcp" port="4245"/>
  
  <!-- Mutual Authentication Port -->
  <port protocol="tcp" port="4250"/>
  
  <!-- Spire Agent Health Check -->
  <port protocol="tcp" port="4251"/>
  
  <!-- cilium-agent pprof Server -->
  <port protocol="tcp" port="6060"/>
  
  <!-- cilium-operator pprof Server -->
  <port protocol="tcp" port="6061"/>
  
  <!-- Hubble Relay pprof Server -->
  <port protocol="tcp" port="6062"/>
  
  <!-- cilium-envoy Health Listener -->
  <port protocol="tcp" port="9878"/>
  
  <!-- cilium-agent Health Status API -->
  <port protocol="tcp" port="9879"/>
  
  <!-- cilium-agent gops Server -->
  <port protocol="tcp" port="9890"/>
  
  <!-- Operator gops Server -->
  <port protocol="tcp" port="9891"/>
  
  <!-- Hubble Relay gops Server -->
  <port protocol="tcp" port="9893"/>
  
  <!-- cilium-envoy Admin API -->
  <port protocol="tcp" port="9901"/>
  
  <!-- cilium-agent Prometheus Metrics -->
  <port protocol="tcp" port="9962"/>
  
  <!-- cilium-operator Prometheus Metrics -->
  <port protocol="tcp" port="9963"/>
  
  <!-- cilium-envoy Prometheus Metrics -->
  <port protocol="tcp" port="9964"/>
  
  <!-- WireGuard Encryption Tunnel Endpoint -->
  <port protocol="udp" port="51871"/>
</service>
EOF
cat << EOF | sudo tee /etc/firewalld/services/cilium-worker-node.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>cilium-worker-node</short>
  <description>Cilium worker ports for VXLAN overlay, health checks, and etcd access</description>
  
  <!-- VXLAN Overlay -->
  <port protocol="udp" port="8472"/>
  
  <!-- Health Checks -->
  <port protocol="tcp" port="4240"/>
  
  <!-- etcd Access -->
  <port protocol="tcp" port="2379"/>
  <port protocol="tcp" port="2380"/>
  
  <!-- ICMP Health Checks -->
  <protocol value="icmp"/>
</service>
EOF
cat << EOF | sudo tee /etc/firewalld/services/cilium-control-plane-node.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>cilium-control-plane-node</short>
  <description>Cilium control-plane ports for VXLAN overlay, etcd access, and health checks</description>
  
  <!-- etcd Access -->
  <port protocol="tcp" port="2379"/>
  <port protocol="tcp" port="2380"/>
  
  <!-- VXLAN Overlay -->
  <port protocol="udp" port="8472"/>
  
  <!-- Health Checks -->
  <port protocol="tcp" port="4240"/>
  
  <!-- ICMP Health Checks -->
  <protocol value="icmp"/>
</service>
EOF

Apply the firewall services:

  • If this node is a control plane node, run:

    sudo firewall-cmd --zone=public --add-service=k8s-control-plane-node --permanent 
    sudo firewall-cmd --zone=public --add-service=cilium-control-plane-node --permanent
    sudo firewall-cmd --zone=public --add-service=cilium-common-node --permanent
    sudo firewall-cmd --zone=public --add-service=cilium-worker-node --permanent
    sudo firewall-cmd --reload
    sudo firewall-cmd --zone=public --list-services
    

    If you want this control plane node to also serve as a worker node for running pods, then also run:

    sudo firewall-cmd --zone=public --add-service=k8s-worker-node --permanent
    sudo firewall-cmd --reload
    sudo firewall-cmd --zone=public --list-services
    
  • If this node is a worker node, run:

    sudo firewall-cmd --zone=public --add-service=k8s-worker-node --permanent
    sudo firewall-cmd --zone=public --add-service=cilium-common-node --permanent
    sudo firewall-cmd --zone=public --add-service=cilium-worker-node --permanent
    sudo firewall-cmd --reload
    sudo firewall-cmd --zone=public --list-services
    

NOTE:

In case you run sudo firewall-cmd --permanent --add-service=... and the service does not appear in the output of sudo firewall-cmd --zone=public --list-services, reboot the machine with sudo reboot and try again; it will then work correctly.
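
You can also inspect a single service definition to confirm that firewalld loaded it; for example, the following should list the ports defined in the XML above:

sudo firewall-cmd --info-service=cilium-common-node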

Add Kernel Modules & Parameters

sudo tee /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

Add kernel parameters:

sudo vi /etc/sysctl.d/k8s.conf

Set the following content in the file:

net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1

Save the file, and then apply with:

sudo sysctl --system
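
You can optionally confirm that the parameters took effect (each should print = 1):

sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward net.bridge.bridge-nf-call-ip6tables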

Install Docker & Containerd

sudo dnf config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo systemctl --now enable docker

Give the current user permissions to run docker commands:

sudo usermod -a -G docker $(whoami)

NOTE:

To be assigned the new group, you must log out and in again. Check with the id command to verify that the group has been added:

exit
ssh <user>@<host>
id | grep docker

Configure Containerd

Configure containerd settings to use systemd cgroup (because RockyLinux uses systemd by default):

containerd config default | sudo tee /etc/containerd/config.toml >/dev/null 2>&1
sudo sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd

To verify the containerd service status, run:

sudo systemctl status containerd
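
You can also confirm that the cgroup driver change took effect; the output should show SystemdCgroup = true:

grep -n 'SystemdCgroup' /etc/containerd/config.toml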

Install runc And CNI (Container Network Interface) Plugins:

curl -LO https://github.com/opencontainers/runc/releases/download/v1.2.2/runc.amd64
sudo install -m 755 runc.amd64 /usr/local/sbin/runc
rm runc.amd64
curl -LO https://github.com/containernetworking/plugins/releases/download/v1.6.1/cni-plugins-linux-amd64-v1.6.1.tgz
sudo mkdir -p /opt/cni/bin
sudo tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.6.1.tgz
rm cni-plugins-linux-amd64-v1.6.1.tgz

Install kubeadm, kubelet, kubectl, kustomize

In this example we set the version to 1.32. Change it if you want to.

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
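
You can optionally confirm the installed versions (they should all report v1.32.x):

kubeadm version -o short
kubectl version --client
kubelet --version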

Enable autocomplete for kubeadm, system wide:

kubeadm completion bash | sudo tee /etc/bash_completion.d/kubeadm > /dev/null
sudo chmod a+r /etc/bash_completion.d/kubeadm
bash

Enable autocomplete for kubectl, system wide:

kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl > /dev/null
sudo chmod a+r /etc/bash_completion.d/kubectl
bash

Enable k alias for kubectl, system wide:

cat << EOF | sudo tee -a /etc/profile.d/alias.sh
alias k=kubectl
complete -o default -F __start_kubectl k
EOF

Enable color for kubectl, with kubecolor:

wget https://github.com/kubecolor/kubecolor/releases/download/v0.4.0/kubecolor_0.4.0_linux_amd64.tar.gz
tar xvf kubecolor_0.4.0_linux_amd64.tar.gz
sudo install -o root -g root -m 0755 kubecolor /usr/local/bin/kubecolor
cat << EOF | sudo tee -a /etc/profile.d/alias.sh
alias kubectl=kubecolor
EOF
rm kubecolor
rm kubecolor_0.4.0_linux_amd64.tar.gz 
rm LICENSE 
rm README.md 

Install Kustomize by downloading the binary:

curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
sudo install -o root -g root -m 0755 kustomize /usr/local/bin/kustomize
rm kustomize

Enable kustomize autocompletion:

kustomize completion bash | sudo tee /etc/bash_completion.d/kustomize
sudo chmod a+r /etc/bash_completion.d/kustomize
bash

Now enable the kubelet service:

sudo systemctl enable --now kubelet

Create The Cluster

ATTENTION:

For a clean installation, make sure you remove any leftovers of a previous installation of kubeadm or a CNI that may exist.

In each node of the cluster, run:

sudo systemctl stop kubelet
sudo systemctl stop containerd.service 
sudo kubeadm reset -f
sudo rm -fdr /etc/cni/net.d
sudo rm -fdr $HOME/.kube
sudo reboot # to clear the additional network adapters in `ip a`

Navigate to the first control plane node.

Decide on the pod CIDR you want. Each node is allocated a /24 (roughly 254 usable pod IPs), so choose a prefix shorter than /24; /16 is a good choice, as it allows about 256 nodes with about 254 pods each. In this example we set the pod CIDR to 10.10.0.0/16.

You can optionally set the service CIDR via the --service-cidr option. To keep the ranges ordered next to each other, we set it to 10.11.0.0/16.

WARNING: YOU MUST MAKE SURE THAT THE POD AND SERVICE CIDRS YOU PICK DO NOT OVERLAP IN ANY WAY WITH THE SUBNET OF YOUR NODE MACHINES, OR ELSE YOU WILL NOT BE ABLE TO CONNECT TO YOUR MACHINES ANYMORE(!!!) UNTIL YOU UNINSTALL THE CLUSTER WITH sudo kubeadm reset -f ON ALL THE NODES. The pod CIDR and service CIDR must not be real networks in use; they should be entirely free, unused ranges.
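
As a quick sanity check (a minimal sketch, assuming python3 is installed and your node subnet is 10.0.0.0/24 - adjust it to yours), you can verify that the three ranges do not overlap; it should print False:

python3 -c "
import ipaddress
# node subnet, pod CIDR, service CIDR - adjust to your environment
nets = [ipaddress.ip_network(n) for n in ('10.0.0.0/24', '10.10.0.0/16', '10.11.0.0/16')]
print(any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]))
"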

sudo kubeadm init --pod-network-cidr=10.10.0.0/16 --service-cidr 10.11.0.0/16

Configure kubectl to track the cluster:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

IMPORTANT:

By default, the cluster will not schedule pods on the control plane nodes for security reasons. If you want to be able to schedule pods on the control plane nodes, for example for a single machine Kubernetes cluster, run:

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

The output will look something like:

node "k8s-control-plane-01" untainted ...

This will remove the node-role.kubernetes.io/control-plane:NoSchedule taint from any nodes that have it, including the control plane nodes, meaning that the scheduler will then be able to schedule pods everywhere.

OPTIONAL: Join Nodes To The Cluster

In the sudo kubeadm init output from earlier, you will also see a command to join nodes to your cluster.

You can join a node to the cluster with that command, which was printed when creating the cluster in the previous step.

(Don't forget to use sudo in the command.)

It will be something like this:

sudo kubeadm join k8s-control-plane-01:6443 --token 69s57o.3muk7ey0j0zknw69 \
  --discovery-token-ca-cert-hash sha256:8000dff8e803e2bf687f3dae80b4bc1376e5bd770e7a752a3c9fa314de6449fe

You can also generate the token later, by running the following command from the control-plane node:

sudo kubeadm token create --print-join-command

OPTIONAL: You can label the node with a role:

kubectl label nodes k8s-worker-01 node-role.kubernetes.io/<role>=

Now you can see its role with:

kubectl get nodes k8s-worker-01
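
For example, assigning the worker role (the role name after node-role.kubernetes.io/ is your choice) would look like:

kubectl label nodes k8s-worker-01 node-role.kubernetes.io/worker=
kubectl get nodes k8s-worker-01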

OPTIONAL: Configure kubectl To The Cluster From The New Node

From the new node, the kubectl command will not connect to the cluster by default, and this is intentional!

The reason is that, by default, users are expected to manage the cluster from a control-plane node, not from a worker node.

If you do want to use kubectl on the new node, whether it is a control plane node or a worker node, you can do so by downloading the /etc/kubernetes/admin.conf file from the first control-plane node (which was copied to /home/tal/.kube/config in this example) to the new node:

sftp tal@k8s-control-plane-01:/home/tal/.kube/config .

Configure kubectl:

mkdir -p $HOME/.kube
sudo cp -i config $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Verify it worked by making some kubectl commands:

kubectl cluster-info
kubectl get ns

Setup Helm & Gateway CRDs & Cilium CNI

Navigate back to the first control plane node.

You should see the connected nodes via:

kubectl get nodes

and they should all be in NotReady STATUS.

You should also see that the coredns pods are in Pending STATUS:

kubectl get pods -n kube-system

This is because no CNI is installed yet, and the nodes carry a NoSchedule taint.

To bring the nodes to Ready STATUS, install the CNI manifests; this will also remove the taint from the nodes.

In this example we will install Cilium CNI via a Helm chart.

Setup Helm

Install Helm:

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
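
You can confirm the installation with:

helm version --short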

Enable helm autocompletion:

helm completion bash | sudo tee /etc/bash_completion.d/helm > /dev/null
sudo chmod a+r /etc/bash_completion.d/helm
bash

Setup Cilium Helm Repository:

helm repo add cilium https://helm.cilium.io/

If you have already added the repo, fetch updates for all repos:

helm repo update

Install Cilium CLI (OPTIONAL But Helpful)

Download Cilium CLI:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

Add Cilium autocomplete for bash:

cilium completion bash | sudo tee /etc/bash_completion.d/cilium > /dev/null
sudo chmod a+r /etc/bash_completion.d/cilium
bash

Install Gateway CRDs

We also would like to use Kubernetes Gateway API, so we need to install the Kubernetes Gateway CRDs. In this example we used version 1.2.0.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_gateways.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_referencegrants.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_grpcroutes.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/experimental/gateway.networking.k8s.io_tlsroutes.yaml
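
You can verify that the CRDs were created:

kubectl get crd | grep gateway.networking.k8s.io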

Install Cilium CNI

NOTE: set --set operator.replicas=<NUMBER> to the number of nodes you plan on having. If you're not sure, set the NUMBER to 1.

NOTE:

In this installation we also enable monitoring with:

  • Prometheus And Grafana
  • Hubble UI

In this example we used version 1.16.5.

helm install cilium cilium/cilium --version 1.16.5 \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.10.0.0/16 \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true \
  --set hubble.enabled=true \
  --set hubble.metrics.enableOpenMetrics=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
  --set gatewayAPI.enabled=true \
  --set kubeProxyReplacement=true \
  --set gatewayAPI.hostNetwork.enabled=true \
  --set operator.replicas=1 \
  --set envoy.securityContext.capabilities.keepCapNetBindService=true \
  --set envoy.securityContext.capabilities.envoy="{NET_BIND_SERVICE,NET_ADMIN,SYS_ADMIN}" \
  --set loadBalancer.l7.backend=envoy \
  --set externalIPs.enabled=true \
  --set envoy.enabled=true \
  --set bpf.masquerade=true \
  --set debug.enabled=true \
  --set debug.verbose=flow \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

Then verify that all the pods are in Running STATUS (this may take a minute):

[tal@k8s-control-plane-01 monitoring]$ k get pods -A -w
NAMESPACE           NAME                                           READY   STATUS    RESTARTS       AGE
kube-system         cilium-envoy-pmvw9                             1/1     Running   0              9m11s
kube-system         cilium-lmx2h                                   1/1     Running   0              9m11s
kube-system         cilium-operator-799f498c8-csn8q                1/1     Running   0              9m11s
kube-system         coredns-668d6bf9bc-67vpv                       1/1     Running   1 (8m4s ago)   10m
kube-system         coredns-668d6bf9bc-tvnjm                       1/1     Running   0              10m
kube-system         etcd-k8s-control-plane-01                      1/1     Running   0              10m
kube-system         hubble-relay-5db68b98c9-9tkl4                  1/1     Running   0              9m11s
kube-system         hubble-ui-69d69b64cf-lm82d                     2/2     Running   0              9m11s
kube-system         kube-apiserver-k8s-control-plane-01            1/1     Running   0              10m
kube-system         kube-controller-manager-k8s-control-plane-01   1/1     Running   0              10m
kube-system         kube-proxy-t7krs                               1/1     Running   0              10m
kube-system         kube-scheduler-k8s-control-plane-01            1/1     Running   0              10m

(OPTIONAL) View Cilium status with the Cilium CLI:

[tal@k8s-control-plane-01 monitoring]$ cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled

DaemonSet              cilium             Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet              cilium-envoy       Desired: 1, Ready: 1/1, Available: 1/1
Deployment             cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Deployment             hubble-relay       Desired: 1, Ready: 1/1, Available: 1/1
Deployment             hubble-ui          Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium             Running: 1
                       cilium-envoy       Running: 1
                       cilium-operator    Running: 1
                       hubble-relay       Running: 1
                       hubble-ui          Running: 1
Cluster Pods:          6/6 managed by Cilium
Helm chart version:    1.16.5
Image versions         cilium             quay.io/cilium/cilium:v1.16.5@sha256:758ca0793f5995bb938a2fa219dcce63dc0b3fa7fc4ce5cc851125281fb7361d: 1
                       cilium-envoy       quay.io/cilium/cilium-envoy:v1.30.8-1733837904-eaae5aca0fb988583e5617170a65ac5aa51c0aa8@sha256:709c08ade3d17d52da4ca2af33f431360ec26268d288d9a6cd1d98acc9a1dced: 1
                       cilium-operator    quay.io/cilium/operator-generic:v1.16.5@sha256:f7884848483bbcd7b1e0ccfd34ba4546f258b460cb4b7e2f06a1bcc96ef88039: 1
                       hubble-relay       quay.io/cilium/hubble-relay:v1.16.5@sha256:6cfae1d1afa566ba941f03d4d7e141feddd05260e5cd0a1509aba1890a45ef00: 1
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.13.1@sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b: 1
                       hubble-ui          quay.io/cilium/hubble-ui:v0.13.1@sha256:e2e9313eb7caf64b0061d9da0efbdad59c6c461f6ca1752768942bfeda0796c6: 1

The cluster installation is done! 🎉

You can stop here or continue with the tutorial.

(HIGHLY RECOMMENDED, BUT OPTIONAL) Install & Expose Grafana And Hubble UI Via HTTPRoute Gateway

Install Prometheus and Grafana

kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.16.5/examples/kubernetes/addons/prometheus/monitoring-example.yaml

Then verify that all the pods are in Running STATUS (this may take a minute):

[tal@k8s-control-plane-01 monitoring]$ k get pods -n cilium-monitoring -w
NAMESPACE           NAME                                   READY   STATUS    RESTARTS       AGE
cilium-monitoring   grafana-5c69859d9-dgnrf                1/1     Running   0              9m
cilium-monitoring   prometheus-6fc896bc5d-khvn8            1/1     Running   0              9m

Expose Grafana And Hubble UI

Create a gateway that listens on port 80 of the first control plane's public IP (0.0.0.0:80). That port must be available on the machine; if it is not, pick another port instead. Then expose Grafana & Hubble UI through that gateway on port 80 via HTTPRoutes.

Create the following manifests:

gateway.yaml (this gateway will serve the whole cluster for all future use)

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway
  namespace: gateway
spec:
  gatewayClassName: cilium
  listeners:
  - protocol: HTTP
    port: 80
    name: http-1
    allowedRoutes:
      namespaces:
        from: All

httproute-hubble-ui.yaml:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: hubble-ui
  namespace: kube-system
spec:
  parentRefs:
  - name: gateway
    namespace: gateway
  hostnames:
  - "hubble-ui.cilium.rocks"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
      - name: hubble-ui
        kind: Service
        port: 8081

httproute-grafana.yaml

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: grafana
  namespace: cilium-monitoring
spec:
  parentRefs:
  - name: gateway
    namespace: gateway
  hostnames:
  - "grafana.cilium.rocks"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
      - name: grafana
        kind: Service
        port: 3000

IMPORTANT:

To expose a service through the gateway, the service must be of type LoadBalancer. Thus, we replace the default ClusterIP services of Grafana and Hubble UI with LoadBalancer services.

service-hubble-ui.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: hubble-ui
    app.kubernetes.io/part-of: cilium
    k8s-app: hubble-ui
  name: hubble-ui
  namespace: kube-system
spec:
  ports:
  - name: port-1
    port: 8081
    protocol: TCP
    targetPort: 8081
  - name: port-2
    port: 8090
    protocol: TCP
    targetPort: 8090
  selector:
    k8s-app: hubble-ui
  type: LoadBalancer

service-grafana.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
    component: core
  name: grafana
  namespace: cilium-monitoring
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana
  type: LoadBalancer

To apply, first create the gateway namespace:

k create ns gateway

Now, apply all the files:

k apply -f gateway.yaml
k apply -f httproute-grafana.yaml
k apply -f httproute-hubble-ui.yaml
k apply -f service-hubble-ui.yaml
k apply -f service-grafana.yaml
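
Optionally, confirm that the Gateway and the HTTPRoutes were accepted; the gateway should eventually show PROGRAMMED as True and receive an address:

k get gateway -n gateway
k get httproute -A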

Verify that port 80 is exposed on the machine by cilium-envoy:

sudo netstat -ltnp | grep ':80 '

You should see something like this:

tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      7283/cilium-envoy 

Add a StorageClass, PV and PVC to retain the storage of Prometheus:

storage-class-prometheus.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cluster-prometheus
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true

pvc-prometheus.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cluster-prometheus
  namespace: cilium-monitoring
spec:
  storageClassName: cluster-prometheus
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

pv-prometheus.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: cluster-prometheus
spec:
  storageClassName: cluster-prometheus
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/var/lib/kubernetes/cluster-prometheus"
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s-control-plane-01

Apply them:

k apply -f storage-class-prometheus.yaml
k apply -f pvc-prometheus.yaml
k apply -f pv-prometheus.yaml

Patch Prometheus to persist its database in a Persistent Volume, and to retain its data for 5 years instead of the default 15 days:

kubectl patch deployment prometheus \
-n cilium-monitoring \
--patch '{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "prometheus",
            "args": [
              "--config.file=/etc/prometheus/prometheus.yaml",
              "--storage.tsdb.path=/prometheus/",
              "--log.level=info",
              "--enable-feature=exemplar-storage",
              "--storage.tsdb.retention.time=5y"
            ],
            "securityContext": {
              "allowPrivilegeEscalation": false,
              "runAsUser": 0
            },
           "volumeMounts": [
              {
                "mountPath": "/prometheus/",
                "name": "prometheus"
              }
            ]
          }
        ],
        "volumes": [
          {
            "name": "prometheus",
            "persistentVolumeClaim": {
              "claimName": "cluster-prometheus"
            }
          }
        ]
      }
    }
  }
}'
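
You can check that the claim bound and the patched pod came back up (this assumes the PVC was created in the cilium-monitoring namespace, as in the manifest above; the claim should become Bound once the patched pod is scheduled):

k -n cilium-monitoring get pvc cluster-prometheus
k -n cilium-monitoring get pods -w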

Edit Grafana configurations:

k edit -n cilium-monitoring cm grafana-config 

Make sure to set the following configurations:

    [session]
    # If you use session in https only, default is false
    cookie_secure = false

    [auth.anonymous]
    # enable anonymous access
    enabled = true

    # specify role for unauthenticated users
    org_role = Viewer

    [security]
    # set to true if you host Grafana behind HTTPS. default is false.
    cookie_secure = false

    # set cookie SameSite attribute. defaults to `lax`. can be set to "lax", "strict" and "none"
    cookie_samesite = strict

    [users]
    # disable user signup / registration
    allow_sign_up = false

    # Default role new users will be automatically assigned (if auto_assign_org above is set to true)
    auto_assign_org_role = Viewer

Apply the new configurations:

k rollout restart deployment -n cilium-monitoring grafana
k get pod -n cilium-monitoring -w

Add a StorageClass, PV and PVC to retain the storage of Grafana:

storage-class-grafana.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cluster-grafana
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true

pvc-grafana.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cluster-grafana
  namespace: cilium-monitoring
spec:
  storageClassName: cluster-grafana
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

pv-grafana.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: cluster-grafana
spec:
  storageClassName: cluster-grafana
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/var/lib/kubernetes/cluster-grafana"
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s-control-plane-01

Apply them:

k apply -f storage-class-grafana.yaml
k apply -f pvc-grafana.yaml
k apply -f pv-grafana.yaml

Patch Grafana to persist its database in a Persistent Volume:

kubectl patch deployment grafana \
-n cilium-monitoring \
--type='strategic' \
--patch '{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "grafana-core",
            "securityContext": {
              "allowPrivilegeEscalation": false,
              "runAsUser": 0
            },
            "volumeMounts": [
              {
                "mountPath": "/var/lib/grafana",
                "name": "grafana"
              }
            ]
          }
        ],
        "volumes": [
          {
            "name": "grafana",
            "persistentVolumeClaim": {
              "claimName": "cluster-grafana"
            }
          }
        ]
      }
    }
  }
}'
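
As with Prometheus, you can check that the claim bound and the patched pod came back up:

k -n cilium-monitoring get pvc cluster-grafana
k -n cilium-monitoring get pods -w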

Testing The Exposure From A Client Machine

From a client machine:

  • NOTE: To make a request to the control plane through the external NodePorts opened by the services, you must disable firewalld entirely on all the control plane nodes and all the worker nodes:

    sudo systemctl stop firewalld
    sudo systemctl disable firewalld
    

    Then you can make a request to the NodePort to reach the service.

  • To make a request to the domain name stated in the HTTPRoutes for each of the services:

    On the client machine, map the service domain names to the control plane's public IP. You can do this by adding the control plane node to the machine's /etc/hosts, by adding a control-plane as a nameserver in the machine's /etc/resolv.conf, or by adding the control plane node to the machine's local DNS:

    So in our case:

    Add the following line to /etc/hosts:

    10.0.0.98 hubble-ui.cilium.rocks grafana.cilium.rocks
    

    Or, add a control-plane as a nameserver on the client machine, by adding it to the client machine's /etc/resolv.conf:

    nameserver 10.0.0.98
    

    Or, in the local DNS of the machine, add a wildcard record for *.cilium.rocks to resolve to the control plane 10.0.0.98, so all the services (and future services) will be resolved to the control plane.

    Then, in the client machine you can test this with curl in the CLI:

    curl -H 'Host: hubble-ui.cilium.rocks' http://10.0.0.98:80
    
    curl -H 'Host: grafana.cilium.rocks' http://10.0.0.98:80
    

    Or via the browser, by navigating to http://hubble-ui.cilium.rocks and http://grafana.cilium.rocks from the client machine.

To Uninstall Cilium CNI

In the first control plane, run:

helm uninstall cilium -n kube-system

To Uninstall The Kubernetes Cluster

In each node of the cluster, run:

sudo systemctl stop kubelet
sudo systemctl stop containerd.service 
sudo kubeadm reset -f
sudo rm -fdr /etc/cni/net.d
sudo rm -fdr $HOME/.kube
sudo reboot # to clear the additional network adapters in `ip a`

To reinstall the cluster, run all the commands from the "Create The Cluster" section of this document.
