Based on:
- https://docs.cilium.io/en/latest/gettingstarted/k8s-install-eks/
- https://docs.cilium.io/en/latest/gettingstarted/kubeproxy-free/#nodeport-xdp-acceleration
Currently does not work with Cilium 1.8.0-rc3 due to cilium/cilium#12078. Thus use cilium:latest for now.
Spin up an EKS cluster using eksctl:
$ eksctl create cluster --name test-cluster --without-nodegroup
$ kubectl -n kube-system delete daemonset aws-node
Install Cilium in order to be able to create node group:
$ helm install cilium ./cilium \
--namespace kube-system \
--set global.eni=true \
--set config.ipam=eni \
--set global.egressMasqueradeInterfaces=eth0 \
--set global.tunnel=disabled \
--set global.nodeinit.enabled=true
Create a node group. We need SSH access in order to be able to install a newer kernel and additional tools. We also need to select a larger instance type in order to be able to run XDP on the provided ENA device. Use nodegroup-config.yaml for convenience:
$ cat nodegroup-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster
  region: us-west-2
nodeGroups:
  - name: ng-1
    instanceType: m5n.xlarge
    desiredCapacity: 2
    ssh:
      allow: true
$ eksctl create nodegroup -f nodegroup-config.yaml
Get the external IPs of the nodes:
$ EXT_IPS=$(kubectl get no -o jsonpath='{$.items[*].status.addresses[?(@.type=="ExternalIP")].address }{"\n"}' | tr ' ' '\n')
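The steps that follow give separate bash and zsh loops because the two shells split $EXT_IPS differently. As a possible alternative, the sketch below reads the list line by line, which should behave the same in both shells. The helper name for_each_ip and the wrapper run_uname are hypothetical and not part of the original instructions.

```shell
# Hypothetical helper: run a command once per non-empty line of a list.
# Usage: for_each_ip "$EXT_IPS" <command> [args...]
# The IP is appended as the last argument of the command.
for_each_ip() {
  list=$1
  shift
  printf '%s\n' "$list" | while IFS= read -r ip; do
    [ -n "$ip" ] || continue
    # Redirect stdin so that commands such as ssh do not consume the
    # remaining lines of the list.
    "$@" "$ip" < /dev/null
  done
}

# Example (run_uname is an illustrative wrapper, not from the original):
#   run_uname() { ssh "ec2-user@$1" "uname -r"; }
#   for_each_ip "$EXT_IPS" run_uname
```

Because the IP is passed as the last argument, commands that need it elsewhere (such as ssh with a remote command) are easiest to wrap in a small function as shown in the comment.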
Install a newer kernel on each node:
bash:
$ for ip in $EXT_IPS ; do ssh ec2-user@$ip "sudo amazon-linux-extras install -y kernel-ng && sudo reboot"; done
zsh:
$ for ip in ${(f)EXT_IPS} ; do ssh ec2-user@$ip "sudo amazon-linux-extras install -y kernel-ng && sudo reboot"; done
Wait until the nodes come back up and check their kernel version (should say 5.4.38-17.76.amzn2.x86_64 or similar):
bash:
$ for ip in $EXT_IPS ; do ssh ec2-user@$ip "uname -r"; done
zsh:
$ for ip in ${(f)EXT_IPS} ; do ssh ec2-user@$ip "uname -r"; done
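Waiting for the reboot can be scripted rather than done by hand. The following is a minimal sketch of a retry loop; the function name retry and the attempt count, delay, and timeout values in the example are assumptions chosen for illustration, not values from the original instructions.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    # Give up once the allowed number of attempts has been used.
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example (assumed values: 30 attempts, 10 s apart, 5 s connect timeout):
#   retry 30 10 ssh -o ConnectTimeout=5 "ec2-user@$ip" "uname -r"
```

The function returns non-zero if the command never succeeds, so a calling script can stop before attempting the later configuration steps on an unreachable node.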
Install ethtool on the nodes in order to be able to configure ENA NIC queues:
bash:
$ for ip in $EXT_IPS ; do ssh ec2-user@$ip "sudo yum install -y ethtool"; done
zsh:
$ for ip in ${(f)EXT_IPS} ; do ssh ec2-user@$ip "sudo yum install -y ethtool"; done
Since there was a bug regarding XDP in earlier versions of the ena driver, make sure to use at least version 2.2.8:
bash:
$ for ip in $EXT_IPS ; do ssh ec2-user@$ip "sudo ethtool -i eth0"; done
zsh:
$ for ip in ${(f)EXT_IPS} ; do ssh ec2-user@$ip "sudo ethtool -i eth0"; done
If the reported driver version is lower than 2.2.8, a new version of the ena driver needs to be built and installed according to the instructions given here.
For kernel-ng version 5.4.38-17.76.amzn2.x86_64 a prebuilt rpm of ena driver version 2.2.9g can be downloaded here.
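The driver version check described above can also be automated. The sketch below assumes that the ethtool -i output contains a line of the form "version: X" and that sort supports version ordering via -V (GNU coreutils does). The function names version_ge and driver_version are hypothetical.

```shell
# Returns success if version $1 is greater than or equal to version $2.
# Assumes a sort implementation that supports -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract the value of the "version:" line from `ethtool -i` output on stdin.
driver_version() {
  awk '$1 == "version:" { print $2; exit }'
}

# Example:
#   v=$(ssh "ec2-user@$ip" "sudo ethtool -i eth0" | driver_version)
#   version_ge "$v" 2.2.8 || echo "$ip: ena driver $v is too old"
```

Version sort is used instead of plain string comparison so that, for example, 2.2.10 is treated as newer than 2.2.8.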
Set MTU on the netdevs (default is 9001, for XDP the ENA driver allows max. 3818):
bash:
$ for ip in $EXT_IPS ; do ssh ec2-user@$ip "sudo ip link set dev eth0 mtu 3818"; done
zsh:
$ for ip in ${(f)EXT_IPS} ; do ssh ec2-user@$ip "sudo ip link set dev eth0 mtu 3818"; done
Get the number of NIC queues currently configured on the netdevs:
bash:
$ for ip in $EXT_IPS ; do ssh ec2-user@$ip "sudo ethtool -l eth0"; done
zsh:
$ for ip in ${(f)EXT_IPS} ; do ssh ec2-user@$ip "sudo ethtool -l eth0"; done
The output is expected to look something like this:
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 4
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 4
Set NIC queues on the netdevs to be able to use XDP (use 1/2 of the value given for Combined above):
bash:
$ for ip in $EXT_IPS ; do ssh ec2-user@$ip "sudo ethtool -L eth0 combined 2"; done
zsh:
$ for ip in ${(f)EXT_IPS} ; do ssh ec2-user@$ip "sudo ethtool -L eth0 combined 2"; done
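The value 2 in the commands above is hard-coded for the sample output, where Combined is 4. On other instance types the number could be derived from the ethtool -l output instead. The sketch below is one way to do that. It assumes that the relevant Combined value is the one under "Pre-set maximums" (the original does not say which of the two it means, and they are equal in the sample output). The function name half_combined_max is hypothetical.

```shell
# Read `ethtool -l` output on stdin and print half of the pre-set maximum
# "Combined" value (rounded down, but never less than 1).
half_combined_max() {
  awk '
    /^Pre-set maximums:/          { in_max = 1; next }
    /^Current hardware settings:/ { in_max = 0 }
    in_max && $1 == "Combined:" {
      half = int($2 / 2)
      if (half < 1) half = 1
      print half
      exit
    }
  '
}

# Example:
#   q=$(ssh "ec2-user@$ip" "sudo ethtool -l eth0" | half_combined_max)
#   ssh "ec2-user@$ip" "sudo ethtool -L eth0 combined $q"
```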
Get the K8S API server IP and port:
$ export API_SERVER_IP=$(kubectl get ep kubernetes -o jsonpath='{$.subsets[0].addresses[0].ip}')
$ export API_SERVER_PORT=443
Delete kube-proxy daemonset:
$ kubectl -n kube-system delete daemonset kube-proxy
Configure Cilium for kube-proxy-free mode with NodePort XDP acceleration:
$ helm upgrade cilium ./cilium \
--namespace kube-system \
--reuse-values \
--set global.autoDirectNodeRoutes=true \
--set global.kubeProxyReplacement=strict \
--set global.nodePort.acceleration=native \
--set global.nodePort.mode=snat \
--set global.k8sServiceHost=$API_SERVER_IP \
--set global.k8sServicePort=$API_SERVER_PORT
Restart the Cilium daemonset:
$ kubectl rollout restart -n kube-system ds/cilium