The AWS EKS team works extremely hard. We appreciate all of their effort.
But the `aws-vpc-cni` requires fine-tuning of complex settings, and:
- Limits the number of pods you can run on an EC2 instance, based on the number of ENIs that instance size (or type) can support. Pod density is valuable.
- Requires you to play with settings like `WARM_ENI_TARGET`, `WARM_IP_TARGET`, `WARM_PREFIX_TARGET`, etc.
- Runs into conditions where Pods get stuck in "Creating," since IP management gets tricky based on cluster pod churn, and `aws-vpc-cni`'s `ENABLE_PREFIX_DELEGATION` + branching can lead to a lot of wasted IPs
If you've ever sat there and watched your cluster or pods "get stuck" because of a failure to assign an IP address, you know the pain of this.
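If you haven't had the pleasure: that tuning usually means setting environment variables on the `aws-node` DaemonSet. A minimal sketch (the values here are illustrative, not recommendations):

```bash
# Tune the aws-vpc-cni warm pool by setting env vars on the aws-node DaemonSet.
# Values are examples only; the "right" numbers depend on your pod churn.
kubectl -n kube-system set env daemonset/aws-node \
  WARM_ENI_TARGET=1 \
  WARM_IP_TARGET=5 \
  MINIMUM_IP_TARGET=10
```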
Cilium's `vxlan` overlay approach obviates these problems completely.
- Trade-off: You won't be able to use "Pod Security Groups" with this implementation
- Trade-off: Another downside of this approach is that you can no longer create Services of type `LoadBalancer` backed by an NLB in IP mode via the aws-load-balancer-controller. In other words, NLBs can no longer send traffic directly to your pods; they can only send traffic to instances listening on a NodePort. This means you can't use the controller's Pod readiness gates, and you can no longer guarantee 0-downtime deployments/upgrades of your ingress pods/nodes. If you're okay with a few dropped requests, then great. If not, think twice! --> thanks to /u/DPRegular
- Benefit: No more "stuck pods" or IP starvation, ever
- Benefit: No more pod density/max-pods limitations on your nodes - you can safely use t3/t4g micro, small, etc. with your autoscaler of choice. WE RECOMMEND KARPENTER!
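For reference, a vxlan-overlay install via cilium-cli looks roughly like this. This is a sketch, not this repo's exact invocation: the Helm value names below match ~1.12-era charts (newer releases renamed `tunnel` to `routingMode`/`tunnelProtocol`), and the pod CIDR is a placeholder.

```bash
# Sketch: overlay-mode install; verify value names against your Cilium release.
cilium install \
  --helm-set tunnel=vxlan \
  --helm-set ipam.mode=cluster-pool \
  --helm-set ipam.operator.clusterPoolIPv4PodCIDRList='192.168.0.0/16'
```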
Place `cilium-provisioner.tf`, `dynamic-cilium-values.tpl`, and `_cilium-provisioner.sh` in the same folder as your EKS terraform module.
You can also use `_cilium-provisioner.sh` + `dynamic-cilium-values.tpl` without terraform. Just read the instructions in the script, rename `dynamic-cilium-values.tpl` to `.yaml`, and hard-code your values.
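The terraform-less flow boils down to something like this (a sketch; the authoritative steps are the comments inside the script itself):

```bash
# Rename the template, hard-code your values, then run the script
# per its embedded instructions.
mv dynamic-cilium-values.tpl dynamic-cilium-values.yaml
# ...edit dynamic-cilium-values.yaml by hand...
bash ./_cilium-provisioner.sh
```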
Read this, and make sure you have all the necessary port/firewall configs in place: https://docs.cilium.io/en/v1.9/operations/system_requirements/#firewall-rules
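For the vxlan path specifically, the node-to-node rules from that page boil down to UDP 8472 (vxlan) and TCP 4240 (health checks). A sketch, with a placeholder security-group ID:

```bash
# Allow vxlan and cilium health-check traffic between worker nodes.
# sg-0123456789abcdef0 is a placeholder for your shared node SG.
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol udp --port 8472 --source-group sg-0123456789abcdef0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 4240 --source-group sg-0123456789abcdef0
```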
This assumes:
- You're using the EKS module for terraform
- You're using Linux Kernel 5.15.x on your AL2/BottleRocket/Ubuntu nodes
- Your EKS `Service IPv4 range` --> `10.100.0.0/16`
- Your EKS cluster endpoint is accessible 🫠
- You're using the latest cilium-cli release
- Worker nodes are in private subnets, and have NAT gateways setup on the VPC
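A few quick ways to sanity-check those assumptions (cluster name is a placeholder):

```bash
# Service IPv4 range: expect 10.100.0.0/16
aws eks describe-cluster --name my-cluster \
  --query "cluster.kubernetesNetworkConfig.serviceIpv4Cidr" --output text
# Kernel version (run on a worker node): expect 5.15.x
uname -r
# cilium-cli version
cilium version
```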
For your nodegroups, you need to find some standard way to tag your EC2s (the installation script relies on this fact). This is because you have to flush IPTables on any existing/running nodes in your cluster that were using the default `aws-vpc-cni` plugin, after it is disabled and cilium is installed. The installation script retrieves those instance IDs by tag, automatically, and then flushes IPTables on those nodes using `aws ssm send-command`.
You'll probably want to modify/tweak this to fit your setup.
```bash
# Look up running "base" nodes by Name tag (adjust the pattern to your tagging scheme)
instance_ids=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=*${CLUSTER_NAME}-base-compute*" "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text)
```
In our case, we tag our "base" nodes (those which are not auto-scaled with `karpenter`) with this pattern: `nameOfCluster-base-compute-*` --> example EC2 name: `v3-qa-1-base-compute-1`
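If an existing node is missing the tag, applying it by hand is a one-liner (instance ID is a placeholder):

```bash
aws ec2 create-tags --resources i-0123456789abcdef0 \
  --tags Key=Name,Value=v3-qa-1-base-compute-1
```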
You will see on line 59 of the `_cilium-provisioner.sh` script that we use this tag to identify the existing nodes that we have just installed cilium on. This is an extremely important step. Don't skip it.
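The flush itself has roughly this shape. This is a sketch, not the script's exact payload; see the script for the real commands it sends:

```bash
# Flush the iptables state left behind by aws-vpc-cni on the tagged nodes.
# $instance_ids comes from the describe-instances query shown earlier.
aws ssm send-command \
  --instance-ids $instance_ids \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["iptables -t nat -F","iptables -t mangle -F","iptables -F"]'
```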
At startup, all of your nodes will need to be tainted with:

```yaml
- key: node.cilium.io/agent-not-ready
  value: "true"
  effect: NoExecute
```
- READ Karpenter docs regarding taints: https://karpenter.sh/preview/concepts/provisioners/#cilium-startup-taint
- READ Cilium docs regarding taints: https://docs.cilium.io/en/stable/installation/taints/#considerations-on-node-pool-taints-and-unmanaged-pods
- Also see the "EKS" installation instructions tab here: https://docs.cilium.io/en/stable/installation/taints/#considerations-on-node-pool-taints-and-unmanaged-pods
- This particular configuration doesn't include L7 traffic shaping or loadbalancing
- This particular configuration doesn't rely on an egress gateway, although our testing showed that using cilium's egress gateway implementation also works (we tend to avoid any additional complexity wherever possible)
- This particular configuration is ready for use with cilium's magical cluster mesh networking
- This particular configuration does not rely on cilium's ingress gateway (we use aws-load-balancer-controller for that)
Make sure that:
- karpenter --> `hostNetwork: true`
- aws-load-balancer-controller --> `hostNetwork: true`
- metrics-server --> `hostNetwork.enabled: true`

...otherwise, they won't work.
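If you manage those with helm, the mapping looks something like this (chart sources/repo aliases are assumptions - adjust to however you install them):

```bash
# Value paths match the list above; chart locations are assumptions.
helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
  -n karpenter --reuse-values --set hostNetwork=true
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system --reuse-values --set hostNetwork=true
helm upgrade metrics-server metrics-server/metrics-server \
  -n kube-system --reuse-values --set hostNetwork.enabled=true
```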
We only install the following "EKS addons" generically:
- kube-proxy
- coreDNS
- aws-vpc-cni
We also install the following from their most recent helm charts, and not as addons, since "addon" versions gave us issues:
- aws-efs-csi-driver
- aws-ebs-csi-driver
- external-dns
- aws-load-balancer-controller
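In practice the split looks something like this (cluster/chart names illustrative):

```bash
# "Generic" EKS addon install
aws eks create-addon --cluster-name my-cluster --addon-name kube-proxy
# Helm chart instead of the addon version
helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  -n kube-system
```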
To undo the changes made by cilium and completely uninstall it, just run:

```bash
cilium uninstall
```

This will restore the `aws-vpc-cni` plugin.
After removing cilium, you'll need to make sure you restart all your pods and/or revert any changes to your helm `values.yaml` for any of the aforementioned services. You'll definitely want to restart kube-proxy and coreDNS as well, as shown below.
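Restarting those two is just a rollout restart (standard kube-system resource names assumed):

```bash
kubectl -n kube-system rollout restart daemonset kube-proxy
kubectl -n kube-system rollout restart deployment coredns
```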