Check out the Cilium repo and run it in kind
git clone https://github.com/cilium/cilium.git
cd cilium
REPO_ROOT=$PWD
KUBEPROXY_MODE="none" make kind
make kind-image
make kind-install-cilium
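Once the install finishes, we can verify that the agent came up with the cilium CLI:
cilium status --wait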
If we want to change some options (enable kube-proxy replacement and turn pprof on), we can set the values during the installation:
cilium install \
--chart-directory=${REPO_ROOT}/install/kubernetes/cilium \
--helm-values=${REPO_ROOT}/contrib/testing/kind-values.yaml \
--version= \
>/dev/null 2>&1 &
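For instance, to enable both options at install time we could pass extra Helm values (a sketch; pprof.enabled and kubeProxyReplacement are values in Cilium's Helm chart, and --helm-set is the cilium CLI flag to override them):
cilium install \
--chart-directory=${REPO_ROOT}/install/kubernetes/cilium \
--helm-values=${REPO_ROOT}/contrib/testing/kind-values.yaml \
--helm-set pprof.enabled=true \
--helm-set kubeProxyReplacement=strict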
Or after the installation:
# https://docs.cilium.io/en/v1.13/configuration/
kubectl edit configmap cilium-config -n kube-system
# set pprof: "true"
# set kube-proxy-replacement: strict
cilium config also allows setting values; it is important to know that those are not applied automatically, and the pods need to be restarted.
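For example, to set pprof with the cilium CLI (its config subcommand edits the same cilium-config ConfigMap) and then restart the agents:
cilium config set pprof true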
kubectl rollout restart daemonset cilium -n kube-system
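To wait until all the agent pods have been recreated:
kubectl -n kube-system rollout status daemonset cilium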
We may want to see metrics for our tests; install the monitoring.yaml that scrapes the agents' metrics ports and exposes the Prometheus console on a NodePort service:
kubectl get nodes -o wide
# get nodeport to open prometheus
kubectl -n monitoring get services
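To build the Prometheus URL in one go (a sketch, assuming the service is named prometheus and lives in the monitoring namespace):
NODE_IP=$(kubectl get nodes kind-worker -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
NODE_PORT=$(kubectl -n monitoring get service prometheus -o jsonpath='{.spec.ports[0].nodePort}')
echo "http://${NODE_IP}:${NODE_PORT}"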
If we want to obtain a pprof, after enabling the option in the config, forward port 6060 on the pod and fetch the dump:
kubectl -n kube-system get pods -o wide | grep worker
cilium-f78xr 1/1 Running 0 135m 192.168.10.2 kind-worker <none> <none>
cilium-operator-5946647599-src9x 1/1 Running 0 43h 192.168.10.2 kind-worker <none> <none>
kubectl -n kube-system port-forward pod/cilium-f78xr 6060:6060
Forwarding from 127.0.0.1:6060 -> 6060
Forwarding from [::1]:6060 -> 6060
curl "http://127.0.0.1:6060/debug/pprof/profile?seconds=15" > cpu.pprof
With this setup we can start micro-benchmarking the agent, in the sense that we can create API objects and observe the agent's behavior.
For example, we can stress the agent by creating 200 pods with a parallelism of 50 and obtain pprof dumps of the CPU and memory.
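The stress-pod.yaml manifest is not reproduced here; a minimal sketch of what it could contain, assuming a Job that runs 200 short-lived pods, 50 at a time:
apiVersion: batch/v1
kind: Job
metadata:
  name: stress
spec:
  completions: 200  # total pods to create
  parallelism: 50   # pods running at the same time
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: sleep
        image: busybox
        command: ["sleep", "1"]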
kubectl apply -f stress-pod.yaml
curl "http://127.0.0.1:6060/debug/pprof/heap?seconds=15" > mem.pprof
curl "http://127.0.0.1:6060/debug/pprof/profile?seconds=15" > cpu.pprof
From the graphs we can see that a lot of CPU time and memory is spent processing the conversion to cmaps, and that a considerable amount of time is spent in the GC.
Another useful test is running repetitive operations for a long time to detect memory leaks. This is as simple as creating a loop that performs the same operation over and over and watching the graph of the memory consumed by the agent:
while true; do kubectl run test --image busybox -- sleep 1; sleep 1; kubectl delete pod test; done
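While the loop runs, we can also sample the agent's heap directly from its metrics endpoint instead of the Prometheus console (a sketch, assuming the agent exposes Prometheus metrics on port 9962 and reusing the pod name from above):
kubectl -n kube-system port-forward pod/cilium-f78xr 9962:9962 &
while true; do curl -s http://127.0.0.1:9962/metrics | grep '^go_memstats_heap_inuse_bytes'; sleep 30; done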