root access is required - make sure you are using a sufficiently new tcpdump version that supports the -T flag. This is the version I used for testing:
root@ada-ec-1:/home/ada# tcpdump --version
tcpdump version 4.99.1
libpcap version 1.10.1 (with TPACKET_V3)
OpenSSL 3.0.2 15 Mar 2022
Calico CNI uses VXLAN transport between nodes, encapsulating traffic over UDP port 4789.
There are 3 significant hops on each Kubernetes host:
- the Pod's internal interface
- the vxlan.calico bridged interface on the host
- the primary interface on the host (in my case ens4)
The objective is to run tcpdump at each hop on each node, filtering for a specific kind of traffic for brevity, to prove that traffic is being handled correctly at each hop. A tcpdump process will be started for each hop, and then the Admin Console will be accessed through the browser to generate traffic that can be observed from each hop's perspective.
First, install a 3 node Embedded Cluster. A KOTS application is not required, but helps for creating traffic in the cluster; the KOTS Admin Console alone will generate enough traffic for this test.
Identify the kotsadm pod as well as the coredns pods:
root@ada-ec-1:/home/ada# kubectl get po -A -o wide | grep -E 'coredns|kotsadm'
kotsadm kotsadm-58f859745b-x4nh8 1/1 Running 0 83m 10.244.114.75 ada-ec-1 <none> <none>
kotsadm kotsadm-rqlite-0 1/1 Running 0 84m 10.244.114.74 ada-ec-1 <none> <none>
kotsadm kurl-proxy-kotsadm-6549dc76b4-wtfjf 1/1 Running 1 (84m ago) 84m 10.244.114.73 ada-ec-1 <none> <none>
kube-system coredns-86b5dd7f68-9zdn8 1/1 Running 0 69m 10.244.56.129 ada-ec-2 <none> <none>
kube-system coredns-86b5dd7f68-kp2bk 1/1 Running 0 69m 10.244.106.65 ada-ec-3 <none> <none>
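For scripting the later steps, the pod IP and node columns of the wide output can be pulled out with awk. The sample line below is copied from the listing above; in practice you would pipe the real kubectl output in:

```shell
# With `kubectl get po -A -o wide`, fields 7 and 8 are the pod IP and node
# (for pods whose RESTARTS column is a bare number).
line='kotsadm       kotsadm-58f859745b-x4nh8             1/1     Running     0             83m   10.244.114.75   ada-ec-1   <none>           <none>'
pod_ip=$(echo "$line" | awk '{print $7}')
node=$(echo "$line" | awk '{print $8}')
echo "$pod_ip $node"
```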
Notice: the kotsadm-58f859745b-x4nh8 and coredns-xxxxx pods are all assigned to different nodes. This is what we want for our test procedure. KOTS will generate DNS requests which will be received by the CoreDNS pods, so we would expect to see DNS traffic moving between all 3 pods. We choose to look for DNS traffic because it is essential for the cluster to function, and because DNS packets are conveniently small enough to read as raw packet data. HTTP(S) packets are much larger and require additional tooling like wireshark to trace effectively.
If the pods in your cluster are not scheduled on different nodes, you can delete one of the pods that co-exists on the same node and wait for the scheduler to start a new pod.
Further notice: almost any application can be used for this test instead of kotsadm. kotsadm and CoreDNS have been chosen because they are pre-installed in an Embedded Cluster. For instance, you could deploy a DaemonSet like the following, based on the nicolaka/netshoot image, to ensure that there are pods on each node. Then identify the specific netshoot pod that is not co-located with CoreDNS. The netshoot image provides a number of network troubleshooting tools; for the purpose of this test, using kubectl exec to enter a netshoot pod and issuing nslookup requests would also generate DNS traffic visible in the tcpdump of CoreDNS. The goldpinger application is another good candidate: deploying Goldpinger creates a DaemonSet whose pods continuously contact one another, creating traffic that would be visible in tcpdump.
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - effect: NoExecute
          operator: Exists
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          key: node.kubernetes.io/not-ready
          operator: Exists
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
        - effect: NoSchedule
          key: node.kubernetes.io/disk-pressure
          operator: Exists
        - effect: NoSchedule
          key: node.kubernetes.io/memory-pressure
          operator: Exists
        - effect: NoSchedule
          key: node.kubernetes.io/pid-pressure
          operator: Exists
        - effect: NoSchedule
          key: node.kubernetes.io/unschedulable
          operator: Exists
        - effect: NoSchedule
          key: node.kubernetes.io/network-unavailable
          operator: Exists
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: netshoot
          image: "docker.io/nicolaka/netshoot"
          imagePullPolicy: Always
          securityContext:
            privileged: true
          tty: true
          stdin: true
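To pick a netshoot pod that is not co-located with CoreDNS, compare the node column of each pod. The pod names and node placements below are hypothetical, made up for illustration; in practice you would feed in the name and node columns of kubectl get po -o wide:

```shell
# Hypothetical "pod-name node" pairs as kubectl would report them.
pods='netshoot-aaaaa ada-ec-1
netshoot-bbbbb ada-ec-2
coredns-11111 ada-ec-2
coredns-22222 ada-ec-3'
# Nodes that host a coredns pod
dns_nodes=$(echo "$pods" | awk '/^coredns/ {print $2}')
# netshoot pods scheduled on any other node
echo "$pods" | awk -v skip="$dns_nodes" '/^netshoot/ && index(skip, $2) == 0 {print $1}'
```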
Focusing on pod kotsadm-58f859745b-x4nh8, first we must identify its Calico veth interface (the cali* device) on the host. Each node has a route in its routing table for each pod IP scheduled on that node. With the pod IP 10.244.114.75, identify the Calico interface that matches that IP in the routing table:
ada@ada-ec-1:/home/ada# ip route | grep 10.244.114.75
10.244.114.75 dev cali9d8b6a7d855 scope link
With the Calico interface cali9d8b6a7d855 we can build the first tcpdump process. Execute, replacing <Calico interface> with your device name for the kotsadm pod:
sudo tcpdump -lttttnnvv -i <Calico interface> port 53 -w $(hostname)-pod.pcap --print
This will start dumping DNS traffic to the screen while also writing it to a file named after the machine's hostname (-l line-buffers output, -tttt prints full human-readable timestamps, -nn disables host and port name resolution, -vv increases verbosity, -w writes the capture to file, and --print echoes packets to the screen as well).
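Since the route line always has the form <pod IP> dev <device> scope link, the device name can also be extracted programmatically. The route line below is the sample from the grep above; in practice you would feed in ip route output directly:

```shell
pod_ip=10.244.114.75
# Sample route line from above; in practice: route=$(ip route | grep "$pod_ip")
route='10.244.114.75 dev cali9d8b6a7d855 scope link'
# Print the device when the destination matches the pod IP
dev=$(echo "$route" | awk -v ip="$pod_ip" '$1 == ip && $2 == "dev" {print $3}')
echo "$dev"
```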
Next, locate the vxlan.calico bridge device:
ada@ada-ec-1:/home/ada# ip link | grep calico
3: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
Start a tcpdump process on the vxlan.calico device. The vxlan.calico device name is standardized, so do not expect it to differ in your cluster:
sudo tcpdump -lttttnnvv -i vxlan.calico -w $(hostname)-vxlan-calico.pcap --print
Packets will begin to dump to the screen as well as being written to file. Since filtering inner packets of VXLAN requires manually computing header offsets, this filter is broad and may show more than just DNS traffic.
Finally, capture traffic at the primary network interface on the host.
Identify the primary interface from ip route:
ada@ada-ec-1:~$ ip route
default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.72 metric 100
10.128.0.1 dev ens4 proto dhcp scope link src 10.128.0.72 metric 100
10.244.56.128/26 via 10.244.56.128 dev vxlan.calico onlink
10.244.106.64/26 via 10.244.106.64 dev vxlan.calico onlink
blackhole 10.244.114.64/26 proto 80
10.244.114.65 dev calif4d82a88506 scope link
10.244.114.67 dev cali4f86469bbe0 scope link
10.244.114.68 dev cali225fb7d9728 scope link
10.244.114.69 dev cali545655219e6 scope link
10.244.114.73 dev cali8aa1a704abc scope link
10.244.114.74 dev cali8a9c5dc2e9d scope link
10.244.114.75 dev cali9d8b6a7d855 scope link
169.254.169.254 via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.72 metric 100
Note the device that is being used as the default gateway - this is the interface that traffic will use to reach the other nodes of the cluster:
default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.72 metric 100
NOTICE: If your machine has multiple NICs or is attached to multiple VLANs, you will need to identify the interface that your host uses for cluster networking - it may not necessarily be the default gateway!
In my case the device name is ens4.
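One way to confirm which interface actually handles inter-node traffic, rather than assuming the default gateway, is ip route get <other node IP>, which asks the kernel for the real egress device. A sketch of parsing its output, where the sample line is assumed to match the format shown above:

```shell
# Sample output of: ip route get 10.128.0.95   (another node's IP)
out='10.128.0.95 dev ens4 src 10.128.0.72 uid 0'
# The token following "dev" is the egress interface
iface=$(echo "$out" | awk '{for (i = 1; i < NF; i++) if ($i == "dev") print $(i + 1)}')
echo "$iface"
```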
Set up a third tcpdump process listening for only VXLAN traffic on this interface, replacing <primary interface> with your device name:
sudo tcpdump -lttttnnvv -i <primary interface> -T vxlan port 4789 -w $(hostname)-primary-interface.pcap --print
Packets will be printed to the screen as well as written to file. Again, as with the vxlan.calico interface, we will capture more than just DNS traffic, since a proper inner-packet filter is difficult to write correctly.
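For reference, such a filter can be computed by hand: tcpdump's udp[N] notation indexes from the start of the outer UDP header, and the inner packet sits behind the 8-byte VXLAN header, a 14-byte inner Ethernet header, and a 20-byte inner IPv4 header. The sketch below builds a filter for inner DNS traffic under those assumptions; it is untested against live traffic and will miss packets with IP options or inner IPv6:

```shell
# udp[N] counts bytes from the start of the OUTER UDP header:
#   0..7   outer UDP header (8 bytes)
#   8..15  VXLAN header (8 bytes)
#   16..29 inner Ethernet header (14 bytes)
#   30..49 inner IPv4 header (20 bytes, assuming no IP options)
#   50..   inner UDP header: src port at udp[50:2], dst port at udp[52:2]
inner_udp=$((8 + 8 + 14 + 20))
filter="udp port 4789 and (udp[$inner_udp:2] = 53 or udp[$((inner_udp + 2)):2] = 53)"
echo "$filter"
# Untested sketch: sudo tcpdump -i <primary interface> "$filter"
```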
SSH to Node 2. On Node 2, we have identified one of the coredns pods to target:
ada@ada-ec-2:/home/ada# kubectl get po -A -o wide | grep -E 'coredns|kotsadm' | grep ec-2
kube-system coredns-86b5dd7f68-9zdn8 1/1 Running 0 69m 10.244.56.129 ada-ec-2 <none> <none>
Repeat all three steps from Node 1, substituting the coredns pod for the Pod Capture step:
ada@ada-ec-2:~$ ip route | grep 10.244.56.129
10.244.56.129 dev cali2b903875fe4 scope link
ada@ada-ec-2:~$ sudo tcpdump -lttttnnvv -i cali2b903875fe4 port 53 -w $(hostname)-pod.pcap --print
ada@ada-ec-2:~$ sudo tcpdump -lttttnnvv -i vxlan.calico -w $(hostname)-vxlan-calico.pcap --print
ada@ada-ec-2:~$ ip route
default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.95 metric 100
10.128.0.1 dev ens4 proto dhcp scope link src 10.128.0.95 metric 100
blackhole 10.244.56.128/26 proto 80
10.244.56.129 dev cali2b903875fe4 scope link
10.244.56.131 dev cali03d11f1090d scope link
10.244.56.133 dev cali899b732ead4 scope link
10.244.106.64/26 via 10.244.106.64 dev vxlan.calico onlink
10.244.114.64/26 via 10.244.114.64 dev vxlan.calico onlink
169.254.169.254 via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.95 metric 100
ada@ada-ec-2:~$ sudo tcpdump -lttttnvvn -i ens4 -T vxlan port 4789 -w $(hostname)-primary-interface.pcap --print
SSH to Node 3. Again, repeat all three steps, substituting the final coredns pod for the Pod Capture step:
ada@ada-ec-3:~$ ip route | grep 10.244.106.65
10.244.106.65 dev cali895d469e349 scope link
ada@ada-ec-3:~$ sudo tcpdump -lttttnnvv -i cali895d469e349 port 53 -w $(hostname)-pod.pcap --print
ada@ada-ec-3:~$ sudo tcpdump -lttttnnvv -i vxlan.calico -w $(hostname)-vxlan-calico.pcap --print
ada@ada-ec-3:~$ ip route
default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.96 metric 100
10.128.0.1 dev ens4 proto dhcp scope link src 10.128.0.96 metric 100
10.244.56.128/26 via 10.244.56.128 dev vxlan.calico onlink
blackhole 10.244.106.64/26 proto 80
10.244.106.65 dev cali895d469e349 scope link
10.244.106.67 dev calib457edb7c3e scope link
10.244.106.68 dev calidd95cfb3b52 scope link
10.244.114.64/26 via 10.244.114.64 dev vxlan.calico onlink
169.254.169.254 via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.96 metric 100
ada@ada-ec-3:~$ sudo tcpdump -lttttnnvv -i ens4 -T vxlan port 4789 -w $(hostname)-primary-interface.pcap --print
Bring up the Admin Console in the browser, and navigate through tabs, refresh the page, etc. in order to generate traffic in the cluster.
If you are using something other than kotsadm, a decent way to generate the right traffic is to use the nslookup or dig command in a shell to perform DNS requests:
nslookup kubernetes.default
dig kubernetes.default
nslookup kubernetes.default.svc.cluster.local
dig kubernetes.default.svc.cluster.local
Any valid Service name can be used here:
root@ada-ec-1:/home/ada# kubectl get services -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 162m
kotsadm kotsadm ClusterIP 10.99.230.140 <none> 3000/TCP 160m
kotsadm kotsadm-rqlite ClusterIP 10.97.116.105 <none> 4001/TCP 160m
kotsadm kotsadm-rqlite-headless ClusterIP None <none> 4001/TCP 160m
kotsadm kurl-proxy-kotsadm NodePort 10.101.80.185 <none> 8800:30000/TCP 160m
kotsadm postgres ClusterIP 10.105.80.214 <none> 5432/TCP 144m
kotsadm replicated ClusterIP 10.96.56.26 <none> 3000/TCP 144m
kotsadm slackernews ClusterIP 10.106.0.218 <none> 3000/TCP 144m
kotsadm slackernews-nginx NodePort 10.102.10.158 <none> 443:443/TCP 144m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 162m
kube-system metrics-server ClusterIP 10.99.252.80 <none> 443/TCP 162m
nslookup kotsadm.kotsadm.svc.cluster.local
dig slackernews.kotsadm.svc.cluster.local
nslookup kube-dns.kube-system.svc.cluster.local
dig metrics-server.kube-system.svc.cluster.local
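The fully qualified name is always <service>.<namespace>.svc.cluster.local (assuming the default cluster domain), so lookups can be generated for any set of services. The namespace/service pairs below are taken from the listing above:

```shell
# Build the in-cluster DNS name for each namespace/service pair; on a
# cluster node, each printed name could then be fed to nslookup or dig.
while read -r ns svc; do
  echo "$svc.$ns.svc.cluster.local"
done <<'EOF'
default kubernetes
kotsadm kotsadm-rqlite
kube-system kube-dns
EOF
```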
From the 9 tcpdump processes that are running, we can trace the traffic from the origin pod -> vxlan bridge -> host -> destination host -> vxlan bridge -> pod. If any hop is not serving traffic, it should be immediately obvious by the fact that packets are not being printed to the screen. We can further analyze an individual packet to see how it's being handled:
# kotsadm pod capture
2024-10-09 22:08:48.793105 IP (tos 0x0, ttl 64, id 33878, offset 0, flags [DF], proto UDP (17), length 97)
10.244.114.75.41102 > 10.96.0.10.53: [bad udp cksum 0x8807 -> 0x8cdd!] 3321+ [1au] AAAA? kotsadm-rqlite.kotsadm.svc.cluster.local. ar: . OPT UDPsize=1232 (69)
2024-10-09 22:08:48.793112 IP (tos 0x0, ttl 64, id 41852, offset 0, flags [DF], proto UDP (17), length 97)
10.244.114.75.39862 > 10.96.0.10.53: [bad udp cksum 0x8807 -> 0x9bc4!] 773+ [1au] A? kotsadm-rqlite.kotsadm.svc.cluster.local. ar: . OPT UDPsize=1232 (69)
2024-10-09 22:08:48.793628 IP (tos 0x0, ttl 62, id 65403, offset 0, flags [DF], proto UDP (17), length 190)
10.96.0.10.53 > 10.244.114.75.41102: [udp sum ok] 3321*- q: AAAA? kotsadm-rqlite.kotsadm.svc.cluster.local. 0/1/1 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1728502735 7200 1800 86400 30 ar: . OPT UDPsize=1232 (162)
2024-10-09 22:08:48.793693 IP (tos 0x0, ttl 62, id 65404, offset 0, flags [DF], proto UDP (17), length 153)
10.96.0.10.53 > 10.244.114.75.39862: [udp sum ok] 773*- q: A? kotsadm-rqlite.kotsadm.svc.cluster.local. 1/0/1 kotsadm-rqlite.kotsadm.svc.cluster.local. A 10.97.116.105 ar: . OPT UDPsize=1232 (125)
From this packet capture, we can see 2 requests being made, one A request and one AAAA request, and both requests are answered by the DNS service. (The bad udp cksum on the outgoing requests is an artifact of checksum offloading and can be ignored.) The origin of the request is 10.244.114.75.41102 > 10.96.0.10.53, and we know 10.244.114.75 is the IP of the kotsadm pod from a previous step:
root@ada-ec-1:/home/ada# kubectl get po -A -o wide | grep -E 'coredns|kotsadm'
kotsadm kotsadm-58f859745b-x4nh8 1/1 Running 0 83m 10.244.114.75 ada-ec-1 <none> <none>
We know that the destination IP is the CoreDNS service, as well:
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 162m
And we can confirm a response from CoreDNS from the return trip:
2024-10-09 22:08:48.793693 IP (tos 0x0, ttl 62, id 65404, offset 0, flags [DF], proto UDP (17), length 153)
10.96.0.10.53 > 10.244.114.75.39862: [udp sum ok] 773*- q: A? kotsadm-rqlite.kotsadm.svc.cluster.local. 1/0/1 kotsadm-rqlite.kotsadm.svc.cluster.local. A 10.97.116.105 ar: . OPT UDPsize=1232 (125)
Here we see the responding IP: 10.96.0.10.53 > 10.244.114.75.39862. A response from the CoreDNS Service indicates success.
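The src > dst notation can also be split apart in a script when scanning saved captures. The sample line is shortened from the response above; the last dot-delimited field of each endpoint is the port:

```shell
# A (shortened) response line from the capture above
pkt='10.96.0.10.53 > 10.244.114.75.39862: [udp sum ok]'
src=$(echo "$pkt" | awk '{print $1}')
dst=$(echo "$pkt" | awk '{sub(/:$/, "", $3); print $3}')
# Split "IP.port" on the final dot
src_ip=${src%.*}
src_port=${src##*.}
echo "$src_ip:$src_port -> $dst"
```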
Also observe from the primary interface captures that there are regular updates to and from port 10250, the Kubelet's port - this also implies that nodes are able to communicate with each other over the container network:
2024-10-09 23:52:04.177274 IP (tos 0x0, ttl 64, id 8855, offset 0, flags [none], proto UDP (17), length 102)
10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56928, offset 0, flags [DF], proto TCP (6), length 52)
10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0xa33a (correct), seq 18754, ack 218896, win 9687, options [nop,nop,TS val 1045132544 ecr 3731670274], length 0
2024-10-09 23:52:05.136048 IP (tos 0x0, ttl 64, id 9384, offset 0, flags [none], proto UDP (17), length 143)
10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56929, offset 0, flags [DF], proto TCP (6), length 93)
10.244.106.64.19922 > 10.244.114.65.10250: Flags [P.], cksum 0x04ee (correct), seq 18754:18795, ack 218896, win 9687, options [nop,nop,TS val 1045133502 ecr 3731670274], length 41
2024-10-09 23:52:05.137453 IP (tos 0x0, ttl 64, id 39582, offset 0, flags [none], proto UDP (17), length 189)
10.128.0.72.55693 > 10.128.0.96.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 23722, offset 0, flags [DF], proto TCP (6), length 139)
10.244.114.65.10250 > 10.244.106.64.19922: Flags [P.], cksum 0x86f7 (correct), seq 218896:218983, ack 18795, win 494, options [nop,nop,TS val 3731671275 ecr 1045133502], length 87
2024-10-09 23:52:05.137515 IP (tos 0x0, ttl 64, id 39583, offset 0, flags [none], proto UDP (17), length 1232)
10.128.0.72.55693 > 10.128.0.96.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 23723, offset 0, flags [DF], proto TCP (6), length 1182)
10.244.114.65.10250 > 10.244.106.64.19922: Flags [P.], cksum 0x71b6 (correct), seq 218983:220113, ack 18795, win 494, options [nop,nop,TS val 3731671275 ecr 1045133502], length 1130
2024-10-09 23:52:05.137655 IP (tos 0x0, ttl 64, id 9385, offset 0, flags [none], proto UDP (17), length 102)
10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56930, offset 0, flags [DF], proto TCP (6), length 52)
10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0x9b11 (correct), seq 18795, ack 218983, win 9687, options [nop,nop,TS val 1045133504 ecr 3731671275], length 0
2024-10-09 23:52:05.137694 IP (tos 0x0, ttl 64, id 9386, offset 0, flags [none], proto UDP (17), length 102)
10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56931, offset 0, flags [DF], proto TCP (6), length 52)
10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0x9691 (correct), seq 18795, ack 220113, win 9709, options [nop,nop,TS val 1045133504 ecr 3731671275], length 0
2024-10-09 23:52:05.490628 IP (tos 0x0, ttl 64, id 9686, offset 0, flags [none], proto UDP (17), length 142)
10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56932, offset 0, flags [DF], proto TCP (6), length 92)
10.244.106.64.19922 > 10.244.114.65.10250: Flags [P.], cksum 0x7af8 (correct), seq 18795:18835, ack 220113, win 9709, options [nop,nop,TS val 1045133857 ecr 3731671275], length 40
2024-10-09 23:52:05.492010 IP (tos 0x0, ttl 64, id 39761, offset 0, flags [none], proto UDP (17), length 170)
10.128.0.72.55693 > 10.128.0.96.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 23724, offset 0, flags [DF], proto TCP (6), length 120)
10.244.114.65.10250 > 10.244.106.64.19922: Flags [P.], cksum 0xb1ce (correct), seq 220113:220181, ack 18835, win 494, options [nop,nop,TS val 3731671629 ecr 1045133857], length 68
2024-10-09 23:52:05.492056 IP (tos 0x0, ttl 64, id 39762, offset 0, flags [none], proto UDP (17), length 954)
10.128.0.72.55693 > 10.128.0.96.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 23725, offset 0, flags [DF], proto TCP (6), length 904)
10.244.114.65.10250 > 10.244.106.64.19922: Flags [P.], cksum 0xba2a (correct), seq 220181:221033, ack 18835, win 494, options [nop,nop,TS val 3731671629 ecr 1045133857], length 852
2024-10-09 23:52:05.492188 IP (tos 0x0, ttl 64, id 9687, offset 0, flags [none], proto UDP (17), length 102)
10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56933, offset 0, flags [DF], proto TCP (6), length 52)
10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0x9361 (correct), seq 18835, ack 220181, win 9709, options [nop,nop,TS val 1045133858 ecr 3731671629], length 0
2024-10-09 23:52:05.492213 IP (tos 0x0, ttl 64, id 9688, offset 0, flags [none], proto UDP (17), length 102)
10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56934, offset 0, flags [DF], proto TCP (6), length 52)
10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0x8ff6 (correct), seq 18835, ack 221033, win 9731, options [nop,nop,TS val 1045133859 ecr 3731671629], length 0
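Note the vni 4096 in the outer headers: 4096 is, to my knowledge, Calico's default VXLAN Network Identifier (the vxlanVNI setting in FelixConfiguration), so seeing it here is a further hint that this is Calico-encapsulated traffic. The VNI is the last field of the outer header line and can be pulled out like so:

```shell
# Outer header line from the capture above; the VNI is the final field
vx='10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096'
vni=$(echo "$vx" | awk '{print $NF}')
echo "$vni"
```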