Instrumented Calico VXLAN test procedure

Prerequisites

  • root access is required
  • make sure you are using a sufficiently new tcpdump version that supports the -T flag.  This is the version I used for testing:
root@ada-ec-1:/home/ada# tcpdump --version
tcpdump version 4.99.1
libpcap version 1.10.1 (with TPACKET_V3)
OpenSSL 3.0.2 15 Mar 2022

Background

Calico CNI uses VXLAN transport between nodes, encapsulating traffic over UDP port 4789.

There are 3 significant hops on each Kubernetes host: 

  1. the Pod's internal interface 
  2. the vxlan.calico bridged interface on the host 
  3. the primary interface on the host (in my case ens4)

The objective is to use tcpdump at each hop on each node, filtering for a specific kind of traffic for brevity, as a means to prove that traffic is being handled correctly at each hop.

A tcpdump process will be started for each hop, and then the Admin Console will be accessed through the browser as a means to generate traffic that can be viewed from each hop perspective.
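
Before starting, you can confirm the VXLAN parameters on any node by inspecting the vxlan.calico device; the -d flag prints driver details, including the VXLAN id (the VNI, 4096 by default for Calico, which matches the vni 4096 visible in the encapsulated captures later in this document) and the UDP destination port 4789:

ip -d link show vxlan.calico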

Procedure

First, install a 3-node Embedded Cluster.  A KOTS application is not required but helps for generating traffic in the cluster; the KOTS Admin Console alone will generate enough traffic for this test.

Identify the kotsadm pod as well as the coredns pods:


root@ada-ec-1:/home/ada# kubectl get po -A -o wide | grep -E 'coredns|kotsadm'
kotsadm            kotsadm-58f859745b-x4nh8                      1/1     Running            0             83m   10.244.114.75   ada-ec-1   <none>           <none>
kotsadm            kotsadm-rqlite-0                              1/1     Running            0             84m   10.244.114.74   ada-ec-1   <none>           <none>
kotsadm            kurl-proxy-kotsadm-6549dc76b4-wtfjf           1/1     Running            1 (84m ago)   84m   10.244.114.73   ada-ec-1   <none>           <none>
kube-system        coredns-86b5dd7f68-9zdn8                      1/1     Running            0             69m   10.244.56.129   ada-ec-2   <none>           <none>
kube-system        coredns-86b5dd7f68-kp2bk                      1/1     Running            0             69m   10.244.106.65   ada-ec-3   <none>           <none>

Notice:  kotsadm-58f859745b-x4nh8 and the coredns-xxxxx pods are each assigned to a different node. This is what we want for our test procedure. KOTS generates DNS requests that are received by the CoreDNS pods, so we expect to see DNS traffic moving between all 3 pods. We choose to look for DNS traffic because it is essential for the cluster to function, and the packets are conveniently small enough to read as raw data. HTTP(S) packets are much larger and require additional tooling like Wireshark to trace effectively.

If the pods in your cluster are not scheduled on different nodes, you can delete one of the co-located pods and wait for the scheduler to start a replacement.
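
For example, with the pod names from the output above, deleting one of the coredns pods causes the scheduler to create a replacement, which may land on a different node (re-run the kubectl get command above to verify):

kubectl -n kube-system delete pod coredns-86b5dd7f68-9zdn8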

Further Notice: almost any application can be used for this test instead of kotsadm; kotsadm and CoreDNS were chosen because they are pre-installed in an Embedded Cluster. For instance, you could deploy a DaemonSet like the following, based on the nicolaka/netshoot image, to ensure that there is a pod on each node, then identify a netshoot pod that is not co-located with CoreDNS. The netshoot image provides a number of network troubleshooting tools, and for the purpose of this test, using kubectl exec to enter a netshoot pod and issue nslookup requests will generate DNS traffic that is visible in the tcpdump of CoreDNS (see the sketch after the manifest). The goldpinger application is another good candidate: deploying Goldpinger creates a DaemonSet whose pods continuously contact one another, generating traffic that is visible in tcpdump.

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/disk-pressure
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/memory-pressure
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/pid-pressure
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/unschedulable
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/network-unavailable
        operator: Exists
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: netshoot
          image: "docker.io/nicolaka/netshoot"
          imagePullPolicy: Always
          securityContext:
            privileged: true
          tty: true
          stdin: true
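
As a sketch of how the netshoot approach works (the manifest filename here is a placeholder, and you should substitute your actual pod name in the exec command), apply the DaemonSet, find a pod that is not co-located with CoreDNS, and issue DNS queries from inside it:

kubectl apply -f netshoot-daemonset.yaml
kubectl get pods -l app=netshoot -o wide
kubectl exec -it <netshoot pod> -- nslookup kubernetes.default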

Node 1

Pod capture

Focusing on pod kotsadm-58f859745b-x4nh8, we must first identify the pod's Calico veth interface (the cali* device) on the host.

Each node's routing table contains a host route for every pod IP scheduled on that node.

With the pod IP 10.244.114.75, identify the Calico veth interface that matches that IP in the routing table:

ada@ada-ec-1:/home/ada# ip route | grep 10.244.114.75
10.244.114.75 dev cali9d8b6a7d855 scope link

With the Calico veth interface cali9d8b6a7d855 we can build the first tcpdump process.

Execute the following, replacing <Calico veth> with the veth interface for your kotsadm pod:

sudo tcpdump -lttttnnvv -i <Calico veth> port 53 -w $(hostname)-pod.pcap --print

This will start printing DNS traffic to the screen while also writing it to a file named with the machine's hostname as an identifier.
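
The veth lookup can also be scripted rather than read from the routing table by hand. A minimal sketch, assuming the kotsadm pod name from earlier (adjust the namespace and pod name for your cluster):

POD_IP=$(kubectl -n kotsadm get pod kotsadm-58f859745b-x4nh8 -o jsonpath='{.status.podIP}')
VETH=$(ip route | awk -v ip="$POD_IP" '$1 == ip { print $3 }')
sudo tcpdump -lttttnnvv -i "$VETH" port 53 -w "$(hostname)-pod.pcap" --print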

VXLAN Bridge capture

Next, locate the vxlan.calico bridge device:

ada@ada-ec-1:/home/ada# ip link | grep calico
3: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000

Start a tcpdump process on the vxlan.calico device.  The vxlan.calico device name is standardized, so do not expect this to change in your cluster:

sudo tcpdump -lttttnnvv -i vxlan.calico -w $(hostname)-vxlan-calico.pcap --print

Packets will be printed to the screen and written to file.  Since filtering for the inner packets of VXLAN traffic requires manually computing header offsets, no port filter is applied here, and this capture will show more than just DNS traffic.

Host primary interface capture

Finally, capture traffic at the primary network interface on the host. Identify the primary interface from ip route:

ada@ada-ec-1:~$ ip route
default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.72 metric 100
10.128.0.1 dev ens4 proto dhcp scope link src 10.128.0.72 metric 100
10.244.56.128/26 via 10.244.56.128 dev vxlan.calico onlink
10.244.106.64/26 via 10.244.106.64 dev vxlan.calico onlink
blackhole 10.244.114.64/26 proto 80
10.244.114.65 dev calif4d82a88506 scope link
10.244.114.67 dev cali4f86469bbe0 scope link
10.244.114.68 dev cali225fb7d9728 scope link
10.244.114.69 dev cali545655219e6 scope link
10.244.114.73 dev cali8aa1a704abc scope link
10.244.114.74 dev cali8a9c5dc2e9d scope link
10.244.114.75 dev cali9d8b6a7d855 scope link
169.254.169.254 via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.72 metric 100

Note the device used by the default route - this is the interface that traffic will use to reach the other nodes of the cluster:

default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.72 metric 100

NOTICE:  If your machine has multiple NICs or is attached to multiple VLANs, you will need to identify the interface that your host uses for cluster networking - it may not necessarily be the default gateway!
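
One way to identify the correct interface is to ask the kernel which route it would take to reach another cluster node; the dev field in the output is the interface to capture on. A sketch, using Node 2's address from this environment:

ip route get 10.128.0.95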

In my case the device name is ens4.

Set up a third tcpdump process listening for only VXLAN traffic on this interface, replacing <primary interface> with your device name:

sudo tcpdump -lttttnnvv -i <primary interface> -T vxlan port 4789 -w $(hostname)-primary-interface.pcap --print

Packets will be printed to screen as well as written to file.  Again, as with the vxlan.calico interface, we will capture more than just DNS traffic, since a filter that reaches inside the encapsulation is very difficult to write correctly.
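
When analyzing the saved file later, a tool that understands VXLAN can recover the inner DNS packets without hand-written offsets. For example, if tshark (the Wireshark CLI) is available, it can decode the encapsulation and apply a display filter to the file written above (an optional aside, not part of the capture procedure):

tshark -r $(hostname)-primary-interface.pcap -d udp.port==4789,vxlan -Y dns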

Node 2

SSH to Node 2.  On node 2, we have identified one of the coredns pods to target:

ada@ada-ec-2:/home/ada# kubectl get po -A -o wide | grep -E 'coredns|kotsadm' | grep ec-2
kube-system        coredns-86b5dd7f68-9zdn8                      1/1     Running            0             69m   10.244.56.129   ada-ec-2   <none>           <none>

Repeat all three steps from Node 1, substituting the coredns pod for the Pod Capture step:

Pod capture

ada@ada-ec-2:~$ ip route | grep 10.244.56.129
10.244.56.129 dev cali2b903875fe4 scope link

ada@ada-ec-2:~$ sudo tcpdump -lttttnnvv -i cali2b903875fe4 port 53 -w $(hostname)-pod.pcap --print

VXLAN Bridge capture

ada@ada-ec-2:~$ sudo tcpdump -lttttnnvv -i vxlan.calico -w $(hostname)-vxlan-calico.pcap --print

Host primary interface capture

ada@ada-ec-2:~$ ip route
default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.95 metric 100
10.128.0.1 dev ens4 proto dhcp scope link src 10.128.0.95 metric 100
blackhole 10.244.56.128/26 proto 80
10.244.56.129 dev cali2b903875fe4 scope link
10.244.56.131 dev cali03d11f1090d scope link
10.244.56.133 dev cali899b732ead4 scope link
10.244.106.64/26 via 10.244.106.64 dev vxlan.calico onlink
10.244.114.64/26 via 10.244.114.64 dev vxlan.calico onlink
169.254.169.254 via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.95 metric 100

ada@ada-ec-2:~$ sudo tcpdump -lttttnvvn -i ens4 -T vxlan port 4789 -w $(hostname)-primary-interface.pcap --print

Node 3

SSH to Node 3.  Repeat the same three steps once more, substituting the remaining coredns pod for the Pod capture step:

Pod capture

ada@ada-ec-3:~$ ip route | grep 10.244.106.65
10.244.106.65 dev cali895d469e349 scope link

ada@ada-ec-3:~$ sudo tcpdump -lttttnnvv -i cali895d469e349 port 53 -w $(hostname)-pod.pcap --print

VXLAN Bridge capture

ada@ada-ec-3:~$ sudo tcpdump -lttttnnvv -i vxlan.calico -w $(hostname)-vxlan-calico.pcap --print

Host primary interface capture

ada@ada-ec-3:~$ ip route
default via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.96 metric 100
10.128.0.1 dev ens4 proto dhcp scope link src 10.128.0.96 metric 100
10.244.56.128/26 via 10.244.56.128 dev vxlan.calico onlink
blackhole 10.244.106.64/26 proto 80
10.244.106.65 dev cali895d469e349 scope link
10.244.106.67 dev calib457edb7c3e scope link
10.244.106.68 dev calidd95cfb3b52 scope link
10.244.114.64/26 via 10.244.114.64 dev vxlan.calico onlink
169.254.169.254 via 10.128.0.1 dev ens4 proto dhcp src 10.128.0.96 metric 100

ada@ada-ec-3:~$ sudo tcpdump -lttttnnvv -i ens4 -T vxlan port 4789 -w $(hostname)-primary-interface.pcap --print

Generate traffic

From the Admin Console

Bring up the Admin Console in the browser, and navigate through tabs, refresh the page, etc. in order to generate traffic in the cluster.

If you are using something other than kotsadm, a reliable way to generate the right traffic is to issue DNS requests with the nslookup or dig command from a shell:

nslookup kubernetes.default
dig kubernetes.default

nslookup kubernetes.default.svc.cluster.local
dig kubernetes.default.svc.cluster.local

Any valid Service name can be used here:

root@ada-ec-1:/home/ada# kubectl get services -A
NAMESPACE     NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes                ClusterIP   10.96.0.1       <none>        443/TCP                  162m
kotsadm       kotsadm                   ClusterIP   10.99.230.140   <none>        3000/TCP                 160m
kotsadm       kotsadm-rqlite            ClusterIP   10.97.116.105   <none>        4001/TCP                 160m
kotsadm       kotsadm-rqlite-headless   ClusterIP   None            <none>        4001/TCP                 160m
kotsadm       kurl-proxy-kotsadm        NodePort    10.101.80.185   <none>        8800:30000/TCP           160m
kotsadm       postgres                  ClusterIP   10.105.80.214   <none>        5432/TCP                 144m
kotsadm       replicated                ClusterIP   10.96.56.26     <none>        3000/TCP                 144m
kotsadm       slackernews               ClusterIP   10.106.0.218    <none>        3000/TCP                 144m
kotsadm       slackernews-nginx         NodePort    10.102.10.158   <none>        443:443/TCP              144m
kube-system   kube-dns                  ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   162m
kube-system   metrics-server            ClusterIP   10.99.252.80    <none>        443/TCP                  162m

nslookup kotsadm.kotsadm.svc.cluster.local
dig slackernews.kotsadm.svc.cluster.local

nslookup kube-dns.kube-system.svc.cluster.local
dig metrics-server.kube-system.svc.cluster.local
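
To generate a steady stream of DNS traffic rather than one-off queries, a simple shell loop also works; run it on a node or inside a pod and interrupt it with Ctrl-C when done (a sketch):

while true; do nslookup kubernetes.default.svc.cluster.local; sleep 1; done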

Analysis

From the 9 running tcpdump processes, we can trace traffic from the origin pod -> vxlan bridge -> host -> destination host -> vxlan bridge -> destination pod. If any hop is not passing traffic, it will be immediately obvious, because packets will stop appearing in that capture. We can further analyze an individual packet to see how it is being handled:

# kotsadm pod capture

2024-10-09 22:08:48.793105 IP (tos 0x0, ttl 64, id 33878, offset 0, flags [DF], proto UDP (17), length 97)
    10.244.114.75.41102 > 10.96.0.10.53: [bad udp cksum 0x8807 -> 0x8cdd!] 3321+ [1au] AAAA? kotsadm-rqlite.kotsadm.svc.cluster.local. ar: . OPT UDPsize=1232 (69)
2024-10-09 22:08:48.793112 IP (tos 0x0, ttl 64, id 41852, offset 0, flags [DF], proto UDP (17), length 97)
    10.244.114.75.39862 > 10.96.0.10.53: [bad udp cksum 0x8807 -> 0x9bc4!] 773+ [1au] A? kotsadm-rqlite.kotsadm.svc.cluster.local. ar: . OPT UDPsize=1232 (69)
2024-10-09 22:08:48.793628 IP (tos 0x0, ttl 62, id 65403, offset 0, flags [DF], proto UDP (17), length 190)
    10.96.0.10.53 > 10.244.114.75.41102: [udp sum ok] 3321*- q: AAAA? kotsadm-rqlite.kotsadm.svc.cluster.local. 0/1/1 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1728502735 7200 1800 86400 30 ar: . OPT UDPsize=1232 (162)
2024-10-09 22:08:48.793693 IP (tos 0x0, ttl 62, id 65404, offset 0, flags [DF], proto UDP (17), length 153)
    10.96.0.10.53 > 10.244.114.75.39862: [udp sum ok] 773*- q: A? kotsadm-rqlite.kotsadm.svc.cluster.local. 1/0/1 kotsadm-rqlite.kotsadm.svc.cluster.local. A 10.97.116.105 ar: . OPT UDPsize=1232 (125)

From this capture, we can see two requests being made, one A query and one AAAA query, and both are answered by the DNS service. The source of the requests is 10.244.114.75.41102 > 10.96.0.10.53, where 10.244.114.75 is the IP of the kotsadm pod from a previous step:

root@ada-ec-1:/home/ada# kubectl get po -A -o wide | grep -E 'coredns|kotsadm'
kotsadm            kotsadm-58f859745b-x4nh8                      1/1     Running            0             83m   10.244.114.75   ada-ec-1   <none>           <none>

We also know that the destination IP 10.96.0.10 is the cluster DNS Service:

kube-system   kube-dns                  ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   162m

And we can confirm a response from CoreDNS from the return trip:

2024-10-09 22:08:48.793693 IP (tos 0x0, ttl 62, id 65404, offset 0, flags [DF], proto UDP (17), length 153)
    10.96.0.10.53 > 10.244.114.75.39862: [udp sum ok] 773*- q: A? kotsadm-rqlite.kotsadm.svc.cluster.local. 1/0/1 kotsadm-rqlite.kotsadm.svc.cluster.local. A 10.97.116.105 ar: . OPT UDPsize=1232 (125)

Here we see the responding IP: 10.96.0.10.53 > 10.244.114.75.39862. A response from the CoreDNS Service indicates success.
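
Because each tcpdump process also wrote a pcap file, the same analysis can be repeated offline, which is helpful when comparing hops side by side. For example, to replay the pod-level capture with the same verbosity used during the live capture:

sudo tcpdump -lttttnnvv -r $(hostname)-pod.pcap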

Also observe from the host primary interface captures that there are regular exchanges to/from port 10250, the kubelet's port - this also implies that nodes are able to communicate with each other over the container network:

2024-10-09 23:52:04.177274 IP (tos 0x0, ttl 64, id 8855, offset 0, flags [none], proto UDP (17), length 102)
   10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56928, offset 0, flags [DF], proto TCP (6), length 52)
   10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0xa33a (correct), seq 18754, ack 218896, win 9687, options [nop,nop,TS val 1045132544 ecr 3731670274], length 0
2024-10-09 23:52:05.136048 IP (tos 0x0, ttl 64, id 9384, offset 0, flags [none], proto UDP (17), length 143)
   10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56929, offset 0, flags [DF], proto TCP (6), length 93)
   10.244.106.64.19922 > 10.244.114.65.10250: Flags [P.], cksum 0x04ee (correct), seq 18754:18795, ack 218896, win 9687, options [nop,nop,TS val 1045133502 ecr 3731670274], length 41
2024-10-09 23:52:05.137453 IP (tos 0x0, ttl 64, id 39582, offset 0, flags [none], proto UDP (17), length 189)
   10.128.0.72.55693 > 10.128.0.96.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 23722, offset 0, flags [DF], proto TCP (6), length 139)
   10.244.114.65.10250 > 10.244.106.64.19922: Flags [P.], cksum 0x86f7 (correct), seq 218896:218983, ack 18795, win 494, options [nop,nop,TS val 3731671275 ecr 1045133502], length 87
2024-10-09 23:52:05.137515 IP (tos 0x0, ttl 64, id 39583, offset 0, flags [none], proto UDP (17), length 1232)
   10.128.0.72.55693 > 10.128.0.96.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 23723, offset 0, flags [DF], proto TCP (6), length 1182)
   10.244.114.65.10250 > 10.244.106.64.19922: Flags [P.], cksum 0x71b6 (correct), seq 218983:220113, ack 18795, win 494, options [nop,nop,TS val 3731671275 ecr 1045133502], length 1130
2024-10-09 23:52:05.137655 IP (tos 0x0, ttl 64, id 9385, offset 0, flags [none], proto UDP (17), length 102)
   10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56930, offset 0, flags [DF], proto TCP (6), length 52)
   10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0x9b11 (correct), seq 18795, ack 218983, win 9687, options [nop,nop,TS val 1045133504 ecr 3731671275], length 0
2024-10-09 23:52:05.137694 IP (tos 0x0, ttl 64, id 9386, offset 0, flags [none], proto UDP (17), length 102)
   10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56931, offset 0, flags [DF], proto TCP (6), length 52)
   10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0x9691 (correct), seq 18795, ack 220113, win 9709, options [nop,nop,TS val 1045133504 ecr 3731671275], length 0
2024-10-09 23:52:05.490628 IP (tos 0x0, ttl 64, id 9686, offset 0, flags [none], proto UDP (17), length 142)
   10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56932, offset 0, flags [DF], proto TCP (6), length 92)
   10.244.106.64.19922 > 10.244.114.65.10250: Flags [P.], cksum 0x7af8 (correct), seq 18795:18835, ack 220113, win 9709, options [nop,nop,TS val 1045133857 ecr 3731671275], length 40
2024-10-09 23:52:05.492010 IP (tos 0x0, ttl 64, id 39761, offset 0, flags [none], proto UDP (17), length 170)
   10.128.0.72.55693 > 10.128.0.96.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 23724, offset 0, flags [DF], proto TCP (6), length 120)
   10.244.114.65.10250 > 10.244.106.64.19922: Flags [P.], cksum 0xb1ce (correct), seq 220113:220181, ack 18835, win 494, options [nop,nop,TS val 3731671629 ecr 1045133857], length 68
2024-10-09 23:52:05.492056 IP (tos 0x0, ttl 64, id 39762, offset 0, flags [none], proto UDP (17), length 954)
   10.128.0.72.55693 > 10.128.0.96.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 23725, offset 0, flags [DF], proto TCP (6), length 904)
   10.244.114.65.10250 > 10.244.106.64.19922: Flags [P.], cksum 0xba2a (correct), seq 220181:221033, ack 18835, win 494, options [nop,nop,TS val 3731671629 ecr 1045133857], length 852
2024-10-09 23:52:05.492188 IP (tos 0x0, ttl 64, id 9687, offset 0, flags [none], proto UDP (17), length 102)
   10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56933, offset 0, flags [DF], proto TCP (6), length 52)
   10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0x9361 (correct), seq 18835, ack 220181, win 9709, options [nop,nop,TS val 1045133858 ecr 3731671629], length 0
2024-10-09 23:52:05.492213 IP (tos 0x0, ttl 64, id 9688, offset 0, flags [none], proto UDP (17), length 102)
   10.128.0.96.33307 > 10.128.0.72.4789: VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 56934, offset 0, flags [DF], proto TCP (6), length 52)
   10.244.106.64.19922 > 10.244.114.65.10250: Flags [.], cksum 0x8ff6 (correct), seq 18835, ack 221033, win 9731, options [nop,nop,TS val 1045133859 ecr 3731671629], length 0