Running a multi-site Kubernetes cluster on TalosOS with Cilium as the CNI, we needed:
- Inter-node encryption — all traffic between nodes encrypted (nodes communicate over public internet)
- Egress Gateway — specific pods' external traffic routed through a gateway node for geo-IP requirements
- Talos host firewall: NetworkDefaultActionConfig (ingress: block) for node-level security
These three requirements created an "impossible triangle":
With Cilium's own WireGuard encryption enabled, Cilium installs CT --notrack iptables rules (in the raw table) for WireGuard-decrypted traffic (mark 0x0D00), so return packets are UNTRACKED in the kernel's conntrack. Talos's nftables ingress chain has ct state established,related → accept, which doesn't match UNTRACKED packets, and the cilium_host interface (where decrypted traffic enters the host stack) isn't in Talos's hardcoded interface whitelist (lo, siderolink, kubespan). Result: TCP SYN-ACK replies are silently dropped by the Talos firewall.
Confirmed with pwru kernel tracing:
nft_do_chain → sk_skb_reason_drop(SKB_DROP_REASON_NETFILTER_DROP)
KubeSpan (Talos's built-in WireGuard mesh) is incompatible with Cilium's bpf.masquerade: true. Cilium's BPF SNAT runs on eth0 egress, but KubeSpan's nftables OUTPUT chain intercepts packets and routes them through the kubespan WireGuard interface. Return traffic arrives on kubespan, not eth0, so Cilium's BPF reverse-SNAT never fires. TCP connections hang because SYN-ACK packets are never un-SNATed back to the pod IP. (siderolabs/talos#11235)
Cilium Egress Gateway has a hard requirement for bpf.masquerade: true (validated in pkg/egressgateway/manager.go). KubeSpan requires disabling BPF masquerade. Therefore: KubeSpan and Egress Gateway cannot coexist... or so we thought.
bpf.hostLegacyRouting: true breaks the deadlock:

    # Cilium HelmRelease values
    encryption:
      enabled: false           # KubeSpan handles encryption instead
    bpf:
      masquerade: true         # Required for Egress Gateway (KEPT)
      hostLegacyRouting: true  # Fixes KubeSpan + BPF masquerade conflict
    egressGateway:
      enabled: true            # Works because bpf.masquerade is still true
    bandwidthManager:
      bbr: false               # BBR requires BPF host routing, incompatible with legacy
      enabled: true            # EDT bandwidth manager still works with CUBIC

    # Talos machine config
    machine:
      network:
        kubespan:
          enabled: true        # Encrypts all inter-node traffic

- bpf.hostLegacyRouting: true forces packets through the kernel networking stack instead of BPF direct-redirect between interfaces. KubeSpan's nftables rules and kernel conntrack can properly track connections: both SNAT and reverse-SNAT happen in the netfilter framework, not split between BPF and netfilter.
- bpf.masquerade: true is still set, satisfying Egress Gateway's hard requirement. The BPF masquerade SNAT still runs on eth0 egress for traffic that reaches eth0 (like Egress Gateway's VXLAN-tunneled traffic to the gateway node).
- Egress Gateway works because its traffic flow is: pod → cil_to_netdev on eth0 → BPF redirects via VXLAN to the gateway node → gateway node does SNAT. The VXLAN outer packet goes through KubeSpan (encrypted), but the Egress Gateway BPF logic runs before KubeSpan intercepts.
- KubeSpan's kubespan interface IS in Talos's hardcoded firewall whitelist: all traffic arriving on kubespan bypasses the nftables ingress chain entirely. No NOTRACK conflict.
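
The constraints described so far can be captured in a small pre-deploy check. The following is only a sketch: the flat key names (`kubespan`, `bpf_masquerade`, `host_legacy_routing`, and so on) are hypothetical stand-ins for the real Helm and Talos settings, and the rules encode nothing beyond the incompatibilities stated in this write-up.

```python
def find_conflicts(cfg):
    """Return a list of conflicts for a dict of boolean settings.

    The keys are hypothetical flat names chosen for this sketch,
    not actual Helm value paths or Talos machine config fields.
    """
    conflicts = []
    # Egress Gateway refuses to start without BPF masquerade.
    if cfg.get("egress_gateway") and not cfg.get("bpf_masquerade"):
        conflicts.append("Egress Gateway requires bpf.masquerade: true")
    # Cilium WireGuard return traffic is UNTRACKED and hits the default drop.
    if cfg.get("cilium_wireguard") and cfg.get("talos_ingress_block"):
        conflicts.append("Cilium WireGuard replies are dropped by the Talos firewall")
    # KubeSpan plus BPF masquerade only works with legacy host routing.
    if (cfg.get("kubespan") and cfg.get("bpf_masquerade")
            and not cfg.get("host_legacy_routing")):
        conflicts.append("KubeSpan + bpf.masquerade needs bpf.hostLegacyRouting: true")
    # BBR depends on BPF host routing.
    if cfg.get("host_legacy_routing") and cfg.get("bbr"):
        conflicts.append("BBR is incompatible with bpf.hostLegacyRouting: true")
    return conflicts


working = {
    "kubespan": True,
    "bpf_masquerade": True,
    "host_legacy_routing": True,
    "egress_gateway": True,
    "talos_ingress_block": True,
    "bbr": False,
}
print(find_conflicts(working))  # []
```

Dropping `host_legacy_routing` from the dict above reproduces the KubeSpan and BPF masquerade conflict, which is the deadlock the setting resolves.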
Do NOT enable advertiseKubernetesNetworks in KubeSpan config — it's not needed in VXLAN tunnel mode (Cilium handles pod routing via VXLAN overlay) and is explicitly unsupported with Cilium. The fix from siderolabs/talos#9043 is also not required in tunnel mode.
Add KubeSpan's WireGuard port to the Talos firewall rules (both controlplane and worker templates):

    apiVersion: v1alpha1
    kind: NetworkRuleConfig
    name: kubespan-wireguard
    portSelector:
      ports:
        - 51820
      protocol: udp
    ingress:
      # Allow from all cluster node IPs
      - subnet: <node-ip>/32
      # ...

The Cilium WireGuard port (51871) can be removed since Cilium encryption is disabled.
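
With a dozen nodes, writing one ingress entry per node by hand is error-prone. One possible way to render the rule from a list of node IPs is sketched below. The helper name `kubespan_rule` and the example addresses (taken from documentation ranges) are assumptions for illustration. The output mirrors the document shape shown above and should be reviewed before it goes into the controlplane and worker templates.

```python
import ipaddress


def kubespan_rule(node_ips, port=51820):
    """Render a NetworkRuleConfig document allowing UDP `port` from each node IP."""
    lines = [
        "apiVersion: v1alpha1",
        "kind: NetworkRuleConfig",
        "name: kubespan-wireguard",
        "portSelector:",
        "  ports:",
        f"    - {port}",
        "  protocol: udp",
        "ingress:",
    ]
    for ip in node_ips:
        # ip_address() raises ValueError on a malformed address.
        addr = ipaddress.ip_address(ip)
        # Single-host prefix: /32 for IPv4, /128 for IPv6.
        prefix = 32 if addr.version == 4 else 128
        lines.append(f"  - subnet: {addr}/{prefix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses; substitute the real node IPs.
    print(kubespan_rule(["203.0.113.10", "198.51.100.7"]))
```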
| Check | Result |
|---|---|
| Cilium health | 12/12 reachable, all Node 1/1 |
| KubeSpan mesh | All peers UP |
| Worker-to-worker host TCP | OK (was broken with Cilium WireGuard + Talos firewall) |
| Egress Gateway | Working — pods exit via gateway node IP |
| Cilium connectivity test (ping intra/cross node) | Passed |
| Talos host firewall | Active and working |
bandwidthManager.bbr: false is required because BBR needs BPF host routing, which is incompatible with hostLegacyRouting: true. The bandwidth manager still works with CUBIC congestion control. The performance impact of legacy host routing is minimal compared to VXLAN + WireGuard overhead already present.
Finding this solution required kernel-level debugging with pwru:
- Forward path works: SYN → BPF redirect to cilium_wg0 → WireGuard encrypt → UDP to remote node → WireGuard decrypt → cil_from_wireguard → cilium_net/cilium_host → ip_rcv → tcp_v4_rcv → SYN accepted
- Return path breaks: SYN-ACK → WireGuard encrypt → UDP to originating node → WireGuard decrypt → enters via cilium_host → ip_rcv → nftables INPUT chain drops it (SKB_DROP_REASON_NETFILTER_DROP)
- Why nftables drops it: Cilium's CILIUM_PRE_raw chain sets CT --notrack on mark 0x0D00 packets. The SYN-ACK is UNTRACKED, not ESTABLISHED. Talos's ct state established,related rule doesn't match. No port-specific rule matches the SYN-ACK (destination port is ephemeral). Default policy: drop.
- Why cilium_host isn't whitelisted: Talos's interface whitelist is hardcoded to lo, siderolink, kubespan in nftables_chain_config.go. Not configurable via machine config.
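
The drop decision traced above can be modeled in a few lines. This is a simplified sketch of the chain as described in this write-up, not Talos's actual ruleset: the `Packet` type, the rule order, the allowed-port set, and the ephemeral port number are all illustrative assumptions.

```python
from dataclasses import dataclass

# Interface whitelist as described in the text.
TRUSTED_INTERFACES = {"lo", "siderolink", "kubespan"}


@dataclass
class Packet:
    in_iface: str
    ct_state: str  # "new", "established", "related", or "untracked"
    proto: str
    dst_port: int


def ingress_verdict(pkt, allowed_ports):
    """Model the ingress chain: whitelist, conntrack state, port rules, default drop."""
    if pkt.in_iface in TRUSTED_INTERFACES:
        return "accept"
    if pkt.ct_state in ("established", "related"):
        return "accept"
    if (pkt.proto, pkt.dst_port) in allowed_ports:
        return "accept"
    return "drop"


# Hypothetical port rules; the SYN-ACK targets an ephemeral port instead.
allowed = {("tcp", 6443), ("udp", 51820)}

# Cilium WireGuard: reply enters via cilium_host and is UNTRACKED.
print(ingress_verdict(Packet("cilium_host", "untracked", "tcp", 43512), allowed))  # drop

# KubeSpan: the same reply arrives on a whitelisted interface.
print(ingress_verdict(Packet("kubespan", "untracked", "tcp", 43512), allowed))  # accept
```

The model shows why no port rule can rescue the Cilium WireGuard case: the reply's destination port is ephemeral, so only the conntrack rule or the interface whitelist could accept it, and neither applies.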
- TalosOS v1.12.5 (kernel 6.18.15)
- Cilium v1.19.1
- 12 nodes across 2 hosters, communicating over public internet
- VXLAN tunnel mode + KubeSpan WireGuard encryption