@aojea
Last active July 16, 2021 01:12
kubernetes loadbalancers and netkat
  1. Create a multinode cluster: kind create cluster --config config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
  2. Run a fake loadbalancer controller
cd /home/aojea/go/src/github.com/aojea/networking-controllers/cmd/loadbalancer
go build .
kind get kubeconfig > kind.conf
./loadbalancer --kubeconfig kind.conf --iprange "10.111.111.0/24"
  3. Create a deployment with one pod and expose it with a LoadBalancer
kubectl apply -f https://gist.githubusercontent.com/aojea/369ad6a5d4cbb6b0fbcdd3dd909d9887/raw/0ecd0564c1db4c7103de07accce99da1c7bf91c3/loadbalancer.yaml
  4. Obtain the LoadBalancer assigned IP
kubectl get service lb-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
10.111.111.129

The service has externalTrafficPolicy: Local. Check that it is only reachable from the node where the pod is running.

kubectl get pods -l app=MyApp -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
test-deployment-6f45696bd5-xlp96   1/1     Running   0          10m   10.244.2.2   kind-worker2   <none>           <none>

Get the nodes IPs

kubectl get nodes -o wide
NAME                 STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION            CONTAINER-RUNTIME
kind-control-plane   Ready    control-plane,master   34m   v1.21.2   172.18.0.4    <none>        Ubuntu 21.04   4.18.0-301.1.el8.x86_64   containerd://1.5.2
kind-worker          Ready    <none>                 33m   v1.21.2   172.18.0.3    <none>        Ubuntu 21.04   4.18.0-301.1.el8.x86_64   containerd://1.5.2
kind-worker2         Ready    <none>                 33m   v1.21.2   172.18.0.2    <none>        Ubuntu 21.04   4.18.0-301.1.el8.x86_64   containerd://1.5.2
  5. Emulate the external loadbalancer by installing a route in the host to the loadbalancer IP through the node with the pod

example: the pod runs on kind-worker2, whose node IP is 172.18.0.2, and the loadbalancer IP is 10.111.111.129

sudo ip route add 10.111.111.129 via 172.18.0.2
  6. Check that it preserves the source IP
echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc 10.111.111.129 80
HTTP/1.1 200 OK
Date: Thu, 15 Jul 2021 23:30:35 GMT
Content-Length: 16
Content-Type: text/plain; charset=utf-8

172.18.0.1:34028
  7. Check that it doesn't work if we target a node without pods backing that service
sudo ip route del 10.111.111.129 via 172.18.0.2
sudo ip route add 10.111.111.129 via 172.18.0.3

echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc -v -w 3 10.111.111.129 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out.
  8. If we use externalTrafficPolicy: Cluster it works, but the source IP is not preserved
kubectl patch service lb-service -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
service/lb-service patched
$ echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc -v -w 3 10.111.111.129 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.111.111.129:80.
HTTP/1.1 200 OK
Date: Thu, 15 Jul 2021 23:34:22 GMT
Content-Length: 16
Content-Type: text/plain; charset=utf-8

172.18.0.3:14569
Ncat: 36 bytes sent, 133 bytes received in 0.01 seconds.
  9. Revert the service change and set externalTrafficPolicy back to Local
kubectl patch service lb-service -p '{"spec":{"externalTrafficPolicy":"Local"}}'
service/lb-service patched
$ echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc -v -w 3 10.111.111.129 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out.
  10. Test the service from inside a node that is not running the pod
echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc -v -w 3 10.111.111.129 80
Connection to 10.111.111.129 80 port [tcp/http] succeeded!
HTTP/1.1 200 OK
Date: Thu, 15 Jul 2021 23:36:50 GMT
Content-Length: 16
Content-Type: text/plain; charset=utf-8

We would expect this to fail because we are on a node without pods backing the service. However, since the traffic comes from a node within the cluster it is considered internal, and externalTrafficPolicy does not apply: kube-proxy installs iptables rules that capture traffic originating on the node.

This makes LoadBalancers hard to test, because it requires traffic to be sent from a node external to the cluster, which is not easy: Kubernetes clusters tend to be isolated and expose only the apiserver endpoint, and e2e tests should not assume direct connectivity from the e2e binary.

Solution:

bypass iptables --> netkat

Netkat

netkat works the same as netcat, but using raw sockets

The raw socket receives a copy of all packets, but it allows attaching a BPF filter. We don't need all packets, only the ones belonging to our connection; since we know part of the connection tuple we can filter on it, using hooks to attach BPF filters.

On the socket we filter out the packets that we DON'T want. On the ingress hook we filter out the packets that we DO want, so the host doesn't close our connection.
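
For illustration, this is roughly what such a classic BPF filter looks like when assembled by hand (a sketch, not netkat's actual code, which generates eBPF with bpf2go; the opcodes follow the classic BPF encoding, the offsets assume Ethernet framing, and the ports are example values):

```go
package main

import "fmt"

// sockFilter mirrors the kernel's struct sock_filter: one classic BPF instruction.
type sockFilter struct {
	Code uint16 // operation
	Jt   uint8  // jump offset when the test is true
	Jf   uint8  // jump offset when the test is false
	K    uint32 // operand
}

// tupleFilter assembles a classic BPF program that accepts only IPv4 TCP
// segments from srcPort to dstPort; everything else falls through to
// instruction 12, which returns 0 (drop).
func tupleFilter(srcPort, dstPort uint16) []sockFilter {
	return []sockFilter{
		{0x28, 0, 0, 12},              // ldh [12]              ; EtherType
		{0x15, 0, 10, 0x0800},         // jne #0x0800 -> drop   ; IPv4 only
		{0x30, 0, 0, 23},              // ldb [23]              ; IP protocol
		{0x15, 0, 8, 6},               // jne #6 -> drop        ; TCP only
		{0x28, 0, 0, 20},              // ldh [20]              ; flags + frag offset
		{0x45, 6, 0, 0x1fff},          // jset #0x1fff -> drop  ; skip fragments
		{0xb1, 0, 0, 14},              // ldxb 4*([14]&0xf)     ; X = IP header length
		{0x48, 0, 0, 14},              // ldh [x+14]            ; TCP source port
		{0x15, 0, 3, uint32(srcPort)}, // jne srcPort -> drop
		{0x48, 0, 0, 16},              // ldh [x+16]            ; TCP destination port
		{0x15, 0, 1, uint32(dstPort)}, // jne dstPort -> drop
		{0x06, 0, 0, 0xffffffff},      // ret #-1               ; accept whole packet
		{0x06, 0, 0, 0},               // ret #0                ; drop
	}
}

func main() {
	prog := tupleFilter(34567, 80)
	fmt.Printf("%d instructions\n", len(prog)) // 13 instructions
}
```

On the raw socket this program keeps only the connection's packets; on the ingress hook the same match is used the other way around, to drop them before the kernel TCP/IP stack can answer with a RST.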

Once received, we have raw packets (with Ethernet headers). We are only interested in the stream of data carried over the connection, but since we are bypassing the kernel TCP/IP stack we need something else to process the raw packets.

Solution: use a userspace TCP/IP stack: https://pkg.go.dev/gvisor.dev/gvisor/pkg/tcpip/stack

At this point we have all the pieces; we just need to glue them together. Solution: golang

  • obtain the connection details: source, dest IP and port, interface
  • use bpf2go to generate the eBPF code (the eBPF code is parametrizable from golang)
  • inject the BPF code to the raw socket and to the ingress interface

As a result, we have a new version of netcat that bypasses iptables (it needs CAP_NET_RAW). As a bonus, with the -d flag it includes a sniffer XD

sudo ./netkat -d -l 127.0.0.1 80
2021/07/16 01:58:08 Creating raw socket
2021/07/16 01:58:08 Adding ebpf ingress filter on interface lo
2021/07/16 01:58:08 filter {LinkIndex: 1, Handle: 0:1, Parent: ffff:fff2, Priority: 0, Protocol: 3}
2021/07/16 01:58:08 Creating user TCP/IP stack
2021/07/16 01:58:08 Listening on 127.0.0.1:80
I0716 01:58:31.206864   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5ee flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715804214 TSEcr:0 SACKPermitted:true}
I0716 01:58:31.206937   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5ee flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715804214 TSEcr:0 SACKPermitted:true}
I0716 01:58:32.239938   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5ef flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715805247 TSEcr:0 SACKPermitted:true}
I0716 01:58:32.240044   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5ef flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715805247 TSEcr:0 SACKPermitted:true}
I0716 01:58:34.287962   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5f0 flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715807295 TSEcr:0 SACKPermitted:true}
I0716 01:58:34.288096   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5f0 flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715807295 TSEcr:0 SACKPermitted:true}
^C2021/07/16 01:58:35 Exiting: received signal
2021/07/16 01:58:35 Done
  11. Run netkat in a pod (using hostNetwork); we are going to simulate an external client connecting from inside a node
 kubectl apply -f https://gist.githubusercontent.com/aojea/952c82a58da625fbd9b8aca35f0e63f1/raw/d636a71ffe690633be448999d635199ef8724300/netkat.yaml
pod/netkat created
  12. Check where it is running
$ kubectl get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
netkat                             1/1     Running   0          9s    172.18.0.3   kind-worker    <none>           <none>
test-deployment-6f45696bd5-xlp96   1/1     Running   0          63m   10.244.2.2   kind-worker2   <none>           <none>
  13. Log in to the container

NOTE: If you have RLIMIT_MEMLOCK errors type "ulimit -l unlimited"

kubectl exec -it netkat ash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
/ # ip route add 10.111.111.15 via 172.18.0.2
echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | ./netkat 10.111.111.15 80
2021/07/16 01:06:54 routes {Ifindex: 18 Dst: 10.111.111.15/32 Src: 172.18.0.3 Gw: 172.18.0.2 Flags: [] Table: 254}
2021/07/16 01:06:54 Creating raw socket
2021/07/16 01:06:54 Adding ebpf ingress filter on interface eth0
2021/07/16 01:06:54 filter {LinkIndex: 18, Handle: 0:1, Parent: ffff:fff2, Priority: 0, Protocol: 3}
2021/07/16 01:06:54 Creating user TCP/IP stack
2021/07/16 01:06:54 Dialing ...
2021/07/16 01:06:54 Connection established
2021/07/16 01:06:54 Connection error: <nil>
HTTP/1.1 200 OK
Date: Fri, 16 Jul 2021 01:06:54 GMT
Content-Length: 16
Content-Type: text/plain; charset=utf-8
172.18.0.3:26913

We can see it preserves the source IP.

  14. Check that if we target a different node it doesn't work, as expected: the service has externalTrafficPolicy: Local and it considers the connection external
/ # ip route del 10.111.111.15 via 172.18.0.3
/ # ip route add 10.111.111.15 via 172.18.0.4
/ # echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | ./netkat 10.111.111.15 80
2021/07/16 01:11:38 routes {Ifindex: 18 Dst: 10.111.111.15/32 Src: 172.18.0.2 Gw: 172.18.0.4 Flags: [] Table: 254}
2021/07/16 01:11:38 Creating raw socket
2021/07/16 01:11:38 Adding ebpf ingress filter on interface eth0
2021/07/16 01:11:38 filter {LinkIndex: 18, Handle: 0:1, Parent: ffff:fff2, Priority: 0, Protocol: 3}
2021/07/16 01:11:38 Creating user TCP/IP stack
2021/07/16 01:11:38 Dialing ...
2021/07/16 01:11:43 Dialing error: context deadline exceeded
2021/07/16 01:11:43 Can't connect to server: context deadline exceeded