@YutaroHayakawa
Last active May 1, 2022 19:21
Demo environment for Cilium BGP Control Plane feature

Topology

flowchart LR
  subgraph router0
    router0:net0("net0 (unnumbered)")
    router0:net1("net1 (unnumbered)")
  end
  router0:net0---tor0:net0
  router0:net1---tor1:net0
  subgraph rack0
    subgraph tor0
      tor0:net0("net0 (unnumbered)")
      tor0:net1("net1 (10.0.1.1/24)")
      tor0:net2("net2 (10.0.2.1/24)")
    end
    subgraph kind-node0
      kind-node0:net0("net0 (10.0.1.2/24)")
      kind-node0:lxc0("lxc0")
    end
    subgraph kind-node1
      kind-node1:net0("net0 (10.0.2.2/24)")
      kind-node1:lxc0("lxc0")
    end
    subgraph netshoot-pod0
      netshoot-pod0:eth0("eth0")
    end
    subgraph netshoot-pod1
      netshoot-pod1:eth0("eth0")
    end
    tor0:net0-.-tor0:net1
    tor0:net0-.-tor0:net2
    tor0:net1---kind-node0:net0
    tor0:net2---kind-node1:net0
    kind-node0:net0-.-kind-node0:lxc0
    kind-node1:net0-.-kind-node1:lxc0
    kind-node0:lxc0---netshoot-pod0:eth0
    kind-node1:lxc0---netshoot-pod1:eth0
  end

  subgraph rack1
    subgraph tor1
      tor1:net0("net0 (unnumbered)")
      tor1:net1("net1 (10.0.3.1/24)")
      tor1:net2("net2 (10.0.4.1/24)")
    end
    subgraph kind-node2
      kind-node2:net0("net0 (10.0.3.2/24)")
      kind-node2:lxc0("lxc0")
    end
    subgraph kind-node3
      kind-node3:net0("net0 (10.0.4.2/24)")
      kind-node3:lxc0("lxc0")
    end
    subgraph netshoot-pod2
      netshoot-pod2:eth0("eth0")
    end
    subgraph netshoot-pod3
      netshoot-pod3:eth0("eth0")
    end
    tor1:net0-.-tor1:net1
    tor1:net0-.-tor1:net2
    tor1:net1---kind-node2:net0
    tor1:net2---kind-node3:net0
    kind-node2:net0-.-kind-node2:lxc0
    kind-node3:net0-.-kind-node3:lxc0
    kind-node2:lxc0---netshoot-pod2:eth0
    kind-node3:lxc0---netshoot-pod3:eth0
  end

Demo Scenario

  1. Create the environment

     make deploy

  2. Confirm Cilium is running in native routing mode and that Pod-to-Pod communication is not working yet

     # Show the Cilium configuration
     cat values.yaml

     # Press 'd' in the mtr window and keep this mtr running for later use.
     SRC_POD=$(kubectl get pods -o wide | grep control-plane | awk '{ print($1); }')
     DST_IP=$(kubectl get pods -o wide | grep worker3 | awk '{ print($6); }')
     kubectl exec -it $SRC_POD -- mtr $DST_IP

  3. Show the neighbor state of the routers and confirm the Cilium nodes are not peering with them

     make show-neighbors

  4. Show what a CiliumBGPPeeringPolicy looks like (see the excerpt after this list)

     # Important point: it is a per-rack configuration (not per-node).
     cat cilium-bgp-peering-policies.yaml

  5. Apply the policy and watch the MTR result change

     # Expected behavior: MTR shows a route like Pod->kind-control-plane->tor0->router0->tor1->kind-worker3->Pod
     make apply-policy
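
The key point of the policy (the full cilium-bgp-peering-policies.yaml is included below) is that the nodeSelector matches the rack label that cluster.yaml assigns to each node, so a single virtual router configuration covers every node in a rack:

# Excerpt from cilium-bgp-peering-policies.yaml (rack0)
nodeSelector:
  matchLabels:
    rack: rack0
virtualRouters:
- localASN: 65010
  exportPodCIDR: true
  neighbors:
  - peerAddress: "10.0.0.1/32"   # tor0's loopback address
    peerASN: 65010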

Usage

Create/Destroy/Reload the Lab

# Create
make deploy

# Destroy
make destroy

# Reload
make reload

Apply/Delete BGP policies

# Apply
make apply-policy

# Delete
make delete-policy

Show BGP neighbor state of all router nodes

make show-neighbors

Show RIB of all router nodes

make show-rib

Show FIB of all router nodes

make show-fib

How does kind in containerlab work?

Wiring veth to Kind nodes

The essential part is the network-mode definition of the serverX nodes in topo.yaml. container:<containerX> makes the node a container that shares its network namespace with containerX. Using this, we can define containerlab nodes that share a network namespace with the Kind nodes, and by defining links to those containers we can wire veths into the Kind nodes.
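
For example, server0 in topo.yaml below is placed into the network namespace of the Kind control-plane node:

# Excerpt from topo.yaml
server0:
  kind: linux
  image: nicolaka/netshoot:latest
  # Share the network namespace with the Kind control-plane node
  network-mode: container:control-plane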

Set up routing to the containerlab network on Kind nodes

Now the Kind nodes are connected to the containerlab network, but their default route still points to the Docker network bridge. To route packets toward the containerlab side, we need to modify the default route. In this case we also need to assign an IP address to the net0 interfaces.
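
In this lab both are done with the exec commands on the serverX nodes, for example on server0:

# Excerpt from topo.yaml (server0)
exec:
  # Cilium currently doesn't support BGP Unnumbered, so net0 gets an address
  - ip addr add 10.0.1.2/24 dev net0
  # Cilium currently doesn't support importing routes, so point the default route at tor0
  - ip route replace default via 10.0.1.1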

Connecting Kind nodes to the world outside the containerlab network

Since we modify the default route of the Kind nodes, they lose connectivity to the outside world (normally they rely on Docker's NAT). To solve this, we need a NAT node on the containerlab network side. In this example router0 is in charge of that (it also advertises a default route via BGP).
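
The relevant pieces are router0's MASQUERADE rule and the default-originate statement in its FRR configuration (both excerpted from topo.yaml):

# NAT everything leaving the lab through eth0
- iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# Advertise a default route to all BGP peers (inside the vtysh command)
-c ' neighbor ROUTERS default-originate'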

Hacks

Currently, containerlab has a tricky constraint for container:<containerX>: containerX must be the name of a node inside the containerlab manifest (e.g. router0, tor0, etc.), so just specifying a Kind node's container name as containerX doesn't work. To overcome this limitation, we rely on a special naming convention for the Kind nodes. Please don't change the Kind cluster name or the containerlab lab name; if you want to change them, please ask me.
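
A rough sketch of why the naming convention works, assuming containerlab's usual clab-<labname>-<nodename> container naming (an assumption about containerlab internals, not something spelled out in this gist):

# containerlab: lab "kind-in-clab" + node name "control-plane"
#   -> looks for a container named "clab-kind-in-clab-control-plane"
# Kind: cluster "clab-kind-in-clab" + control-plane role
#   -> creates a container named "clab-kind-in-clab-control-plane"
# The names coincide, so network-mode: container:control-plane lands on the Kind node.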

cilium-bgp-peering-policies.yaml

---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack0
spec:
  nodeSelector:
    matchLabels:
      rack: rack0
  virtualRouters:
  - localASN: 65010
    exportPodCIDR: true
    neighbors:
    - peerAddress: "10.0.0.1/32"
      peerASN: 65010
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack1
spec:
  nodeSelector:
    matchLabels:
      rack: rack1
  virtualRouters:
  - localASN: 65011
    exportPodCIDR: true
    neighbors:
    - peerAddress: "10.0.0.2/32"
      peerASN: 65011
cluster.yaml

kind: Cluster
name: clab-kind-in-clab
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.1.0.0/16"
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: "10.0.1.2"
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: "10.0.2.2"
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: "10.0.3.2"
        node-labels: "rack=rack1"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: "10.0.4.2"
        node-labels: "rack=rack1"
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5000"]
    endpoint = ["http://kind-registry:5000"]
Makefile

deploy:
	kind create cluster --config cluster.yaml
	sudo containerlab -t topo.yaml deploy
	helm install --kube-context kind-clab-kind-in-clab -n kube-system cilium cilium/cilium --version v1.12.0-rc1 -f values.yaml
	kubectl apply -f netshoot-ds.yaml

reload:
	kind delete clusters clab-kind-in-clab
	sudo containerlab -t topo.yaml destroy
	kind create cluster --config cluster.yaml
	sudo containerlab -t topo.yaml deploy
	helm install --kube-context kind-clab-kind-in-clab -n kube-system cilium cilium/cilium --version v1.12.0-rc1 -f values.yaml
	kubectl apply -f netshoot-ds.yaml

destroy:
	kind delete clusters clab-kind-in-clab
	sudo containerlab -t topo.yaml destroy

apply-policy:
	kubectl apply -f cilium-bgp-peering-policies.yaml

delete-policy:
	kubectl delete -f cilium-bgp-peering-policies.yaml

show-rib:
	@echo "======== router0 ========"
	docker exec -it clab-kind-in-clab-router0 vtysh -c 'show bgp ipv4 wide'
	@echo "======== tor0 ========"
	docker exec -it clab-kind-in-clab-tor0 vtysh -c 'show bgp ipv4 wide'
	@echo "======== tor1 ========"
	docker exec -it clab-kind-in-clab-tor1 vtysh -c 'show bgp ipv4 wide'

show-fib:
	@echo "======== router0 ========"
	docker exec -it clab-kind-in-clab-router0 ip r
	@echo "======== tor0 ========"
	docker exec -it clab-kind-in-clab-tor0 ip r
	@echo "======== tor1 ========"
	docker exec -it clab-kind-in-clab-tor1 ip r

show-neighbors:
	@echo "======== router0 ========"
	docker exec -it clab-kind-in-clab-router0 vtysh -c 'show bgp ipv4 summary wide'
	@echo "======== tor0 ========"
	docker exec -it clab-kind-in-clab-tor0 vtysh -c 'show bgp ipv4 summary wide'
	@echo "======== tor1 ========"
	docker exec -it clab-kind-in-clab-tor1 vtysh -c 'show bgp ipv4 summary wide'
netshoot-ds.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netshoot
spec:
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: netshoot
        image: nicolaka/netshoot:latest
        command: ["sleep", "infinite"]
topo.yaml

name: kind-in-clab
topology:
  kinds:
    linux:
      cmd: bash
  nodes:
    router0:
      kind: linux
      image: frrouting/frr:v8.2.2
      labels:
        app: frr
      exec:
        # NAT everything in here to go outside of the lab
        - iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
        # Loopback IP (IP address of the router itself)
        - ip addr add 10.0.0.0/32 dev lo
        # Terminate the rest of 10.0.0.0/8 here
        - ip route add blackhole 10.0.0.0/8
        # Boilerplate to make FRR work
        - touch /etc/frr/vtysh.conf
        - sed -i -e 's/bgpd=no/bgpd=yes/g' /etc/frr/daemons
        - /usr/lib/frr/frrinit.sh start
        # FRR configuration
        - >-
          vtysh -c 'conf t'
          -c 'frr defaults datacenter'
          -c 'router bgp 65000'
          -c ' bgp router-id 10.0.0.0'
          -c ' no bgp ebgp-requires-policy'
          -c ' neighbor ROUTERS peer-group'
          -c ' neighbor ROUTERS remote-as external'
          -c ' neighbor ROUTERS default-originate'
          -c ' neighbor net0 interface peer-group ROUTERS'
          -c ' neighbor net1 interface peer-group ROUTERS'
          -c ' address-family ipv4 unicast'
          -c ' redistribute connected'
          -c ' exit-address-family'
          -c '!'
    tor0:
      kind: linux
      image: frrouting/frr:v8.2.2
      labels:
        app: frr
      exec:
        - ip link del eth0
        - ip addr add 10.0.0.1/32 dev lo
        - ip addr add 10.0.1.1/24 dev net1
        - ip addr add 10.0.2.1/24 dev net2
        - touch /etc/frr/vtysh.conf
        - sed -i -e 's/bgpd=no/bgpd=yes/g' /etc/frr/daemons
        - /usr/lib/frr/frrinit.sh start
        - >-
          vtysh -c 'conf t'
          -c 'frr defaults datacenter'
          -c 'router bgp 65010'
          -c ' bgp router-id 10.0.0.1'
          -c ' no bgp ebgp-requires-policy'
          -c ' neighbor ROUTERS peer-group'
          -c ' neighbor ROUTERS remote-as external'
          -c ' neighbor SERVERS peer-group'
          -c ' neighbor SERVERS remote-as internal'
          -c ' neighbor net0 interface peer-group ROUTERS'
          -c ' neighbor 10.0.1.2 peer-group SERVERS'
          -c ' neighbor 10.0.2.2 peer-group SERVERS'
          -c ' address-family ipv4 unicast'
          -c ' redistribute connected'
          -c ' exit-address-family'
          -c '!'
    tor1:
      kind: linux
      image: frrouting/frr:v8.2.2
      labels:
        app: frr
      exec:
        - ip link del eth0
        - ip addr add 10.0.0.2/32 dev lo
        - ip addr add 10.0.3.1/24 dev net1
        - ip addr add 10.0.4.1/24 dev net2
        - touch /etc/frr/vtysh.conf
        - sed -i -e 's/bgpd=no/bgpd=yes/g' /etc/frr/daemons
        - /usr/lib/frr/frrinit.sh start
        - >-
          vtysh -c 'conf t'
          -c 'frr defaults datacenter'
          -c 'router bgp 65011'
          -c ' bgp router-id 10.0.0.2'
          -c ' bgp bestpath as-path multipath-relax'
          -c ' no bgp ebgp-requires-policy'
          -c ' neighbor ROUTERS peer-group'
          -c ' neighbor ROUTERS remote-as external'
          -c ' neighbor SERVERS peer-group'
          -c ' neighbor SERVERS remote-as internal'
          -c ' neighbor net0 interface peer-group ROUTERS'
          -c ' neighbor 10.0.3.2 peer-group SERVERS'
          -c ' neighbor 10.0.4.2 peer-group SERVERS'
          -c ' address-family ipv4 unicast'
          -c ' redistribute connected'
          -c ' exit-address-family'
          -c '!'
    server0:
      kind: linux
      image: nicolaka/netshoot:latest
      network-mode: container:control-plane
      exec:
        # Cilium currently doesn't support BGP Unnumbered
        - ip addr add 10.0.1.2/24 dev net0
        # Cilium currently doesn't support importing routes
        - ip route replace default via 10.0.1.1
    server1:
      kind: linux
      image: nicolaka/netshoot:latest
      network-mode: container:worker
      exec:
        - ip addr add 10.0.2.2/24 dev net0
        - ip route replace default via 10.0.2.1
    server2:
      kind: linux
      image: nicolaka/netshoot:latest
      network-mode: container:worker2
      exec:
        - ip addr add 10.0.3.2/24 dev net0
        - ip route replace default via 10.0.3.1
    server3:
      kind: linux
      image: nicolaka/netshoot:latest
      network-mode: container:worker3
      exec:
        - ip addr add 10.0.4.2/24 dev net0
        - ip route replace default via 10.0.4.1
  links:
  - endpoints: ["router0:net0", "tor0:net0"]
  - endpoints: ["router0:net1", "tor1:net0"]
  - endpoints: ["tor0:net1", "server0:net0"]
  - endpoints: ["tor0:net2", "server1:net0"]
  - endpoints: ["tor1:net1", "server2:net0"]
  - endpoints: ["tor1:net2", "server3:net0"]
values.yaml

---
tunnel: disabled
ipam:
  mode: kubernetes
ipv4NativeRoutingCIDR: 10.0.0.0/8
bgpControlPlane:
  enabled: true
k8s:
  requireIPv4PodCIDR: true