tc-bpf behavior

tc is a utility provided by iproute2 that can be used to interact with the Linux Traffic Control (TC) Subsystem. Support was added in kernel version 4.1 to allow BPF programs to be loaded as TC classifiers (or filters) and actions.

This document describes testing done on the behavior of adding and removing BPF programs using the tc-bpf tool.

The main goal was to understand how tc-bpf behaves in order to determine whether it could be added as an option for bpfd. The specific initial goals documented here were to understand how BPF programs can be added and deleted, and also how to control the order of execution of multiple programs added to the same interface and qdisc.

Setup

Testing was done using two VirtualBox VMs running Fedora 36 with kernel 5.18.5-200.fc36.x86_64. The VMs are named "bpf1" and "bpf2" and are connected by a host-only network through the enp0s9 device on each VM.

BPF Programs

The following three simple BPF programs were used.

accept-all.c

#include <linux/bpf.h>
#include <linux/pkt_cls.h>

__attribute__((section("ingress"), used))
int accept() {
    return TC_ACT_OK;
}

drop-all.c

#include <linux/bpf.h>
#include <linux/pkt_cls.h>

__attribute__((section("ingress"), used))
int drop_all() {
    return TC_ACT_SHOT;
}

drop-icmp.c

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <arpa/inet.h>

__attribute__((section("ingress"), used))
int drop(struct __sk_buff *skb) {
    const int l3_off = ETH_HLEN;                      // IP header offset
    const int l4_off = l3_off + sizeof(struct iphdr); // L4 header offset

    void *data = (void*)(long)skb->data;
    void *data_end = (void*)(long)skb->data_end;
    if (data_end < data + l4_off)
        return TC_ACT_OK;

    struct ethhdr *eth = data;
    if (eth->h_proto != htons(ETH_P_IP))
        return TC_ACT_OK;

    struct iphdr *ip = (struct iphdr *)(data + l3_off);
    if (ip->protocol != IPPROTO_ICMP)
        return TC_ACT_OK;

    return TC_ACT_SHOT;
}

Note: the drop-icmp.c code was taken from the article "Firewalling with BPF/XDP: Examples and Deep Dive".
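The tc commands in the Testing section load compiled object files (drop-icmp.o, drop-all.o, accept-all.o), but the build step is not shown. A minimal sketch of one way to build them, assuming clang with BPF target support is installed; the helper names bpf_obj_name and build_bpf are hypothetical, and the exact flags used for the original testing are not known:

```shell
# Hypothetical helpers; not part of the original test setup.

# Map a source file name to its object file name,
# e.g. drop-icmp.c -> drop-icmp.o
bpf_obj_name() {
    printf '%s\n' "${1%.c}.o"
}

# Compile one BPF C source file with clang's BPF backend.
# -O2 is used because the BPF backend generally needs optimized code.
build_bpf() {
    clang -O2 -target bpf -c "$1" -o "$(bpf_obj_name "$1")"
}
```

For example, build_bpf drop-icmp.c should produce drop-icmp.o in the current directory. Depending on the distribution, userspace headers such as arpa/inet.h may need extra development packages before they resolve when targeting BPF.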

Testing

Adding BPF Filters

The BPF programs are added on VM bpf2 as ingress tc filters.

TC is complex and powerful, and this effort evaluates only a small portion of it: adding BPF programs as ingress filters using direct-action mode and the clsact qdisc. Additionally, we use the preference option to explicitly define the priority, that is, the order in which the filters are evaluated. Below are a few comments on these options.

ingress: Filters are applied to egress by default, so ingress must be specified.
direct-action: Instructs the eBPF classifier not to invoke external TC actions and instead to use the program's return code as the TC action. For example, TC_ACT_OK (allows the packet to proceed) and TC_ACT_SHOT (drops the packet) are used in the code above.
clsact qdisc: BPF programs can be associated with different types of qdiscs. However, clsact is a fairly new qdisc that was added primarily for BPF classifiers. It can only hold classifiers, it can be used for both ingress and egress, and it is the recommended qdisc for BPF classifiers. clsact isn't currently documented in the tc man pages. See here for more info.
preference: The preference option is also not documented in the man pages, but its use is mentioned here.

ping and arping are used from bpf1 to test connectivity via ICMP and ARP, respectively. It is also useful to run tcpdump on both VMs to see what's happening, but I won't discuss the details here.

Before testing starts, confirm that both ping and arping from bpf1 to bpf2 work.
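The connectivity check above can be scripted. A minimal sketch to run on bpf1, assuming the iputils versions of ping and arping; the helper name check_connectivity is hypothetical, and the address of bpf2 must be supplied because it is not given in this document:

```shell
# Hypothetical helper; run on bpf1.
# $1 = address of bpf2 on the host-only network (not given in this document)
# $2 = local interface to send ARP requests from (defaults to enp0s9)
check_connectivity() {
    addr=$1
    dev=${2:-enp0s9}

    # ICMP echo: 3 requests, 1 second timeout per reply
    if ping -c 3 -W 1 "$addr" >/dev/null 2>&1; then
        echo "ping: ok"
    else
        echo "ping: FAILED"
    fi

    # ARP requests; arping usually needs root
    if sudo arping -c 3 -I "$dev" "$addr" >/dev/null 2>&1; then
        echo "arping: ok"
    else
        echo "arping: FAILED"
    fi
}
```

Running the same helper after each filter change gives a quick view of which protocol is currently being dropped.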

The qdiscs on an interface can be displayed as follows:

bpf2:~/bpf$ tc qdisc show dev enp0s9
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64

Note that there is one default fq_codel qdisc on the interface.

Create a clsact qdisc on bpf2

sudo tc qdisc add dev enp0s9 clsact

Display qdiscs again to confirm creation

bpf2:~/bpf$ tc qdisc show dev enp0s9
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
qdisc clsact ffff: parent ffff:fff1
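Running tc qdisc add a second time typically fails once the clsact qdisc exists, so a small guard can make this step repeatable. A minimal sketch, with the hypothetical helper names has_clsact and ensure_clsact, that relies on the tc qdisc show output format shown above:

```shell
# Hypothetical helpers built on the commands shown above.

# Succeeds if the device already has a clsact qdisc
has_clsact() {
    tc qdisc show dev "$1" | grep -q '^qdisc clsact '
}

# Create the clsact qdisc only when it is missing
ensure_clsact() {
    has_clsact "$1" || sudo tc qdisc add dev "$1" clsact
}
```

For example, ensure_clsact enp0s9 does nothing when the listing already contains a "qdisc clsact" line and otherwise runs the add command from the previous step.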

Create an ingress filter to drop icmp packets

sudo tc filter add dev enp0s9 ingress bpf da obj drop-icmp.o sec ingress

Display the ingress filters

bpf2:~/bpf$ tc filter show dev enp0s9 ingress
filter protocol all pref 49152 bpf chain 0 
filter protocol all pref 49152 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 191 tag 8af399635c5cc8a8

Confirm that ping from bpf1 -> bpf2 no longer works, but arping continues to work.

Add a drop all filter

sudo tc filter add dev enp0s9 ingress bpf da obj drop-all.o sec ingress

Display the ingress filters

bpf2:~/bpf$ tc filter show dev enp0s9 ingress
filter protocol all pref 49151 bpf chain 0 
filter protocol all pref 49151 bpf chain 0 handle 0x1 drop-all.o:[ingress] direct-action not_in_hw id 194 tag 3b185187f1855c4c 
filter protocol all pref 49152 bpf chain 0 
filter protocol all pref 49152 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 191 tag 8af399635c5cc8a8

We now see a second BPF filter. The first filter was added with preference 49152 and the second filter was added with preference 49151. Lower preference is higher priority, and we can confirm that the drop-all filter is taking precedence by confirming that both ping and arping from bpf1 to bpf2 now fail.

Let's add a third accept-all filter

sudo tc filter add dev enp0s9 ingress bpf da obj accept-all.o sec ingress

Display filters

bpf2:~/bpf$ tc filter show dev enp0s9 ingress
filter protocol all pref 49150 bpf chain 0 
filter protocol all pref 49150 bpf chain 0 handle 0x1 accept-all.o:[ingress] direct-action not_in_hw id 197 tag a04f5eef06a7f555 
filter protocol all pref 49151 bpf chain 0 
filter protocol all pref 49151 bpf chain 0 handle 0x1 drop-all.o:[ingress] direct-action not_in_hw id 194 tag 3b185187f1855c4c 
filter protocol all pref 49152 bpf chain 0 
filter protocol all pref 49152 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 191 tag 8af399635c5cc8a8

We now have a third accept-all filter at preference 49150, and can confirm that it is taking precedence over the others because ping and arping both work again.

The same type of testing can be used to confirm each of the following steps, but I'll leave the details up to the reader from now on.

By default, the system adds filters starting at preference 49152 and decrements the preference value by 1 for each new filter that is added. However, it's possible to specify the preference explicitly as follows

sudo tc filter add dev enp0s9 preference 49140 ingress bpf da obj drop-icmp.o sec ingress

The use of the preference option isn't documented on the tc man page, but there is some discussion of it here.

We can add multiple filters at the same preference level using

sudo tc filter add dev enp0s9 preference 49140 ingress bpf da obj drop-all.o sec ingress
sudo tc filter add dev enp0s9 preference 49140 ingress bpf da obj accept-all.o sec ingress

Now the filters look like this:

bpf2:~/bpf$ tc filter show dev enp0s9 ingress
filter protocol all pref 49140 bpf chain 0 
filter protocol all pref 49140 bpf chain 0 handle 0x3 accept-all.o:[ingress] direct-action not_in_hw id 206 tag a04f5eef06a7f555 
filter protocol all pref 49140 bpf chain 0 handle 0x2 drop-all.o:[ingress] direct-action not_in_hw id 203 tag 3b185187f1855c4c 
filter protocol all pref 49140 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 200 tag 8af399635c5cc8a8 
filter protocol all pref 49150 bpf chain 0 
filter protocol all pref 49150 bpf chain 0 handle 0x1 accept-all.o:[ingress] direct-action not_in_hw id 197 tag a04f5eef06a7f555 
filter protocol all pref 49151 bpf chain 0 
filter protocol all pref 49151 bpf chain 0 handle 0x1 drop-all.o:[ingress] direct-action not_in_hw id 194 tag 3b185187f1855c4c 
filter protocol all pref 49152 bpf chain 0 
filter protocol all pref 49152 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 191 tag 8af399635c5cc8a8

When multiple filters are added at the same preference level, more recently added filters take precedence. That is, they are executed in the reverse of the order in which they were added. Additionally, tc filter show displays the filters from highest priority to lowest priority.
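Since tc filter show lists filters from highest to lowest priority, its output can be condensed into the effective evaluation order. A minimal sketch, assuming the output format shown above; the helper name filter_order is hypothetical:

```shell
# Hypothetical helper: reads "tc filter show" output on stdin and prints
# one "preference handle program" line per filter, in the order listed.
filter_order() {
    awk '{
        pref = ""; handle = ""; prog = ""
        for (i = 1; i < NF; i++) {
            if ($i == "pref") pref = $(i + 1)
            # the program name follows the handle value
            if ($i == "handle") { handle = $(i + 1); prog = $(i + 2) }
        }
        # per-preference header lines carry no handle and are skipped
        if (handle != "") print pref, handle, prog
    }'
}
```

It can be used as tc filter show dev enp0s9 ingress | filter_order. For the listing above, the first line printed would be "49140 0x3 accept-all.o:[ingress]", which is the filter evaluated first.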

Deleting Filters and Cleaning Up

There are multiple ways to delete filters; here are a few.

When there is only one filter at a given preference, it can be deleted as follows:

sudo tc filter delete dev enp0s9 ingress pref 49150

When there are multiple filters at a given preference level, more specificity is needed to delete a given filter.

sudo tc filter delete dev enp0s9 preference 49140 handle 0x2 ingress bpf

Or, deleting the qdisc cleans up everything.

sudo tc qdisc del dev enp0s9 clsact

Conclusions

BPF programs can be easily added to and removed from interfaces using the tc utility.
By default, filters are executed in the reverse of the order in which they are added, because each new filter is assigned the next lower preference value.
The order can be controlled using the preference option.
Within a given preference, filters are also executed in the reverse of the order in which they are added.
I have not seen an explicit way to re-order filters. However, it should be possible to re-order filters by adding them at a new preference level and then deleting them at the old preference level.
tc-bpf seems compatible with bpfd, though further investigation is required to confirm that all the APIs can be supported.
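The re-ordering approach suggested in these conclusions can be sketched as follows. This was not verified in the testing described here; the helper name move_filter is hypothetical, and it assumes the old preference holds only the one filter being moved, since deleting by preference alone does not single out a filter:

```shell
# Hypothetical helper: move a BPF ingress filter to a new preference.
# $1 = device, $2 = old preference, $3 = new preference,
# $4 = object file, $5 = ELF section (defaults to "ingress")
move_filter() {
    dev=$1 old=$2 new=$3 obj=$4
    sec=${5:-ingress}

    # Add the program at the new preference first so it stays attached
    # throughout; delete the old entry only if the add succeeded.
    sudo tc filter add dev "$dev" preference "$new" ingress bpf da obj "$obj" sec "$sec" &&
        sudo tc filter delete dev "$dev" ingress pref "$old"
}
```

For example, move_filter enp0s9 49152 49100 drop-icmp.o would re-add drop-icmp.o at preference 49100 and then remove the entry at 49152. Note that this loads a new program instance rather than moving the existing one.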

anfredette/tc-bpf-behavior.md