tc is a utility provided by iproute2 that can be used to interact with the Linux Traffic Control (TC) subsystem. Support was added in kernel version 4.1 to allow BPF programs to be loaded as TC classifiers (or filters) and actions.
This document describes testing done on the behavior of adding and removing BPF programs using the tc-bpf tool.
The main goal was to understand how tc-bpf behaves in order to determine whether it could be added as an option for bpfd. The specific initial goals documented here were to understand how BPF programs can be added and deleted, and also how to control the order of execution of multiple programs added to the same interface and qdisc.
Testing was done using two VirtualBox VMs running Fedora 36 with kernel 5.18.5-200.fc36.x86_64. The VMs are named "bpf1" and "bpf2" and are connected by a host-only network through their enp0s9 devices.
The following three simple BPF programs were used.
accept-all.c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
__attribute__((section("ingress"), used))
int accept() {
    return TC_ACT_OK;
}
drop-all.c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
__attribute__((section("ingress"), used))
int drop_all() {
    return TC_ACT_SHOT;
}
drop-icmp.c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <arpa/inet.h>
__attribute__((section("ingress"), used))
int drop(struct __sk_buff *skb) {
    const int l3_off = ETH_HLEN; // IP header offset
    const int l4_off = l3_off + sizeof(struct iphdr); // L4 header offset
    void *data = (void*)(long)skb->data;
    void *data_end = (void*)(long)skb->data_end;
    if (data_end < data + l4_off)
        return TC_ACT_OK;
    struct ethhdr *eth = data;
    if (eth->h_proto != htons(ETH_P_IP))
        return TC_ACT_OK;
    struct iphdr *ip = (struct iphdr *)(data + l3_off);
    if (ip->protocol != IPPROTO_ICMP)
        return TC_ACT_OK;
    return TC_ACT_SHOT;
}
Note: the drop-icmp.c code was obtained from Firewalling with BPF/XDP: Examples and Deep Dive
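The steps that follow load drop-icmp.o, drop-all.o, and accept-all.o, but the build step is not recorded here. The following is a minimal sketch of one way to produce those object files. It assumes clang with BPF target support is installed; the flags are a common choice rather than the ones used in the original testing, and the helper name bpf_compile_cmd is hypothetical. The script only prints the commands.

```shell
#!/bin/sh
# Assumption: clang with BPF target support is installed. The flags below are
# a common choice for building tc BPF objects, not the recorded build.

# Hypothetical helper: print the compile command for one BPF source file.
bpf_compile_cmd() {
    src="$1"
    obj="${src%.c}.o"    # accept-all.c -> accept-all.o
    printf 'clang -O2 -target bpf -c %s -o %s\n' "$src" "$obj"
}

# Print the commands for the three programs used in this document.
for src in accept-all.c drop-all.c drop-icmp.c; do
    bpf_compile_cmd "$src"
done
```

Piping the output to sh would run the builds. Depending on the distribution, extra include paths or header packages may be needed for the system headers that drop-icmp.c includes.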
The BPF programs are added on VM bpf2 as ingress tc filters.
TC is complex and powerful, and this effort evaluates only a small portion of it: adding BPF programs as ingress filters using direct-action mode and the clsact qdisc. Additionally, we use the preference option to explicitly define the priority, or order, in which the filters are evaluated. Below are a few comments on these options.
- ingress: Filters are applied to egress by default, so ingress must be specified.
- direct-action: Instructs the eBPF classifier not to invoke external TC actions and to use the program's return code as the TC action instead. For example, TC_ACT_OK (allows the packet to proceed) and TC_ACT_SHOT (drops the packet) are used in the code above.
- clsact qdisc: BPF programs can be associated with different types of qdiscs. However, clsact is a fairly new qdisc that was added primarily for BPF classifiers. It can hold only classifiers, it can be used for both ingress and egress, and it is the recommended qdisc for BPF classifiers. clsact isn't currently documented in the tc man pages. See here for more info.
- preference: The preference option is also not documented in the man pages, but its use is mentioned here.
ping and arping are used from bpf1 to test connectivity via ICMP and ARP, respectively. It is also useful to run tcpdump on both VMs to see what's happening, but I won't discuss the details here.
Before testing starts, confirm that both ping and arping from bpf1 to bpf2 work.
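Because the same two checks are repeated after every filter change, a small helper run on bpf1 can make the results easier to compare. This is a sketch only: the function name check_connectivity is hypothetical, and the address of bpf2 is passed as an argument because the VM addresses are not given in this document.

```shell
#!/bin/sh
# Hypothetical helper, intended to run on bpf1. The target address and the
# interface are arguments because this document does not list the VM addresses.
check_connectivity() {
    target="$1"    # IPv4 address of bpf2 on the host-only network
    iface="$2"     # interface on bpf1, e.g. enp0s9

    # ping exercises ICMP: send three echo requests, wait up to 1s for replies.
    if ping -c 3 -W 1 "$target" >/dev/null 2>&1; then icmp=ok; else icmp=blocked; fi

    # arping exercises ARP; it usually needs root or CAP_NET_RAW.
    if arping -c 3 -I "$iface" "$target" >/dev/null 2>&1; then arp=ok; else arp=blocked; fi

    echo "icmp=$icmp arp=$arp"
}
```

Calling `check_connectivity <bpf2-address> enp0s9` before any filters are added should report both protocols as ok.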
The qdiscs on an interface can be displayed as follows:
bpf2:~/bpf$ tc qdisc show dev enp0s9
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
Note that there is one default fq_codel qdisc on the interface.
Create a clsact qdisc on bpf2
sudo tc qdisc add dev enp0s9 clsact
Display qdiscs again to confirm creation
bpf2:~/bpf$ tc qdisc show dev enp0s9
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc clsact ffff: parent ffff:fff1
Create an ingress filter to drop ICMP packets
sudo tc filter add dev enp0s9 ingress bpf da obj drop-icmp.o sec ingress
Display the ingress filters
bpf2:~/bpf$ tc filter show dev enp0s9 ingress
filter protocol all pref 49152 bpf chain 0
filter protocol all pref 49152 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 191 tag 8af399635c5cc8a8
Confirm that ping from bpf1 to bpf2 no longer works, but arping continues to work.
Add a drop-all filter
sudo tc filter add dev enp0s9 ingress bpf da obj drop-all.o sec ingress
Display the ingress filters
bpf2:~/bpf$ tc filter show dev enp0s9 ingress
filter protocol all pref 49151 bpf chain 0
filter protocol all pref 49151 bpf chain 0 handle 0x1 drop-all.o:[ingress] direct-action not_in_hw id 194 tag 3b185187f1855c4c
filter protocol all pref 49152 bpf chain 0
filter protocol all pref 49152 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 191 tag 8af399635c5cc8a8
We now see a second BPF filter. The first filter was added with preference 49152 and the second filter was added with preference 49151. Lower preference is higher priority, and we can confirm that the drop-all filter is taking precedence by confirming that both ping and arping from bpf1 to bpf2 now fail.
Let's add a third filter, accept-all
sudo tc filter add dev enp0s9 ingress bpf da obj accept-all.o sec ingress
Display filters
bpf2:~/bpf$ tc filter show dev enp0s9 ingress
filter protocol all pref 49150 bpf chain 0
filter protocol all pref 49150 bpf chain 0 handle 0x1 accept-all.o:[ingress] direct-action not_in_hw id 197 tag a04f5eef06a7f555
filter protocol all pref 49151 bpf chain 0
filter protocol all pref 49151 bpf chain 0 handle 0x1 drop-all.o:[ingress] direct-action not_in_hw id 194 tag 3b185187f1855c4c
filter protocol all pref 49152 bpf chain 0
filter protocol all pref 49152 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 191 tag 8af399635c5cc8a8
We now have a third filter, accept-all, at preference 49150, and can confirm that it is taking precedence over the others because ping and arping both work again.
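As more filters are added, the listings get harder to scan. The sketch below reduces tc filter show output to one "preference handle object" line per filter. It assumes the output format shown in the listings above, and the function name list_filters is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize `tc filter show` output read from stdin as
# "pref handle object" lines. Header lines without a handle are skipped.
list_filters() {
    awk '{
        pref = ""; handle = ""; obj = ""
        for (i = 1; i < NF; i++) {
            if ($i == "pref") pref = $(i + 1)
            if ($i == "handle") { handle = $(i + 1); obj = $(i + 2) }
        }
        if (handle != "") {
            sub(/:\[.*/, "", obj)    # drop-icmp.o:[ingress] -> drop-icmp.o
            print pref, handle, obj
        }
    }'
}
```

It would be used as `tc filter show dev enp0s9 ingress | list_filters`.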
The same type of testing can be used to confirm each of the following steps, but I'll leave the details up to the reader from now on.
By default, the system adds filters starting at preference 49152 and decrements the preference value by 1 for each new filter that is added. However, it's possible to specify the preference explicitly as follows:
sudo tc filter add dev enp0s9 preference 49140 ingress bpf da obj drop-icmp.o sec ingress
The use of the preference option isn't documented on the tc man page, but there is some discussion of it here.
We can add multiple filters at the same preference level using
sudo tc filter add dev enp0s9 preference 49140 ingress bpf da obj drop-all.o sec ingress
sudo tc filter add dev enp0s9 preference 49140 ingress bpf da obj accept-all.o sec ingress
Now the filters look like this:
bpf2:~/bpf$ tc filter show dev enp0s9 ingress
filter protocol all pref 49140 bpf chain 0
filter protocol all pref 49140 bpf chain 0 handle 0x3 accept-all.o:[ingress] direct-action not_in_hw id 206 tag a04f5eef06a7f555
filter protocol all pref 49140 bpf chain 0 handle 0x2 drop-all.o:[ingress] direct-action not_in_hw id 203 tag 3b185187f1855c4c
filter protocol all pref 49140 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 200 tag 8af399635c5cc8a8
filter protocol all pref 49150 bpf chain 0
filter protocol all pref 49150 bpf chain 0 handle 0x1 accept-all.o:[ingress] direct-action not_in_hw id 197 tag a04f5eef06a7f555
filter protocol all pref 49151 bpf chain 0
filter protocol all pref 49151 bpf chain 0 handle 0x1 drop-all.o:[ingress] direct-action not_in_hw id 194 tag 3b185187f1855c4c
filter protocol all pref 49152 bpf chain 0
filter protocol all pref 49152 bpf chain 0 handle 0x1 drop-icmp.o:[ingress] direct-action not_in_hw id 191 tag 8af399635c5cc8a8
When multiple filters are added at the same preference level, more recently added filters take precedence. That is, they are executed in the reverse of the order in which they were added. Additionally, tc filter show displays the filters from highest priority to lowest priority.
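The ordering rule observed above can be written down as a small sort. This is only a model of the observed behavior, not a description of kernel internals: lower preference runs first, and within one preference the most recently added filter runs first. The function name eval_order and the "add sequence" column are illustrative assumptions.

```shell
#!/bin/sh
# Model of the ordering observed above (not a description of kernel internals):
# lower preference runs first; within one preference, the most recently added
# filter runs first. Input lines on stdin: "<pref> <add-sequence> <name>".
eval_order() {
    sort -k1,1n -k2,2nr | awk '{ print $3 }'
}

# The six filters from this walkthrough, in the order they were added.
eval_order <<'EOF'
49152 1 drop-icmp
49151 2 drop-all
49150 3 accept-all
49140 4 drop-icmp
49140 5 drop-all
49140 6 accept-all
EOF
```

The names come out in the same order as the tc filter show listing above: accept-all, drop-all, drop-icmp at preference 49140, followed by accept-all, drop-all, and drop-icmp at 49150, 49151, and 49152.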
There are multiple ways to delete filters. Here are a few.
When there is only one filter at a given preference, it can be deleted as follows:
sudo tc filter delete dev enp0s9 ingress pref 49150
When there are multiple filters at a given preference level, more specificity is needed to delete a given filter.
sudo tc filter delete dev enp0s9 preference 49140 handle 0x2 ingress bpf
Or, deleting the qdisc cleans up everything.
sudo tc qdisc del dev enp0s9 clsact
- BPF programs can be easily added to and removed from interfaces using the tc utility.
- By default, filters are executed in the reverse order in which they are added.
- The order can be controlled using the preference option.
- Within a given preference, filters are executed in the reverse order in which they are added.
- I have not seen an explicit way to re-order filters. However, it should be possible to re-order filters by adding filters at a new preference and then deleting the filters at the old preference level.
- tc-bpf seems compatible with bpfd, though further investigation is required to confirm that all the APIs can be supported.
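The re-ordering idea from the list above (add at a new preference, then delete at the old one) can be sketched as a helper that prints the two commands. This is an untested sketch: the function name move_filter_cmds is hypothetical, the target preference 49100 is an arbitrary example, and the command forms are the ones used earlier in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the two commands that would move one filter to a
# new preference by re-adding its object there and deleting the old entry.
# The command forms are the ones used earlier in this document.
move_filter_cmds() {
    dev="$1"; obj="$2"; old_pref="$3"; old_handle="$4"; new_pref="$5"
    echo "sudo tc filter add dev $dev preference $new_pref ingress bpf da obj $obj sec ingress"
    echo "sudo tc filter delete dev $dev preference $old_pref handle $old_handle ingress bpf"
}

# Example: move drop-icmp.o from preference 49152 (handle 0x1) to 49100.
move_filter_cmds enp0s9 drop-icmp.o 49152 0x1 49100
```

Adding before deleting avoids a gap with no filter in place, at the cost of a short period in which both copies are attached. As the differing ids in the listings above suggest, each add loads a new program instance rather than moving the existing one.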