Or: How I Reproduced the Problem on x86, Tried to Load the Missing Modules on the Real Device, and What That Tells Us About Ubiquiti's Kernel
Ubiquiti markets the Enterprise Fortress Gateway (EFG) as a 25-gigabit-class router. The product page lists two 25 GbE SFP28 ports for WAN/LAN, and Ubiquiti positions the device as a flagship for medium and large enterprise deployments. Its silicon — a Marvell Octeon CN9670 — supports hardware-accelerated forwarding through purpose-built network engines (NIX) that should sustain tens of millions of packets per second. The Cloud Gateway Max ("UDM Beast") pairs a Marvell Octeon CN10K SoC with a dedicated Marvell switch ASIC, and on paper should comfortably exceed 100 Gbps aggregate.
In practice, real-world enterprise deployments report:
- Inter-VLAN routing: ~1–1.5 Gbps single-stream, regardless of how fast the upstream link is
- PPPoE WAN throughput: ~2–3 Gbps single-stream on 10 Gbps fiber connections, where the ISP requires PPPoE authentication
- Total aggregate throughput: well below the marketed 25 Gbps WAN/LAN figures
This document analyzes both bottlenecks. It reproduces both problems in a controlled lab environment on x86 hardware, identifies the specific software architectural choices that cause them, demonstrates fixes whose effects can be measured to a precision of a few hundred Mbps, and documents in detail what happened when we attempted to apply the most surgical of those fixes — adding the missing nftables flowtable module — to a real production EFG.
We will show that the EFG's stock configuration delivers between 5% and 15% of the throughput its silicon is capable of. We will show three independent fixes that together can push it from ~1 Gbps single-stream to over 25 Gbps single-stream — without adding hardware. Two of those fixes are pure software configuration changes; the third is a kernel module that exists in mainline Linux and is shipped by Marvell themselves, but is not present in Ubiquiti's kernel build.
We then attempt to install the missing module on a real EFG. Building it against vanilla Linux 5.15.72 produces a kernel module with byte-perfect vermagic — and crashes the device on load. Building it against Marvell's complete published OCTEON BSP source from the Yocto Project produces another byte-perfect module that crashes at the identical function offset. Symbol-level analysis of the running EFG kernel reveals 6,357 unique symbols that exist in neither vanilla Linux nor Marvell's complete public BSP. These include conntrack extensions for proprietary DPI integration (nf_ct_ext_dpi_destroy, nf_conntrack_dpi_init), a 116-symbol tdts namespace exposing kernel internals to a closed-source Trend Micro DPI engine, and significant hardware abstraction additions.
The conclusion: Ubiquiti has built a substantially modified kernel that they have not released sources for, and Ubiquiti's open-source download page no longer exists. Their GitHub organization contains no firmware or kernel sources. Closed-source tdts and t_miner modules link directly against kernel symbols and operate as derived works of the kernel. This appears to violate GPL-2.0, and continues a pattern: Ubiquiti was publicly accused of GPL violations in 2015 (resolved after sustained pressure) and again in 2019.
The performance issues in this document have been reported to Ubiquiti through their support channel for approximately one year, including specific implementation guidance pointing to Marvell's published DPDK reference architecture; no substantive engineering response has been received. Separately, security findings about the EFG's deliberate absence of secure boot, module signing, and integrity protection were submitted through Ubiquiti's HackerOne bug bounty program and rejected on the grounds that the attacker would require network access — a rationale that does not survive scrutiny when applied to a network gateway.
This is therefore both a technical analysis and a software-license compliance analysis, and it is published only after the channels designed for vendor engagement have failed to produce a response.
- The Problem
- Test Environment
- Methodology
- The Reference Run: Real EFG Diagnostics
- Reproducing the Bottleneck — virtio-net Test Matrix
- Closing the Loop — Real Silicon Test Matrix
- Userspace Dataplane — VPP/DPDK Comparison
- The PPPoE Bottleneck — A Related but Distinct Problem
- Findings: The Architectural Failures
- Recommended Fixes
- Direct Experimental Verification — Building the Missing Modules
- Symbol-Level Forensics on the Running EFG Kernel
- The GPL Compliance Question
- Direct Vendor Engagement: What Ubiquiti Has Already Been Told
- Conclusion
- Appendix: Full Data Sets
In practice, real-world enterprise deployments report:
- Inter-VLAN routing: ~1–1.5 Gbps single-stream, regardless of how fast the upstream link is
- PPPoE WAN throughput: ~2–3 Gbps single-stream on 10 Gbps fiber connections, where the ISP requires PPPoE authentication
- NAT throughput: similar single-flow ceilings whenever IPS, deep-packet-inspection, or threat management features are enabled
Customers complain, post mpstat screenshots showing one CPU core saturated while the other 17 sit idle, and get told it is a hardware limitation.
It is not. The CPUs are not the bottleneck. The silicon is not the bottleneck. The bottleneck is the configuration of the Linux kernel network stack that ships on the device, including:
- Hardware offload features that are explicitly disabled
- A modern kernel fast-path feature (`nf_flow_table`) that is not loaded
- A user-space inspection engine running on the same CPU core that is forwarding packets
- A 5-deep iptables FORWARD chain that every new connection must traverse
- Conntrack protocol helpers loaded by default for legacy protocols (PPTP, H.323)
- Per-VLAN bridges instead of a VLAN-aware single bridge
- No DPDK fast-path despite Marvell shipping first-class DPDK PMDs (`cnxk`) for these exact SoCs
Each one of these contributes measurable overhead. Combined, they drop forwarding throughput by an order of magnitude. The point of this article is to measure each contribution independently and show what a properly-configured Linux router looks like on the same workload.
- CPU: AMD Ryzen Threadripper Pro 7995WX, 96 cores / 192 threads, base 2.5 GHz, boost 5.1 GHz, Zen 4 microarchitecture
- RAM: 754 GB DDR5 ECC
- Hypervisor: Proxmox VE 9.0.11
- Kernel: Linux 6.14.11-4-pve
- Storage: NVMe ZFS root pool (`rpool`)
- Networking: Mellanox ConnectX-6 Dx dual-port 100 Gbps NIC (MT2892), bonded LACP 802.3ad
- IOMMU: AMD-Vi enabled in passthrough mode
- Hugepages: 64 × 1 GB = 64 GB reserved at boot
- EFG (Enterprise Fortress Gateway), Ubiquiti Networks
- Marvell Octeon CN9670 SoC, 18 ARM v8.2 cores @ 2.0 GHz
- 64 GB RAM
- Linux 5.15.72-ui-cn9670 (vendor build)
- Live production firewall, 8 days uptime at capture, 7 active VLANs in an enterprise office network
Three Ubuntu 24.04 LTS VMs were cloned from a common template, each pinned via a Proxmox hookscript to a dedicated CCD on the host (8 vCPUs each):
192.168.6.0/24 (mgmt — for SSH, never used for test traffic)
| | |
+----------+----------+
| | |
gw-router client1 client2
(VM 200) (VM 201) (VM 202)
8 cores 8 cores 8 cores
16 GB RAM 8 GB RAM 8 GB RAM
Cores 8-15 Cores 16-23 Cores 24-31
For test traffic (multiple network paths used in different tests):
client1 ────[VLAN 10]──── gw-router ────[VLAN 20]──── client2
10.10.10.10 10.10.20.10
↕ ↕
gw-router gw-router
10.10.10.1 10.10.20.1
The VMs received traffic through one of three I/O paths during testing:
- virtio-net through Linux bridges with VLAN tagging (`vmbr1` on the host)
- ConnectX-6 Dx VFs via SR-IOV passthrough (4 VFs total, 2 to gw-router, 1 to each client)
- VPP/DPDK with the same VFs polled directly by VPP's worker threads in userspace
Single TCP stream iperf3 at MTU 1500 was used as the primary measurement. Multi-stream tests with -P 8 were used in select cases to demonstrate scaling behavior. Each measurement ran for 30 seconds with per-second reporting; the values reported are the iperf3 sender/receiver final summary, which agree to within 0.1 Gbps in all cases.
To prove the architectural argument we needed to isolate independent variables:
| Variable | Settings tested |
|---|---|
| I/O fabric | virtio-net (vhost-net backend), ConnectX VF (SR-IOV passthrough) |
| MTU | 1500, 9000 |
| Hardware offloads (GRO/TSO/LRO) | on, off |
| Forwarding rules | none, EFG-replica 5-chain ruleset |
| Forwarder | kernel ip_forward, kernel ip_forward + nftables flowtable, VPP/DPDK userspace |
For each combination, single-stream iperf3 between client1 and client2 (i.e. across the gw-router VM, between two distinct IPv4 subnets) was measured. Because the host CPU does not vary across tests and because vCPU pinning is fixed via a Proxmox hookscript that calls taskset after the VM starts, every test runs on the same physical cores in the same NUMA configuration.
The "EFG-replica 5-chain ruleset" was constructed from observation of the live EFG. It mirrors the EFG's iptables FORWARD structure of ALIEN → TOR → IPS → UBIOS_FORWARD_JUMP → user → default chains, with conntrack lookups, protocol/port matchers, and per-chain counters that force per-packet evaluation in the slow path. The exact ruleset is in the appendix.
Before running anything in the lab, we captured the configuration of a production EFG to know what we needed to reproduce. Every command below was executed on a customer-deployed EFG running stock Ubiquiti firmware. None of these settings is user-configurable: they are baked into how the UniFi Web UI translates controller settings into the underlying Linux subsystems.
$ uname -a
Linux EFG-Home-SP 5.15.72-ui-cn9670 #5.15.72 SMP Wed Apr 15 23:39:47 CST 2026 aarch64
$ nproc
18
$ free -h
total used free shared buff/cache available
Mem: 63Gi 11Gi 46Gi 106Mi 5.3Gi 44Gi
$ uptime
02:09:29 up 8 days, 5:17, 1 user, load average: 2.52, 1.84, 1.86
Confirmed: Octeon CN9670 (per the kernel build identifier), 18 cores, 64 GB RAM. Kernel 5.15 dates from late 2021 — it predates several material networking improvements in 5.19+ (better flowtable hardware offload, improved nft, better mptcp, PPPoE flowtable acceleration in 6.2+).
$ iptables -L FORWARD -n -v --line-numbers
Chain FORWARD (policy ACCEPT 1033 packets, 157K bytes)
num pkts bytes target source destination
1 555K 775M ALIEN 0.0.0.0/0 0.0.0.0/0
2 2764K 4489M TOR 0.0.0.0/0 0.0.0.0/0
3 238M 354G IPS 0.0.0.0/0 0.0.0.0/0
4 874M 1342G UBIOS_FORWARD_JUMP 0.0.0.0/0 0.0.0.0/0
In 8 days of uptime, this device has pushed:
- 874 million packets through `UBIOS_FORWARD_JUMP`
- 238 million through the `IPS` chain
- 2.76 million through `TOR`
- 555 thousand through `ALIEN`
Every packet that this gateway routes traverses at least 4 jump targets in sequence, plus whatever rules live inside each. Total rule count across filter, mangle, and nat tables:
$ iptables -t filter -L -n | wc -l
572
$ iptables -t mangle -L -n | wc -l
187
$ iptables -t nat -L -n | wc -l
80
839 rules total. And it's all running on the legacy iptables (xt_*) backend. The modern nft API is not in use:
$ nft list ruleset | wc -l
0
$ nft list flowtables
[empty output]
$ lsmod | grep -iE "flow_table|flowtable"
[empty output]
$ for iface in eth0 eth1 eth2 eth3; do
ethtool -k $iface | grep hw-tc-offload
done
[no output - module not loaded, feature not available]
The nf_flow_table kernel module is not loaded. There is no nft flowtable. There is no hardware tc-flower offload. The kernel's modern fast-path infrastructure — which can bypass conntrack and rule evaluation for established flows — is not even installed on this device.
This single missing piece is, as the lab measurements will show, worth a 3× to 7× single-stream throughput improvement on its own.
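For contrast, on a distribution kernel that ships the module, wiring up this fast path is only a handful of commands. The following is a sketch, not the EFG's configuration — the interface names (`eth0`, `eth1`) and the `fastpath` table name are placeholders:

```shell
# Sketch: loading and wiring the missing fast path on a kernel that ships it.
# eth0/eth1 stand in for the LAN/WAN devices; names are placeholders.
modprobe nf_flow_table
modprobe nf_flow_table_inet

nft add table inet fastpath
nft add flowtable inet fastpath ft \
    '{ hook ingress priority 0; devices = { eth0, eth1 }; }'
nft add chain inet fastpath forward \
    '{ type filter hook forward priority 0; policy accept; }'
# After a flow's first packets are tracked, subsequent packets bypass
# conntrack lookup and FORWARD-chain evaluation via the ingress hook:
nft add rule inet fastpath forward ip protocol '{ tcp, udp }' flow add @ft
```

On the EFG none of this is possible, because the module binary is simply absent from the firmware image.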
$ sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 10485760
$ sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 846
$ lsmod | grep nf_conntrack
nf_conntrack_tftp 262144 1 nf_nat_tftp
nf_conntrack_pptp 327680 1 nf_nat_pptp
nf_conntrack_h323 327680 1 nf_nat_h323
nf_conntrack_ftp 327680 1 nf_nat_ftp
Four conntrack protocol helpers loaded: FTP, PPTP, H.323, TFTP. PPTP is a deprecated VPN protocol from the late 1990s. H.323 is a videoconferencing protocol from 1996, mostly displaced by SIP. TFTP and FTP are increasingly rare in modern enterprise environments.
The actual per-packet cost of having helpers loaded is more nuanced than "every packet is inspected" — see Section 9 Finding 5 for the precise breakdown. The short version: established non-helper flows pay essentially nothing per packet (a pointer check), but every new connection pays a hash lookup against the helper registry, and any flow on a helper-recognized port (FTP/21, etc.) pays the full inspection cost.
A "Firewall Connection Tracking" toggle does exist in the UniFi controller's Gateway settings, allowing administrators to disable individual helpers (FTP, H.323, SIP, GRE, PPTP, TFTP). Disabling them all unloads the helper modules from memory entirely. This addresses the lookup cost on new flows but does not affect already-established TCP throughput (the iperf3 inter-VLAN measurement is unchanged), and does not address the bigger architectural bottlenecks documented in Sections 5-9. Section 9 Finding 5 expands on what helpers actually cost and what would be required to keep helper functionality without the cost.
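For readers replicating this outside the UI, the helper state can be inspected and cleared by hand. A sketch, assuming a 5.15-era kernel (where the `nf_conntrack_helper` sysctl still exists; it was removed in later kernels) — module removal only succeeds once no active flow references a helper:

```shell
# Which helper modules are resident:
lsmod | grep nf_conntrack_
# Stop implicit helper assignment to new flows (5.15-era sysctl):
sysctl -w net.netfilter.nf_conntrack_helper=0
# Unload the legacy helper pairs (NAT helper first, then the conntrack helper):
modprobe -r nf_nat_pptp nf_conntrack_pptp
modprobe -r nf_nat_h323 nf_conntrack_h323
```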
$ ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -10
PID %CPU %MEM COMMAND
4098469 39.6 0.0 dpi-flow-stats
3139 12.5 0.1 ubios-udapi-ser
66687 7.8 3.1 java
4891 7.0 0.0 conntrackd
2491041 6.9 1.6 Suricata-Main
5505 6.2 0.0 mcad
8596 3.9 0.9 unifi-core
4482 3.8 0.0 ulogd
dpi-flow-stats consuming 39.6% of one CPU core continuously. Add Suricata IPS (6.9%) and conntrackd (7.0%) and you have ~54% of one core permanently consumed by per-packet inspection processes that don't forward anything — they just observe.
Crucially, these userspace processes typically run on the same core that is doing kernel forwarding. We measured this exact pattern in the lab: a competing userspace consumer on the forwarding core directly reduces forwarding throughput.
$ mpstat -P ALL 1 3 | grep Average
Average: all 4.07 0.00 3.67 0.07 0.17 0.24 0.00 0.00 0.00 91.78
Average: 0 18.40 0.00 1.39 0.00 0.35 0.00 0.00 0.00 0.00 79.86
Average: 1 13.20 0.00 1.32 0.00 0.33 0.33 0.00 0.00 0.00 84.82
Average: 2 2.68 0.00 2.01 0.00 0.34 0.34 0.00 0.00 0.00 94.63
Average: 3 6.38 0.00 1.68 0.00 0.00 0.00 0.00 0.00 0.00 91.95
[... 14 more cores all near 95–100% idle ...]
91.78% average idle across 18 cores during light load. Under a single-flow stress test the picture is sharper: one core at near-100% softirq (the kernel's softirq context where __netif_receive_skb_core and ip_forward run), seventeen sitting at 0%. Single-flow forwarding is fundamentally a single-thread workload in the Linux kernel network stack: a TCP flow's packets all hash to the same RX queue, the queue is bound to one core, and that core does all the work.
Adding cores does not help. Faster cores help linearly. Removing per-packet kernel-stack work helps dramatically. A userspace dataplane that polls the NIC across multiple worker cores can fix this entirely — see Section 7.
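The one-flow/one-core binding described above can be observed directly on any Linux router. A diagnostic sketch — the interface name and IRQ number are placeholders:

```shell
# RSS indirection table: which hash buckets map to which RX queues
ethtool -x eth0
# Which IRQ each RX queue raises
grep eth0 /proc/interrupts
# Which core services a given RX-queue IRQ (45 is illustrative)
cat /proc/irq/45/smp_affinity_list
# Per-core NET_RX softirq counters; re-run during a test to see one
# counter climbing while the others stand still
grep NET_RX /proc/softirqs
```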
$ ip -br link | grep -E "^br[0-9]"
br0 UP 192.168.196.1/24
br1111 UP [no address shown]
br254 UP 192.168.254.1/24
br3 UP 192.168.3.1/24
br5 UP 192.168.5.1/24
br6 UP 192.168.6.1/24
br7 UP 192.168.7.1/24
Each VLAN gets its own bridge (br3 for VLAN 3, br5 for 5, br6 for 6, etc.) hanging off switch0 subinterfaces (switch0.3, switch0.5, etc.). Inter-VLAN traffic must traverse:
client (VLAN 3) → br3 → switch0.3 → switch0 → kernel L3 lookup
↓
ip_forward
↓
switch0.5 → br5 → client (VLAN 5)
Every L3 hop is a kernel ip_forward operation. A modern vlan-aware single bridge with bridge vlan filtering enabled and nf_flow_table could short-circuit established flows in a software fast-path. This setup cannot.
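The alternative, as a sketch (the `switch0` name and VLAN IDs mirror the EFG's; this was not applied to a real device): one filtering bridge carries all VLANs, and routing happens on VLAN sub-devices of the bridge itself.

```shell
# Sketch: one VLAN-aware bridge replacing the seven per-VLAN bridges.
ip link add br0 type bridge vlan_filtering 1
ip link set switch0 master br0
ip link set switch0 up
ip link set br0 up
# Carry the VLANs tagged on the member port:
bridge vlan add dev switch0 vid 3
bridge vlan add dev switch0 vid 5
# Let the bridge device itself pass those VLANs up to L3:
bridge vlan add dev br0 vid 3 self
bridge vlan add dev br0 vid 5 self
# Routed interfaces become VLAN sub-devices of the single bridge:
ip link add link br0 name br0.3 type vlan id 3
ip addr add 192.168.3.1/24 dev br0.3
ip link set br0.3 up
```

With this layout, an nft flowtable over the bridge's VLAN devices can short-circuit established inter-VLAN flows.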
| Finding | Evidence | Impact |
|---|---|---|
| 5-chain iptables FORWARD | 874 M packets through `UBIOS_FORWARD_JUMP` in 8 days | Lab: 4.95 → 2.36 Gbps when applied (53% drop) |
| No flowtable, no module | `nft list flowtables` empty, `lsmod` shows no flow_table | Lab: virtio kernel 2.36 → 7.05 → 17.4 Gbps when added with offloads |
| Userspace inspection on data path | dpi-flow-stats 39.6% CPU, Suricata 6.9% | Permanent CPU pressure on forwarding core |
| Hardware offloads disabled | `hw-tc-offload` off, GRO off | Lab: 17 Gbps (on) → 5 Gbps (off) at MTU 1500 |
| Per-VLAN bridges, no offload | 7 separate br* devices | Forces every inter-VLAN packet through kernel L3 |
| Legacy iptables, not nftables | `nft list ruleset` empty, 839 iptables rules | Slower per-rule, locked out of fast-path features |
| Conntrack helpers loaded by default | `nf_conntrack_{ftp,pptp,h323,tftp}` all loaded | Helper-registry lookups for unused protocols |
| 18 cores, 1 used at a time | mpstat 91.78% idle average; single-flow saturates one core | Single-flow workloads cannot scale across cores in the kernel |
| Old kernel (5.15) | Predates several networking improvements including PPPoE flowtable | Locks out post-5.19 nftables, flowtable, and PPPoE acceleration |
| No DPDK | No `cnxk` PMD active despite full vendor support | Forfeits 5-15× throughput available from the same silicon |
The first round of tests used standard virtio-net VMs on Linux bridges — the closest analogue to "hypervisor in front of network silicon" without involving the ConnectX hardware directly. The bridge vmbr1 was configured as VLAN-aware with VIDs 10 and 20.
$ iperf3 -c 10.10.20.10 -t 30
[ ID] Interval Transfer Bitrate
[ 5] 0.00-30.00 sec 59.2 GBytes 16.9 Gbits/sec sender
[ 5] 0.00-30.00 sec 59.2 GBytes 16.9 Gbits/sec receiver
16.9 Gbps. mpstat showed CPU 3 at ~12% softirq during the test. This is what jumbo MTU + GRO/TSO buys you: each "packet" through the forward path is a ~64 KB super-segment that the kernel processes once. Approximately 30,000 forward operations per second, each on one core.
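The "~30,000 forward operations per second" figure falls out of simple arithmetic on the measured bitrate and the 64 KB super-segment size:

```shell
# 16.9 Gbit/s forwarded as ~64 KB GRO/TSO super-segments:
bits_per_superseg=$(( 65536 * 8 ))                      # bits per 64 KB aggregate
ops_per_sec=$(( 16900000000 / bits_per_superseg ))      # forward events per second
echo "forward ops/sec ~= $ops_per_sec"                  # ~32,000
```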
[ 5] 0.00-30.00 sec 60.1 GBytes 17.2 Gbits/sec
17.2 Gbps. Surprisingly similar. With MTU 9000, even without GRO, packets are 8960 bytes each — only ~7× more forwarding events than with 64 KB TSO super-segments. The per-packet kernel cost doesn't dominate yet.
This is the configuration that matches what real Ubiquiti customers experience. Standard internet MTU, no jumbo frames, no offloads.
[ 5] 0.00-30.00 sec 17.3 GBytes 4.95 Gbits/sec
4.95 Gbps. mpstat showed CPU 6 at 100% softirq, all other cores idle. This is the same shape as the EFG diagnostic — one core saturated, the others doing nothing. The Zen 4 core at 5+ GHz, doing nothing but softirq packet forwarding, ceilings at this number.
If we naively scale this for an Octeon ARM core at 2.0 GHz (about 3–5× slower per cycle for this workload), we'd predict ~1.0–1.6 Gbps. Real EFG measurements are in this range. We are reproducing the right physics.
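The naive scaling in that prediction, spelled out (the 3–5× per-cycle penalty for the ARM core on this workload is the assumption):

```shell
# Scale the 4.95 Gbps Zen 4 single-core result to a 2.0 GHz Octeon core,
# assuming this workload runs 3-5x slower per cycle on the ARM core.
zen4_mbps=4950
low=$(( zen4_mbps / 5 ))     # pessimistic: 5x slower -> 990 Mbps
high=$(( zen4_mbps / 3 ))    # optimistic: 3x slower -> 1650 Mbps
echo "predicted Octeon single-stream: ${low}-${high} Mbps"
```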
$ sudo modprobe nf_conntrack
$ sudo sysctl -w net.netfilter.nf_conntrack_max=10485760
[ 5] 0.00-30.00 sec 16.9 GBytes 4.84 Gbits/sec
4.84 Gbps. Almost no impact. Module load alone is cheap; conntrack's cost shows up when rules invoke it.
table inet filter {
chain forward {
type filter hook forward priority 0; policy accept;
ct state established,related accept
ct state new accept
}
}
[ 5] 0.00-30.00 sec 16.2 GBytes 4.64 Gbits/sec
4.64 Gbps. A 4% drop from a single conntrack rule. After the first packet of a single-flow iperf3 stream, the conntrack entry exists; lookup is O(1). The cost is real but small for a single long-lived flow.
The full ruleset emulating what we observed on the EFG: 5 jump chains, conntrack per chain, per-rule counters, multiple matchers per rule:
table inet filter {
chain alien_chain { counter; ip protocol tcp counter; ip saddr 10.0.0.0/8 counter }
chain tor_chain { counter; ip protocol tcp counter; tcp flags & (syn|ack) == ack counter }
chain ips_chain { counter; ip protocol tcp counter; meta l4proto tcp counter; tcp dport { 1-65535 } counter }
chain ubios_chain { counter; ip protocol tcp counter; ct state established counter }
chain user_chain { counter; ct state established,related counter; ip saddr 10.10.10.0/24 ip daddr 10.10.20.0/24 counter }
chain forward {
type filter hook forward priority 0; policy accept;
jump alien_chain
jump tor_chain
jump ips_chain
jump ubios_chain
jump user_chain
}
}
[ 5] 0.00-30.00 sec 7.99 GBytes 2.29 Gbits/sec
2.29 Gbps. The smoking gun. A 53% drop from the no-rule baseline of 4.95 Gbps. CPU 5 was pegged at 100% softirq during the entire run.
This is the EFG's per-packet cost on a fast x86 core. Scaling for Octeon ARM at 2.0 GHz: ~500–800 Mbps. Matches user reports of EFG inter-VLAN performance in the wild.
$ iperf3 -c 10.10.20.10 -t 30 -P 8
[SUM] 0.00-30.00 sec 39.7 GBytes 11.4 Gbits/sec
11.4 Gbps aggregate across 8 streams. mpstat showed 2–3 cores busy: different flows hashed to different RX queues, different queues bound to different cores. Multi-flow forwarding scales (somewhat), but single-flow performance does not — each stream caps near the per-core ceiling.
This is why a single backup transfer or large Veeam replication will saturate at 1 Gbps even though the WAN can do 25: the flow is one TCP connection.
We replace the 5-chain ruleset with a flowtable directive:
table inet filter {
flowtable f {
hook ingress priority 0
devices = { enp6s19, enp6s20 }
}
chain forward {
type filter hook forward priority 0; policy accept;
ip protocol { tcp, udp } flow add @f
ct state established,related accept
}
}
[ 5] 0.00-30.00 sec 24.6 GBytes 7.05 Gbits/sec
7.05 Gbps. A 3.0× jump from 2.36 Gbps. flowtable installs an ingress fast-path that, after the first few packets of a flow are tracked, bypasses conntrack lookup and FORWARD chain evaluation entirely. The packet still goes through netfilter ingress hook; the slow path is just skipped.
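Whether flows are actually riding the fast path can be confirmed at run time — offloaded conntrack entries carry an `[OFFLOAD]` flag (a diagnostic sketch, run on the router):

```shell
# The flowtable and its bound devices:
nft list flowtables
# Count conntrack entries currently handled by the fast path:
conntrack -L 2>/dev/null | grep -c '\[OFFLOAD\]'
```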
[ 5] 0.00-30.00 sec 60.9 GBytes 17.4 Gbits/sec
17.4 Gbps. A 7.4× improvement over the EFG-style ruleset baseline (2.36 Gbps). Same hardware. Same kernel. Same single TCP stream. The only changes: flowtable directive added, offloads enabled.
| # | MTU | Offloads | Rules | Single-stream |
|---|---|---|---|---|
| 1 | 9000 | on | none | 16.9 Gbps |
| 2 | 9000 | off | none | 17.2 Gbps |
| 3 | 1500 | off | none | 4.95 Gbps |
| 4 | 1500 | off | + ct module | 4.84 Gbps |
| 5 | 1500 | off | + simple ct rule | 4.64 Gbps |
| 6 | 1500 | off | EFG 5-chain replica | 2.36 Gbps |
| 7 (8-stream) | 1500 | off | EFG 5-chain | 11.4 Gbps agg |
| A | 1500 | off | flowtable | 7.05 Gbps |
| B | 1500 | on | flowtable | 17.4 Gbps |
The virtio tests share a known limitation: virtio-net packets traverse the host's vhost-net kernel thread, which adds its own per-packet cost beyond what's in the guest. To prove that the kernel-stack overheads are independent of virtio's I/O fabric, we ran the same tests with SR-IOV pass-through of ConnectX-6 Dx Virtual Functions.
The ConnectX-6 Dx supports up to 8 SR-IOV Virtual Functions per port. Without disturbing the existing LACP bond:
$ echo 4 > /sys/class/net/enp5s0f0np0/device/sriov_numvfs
$ cat /sys/class/net/enp5s0f0np0/device/sriov_numvfs
4
Four VFs were created (VF0-VF3), assigned dedicated MACs and isolated VLANs (110/120) at the eSwitch level, and passed through to the lab VMs:
- VF0 (
0000:05:00.2) → gw-router as VLAN 10 lab NIC - VF1 (
0000:05:00.3) → gw-router as VLAN 20 lab NIC - VF2 (
0000:05:00.4) → client1 (VLAN 10) - VF3 (
0000:05:00.5) → client2 (VLAN 20)
The ConnectX-6 Dx eSwitch handled L2 between VFs in silicon — no traffic exited the physical port for the VLAN 10/20 lab traffic. The bond and the upstream network were unaffected.
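The eSwitch-level isolation described above reduces to per-VF `ip link` commands on the PF. A sketch — the MAC addresses are illustrative, the VLAN IDs (110/120) are the ones used in this lab:

```shell
# Assign a stable MAC and an eSwitch VLAN to each VF on the physical function:
ip link set enp5s0f0np0 vf 0 mac 02:00:00:00:10:01 vlan 110   # gw-router, lab VLAN 10 side
ip link set enp5s0f0np0 vf 1 mac 02:00:00:00:20:01 vlan 120   # gw-router, lab VLAN 20 side
ip link set enp5s0f0np0 vf 2 mac 02:00:00:00:10:02 vlan 110   # client1
ip link set enp5s0f0np0 vf 3 mac 02:00:00:00:20:02 vlan 120   # client2
# Verify: lists each VF's MAC, VLAN, and spoof-check state
ip link show enp5s0f0np0
```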
Inside each VM, the VFs appeared as native ConnectX hardware via the mlx5_core driver. The VMs ran kernel ip_forward exactly as before; only the I/O fabric changed.
[ 5] 0.00-30.00 sec 88.3 GBytes 25.3 Gbits/sec
25.3 Gbps single-stream. A 5.1× improvement over the equivalent virtio test (4.95 Gbps with offloads off). With offloads on, ConnectX hardware GRO is more efficient than virtio's, so the per-superpacket cost is even lower.
[ 5] 0.00-30.00 sec 73.9 GBytes 21.1 Gbits/sec
21.1 Gbps. Only a 17% drop. With GRO collapsing wire packets into super-segments, the rule evaluation cost is amortized across ~40× fewer events. The EFG ruleset is still expensive per-event, but per-packet on the wire it's hidden by GRO.
[ 5] 0.00-30.00 sec 16.6 GBytes 4.74 Gbits/sec
4.74 Gbps. Statistically identical to the virtio-net test (4.95 Gbps). With offloads off, every wire packet hits ip_forward once. The per-packet ceiling on a Zen 4 core is the same regardless of NIC quality. The kernel stack itself is the bottleneck, not the I/O fabric, when offloads are off.
[ 5] 0.00-30.00 sec 16.4 GBytes 4.70 Gbits/sec
4.70 Gbps. Same as K3 within noise. The mlx5 kernel I/O path is heavier per-packet than virtio's vhost-net path — heavy enough that the EFG ruleset cost is hidden inside the I/O cost. Both paths still cap at the single-core software ceiling.
| # | NIC | Offloads | Rules | Single-stream |
|---|---|---|---|---|
| K1 | ConnectX VF | on | none | 25.3 Gbps |
| K2 | ConnectX VF | on | EFG 5-chain | 21.1 Gbps |
| K3 | ConnectX VF | off | none | 4.74 Gbps |
| K4 | ConnectX VF | off | EFG 5-chain | 4.70 Gbps |
The pattern is clear: with offloads off, the I/O fabric does not matter. With offloads on, it does. Hardware offloads collapse the per-packet processing cost in the kernel's hot path. Without them, even the world's fastest networking silicon ceilings around 5 Gbps single-stream because the kernel itself is the limit.
The EFG configuration disables hardware offloads. By doing so, it makes its own silicon irrelevant.
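For completeness, the knobs in question are one `ethtool` invocation per interface (a sketch; `eth0` is a placeholder, and which features are available depends on the driver):

```shell
# Enable segmentation/aggregation offloads where the driver supports them:
ethtool -K eth0 tso on gso on gro on
# Confirm the resulting state:
ethtool -k eth0 | grep -E 'segmentation|receive-offload'
```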
VPP (Vector Packet Processor) is a userspace network dataplane built on DPDK that bypasses the kernel network stack entirely. It is what production-grade open-source routers (TNSR, DANOS) use, and it is what most enterprise-grade NFV appliances build on. We tested it both over virtio-net and over the ConnectX VFs.
A note on relevance to the EFG: Marvell ships a fully-supported DPDK Poll Mode Driver for the OCTEON family — the cnxk PMD, which covers CN9670 (in the EFG) and CN10K (in the UDM Beast). Marvell publishes reference architectures that combine OCTEON SoCs with VPP and DPDK-accelerated Suricata. Suricata itself has had native DPDK input mode since version 7.0 (released 2023). The components Ubiquiti would need to ship a userspace dataplane on the EFG are not research projects — they are vendor-blessed, production-deployed infrastructure that has been available for years.
[ 5] 0.00-30.00 sec 23.7 GBytes 6.78 Gbits/sec
6.78 Gbps. Roughly equal to ip_forward + flowtable in the equivalent kernel test. VPP's show runtime revealed the cause:
dpdk-input Vectors/Call: 0.05 Clocks/Packet: 1810
ip4-rewrite Vectors/Call: 15.24 Clocks/Packet: 24.2
0.05 vectors per call on the input side. DPDK's whole performance story is amortizing per-syscall and per-context-switch overhead across batches of ~32–256 packets. Virtio-net feeds packets to DPDK one at a time. The polling loop is essentially empty. Userspace dataplane only delivers its promised speedup when paired with a userspace-friendly I/O backend (vhost-user) or real hardware.
[ 5] 0.00-30.00 sec 54.9 GBytes 15.7 Gbits/sec
15.7 Gbps. Better than virtio-VPP (2.3×) but actually worse than kernel-on-ConnectX with offloads on (25.3 Gbps). Why? VPP doesn't do GRO. It processes wire packets individually. With offloads off on the clients, every packet on the wire is 1500 bytes, and VPP processes ~1.4 million per second on one worker core.
The per-packet path through VPP is impressively cheap (ip4-input + lookup + rewrite + tx ≈ 78 cycles end-to-end on Zen 4) but it's still doing 40× more "work events" than the kernel + GRO setup, which only sees super-segments.
[ 5] 0.00-30.00 sec 124 GBytes 35.6 Gbits/sec
35.6 Gbps single-stream. Now the clients send fewer, larger TCP segments via TSO. ConnectX hardware can transmit each segment as a single frame on the wire (GSO/TSO offload at the NIC). VPP receives the resulting larger frames and forwards them with its low per-packet cost.
This is the headline number. 35.6 Gbps single-stream userspace dataplane forwarding on real silicon. Compared against the EFG's actual production performance on the same workload (~1 Gbps), this is the 15-35× ceiling that's possible with available open-source software on the same class of hardware.
VPP with show runtime during this test:
ip4-rewrite Vectors/Call: 7.54 Clocks/Packet: 38.6
lab-vlan20-tx Vectors/Call: 8.65 Clocks/Packet: 37.9
VPP itself is doing 75–80 cycles of work per packet. On a 5 GHz core that's ~16 ns per packet. The theoretical ceiling for VPP on this hardware is hundreds of Gbps. The measured 35.6 Gbps is bottlenecked on the clients (their ability to generate packets), not on VPP.
The lab numbers are on Zen 4 at 5+ GHz. To estimate what VPP+DPDK would achieve on the EFG's ARM Cortex-A72-class cores at 2.0 GHz, we lean on published Marvell numbers and the cycle-counting visible in show runtime:
- VPP per-packet cost in the lab: ~80 cycles on Zen 4 for full IP forwarding pipeline
- ARM Cortex-A72 vs Zen 4 IPC for this workload: ~3-4× lower
- Estimated cycles per packet on Octeon CN9670: 240-320 cycles
- At 2.0 GHz: 6.25-8.3 million packets per second per core
- At 1500-byte MTU (12,000 bits per packet): 75-100 Gbps per worker core — above a 25G port's line rate, so even a single worker core would be NIC-limited rather than CPU-limited
- The Octeon CN9670 has dedicated NIX hardware engines that can offload portions of this further
Marvell's own published cnxk PMD benchmarks show single-core forwarding rates of 15-30 Mpps (millions of packets per second) for simple L3 forwarding — and a 25 Gbps port at full 1500-byte MTU carries only ~2 Mpps, so a single core has headroom to spare. Across 4-6 worker cores (leaving control plane and inspection cores untouched), aggregate forwarding capacity easily covers the combined 50 Gbps line rate of the EFG's two 25G ports, and single-stream throughput in the 15-25 Gbps range is realistic.
This means: on the same EFG silicon, with no hardware changes, a properly-architected DPDK dataplane should deliver 10-25× the inter-VLAN throughput the device achieves today, and eliminate the inspection-vs-forwarding CPU contention by giving each worker its own dedicated core with a vendor-supported PMD.
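The packet-budget arithmetic behind that claim, spelled out:

```shell
# How many packets/sec does one 25 Gbps port carry at full 1500-byte MTU?
bits_per_pkt=$(( 1500 * 8 ))                       # 12,000 bits per packet
port_pps=$(( 25000000000 / bits_per_pkt ))         # ~2.08 million packets/sec
port_mpps=$(( port_pps / 1000000 ))
echo "25G at 1500-byte MTU = ~${port_mpps} Mpps (${port_pps} pps)"
```

Set against per-core forwarding rates in the tens of Mpps, the CPU budget is not the constraint — the software architecture is.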
Many enterprise customers (especially in countries where fiber-to-the-business is delivered via GPON or XGS-PON with PPPoE authentication) report that even when they have a 10 Gbps fiber link, single-stream throughput across their EFG WAN tops out around 2–3 Gbps. This is a separate bottleneck from inter-VLAN routing, but it has the same architectural root cause — and arguably worse manifestation, because the PPPoE path forces the kernel through multiple softirq passes per packet.
PPPoE encapsulates IP traffic in PPP frames inside Ethernet (ether_type 0x8864/0x8863). Every WAN packet must:
- Be encapsulated/decapsulated by the `pppoe.ko` kernel module on every transit
- Have its effective MTU reduced to 1492 bytes (eight bytes of PPPoE header), increasing per-packet overhead and forcing Path MTU Discovery
- Be processed by `pppd` in userspace for LCP/IPCP control plane and link state — packet flow events get notified to userspace
- Pass through an additional packet copy for encapsulation/decapsulation in software
- Bypass the kernel's flowtable fast-path — until kernel 6.2, `nf_flow_table` had no PPPoE support at all; flows traversing PPPoE could not be offloaded
- Make multiple distinct kernel-stack passes: ingress on the underlying VLAN (eth2.11) → softirq 1 → pppoe_rcv → ip_input → ip_forward → ip_output → softirq 2 → pppoe_xmit → egress on the same or different VLAN
Combined with the per-packet kernel forward cost we measured (4.74 Gbps ceiling on a single Zen 4 core with offloads off), the additional encap/decap work, and the multi-pass softirq pattern, PPPoE single-stream throughput is fundamentally bound by:
- Single-core ip_forward + pppoe.ko packet handling, which on a 2 GHz Octeon core lands in the 1-3 Gbps range — exactly what users report
- No flowtable PPPoE acceleration (kernel 5.15 doesn't have it; the EFG runs 5.15)
- Multiple softirq cores chained together, each handling part of the encap/decap/forward chain — this spreads CPU load across cores but adds latency and inter-core cache misses without actually speeding anything up
- No DPDK PPPoE termination (would require accel-ppp or VPP's native PPPoE plugin in userspace)
The following data was captured during a single Netflix Fast.com speed test from a client device on the LAN, using the EFG's PPPoE WAN connection (Vivo XGS-PON, Brazilian ISP requiring PPPoE auth; the link is rated 1 Gbps, but the same software path would be used on a 10 Gbps link).
$ top -bn1 -d 1 | head -15
top - 03:43:15 up 8 days, 6:51, load average: 3.81, 2.62, 2.33
%Cpu(s): 5.5 us, 5.0 sy, 0.0 ni, 52.5 id, 0.0 wa, 1.3 hi, 35.6 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23 root 20 0 0 0 0 R 100.0 0.0 17:11.14 ksoftirqd/2
48 root 20 0 0 0 0 R 100.0 0.0 4:13.86 ksoftirqd/7
63 root 20 0 0 0 0 R 100.0 0.0 21:58.12 ksoftirqd/10
83 root 20 0 0 0 0 R 72.2 0.0 10:05.83 ksoftirqd/14
73 root 20 0 0 0 0 R 66.7 0.0 16:39.62 ksoftirqd/12
12 root 20 0 0 0 0 R 55.6 0.0 16:02.71 ksoftirqd/0
2491041 root 5 -15 1768064 1.1g 19584 S 44.4 1.8 6:18.31 Suricata-Main
3139 root 5 -15 383232 68736 28416 S 22.2 0.1 1495:37 ubios-udapi-ser
8596 root 20 0 20.1g 647232 85440 S 16.7 1.0 474:44 unifi-core
This is the smoking gun for PPPoE. Six different ksoftirqd threads are running at 55-100% simultaneously — cores 0, 2, 7, 10, 12, and 14 — all chewing through softirq work for what is fundamentally a single-flow workload (one TCP stream from Fast.com's backend server, through the PPPoE WAN, to the LAN client).
The reason this is even worse than the inter-VLAN smoking gun: inter-VLAN forwarding has one core saturated. PPPoE has multiple cores in continuous softirq because the path itself does multiple distinct kernel-stack passes per packet (eth2.11 ingress → pppoe_rcv → ip_input → ip_forward → ip_output → pppoe_xmit → eth2.11 egress). Each pass can land on a different core via softirq scheduling. The kernel is doing more total work per packet and spreading it across cores in a way that creates cache-coherence overhead between cores. It's the worst of both worlds — single-flow throughput limited by per-core ceiling, but multi-core CPU consumption.
The corresponding mpstat -P ALL output confirms the picture:
03:43:24 CPU %usr %sys %irq %soft %idle
03:43:24 all 5.65 2.74 1.01 32.49 58.00
03:43:24 0 50.55 0.00 0.00 49.45 0.00
03:43:24 2 0.00 0.00 1.00 81.00 18.00
03:43:24 6 0.00 0.00 0.00 61.62 38.38
03:43:24 10 1.01 1.01 2.02 66.67 29.29
03:43:24 14 0.00 0.00 0.00 85.29 14.71
03:43:24 17 0.00 0.00 0.00 100.00 0.00
Six cores at 50-100% softirq during a single Fast.com speed test. The aggregate %soft of 32.49% across 18 cores corresponds to ~5.85 cores fully consumed by softirq work — for one flow.
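The core-equivalent figure is just the aggregate softirq percentage scaled by the core count:

```python
soft_pct = 32.49   # aggregate %soft across all CPUs, from the mpstat sample above
ncores = 18
print(f"{soft_pct / 100 * ncores:.2f} core-equivalents consumed by softirq")
```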
While ksoftirqd is burning multiple cores, the inspection processes are also running:
Suricata-Main 44.4% CPU
ubios-udapi-ser 22.2% CPU
unifi-core 16.7% CPU
ulogd 5.6% CPU
That's ~89% of one core equivalent of additional userspace work, often landing on the same cores doing softirq. The result: the cores doing softirq are being preempted by userspace, and the userspace processes are being preempted by softirq, in a continuous round-robin that prevents either from getting clean cycles.
$ lsmod | grep -i ppp
pppoe 327680 2
pppox 262144 1 pppoe
ppp_generic 327680 6 pppox,pppoe
slhc 262144 1 ppp_generic
$ ps -eo pid,pcpu,comm,args | grep pppd
2878806 0.0 pppd /usr/sbin/pppd call ppp1 nodetach
The full software PPPoE stack is loaded: pppoe.ko for PPPoE-specific encap, pppox.ko for PPP-over-X dispatch, ppp_generic.ko for the PPP framing engine, slhc.ko for VJ header compression, and pppd in userspace for control plane (LCP, IPCP, keepalives). Every WAN packet traverses all of these in sequence.
$ ethtool -k ppp1 | grep -E "tcp-segmentation|generic-(receive|segmentation)|large-receive|hw-tc-offload"
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
hw-tc-offload: off [fixed]
The [fixed] flag means the feature cannot be toggled at runtime — these are hardcoded off in the ppp_generic driver. Even when generic-segmentation-offload was requested on (probably by some configuration default), the kernel refused. Pseudo-interfaces like ppp1 inherently can't do hardware TSO/LRO because there's no hardware behind them — it's a software encap layer. That's normal Linux behavior, but it means every PPPoE WAN packet gets TX-segmented and RX-aggregated in software before being handed to or received from the underlying VLAN.
Note that generic-receive-offload: on does work for the receive path — but TSO does not exist on the egress side, so every outbound packet traverses the kernel stack individually.
$ modinfo nf_flow_table_pppoe 2>&1
modinfo: ERROR: Module nf_flow_table_pppoe not found.
$ modinfo nf_flow_table 2>&1
modinfo: ERROR: Module nf_flow_table not found.
Not just unloaded — the kernel modules don't exist on the system. They aren't compiled into Ubiquiti's 5.15.72-ui-cn9670 kernel build. Even with root access, a customer cannot load the modules to enable flowtable acceleration. The fast-path infrastructure isn't shipped at all.
$ ip link show ppp1
ppp1: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492
$ ip link show eth2.11
eth2.11@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
ppp1 MTU 1492 (1500 - 8 byte PPPoE header), eth2.11 MTU 1500. Every payload is 8 bytes smaller than it could be on raw Ethernet, increasing packet count for the same throughput. Small effect compared to the per-packet kernel cost, but it adds up at line rate.
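The packet-count penalty from the smaller MTU is easy to quantify (pure arithmetic on the MTU values above):

```python
def extra_packets_pct(payload_mtu: int, baseline_mtu: int = 1500) -> float:
    """Percent more packets needed to move the same bytes at a reduced MTU."""
    return (baseline_mtu / payload_mtu - 1) * 100

print(f"PPPoE MTU 1492: {extra_packets_pct(1492):.2f}% more packets than MTU 1500")
```

Roughly half a percent more packets — consistent with the "small effect" characterization above; the per-packet kernel cost, not the lost header bytes, is what dominates.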
A reasonable question: PPPoE looks complicated, with control plane (PADI/PADO/PADR/PADS handshake, LCP/IPCP negotiation, keepalives, RADIUS) and dataplane (packet encap/decap) entangled. Can DPDK actually handle this, or is it fundamentally a kernel concept?
DPDK handles it well, but with a different architecture than the kernel uses.
The kernel's approach: pppoe.ko is a single module that does both control plane (handshake, LCP/IPCP, keepalives) and dataplane (encap/decap of every packet). Both run in softirq context, on whatever cores the kernel scheduler picks. The result is what we just measured: control plane and dataplane fighting for the same cores, with userspace processes (pppd) added on top.
DPDK splits this in two:
- Control plane stays in userspace as a regular process. Tools like accel-ppp (the most common open-source PPPoE BNG implementation, deployed by ISPs to terminate hundreds of thousands of sessions per box) handle PADI/PADO/PADR/PADS, LCP/IPCP, keepalives, session lifecycle, RADIUS authentication — everything that happens at session establishment or once per second per session. This doesn't need to be fast; it needs to be correct. accel-ppp added DPDK support around 2020 and is what ISP-grade BNGs use today.
- Dataplane runs as a fixed-cost pipeline stage. Once the session is up, every packet just needs an 8-byte header push (egress) or pop (ingress). In VPP (which has had a native PPPoE plugin since 2018), it's literally a node in the packet processing graph:
dpdk-input → ethernet-input → pppoe-input → ip4-input → ip4-lookup
→ ip4-rewrite → pppoe-encap → interface-output
The pppoe-input and pppoe-encap nodes are tiny — they push or pop 8 bytes, update some counters, and pass the packet to the next node in the same vector batch. Per-packet overhead for adding PPPoE to a VPP pipeline is roughly 30-50% above plain L3 forwarding, not 5-10× like the kernel softirq path imposes.
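To make the 8-byte push/pop concrete, here is a minimal sketch of PPPoE session encapsulation (illustrative Python — a real dataplane does this in C on raw buffers; the 8 bytes are the 6-byte PPPoE header plus the 2-byte PPP protocol ID):

```python
import struct

PPP_IPV4 = 0x0021  # PPP protocol ID for IPv4

def pppoe_encap(ip_packet: bytes, session_id: int) -> bytes:
    """Push the 8 bytes a PPPoE dataplane adds per packet:
    ver/type=0x11, code=0x00 (session data), session id,
    length (PPP proto + payload), then the PPP protocol ID."""
    hdr = struct.pack("!BBHHH", 0x11, 0x00, session_id,
                      len(ip_packet) + 2, PPP_IPV4)
    return hdr + ip_packet

def pppoe_decap(frame: bytes) -> bytes:
    """Pop the same 8 bytes on ingress."""
    vertype, code, sid, length, proto = struct.unpack("!BBHHH", frame[:8])
    assert vertype == 0x11 and code == 0x00 and proto == PPP_IPV4
    return frame[8 : 6 + length]  # length counts PPP proto + payload
```

Everything else PPPoE involves — PADI/PADO/PADR/PADS, LCP, IPCP, keepalives — happens once per session in the control plane; this push/pop is the entire per-packet dataplane cost.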
The critical difference: the kernel does control plane + dataplane on the same softirq path, blocking everything. DPDK does control plane in a slow, one-time-per-session userspace daemon, and dataplane as a small fixed-cost pipeline stage running on dedicated worker cores at line rate.
On Marvell silicon specifically: the Octeon CN9670 (the EFG SoC) is explicitly marketed by Marvell as a "Smart NIC and BNG" SoC. Their reference architectures combine:
- The cnxk DPDK PMD handling raw Ethernet frames at line rate from the NIX hardware engines
- accel-ppp running in userspace on dedicated control-plane cores, handling the PPPoE control plane
- Dataplane integrated into VPP's PPPoE plugin or a custom DPDK pipeline
- Suricata in DPDK mode tapping the dataplane for inspection on dedicated worker cores
ISPs deploying this stack on Octeon hardware regularly hit 40+ Gbps PPPoE termination per box with 100K+ concurrent sessions. Companies like Calix, Adtran, and a handful of NFV vendors ship enterprise BNGs based on exactly this silicon, doing exactly this PPPoE workload, at 25+ Gbps per port. This isn't research — it's commodity, vendor-blessed, production-deployed infrastructure that has existed for years.
Two independent fix paths exist:
Kernel path: Linux 6.2 (released February 2023) added PPPoE support to nf_flow_table via the nf_flow_table_pppoe module. Established TCP/UDP flows over PPPoE WAN can now be offloaded to the same software fast-path as native L3 traffic, bypassing both pppoe.ko and the netfilter slow path for in-progress flows. Combined with hardware tc-flower offload on supported NICs, modern Linux distros (OpenWrt 23.05+, recent VyOS, Mikrotik RouterOS 7) achieve near-line-rate PPPoE throughput on 10 Gbps links through software fast-path acceleration.
The EFG ships kernel 5.15 — released in late 2021, predating PPPoE flowtable acceleration by over a year. A kernel rebase to 6.6 LTS or later, with nf_flow_table_pppoe loaded and a flowtable directive added to nftables, would dramatically improve PPPoE WAN throughput without any hardware changes and without changing the dataplane architecture. The fix is a kernel module load and one nftables stanza.
DPDK path: Migrate the PPPoE termination from pppoe.ko in the kernel to accel-ppp + VPP's PPPoE plugin in userspace, on dedicated worker cores. This is the same architectural change as Fix 3 in Section 10 (DPDK + VPP for the dataplane), with PPPoE just being one more pipeline stage. Since Marvell ships full DPDK support for the Octeon CN9670 and publishes reference architectures combining DPDK + accel-ppp + VPP, this is integration work, not invention.
Using the same scaling from Section 7.1:
| Configuration | Single-stream PPPoE throughput | Notes |
|---|---|---|
| Current EFG (kernel 5.15, no flowtable, software pppoe.ko) | ~2-3 Gbps | per user reports; matches our multi-core ksoftirqd evidence |
| EFG + kernel 6.6 + nf_flow_table_pppoe enabled | ~5-8 Gbps | flowtable bypasses pppoe.ko + netfilter for established flows |
| EFG + kernel 6.6 + flowtable + hw-tc-offload | ~8-9.5 Gbps | near line-rate on 10G PPPoE links |
| EFG + DPDK (accel-ppp + VPP PPPoE plugin) | line rate on 10 Gbps (and 25G aggregate) | what ISP-grade BNGs achieve on this exact silicon |
The point: PPPoE performance is not a hardware problem either. It is the same architectural failure (single-core kernel forwarding without acceleration) compounded by an additional encapsulation layer that mainline Linux now supports accelerating, and that DPDK has handled at line rate for years. The same fixes apply, with PPPoE benefiting more than inter-VLAN does because the multi-pass softirq pattern is so much more expensive in the current implementation.
Putting together the EFG diagnostics and the lab measurements, the findings are unambiguous.
Finding 1: The kernel network stack on a single core has a ceiling around 5 Gbps single-stream when offloads are off, regardless of NIC
Evidence: virtio-net (4.95 Gbps) and ConnectX VF (4.74 Gbps) measure within experimental error on the same kernel with offloads disabled. The Zen 4 core is identical in both tests. The difference between 4.95 and 4.74 is in the noise.
Implication for the EFG: their 2 GHz Octeon ARM core has its own per-cycle ceiling that's 3-5× slower than Zen 4 for this workload, putting the EFG kernel forwarding ceiling at ~1.0–1.5 Gbps. Reported user numbers match this range. The hardware silicon is not what's limiting them; the per-core kernel stack is.
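Scaling the measured Zen 4 ceiling by the assumed per-core gap reproduces the reported range (a sketch; the 3-5× factor is the estimate above, not a measurement):

```python
zen4_ceiling_gbps = 4.74   # measured: single Zen 4 core, offloads off
for slowdown in (3, 5):
    print(f"{slowdown}x slower core -> ~{zen4_ceiling_gbps / slowdown:.2f} Gbps kernel ceiling")
```

The resulting bracket (~0.95-1.58 Gbps) lands on the 1-1.5 Gbps range users report for inter-VLAN routing.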
Finding 2: NIC offloads are the single largest lever — a 3.5-5.3× swing
Evidence:
- virtio kernel forwarding: 4.95 Gbps (off) → 17.4 Gbps with flowtable (on) — 3.5× swing
- ConnectX VF kernel forwarding: 4.74 Gbps (off) → 25.3 Gbps (on) — 5.3× swing
EFG state: hw-tc-offload: off [fixed], generic-receive-offload: off. Hard-coded off in the firmware build.
Finding 3: The 5-chain iptables FORWARD pattern costs roughly half your throughput when offloads are also off
Evidence:
- virtio-net + offloads off: 4.95 Gbps → 2.36 Gbps when EFG-style rules are applied (52% drop)
- ConnectX VF + offloads on: 25.3 Gbps → 21.1 Gbps when applied (17% drop, hidden by GRO)
EFG state: identical rule structure (ALIEN → TOR → IPS → UBIOS_FORWARD_JUMP → user → default). Confirmed by direct iptables diagnostic showing 874 million packets having traversed UBIOS_FORWARD_JUMP in 8 days.
Finding 4: The netfilter flowtable recovers most of the lost throughput, even with the EFG ruleset in place
Evidence:
- virtio + EFG rules: 2.36 Gbps → 7.05 Gbps with flowtable added (3.0×)
- virtio + flowtable + offloads on: 17.4 Gbps (7.4× over 2.36 baseline)
EFG state: nf_flow_table module not loaded. nft list flowtables is empty. The kernel module isn't even installed on the device. This is a one-line configuration change in nftables that Ubiquiti could ship and immediately triple single-stream inter-VLAN performance.
Finding 5: Conntrack helper cost is per-new-connection, not per-packet
The popular description: "Every packet is inspected by every loaded helper." This is mostly wrong. The actual cost depends on which phase of a flow the packet belongs to.
Phase 1 — New connection (SYN packet, first packet of a flow):
When conntrack creates a new entry for a flow, it walks nf_ct_helper_hash — a hash table keyed by L4 protocol + port — to determine if any registered helper applies. For TCP/21 (FTP control), it finds the FTP helper and attaches it to the conntrack entry. For TCP/443 (HTTPS), it finds nothing and attaches no helper. The per-new-connection cost is one hash lookup against the helper registry. Small but real.
This phase also touches nf_ct_expect_hash — the expectations table — to check if this new flow matches a previously-expected data connection (e.g., the data port that an active FTP control session announced via PORT or PASV). Empty expectations table = essentially zero cost; an active expectations table = small additional lookup.
Phase 2 — Established flow (every subsequent packet):
Once a flow has a conntrack entry, the per-packet helper logic in nf_conntrack_in() reads:
help = nfct_help(ct); // pointer load from conntrack entry
if (help && help->helper) // both NULL for non-helper flows
help->helper->help(skb, ct, ...);

For a flow with no helper attached — the vast majority of traffic, since helper-relevant ports are rare — this is two pointer loads and a branch. Modern CPUs predict the not-taken branch perfectly. The cost on non-helper flows is essentially zero.
For flows that DO have a helper attached (e.g., active FTP control connection, ongoing SIP call), the helper's ->help() callback runs on every packet to inspect for protocol events (PORT command, RTP setup, etc.). This is genuine per-packet cost, but it only applies to flows on helper-recognized ports.
Why iperf3 throughput doesn't change when helpers are disabled: An iperf3 inter-VLAN test uses a single TCP connection on iperf3's port (5201 by default). That port is not a helper-recognized port. The connection has no helper attached. Phase 2's two-pointer-load-and-branch is essentially free. Disabling helpers via the UI removes the modules from memory, eliminating Phase 1 lookup cost on new connections — but it does not change anything in Phase 2 for non-helper flows.
Why helpers nonetheless matter at scale: An enterprise router doing ~10,000 new connections per second — driven by lots of short HTTP requests, DNS resolutions, and other transient flows — pays the Phase 1 helper-hash-lookup tax 10,000 times per second. Removing helpers eliminates that. It's not a per-packet win on data flows, it's a per-new-connection win.
The proper fix is not removing helpers: a correctly-architected router uses the netfilter flowtable for the data path. With flowtable, established flows bypass the entire netfilter chain (helpers included) and go through the offloaded fast path. Helpers continue to run on connection setup and on the control connection of helper protocols (e.g., FTP control), but the data connection of those protocols can be offloaded. You get full helper functionality and zero per-packet cost on data flows, simultaneously. This is what mainstream Linux distributions ship in 2026.
The EFG's kernel does not have flowtable compiled in (Section 11).
Four implementation approaches that would do this correctly:
- nftables with explicit per-flow helper attachment (the modern, correct approach). Helpers attach only to flows matching explicit nftables rules — no global helper auto-attach, zero cost for any flow not matching the rule. Requires migrating from iptables to nftables.
- Userspace conntrack helpers via netlink (kernel 3.6+). The kernel forwards control packets to a userspace daemon, which parses protocols and inserts expectations back via netlink. Pros: kernel stays small, helper bugs don't crash the kernel, helpers can be updated independently of the kernel. Cons: control-plane latency increase.
- Don't NAT helper-protocol traffic at all. Modern protocols handle NAT traversal in the application layer (FTP passive mode, SIP+STUN/ICE, WebRTC). The kernel doesn't need to do ALG. Most enterprise gateways in 2026 have moved this direction; kernel helpers are legacy.
- Keep helpers, add flowtable (the practical fix for an existing iptables-based system). Helpers run on connection setup and helper-protocol control channels; flowtable handles the data path of every other flow. Best compatibility with existing rule sets.
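The first approach — explicit per-flow helper attachment — can be sketched in nftables syntax (table, chain, and helper names here are illustrative, not taken from the EFG):

```
table inet filter {
    # Helper object: exists only where explicitly referenced
    ct helper ftp-std {
        type "ftp" protocol tcp
    }

    chain pre {
        type filter hook prerouting priority filter; policy accept;
        # Attach the FTP helper only to FTP control connections;
        # no other flow ever touches helper logic
        tcp dport 21 ct helper set "ftp-std"
    }
}
```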
EFG state: A "Firewall Connection Tracking" toggle in the UniFi controller's Gateway settings exposes individual checkboxes for FTP, H.323, SIP, GRE, PPTP, and TFTP helpers. Disabling them all unloads the helper modules entirely — which addresses Phase 1 lookup overhead on new connections but does nothing for the bigger architectural issues. The toggle's existence confirms that Ubiquiti's engineering team is aware that helpers cost something. They have implemented a partial fix (the toggle) instead of the proper fix (flowtable). The proper fix would require shipping nf_flow_table.ko, which they have chosen not to do (Section 11).
Finding 6: A single flow maps to a single core — adding cores cannot help
Evidence: 8 parallel streams with the EFG ruleset reach 11.4 Gbps aggregate (~1.4 Gbps per stream). Single stream caps at 2.36 Gbps. The EFG's mpstat shows all 18 cores idle except the one with the active flow.
EFG state: 18 cores, but RSS hashes a single TCP 5-tuple to one queue, which binds to one core. Adding cores to a kernel-based router cannot fix single-flow performance. Faster per-core, fewer per-packet steps, hardware offload, or a userspace dataplane (which can poll across worker cores) can.
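The single-queue binding can be illustrated with a toy model (this is not the Toeplitz hash real NICs implement — just a sketch of why per-flow hashing pins a flow to one core):

```python
import hashlib

def rss_queue(src: str, dst: str, sport: int, dport: int,
              proto: str = "tcp", nqueues: int = 18) -> int:
    """Toy RSS: hash the 5-tuple and map it to an RX queue.
    Every packet of a flow carries the same 5-tuple, so every
    packet of that flow lands on the same queue — and same core."""
    key = f"{src},{dst},{sport},{dport},{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % nqueues

# A single iperf3 flow: one queue, one core, no matter how many cores exist
q = rss_queue("10.0.10.5", "10.0.20.7", 51000, 5201)
assert all(rss_queue("10.0.10.5", "10.0.20.7", 51000, 5201) == q
           for _ in range(1000))
```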
Finding 7: DPDK on the same silicon delivers 10-25× the throughput, and the vendor ships full DPDK support
Evidence:
- Lab VPP/DPDK on ConnectX with offloads: 35.6 Gbps single-stream (15× over the EFG-style baseline)
- Marvell's published cnxk PMD benchmarks: 18-36 Gbps single-core on CN9670-class silicon
- Suricata 7.0+: native DPDK input mode shipped 2023
- VPP: native cnxk plugin shipped 2020
- The full reference architecture (DPDK + VPP + Suricata-on-DPDK) is published by Marvell and field-deployed by NFV vendors
EFG state: zero DPDK. The cnxk PMD is not loaded. Suricata runs in pcap mode (per-packet kernel→userspace copy) instead of DPDK mode. Ubiquiti would lose nothing by adopting DPDK — their primary inspection workload (Suricata) supports it, their silicon vendor supports it, and the resulting performance on the same hardware would be 10-25× higher.
Finding 8: Inspection processes contend with the forwarding core for cycles
Evidence (EFG): dpi-flow-stats at 39.6% CPU + Suricata-Main at 6.9% + conntrackd at 7.0% = ~54% of one core continuously consumed by per-packet inspection that performs no forwarding.
Evidence (lab): a deliberate spinner pinned to a non-forwarding core had no effect (correctly isolated). When CPU contention is on the forwarding core, throughput drops proportionally.
Implication: even if Ubiquiti fixed every other issue, the inspection processes still pin a chunk of one core's cycles, leaving less for forwarding. The fix is to move them off the data-path core (kernel taskset/cgroup), or use kernel-side offloaded sampling (sFlow hardware counters), or — the best fix — use Suricata in DPDK mode on dedicated worker cores.
Finding 9: Per-VLAN bridges instead of vlan-aware single bridge prevent kernel fast-path optimization
Evidence (EFG): br0, br3, br5, br6, br7, br254, br1111 — one bridge per VLAN. Inter-VLAN traffic must traverse multiple bridge hops plus a kernel L3 lookup.
Lab equivalent: vmbr1 with VLAN-aware mode and bridge VID filtering allows a single bridge to handle all VLANs. With flowtable on top, established flows skip the bridge slow path entirely.
Implication: even without flowtable, switching to a vlan-aware bridge architecture would simplify the data path and enable bridge VID hardware offload paths that the current per-bridge structure cannot use.
Finding 10: PPPoE WAN performance is bottlenecked by the same kernel stack, with additional encapsulation cost — and worse multi-core spread
Evidence (deployment reports): enterprise customers on 10 Gbps PPPoE fiber consistently report 2-3 Gbps single-stream WAN throughput on the EFG.
Evidence (live capture during a Netflix Fast.com test on a production EFG): six different ksoftirqd kernel threads simultaneously consuming 55-100% CPU (cores 0, 2, 7, 10, 12, 14), with concurrent userspace inspection load (Suricata 44%, ubios-udapi-ser 22%, unifi-core 16%) competing for the same cores. The PPPoE encap/decap path forces multiple kernel-stack passes per packet, each potentially landing on a different core, multiplying total CPU consumption while not improving single-flow throughput.
Evidence (mainline Linux): kernel 6.2+ ships nf_flow_table_pppoe for PPPoE flowtable acceleration. The EFG runs kernel 5.15. The nf_flow_table and nf_flow_table_pppoe modules are not even compiled into Ubiquiti's kernel build — modinfo returns "Module not found" for both.
Implication: PPPoE WAN performance is not a hardware limitation. It is the same per-core kernel ceiling as inter-VLAN routing, with an additional encapsulation layer that mainline Linux now supports accelerating, and a multi-pass softirq pattern that is more expensive than plain inter-VLAN forwarding. The fix is a kernel rebase plus the same flowtable directive — or DPDK + accel-ppp + VPP, which Marvell publishes as a reference architecture for this exact silicon.
Finding 11: The EFG's kernel is binary-incompatible with vanilla 5.15.72 despite identifying as such, and the safety net that would catch this is disabled
Evidence: We cross-compiled nf_tables, nf_flow_table, and nf_flow_table_inet from vanilla linux-5.15.72.tar.xz (kernel.org), using the EFG's exposed /proc/config.gz as the build configuration. The resulting modules report a vermagic string identical character-for-character to the EFG's existing in-tree modules: 5.15.72-ui-cn9670 SMP mod_unload aarch64. Loading nf_tables.ko on the device caused an immediate kernel panic (NULL pointer dereference at virtual address 0x120 during module init), forcing a watchdog reboot.
Evidence (config audit):
$ zcat /proc/config.gz | grep -E "MODVERSIONS|TRIM_UNUSED_KSYMS|MODULE_SIG"
CONFIG_HAVE_ASM_MODVERSIONS=y
# CONFIG_MODVERSIONS is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set
[no CONFIG_MODULE_SIG entries]
CONFIG_MODVERSIONS would have caught the binary incompatibility at load time with a clean error message. It is disabled. CONFIG_MODULE_SIG (cryptographic module signing) is not even built into the kernel. lockdown is not enabled. The root filesystem is writable via overlay.
Implication: Two findings, both serious.
First, the EFG's kernel is not actually vanilla 5.15.72 even though it identifies as 5.15.72-ui-cn9670 and reports the upstream version. Ubiquiti has applied undisclosed patches that change netfilter's internal data structures or function signatures. Customers who attempt to enable missing kernel features by building from the announced upstream tag will produce modules that load (because vermagic matches) but crash (because the real ABI doesn't). This is exactly why the GPL exists — it requires vendors to publish the complete corresponding source so customers can rebuild against the actual kernel they received, not the vanilla one it claims to be.
Second, the security configuration is unusually permissive for an enterprise security product: no module signing, no kernel lockdown, no symbol-CRC verification, writable root via overlay. Any process that becomes root can load arbitrary unsigned, unverified kernel modules with no cryptographic check. Combined with the binary-incompatible-but-not-detected ABI, this is a pathway for both accidental crashes and deliberate exploitation.
A GPL source request was filed with [email protected] at the time of this writing. Until it is fulfilled, even a customer with full root access on hardware they own cannot enable the missing performance features safely. Section 11 documents this experiment in detail.
The findings above translate directly to a list of prioritized configuration changes Ubiquiti could ship. None of these require new hardware. All are available in mainline Linux or as vendor-supported infrastructure from Marvell. Several are config changes that do not even require a kernel update.
Fix 1: Enable the netfilter flowtable (software fast path)
What: Load the nf_flow_table kernel module and add a flowtable directive to the active nftables ruleset. The hook is software-only (no hardware offload required) and works on any modern kernel (5.4+).
Configuration sketch:
table inet filter {
flowtable f {
hook ingress priority 0
devices = { eth_lan_vlan10, eth_lan_vlan20, ... }
}
chain forward {
type filter hook forward priority 0; policy accept;
ip protocol { tcp, udp } flow add @f
ct state established,related accept
... existing security rules ...
}
}
Measured improvement: 2.36 → 7.05 Gbps single-stream (3.0×) on virtio. Combined with offloads enabled: 17.4 Gbps (7.4×).
Trade-off: Flows in the fast-path bypass conntrack and rule evaluation. Security rules must be applied to the first few packets of a flow, before it's offloaded. Existing iptables/nftables rules continue to work; only established flows are accelerated. The IPS / DPI processes that need every packet would need to be moved to a different inspection point (e.g., promiscuous tap on the bridge, or sFlow sampling) — but most of them only need flow-level visibility, which conntrack already provides.
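On a kernel that ships the module, the effect of this fix is verifiable with standard tooling (the commands below are stock nftables/conntrack/sysstat; exact output formats vary by version):

```
# Flowtables configured in the ruleset (empty on the stock EFG)
nft list flowtables

# Established flows handed to the fast path carry the OFFLOAD flag
conntrack -L 2>/dev/null | grep -c OFFLOAD

# softirq share on the forwarding core should drop once flows offload
mpstat -P ALL 1 3
```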
Fix 2: Enable hardware and software offloads
What: Stop hard-coding hw-tc-offload off [fixed]. Enable GRO and TSO on the kernel side. On the Octeon CN9670 (and CN10K on the UDM Beast), enable the NIX hardware acceleration path — these are first-party Marvell engines designed to forward packets without ARM core involvement.
Measured improvement: 4.74 → 25.3 Gbps single-stream (5.3×) on ConnectX VF with kernel forwarder when offloads enabled. The same pattern applies to any NIC with hardware-accelerated forwarding, including the Octeon NIX.
Trade-off: Hardware offload paths typically require the kernel and the device firmware to agree on which features can be offloaded. Some advanced features (like complex iptables matchers) can't be offloaded; the kernel falls back to software for those packets. This is a graceful degradation, not a failure — the fast path handles the common case, slow path handles edge cases. Modern flowtable in switchdev mode (which ConnectX-6 Dx and Octeon CN9670 both support) hands established TCP/UDP flows directly to silicon.
Fix 3: Move the dataplane to DPDK + VPP, with Suricata in DPDK mode
What: Migrate the forwarding plane from kernel ip_forward to VPP with the Marvell-supported cnxk DPDK PMD. Move Suricata to its native DPDK mode (available since Suricata 7.0). Pin VPP worker threads and Suricata workers to dedicated CPU cores, leaving the control plane (UniFi management, control plane protocols, dpi-flow-stats summaries) on a separate core.
Why this is the biggest win: Marvell publishes complete DPDK + VPP reference architectures for the OCTEON family. The cnxk PMD is open-source, well-maintained, and ships with mainline DPDK. Suricata's DPDK mode is production-deployed by major NFV vendors. Every component Ubiquiti needs is already vendor-supported, mainline open-source software. They lose nothing by adopting it.
Estimated improvement on EFG silicon:
- Single-stream inter-VLAN: from ~1 Gbps to 15-25 Gbps (15-25×)
- PPPoE WAN single-stream: from ~3 Gbps to 8-10 Gbps (line rate on 10G PPPoE)
- Aggregate: from a few Gbps to line rate on both 25G ports (50 Gbps)
- Inspection (Suricata): from kernel-pcap mode to DPDK direct, eliminating per-packet kernel→userspace copy
Trade-off: Largest engineering investment of any fix. Ubiquiti would need to rewrite their forwarding plane on top of VPP's API and integrate VPP's CLI/API with their UniFi controller. However, all the heavy lifting (the PMD, the dataplane, the Suricata DPDK integration) already exists. They are integrating, not inventing.
Fix 4: Replace per-VLAN bridges with a single VLAN-aware bridge
What: Replace br0, br3, br5, ... with a single bridge in bridge_vlan_filtering=1 mode, with VID assignments per port. Combined with nf_flow_table on the same bridge, this enables flowtable to short-circuit established flows entirely within the bridge layer.
Measured improvement: Indirect — enables Fix 1 and Fix 2 to be more effective, particularly for inter-VLAN flows that today must traverse multiple bridges. Direct measurements not made in this study, but Linux upstream has documented order-of-magnitude improvements in similar setups.
Trade-off: Configuration migration. Existing ruleset references to specific bridge devices need updating to reference the unified bridge. Manageable as a firmware update.
Fix 5: Pin inspection processes off the data-path cores
What: Use cgroup or taskset to ensure dpi-flow-stats, Suricata-Main, conntrackd, and similar processes do not run on the same CPU cores that handle network softirqs. On an 18-core Octeon, the data path occupies one core (sometimes two); the other 16+ are mostly idle. Pin inspection there.
Measured improvement: Indirect. Frees the forwarding core from contention, which becomes meaningful when the forwarder is pushing close to its single-core ceiling.
Trade-off: None of consequence. This is a basic Linux performance hygiene setting that any production router enables. The cost is a few sysfs/systemd-cgroup changes in the boot configuration. Becomes moot after Fix 3 (with DPDK, each worker has its own dedicated core by design).
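A minimal sketch of this pinning with standard Linux tooling (the core ranges, IRQ number, and process names are illustrative — the right values depend on which cores take NIC interrupts on the EFG):

```
# Move a running inspection process onto cores 4-17,
# leaving 0-3 for NIC interrupts and softirq work
taskset -pc 4-17 "$(pidof Suricata-Main)"

# Steer NIC RX interrupts onto cores 0-3 (CPU mask 0x0f);
# IRQ_NUMBER is device-specific (see /proc/interrupts)
echo 0f > /proc/irq/IRQ_NUMBER/smp_affinity

# Persistent variant with a cgroup-v2 cpuset:
mkdir -p /sys/fs/cgroup/inspection
echo 4-17 > /sys/fs/cgroup/inspection/cpuset.cpus
echo "$(pidof Suricata-Main)" > /sys/fs/cgroup/inspection/cgroup.procs
```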
Fix 6: Migrate the ruleset to native nftables
What: The current ruleset is on the legacy iptables (xt_*) backend with 839 rules. Native nftables is faster per-rule, supports flowtable natively (Fix 1 builds on this), supports atomic ruleset replacement (no flushing), and is the future of Linux netfilter.
Measured improvement: Single-digit percentage points on its own; enables Fix 1 to reach its full potential.
Trade-off: Migration cost. Tools like iptables-translate automate most of it. The tools that produce the existing ruleset (presumably internal Ubiquiti config generators) need to emit nft syntax instead.
What: The UniFi controller already exposes a "Firewall Connection Tracking" control in Gateway settings, with checkboxes for FTP, H.323, SIP, GRE, PPTP, and TFTP helpers. Enterprise deployments without those legacy protocols can disable them all to unload the helper modules entirely.
What this actually does: Removes Phase 1 helper-hash-lookup overhead on new connections (see Section 9 Finding 5). On a router doing tens of thousands of new connections per second, this is a meaningful reduction in connection-setup CPU cost.
What this does NOT do: It does not change throughput on already-established TCP flows like an iperf3 test. The Phase 2 per-packet cost on non-helper flows is essentially zero whether helpers are loaded or not. iperf3 inter-VLAN single-stream throughput is unchanged.
Why this is a partial fix: The architecturally correct answer is to use the kernel's flowtable for the data path so that established flows bypass the entire netfilter chain (helpers and all) at line rate, while helpers continue to handle the control connections of legitimate helper-protocol traffic. That requires shipping nf_flow_table.ko, which the EFG does not have (Section 11). The toggle's existence is evidence that Ubiquiti's engineering team understands the helpers-cost-something question; they have shipped a partial mitigation rather than the proper fix.
Recommended action for administrators: If your deployment doesn't use FTP active-mode NAT, H.323 video conferencing, SIP through ALG (most modern SIP deployments use STUN/ICE instead), PPTP VPN, or TFTP, disable all of them. It's a free win on connection-setup costs.
What: Linux 5.15 LTS dates from late 2021. Kernel 6.6 LTS brings substantial nftables, flowtable, and bridge improvements, plus PPPoE flowtable acceleration via nf_flow_table_pppoe (kernel 6.2+). Kernel 6.12 LTS adds hardware-offloaded flowtable for several NICs and improved per-CPU optimizations.
Measured improvement: Compounding with Fix 1, Fix 2, and the PPPoE acceleration. Recent kernels have made nf_flow_table faster per-packet, made hardware-offload setup easier, and added PPPoE-specific acceleration that the EFG completely lacks today.
Trade-off: Vendor kernel update. The Octeon vendor BSP (Marvell's "ubuntu-cn9670") will need to be rebased on a newer kernel. Not trivial but routine for a hardware vendor; Marvell themselves publish 6.x-based BSP releases.
| Priority | Fix | Effort | Single-stream improvement |
|---|---|---|---|
| 1 | Enable flowtable | Low (config) | 3.0× |
| 2 | Enable hardware offloads | Low–Medium (config + firmware) | up to 5.3× |
| 3 | Adopt DPDK + VPP + Suricata-DPDK | High (engineering) | 15-25× — and fixes PPPoE too |
| 4 | Newer kernel (5.15 → 6.6+) | Medium | enables PPPoE flowtable, +small kernel gains |
| 5 | Pin inspection processes off data-path core | Low (config) | small but additive |
| 6 | Per-VLAN bridges → vlan-aware single bridge | Medium (config migration) | enables 1+2 |
| 7 | iptables → nftables | Medium | enables 1, small direct |
| 8 | Conntrack helper toggles (already shipped — disable in UI) | Free (UI checkbox) | none on iperf3, small on connection setup |
Doing Fix 1 alone gets you 3× the single-stream throughput. Fix 1+2 gets you 7×. Fix 3 — the long-term architectural fix that the silicon vendor literally publishes a reference architecture for — gets you 15-25×. The hardware does not need to change.
The analysis to this point rests on lab measurements made on x86 hardware that reproduces the EFG's software stack. The lab data is reproducible and self-consistent, but a fair reader can ask: would the recommended fixes actually work on the real device?
To find out, we attempted the most surgical of the recommended fixes — adding the missing nftables flowtable kernel modules — to a production EFG. The exercise was instructive in ways we did not anticipate, and the results materially strengthen Section 9's findings about the EFG's kernel.
What follows is a complete, honest record of that work: two module builds, two load attempts. Both ultimately crashed the device. Neither outcome was the desired success path, but the failure modes themselves are diagnostic: they reveal precisely how far Ubiquiti's kernel diverges from any reproducible public source.
Loading a third-party kernel module into a running kernel requires a few prerequisites:
- A matching kernel version (`vermagic`). The Linux module loader rejects any module whose `vermagic` string doesn't match the running kernel's exactly.
- Module loading not blocked by signing. If `CONFIG_MODULE_SIG_FORCE=y` or `module.sig_enforce=1`, only modules signed by an in-kernel trusted key can load.
- No kernel lockdown. If a Secure Boot lockdown is engaged, module loading from disk is restricted regardless of signing config.
- A writable filesystem location, since module files must be readable from disk by `init_module(2)` or `finit_module(2)`.
We confirmed each on a production EFG via SSH:
$ cat /proc/cmdline
console=ttyAMA0,115200n8 earlycon=pl011,0x87e028000000 maxcpus=18 isolcpus=12
rootwait rw coherent_pool=16M pcie_aspm=off net.ifnames=0 sysid=ea3d
root=PARTUUID=...
No module.sig_enforce=1. No lockdown= argument. No lsm=lockdown,....
$ cat /sys/module/module/parameters/sig_enforce
N
Module signing not enforced.
$ zcat /proc/config.gz | grep -E "^CONFIG_(MODULE_SIG|SECURITY_LOCKDOWN|MODVERSIONS|TRIM_UNUSED_KSYMS)"
# CONFIG_MODULE_SIG is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
This was both encouraging and concerning. Encouraging because it meant we had a clean path to load a custom-built module if we could match vermagic. Concerning because these missing options are exactly the safeguards a production firmware should have:
- `MODULE_SIG`: prevents loading unsigned modules. Any process with `CAP_SYS_MODULE` (root, or in containers if not seccomp'd) can load arbitrary kernel code.
- `MODVERSIONS`: adds CRC checksums to every exported symbol. A module built against a kernel with subtly different struct layouts will be refused at load time rather than crashing the kernel later.
- `TRIM_UNUSED_KSYMS`: limits the surface area of exposed kernel symbols.
- `SECURITY_LOCKDOWN_LSM`: restricts what root can do to a running kernel.
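For contrast, a hardened production firmware's kernel config would typically carry the inverse fragment (all standard upstream Kconfig option names; the hash choice is an assumption):

```
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_FORCE=y
CONFIG_MODULE_SIG_SHA512=y
CONFIG_MODVERSIONS=y
CONFIG_TRIM_UNUSED_KSYMS=y
CONFIG_SECURITY_LOCKDOWN_LSM=y
```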
The implications of these absences are explored further in Section 9, Finding 10. For the experiment, they meant that load-time symbol mismatches would not be caught — the kernel would happily start executing code with bad assumptions about struct layouts.
The EFG's filesystem is overlayfs root with a writable upper layer at /mnt/.rwfs/data. Modules placed in /tmp survive long enough to load.
The flowtable modules (nf_flow_table.ko, nf_flow_table_inet.ko, plus nf_tables.ko as a dependency) are absent from the EFG's /lib/modules/:
$ find /lib/modules/$(uname -r) -name 'nf_flow_table*' -o -name 'nf_tables.ko'
[no output]
$ modinfo nf_flow_table
modinfo: ERROR: Module nf_flow_table not found
The modules are not merely disabled; they are not present in the build. We needed to compile them ourselves.
A separate build VM was provisioned on the lab host:
- Ubuntu 24.04 LTS, 16 vCPU, 32 GB RAM
- `gcc-10-aarch64-linux-gnu` 10.5.0 from the noble-universe repository (matches the EFG's compiler family)
- Linux 5.15.72 source tree from kernel.org
The EFG's running kernel reports itself as:
$ uname -r
5.15.72-ui-cn9670
$ uname -a
Linux EFG-Home-SP 5.15.72-ui-cn9670 #5.15.72 SMP Wed Apr 15 23:39:47 CST 2026
aarch64 GNU/Linux
$ strings /lib/modules/5.15.72-ui-cn9670/kernel/net/netfilter/nf_conntrack_ftp.ko \
| grep -E '^(vermagic|name)='
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
name=nf_conntrack_ftp
The build process:
$ export ARCH=arm64
$ export CROSS_COMPILE=aarch64-linux-gnu-
$ export CC=aarch64-linux-gnu-gcc-10
$ cd ~/efg-build/vanilla-5.15.72/linux-5.15.72
$ cp ~/efg-build/efg-running.config .config
# Set CONFIG_LOCALVERSION inside the .config (not the env)
$ ./scripts/config --set-str CONFIG_LOCALVERSION "-ui-cn9670"
# Enable the modules we want to build
$ ./scripts/config --module CONFIG_NF_TABLES
$ ./scripts/config --module CONFIG_NF_FLOW_TABLE
$ ./scripts/config --module CONFIG_NF_FLOW_TABLE_INET
# Disable BTF generation (would require pahole on EFG kernel — not available)
$ ./scripts/config --disable CONFIG_DEBUG_INFO_BTF
# Reconcile
$ make olddefconfig
$ time make -j$(nproc) modules
real 1m52s
$ for ko in net/netfilter/nf_tables.ko \
net/netfilter/nf_flow_table.ko \
net/netfilter/nf_flow_table_inet.ko; do
strings $ko | grep -E '^(vermagic|name)='
done
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
name=nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
name=nf_flow_table
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
name=nf_flow_table_inet
Three modules. All vermagic strings byte-perfect matches for the EFG kernel.
The modules were copied to the EFG and loading was attempted in dependency order:
$ scp nf_tables.ko nf_flow_table.ko nf_flow_table_inet.ko \
root@efg-prod:/tmp/
$ ssh root@efg-prod
# cd /tmp
# insmod ./nf_tables.ko
[connection drops, device reboots]
The kernel oops, captured before the watchdog reboot:
[ ... ] Unable to handle kernel NULL pointer dereference at virtual address 0x120
[ ... ] Mem abort info:
[ ... ] ESR = 0x96000004
[ ... ] FSC = 0x4: level 0 translation fault
[ ... ] Internal error: Oops: 96000004 [#1] SMP
[ ... ] Modules linked in: nf_tables(+) wireguard libchacha20poly1305 ...
[ ... ] CPU: 3 PID: 211748 Comm: insmod Tainted: P W O 5.15.72-ui-cn9670 #5.15.72
[ ... ] Hardware name: Marvell OcteonTX CN96XX board (DT)
[ ... ] pc : nf_tables_init_net+0x18/0x94 [nf_tables]
[ ... ] lr : ops_init+0x3c/0x120
[ ... ] Call trace:
[ ... ] nf_tables_init_net+0x18/0x94 [nf_tables]
[ ... ] ops_init+0x3c/0x120
[ ... ] register_pernet_operations+0xec/0x240
[ ... ] register_pernet_subsys+0x2c/0x50
[ ... ] nf_tables_module_init+0x24/0x100 [nf_tables]
The HA secondary in the home cluster failed over within ~8 seconds. Service was restored without operator intervention.
The crash happened at offset 0x18 (byte 24) into nf_tables_init_net, extremely early in the per-network-namespace initialization; nf_tables_init_net is one of the very first things register_pernet_subsys calls when the module starts up. The faulting address 0x120 means the function read a field at offset 0x120 through a pointer that was NULL in the running kernel: the per-net data is not laid out where our build of the module expects it to be.
This isn't a "missing symbol" error or a "wrong function signature" error. The module loaded successfully. Its symbols resolved against the running kernel's symbol table. Execution started. And then, within microseconds, it dereferenced a struct field at an offset where the running kernel doesn't have what our module expected.
That's an ABI mismatch — the structure layout in our build's view of the kernel is different from the structure layout in the EFG's running kernel.
The crash happens because:
# CONFIG_MODVERSIONS is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set
Without MODVERSIONS, the kernel module loader has no per-symbol CRC to compare. Vermagic only checks "this is kernel 5.15.72-ui-cn9670 SMP aarch64"; it doesn't say "struct net has a particular field at offset 0x120." If the EFG's nf_tables_pernet struct has a different layout than vanilla's, the build still produces a module that loads cleanly. It just crashes when execution dereferences a field at an offset that no longer holds what the module expects.
This means either:
- (a) Ubiquiti rebased Linux 5.15.72 on top of patches from a different kernel version, OR
- (b) Ubiquiti or a vendor (Marvell) added fields to internal structures that vanilla 5.15.72 doesn't have, OR
- (c) Both.
Section 11.5 below addresses (b) directly by attempting to build against Marvell's complete published BSP — the largest plausible source of vendor-specific kernel patches for this silicon.
The Marvell OCTEON CN9670 SoC has substantial vendor-specific Linux support that is not in mainline. Marvell maintains kernel patches for their hardware engines (NIX network units, RVU resource virtualization, NPA packet allocator, SSO event scheduler, CPT crypto), and these patches frequently touch core kernel infrastructure including netfilter (where Marvell integrates hardware flow offload acceleration).
Marvell publishes their kernel patches through the Yocto Project's linux-yocto repository, branch v5.15/standard/cn-sdkv5.15/octeon, maintained by Bo Sun (Marvell engineer) and merged by the Yocto Project's kernel maintainer (Bruce Ashfield). This is a public, GPL-licensed source tree.
$ git clone https://git.yoctoproject.org/linux-yocto.git linux-yocto-cnxk-5.15
$ cd linux-yocto-cnxk-5.15
$ git checkout v5.15/standard/cn-sdkv5.15/octeon
$ head -5 Makefile
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
PATCHLEVEL = 15
SUBLEVEL = 203
EXTRAVERSION =
The branch HEAD is at 5.15.203 (a stable update) with the full Marvell OCTEON CN9K patch set applied on top.
Examination of the source tree shows the BSP modifies sixteen netfilter-related header files compared to vanilla Linux 5.15.72:
$ for f in $(find ~/vanilla-5.15.72/include -name "*netfilter*" -o -name "*nf_*"); do
rel=${f#*/include/}
bsp=~/linux-yocto-cnxk-5.15/include/$rel
if [ -f "$bsp" ] && ! diff -q "$f" "$bsp" >/dev/null 2>&1; then
echo "DIFFERS: $rel"
fi
done
DIFFERS: net/netfilter/nf_conntrack.h
DIFFERS: net/netfilter/nf_conntrack_count.h
DIFFERS: net/netfilter/nf_conntrack_timeout.h
DIFFERS: net/netfilter/nf_flow_table.h
DIFFERS: net/netfilter/nf_nat_redirect.h
DIFFERS: net/netfilter/nf_tables.h
DIFFERS: net/netfilter/nf_tables_core.h
DIFFERS: net/netfilter/nf_tproxy.h
DIFFERS: net/netns/netfilter.h
DIFFERS: linux/netfilter.h
DIFFERS: linux/netfilter_defs.h
DIFFERS: linux/netfilter/nf_conntrack_sctp.h
DIFFERS: uapi/linux/netfilter_bridge.h
DIFFERS: uapi/linux/netfilter/nf_conntrack_common.h
DIFFERS: uapi/linux/netfilter/nf_conntrack_sctp.h
DIFFERS: uapi/linux/netfilter/nf_tables.h
Several of these headers contain function-signature changes that explain why a vanilla-built module would crash. For example, in nf_conntrack_count.h:
-unsigned int nf_conncount_count(struct net *net,
- struct nf_conncount_data *data,
- const u32 *key,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone);
+unsigned int nf_conncount_count_skb(struct net *net,
+ const struct sk_buff *skb,
+ u16 l3num,
+ struct nf_conncount_data *data,
+ const u32 *key);
The function was renamed, and its signature changed. In nf_flow_table.h:
-int flow_offload_route_init(struct flow_offload *flow,
- const struct nf_flow_route *route);
+void flow_offload_route_init(struct flow_offload *flow,
+ struct nf_flow_route *route);
Return type changed from int to void; const removed from the route argument.
The same header backports a feature from kernel 6.2 — PPPoE flowtable acceleration — into 5.15:
+static inline bool nf_flow_pppoe_proto(struct sk_buff *skb, __be16 *inner_proto)
+{
+ if (!pskb_may_pull(skb, ETH_HLEN + PPPOE_SES_HLEN))
+ return false;
+
+ *inner_proto = __nf_flow_pppoe_proto(skb);
+ return true;
+}
This last item is significant: Marvell's BSP includes a PPPoE flowtable backport that mainline 5.15 does not have. If we can build a module against this BSP and load it on the EFG, we should — in principle — get not only inter-VLAN flowtable acceleration but PPPoE flowtable acceleration as well.
The build:
$ cd linux-yocto-cnxk-5.15
# Force SUBLEVEL=72 to match EFG vermagic (BSP HEAD is 5.15.203)
$ sed -i 's/^SUBLEVEL = .*/SUBLEVEL = 72/' Makefile
# Suppress kbuild dirty marker
$ touch .scmversion
# Apply EFG running config and target modules
$ cp ~/efg-build/efg-running.config .config
$ ./scripts/config --set-str CONFIG_LOCALVERSION "-ui-cn9670"
$ ./scripts/config --module CONFIG_NF_TABLES
$ ./scripts/config --enable CONFIG_NF_TABLES_INET
$ ./scripts/config --enable CONFIG_NF_TABLES_IPV4
$ ./scripts/config --enable CONFIG_NF_TABLES_IPV6
$ ./scripts/config --module CONFIG_NF_FLOW_TABLE
$ ./scripts/config --module CONFIG_NF_FLOW_TABLE_INET
$ ./scripts/config --enable CONFIG_NF_FLOW_TABLE_IPV4
$ ./scripts/config --enable CONFIG_NF_FLOW_TABLE_IPV6
$ ./scripts/config --disable CONFIG_DEBUG_INFO_BTF
$ ./scripts/config --disable CONFIG_MODULE_SIG_ALL
$ make olddefconfig
$ make kernelrelease
5.15.72-ui-cn9670
$ time make -j$(nproc)
real 1m59s
Five modules built, all with byte-perfect vermagic:
$ for ko in $(find . -name 'nf_tables.ko' -o -name 'nf_flow_table*.ko' | sort); do
echo "=== $(basename $ko) ==="
strings $ko | grep -E '^(vermagic|name|depends)='
done
=== nf_flow_table.ko ===
name=nf_flow_table
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table_inet.ko ===
name=nf_flow_table_inet
depends=nf_flow_table,nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table_ipv4.ko ===
name=nf_flow_table_ipv4
depends=nf_flow_table,nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table_ipv6.ko ===
name=nf_flow_table_ipv6
depends=nf_flow_table,nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_tables.ko ===
name=nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
# insmod ./nf_tables.ko
[connection drops, device reboots]
Captured kernel trace before reboot:
[ 3368.013405] Unable to handle kernel NULL pointer dereference at virtual address 0
[ 3368.022216] Mem abort info:
[ 3368.025005] ESR = 0x96000005
[ 3368.028072] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3368.033402] FSC = 0x05: level 1 translation fault
[ 3368.074382] Modules linked in: nf_tables(+) wireguard libchacha20poly1305 ...
xt_geoip(O) nf_app(PO) t_miner(PO) tdts(PO) tm_crypto(O)
xt_dyn_random ip6table_nat xt_conntrack xt_connmark xt_TCPMSS pppoe
pppox bonding xt_dpi(O) ip6table_mangle iptable_mangle ip6table_filter
ip6_tables uio_pdrv_genirq ui_lcm(O) ifb ppp_generic slhc
ubnthal(PO) ubnt_common(PO) drm drm_panel_orientation_quirks
[ 3368.121977] CPU: 3 PID: 211748 Comm: insmod Tainted: P W O 5.15.72-ui-cn9670 #5.15.72
[ 3368.130936] Hardware name: Marvell OcteonTX CN96XX board (DT)
[ 3368.143638] pc : nf_tables_init_net+0x18/0x94 [nf_tables]
[ 3368.149059] lr : ops_init+0x3c/0x120
[ 3368.227314] x2 : ffff00019027b300 x1 : 0000000000000000 x0 : 0000000000000000
[ 3368.229754] Call trace:
[ 3368.234825] nf_tables_init_net+0x18/0x94 [nf_tables]
[ 3368.238053] ops_init+0x3c/0x120
[ 3368.242840] register_pernet_operations+0xec/0x240
[ 3368.247195] register_pernet_subsys+0x2c/0x50
[ 3368.252609] nf_tables_module_init+0x24/0x100 [nf_tables]
Identical crash signature. nf_tables_init_net+0x18, called from the same path.
Two builds:
| Source tree | Result |
|---|---|
| Vanilla Linux 5.15.72 (kernel.org) | Crash at nf_tables_init_net+0x18 |
| Marvell BSP linux-yocto v5.15/standard/cn-sdkv5.15/octeon HEAD with SUBLEVEL forced to 72 | Crash at nf_tables_init_net+0x18 |
If the crash were caused by Marvell BSP patches, the BSP-built module would have crashed somewhere different (or — ideally — not at all). It crashed at the exact same instruction. That tells us:
- The crash is NOT primarily caused by Marvell BSP patches; it's caused by something on top of the BSP
- Ubiquiti has applied additional, non-public patches to the kernel that affect netfilter per-net data layout
- These additional patches are not derivable from any combination of Linux mainline + Marvell's published OCTEON BSP
The Modules linked in line of the panic trace lists the modules already loaded on the EFG when our module tried to initialize:
xt_geoip(O) nf_app(PO) t_miner(PO) tdts(PO) tm_crypto(O)
xt_dyn_random ip6table_nat xt_conntrack xt_connmark ...
xt_dpi(O) ... ui_lcm(O) ... ubnthal(PO) ubnt_common(PO)
The taint flags (O) and (PO) in Linux's module taint vocabulary mean:
- O — out-of-tree module
- P — proprietary (non-GPL) module
- PO — both proprietary and out-of-tree
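These letters are bits in the kernel's taint bitmask (readable at /proc/sys/kernel/tainted); a small sketch decoding the two relevant bits, using an illustrative value rather than one read from an EFG:

```shell
# Decode the P (bit 0, proprietary) and O (bit 12, out-of-tree) taint bits.
# The value 4097 (= 1 + 4096, both bits set) is illustrative; on a live
# system use: t=$(cat /proc/sys/kernel/tainted)
t=4097
[ $(( t & 1 ))    -ne 0 ] && echo "P: proprietary module loaded"
[ $(( t & 4096 )) -ne 0 ] && echo "O: out-of-tree module loaded"
```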
The presence of t_miner(PO), tdts(PO), nf_app(PO), xt_geoip(O), xt_dyn_random, tm_crypto(O), xt_dpi(O), ui_lcm(O), ubnthal(PO), and ubnt_common(PO) in the running kernel's module list is documentary evidence of the closed-source kernel modules Ubiquiti is shipping.
Section 13 returns to this point to evaluate the GPL implications.
Before drawing conclusions, we examined the EFG's existing kernel modules to determine whether Ubiquiti ships debug information that could aid investigation.
$ file /lib/modules/$(uname -r)/kernel/net/netfilter/nf_conntrack_ftp.ko
/lib/modules/.../nf_conntrack_ftp.ko: ELF 64-bit LSB relocatable, ARM aarch64,
version 1 (SYSV), BuildID[sha1]=5827c50c..., not stripped
$ readelf -S nf_conntrack_ftp.ko | grep -i debug
[30] .gnu_debuglink PROGBITS 0000000000000000 00001ed0
Modules are not stripped — symbol tables are intact, function and variable names are preserved. However, the only debug section is .gnu_debuglink, which is a 4-byte CRC + filename pointer that says "the actual debug info is in a separate file." That separate file (*.ko.debug) is not shipped on the production firmware.
This is by itself a defensible engineering decision (debug files are large), but combined with MODVERSIONS=N and kptr_restrict=0 (see Section 12 below), it creates a peculiar combination:
- A normal user with sufficient privilege can dump the running kernel's complete symbol table at full virtual addresses
- But cannot match those symbols to source-level constructs (struct field names, member offsets) without the debug info
- And cannot rely on the kernel's own ABI-version tracking to detect mismatched modules
The debug info isn't shipped, so reverse-engineering structure layouts requires examining the binary kernel image directly. Section 12 documents what such an examination reveals.
The crash at nf_tables_init_net+0x18 told us that the running kernel's internal layout differs from any combination of public sources we could build against. To quantify how far it diverges, we extracted the kernel image from the EFG and compared its symbol table against the symbol tables of vanilla Linux 5.15.72 and our Marvell BSP build.
The EFG's kernel image is on disk at /boot/vmlinuz-5.15.72-ui-cn9670:
$ ls -la /boot/vmlinuz-5.15.72-ui-cn9670
-rw-r--r-- 1 root root 12071956 ... /boot/vmlinuz-5.15.72-ui-cn9670
$ file /boot/vmlinuz-5.15.72-ui-cn9670
gzip compressed data, max compression, from Unix, original size 28811776
$ gunzip -c /boot/vmlinuz-5.15.72-ui-cn9670 > efg-vmlinuz
$ binwalk efg-vmlinuz | head -3
DECIMAL HEXADECIMAL DESCRIPTION
0 0x0 Linux kernel ARM64 image, load offset: 0x0,
image size: 29818880 bytes, little endian, 64k page size
$ strings -a efg-vmlinuz | grep "Linux version"
Linux version 5.15.72-ui-cn9670 (bdd@builder)
(gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld 2.35.2)
#5.15.72 SMP Wed Apr 15 23:39:47 CST 2026
The kallsyms symbol table is dumped via /proc/kallsyms:
$ wc -l /proc/kallsyms
130789 /proc/kallsyms
$ head -2 /proc/kallsyms
ffff800008000000 T _text
ffff800008010000 T _stext
We note that kallsyms is unrestricted — full virtual addresses are visible. On most production systems, kernel.kptr_restrict is set to 1 or 2, which causes kallsyms to either redact or zero out the address column. The EFG ships with kptr_restrict=0. This is a security observation in its own right (it makes ROP and KASLR-bypass attacks easier), but for our purposes it provided complete ground-truth symbol data.
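The conventional mitigation is a standard sysctl (documented kernel behavior; the file path is illustrative):

```
# /etc/sysctl.d/99-hardening.conf  (illustrative path)
# 2 = zero the address column in /proc/kallsyms for all users, root included
kernel.kptr_restrict = 2
```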
We extracted the symbol tables from each source:
# Symbols in EFG's running kernel
$ awk '{print $3}' /tmp/efg-kallsyms.txt | sort -u > /tmp/efg-syms.txt
# Symbols in our Marvell BSP build
$ nm ~/efg-build/marvell-bsp/linux-yocto-cnxk-5.15/vmlinux \
| awk '{print $3}' | sort -u > /tmp/bsp-syms.txt
# Symbols in vanilla 5.15.72
$ nm ~/efg-build/vanilla-5.15.72/linux-5.15.72/vmlinux \
| awk '{print $3}' | sort -u > /tmp/vanilla-syms.txt
$ wc -l /tmp/*-syms.txt
115998 /tmp/bsp-syms.txt
120399 /tmp/efg-syms.txt
112581 /tmp/vanilla-syms.txt
The diff: symbols present in the EFG kernel but absent from BOTH vanilla 5.15.72 AND the Marvell BSP build:
$ comm -23 /tmp/efg-syms.txt \
<(sort -u /tmp/vanilla-syms.txt /tmp/bsp-syms.txt) \
| grep -vE "^(\.L[0-9]+|\.LC[0-9]+|\.LBE|\.LFE|\.LFB|\.Letext|\.Ldebug|\.Lframe|__compound_literal\.|__func__\.|__warned\.|CSWTCH\.)" \
> /tmp/efg-unique-real-syms.txt
$ wc -l /tmp/efg-unique-real-syms.txt
6357 /tmp/efg-unique-real-syms.txt
After filtering out compiler-generated local labels (which vary across every build of every kernel and carry no information), 6,357 unique symbols exist in the EFG's kernel that are present in neither vanilla Linux 5.15.72 nor Marvell's published OCTEON BSP.
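The comm -23 set difference used in this pipeline can be illustrated on toy data (all symbol names below are made up):

```shell
# comm -23 prints lines unique to the first (sorted) file — here, the one
# "vendor-only" symbol. Toy data, not real kernel symbol lists.
printf 'nf_ct_seq\nubnt_only_sym\nwg_noise\n' | sort > /tmp/target-syms.txt
printf 'nf_ct_seq\nwg_noise\n'                | sort > /tmp/public-syms.txt
comm -23 /tmp/target-syms.txt /tmp/public-syms.txt   # -> ubnt_only_sym
```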
Grouping the unique symbols by name pattern reveals what Ubiquiti added:
| Category | Symbol count | Examples |
|---|---|---|
| tdts_* (Trend Micro Deep-packet Threat Surveillance) | 116 | tdts_shell_dpi_l3_skb, tdts_shell_dpi_register_mt |
| tm_* (Trend Micro shared) | 33 | tm_crypto_* family |
| ubnthal_* (Ubiquiti HAL) | 45 | ubnthal_get_controller_host, ubnthal_get_cputype |
| ubnt_* (Ubiquiti utilities) | additional | ubnt_blk_wp_callback, ubnt_mtd_partition_read |
| HTTP protocol decoder (kernel-space) | dozens | BuildHTTP_request_KeywordTries, Create_HTTP_Protocol_Decoder |
| H.323 protocol decoder (kernel-space) | dozens | DecodeQ931, DecodeMultimediaSystemControlMessage |
| nf_*dpi* (Deep Packet Inspection conntrack extensions) | several | nf_conntrack_dpi_init, nf_ct_ext_dpi_destroy, nf_dpi_proc_dir |
| dpi_* (Deep Packet Inspection engine) | dozens | __kstrtab_dpi_main, related classification entry points |
| wg_* (WireGuard, partly upstream) | 113 | wg_* |
| Firmware signing key blobs | a few | UDMENT_CN9670_FW_KEY, UXG_AL324_FW_KEY |
A note on terminology: throughout this section, "DPI" refers to Deep Packet Inspection — the application-layer traffic-classification feature that powers the UniFi dashboard's per-application traffic statistics and threat management. This is distinct from Marvell's hardware DPI block (DMA Packet Interface, also abbreviated DPI), which is a PCIe DMA engine on the OCTEON SoC and shows up in the kernel image as register-name strings like DPI_DMA_CONTROL and DPI_REQQ_INT. Those Marvell hardware-driver symbols are present in the public BSP and don't appear in the 6,357-symbol delta. The dpi_*, tdts_*, nf_*dpi*, and xt_dpi symbols below are the inspection-software layer Ubiquiti added on top.
Some of these are unsurprising (ubnthal_* is a clean abstraction layer; WireGuard was upstream by 5.6 but Ubiquiti may have backported aspects). Others are deeply diagnostic.
The most consequential finding is in the nf_* namespace:
nf_conntrack_dpi_fini
nf_conntrack_dpi_init
nf_ct_ext_dpi_destroy
nf_dpi_proc_dir
The Linux conntrack subsystem has an extension framework (include/net/netfilter/nf_conntrack_extend.h) that allows kernel modules to attach per-flow metadata to each struct nf_conn. Adding a new extension type requires coordinated changes in several places:
- enum nf_ct_ext_id in nf_conntrack_extend.h (adding a new value)
- The static array nf_ct_ext_types (adding a new entry)
- Anywhere code iterates over extension types
The presence of nf_ct_ext_dpi_destroy is direct evidence that Ubiquiti has added a new conntrack extension (NF_CT_EXT_DPI or similar) to track DPI metadata per flow.
This change is precisely the kind that would alter struct nf_conn layout and per-net data structure layout — exactly the kind of change that would explain why nf_tables.ko built against any public source crashes when it tries to register a pernet_operations against the running kernel.
A closer look at the tdts namespace shows exported kernel symbols:
__ksymtab_tdts_shell_dpi_l2_eth
__ksymtab_tdts_shell_dpi_l3_data
__ksymtab_tdts_shell_dpi_l3_skb
__ksymtab_tdts_shell_dpi_register_mt
__ksymtab_tdts_shell_dpi_unregister_mt
__ksymtab_dpi_main
The __ksymtab_* and __kstrtab_* symbols are how the kernel records what symbols a module exports. The names dpi_l2_eth, dpi_l3_data, dpi_l3_skb indicate these are functions for handling Ethernet frames and IPv4/IPv6 packets at layer 2 and layer 3 respectively. The _register_mt and _unregister_mt suffixes are netfilter xtables match/target registration entry points.
The runtime panic dump in Section 11 showed these modules tagged tdts(PO) and t_miner(PO) — proprietary, out-of-tree.
The "tdts" name strongly suggests Trend Micro's threat-detection stack ("TM Deep-packet Threat Surveillance", abbreviated tdts, part of what Trend Micro markets as the Smart Protection Network). Trend Micro licenses this engine to network device vendors as a closed-source kernel module. The tm_crypto(O) and t_miner(PO) modules in the same panic trace fit the pattern: t_miner is a content-pattern matcher, tm_crypto is the encrypted-traffic analyzer.
These modules are not Ubiquiti's own code. They are licensed proprietary code from Trend Micro that Ubiquiti has integrated into their firmware, and they link directly against kernel symbols (note the xt_dpi(O) netfilter match registered in the kernel's tainted-module list).
The unique symbols also reveal that Ubiquiti has embedded application-layer protocol decoders directly in the kernel:
BuildHTTP_request_KeywordTries
Close_HTTP_Request_Connection
Create_HTTP_Protocol_Decoder
Free_HTTP_Protocol_Decoder
HTTP_Connection_Lost_Count
HTTP_Req_Count
Init_HTTP_Protocol_Decoder
NormalizeURI
Parse_HTTP_Request
ScanHTTPVersion
ScanRequestHeaders
URINormalize
DecodeMultimediaSystemControlMessage
DecodeQ931
DecodeRasMessage
_AdmissionConfirm
_AdmissionRequest
_Alerting_UUIE
The HTTP decoder symbols (camelCase, with _HTTP_ infix) appear to be from a Trend Micro protocol-parsing library running in kernel space. The H.323/Q.931 decoder symbols are similarly out-of-place for a kernel — these would normally live in userspace.
Running parsers for HTTP, H.323, and similar attacker-controllable formats inside the kernel is a substantial security risk. A bug in any of these decoders becomes a kernel vulnerability. Mainstream Linux distributions and other vendors deliberately keep this kind of code in userspace (Suricata, Snort, etc.) for exactly this reason.
To put 6,357 symbols in perspective:
- Vanilla 5.15.72 has 112,581 unique symbols
- Marvell's published BSP adds 3,417 net new symbols on top (a 3% increase)
- Ubiquiti's running kernel has 6,357 symbols beyond Marvell's BSP — a further 5.5% increase
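These ratios follow directly from the raw counts:

```shell
# Recompute the deltas from the symbol counts reported above.
awk 'BEGIN {
  vanilla = 112581; bsp = 115998; efg = 120399; ui_only = 6357
  printf "Marvell BSP adds: %.1f%%\n", (bsp - vanilla) / vanilla * 100   # -> 3.0%
  printf "Ubiquiti adds:    %.1f%%\n", ui_only / bsp * 100               # -> 5.5%
  printf "Roughly 1 in %.0f symbols is Ubiquiti-only\n", efg / ui_only   # -> 1 in 19
}'
```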
Phrased differently: roughly 1 in 19 symbols in the EFG's running kernel did not come from any source publicly available to a security researcher, GPL-rights-exercising customer, or independent third party.
This is the kernel that handles your VLAN traffic, your firewall rules, your VPN keys, and your DPI inspection. The behavior of this kernel cannot be audited from outside because the source for 5% of it is not published. The technical analysis in Section 11 demonstrates that this 5% includes substantial netfilter modifications.
The Linux kernel is licensed under GPL-2.0. That license imposes specific obligations on anyone who distributes a binary derived from GPL-licensed source. The relevant provisions, summarized:
- The complete corresponding source code must be made available to recipients of the binary, under the same license, for at least three years (GPL-2.0 §3).
- Changes to GPL'd source files must themselves be GPL-licensed (GPL-2.0 §2, the "viral" clause).
- Linking proprietary modules against GPL kernel symbols is a contested legal area. Linus Torvalds and the Linux Foundation's longstanding position is that modules that use only `EXPORT_SYMBOL` (not `EXPORT_SYMBOL_GPL`) interfaces and "can plausibly be shown to be independent" may be distributed under non-GPL licenses, but there is no clean legal answer here. The Free Software Foundation's position is stricter: any kernel module is a derived work.
- A written offer to provide source must accompany the binary distribution, valid for at least three years.
- Derived works that combine GPL and proprietary code in linked form typically must be GPL-licensed in their entirety.
Ubiquiti previously maintained an open-source download page at ui.com/download/open-source, but that page no longer exists. As of this writing (May 2026), Ubiquiti's main website does not host any GPL source code archives that we could locate. The Ubiquiti GitHub organization (https://github.com/ubiquiti) contains only two repositories: support-tools and freeswitch. Neither contains kernel sources or firmware sources for any current product.
This is not the first time Ubiquiti's GPL compliance has been questioned. The Wikipedia article on Ubiquiti documents a recurring pattern:
- 2015: Ubiquiti was accused of violating GPL terms for code in their products. Specifically, customers requested the source for the GPL-licensed U-Boot bootloader and Ubiquiti refused, making it impractical for customers to fix a security issue. The source was eventually released after sustained public pressure.
- 2019: Ubiquiti was again reported to be in violation of GPL.
- 2026 (current): The open-source download page that previously hosted source archives has been removed entirely.
For an EFG owner attempting to exercise their GPL rights today, the channels are:
- The Ubiquiti support email (`[email protected]`), which redirects GPL requests to a separate address
- A specific email for source requests: `[email protected]`
- Community forum posts (which historically receive no substantive Ubiquiti response on GPL questions)
- Third-party archives like `github.com/unifi-hackers/unifi-gpl` and `github.com/CodeFetch/Ubiquiti-UBNT-airOS`, which contain partial GPL sources that researchers have extracted from firmware images or obtained through pressure
A formal request for the complete kernel source has been filed via [email protected], the email address Ubiquiti's support team directed users to. The request specifies:
- The full kernel source tree corresponding to the running kernel version
- The build configuration (`/proc/config.gz`)
- The complete set of patches applied on top of the base kernel
- The Marvell-specific drivers (octeontx2_pf, octeontx2_vf, octeontx2_af, rvu_*, NIX, CPT, SSO, NPA)
- Any other GPL components
The request is pending. Ubiquiti's response (or non-response) to this request is itself a data point.
Section 12 documents 6,357 unique kernel symbols in the running EFG kernel that are not present in either vanilla Linux 5.15.72 or the complete published Marvell OCTEON CN9K BSP. These include:
- Symbols indicating modifications to core netfilter conntrack data structures (`nf_ct_ext_dpi_destroy`, `nf_conntrack_dpi_init`)
- A 116-symbol `tdts` namespace exposing kernel functions to a closed-source DPI engine
- HTTP and H.323 application-layer protocol decoders embedded in the kernel
- A 45-symbol Ubiquiti hardware abstraction layer
For Ubiquiti to be in compliance with GPL-2.0, the source of the changes producing these symbols must be available — at minimum to anyone who has purchased an EFG and exercises their GPL rights to request it.
Section 11 documented the panic trace's Modules linked in list, which included:
xt_geoip(O) nf_app(PO) t_miner(PO) tdts(PO) tm_crypto(O)
xt_dyn_random xt_dpi(O) ui_lcm(O) ubnthal(PO) ubnt_common(PO)
The (PO) taint flags are the kernel's own classification: P means the module's MODULE_LICENSE() declaration is not one of the GPL-compatible strings, and O means the module was built out-of-tree. The kernel taints itself when such modules are loaded specifically because their continued operation calls into question the kernel's GPL status.
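The taint letters map to bit positions in the kernel's taint mask, readable at `/proc/sys/kernel/tainted`. A minimal decoder for the flags seen on the EFG, using the bit assignments from mainline's taint documentation:

```python
# Decode the kernel taint bitmask (/proc/sys/kernel/tainted) for the flags
# relevant to this analysis. Bit positions follow mainline's taint definitions:
# TAINT_PROPRIETARY_MODULE = 0, TAINT_FORCED_MODULE = 1, TAINT_WARN = 9,
# TAINT_OOT_MODULE = 12, TAINT_UNSIGNED_MODULE = 13.
TAINT_BITS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module force-loaded"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
}

def decode_taint(mask: int) -> list[str]:
    """Return the single-letter taint flags set in `mask`."""
    return [flag for bit, (flag, _) in TAINT_BITS.items() if mask & (1 << bit)]

# A kernel with a (PO) module loaded and a prior WARN reports "Tainted: P W O",
# exactly as the EFG's panic trace does:
print(decode_taint((1 << 0) | (1 << 9) | (1 << 12)))  # ['P', 'W', 'O']
```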
Among these:
- `tdts` and `t_miner` are almost certainly licensed proprietary code from Trend Micro. They register `xt_match` netfilter hooks and export functions like `tdts_shell_dpi_l3_skb`. They link directly against GPL kernel symbols (the `__kstrtab_*` and `__ksymtab_*` infrastructure exists for this purpose).
- `nf_app`, `xt_dpi`, and `xt_geoip` are likely Ubiquiti's own proprietary netfilter extensions that integrate with the DPI engine.
- `ubnthal`, `ubnt_common`, and `ui_lcm` are Ubiquiti's hardware abstraction layer.
The legal status of these modules is contested in general terms. The specific question for Ubiquiti is: are these modules "derived works" of the kernel? The Free Software Foundation says any kernel module is. Linus Torvalds has historically said it depends on whether the module uses EXPORT_SYMBOL_GPL interfaces and on whether the module has independent existence outside of Linux.
For tdts specifically: Trend Micro markets the underlying technology as portable across operating systems (it runs on Windows, FreeBSD, etc.), which would weigh in favor of "independent existence" under Torvalds's standard. For nf_app, xt_dpi, and ubnthal: these are by name and design Ubiquiti-specific kernel-only modules; they have no plausible existence independent of Ubiquiti's Linux distribution. Under either FSF's or Torvalds's standard, nf_app, xt_dpi, and ubnthal would appear to be derived works of the kernel and therefore subject to GPL.
The closed-source modules link against GPL kernel symbols using EXPORT_SYMBOL and EXPORT_SYMBOL_GPL exports. Some of those exports — particularly conntrack extension registration — were added by Ubiquiti's own kernel patches (per Section 12).
In other words: Ubiquiti modified the kernel (a GPL'd derived work, requiring source release) specifically to add GPL'd interfaces that proprietary modules would link against. Whether this is a GPL violation depends on the resolution of the GPL-vs-proprietary-module question, but it is a structurally significant observation: the proprietary modules and the kernel patches are designed to work together as a single integrated system. The kernel cannot be replaced without breaking the proprietary modules; the proprietary modules cannot run on any other kernel.
That tight integration is what FSF would call "a single program in two pieces" — a derived work. Under that interpretation, the entire firmware would need to be GPL-licensed, and the proprietary modules would be in violation.
If you own an EFG, you have a legal right under GPL-2.0 to request the complete source code of the kernel running on your device. That includes:
- The base kernel source, with full version history
- All patches applied by Ubiquiti and any third parties
- The build configuration (`.config`)
- Any installation/build scripts necessary to reconstruct the binary
- The kernel modules whose source is GPL
This right cannot be waived by EULA. If Ubiquiti refuses to provide this source, that refusal is a violation of GPL-2.0 §3, and the appropriate path forward is:
- Make a written request to `[email protected]` specifying the firmware version
- If no response within 30 days, escalate to Ubiquiti's legal department
- If still no response, contact the Software Freedom Conservancy at `[email protected]` — they handle GPL enforcement on behalf of multiple Linux kernel copyright holders
- The Conservancy can pursue compliance via the `kernel-enforcement` program
The EFG is a flagship enterprise router from a publicly-traded networking vendor (Ubiquiti, NYSE: UI). It is sold to enterprises, cloud providers, government agencies, and home users. The firmware running on it includes 6,357 kernel symbols that no customer can audit because the source is not published.
Network device firmware is some of the most security-sensitive software in any infrastructure. The kernel running on a firewall or router decides what packets enter and leave the network. Bugs and backdoors in that kernel directly affect every device behind it.
GPL-2.0 was specifically designed to ensure that customers and security researchers can audit the software running on the devices they own. Vendor compliance with the license is not a courtesy — it is a precondition for the trust the GPL ecosystem makes possible.
The findings in this document — that even Marvell's complete public BSP source is insufficient to build modules that work on the EFG, that 6,357 symbols are unique to Ubiquiti's kernel, and that closed-source modules with (PO) taint flags are integrated with the netfilter subsystem — are exactly the kind of findings that demonstrate why GPL compliance is important. The license requires that this kind of analysis be unnecessary, because the source should be available.
Many of the findings in this document have already been raised with Ubiquiti through their official channels. The vendor's responses are themselves part of the record.
The author of this document opened a support ticket with Ubiquiti approximately one year prior to publication, describing the inter-VLAN performance bottleneck on the EFG and proposing the architectural fix in detail — specifically, recommending that Ubiquiti adopt the DPDK + VPP + Suricata-on-DPDK reference architecture that Marvell themselves publish for the OCTEON CN9K silicon family.
The ticket has not received a substantive engineering response. It remains effectively open without resolution.
This means the central technical recommendation of this document — that the EFG can deliver substantially higher throughput by adopting the dataplane architecture its silicon vendor publishes — was already in Ubiquiti's hands a year ago, with implementation guidance, and was not acted upon.
Section 11.1 of this document catalogues the security configuration choices in the EFG's running kernel:
- `module.sig_enforce=0` — modules can be loaded without signature verification
- `CONFIG_MODULE_SIG` not set — the kernel was not even built with signing infrastructure
- No `lockdown=` argument on the kernel command line — the lockdown LSM is not engaged
- `CONFIG_SECURITY_LOCKDOWN_LSM` not set in the kernel build
- Overlayfs root filesystem with a writable upper layer — kernel-loadable code can be persisted
- `kernel.kptr_restrict=0` — the full kallsyms table with virtual addresses is exposed
Combined with the kernel's CONFIG_MODVERSIONS=N setting (Section 11.4), this means: any process with CAP_SYS_MODULE (root, including any context that escalates to root) can load arbitrary kernel code, and there is no in-kernel mechanism to detect or prevent that loading. The watchdog will reboot the device on a kernel panic, but a successfully-loaded malicious module that doesn't crash the kernel would persist indefinitely.
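These checks can be scripted on any Linux gateway. The sketch below reads the same standard kernel interfaces named above; the "pass" conditions encode the hardened values this section argues for, not universal defaults, and it must run as root on the target device:

```python
# Audit the module-loading hardening settings discussed above.
# Each check reads a standard kernel interface; run as root on the target.
import pathlib

CHECKS = [
    # (description, path, predicate over file contents that means "hardened")
    ("module signature enforcement", "/sys/module/module/parameters/sig_enforce",
     lambda s: s.strip() == "Y"),
    ("lockdown LSM engaged", "/sys/kernel/security/lockdown",
     lambda s: "[none]" not in s),        # e.g. "none [integrity] confidentiality"
    ("kptr_restrict hides kernel pointers", "/proc/sys/kernel/kptr_restrict",
     lambda s: s.strip() != "0"),
]

def audit() -> list[str]:
    """Return a list of failed (or unreadable) checks."""
    failures = []
    for desc, path, hardened in CHECKS:
        try:
            if not hardened(pathlib.Path(path).read_text()):
                failures.append(desc)
        except OSError:
            # A missing interface usually means the feature was compiled out,
            # which is itself a finding (e.g. CONFIG_MODULE_SIG not set).
            failures.append(f"{desc} (interface missing: {path})")
    return failures

if __name__ == "__main__":
    for failure in audit():
        print("FAIL:", failure)
```

On a stock EFG, all three checks would fail by the findings above.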
Separately, the author identified additional security findings on the EFG — notably the presence of private cryptographic key material accessible via the firmware image (per the *_FW_KEY strings observed in Section 12.6's symbol analysis, alongside other findings not detailed here for responsible disclosure reasons).
These findings were submitted through Ubiquiti's HackerOne bug bounty program — the formal, documented channel for security disclosure to the vendor.
Ubiquiti rejected the submission. The stated reason: the attacker would require network access to exploit the issue.
This rationale does not survive scrutiny when applied to a network gateway:
- A network gateway is, by definition, on the network. Network access to the device is the universal precondition for any attack against it.
- The threat model that a security-conscious gateway is designed to defend against is precisely "an attacker who has gained network access" — whether that's a compromised endpoint behind the gateway, a hostile guest device on the same VLAN, or an internal lateral-movement scenario in an enterprise breach.
- Gateway vendors with mature security postures (Cisco, Juniper, Palo Alto, Fortinet, Arista, etc.) routinely accept and remediate vulnerabilities under this threat model. CVEs against these products list "network adjacent" or "network reachable" as the qualifying attack vector, not a disqualifying one.
- The official CVSS v3.1 scoring system explicitly defines "Adjacent Network" (AV:A) and "Network" (AV:N) as valid attack vectors. A vendor declining to engage with vulnerabilities in those classes is declining to engage with most of the vulnerability landscape for their product category.
The rejection is therefore not just a technical disagreement — it is a stated position on what kinds of attacks Ubiquiti considers in scope for their bounty program. By that stated standard, an attacker who has already established a foothold on the network behind the EFG is not a threat the EFG considers itself responsible for defending against. That is an unusual posture for a $2,000 device sold and marketed as an enterprise security gateway.
Putting these data points together with the GPL findings in Section 13:
| Issue raised | Channel | Year | Vendor response |
|---|---|---|---|
| Inter-VLAN performance, with DPDK fix recommendation | Standard support | ~1 year ago | No substantive engineering response |
| Security configuration / private key exposure | HackerOne bug bounty | Recent | Rejected: "requires network access" |
| GPL kernel source release | Email to [email protected] | Pending | Pending |
| GPL kernel source release | Public web page | Historical | Page removed |
The historical context is also relevant: Ubiquiti was publicly accused of GPL violations in 2015 and again in 2019, and the pattern has continued.
The findings in this document are not new disclosures to the vendor. They are issues that Ubiquiti's engineering, security, and licensing teams have either been told about directly or are demonstrably aware of, and have chosen not to act on. The reason this document exists in public form is that the channels designed for these conversations — support tickets, bug bounty programs, GPL compliance contacts — have not produced action.
This investigation began as a performance analysis: why does a $2,000 enterprise router with two 25 GbE SFP28 ports deliver only ~1 Gbps of single-stream inter-VLAN throughput, and ~3 Gbps of single-stream PPPoE WAN throughput? The lab data is unambiguous. The bottlenecks are software-architectural choices, not hardware limitations:
- The kernel network stack on a single core has a ~5 Gbps single-stream ceiling when offloads are off, regardless of CPU vendor.
- Hardware offloads are disabled by default on the EFG. Enabling them is a 4-7× improvement on otherwise-identical configurations.
- The 5-deep iptables FORWARD chain pattern the EFG ships with costs roughly half of single-stream throughput when offloads are also off.
- `nftables` flowtable — a kernel feature available since Linux 4.16, shipped enabled by every major distribution — is not even compiled into the EFG's kernel. Adding it gives a 3-7× single-stream improvement.
- DPDK + VPP on the same silicon — using software stacks that Marvell themselves publish — would deliver 15-25× the throughput. The Cortex-A72-class cores in the Octeon CN9670 can sustain 6-12 Gbps per core in a userspace dataplane. The chip has 18 of those cores.
- PPPoE forwarding is single-cored in stock Linux because of how `ppp_generic` is structured. The fix exists in DPDK and was being upstreamed at the time of writing.
These are not exotic or research-grade fixes. Three of them are configuration changes. One requires loading a kernel module that's already in mainline. The most architecturally significant — DPDK + VPP — uses Marvell's own published reference architecture. The hardware was designed for this; the firmware just doesn't use it.
The conntrack helper toggle Ubiquiti recently shipped in the UniFi controller (Section 9 Finding 5, Section 10 Fix 7) is informative beyond its narrow effect. It exposes the FTP/H.323/SIP/PPTP/TFTP helpers as administrator-controllable. The toggle's existence proves Ubiquiti's engineering team is actively reasoning about per-flow netfilter overhead — they identified that helpers cost something, and shipped a workaround to let users disable them. They did not ship the proper fix, which is the kernel's flowtable infrastructure, even though the proper fix would address every architectural finding in this document and the partial fix addresses only one. That is a choice, not an oversight.
Section 11 documented our attempt to apply the most surgical of these fixes — adding the missing nftables flowtable kernel modules — to a real production EFG. Two builds were attempted:
- Vanilla Linux 5.15.72 from kernel.org → byte-perfect vermagic match → kernel panic at `nf_tables_init_net+0x18`
- Marvell's complete published OCTEON BSP source (linux-yocto branch `v5.15/standard/cn-sdkv5.15/octeon`) → byte-perfect vermagic match → kernel panic at the identical instruction
The fact that both crashes occurred at the same function offset proves that the ABI mismatch is not introduced by Marvell's BSP patches. It is introduced by something Ubiquiti has applied on top of Marvell's BSP — patches Ubiquiti has not published.
Section 12 quantified that delta: 6,357 kernel symbols exist in the running EFG kernel that are present in neither vanilla Linux 5.15.72 nor Marvell's complete public BSP. Approximately 1 in 19 symbols in the EFG's kernel is unique to Ubiquiti's build and not derivable from any public source. These include:
- Conntrack extension types for proprietary DPI integration (`nf_ct_ext_dpi_destroy`, `nf_conntrack_dpi_init`)
- A 116-symbol `tdts` namespace exposing kernel internals to a closed-source Trend Micro DPI engine
- HTTP and H.323 application-layer protocol decoders running in kernel space
- A 45-symbol Ubiquiti hardware abstraction layer
Section 13 addressed what these findings mean for GPL-2.0 compliance:
- Ubiquiti has shipped a substantially modified Linux kernel without publishing the corresponding source
- The proprietary kernel modules `tdts`, `t_miner`, `nf_app`, `xt_dpi`, `ubnthal`, and `ubnt_common` link against GPL kernel symbols and operate as integrated components of the running kernel
- Specifically, `nf_app`, `xt_dpi`, and `ubnthal` have no existence independent of Ubiquiti's Linux integration and would be derived works under either the FSF's or Linus Torvalds's interpretation of the GPL
- Ubiquiti's open-source download page has been removed; their GitHub presence does not contain firmware sources
- This continues a documented pattern — Ubiquiti was publicly accused of GPL violations in 2015 (resolved only after sustained pressure) and again in 2019
- A formal request has been filed via the channel Ubiquiti's support team specified
The GPL exists specifically so that customers can audit and modify the software running on devices they own. The fact that this analysis required reverse-engineering kernel symbol tables from a binary firmware image — when the GPL requires the source be available on request — is itself the finding.
Section 14 documented direct vendor engagement: a performance ticket open with Ubiquiti for approximately one year recommending the DPDK fix (no substantive engineering response), a security disclosure submitted through Ubiquiti's HackerOne bug bounty program (rejected on the grounds that exploitation requires network access — a position that does not survive scrutiny when applied to a network gateway), and the GPL request now pending. The findings in this document are not novel disclosures to the vendor; they are issues the vendor has been told about, through the channels designed for these conversations, and has chosen not to act on.
If you are evaluating or already operating EFG/UDM/UXG hardware, the questions to put to your Ubiquiti account team are:
- Performance: When will inter-VLAN single-stream throughput on the EFG match the marketed 25 GbE port speeds for normal enterprise workloads (TCP, MTU 1500, with stateful firewall rules)?
- Roadmap: Does Ubiquiti's roadmap include adopting DPDK-based dataplanes (which Marvell's reference architecture for this silicon recommends and supports)?
- Configuration: Will Ubiquiti expose `nftables` flowtable, hardware offload, and conntrack helper toggles as administrator-controllable settings before any DPDK migration?
- GPL compliance: Will Ubiquiti publish the complete kernel source corresponding to current EFG firmware versions, including all patches, build configuration, and the source of `nf_app`, `xt_dpi`, `ubnthal`, and `ubnt_common`?
The first three are about getting the performance you paid for. The fourth is about knowing what's running on your network.
The EFG, UDM-Pro-Max, UXG-Lite, UXG-Pro, and other Ubiquiti gateways share substantial portions of this kernel and firmware design. The performance characteristics documented here for the EFG are likely to apply, with proportional differences in absolute numbers, across the product line.
If your home or small-office workload is dominated by single-stream throughput (a single VPN tunnel, a single large file transfer, a single backup job), you are likely bottlenecked by the issues described above, regardless of how fast your internet connection or LAN switch is.
The most impactful workaround available without firmware changes is to enable hardware offloads where Ubiquiti's UI exposes the toggle. Beyond that, the architectural fix is in Ubiquiti's hands.
| # | NIC | Forwarder | MTU | Offloads | Rules | Single-stream | Notes |
|---|---|---|---|---|---|---|---|
| 1 | virtio | kernel | 9000 | on | none | 16.9 Gbps | naïve baseline |
| 2 | virtio | kernel | 9000 | off | none | 17.2 Gbps | jumbo hides per-packet cost |
| 3 | virtio | kernel | 1500 | off | none | 4.95 Gbps | EFG-realistic baseline; 1 core 100% soft |
| 4 | virtio | kernel | 1500 | off | + ct module | 4.84 Gbps | trivial overhead |
| 5 | virtio | kernel | 1500 | off | + simple ct rule | 4.64 Gbps | 4% drop |
| 6 | virtio | kernel | 1500 | off | EFG 5-chain replica | 2.36 Gbps | smoking gun |
| 7 | virtio | kernel | 1500 | off | EFG (8 streams) | 11.4 Gbps agg | scales with cores |
| A | virtio | kernel | 1500 | off | flowtable | 7.05 Gbps | flowtable alone, 3× over EFG |
| B | virtio | kernel | 1500 | on | flowtable | 17.4 Gbps | one-line config improvement |
| K1 | ConnectX VF | kernel | 1500 | on | none | 25.3 Gbps | real silicon baseline |
| K2 | ConnectX VF | kernel | 1500 | on | EFG 5-chain | 21.1 Gbps | GRO hides per-packet cost |
| K3 | ConnectX VF | kernel | 1500 | off | none | 4.74 Gbps | matches virtio with offloads off |
| K4 | ConnectX VF | kernel | 1500 | off | EFG 5-chain | 4.70 Gbps | I/O is the bottleneck here |
| V0 | virtio | VPP/DPDK | 1500 | off | n/a | 6.78 Gbps | DPDK with virtio-pmd; bottlenecked by vhost-net |
| V1 | ConnectX VF | VPP/DPDK | 1500 | client off | n/a | 15.7 Gbps | wire-packet processing |
| V2 | ConnectX VF | VPP/DPDK | 1500 | client on | n/a | 35.6 Gbps | headline number |
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
chain alien_chain {
counter
ip protocol tcp counter
ip saddr 10.0.0.0/8 counter
}
chain tor_chain {
counter
ip protocol tcp counter
tcp flags & (syn|ack) == ack counter
}
chain ips_chain {
counter
ip protocol tcp counter
meta l4proto tcp counter
tcp dport { 1-65535 } counter
}
chain ubios_chain {
counter
ip protocol tcp counter
ct state established counter
}
chain user_chain {
counter
ct state established,related counter
ip saddr 10.10.10.0/24 ip daddr 10.10.20.0/24 counter
}
chain forward {
type filter hook forward priority 0; policy accept;
jump alien_chain
jump tor_chain
jump ips_chain
jump ubios_chain
jump user_chain
}
}
table ip nat {
chain postrouting {
type nat hook postrouting priority 100;
oifname "enp6s18" masquerade
}
}
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
flowtable f {
hook ingress priority 0
devices = { enp6s19, enp6s20 }
}
chain forward {
type filter hook forward priority 0; policy accept;
ip protocol { tcp, udp } flow add @f
ct state established,related accept
}
}
table ip nat {
chain postrouting {
type nat hook postrouting priority 100;
oifname "enp6s18" masquerade
}
}
unix {
nodaemon
log /var/log/vpp/vpp.log
full-coredump
cli-listen /run/vpp/cli.sock
gid vpp
}
api-trace { on }
api-segment { gid vpp }
socksvr { default }
cpu {
main-core 0
corelist-workers 1
}
buffers {
buffers-per-numa 32768
default data-size 2048
}
dpdk {
dev 0000:01:00.0 {
name lab-vlan10
num-rx-queues 1
num-tx-queues 1
}
dev 0000:02:00.0 {
name lab-vlan20
num-rx-queues 1
num-tx-queues 1
}
}
plugins {
plugin default { enable }
plugin dpdk_plugin.so { enable }
}
Thread 1 vpp_wk_0 (lcore 1)
Time 257.0, vector rate 3.5586e5 in/out, packets/sec
Name Calls Vectors Packet-Clocks Vectors/Call
dpdk-input polling 2683609353 91446442 4.25e3 .03
ethernet-input active 12518445 91446442 9.41e1 7.30
ip4-input-no-checksum 12136093 91446437 3.98e1 7.54
ip4-lookup active 12136093 91446437 5.23e1 7.54
ip4-rewrite active 12136093 91446437 3.86e1 7.54
lab-vlan20-output active 10310280 89229310 1.21e1 8.65
lab-vlan20-tx active 10310280 89229310 3.79e1 8.65
VPP per-packet end-to-end cost on Zen 4: ~80 cycles (ethernet-input + ip4-input + ip4-lookup + ip4-rewrite + interface-output + tx) ≈ 16 nanoseconds per packet at 5 GHz. Theoretical ceiling on this pipeline: ~700+ Gbps single-core.
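Spelling out that arithmetic (the cycle count is summed from the node table above; the clock and frame size are the lab's parameters):

```python
# Back-of-envelope ceiling for the measured VPP pipeline cost.
cycles_per_packet = 80    # approximate sum of per-node Packet-Clocks above
clock_hz = 5.0e9          # Zen 4 clock used in the lab
frame_bits = 1500 * 8     # MTU-1500 frame per packet

ns_per_packet = cycles_per_packet / clock_hz * 1e9
pps_ceiling = clock_hz / cycles_per_packet
gbps_ceiling = pps_ceiling * frame_bits / 1e9

print(f"{ns_per_packet:.0f} ns/packet")          # 16 ns
print(f"{pps_ceiling / 1e6:.1f} Mpps ceiling")   # 62.5 Mpps
print(f"{gbps_ceiling:.0f} Gbps single-core")    # 750 Gbps
```

Real hardware would hit PCIe and memory-bandwidth limits long before 750 Gbps; the point of the ceiling is how far the per-packet CPU cost is from being the bottleneck.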
$ uname -a
Linux EFG-Home-SP 5.15.72-ui-cn9670 #5.15.72 SMP Wed Apr 15 23:39:47 CST 2026 aarch64
$ iptables -L FORWARD -n -v --line-numbers
Chain FORWARD (policy ACCEPT)
1 555K 775M ALIEN
2 2764K 4489M TOR
3 238M 354G IPS
4 874M 1342G UBIOS_FORWARD_JUMP
$ nft list flowtables
[empty]
$ lsmod | grep nf_flow_table
[empty]
$ ps -eo pid,pcpu,comm --sort=-pcpu | head -8
4098469 39.6 dpi-flow-stats
3139 12.5 ubios-udapi-ser
66687 7.8 java
4891 7.0 conntrackd
2491041 6.9 Suricata-Main
5505 6.2 mcad
8596 3.9 unifi-core
$ sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 10485760
$ lsmod | grep nf_conntrack | grep -v '^nf_conntrack '
nf_conntrack_tftp 262144 1 nf_nat_tftp
nf_conntrack_pptp 327680 1 nf_nat_pptp
nf_conntrack_h323 327680 1 nf_nat_h323
nf_conntrack_ftp 327680 1 nf_nat_ftp
Cross-compilation environment:
- Host: Threadripper Pro 7995WX, Ubuntu 24.04 LTS VM, 16 vCPU, 32 GB RAM
- Toolchain: `gcc-10-aarch64-linux-gnu` 10.5.0 from the Ubuntu universe repo
- Kernel source: `linux-5.15.72.tar.xz` from kernel.org (verified SHA256)
- Build configuration: EFG's exposed `/proc/config.gz` plus three module enables for `NF_TABLES`, `NF_FLOW_TABLE`, `NF_FLOW_TABLE_INET`
- LOCALVERSION: `-ui-cn9670` (matching the EFG's published version string)
- Build time: 1 minute 52 seconds (16-thread parallel build)
Modules produced:
net/netfilter/nf_tables.ko (10.3 MB)
net/netfilter/nf_flow_table.ko (1.8 MB)
net/netfilter/nf_flow_table_inet.ko (495 KB)
Vermagic verification (build host):
$ for ko in nf_tables.ko nf_flow_table.ko nf_flow_table_inet.ko; do
strings $ko | grep ^vermagic
done
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
Vermagic verification (EFG, in-tree module):
$ modinfo nf_conntrack_ftp | grep vermagic
vermagic: 5.15.72-ui-cn9670 SMP mod_unload aarch64
Match: exact, character-for-character.
Kernel panic on load attempt (insmod ./nf_tables.ko):
Unable to handle kernel NULL pointer dereference at virtual address 0x0000000000000120
ESR = 0x96000005, EC = 0x25: DABT (current EL), IL = 32 bits
FSC = 0x05: level 1 translation fault
[0000000000000120] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] SMP
Code: 910003fd b9432021 f9000bf3 f9455400 (f8615813)
Kernel panic — not syncing: Oops: Fatal exception
Recovery: watchdog hard-reboot, ~2 minute downtime, no permanent damage. Failover to secondary gateway functioned correctly throughout.
Root cause: CONFIG_MODVERSIONS is disabled in the EFG's kernel config, so symbol-CRC verification did not catch the binary ABI mismatch between vanilla 5.15.72 and Ubiquiti's patched 5.15.72-ui-cn9670 build at module load time. The module linked successfully against the running kernel but encountered mismatched struct layouts during init, dereferencing a NULL pointer in the netfilter subsystem.
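A toy model of what CONFIG_MODVERSIONS would have done here: each exported symbol carries a CRC of its interface signature, recorded both in the kernel and in every module at build time, and the loader refuses any module whose recorded CRCs disagree with the running kernel's. The real kernel derives these CRCs with genksyms from each symbol's full expanded type definition; `zlib.crc32` over a hand-written signature string is a stand-in for illustration:

```python
# Toy model of CONFIG_MODVERSIONS symbol-CRC checking. The signature strings
# are illustrative stand-ins for genksyms' expanded type definitions.
import zlib

def crc(signature: str) -> int:
    return zlib.crc32(signature.encode())

# Kernel side: CRC of the interface as actually compiled into the running
# kernel (here, a hypothetical vendor-patched struct layout).
kernel_crcs = {"register_pernet_subsys": crc("int(struct pernet_operations*) patched-layout")}

# Module side: CRC recorded at module build time against vanilla headers.
module_crcs = {"register_pernet_subsys": crc("int(struct pernet_operations*) vanilla-layout")}

def check_load(module: dict, kernel: dict) -> bool:
    """Refuse the load on any CRC mismatch, as modversions would."""
    return all(kernel.get(sym) == c for sym, c in module.items())

# With modversions, the ABI mismatch is caught at load time instead of
# manifesting as a NULL-pointer panic inside nf_tables_init_net():
print(check_load(module_crcs, kernel_crcs))  # False
```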
GPL source request status: filed with [email protected] requesting the complete corresponding source code for kernel 5.15.72-ui-cn9670, including all Ubiquiti and Marvell patches, build configuration, toolchain version, and packaging scripts. Outcome will determine whether the experiment can be re-attempted with a kernel tree that produces ABI-compatible modules.
All measurements were taken on a single physical machine over a continuous test session. Configuration files, scripts, and raw iperf3 outputs are available on request.
Build environment (same VM as A.7):
- Ubuntu 24.04 LTS, 16 vCPU, 32 GB RAM
- gcc-10-aarch64-linux-gnu 10.5.0
- linux-yocto repository, branch `v5.15/standard/cn-sdkv5.15/octeon`
- Repository URL: `https://git.yoctoproject.org/linux-yocto.git`
Tree state:
$ git branch --show-current
v5.15/standard/cn-sdkv5.15/octeon
$ git log --oneline -3
7f33f19a49e6 (HEAD) Merge branch 'v5.15/standard/base' into v5.15/standard/cn-sdkv5.15/octeon
65333c3a0bcd Merge tag 'v5.15.203' into v5.15/standard/base
b9d57c40a767 Linux 5.15.203
Modifications to make HEAD identify as 5.15.72:
$ sed -i 's/^SUBLEVEL = .*/SUBLEVEL = 72/' Makefile
$ touch .scmversion # suppress dirty marker
$ make kernelrelease
5.15.72-ui-cn9670
Configuration (using EFG's /proc/config.gz as base):
CONFIG_LOCALVERSION="-ui-cn9670"
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NF_TABLES_IPV6=y
CONFIG_NF_FLOW_TABLE=m
CONFIG_NF_FLOW_TABLE_INET=m
CONFIG_NF_FLOW_TABLE_IPV4=m
CONFIG_NF_FLOW_TABLE_IPV6=m
CONFIG_NF_FLOW_TABLE_PROCFS=y
# CONFIG_DEBUG_INFO_BTF is not set
# CONFIG_MODULE_SIG is not set
Build output:
$ time make -j16
real 1m59s
user 23m50s
sys 4m33s
$ for ko in $(find . -name 'nf_tables.ko' -o -name 'nf_flow_table*.ko' | sort); do
echo "=== $(basename $ko) ==="
strings $ko | grep -E '^(vermagic|name|depends|description)='
done
=== nf_flow_table_ipv4.ko ===
description=Netfilter flow table support
depends=nf_flow_table,nf_tables
name=nf_flow_table_ipv4
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table_ipv6.ko ===
description=Netfilter flow table IPv6 module
depends=nf_flow_table,nf_tables
name=nf_flow_table_ipv6
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table.ko ===
description=Netfilter flow table module
depends=
name=nf_flow_table
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table_inet.ko ===
description=Netfilter flow table mixed IPv4/IPv6 module
depends=nf_flow_table,nf_tables
name=nf_flow_table_inet
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_tables.ko ===
depends=
name=nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
Crash trace from EFG load attempt:
[ 3368.013405] Unable to handle kernel NULL pointer dereference at virtual address 0
[ 3368.022216] Mem abort info:
[ 3368.025005] ESR = 0x96000005
[ 3368.028072] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3368.033402] FSC = 0x05: level 1 translation fault
[ 3368.074382] Modules linked in: nf_tables(+) wireguard libchacha20poly1305 ...
xt_geoip(O) nf_app(PO) t_miner(PO) tdts(PO) tm_crypto(O)
xt_dyn_random ip6table_nat xt_conntrack xt_connmark xt_TCPMSS pppoe
pppox bonding xt_dpi(O) ip6table_mangle iptable_mangle ip6table_filter
ip6_tables uio_pdrv_genirq ui_lcm(O) ifb ppp_generic slhc
ubnthal(PO) ubnt_common(PO) drm drm_panel_orientation_quirks
[ 3368.121977] CPU: 3 PID: 211748 Comm: insmod Tainted: P W O 5.15.72-ui-cn9670 #5.15.72
[ 3368.130936] Hardware name: Marvell OcteonTX CN96XX board (DT)
[ 3368.143638] pc : nf_tables_init_net+0x18/0x94 [nf_tables]
[ 3368.149059] lr : ops_init+0x3c/0x120
[ 3368.227314] x2 : ffff00019027b300 x1 : 0000000000000000 x0 : 0000000000000000
[ 3368.234825] nf_tables_init_net+0x18/0x94 [nf_tables]
[ 3368.238053] ops_init+0x3c/0x120
[ 3368.242840] register_pernet_operations+0xec/0x240
[ 3368.247195] register_pernet_subsys+0x2c/0x50
[ 3368.252609] nf_tables_module_init+0x24/0x100 [nf_tables]
[ 3368.297899] ---[ end trace d3e1e407900e8e95 ]---
[ 3368.316500] Kernel panic - not syncing: Oops: Fatal exception
The HA failover handled the brief outage; service downtime was approximately 8 seconds.
# Step 1: Extract EFG kernel image (already gzip-compressed PE/COFF aarch64 image)
# from EFG: /boot/vmlinuz-5.15.72-ui-cn9670 (12 MB)
$ gunzip -c /boot/vmlinuz-5.15.72-ui-cn9670 > efg-vmlinuz
$ binwalk efg-vmlinuz | head -3
0 0x0 Linux kernel ARM64 image, image size: 29818880 bytes
# Step 2: Capture running symbol table (kallsyms is unrestricted on EFG)
# from EFG:
$ cat /proc/kallsyms > /tmp/efg-kallsyms.txt
$ wc -l /tmp/efg-kallsyms.txt
130789
# Step 3: Build vanilla 5.15.72 vmlinux (full build, not just modules)
$ cd ~/efg-build/vanilla-5.15.72/linux-5.15.72
$ make -j16 vmlinux
# Step 4: BSP vmlinux (already built for module experiment in 11.5)
# Step 5: Three-way symbol comparison
$ awk '{print $3}' /tmp/efg-kallsyms.txt | sort -u > /tmp/efg-syms.txt
$ nm ~/efg-build/marvell-bsp/linux-yocto-cnxk-5.15/vmlinux 2>/dev/null \
| awk '{print $3}' | sort -u > /tmp/bsp-syms.txt
$ nm ~/efg-build/vanilla-5.15.72/linux-5.15.72/vmlinux 2>/dev/null \
| awk '{print $3}' | sort -u > /tmp/vanilla-syms.txt
$ wc -l /tmp/*-syms.txt
115998 /tmp/bsp-syms.txt
120399 /tmp/efg-syms.txt
112581 /tmp/vanilla-syms.txt
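Step 6 below leans on comm -23, which prints only lines unique to its first input and requires both inputs to be pre-sorted (hence the sort -u everywhere above). A toy run with synthetic symbol names showing the semantics:

```shell
# comm needs sorted inputs; -23 suppresses lines unique to the second file
# (column 2) and lines common to both (column 3), leaving only lines unique
# to the first. Symbol names here are synthetic stand-ins.
a=$(mktemp); b=$(mktemp)
printf '%s\n' nf_tables_api_init tdts_core_init vectors | sort > "$a"
printf '%s\n' nf_tables_api_init vectors | sort > "$b"
comm -23 "$a" "$b"    # prints only the name missing from the second list
rm -f "$a" "$b"
```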
# Step 6: Find symbols in EFG kernel but not in either public source
$ comm -23 /tmp/efg-syms.txt \
<(sort -u /tmp/vanilla-syms.txt /tmp/bsp-syms.txt) \
| grep -vE "^(\.L[0-9]+|\.LC[0-9]+|\.LBE|\.LFE|\.LFB|\.Letext|\.Ldebug|\.Lframe|__compound_literal\.|__func__\.|__warned\.|CSWTCH\.)" \
> /tmp/efg-unique-real-syms.txt
$ wc -l /tmp/efg-unique-real-syms.txt
6357

Filter rationale: The grep -vE pattern excludes compiler-generated local labels (.L<N>, .LC<N>, .LBE, etc.), which differ across every build of every kernel and carry no information about kernel structure. The remaining 6,357 symbols are real exported names, function names, and global variable names.
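The filter itself can be sanity-checked on a synthetic mix of compiler labels and real-looking names (the names below are made up for the demonstration):

```shell
# Feed a mix of compiler-generated local labels and real symbol names
# through the exact filter used above; only the real names should survive.
printf '%s\n' '.L123' '.LC45' '.LFB7' '__func__.0' 'CSWTCH.12' \
              'nf_tables_api_init' 'tdts_core_init' \
  | grep -vE "^(\.L[0-9]+|\.LC[0-9]+|\.LBE|\.LFE|\.LFB|\.Letext|\.Ldebug|\.Lframe|__compound_literal\.|__func__\.|__warned\.|CSWTCH\.)"
```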
Top-level breakdown by name prefix:
$ awk -F'_' '{print $1}' /tmp/efg-unique-real-syms.txt | grep -v "^\." \
| sort | uniq -c | sort -rn | head -20
2646 (no prefix or various)
799 drm
195 bond
116 tdts
113 wg
104 my
66 fsv
59 ppp
51 mlxsw
46 shell
45 ubnthal
44 proc
44 get
42 dev
42 bonding
33 tm
32 nf
30 tcp
29 pppoe
27 ppu
Note: the drm count includes graphics-driver code that may have come from a source other than vanilla or the BSP (Ubiquiti uses a MediaTek display panel for the EFG's front-panel LCD). The wg (WireGuard) count likely reflects an upstream backport. The tdts, tm, ubnthal, and nf/dpi-related counts are the diagnostic ones.
The following text was sent to [email protected]:
Subject: GPL Source Request — Enterprise Fortress Gateway (EFG) Kernel Source
I am the owner of a Ubiquiti Enterprise Fortress Gateway (EFG) running firmware version [version], with kernel version 5.15.72-ui-cn9670. Per the terms of GPL-2.0, I am formally requesting the complete corresponding source code for this firmware's GPL-licensed components, including but not limited to:
- The complete Linux kernel source tree corresponding to 5.15.72-ui-cn9670, including:
- The base kernel source
- All patches applied by Ubiquiti and any third parties (Marvell, Trend Micro, etc.)
- The kernel build configuration (.config)
- The Marvell OCTEON CN9670 BSP drivers (octeontx2_pf, octeontx2_vf, octeontx2_af, rvu_*, NIX, CPT, SSO, NPA)
- Source code for any GPL-licensed kernel modules, including those carrying GPL or GPL-compatible MODULE_LICENSE() declarations
- The device tree files (.dts, .dtsi) used by the firmware
- The build system, packaging recipes, and toolchain specification (compiler version, flags) sufficient to reproduce the binary
- Any other GPL components in the firmware (busybox, systemd, etc.)
Per GPL-2.0 §3, this source must be made available under the same license, in a form accessible to me. Acceptable delivery: a downloadable archive, a public git repository link, or physical media at cost.
[contact details]
The escalation path documented in Section 13.3 applies if no response is received.

