
@galvesribeiro
Last active May 3, 2026 23:32
Why Your Ubiquiti EFG Can't Push 25 Gbps Inter-VLAN — and What's Actually Going On

Or: How I Reproduced the Problem on x86, Tried to Load the Missing Modules on the Real Device, and What That Tells Us About Ubiquiti's Kernel


TL;DR

Ubiquiti markets the Enterprise Fortress Gateway (EFG) as a 25-gigabit-class router. The product page lists two 25 GbE SFP28 ports for WAN/LAN, and Ubiquiti positions the device as a flagship for medium and large enterprise deployments. Its silicon — a Marvell Octeon CN9670 — supports hardware-accelerated forwarding through purpose-built network engines (NIX) that should sustain tens of millions of packets per second. The Cloud Gateway Max ("UDM Beast") pairs a Marvell Octeon CN10K SoC with a dedicated Marvell switch ASIC, and on paper should comfortably exceed 100 Gbps aggregate.

In practice, real-world enterprise deployments report:

  • Inter-VLAN routing: ~1–1.5 Gbps single-stream, regardless of how fast the upstream link is
  • PPPoE WAN throughput: ~2–3 Gbps single-stream on 10 Gbps fiber connections, where the ISP requires PPPoE authentication
  • Total aggregate throughput: well below the marketed 25 Gbps WAN/LAN figures

This document analyzes both bottlenecks. It reproduces both problems in a controlled lab environment on x86 hardware, identifies the specific software architectural choices that cause them, demonstrates fixes whose effects can be measured to a precision of a few hundred Mbps, and documents in detail what happened when we attempted to apply the most surgical of those fixes — adding the missing nftables flowtable module — to a real production EFG.

We will show that the EFG's stock configuration delivers between 5% and 15% of the throughput its silicon is capable of. We will show three independent fixes that together can push it from ~1 Gbps single-stream to over 25 Gbps single-stream — without adding hardware. Two of those fixes are pure software configuration changes; the third is a kernel module that exists in mainline Linux and is shipped by Marvell themselves, but is not present in Ubiquiti's kernel build.

We then attempt to install the missing module on a real EFG. Building it against vanilla Linux 5.15.72 produces a kernel module with byte-perfect vermagic — and crashes the device on load. Building it against Marvell's complete published OCTEON BSP source from the Yocto Project produces another byte-perfect module that crashes at the identical function offset. Symbol-level analysis of the running EFG kernel reveals 6,357 unique symbols that exist in neither vanilla Linux nor Marvell's complete public BSP. These include conntrack extensions for proprietary DPI integration (nf_ct_ext_dpi_destroy, nf_conntrack_dpi_init), a 116-symbol tdts namespace exposing kernel internals to a closed-source Trend Micro DPI engine, and significant hardware abstraction additions.

The conclusion: Ubiquiti has built a substantially modified kernel that they have not released sources for, and Ubiquiti's open-source download page no longer exists. Their GitHub organization contains no firmware or kernel sources. Closed-source tdts and t_miner modules link directly against kernel symbols and operate as derived works of the kernel. This appears to violate GPL-2.0, and continues a pattern: Ubiquiti was publicly accused of GPL violations in 2015 (resolved after sustained pressure) and again in 2019.

The performance issues in this document have been reported to Ubiquiti through their support channel for approximately one year, including specific implementation guidance pointing to Marvell's published DPDK reference architecture; no substantive engineering response has been received. Separately, security findings about the EFG's deliberate absence of secure boot, module signing, and integrity protection were submitted through Ubiquiti's HackerOne bug bounty program and rejected on the grounds that the attacker would require network access — a rationale that does not survive scrutiny when applied to a network gateway.

This is therefore both a technical analysis and a software-license compliance analysis, and it is published only after the channels designed for vendor engagement have failed to produce a response.


Table of Contents

  1. The Problem
  2. Test Environment
  3. Methodology
  4. The Reference Run: Real EFG Diagnostics
  5. Reproducing the Bottleneck — virtio-net Test Matrix
  6. Closing the Loop — Real Silicon Test Matrix
  7. Userspace Dataplane — VPP/DPDK Comparison
  8. The PPPoE Bottleneck — A Related but Distinct Problem
  9. Findings: The Architectural Failures
  10. Recommended Fixes
  11. Direct Experimental Verification — Building the Missing Modules
  12. Symbol-Level Forensics on the Running EFG Kernel
  13. The GPL Compliance Question
  14. Direct Vendor Engagement: What Ubiquiti Has Already Been Told
  15. Conclusion
  16. Appendix: Full Data Sets

1. The Problem

Ubiquiti markets the Enterprise Fortress Gateway (EFG) as a 25-gigabit-class router. The product page lists two 25 GbE SFP28 ports for WAN/LAN, and Ubiquiti positions the device as a flagship for medium and large enterprise deployments. Its silicon — a Marvell Octeon CN9670 — supports hardware-accelerated forwarding through purpose-built network engines (NIX) that should sustain tens of millions of packets per second. The Cloud Gateway Max ("UDM Beast") pairs a Marvell Octeon CN10K SoC with a dedicated Marvell switch ASIC, and on paper should comfortably exceed 100 Gbps aggregate.

In practice, real-world enterprise deployments report:

  • Inter-VLAN routing: ~1–1.5 Gbps single-stream, regardless of how fast the upstream link is
  • PPPoE WAN throughput: ~2–3 Gbps single-stream on 10 Gbps fiber connections, where the ISP requires PPPoE authentication
  • NAT throughput: similar single-flow ceilings whenever IPS, deep-packet-inspection, or threat management features are enabled

Customers complain, post mpstat screenshots showing one CPU core saturated while the other 17 sit idle, and get told it is a hardware limitation.

It is not. The CPUs are not the bottleneck. The silicon is not the bottleneck. The bottleneck is the configuration of the Linux kernel network stack that ships on the device, including:

  • Hardware offload features that are explicitly disabled
  • A modern kernel fast-path feature (nf_flow_table) that is not loaded
  • A user-space inspection engine running on the same CPU core that is forwarding packets
  • A 5-deep iptables FORWARD chain that every new connection must traverse
  • Conntrack protocol helpers for legacy protocols (PPTP, H.323) loaded by default (a controller toggle exists, but it only removes the new-flow lookup cost — see Section 4.4)
  • Per-VLAN bridges instead of a vlan-aware single bridge
  • No DPDK fast-path despite Marvell shipping first-class DPDK PMDs (cnxk) for these exact SoCs

Each one of these contributes measurable overhead. Combined, they drop forwarding throughput by an order of magnitude. The point of this article is to measure each contribution independently and show what a properly-configured Linux router looks like on the same workload.


2. Test Environment

Host Machine ("skywalker")

  • CPU: AMD Ryzen Threadripper Pro 7995WX, 96 cores / 192 threads, base 2.5 GHz, boost 5.1 GHz, Zen 4 microarchitecture
  • RAM: 754 GB DDR5 ECC
  • Hypervisor: Proxmox VE 9.0.11
  • Kernel: Linux 6.14.11-4-pve
  • Storage: NVMe ZFS root pool (rpool)
  • Networking: Mellanox ConnectX-6 Dx dual-port 100 Gbps NIC (MT2892), bonded LACP 802.3ad
  • IOMMU: AMD-Vi enabled in passthrough mode
  • Hugepages: 64 × 1 GB = 64 GB reserved at boot

Reference Device

  • EFG (Enterprise Fortress Gateway), Ubiquiti Networks
    • Marvell Octeon CN9670 SoC, 18 ARM v8.2 cores @ 2.0 GHz
    • 64 GB RAM
    • Linux 5.15.72-ui-cn9670 (vendor build)
    • Live production firewall, 8 days uptime at capture, 7 active VLANs in an enterprise office network

Lab VM Topology on skywalker

Three Ubuntu 24.04 LTS VMs were cloned from a common template, each pinned via a Proxmox hookscript to a dedicated CCD on the host (8 vCPUs each):

   192.168.6.0/24  (mgmt — for SSH, never used for test traffic)
        |          |          |
        +----------+----------+
        |          |          |
   gw-router    client1    client2
   (VM 200)    (VM 201)    (VM 202)
   8 cores     8 cores     8 cores
   16 GB RAM   8 GB RAM    8 GB RAM
   Cores 8-15  Cores 16-23 Cores 24-31

For test traffic (multiple network paths used in different tests):

   client1 ────[VLAN 10]──── gw-router ────[VLAN 20]──── client2
              10.10.10.10                                10.10.20.10
                   ↕                                          ↕
              gw-router                                  gw-router
              10.10.10.1                                10.10.20.1

The VMs received traffic through one of three I/O paths during testing:

  1. virtio-net through Linux bridges with VLAN tagging (vmbr1 on the host)
  2. ConnectX-6 Dx VFs via SR-IOV passthrough (4 VFs total, 2 to gw-router, 1 to each client)
  3. VPP/DPDK with the same VFs polled directly by VPP's worker threads in userspace

Single TCP stream iperf3 at MTU 1500 was used as the primary measurement. Multi-stream tests with -P 8 were used in select cases to demonstrate scaling behavior. Each measurement ran for 30 seconds with per-second reporting; the values reported are the iperf3 sender/receiver final summary, which agree to within 0.1 Gbps in all cases.


3. Methodology

To prove the architectural argument we needed to isolate independent variables:

Variable                          | Settings tested
----------------------------------|------------------
I/O fabric                        | virtio-net (vhost-net backend), ConnectX VF (SR-IOV passthrough)
MTU                               | 1500, 9000
Hardware offloads (GRO/TSO/LRO)   | on, off
Forwarding rules                  | none, EFG-replica 5-chain ruleset
Forwarder                         | kernel ip_forward, kernel ip_forward + nftables flowtable, VPP/DPDK userspace

For each combination, single-stream iperf3 between client1 and client2 (i.e. across the gw-router VM, between two distinct IPv4 subnets) was measured. Because the host CPU does not vary across tests and because vCPU pinning is fixed via a Proxmox hookscript that calls taskset after the VM starts, every test runs on the same physical cores in the same NUMA configuration.

The "EFG-replica 5-chain ruleset" was constructed from observation of the live EFG. It mirrors the EFG's iptables FORWARD structure of ALIEN → TOR → IPS → UBIOS_FORWARD_JUMP → user → default chains, with conntrack lookups, protocol/port matchers, and per-chain counters that force per-packet evaluation in the slow path. The exact ruleset is in the appendix.
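Offload state was toggled per test with ethtool on each interface in the forwarding path. A sketch of the toggle used between runs — interface names are the lab VMs' and will differ elsewhere:

```shell
# Toggle NIC offloads for a test variant (lab interface names; adjust).
# "off" approximates the EFG's shipping state; the commented line is the
# tuned "on" state used in the offloads-on runs.
for dev in enp6s19 enp6s20; do
    ethtool -K "$dev" gro off gso off tso off lro off
    # ethtool -K "$dev" gro on gso on tso on
done
```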


4. The Reference Run: Real EFG Diagnostics

Before running anything in the lab, we captured the configuration of a production EFG to know what we needed to reproduce. Every command below was executed on a customer-deployed EFG running stock Ubiquiti firmware. None of these settings are directly user-configurable: they are the result of how the UniFi Web UI provisions the underlying Linux subsystems.

4.1 — Hardware and Kernel

$ uname -a
Linux EFG-Home-SP 5.15.72-ui-cn9670 #5.15.72 SMP Wed Apr 15 23:39:47 CST 2026 aarch64

$ nproc
18

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           63Gi        11Gi        46Gi       106Mi       5.3Gi         44Gi

$ uptime
02:09:29 up 8 days, 5:17, 1 user, load average: 2.52, 1.84, 1.86

Confirmed: Octeon CN9670 (per the kernel build identifier), 18 cores, 64 GB RAM. Kernel 5.15 dates from late 2021 — it predates several material networking improvements in 5.19+ (better flowtable hardware offload, improved nft, better mptcp, PPPoE flowtable acceleration in 6.2+).

4.2 — The 5-Deep FORWARD Chain (Smoking Gun #1)

$ iptables -L FORWARD -n -v --line-numbers
Chain FORWARD (policy ACCEPT 1033 packets, 157K bytes)
num   pkts bytes target                source       destination
1     555K  775M  ALIEN                 0.0.0.0/0    0.0.0.0/0
2     2764K 4489M TOR                   0.0.0.0/0    0.0.0.0/0
3     238M  354G  IPS                   0.0.0.0/0    0.0.0.0/0
4     874M 1342G  UBIOS_FORWARD_JUMP    0.0.0.0/0    0.0.0.0/0

In 8 days of uptime, this device has pushed:

  • 874 million packets through UBIOS_FORWARD_JUMP
  • 238 million through the IPS chain
  • 2.76 million through TOR
  • 555 thousand through ALIEN

Every packet that this gateway routes traverses at least 4 jump targets in sequence, plus whatever rules live inside each. Total rule count across filter, mangle, and nat tables:

$ iptables -t filter -L -n | wc -l
572
$ iptables -t mangle -L -n | wc -l
187
$ iptables -t nat -L -n | wc -l
80

839 rules total. And it's all running on the legacy iptables (xt_*) backend. The modern nft API is not in use:

$ nft list ruleset | wc -l
0
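Which backend an iptables binary drives is visible from its version string — a quick diagnostic (output shapes shown as comments):

```shell
# The build variant is printed in the version string:
iptables -V
#   iptables v1.8.x (legacy)     -> classic xt_* backend, as on the EFG
#   iptables v1.8.x (nf_tables)  -> nft-backed compatibility shim
```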

4.3 — No Flowtable. None. (Smoking Gun #2)

$ nft list flowtables
[empty output]

$ lsmod | grep -iE "flow_table|flowtable"
[empty output]

$ for iface in eth0 eth1 eth2 eth3; do
    ethtool -k $iface | grep hw-tc-offload
  done
[no output - module not loaded, feature not available]

The nf_flow_table kernel module is not loaded. There is no nft flowtable. There is no hardware tc-flower offload. The kernel's modern fast-path infrastructure — which can bypass conntrack and rule evaluation for established flows — is not even installed on this device.

This single missing piece is, as the lab measurements will show, worth a 3× to 7× single-stream throughput improvement on its own.

4.4 — Conntrack Sized for 10 Million, Currently 846 Used

$ sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 10485760

$ sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 846

$ lsmod | grep nf_conntrack
nf_conntrack_tftp     262144  1 nf_nat_tftp
nf_conntrack_pptp     327680  1 nf_nat_pptp
nf_conntrack_h323     327680  1 nf_nat_h323
nf_conntrack_ftp      327680  1 nf_nat_ftp

Four conntrack protocol helpers loaded: FTP, PPTP, H.323, TFTP. PPTP is a deprecated VPN protocol from the late 1990s. H.323 is a videoconferencing protocol from 1996, mostly displaced by SIP. TFTP and FTP are increasingly rare in modern enterprise environments.

The actual per-packet cost of having helpers loaded is more nuanced than "every packet is inspected" — see Section 9 Finding 5 for the precise breakdown. The short version: established non-helper flows pay essentially nothing per packet (a pointer check), but every new connection pays a hash lookup against the helper registry, and any flow on a helper-recognized port (FTP/21, etc.) pays the full inspection cost.

A "Firewall Connection Tracking" toggle does exist in the UniFi controller's Gateway settings, allowing administrators to disable individual helpers (FTP, H.323, SIP, GRE, PPTP, TFTP). Disabling them all unloads the helper modules from memory entirely. This addresses the lookup cost on new flows but does not affect already-established TCP throughput (the iperf3 inter-VLAN measurement is unchanged), and does not address the bigger architectural bottlenecks documented in Sections 5-9. Section 9 Finding 5 expands on what helpers actually cost and what would be required to keep helper functionality without the cost.
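On a stock Linux box the same helper situation can be inspected and unwound by hand — a sketch, using the module names captured above; the auto-assignment sysctl exists on 5.15-era kernels but has since been removed from mainline:

```shell
# List loaded conntrack helper modules
lsmod | grep '^nf_conntrack_'

# Unload a helper (its nf_nat_* counterpart must go first)
modprobe -r nf_nat_pptp nf_conntrack_pptp
modprobe -r nf_nat_h323 nf_conntrack_h323

# Disable automatic helper assignment for new flows (5.15-era kernels)
sysctl -w net.netfilter.nf_conntrack_helper=0
```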

4.5 — The Inspection Tax (Smoking Gun #3)

$ ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -10
    PID %CPU %MEM COMMAND
4098469 39.6  0.0 dpi-flow-stats
   3139 12.5  0.1 ubios-udapi-ser
  66687  7.8  3.1 java
   4891  7.0  0.0 conntrackd
2491041  6.9  1.6 Suricata-Main
   5505  6.2  0.0 mcad
   8596  3.9  0.9 unifi-core
   4482  3.8  0.0 ulogd

dpi-flow-stats consuming 39.6% of one CPU core continuously. Add Suricata IPS (6.9%) and conntrackd (7.0%) and you have ~54% of one core permanently consumed by per-packet inspection processes that don't forward anything — they just observe.

Crucially, these userspace processes typically run on the same core that is doing kernel forwarding. We measured this exact pattern in the lab: a competing userspace consumer on the forwarding core directly reduces forwarding throughput.

4.6 — 18 Cores Sitting Idle

$ mpstat -P ALL 1 3 | grep Average
Average:     all   4.07    0.00    3.67   0.07   0.17   0.24   0.00   0.00   0.00   91.78
Average:       0  18.40    0.00    1.39   0.00   0.35   0.00   0.00   0.00   0.00   79.86
Average:       1  13.20    0.00    1.32   0.00   0.33   0.33   0.00   0.00   0.00   84.82
Average:       2   2.68    0.00    2.01   0.00   0.34   0.34   0.00   0.00   0.00   94.63
Average:       3   6.38    0.00    1.68   0.00   0.00   0.00   0.00   0.00   0.00   91.95
[... 14 more cores all near 95–100% idle ...]

91.78% average idle across 18 cores during light load. Under a single-flow stress test the picture is sharper: one core at near-100% softirq (the kernel's softirq context where __netif_receive_skb_core and ip_forward run), seventeen sitting at 0%. Single-flow forwarding is fundamentally a single-thread workload in the Linux kernel network stack: a TCP flow's packets all hash to the same RX queue, the queue is bound to one core, and that core does all the work.

Adding cores does not help. Faster cores help linearly. Removing per-packet kernel-stack work helps dramatically. A userspace dataplane that polls the NIC across multiple worker cores can fix this entirely — see Section 7.
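Spreading softirq work across cores with RSS/RPS is the standard partial mitigation — it helps multi-flow aggregate throughput, but not a single flow, whose packets still hash to one queue. A sketch; queue counts and CPU masks are NIC-dependent:

```shell
# Ask the NIC for more hardware RX queues (if the driver supports it)
ethtool -L eth0 combined 8

# Let software RPS steer rx-0's packets across CPUs 0-7 (hex mask 0xff)
echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus
```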

4.7 — Per-VLAN Bridges Instead of VLAN-Aware Bridge

$ ip -br link | grep -E "^br[0-9]"
br0     UP    192.168.196.1/24
br1111  UP    [no address shown]
br254   UP    192.168.254.1/24
br3     UP    192.168.3.1/24
br5     UP    192.168.5.1/24
br6     UP    192.168.6.1/24
br7     UP    192.168.7.1/24

Each VLAN gets its own bridge (br3 for VLAN 3, br5 for 5, br6 for 6, etc.) hanging off switch0 subinterfaces (switch0.3, switch0.5, etc.). Inter-VLAN traffic must traverse:

client (VLAN 3) → br3 → switch0.3 → switch0 → kernel L3 lookup
                                         ↓
                                   ip_forward
                                         ↓
                              switch0.5 → br5 → client (VLAN 5)

Every L3 hop is a kernel ip_forward operation. A modern vlan-aware single bridge with bridge vlan filtering enabled and nf_flow_table could short-circuit established flows in a software fast-path. This setup cannot.
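For contrast, a VLAN-aware single-bridge layout looks like the sketch below — hypothetical interface names, not the EFG's actual configuration:

```shell
# One bridge with VLAN filtering done in the bridge itself
ip link add br0 type bridge vlan_filtering 1
ip link set switch-port0 master br0
bridge vlan add dev switch-port0 vid 3
bridge vlan add dev switch-port0 vid 5

# One SVI per routed VLAN, hanging off the bridge
ip link add link br0 name br0.3 type vlan id 3
ip addr add 192.168.3.1/24 dev br0.3
ip link add link br0 name br0.5 type vlan id 5
ip addr add 192.168.5.1/24 dev br0.5
```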

4.8 — Summary of EFG Diagnostic Findings

Finding                              | Evidence                                                     | Impact
--------------------------------------|---------------------------------------------------------------|--------
5-chain iptables FORWARD              | 874 M packets through UBIOS_FORWARD_JUMP in 8 days            | Lab: 4.95 → 2.29 Gbps when applied (~54% drop)
No flowtable, no module               | nft list flowtables empty, lsmod shows no flow_table          | Lab: virtio kernel 2.29 → 7.05 → 17.4 Gbps when added with offloads
Userspace inspection on data path     | dpi-flow-stats 39.6% CPU, Suricata 6.9%                       | Permanent CPU pressure on forwarding core
Hardware offloads disabled            | hw-tc-offload off [fixed], GRO off                            | Lab: 17 Gbps (on) → 5 Gbps (off) at MTU 1500
Per-VLAN bridges, no offload          | 7 separate br* devices                                        | Forces every inter-VLAN packet through kernel L3
Legacy iptables, not nftables         | nft list ruleset empty, 839 iptables rules                    | Slower per-rule, locked out of fast-path features
Conntrack helpers loaded by default   | nf_conntrack_{ftp,pptp,h323,tftp} all loaded                  | Helper-registry lookup on every new connection
18 cores, 1 used at a time            | mpstat 91.78% idle average; single-flow saturates one core    | Single-flow workloads cannot scale across cores in the kernel
Old kernel (5.15)                     | Predates several networking improvements including PPPoE flowtable | Locks out post-5.19 nftables, flowtable, and PPPoE acceleration
No DPDK                               | No cnxk PMD active despite full vendor support                | Forfeits 5-15× throughput available from the same silicon

5. Reproducing the Bottleneck — virtio-net Test Matrix

The first round of tests used standard virtio-net VMs on Linux bridges — the closest analogue to "hypervisor in front of network silicon" without involving the ConnectX hardware directly. The bridge vmbr1 was configured as VLAN-aware with VIDs 10 and 20.

Test 1 — MTU 9000, offloads on, no rules (best case baseline)

$ iperf3 -c 10.10.20.10 -t 30
[ ID] Interval         Transfer    Bitrate
[  5] 0.00-30.00 sec   59.2 GBytes 16.9 Gbits/sec       sender
[  5] 0.00-30.00 sec   59.2 GBytes 16.9 Gbits/sec       receiver

16.9 Gbps. mpstat showed CPU 3 at ~12% softirq during the test. This is what jumbo MTU + GRO/TSO buys you: each "packet" through the forward path is a ~64 KB super-segment that the kernel processes once. Approximately 30,000 forward operations per second, each on one core.

Test 2 — MTU 9000, offloads off

[  5] 0.00-30.00 sec   60.1 GBytes 17.2 Gbits/sec

17.2 Gbps. Surprisingly similar. With MTU 9000, even without GRO, each packet carries 8960 bytes of payload — only about 7× as many forwarding events as with ~64 KB TSO super-segments. The per-packet kernel cost doesn't dominate yet.

Test 3 — MTU 1500, offloads off (the EFG-realistic baseline)

This is the configuration that matches what real Ubiquiti customers experience. Standard internet MTU, no jumbo frames, no offloads.

[  5] 0.00-30.00 sec   17.3 GBytes  4.95 Gbits/sec

4.95 Gbps. mpstat showed CPU 6 at 100% softirq, all other cores idle. This is the same shape as the EFG diagnostic — one core saturated, the others doing nothing. The Zen 4 core at 5+ GHz, doing nothing but softirq packet forwarding, ceilings at this number.

If we naively scale this for an Octeon ARM core at 2.0 GHz (about 3–5× slower per cycle for this workload), we'd predict ~1.0–1.6 Gbps. Real EFG measurements are in this range. We are reproducing the right physics.

Test 4 — Adding nf_conntrack module (no rules)

$ sudo modprobe nf_conntrack
$ sudo sysctl -w net.netfilter.nf_conntrack_max=10485760
[  5] 0.00-30.00 sec   16.9 GBytes  4.84 Gbits/sec

4.84 Gbps. Almost no impact. Module load alone is cheap; conntrack's cost shows up when rules invoke it.

Test 5 — Simple ct rule

table inet filter {
    chain forward {
        type filter hook forward priority 0; policy accept;
        ct state established,related accept
        ct state new accept
    }
}
[  5] 0.00-30.00 sec   16.2 GBytes  4.64 Gbits/sec

4.64 Gbps. A 4% drop from a single conntrack rule. After the first packet of a single-flow iperf3 stream, the conntrack entry exists; lookup is O(1). The cost is real but small for a single long-lived flow.

Test 6 — EFG-replica 5-chain ruleset (the headline bad number)

The full ruleset emulating what we observed on the EFG: 5 jump chains, conntrack per chain, per-rule counters, multiple matchers per rule:

table inet filter {
    chain alien_chain  { counter; ip protocol tcp counter; ip saddr 10.0.0.0/8 counter }
    chain tor_chain    { counter; ip protocol tcp counter; tcp flags & (syn|ack) == ack counter }
    chain ips_chain    { counter; ip protocol tcp counter; meta l4proto tcp counter; tcp dport { 1-65535 } counter }
    chain ubios_chain  { counter; ip protocol tcp counter; ct state established counter }
    chain user_chain   { counter; ct state established,related counter; ip saddr 10.10.10.0/24 ip daddr 10.10.20.0/24 counter }

    chain forward {
        type filter hook forward priority 0; policy accept;
        jump alien_chain
        jump tor_chain
        jump ips_chain
        jump ubios_chain
        jump user_chain
    }
}
[  5] 0.00-30.00 sec   7.99 GBytes  2.29 Gbits/sec

2.29 Gbps. The smoking gun. A 54% drop from the no-rule baseline of 4.95 Gbps. CPU 5 was pegged at 100% softirq during the entire run.

This is the EFG's per-packet cost on a fast x86 core. Scaling for Octeon ARM at 2.0 GHz: ~500–800 Mbps. Matches user reports of EFG inter-VLAN performance in the wild.

Test 7 — 8 parallel streams with EFG ruleset

$ iperf3 -c 10.10.20.10 -t 30 -P 8
[SUM] 0.00-30.00 sec  39.7 GBytes  11.4 Gbits/sec

11.4 Gbps aggregate across 8 streams. mpstat showed 2–3 cores busy: different flows hashed to different RX queues, different queues bound to different cores. Multi-flow forwarding scales (somewhat), but single-flow performance does not — each stream caps near the per-core ceiling.

This is why a single backup transfer or large Veeam replication will saturate at 1 Gbps even though the WAN can do 25: the flow is one TCP connection.

Test A — Adding nftables flowtable (the magic config change)

We replace the 5-chain ruleset with a flowtable directive:

table inet filter {
    flowtable f {
        hook ingress priority 0
        devices = { enp6s19, enp6s20 }
    }

    chain forward {
        type filter hook forward priority 0; policy accept;
        ip protocol { tcp, udp } flow add @f
        ct state established,related accept
    }
}
[  5] 0.00-30.00 sec   24.6 GBytes  7.05 Gbits/sec

7.05 Gbps. A 3.1× jump from 2.29 Gbps. flowtable installs an ingress fast-path that, after the first few packets of a flow are tracked, bypasses conntrack lookup and FORWARD chain evaluation entirely. The packet still goes through the netfilter ingress hook; the slow path is just skipped.

Test B — Flowtable + offloads on, MTU 1500

[  5] 0.00-30.00 sec   60.9 GBytes  17.4 Gbits/sec

17.4 Gbps. A 7.6× improvement over the EFG-style ruleset baseline (2.29 Gbps). Same hardware. Same kernel. Same single TCP stream. The only changes: flowtable directive added, offloads enabled.
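On hardware whose driver implements flowtable offload (mlx5 among them; mainline also carries offload hooks for Marvell's drivers), the same flowtable can additionally be pushed into the NIC with a one-line flags change. This is a sketch we did not benchmark here, since virtio-net lacks the offload hooks:

```nft
table inet filter {
    flowtable f {
        hook ingress priority 0;
        devices = { enp6s19, enp6s20 };
        flags offload;    # hand established flows to the NIC, if supported
    }

    chain forward {
        type filter hook forward priority 0; policy accept;
        ip protocol { tcp, udp } flow add @f
        ct state established,related accept
    }
}
```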

virtio-net Test Summary

#            | MTU  | Offloads | Rules                | Single-stream
--------------|------|----------|----------------------|---------------
1             | 9000 | on       | none                 | 16.9 Gbps
2             | 9000 | off      | none                 | 17.2 Gbps
3             | 1500 | off      | none                 | 4.95 Gbps
4             | 1500 | off      | + ct module          | 4.84 Gbps
5             | 1500 | off      | + simple ct rule     | 4.64 Gbps
6             | 1500 | off      | EFG 5-chain replica  | 2.29 Gbps
7 (8-stream)  | 1500 | off      | EFG 5-chain replica  | 11.4 Gbps agg
A             | 1500 | off      | flowtable            | 7.05 Gbps
B             | 1500 | on       | flowtable            | 17.4 Gbps

6. Closing the Loop — Real Silicon Test Matrix

The virtio tests share a known limitation: virtio-net packets traverse the host's vhost-net kernel thread, which adds its own per-packet cost beyond what's in the guest. To prove that the kernel-stack overheads are independent of virtio's I/O fabric, we ran the same tests with SR-IOV pass-through of ConnectX-6 Dx Virtual Functions.

6.1 — SR-IOV Setup

The ConnectX-6 Dx supports up to 8 SR-IOV Virtual Functions per port. Without disturbing the existing LACP bond:

$ echo 4 > /sys/class/net/enp5s0f0np0/device/sriov_numvfs
$ cat /sys/class/net/enp5s0f0np0/device/sriov_numvfs
4

Four VFs were created (VF0-VF3), assigned dedicated MACs and isolated VLANs (110/120) at the eSwitch level, and passed through to the lab VMs:

  • VF0 (0000:05:00.2) → gw-router as VLAN 10 lab NIC
  • VF1 (0000:05:00.3) → gw-router as VLAN 20 lab NIC
  • VF2 (0000:05:00.4) → client1 (VLAN 10)
  • VF3 (0000:05:00.5) → client2 (VLAN 20)

The ConnectX-6 Dx eSwitch handled L2 between VFs in silicon — no traffic exited the physical port for the VLAN 10/20 lab traffic. The bond and the upstream network were unaffected.

Inside each VM, the VFs appeared as native ConnectX hardware via the mlx5_core driver. The VMs ran kernel ip_forward exactly as before; only the I/O fabric changed.
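The eSwitch-level MAC/VLAN isolation mentioned above was applied on the physical function before passthrough — a sketch, with illustrative MAC addresses:

```shell
# Pin each VF to a MAC and an eSwitch VLAN on the physical function
pf=enp5s0f0np0
ip link set "$pf" vf 0 mac 02:ef:10:00:00:01 vlan 110   # gw-router, VLAN 10 side
ip link set "$pf" vf 2 mac 02:ef:10:00:00:02 vlan 110   # client1
ip link set "$pf" vf 1 mac 02:ef:20:00:00:01 vlan 120   # gw-router, VLAN 20 side
ip link set "$pf" vf 3 mac 02:ef:20:00:00:02 vlan 120   # client2
```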

Test K1 — ConnectX VF, kernel forwarding, MTU 1500, offloads on, no rules

[  5] 0.00-30.00 sec   88.3 GBytes 25.3 Gbits/sec

25.3 Gbps single-stream. A 5.1× improvement over the equivalent virtio test (4.95 Gbps with offloads off). With offloads on, ConnectX hardware GRO is more efficient than virtio's, so the per-superpacket cost is even lower.

Test K2 — Same as K1, with EFG-style 5-chain ruleset

[  5] 0.00-30.00 sec   73.9 GBytes  21.1 Gbits/sec

21.1 Gbps. Only a 17% drop. With GRO collapsing wire packets into super-segments, the rule evaluation cost is amortized across ~40× fewer events. The EFG ruleset is still expensive per-event, but per-packet on the wire it's hidden by GRO.

Test K3 — ConnectX VF, offloads off, no rules

[  5] 0.00-30.00 sec   16.6 GBytes  4.74 Gbits/sec

4.74 Gbps. Within about 4% of the equivalent virtio-net test (4.95 Gbps). With offloads off, every wire packet hits ip_forward once. The per-packet ceiling on a Zen 4 core is the same regardless of NIC quality. The kernel stack itself is the bottleneck, not the I/O fabric, when offloads are off.

Test K4 — ConnectX VF, offloads off, EFG-style rules

[  5] 0.00-30.00 sec   16.4 GBytes  4.70 Gbits/sec

4.70 Gbps. Same as K3 within noise. The mlx5 kernel I/O path is heavier per-packet than virtio's vhost-net path — heavy enough that the EFG ruleset cost is hidden inside the I/O cost. Both paths still cap at the single-core software ceiling.

Real Silicon Test Summary

#  | NIC         | Offloads | Rules        | Single-stream
----|-------------|----------|--------------|---------------
K1 | ConnectX VF | on       | none         | 25.3 Gbps
K2 | ConnectX VF | on       | EFG 5-chain  | 21.1 Gbps
K3 | ConnectX VF | off      | none         | 4.74 Gbps
K4 | ConnectX VF | off      | EFG 5-chain  | 4.70 Gbps

The pattern is clear: with offloads off, the I/O fabric does not matter. With offloads on, it does. Hardware offloads collapse the per-packet processing cost in the kernel's hot path. Without them, even the world's fastest networking silicon ceilings around 5 Gbps single-stream because the kernel itself is the limit.

The EFG configuration disables hardware offloads. By doing so, it makes its own silicon irrelevant.


7. Userspace Dataplane — VPP/DPDK Comparison

VPP (Vector Packet Processor) is a userspace network dataplane built on DPDK that bypasses the kernel network stack entirely. It is what production-grade open-source routers (TNSR, DANOS) use, and it is what most enterprise-grade NFV appliances build on. We tested it both over virtio-net and over the ConnectX VFs.

A note on relevance to the EFG: Marvell ships a fully-supported DPDK Poll Mode Driver for the OCTEON family — the cnxk PMD, which covers CN9670 (in the EFG) and CN10K (in the UDM Beast). Marvell publishes reference architectures that combine OCTEON SoCs with VPP and DPDK-accelerated Suricata. Suricata itself has had native DPDK input mode since version 7.0 (released 2023). The components Ubiquiti would need to ship a userspace dataplane on the EFG are not research projects — they are vendor-blessed, production-deployed infrastructure that has been available for years.
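For reference, the shape of a minimal VPP deployment of the kind tested below — a sketch with PCI addresses from our lab; core numbers and interface names are illustrative:

```conf
# /etc/vpp/startup.conf (sketch)
cpu {
    main-core 0
    corelist-workers 1-2          # dedicated polling cores
}
dpdk {
    dev 0000:05:00.2 { name lab-vlan10 }
    dev 0000:05:00.3 { name lab-vlan20 }
}

# then, via vppctl:
#   set interface ip address lab-vlan10 10.10.10.1/24
#   set interface ip address lab-vlan20 10.10.20.1/24
#   set interface state lab-vlan10 up
#   set interface state lab-vlan20 up
```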

Test V0 — VPP with virtio-net

[  5] 0.00-30.00 sec   23.7 GBytes  6.78 Gbits/sec

6.78 Gbps. Roughly equal to ip_forward + flowtable in the equivalent kernel test. VPP's show runtime revealed the cause:

dpdk-input    Vectors/Call: 0.05    Clocks/Packet: 1810
ip4-rewrite   Vectors/Call: 15.24   Clocks/Packet: 24.2

0.05 vectors per call on the input side. DPDK's whole performance story is amortizing per-syscall and per-context-switch overhead across batches of ~32–256 packets. Virtio-net feeds packets to DPDK one at a time. The polling loop is essentially empty. Userspace dataplane only delivers its promised speedup when paired with a userspace-friendly I/O backend (vhost-user) or real hardware.

Test V1 — VPP with ConnectX VF, offloads off on clients

[  5] 0.00-30.00 sec   54.9 GBytes 15.7 Gbits/sec

15.7 Gbps. A 2.3× improvement over virtio-VPP, but actually worse than kernel-on-ConnectX with offloads on (25.3 Gbps). Why? VPP doesn't do GRO. It processes wire packets individually. With offloads off on the clients, every packet on the wire is 1500 bytes, and VPP processes ~1.3 million of them per second on one worker core.

The per-packet path through VPP is impressively cheap (ip4-input + lookup + rewrite + tx ≈ 78 cycles end-to-end on Zen 4) but it's still doing 40× more "work events" than the kernel + GRO setup, which only sees super-segments.

Test V2 — VPP with ConnectX VF, offloads on the clients (the headline)

[  5] 0.00-30.00 sec    124 GBytes  35.6 Gbits/sec

35.6 Gbps single-stream. Now the clients send fewer, larger TCP segments via TSO. ConnectX hardware can transmit each segment as a single frame on the wire (GSO/TSO offload at the NIC). VPP receives the resulting larger frames and forwards them with its low per-packet cost.

This is the headline number. 35.6 Gbps single-stream userspace dataplane forwarding on real silicon. Compared against the EFG's actual production performance on the same workload (~1 Gbps), this is the 15-35× ceiling that's possible with available open-source software on the same class of hardware.

VPP with show runtime during this test:

ip4-rewrite   Vectors/Call: 7.54    Clocks/Packet: 38.6
lab-vlan20-tx Vectors/Call: 8.65    Clocks/Packet: 37.9

VPP itself is doing 75–80 cycles of work per packet. On a 5 GHz core that's ~16 ns per packet. The theoretical ceiling for VPP on this hardware is hundreds of Gbps. The measured 35.6 Gbps is bottlenecked on the clients (their ability to generate packets), not on VPP.
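The cycle math behind that claim, as a first-order estimate (it ignores memory stalls, PCIe limits, and NIC line rate, so it is a ceiling, not a forecast):

```python
clk_hz = 5.0e9           # Zen 4 boost clock
cycles_per_pkt = 80      # measured VPP pipeline cost (show runtime)
frame_bits = 1500 * 8

ns_per_pkt = cycles_per_pkt / clk_hz * 1e9      # ~16 ns per packet
pps_ceiling = clk_hz / cycles_per_pkt           # 62.5 Mpps per core
gbps_ceiling = pps_ceiling * frame_bits / 1e9   # 750 Gbps at 1500 B

print(f"{ns_per_pkt:.0f} ns/pkt, {pps_ceiling/1e6:.1f} Mpps, {gbps_ceiling:.0f} Gbps")
```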

7.1 — Estimating VPP/DPDK Throughput on the Octeon Silicon

The lab numbers are on Zen 4 at 5+ GHz. To estimate what VPP+DPDK would achieve on the EFG's ARM Cortex-A72-class cores at 2.0 GHz, we lean on published Marvell numbers and the cycle-counting visible in show runtime:

  • VPP per-packet cost in the lab: ~80 cycles on Zen 4 for full IP forwarding pipeline
  • ARM Cortex-A72 vs Zen 4 IPC for this workload: ~3-4× lower
  • Estimated cycles per packet on Octeon CN9670: 240-320 cycles
  • At 2.0 GHz: 6.25-8.3 million packets per second per core
  • At 1500-byte MTU (12,000 bits per packet), even 6.25 Mpps is ~75 Gbps of per-core forwarding budget — well above the ~2.1 Mpps a 25 Gbps port needs at that frame size
  • The Octeon CN9670 has dedicated NIX hardware engines that can offload portions of this further

Marvell's own published cnxk PMD benchmarks show single-core forwarding rates of 15-30 Mpps (millions of packets per second) for simple L3 forwarding — figures typically quoted at small packet sizes, and far beyond the ~2.1 Mpps a 25 Gbps port requires at 1500-byte MTU. Across 4-6 worker cores (leaving control-plane and inspection cores untouched), aggregate forwarding capacity easily reaches the 50 Gbps combined line rate of the EFG's two 25G ports, and single-stream throughput in the 15-25 Gbps range is realistic.

This means: on the same EFG silicon, with no hardware changes, a properly-architected DPDK dataplane should deliver 10-25× the inter-VLAN throughput the device achieves today, and eliminate the inspection-vs-forwarding CPU contention by giving each worker its own dedicated core with a vendor-supported PMD.
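The scaling estimate above can be reproduced in a few lines (the 3-4× IPC penalty is this section's assumption, not a measurement of the Octeon core):

```python
# Scale the lab per-packet cost to the EFG's silicon (first-order
# estimate; the 3-4x IPC penalty is the assumption from Section 7.1).
zen4_cycles = 80          # measured VPP pipeline cost on Zen 4
octeon_hz = 2.0e9         # EFG core clock

for penalty in (3, 4):
    cycles = zen4_cycles * penalty        # 240 / 320 cycles per packet
    mpps = octeon_hz / cycles / 1e6       # 8.33 / 6.25 Mpps per core
    print(f"{penalty}x penalty: {cycles} cycles -> {mpps:.2f} Mpps/core")

# A 25 Gbps port at 1500-byte frames needs only ~2.1 Mpps:
line_rate_mpps = 25e9 / (1500 * 8) / 1e6
print(f"25G line rate needs {line_rate_mpps:.2f} Mpps")
```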


8. The PPPoE Bottleneck — A Related but Distinct Problem

Many enterprise customers (especially in countries where fiber-to-the-business is delivered via GPON or XGS-PON with PPPoE authentication) report that even on a 10 Gbps fiber link, single-stream throughput across the EFG WAN tops out around 2–3 Gbps. This is a separate bottleneck from inter-VLAN routing, but it has the same architectural root cause — and an arguably worse manifestation, because the PPPoE path forces the kernel through multiple softirq passes per packet.

8.1 — Why PPPoE Is Slow on Stock Linux Routers

PPPoE encapsulates IP traffic in PPP frames inside Ethernet (ether_type 0x8864/0x8863). Every WAN packet must:

  1. Be encapsulated/decapsulated by the pppoe.ko kernel module on every transit
  2. Have its effective MTU reduced to 1492 bytes (a 6-byte PPPoE header plus a 2-byte PPP protocol field), increasing per-packet overhead and forcing Path MTU Discovery
  3. Be processed by pppd in userspace for LCP/IPCP control plane and link state — packet flow events get notified to userspace
  4. Pass through additional packet copy for encapsulation/decapsulation in software
  5. Bypass the kernel's flowtable fast-path — until kernel 6.2, nf_flow_table had no PPPoE support at all; flows traversing PPPoE could not be offloaded
  6. Make multiple distinct kernel-stack passes: ingress on the underlying VLAN (eth2.11) → softirq 1 → pppoe_rcv → ip_input → ip_forward → ip_output → softirq 2 → pppoe_xmit → egress on the same or different VLAN

Combined with the per-packet kernel forward cost we measured (4.74 Gbps ceiling on a single Zen 4 core with offloads off), the additional encap/decap work, and the multi-pass softirq pattern, PPPoE single-stream throughput is fundamentally bound by:

  • Single-core ip_forward + pppoe.ko packet handling, which on a 2 GHz Octeon core lands in the 1-3 Gbps range — exactly what users report
  • No flowtable PPPoE acceleration (kernel 5.15 doesn't have it; the EFG runs 5.15)
  • Multiple softirq cores chained together, each handling part of the encap/decap/forward chain — this spreads CPU load across cores but adds latency and inter-core cache misses without actually speeding anything up
  • No DPDK PPPoE termination (would require accel-ppp or VPP's native PPPoE plugin in userspace)

8.2 — Direct Evidence on a Production EFG

The following data was captured during a single Netflix Fast.com speed test from a client device on the LAN, using the EFG's PPPoE WAN connection (Vivo XGS-PON, a Brazilian ISP requiring PPPoE auth; the link is rated 1 Gbps, but the same software path would be used on a 10 Gbps link).

8.2.1 — Multiple ksoftirqd threads pegged simultaneously

$ top -bn1 -d 1 | head -15
top - 03:43:15 up 8 days, 6:51, load average: 3.81, 2.62, 2.33
%Cpu(s):  5.5 us,  5.0 sy,  0.0 ni, 52.5 id,  0.0 wa,  1.3 hi, 35.6 si,  0.0 st

    PID USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     23 root  20   0       0      0      0 R 100.0   0.0  17:11.14 ksoftirqd/2
     48 root  20   0       0      0      0 R 100.0   0.0   4:13.86 ksoftirqd/7
     63 root  20   0       0      0      0 R 100.0   0.0  21:58.12 ksoftirqd/10
     83 root  20   0       0      0      0 R  72.2   0.0  10:05.83 ksoftirqd/14
     73 root  20   0       0      0      0 R  66.7   0.0  16:39.62 ksoftirqd/12
     12 root  20   0       0      0      0 R  55.6   0.0  16:02.71 ksoftirqd/0
2491041 root   5 -15 1768064   1.1g  19584 S  44.4   1.8   6:18.31 Suricata-Main
   3139 root   5 -15  383232  68736  28416 S  22.2   0.1 1495:37  ubios-udapi-ser
   8596 root  20   0   20.1g 647232  85440 S  16.7   1.0  474:44  unifi-core

This is the smoking gun for PPPoE. Six different ksoftirqd threads are running at 55-100% simultaneously — cores 0, 2, 7, 10, 12, and 14 — all chewing through softirq work for what is fundamentally a single-flow workload (one TCP stream from Fast.com's backend server, through the PPPoE WAN, to the LAN client).

The reason this is even worse than the inter-VLAN smoking gun: inter-VLAN forwarding has one core saturated. PPPoE has multiple cores in continuous softirq because the path itself does multiple distinct kernel-stack passes per packet (eth2.11 ingress → pppoe_rcv → ip_input → ip_forward → ip_output → pppoe_xmit → eth2.11 egress). Each pass can land on a different core via softirq scheduling. The kernel is doing more total work per packet and spreading it across cores in a way that creates cache-coherence overhead between cores. It's the worst of both worlds — single-flow throughput limited by per-core ceiling, but multi-core CPU consumption.

The corresponding mpstat -P ALL output confirms the picture:

03:43:24    CPU    %usr    %sys    %irq    %soft    %idle
03:43:24    all    5.65    2.74    1.01    32.49    58.00
03:43:24      0   50.55    0.00    0.00    49.45     0.00
03:43:24      2    0.00    0.00    1.00    81.00    18.00
03:43:24      6    0.00    0.00    0.00    61.62    38.38
03:43:24     10    1.01    1.01    2.02    66.67    29.29
03:43:24     14    0.00    0.00    0.00    85.29    14.71
03:43:24     17    0.00    0.00    0.00   100.00     0.00

Six cores at 50-100% softirq during a single Fast.com speed test. The aggregate %soft of 32.49% across 18 cores corresponds to ~5.85 cores fully consumed by softirq work — for one flow.
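The core-equivalent figure falls straight out of the mpstat aggregate:

```python
# Back out the core-equivalents spent in softirq from the mpstat line.
soft_pct = 32.49      # aggregate %soft across all CPUs (mpstat "all" row)
n_cores = 18

core_equiv = soft_pct / 100 * n_cores
print(f"{core_equiv:.2f} cores of pure softirq work")   # ~5.85 cores
```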

8.2.2 — Concurrent userspace load on the same cores

While ksoftirqd is burning multiple cores, the inspection processes are also running:

Suricata-Main      44.4% CPU
ubios-udapi-ser    22.2% CPU
unifi-core         16.7% CPU
ulogd               5.6% CPU

That's ~89% of one core equivalent of additional userspace work, often landing on the same cores doing softirq. The result: the cores doing softirq are being preempted by userspace, and the userspace processes are being preempted by softirq, in a continuous round-robin that prevents either from getting clean cycles.

8.2.3 — Modules confirm pure software PPPoE path

$ lsmod | grep -i ppp
pppoe         327680  2
pppox         262144  1 pppoe
ppp_generic   327680  6 pppox,pppoe
slhc          262144  1 ppp_generic

$ ps -eo pid,pcpu,comm,args | grep pppd
2878806  0.0  pppd  /usr/sbin/pppd call ppp1 nodetach

The full software PPPoE stack is loaded: pppoe.ko for PPPoE-specific encap, pppox.ko for PPP-over-X dispatch, ppp_generic.ko for the PPP framing engine, slhc.ko for VJ header compression, and pppd in userspace for control plane (LCP, IPCP, keepalives). Every WAN packet traverses all of these in sequence.

8.2.4 — All hardware offloads disabled on ppp1, with [fixed] flags

$ ethtool -k ppp1 | grep -E "tcp-segmentation|generic-(receive|segmentation)|large-receive|hw-tc-offload"
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
hw-tc-offload: off [fixed]

The [fixed] flag means the feature cannot be toggled at all — it is hardcoded off in the ppp_generic driver, and ethtool requests to change it are refused. Even when generic-segmentation-offload was requested on (probably by some default network script), the kernel refused. Pseudo-interfaces like ppp1 inherently can't do hardware TSO/LRO because there's no hardware behind them — it's a software encap layer. That's normal Linux behavior, but it means every PPPoE WAN packet gets TX-fragmented and RX-aggregated in software before being handed to or received from the underlying VLAN.

Note that generic-receive-offload: on does work for the receive path — but TSO does not exist on the egress side, so every outbound packet traverses the kernel stack individually.

8.2.5 — Confirmed: no flowtable modules in this kernel build

$ modinfo nf_flow_table_pppoe 2>&1
modinfo: ERROR: Module nf_flow_table_pppoe not found.

$ modinfo nf_flow_table 2>&1
modinfo: ERROR: Module nf_flow_table not found.

Not just unloaded — the kernel modules don't exist on the system. They aren't compiled into Ubiquiti's 5.15.72-ui-cn9670 kernel build. Even with root access, a customer cannot load the modules to enable flowtable acceleration. The fast-path infrastructure isn't shipped at all.

8.2.6 — MTU discrepancy confirmed

$ ip link show ppp1
ppp1: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492

$ ip link show eth2.11
eth2.11@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500

ppp1 MTU 1492 (1500 - 8 byte PPPoE header), eth2.11 MTU 1500. Every payload is 8 bytes smaller than it could be on raw Ethernet, increasing packet count for the same throughput. Small effect compared to the per-packet kernel cost, but it adds up at line rate.
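The 8-byte tax is easy to quantify:

```python
# How many extra packets does the 8-byte PPPoE/PPP overhead cost?
mtu_eth, mtu_ppp = 1500, 1492

extra_pkts = mtu_eth / mtu_ppp - 1     # fraction of additional packets
print(f"{extra_pkts * 100:.2f}% more packets for the same payload")
```

About half a percent — real at line rate, but tiny next to the per-packet softirq cost.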

8.3 — How PPPoE Integrates with DPDK Dataplanes

A reasonable question: PPPoE looks complicated, with control plane (PADI/PADO/PADR/PADS handshake, LCP/IPCP negotiation, keepalives, RADIUS) and dataplane (packet encap/decap) entangled. Can DPDK actually handle this, or is it fundamentally a kernel concept?

DPDK handles it well, but with a different architecture than the kernel uses.

The kernel's approach: pppoe.ko is a single module that does both control plane (handshake, LCP/IPCP, keepalives) and dataplane (encap/decap of every packet). Both run in softirq context, on whatever cores the kernel scheduler picks. The result is what we just measured: control plane and dataplane fighting for the same cores, with userspace processes (pppd) added on top.

DPDK splits this in two:

  1. Control plane stays in userspace as a regular process. Tools like accel-ppp (the most common open-source PPPoE BNG implementation, deployed by ISPs to terminate hundreds of thousands of sessions per box) handle PADI/PADO/PADR/PADS, LCP/IPCP, keepalives, session lifecycle, RADIUS authentication — everything that happens at session establishment or once per second per session. This doesn't need to be fast; it needs to be correct. accel-ppp added DPDK support around 2020 and is what ISP-grade BNGs use today.

  2. Dataplane runs as a fixed-cost pipeline stage. Once the session is up, every packet just needs an 8-byte header push (egress) or pop (ingress). In VPP (which has had a native PPPoE plugin since 2018), it's literally a node in the packet processing graph:

dpdk-input → ethernet-input → pppoe-input → ip4-input → ip4-lookup
           → ip4-rewrite → pppoe-encap → interface-output

The pppoe-input and pppoe-encap nodes are tiny — they push or pop 8 bytes, update some counters, and pass the packet to the next node in the same vector batch. Per-packet overhead for adding PPPoE to a VPP pipeline is roughly 30-50% above plain L3 forwarding, not 5-10× like the kernel softirq path imposes.

The critical difference: the kernel does control plane + dataplane on the same softirq path, blocking everything. DPDK does control plane in a slow, one-time-per-session userspace daemon, and dataplane as a small fixed-cost pipeline stage running on dedicated worker cores at line rate.

On Marvell silicon specifically: the Octeon CN9670 (the EFG SoC) is explicitly marketed by Marvell as a "Smart NIC and BNG" SoC. Their reference architectures combine:

  • The cnxk DPDK PMD handling raw Ethernet frames at line rate from the NIX hardware engines
  • accel-ppp running in userspace on dedicated control-plane cores, handling PPPoE control plane
  • Dataplane integrated into VPP's PPPoE plugin or a custom DPDK pipeline
  • Suricata in DPDK mode tapping the dataplane for inspection on dedicated worker cores

ISPs deploying this stack on Octeon hardware regularly hit 40+ Gbps PPPoE termination per box with 100K+ concurrent sessions. Companies like Calix, Adtran, and a handful of NFV vendors ship enterprise BNGs based on exactly this silicon, doing exactly this PPPoE workload, at 25+ Gbps per port. This isn't research — it's commodity, vendor-blessed, production-deployed infrastructure that has existed for years.

8.4 — The Fix Is Already in Mainline Linux (and DPDK)

Two independent fix paths exist:

Kernel path: Linux 6.2 (released February 2023) added PPPoE support to nf_flow_table via the nf_flow_table_pppoe module. Established TCP/UDP flows over PPPoE WAN can now be offloaded to the same software fast-path as native L3 traffic, bypassing both pppoe.ko and the netfilter slow path for in-progress flows. Combined with hardware tc-flower offload on supported NICs, modern Linux distros (OpenWrt 23.05+, recent VyOS, Mikrotik RouterOS 7) achieve near-line-rate PPPoE throughput on 10 Gbps links through software fast-path acceleration.

The EFG ships kernel 5.15 — released in late 2021, predating PPPoE flowtable acceleration by over a year. A kernel rebase to 6.6 LTS or later, with nf_flow_table_pppoe loaded and a flowtable directive added to nftables, would dramatically improve PPPoE WAN throughput without any hardware changes and without changing the dataplane architecture. The fix is a kernel module load and one nftables stanza.

DPDK path: Migrate the PPPoE termination from pppoe.ko in the kernel to accel-ppp + VPP's PPPoE plugin in userspace, on dedicated worker cores. This is the same architectural change as Fix 3 in Section 10 (DPDK + VPP for the dataplane), with PPPoE just being one more pipeline stage. Since Marvell ships full DPDK support for the Octeon CN9670 and publishes reference architectures combining DPDK + accel-ppp + VPP, this is integration work, not invention.

8.5 — Estimated PPPoE Improvement

Using the same scaling from Section 7.1:

Configuration                                               | Single-stream PPPoE throughput       | Notes
Current EFG (kernel 5.15, no flowtable, software pppoe.ko)  | ~2-3 Gbps                            | per user reports; matches our multi-core ksoftirqd evidence
EFG + kernel 6.6 + nf_flow_table_pppoe enabled              | ~5-8 Gbps                            | flowtable bypasses pppoe.ko + netfilter for established flows
EFG + kernel 6.6 + flowtable + hw-tc-offload                | ~8-9.5 Gbps                          | near line rate on 10G PPPoE links
EFG + DPDK (accel-ppp + VPP PPPoE plugin)                   | line rate on 10 Gbps (25G aggregate) | what ISP-grade BNGs achieve on this exact silicon

The point: PPPoE performance is not a hardware problem either. It is the same architectural failure (single-core kernel forwarding without acceleration) compounded by an additional encapsulation layer that mainline Linux now supports accelerating, and that DPDK has handled at line rate for years. The same fixes apply, with PPPoE benefiting more than inter-VLAN does because the multi-pass softirq pattern is so much more expensive in the current implementation.


9. Findings: The Architectural Failures

Putting together the EFG diagnostics and the lab measurements, the findings are unambiguous.

Finding 1: The kernel network stack on a single core has a ceiling around 5 Gbps single-stream when offloads are off, regardless of NIC

Evidence: virtio-net (4.95 Gbps) and ConnectX VF (4.74 Gbps) measure within experimental error on the same kernel with offloads disabled. The Zen 4 core is identical in both tests. The difference between 4.95 and 4.74 is in the noise.

Implication for the EFG: their 2 GHz Octeon ARM core has its own per-cycle ceiling that's 3-5× slower than Zen 4 for this workload, putting the EFG kernel forwarding ceiling at ~1.0–1.5 Gbps. Reported user numbers match this range. The hardware silicon is not what's limiting them; the per-core kernel stack is.
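The ceiling estimate is a one-liner (the 3-5× per-cycle gap is this document's assumption from the lab comparison, not a benchmark of the Octeon core):

```python
# Scale the measured single-core kernel ceiling down to the EFG's core.
zen4_ceiling_gbps = 4.74      # ConnectX VF, offloads off (lab measurement)
slowdown_lo, slowdown_hi = 3, 5   # assumed Octeon-vs-Zen4 per-cycle gap

lo = zen4_ceiling_gbps / slowdown_hi
hi = zen4_ceiling_gbps / slowdown_lo
print(f"EFG kernel ceiling estimate: {lo:.2f}-{hi:.2f} Gbps")  # ~0.95-1.58
```

The result brackets the ~1-1.5 Gbps range users actually report.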

Finding 2: Hardware offloads (GRO/TSO/LRO) are the single highest-impact configuration variable

Evidence:

  • virtio kernel forwarding: 4.95 Gbps (off) → 17.4 Gbps with flowtable (on) — 3.5× swing
  • ConnectX VF kernel forwarding: 4.74 Gbps (off) → 25.3 Gbps (on) — 5.3× swing

EFG state: hw-tc-offload: off [fixed], generic-receive-offload: off. Hard-coded off in the firmware build.

Finding 3: The 5-chain iptables FORWARD pattern costs roughly half your throughput when offloads are also off

Evidence:

  • virtio-net + offloads off: 4.95 Gbps → 2.36 Gbps when EFG-style rules are applied (52% drop)
  • ConnectX VF + offloads on: 25.3 Gbps → 21.1 Gbps when applied (17% drop, hidden by GRO)

EFG state: identical rule structure (ALIEN → TOR → IPS → UBIOS_FORWARD_JUMP → user → default). Confirmed by direct iptables diagnostic showing 874 million packets having traversed UBIOS_FORWARD_JUMP in 8 days.

Finding 4: nftables flowtable is the missing 3-7× single-stream multiplier

Evidence:

  • virtio + EFG rules: 2.36 Gbps → 7.05 Gbps with flowtable added (3.0×)
  • virtio + flowtable + offloads on: 17.4 Gbps (7.4× over 2.36 baseline)

EFG state: nf_flow_table module not loaded. nft list flowtables is empty. The kernel module isn't even installed on the device. Shipping the module plus a one-line flowtable stanza in nftables would immediately triple single-stream inter-VLAN performance.

Finding 5: Conntrack helpers are loaded by default, and the per-packet cost is widely misunderstood

The popular description: "Every packet is inspected by every loaded helper." This is approximately wrong. The actual cost depends on which phase of a flow the packet belongs to.

Phase 1 — New connection (SYN packet, first packet of a flow):

When conntrack creates a new entry for a flow, it walks nf_ct_helper_hash — a hash table keyed by L4 protocol + port — to determine if any registered helper applies. For TCP/21 (FTP control), it finds the FTP helper and attaches it to the conntrack entry. For TCP/443 (HTTPS), it finds nothing and attaches no helper. The per-new-connection cost is one hash lookup against the helper registry. Small but real.

This phase also touches nf_ct_expect_hash — the expectations table — to check if this new flow matches a previously-expected data connection (e.g., the data port that an active FTP control session announced via PORT or PASV). Empty expectations table = essentially zero cost; an active expectations table = small additional lookup.

Phase 2 — Established flow (every subsequent packet):

Once a flow has a conntrack entry, the per-packet helper logic in nf_conntrack_in() reads:

help = nfct_help(ct);          // pointer load from conntrack entry
if (help && help->helper)       // both NULL for non-helper flows
    help->helper->help(skb, ct, ...);

For a flow with no helper attached — the vast majority of traffic, since helper-relevant ports are rare — this is two pointer loads and a branch. Modern CPUs predict the not-taken branch perfectly. The cost on non-helper flows is essentially zero.

For flows that DO have a helper attached (e.g., active FTP control connection, ongoing SIP call), the helper's ->help() callback runs on every packet to inspect for protocol events (PORT command, RTP setup, etc.). This is genuine per-packet cost, but it only applies to flows on helper-recognized ports.

Why iperf3 throughput doesn't change when helpers are disabled: An iperf3 inter-VLAN test uses a single TCP connection on iperf3's port (5201 by default). That port is not a helper-recognized port. The connection has no helper attached. Phase 2's two-pointer-load-and-branch is essentially free. Disabling helpers via the UI removes the modules from memory, eliminating Phase 1 lookup cost on new connections — but it does not change anything in Phase 2 for non-helper flows.

Why helpers nonetheless matter at scale: An enterprise router doing ~10,000 new connections per second — driven by lots of short HTTP requests, DNS resolutions, and other transient flows — pays the Phase 1 helper-hash-lookup tax 10,000 times per second. Removing helpers eliminates that. It's not a per-packet win on data flows, it's a per-new-connection win.
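The two-phase structure can be sketched as a toy model of the helper registry (the port-to-helper table here is illustrative, not the kernel's actual registration list):

```python
# Toy model of nf_ct_helper_hash: helpers are looked up by
# (L4 proto, port) once at connection setup, then cached (or not)
# on the conntrack entry.
HELPERS = {("tcp", 21): "ftp", ("udp", 69): "tftp", ("tcp", 5060): "sip"}

def new_connection(proto, dport):
    """Phase 1: one registry lookup per new flow."""
    return HELPERS.get((proto, dport))   # None for most traffic

def per_packet(helper):
    """Phase 2: a pointer check; helper callback only if attached."""
    if helper is not None:
        pass  # the helper's ->help() callback would run here
    # no helper: two loads and a predicted branch -- effectively free

print(new_connection("tcp", 5201))   # iperf3 port -> None, no helper
print(new_connection("tcp", 21))     # FTP control -> 'ftp' attached
```

The iperf3 flow pays the Phase 1 lookup once and then nothing; only the FTP control flow carries a per-packet callback.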

The proper fix is not removing helpers: a correctly-architected router uses the netfilter flowtable for the data path. With flowtable, established flows bypass the entire netfilter chain (helpers included) and go through the offloaded fast path. Helpers continue to run on connection setup and on the control connection of helper protocols (e.g., FTP control), but the data connection of those protocols can be offloaded. You get full helper functionality and zero per-packet cost on data flows, simultaneously. This is what mainstream Linux distributions ship in 2026.

The EFG's kernel does not have flowtable compiled in (Section 11).

Four implementation approaches that would do this correctly:

  1. nftables with explicit per-flow helper attachment (the modern, correct approach). Helpers attach only to flows matching explicit nftables rules — no global helper auto-attach, zero cost for any flow not matching the rule. Requires migrating from iptables to nftables.

  2. Userspace conntrack helpers via netlink (kernel 3.6+). The kernel forwards control packets to a userspace daemon, which parses protocols and inserts expectations back via netlink. Pros: kernel stays small, helper bugs don't crash the kernel, helpers can be updated independently of the kernel. Cons: control-plane latency increase.

  3. Don't NAT helper-protocol traffic at all. Modern protocols handle NAT traversal in the application layer (FTP passive mode, SIP+STUN/ICE, WebRTC). The kernel doesn't need to do ALG. Most enterprise gateways in 2026 have moved this direction; kernel helpers are legacy.

  4. Keep helpers, add flowtable (the practical fix for an existing iptables-based system). Helpers run on connection setup and helper-protocol control channels; flowtable handles the data path of every other flow. Best compatibility with existing rule sets.

EFG state: A "Firewall Connection Tracking" toggle in the UniFi controller's Gateway settings exposes individual checkboxes for FTP, H.323, SIP, GRE, PPTP, and TFTP helpers. Disabling them all unloads the helper modules entirely — which addresses Phase 1 lookup overhead on new connections but does nothing for the bigger architectural issues. The toggle's existence confirms that Ubiquiti's engineering team is aware that helpers cost something. They have implemented a partial fix (the toggle) instead of the proper fix (flowtable). The proper fix would require shipping nf_flow_table.ko, which they have chosen not to do (Section 11).

Finding 6: Multiple cores do not help single-flow forwarding in the kernel

Evidence: 8 parallel streams with the EFG ruleset reach 11.4 Gbps aggregate (~1.4 Gbps per stream). Single stream caps at 2.36 Gbps. The EFG's mpstat shows all 18 cores idle except the one with the active flow.

EFG state: 18 cores, but RSS hashes a single TCP 5-tuple to one queue, which binds to one core. Adding cores to a kernel-based router cannot fix single-flow performance. A faster core, fewer per-packet steps, hardware offload, or a userspace dataplane (which can batch-poll on dedicated worker cores) can.
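The RSS behavior is easy to illustrate (real NICs use a Toeplitz hash with a random key; Python's built-in hash stands in for it here):

```python
# RSS pins a flow to one queue: the 5-tuple never changes, so the
# hash always selects the same RX queue, hence the same core.
N_QUEUES = 18

def rss_queue(src, sport, dst, dport, proto="tcp"):
    return hash((src, sport, dst, dport, proto)) % N_QUEUES

flow = ("10.0.10.2", 51514, "10.0.20.2", 5201)
queues = {rss_queue(*flow) for _ in range(1000)}
print(len(queues))    # 1 -- every packet of the flow hits one queue
```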

Finding 7: DPDK on the same silicon delivers 10-25× the throughput, and the vendor ships full DPDK support

Evidence:

  • Lab VPP/DPDK on ConnectX with offloads: 35.6 Gbps single-stream (15× over the EFG-style baseline)
  • Marvell's published cnxk PMD benchmarks: 15-30 Mpps single-core L3 forwarding on CN9670-class silicon
  • Suricata 7.0+: native DPDK input mode shipped 2023
  • VPP: native cnxk plugin shipped 2020
  • The full reference architecture (DPDK + VPP + Suricata-on-DPDK) is published by Marvell and field-deployed by NFV vendors

EFG state: zero DPDK. The cnxk PMD is not loaded. Suricata runs in pcap mode (per-packet kernel→userspace copy) instead of DPDK mode. Ubiquiti would lose nothing by adopting DPDK — their primary inspection workload (Suricata) supports it, their silicon vendor supports it, and the resulting performance on the same hardware would be 10-25× higher.

Finding 8: Userspace inspection processes on the forwarding core directly reduce throughput

Evidence (EFG): dpi-flow-stats at 39.6% CPU + Suricata-Main at 6.9% + conntrackd at 7.0% = ~54% of one core continuously consumed by per-packet inspection that performs no forwarding.

Evidence (lab): a deliberate spinner pinned to a non-forwarding core had no effect (correctly isolated). When CPU contention is on the forwarding core, throughput drops proportionally.

Implication: even if Ubiquiti fixed every other issue, the inspection processes still pin a chunk of one core's cycles, leaving less for forwarding. The fix is to move them off the data-path core (kernel taskset/cgroup), or use kernel-side offloaded sampling (sFlow hardware counters), or — the best fix — use Suricata in DPDK mode on dedicated worker cores.

Finding 9: Per-VLAN bridges instead of vlan-aware single bridge prevent kernel fast-path optimization

Evidence (EFG): br0, br3, br5, br6, br7, br254, br1111 — one bridge per VLAN. Inter-VLAN traffic must traverse multiple bridge hops plus a kernel L3 lookup.

Lab equivalent: vmbr1 with VLAN-aware mode and bridge VID filtering allows a single bridge to handle all VLANs. With flowtable on top, established flows skip the bridge slow path entirely.

Implication: even without flowtable, switching to a vlan-aware bridge architecture would simplify the data path and enable bridge VID hardware offload paths that the current per-bridge structure cannot use.

Finding 10: PPPoE WAN performance is bottlenecked by the same kernel stack, with additional encapsulation cost — and worse multi-core spread

Evidence (deployment reports): enterprise customers on 10 Gbps PPPoE fiber consistently report 2-3 Gbps single-stream WAN throughput on the EFG.

Evidence (live capture during a Netflix Fast.com test on a production EFG): six different ksoftirqd kernel threads simultaneously consuming 55-100% CPU (cores 0, 2, 7, 10, 12, 14), with concurrent userspace inspection load (Suricata 44%, ubios-udapi-ser 22%, unifi-core 16%) competing for the same cores. The PPPoE encap/decap path forces multiple kernel-stack passes per packet, each potentially landing on a different core, multiplying total CPU consumption while not improving single-flow throughput.

Evidence (mainline Linux): kernel 6.2+ ships nf_flow_table_pppoe for PPPoE flowtable acceleration. The EFG runs kernel 5.15. The nf_flow_table and nf_flow_table_pppoe modules are not even compiled into Ubiquiti's kernel build — modinfo returns "Module not found" for both.

Implication: PPPoE WAN performance is not a hardware limitation. It is the same per-core kernel ceiling as inter-VLAN routing, with an additional encapsulation layer that mainline Linux now supports accelerating, and a multi-pass softirq pattern that is more expensive than plain inter-VLAN forwarding. The fix is a kernel rebase plus the same flowtable directive — or DPDK + accel-ppp + VPP, which Marvell publishes as a reference architecture for this exact silicon.

Finding 11: The EFG's kernel is binary-incompatible with vanilla 5.15.72 despite identifying as such, and the safety net that would catch this is disabled

Evidence: We cross-compiled nf_tables, nf_flow_table, and nf_flow_table_inet from vanilla linux-5.15.72.tar.xz (kernel.org), using the EFG's exposed /proc/config.gz as the build configuration. The resulting modules report a vermagic string identical character-for-character to the EFG's existing in-tree modules: 5.15.72-ui-cn9670 SMP mod_unload aarch64. Loading nf_tables.ko on the device caused an immediate kernel panic (NULL pointer dereference at virtual address 0x120 during module init), forcing a watchdog reboot.

Evidence (config audit):

$ zcat /proc/config.gz | grep -E "MODVERSIONS|TRIM_UNUSED_KSYMS|MODULE_SIG"
CONFIG_HAVE_ASM_MODVERSIONS=y
# CONFIG_MODVERSIONS is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set
[no CONFIG_MODULE_SIG entries]

CONFIG_MODVERSIONS would have caught the binary incompatibility at load time with a clean error message. It is disabled. CONFIG_MODULE_SIG (cryptographic module signing) is not even built into the kernel. lockdown is not enabled. The root filesystem is writable via overlay.

Implication: Two findings, both serious.

First, the EFG's kernel is not actually vanilla 5.15.72 even though it identifies as 5.15.72-ui-cn9670 and reports the upstream version. Ubiquiti has applied undisclosed patches that change netfilter's internal data structures or function signatures. Customers who attempt to enable missing kernel features by building from the announced upstream tag will produce modules that load (because vermagic matches) but crash (because the real ABI doesn't). This is exactly why the GPL exists — it requires vendors to publish the complete corresponding source so customers can rebuild against the actual kernel they received, not the vanilla one it claims to be.

Second, the security configuration is unusually permissive for an enterprise security product: no module signing, no kernel lockdown, no symbol-CRC verification, writable root via overlay. Any process that becomes root can load arbitrary unsigned, unverified kernel modules with no cryptographic check. Combined with the binary-incompatible-but-not-detected ABI, this is a pathway for both accidental crashes and deliberate exploitation.

A GPL source request was filed with [email protected] at the time of this writing. Until it is fulfilled, even a customer with full root access on hardware they own cannot enable the missing performance features safely. Section 11 documents this experiment in detail.


10. Recommended Fixes

The findings above translate directly to a list of prioritized configuration changes Ubiquiti could ship. None of these require new hardware. All are available in mainline Linux or as vendor-supported infrastructure from Marvell. Several are config changes that do not even require a kernel update.

Fix 1 (Highest Impact, Lowest Effort): Enable nftables flowtable

What: Load the nf_flow_table kernel module and add a flowtable directive to the active nftables ruleset. The hook is software-only (no hardware offload required) and works on any modern kernel (5.4+).

Configuration sketch:

table inet filter {
    flowtable f {
        hook ingress priority 0;
        devices = { eth_lan_vlan10, eth_lan_vlan20, ... };
    }

    chain forward {
        type filter hook forward priority 0; policy accept;
        ip protocol { tcp, udp } flow add @f
        ct state established,related accept
        ... existing security rules ...
    }
}

Measured improvement: 2.36 → 7.05 Gbps single-stream (3.0×) on virtio. Combined with offloads enabled: 17.4 Gbps (7.4×).

Trade-off: Flows in the fast-path bypass conntrack and rule evaluation. Security rules must be applied to the first few packets of a flow, before it's offloaded. Existing iptables/nftables rules continue to work; only established flows are accelerated. The IPS / DPI processes that need every packet would need to be moved to a different inspection point (e.g., promiscuous tap on the bridge, or sFlow sampling) — but most of them only need flow-level visibility, which conntrack already provides.
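The first-packets-then-bypass behavior can be modeled in a few lines (illustrative Python, not kernel code; the 5-tuple and the rule step are stand-ins): rule evaluation runs until a flow is added to the table, after which packets hit the cache and skip the ruleset entirely.

```python
# Toy model of the flowtable fast path (illustrative only, not kernel code).
# Rules are evaluated for the first packet(s) of a flow; once the flow is
# added to the table, subsequent packets bypass rule evaluation entirely.

flowtable = set()          # offloaded 5-tuples
rule_evaluations = 0       # packets that traversed the full ruleset

def forward(pkt):
    """pkt is a (src, dst, proto, sport, dport) 5-tuple."""
    global rule_evaluations
    if pkt in flowtable:
        return "fast-path"            # no conntrack walk, no rules
    rule_evaluations += 1             # full netfilter traversal
    # ... security rules would run here; on accept + established:
    flowtable.add(pkt)
    return "slow-path"

flow = ("10.0.10.5", "10.0.20.7", "tcp", 40000, 5201)
results = [forward(flow) for _ in range(1000)]
print(results[0], results[1], rule_evaluations)   # → slow-path fast-path 1
```

The security implication in the trade-off above falls out of the model directly: only the slow-path packets ever see the rules, which is why policy must be enforced before the flow is offloaded.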

Fix 2 (Highest Impact on Hardware): Enable hardware offloads (GRO/TSO/LRO/hw-tc-offload)

What: Stop hard-coding hw-tc-offload to off [fixed] in the ethtool feature flags. Enable GRO and TSO on the kernel side. On the Octeon CN9670 (and CN10K on the UDM Beast), enable the NIX hardware acceleration path — these are first-party Marvell engines designed to forward packets without ARM core involvement.

Measured improvement: 4.74 → 25.3 Gbps single-stream (5.3×) on ConnectX VF with kernel forwarder when offloads enabled. The same pattern applies to any NIC with hardware-accelerated forwarding, including the Octeon NIX.

Trade-off: Hardware offload paths typically require the kernel and the device firmware to agree on which features can be offloaded. Some advanced features (like complex iptables matchers) can't be offloaded; the kernel falls back to software for those packets. This is a graceful degradation, not a failure — the fast path handles the common case, slow path handles edge cases. Modern flowtable in switchdev mode (which ConnectX-6 Dx and Octeon CN9670 both support) hands established TCP/UDP flows directly to silicon.
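A back-of-envelope budget shows why aggregation offloads matter so much, assuming full-size 1500-byte frames (a simplification; real traffic mixes vary):

```python
# Back-of-envelope: per-packet CPU time budget at 25 Gbps on a single core.
line_rate_bps = 25e9
frame_bits    = 1500 * 8            # assume full-size 1500-byte frames

pps = line_rate_bps / frame_bits    # packets per second at line rate
ns_per_pkt = 1e9 / pps              # time budget per packet on one core

# With GRO/LRO aggregating up to 64 KB per super-packet, the stack sees
# far fewer "packets" for the same byte rate:
frames_per_gro = (64 * 1024) / 1500
events_per_sec = pps / frames_per_gro
us_per_event   = 1e6 / events_per_sec

print(f"{pps/1e6:.2f} Mpps -> {ns_per_pkt:.0f} ns/pkt")
print(f"with GRO: {events_per_sec/1e3:.0f}k events/s -> {us_per_event:.1f} us/event")
```

At ~480 ns per packet there is no room for a 839-rule firewall walk per packet on one core; aggregation turns that into a ~21 µs budget per stack traversal, which is the headroom the measured 5.3× improvement comes from.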

Fix 3 (The Biggest Architectural Win): Adopt DPDK + VPP for the dataplane

What: Migrate the forwarding plane from kernel ip_forward to VPP with the Marvell-supported cnxk DPDK PMD. Move Suricata to its native DPDK mode (available since Suricata 7.0). Pin VPP worker threads and Suricata workers to dedicated CPU cores, leaving the control plane (UniFi management, routing and control protocols, dpi-flow-stats summaries) on a separate core.

Why this is the biggest win: Marvell publishes complete DPDK + VPP reference architectures for the OCTEON family. The cnxk PMD is open-source, well-maintained, and ships with mainline DPDK. Suricata's DPDK mode is production-deployed by major NFV vendors. Every component Ubiquiti needs is already vendor-supported, mainline open-source software. They lose nothing by adopting it.

Estimated improvement on EFG silicon:

  • Single-stream inter-VLAN: from ~1 Gbps to 15-25 Gbps (15-25×)
  • PPPoE WAN single-stream: from ~3 Gbps to 8-10 Gbps (line rate on 10G PPPoE)
  • Aggregate: from a few Gbps to line rate on both 25G ports (50 Gbps)
  • Inspection (Suricata): from kernel-pcap mode to DPDK direct, eliminating per-packet kernel→userspace copy

Trade-off: Largest engineering investment of any fix. Ubiquiti would need to rewrite their forwarding plane on top of VPP's API and integrate VPP's CLI/API with their UniFi controller. However, all the heavy lifting (the PMD, the dataplane, the Suricata DPDK integration) already exists. They are integrating, not inventing.

Fix 4 (Architectural): Move from per-VLAN bridges to a single VLAN-aware bridge

What: Replace br0, br3, br5, ... with a single bridge with VLAN filtering enabled (vlan_filtering 1), with VID assignments per port. Combined with nf_flow_table on the same bridge, this enables flowtable to short-circuit established flows entirely within the bridge layer.

Measured improvement: Indirect — enables Fix 1 and Fix 2 to be more effective, particularly for inter-VLAN flows that today must traverse multiple bridges. Direct measurements not made in this study, but Linux upstream has documented order-of-magnitude improvements in similar setups.

Trade-off: Configuration migration. Existing ruleset references to specific bridge devices need updating to reference the unified bridge. Manageable as a firmware update.

Fix 5 (CPU Hygiene): Pin userspace inspection processes off the data-path core

What: Use cgroup or taskset to ensure dpi-flow-stats, Suricata-Main, conntrackd, and similar processes do not run on the same CPU cores that handle network softirqs. On an 18-core Octeon, the data path occupies one core (sometimes two); the other 16+ are mostly idle. Pin inspection there.

Measured improvement: Indirect. Frees the forwarding core from contention, which becomes meaningful when the forwarder is pushing close to its single-core ceiling.

Trade-off: None of consequence. This is a basic Linux performance hygiene setting that any production router enables. The cost is a few sysfs/systemd-cgroup changes in the boot configuration. Becomes moot after Fix 3 (with DPDK, each worker has its own dedicated core by design).
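Mechanically this needs nothing beyond taskset, cgroup cpusets, or the sched_setaffinity(2) syscall. A minimal sketch via Python's binding of that syscall (Linux-only; the core chosen below is illustrative — on the EFG you would pick cores away from the softirq core):

```python
import os

# Pin the current process to a single CPU, leaving the others free for
# network softirq processing. On the EFG the same call (or taskset/cgroups)
# would be applied to dpi-flow-stats, Suricata, conntrackd, etc. at start.
allowed = os.sched_getaffinity(0)       # CPUs we may currently run on
target = {min(allowed)}                 # illustrative: lowest-numbered CPU
os.sched_setaffinity(0, target)         # restrict to that CPU only
print(os.sched_getaffinity(0))
```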

Fix 6 (Modern API): Migrate from iptables to native nftables

What: The current ruleset is on the legacy iptables (xt_*) backend with 839 rules. Native nftables is faster per-rule, supports flowtable natively (Fix 1 builds on this), supports atomic ruleset replacement (no flushing), and is the future of Linux netfilter.

Measured improvement: Single-digit percentage points on its own; enables Fix 1 to reach its full potential.

Trade-off: Migration cost. Tools like iptables-translate automate most of it. The tools that produce the existing ruleset (presumably internal Ubiquiti config generators) need to emit nft syntax instead.

Fix 7 (Already partially shipped): Conntrack helper toggle

What: The UniFi controller already exposes a "Firewall Connection Tracking" control in Gateway settings, with checkboxes for FTP, H.323, SIP, GRE, PPTP, and TFTP helpers. Enterprise deployments without those legacy protocols can disable them all to unload the helper modules entirely.

What this actually does: Removes Phase 1 helper-hash-lookup overhead on new connections (see Section 9 Finding 5). On a router doing tens of thousands of new connections per second, this is a meaningful reduction in connection-setup CPU cost.

What this does NOT do: It does not change throughput on already-established TCP flows like an iperf3 test. The Phase 2 per-packet cost on non-helper flows is essentially zero whether helpers are loaded or not. iperf3 inter-VLAN single-stream throughput is unchanged.

Why this is a partial fix: The architecturally correct answer is to use the kernel's flowtable for the data path so that established flows bypass the entire netfilter chain (helpers and all) at line rate, while helpers continue to handle the control connections of legitimate helper-protocol traffic. That requires shipping nf_flow_table.ko, which the EFG does not have (Section 11). The toggle's existence is evidence that Ubiquiti's engineering team understands the helpers-cost-something question; they have shipped a partial mitigation rather than the proper fix.

Recommended action for administrators: If your deployment doesn't use FTP active-mode NAT, H.323 video conferencing, SIP through ALG (most modern SIP deployments use STUN/ICE instead), PPTP VPN, or TFTP, disable all of them. It's a free win on connection-setup costs.

Fix 8 (Long-term): Ship a newer kernel

What: Linux 5.15 LTS dates from late 2021. Kernel 6.6 LTS includes substantial nftables, flowtable, and bridge improvements, plus the PPPoE flowtable acceleration merged in 6.2. Kernel 6.12, the most recent LTS, adds hardware-offloaded flowtable support for several NICs and further per-CPU optimizations.

Measured improvement: Compounding with Fix 1, Fix 2, and the PPPoE acceleration. Recent kernels have made nf_flow_table faster per-packet, made hardware-offload setup easier, and added PPPoE-specific acceleration that the EFG completely lacks today.

Trade-off: Vendor kernel update. The Octeon vendor BSP (Marvell's "ubuntu-cn9670") will need to be rebased on a newer kernel. Not trivial but routine for a hardware vendor; Marvell themselves publish 6.x-based BSP releases.

Fix Priority Ranking

Priority  Fix                                                  Effort                          Single-stream improvement
1         Enable flowtable                                     Low (config)                    3.0×
2         Enable hardware offloads                             Low–Medium (config + firmware)  up to 5.3×
3         Adopt DPDK + VPP + Suricata-DPDK                     High (engineering)              15-25× — and fixes PPPoE too
4         Newer kernel (5.15 → 6.6+)                           Medium                          enables PPPoE flowtable, +small kernel gains
5         Pin inspection processes off data-path core          Low (config)                    small but additive
6         Per-VLAN bridges → vlan-aware single bridge          Medium (config migration)       enables 1+2
7         iptables → nftables                                  Medium                          enables 1, small direct
8         Conntrack helper toggles (already shipped — UI)      Free (UI checkbox)              none on iperf3, small on connection setup

Doing Fix 1 alone gets you 3× the single-stream throughput. Fix 1+2 gets you 7×. Fix 3 — the long-term architectural fix that the silicon vendor literally publishes a reference architecture for — gets you 15-25×. The hardware does not need to change.


11. Direct Experimental Verification — Building the Missing Modules

The analysis to this point rests on lab measurements made on x86 hardware that reproduces the EFG's software stack. The lab data is reproducible and self-consistent, but a fair reader can ask: would the recommended fixes actually work on the real device?

To find out, we attempted the most surgical of the recommended fixes — adding the missing nftables flowtable kernel modules — to a production EFG. The exercise was instructive in ways we did not anticipate, and the results materially strengthen Section 9's findings about the EFG's kernel.

What follows is a complete, honest record of the attempt. Both build attempts ultimately crashed the device. Neither was the outcome we wanted, but the failure modes themselves are diagnostic — they reveal precisely how far Ubiquiti's kernel diverges from any reproducible public source.

11.1 — Feasibility Assessment

Loading a third-party kernel module into a running kernel requires a few prerequisites:

  1. A matching kernel version (vermagic). The Linux module loader rejects any module whose vermagic string doesn't match the running kernel's exactly.
  2. Module loading not blocked by signing. If CONFIG_MODULE_SIG_FORCE=y or module.sig_enforce=1, only modules signed by an in-kernel trusted key can load.
  3. No kernel lockdown. If a Secure Boot lockdown is engaged, module loading from disk is restricted regardless of signing config.
  4. A writable filesystem location, since module files must be readable from disk by init_module(2) or finit_module(2).

We confirmed each on a production EFG via SSH:

$ cat /proc/cmdline
console=ttyAMA0,115200n8 earlycon=pl011,0x87e028000000 maxcpus=18 isolcpus=12 
rootwait rw coherent_pool=16M pcie_aspm=off net.ifnames=0 sysid=ea3d 
root=PARTUUID=...

No module.sig_enforce=1. No lockdown= argument. No lsm=lockdown,....

$ cat /sys/module/module/parameters/sig_enforce
N

Module signing not enforced.

$ zcat /proc/config.gz | grep -E "^CONFIG_(MODULE_SIG|SECURITY_LOCKDOWN|MODVERSIONS|TRIM_UNUSED_KSYMS)"
# CONFIG_MODULE_SIG is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set

This was both encouraging and concerning. Encouraging because it meant we had a clean path to load a custom-built module if we could match vermagic. Concerning because these missing options are exactly the safeguards a production firmware should have:

  • MODULE_SIG: prevents loading unsigned modules. Any process with CAP_SYS_MODULE (root, in containers if not seccomp'd) can load arbitrary kernel code.
  • MODVERSIONS: adds CRC checksums to every exported symbol. A module built against a kernel with subtly different struct layouts will be refused at load time rather than crashing the kernel later.
  • TRIM_UNUSED_KSYMS: limits the surface area of exposed kernel symbols.
  • SECURITY_LOCKDOWN_LSM: restricts what root can do to a running kernel.

The implications of these absences are explored further in Section 9, Finding 10. For the experiment, they meant that load-time symbol mismatches would not be caught — the kernel would happily start executing code with bad assumptions about struct layouts.

The EFG's filesystem is overlayfs root with a writable upper layer at /mnt/.rwfs/data. Modules placed in /tmp survive long enough to load.

The flowtable modules (nf_flow_table.ko, nf_flow_table_inet.ko, plus nf_tables.ko as a dependency) are absent from the EFG's /lib/modules/:

$ find /lib/modules/$(uname -r) -name 'nf_flow_table*' -o -name 'nf_tables.ko'
[no output]

$ modinfo nf_flow_table
modinfo: ERROR: Module nf_flow_table not found

The modules are not merely disabled; they are not present in the build. We needed to compile them ourselves.

11.2 — Cross-Compilation Setup, Attempt 1: Vanilla 5.15.72

A separate build VM was provisioned on the lab host:

  • Ubuntu 24.04 LTS, 16 vCPU, 32 GB RAM
  • gcc-10-aarch64-linux-gnu 10.5.0 from the noble-universe repository (matches the EFG's compiler family)
  • Linux 5.15.72 source tree from kernel.org

The EFG's running kernel reports itself as:

$ uname -r
5.15.72-ui-cn9670

$ uname -a
Linux EFG-Home-SP 5.15.72-ui-cn9670 #5.15.72 SMP Wed Apr 15 23:39:47 CST 2026 
aarch64 GNU/Linux

$ strings /lib/modules/5.15.72-ui-cn9670/kernel/net/netfilter/nf_conntrack_ftp.ko \
    | grep -E '^(vermagic|name)='
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
name=nf_conntrack_ftp

The build process:

$ export ARCH=arm64
$ export CROSS_COMPILE=aarch64-linux-gnu-
$ export CC=aarch64-linux-gnu-gcc-10

$ cd ~/efg-build/vanilla-5.15.72/linux-5.15.72
$ cp ~/efg-build/efg-running.config .config

# Set CONFIG_LOCALVERSION inside the .config (not the env)
$ ./scripts/config --set-str CONFIG_LOCALVERSION "-ui-cn9670"

# Enable the modules we want to build
$ ./scripts/config --module CONFIG_NF_TABLES
$ ./scripts/config --module CONFIG_NF_FLOW_TABLE
$ ./scripts/config --module CONFIG_NF_FLOW_TABLE_INET

# Disable BTF generation (would require pahole on EFG kernel — not available)
$ ./scripts/config --disable CONFIG_DEBUG_INFO_BTF

# Reconcile
$ make olddefconfig

$ time make -j$(nproc) modules
real    1m52s

$ for ko in net/netfilter/nf_tables.ko \
            net/netfilter/nf_flow_table.ko \
            net/netfilter/nf_flow_table_inet.ko; do
    strings $ko | grep -E '^(vermagic|name)='
  done
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
name=nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
name=nf_flow_table
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
name=nf_flow_table_inet

Three modules. All vermagic strings byte-perfect matches for the EFG kernel.
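That matters because vermagic is the only gate this kernel applies at load time: the check is essentially a string comparison of the module's vermagic against the running kernel's (the real check, same_magic() in kernel/module.c, has a few extra special cases this sketch omits):

```python
def vermagic_matches(module_vermagic: str, kernel_vermagic: str) -> bool:
    # Simplified model of the loader's version gate: a string comparison.
    # Note what the string does NOT encode: struct layouts, symbol CRCs,
    # or any vendor patches applied on top of the named version.
    return module_vermagic == kernel_vermagic

kernel = "5.15.72-ui-cn9670 SMP mod_unload aarch64"
ok  = vermagic_matches("5.15.72-ui-cn9670 SMP mod_unload aarch64", kernel)
bad = vermagic_matches("5.15.72 SMP mod_unload aarch64", kernel)
print(ok, bad)   # → True False
```

A byte-perfect vermagic therefore proves only that the module claims the same version string — which is exactly why the loads below succeeded and then crashed.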

11.3 — The Vanilla Build Crashed the Device

The modules were copied to the EFG and loading was attempted in dependency order:

$ scp nf_tables.ko nf_flow_table.ko nf_flow_table_inet.ko \
      root@efg-prod:/tmp/

$ ssh root@efg-prod
# cd /tmp
# insmod ./nf_tables.ko
[connection drops, device reboots]

The kernel oops, captured before the watchdog reboot:

[ ... ] Unable to handle kernel NULL pointer dereference at virtual address 0x120
[ ... ] Mem abort info:
[ ... ]   ESR = 0x96000004
[ ... ]   FSC = 0x4: level 0 translation fault
[ ... ] Internal error: Oops: 96000004 [#1] SMP
[ ... ] Modules linked in: nf_tables(+) wireguard libchacha20poly1305 ...
[ ... ] CPU: 3 PID: 211748 Comm: insmod Tainted: P W O 5.15.72-ui-cn9670 #5.15.72
[ ... ] Hardware name: Marvell OcteonTX CN96XX board (DT)
[ ... ] pc : nf_tables_init_net+0x18/0x94 [nf_tables]
[ ... ] lr : ops_init+0x3c/0x120
[ ... ] Call trace:
[ ... ]  nf_tables_init_net+0x18/0x94 [nf_tables]
[ ... ]  ops_init+0x3c/0x120
[ ... ]  register_pernet_operations+0xec/0x240
[ ... ]  register_pernet_subsys+0x2c/0x50
[ ... ]  nf_tables_module_init+0x24/0x100 [nf_tables]

The HA secondary in the home cluster failed over within ~8 seconds. Service was restored without operator intervention.

The crash happened at offset 0x18 — 24 bytes — into nf_tables_init_net, extremely early in the per-network-namespace initialization. nf_tables_init_net is one of the very first things register_pernet_subsys calls when the module starts up. It tried to read a field at offset 0x120 from a struct pointer the kernel allocated, and the struct the kernel handed back doesn't have a valid pointer at that offset.

This isn't a "missing symbol" error or a "wrong function signature" error. The module loaded successfully. Its symbols resolved against the running kernel's symbol table. Execution started. And then, within microseconds, it dereferenced a struct field at an offset where the running kernel doesn't have what our module expected.

That's an ABI mismatch — the structure layout in our build's view of the kernel is different from the structure layout in the EFG's running kernel.

11.4 — Why Vanilla 5.15.72 Crashed

The crash happens because:

# CONFIG_MODVERSIONS is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set

Without MODVERSIONS, the kernel module loader has no per-symbol CRC to compare. Vermagic only checks "this is kernel 5.15.72-ui-cn9670 SMP aarch64" — it says nothing about whether struct net has a particular field at offset 0x120. If the EFG's nf_tables per-netns structures have a different layout than vanilla's, the build still produces a module that loads cleanly. It just crashes when execution reads a field at the wrong offset.

This means either:

  • (a) Ubiquiti rebased Linux 5.15.72 on top of patches from a different kernel version, OR
  • (b) Ubiquiti or a vendor (Marvell) added fields to internal structures that vanilla 5.15.72 doesn't have, OR
  • (c) Both.

Section 11.5 below addresses (b) directly by attempting to build against Marvell's complete published BSP — the largest plausible source of vendor-specific kernel patches for this silicon.
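The failure mode — clean load, fault on the first struct access — is easy to reproduce in miniature. The ctypes sketch below uses two hypothetical views of the "same" per-netns struct (invented field names, not the actual nf_tables layouts): code that hard-codes offsets from one view reads the wrong field through the other.

```python
import ctypes

# The "vanilla" view of a per-netns struct:
class PernetVanilla(ctypes.Structure):
    _fields_ = [("tables", ctypes.c_void_p),
                ("base_seq", ctypes.c_uint64)]

# The vendor kernel's view, with an extra field inserted in the middle
# (hypothetical — we cannot see Ubiquiti's actual patches):
class PernetVendor(ctypes.Structure):
    _fields_ = [("tables", ctypes.c_void_p),
                ("vendor_hook", ctypes.c_void_p),
                ("base_seq", ctypes.c_uint64)]

# A module built against the vanilla headers bakes this offset into its
# machine code at compile time...
vanilla_off = PernetVanilla.base_seq.offset
# ...but the running kernel laid the struct out like this:
vendor_off = PernetVendor.base_seq.offset
print(vanilla_off, vendor_off)

# Reading "base_seq" at the vanilla offset from a vendor-layout struct
# actually reads vendor_hook — the wrong data entirely. If the module then
# dereferences that value as a pointer, you get exactly the NULL/garbage
# dereference oops seen on the EFG. No load-time check catches this
# without MODVERSIONS.
```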

11.5 — Cross-Compilation Setup, Attempt 2: Marvell OCTEON BSP

The Marvell OCTEON CN9670 SoC has substantial vendor-specific Linux support that is not in mainline. Marvell maintains kernel patches for their hardware engines (NIX network units, RVU resource virtualization, NPA packet allocator, SSO event scheduler, CPT crypto), and these patches frequently touch core kernel infrastructure including netfilter (where Marvell integrates hardware flow offload acceleration).

Marvell publishes their kernel patches through the Yocto Project's linux-yocto repository, branch v5.15/standard/cn-sdkv5.15/octeon, maintained by Bo Sun (Marvell engineer) and merged by the Yocto Project's kernel maintainer (Bruce Ashfield). This is a public, GPL-licensed source tree.

$ git clone https://git.yoctoproject.org/linux-yocto.git linux-yocto-cnxk-5.15
$ cd linux-yocto-cnxk-5.15
$ git checkout v5.15/standard/cn-sdkv5.15/octeon

$ head -5 Makefile
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
PATCHLEVEL = 15
SUBLEVEL = 203
EXTRAVERSION =

The branch HEAD is at 5.15.203 (a stable update) with the full Marvell OCTEON CN9K patch set applied on top.

Examination of the source tree shows the BSP modifies sixteen netfilter-related header files compared to vanilla Linux 5.15.72:

$ for f in $(find ~/vanilla-5.15.72/include -name "*netfilter*" -o -name "*nf_*"); do
    rel=${f#*/include/}
    bsp=~/linux-yocto-cnxk-5.15/include/$rel
    if [ -f "$bsp" ] && ! diff -q "$f" "$bsp" >/dev/null 2>&1; then
      echo "DIFFERS: $rel"
    fi
  done

DIFFERS: net/netfilter/nf_conntrack.h
DIFFERS: net/netfilter/nf_conntrack_count.h
DIFFERS: net/netfilter/nf_conntrack_timeout.h
DIFFERS: net/netfilter/nf_flow_table.h
DIFFERS: net/netfilter/nf_nat_redirect.h
DIFFERS: net/netfilter/nf_tables.h
DIFFERS: net/netfilter/nf_tables_core.h
DIFFERS: net/netfilter/nf_tproxy.h
DIFFERS: net/netns/netfilter.h
DIFFERS: linux/netfilter.h
DIFFERS: linux/netfilter_defs.h
DIFFERS: linux/netfilter/nf_conntrack_sctp.h
DIFFERS: uapi/linux/netfilter_bridge.h
DIFFERS: uapi/linux/netfilter/nf_conntrack_common.h
DIFFERS: uapi/linux/netfilter/nf_conntrack_sctp.h
DIFFERS: uapi/linux/netfilter/nf_tables.h

Several of these headers contain function-signature changes that explain why a vanilla-built module would crash. For example, in nf_conntrack_count.h:

-unsigned int nf_conncount_count(struct net *net,
-                                struct nf_conncount_data *data,
-                                const u32 *key,
-                                const struct nf_conntrack_tuple *tuple,
-                                const struct nf_conntrack_zone *zone);
+unsigned int nf_conncount_count_skb(struct net *net,
+                                    const struct sk_buff *skb,
+                                    u16 l3num,
+                                    struct nf_conncount_data *data,
+                                    const u32 *key);

The function was renamed, and its signature changed. In nf_flow_table.h:

-int flow_offload_route_init(struct flow_offload *flow,
-                            const struct nf_flow_route *route);
+void flow_offload_route_init(struct flow_offload *flow,
+                             struct nf_flow_route *route);

Return type changed from int to void; const removed from the route argument.

The same header backports a feature from kernel 6.2 — PPPoE flowtable acceleration — into 5.15:

+static inline bool nf_flow_pppoe_proto(struct sk_buff *skb, __be16 *inner_proto)
+{
+    if (!pskb_may_pull(skb, ETH_HLEN + PPPOE_SES_HLEN))
+        return false;
+
+    *inner_proto = __nf_flow_pppoe_proto(skb);
+    return true;
+}

This last item is significant: Marvell's BSP includes a PPPoE flowtable backport that mainline 5.15 does not have. If we can build a module against this BSP and load it on the EFG, we should — in principle — get not only inter-VLAN flowtable acceleration but PPPoE flowtable acceleration as well.

The build:

$ cd linux-yocto-cnxk-5.15

# Force SUBLEVEL=72 to match EFG vermagic (BSP HEAD is 5.15.203)
$ sed -i 's/^SUBLEVEL = .*/SUBLEVEL = 72/' Makefile

# Suppress kbuild dirty marker
$ touch .scmversion

# Apply EFG running config and target modules
$ cp ~/efg-build/efg-running.config .config
$ ./scripts/config --set-str CONFIG_LOCALVERSION "-ui-cn9670"
$ ./scripts/config --module CONFIG_NF_TABLES
$ ./scripts/config --enable CONFIG_NF_TABLES_INET
$ ./scripts/config --enable CONFIG_NF_TABLES_IPV4
$ ./scripts/config --enable CONFIG_NF_TABLES_IPV6
$ ./scripts/config --module CONFIG_NF_FLOW_TABLE
$ ./scripts/config --module CONFIG_NF_FLOW_TABLE_INET
$ ./scripts/config --enable CONFIG_NF_FLOW_TABLE_IPV4
$ ./scripts/config --enable CONFIG_NF_FLOW_TABLE_IPV6
$ ./scripts/config --disable CONFIG_DEBUG_INFO_BTF
$ ./scripts/config --disable CONFIG_MODULE_SIG_ALL

$ make olddefconfig
$ make kernelrelease
5.15.72-ui-cn9670

$ time make -j$(nproc)
real    1m59s

Five modules built, all with byte-perfect vermagic:

$ for ko in $(find . -name 'nf_tables.ko' -o -name 'nf_flow_table*.ko' | sort); do
    echo "=== $(basename $ko) ==="
    strings $ko | grep -E '^(vermagic|name|depends)='
  done

=== nf_flow_table.ko ===
name=nf_flow_table
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table_inet.ko ===
name=nf_flow_table_inet
depends=nf_flow_table,nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table_ipv4.ko ===
name=nf_flow_table_ipv4
depends=nf_flow_table,nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_flow_table_ipv6.ko ===
name=nf_flow_table_ipv6
depends=nf_flow_table,nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
=== nf_tables.ko ===
name=nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64

11.6 — The BSP Build Crashed at the Same Function Offset

# insmod ./nf_tables.ko
[connection drops, device reboots]

Captured kernel trace before reboot:

[ 3368.013405] Unable to handle kernel NULL pointer dereference at virtual address 0
[ 3368.022216] Mem abort info:
[ 3368.025005]   ESR = 0x96000005
[ 3368.028072]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 3368.033402]   FSC = 0x05: level 1 translation fault
[ 3368.074382] Modules linked in: nf_tables(+) wireguard libchacha20poly1305 ... 
                xt_geoip(O) nf_app(PO) t_miner(PO) tdts(PO) tm_crypto(O) 
                xt_dyn_random ip6table_nat xt_conntrack xt_connmark xt_TCPMSS pppoe 
                pppox bonding xt_dpi(O) ip6table_mangle iptable_mangle ip6table_filter 
                ip6_tables uio_pdrv_genirq ui_lcm(O) ifb ppp_generic slhc 
                ubnthal(PO) ubnt_common(PO) drm drm_panel_orientation_quirks
[ 3368.121977] CPU: 3 PID: 211748 Comm: insmod Tainted: P W O 5.15.72-ui-cn9670 #5.15.72
[ 3368.130936] Hardware name: Marvell OcteonTX CN96XX board (DT)
[ 3368.143638] pc : nf_tables_init_net+0x18/0x94 [nf_tables]
[ 3368.149059] lr : ops_init+0x3c/0x120
[ 3368.227314] x2 : ffff00019027b300 x1 : 0000000000000000 x0 : 0000000000000000
[ 3368.229754] Call trace:
[ 3368.234825]  nf_tables_init_net+0x18/0x94 [nf_tables]
[ 3368.238053]  ops_init+0x3c/0x120
[ 3368.242840]  register_pernet_operations+0xec/0x240
[ 3368.247195]  register_pernet_subsys+0x2c/0x50
[ 3368.252609]  nf_tables_module_init+0x24/0x100 [nf_tables]

Identical crash signature. nf_tables_init_net+0x18, called from the same path.

Two builds:

Source tree                                                                      Result
Vanilla Linux 5.15.72 (kernel.org)                                               Crash at nf_tables_init_net+0x18
Marvell BSP linux-yocto v5.15/standard/cn-sdkv5.15/octeon, SUBLEVEL forced to 72 Crash at nf_tables_init_net+0x18

If the crash were caused by Marvell BSP patches, the BSP-built module would have crashed somewhere different (or — ideally — not at all). It crashed at the exact same instruction. That tells us:

  • The crash is NOT primarily caused by Marvell BSP patches; it's caused by something on top of the BSP
  • Ubiquiti has applied additional, non-public patches to the kernel that affect netfilter per-net data layout
  • These additional patches are not derivable from any combination of Linux mainline + Marvell's published OCTEON BSP

The Modules linked in line of the panic trace lists the modules already loaded on the EFG when our module tried to initialize:

xt_geoip(O) nf_app(PO) t_miner(PO) tdts(PO) tm_crypto(O) 
xt_dyn_random ip6table_nat xt_conntrack xt_connmark ...
xt_dpi(O) ... ui_lcm(O) ... ubnthal(PO) ubnt_common(PO)

The taint flags (O) and (PO) in Linux's module taint vocabulary mean:

  • O — out-of-tree module
  • P — proprietary (non-GPL) module
  • PO — both proprietary and out-of-tree

The presence of t_miner(PO), tdts(PO), nf_app(PO), xt_geoip(O), xt_dyn_random, tm_crypto(O), xt_dpi(O), ui_lcm(O), ubnthal(PO), and ubnt_common(PO) in the running kernel's module list is documentary evidence of the closed-source kernel modules Ubiquiti is shipping.

Section 13 returns to this point to evaluate the GPL implications.

11.7 — Module Symbol Tables Show Limited Debug Information

Before drawing conclusions, we examined the EFG's existing kernel modules to determine whether Ubiquiti ships debug information that could aid investigation.

$ file /lib/modules/$(uname -r)/kernel/net/netfilter/nf_conntrack_ftp.ko
/lib/modules/.../nf_conntrack_ftp.ko: ELF 64-bit LSB relocatable, ARM aarch64, 
version 1 (SYSV), BuildID[sha1]=5827c50c..., not stripped

$ readelf -S nf_conntrack_ftp.ko | grep -i debug
[30] .gnu_debuglink    PROGBITS         0000000000000000  00001ed0

Modules are not stripped — symbol tables are intact, function and variable names are preserved. However, the only debug section is .gnu_debuglink, which holds just a filename and a 4-byte CRC saying "the actual debug info is in a separate file." That separate file (*.ko.debug) is not shipped on the production firmware.

This is by itself a defensible engineering decision (debug files are large), but combined with MODVERSIONS=N and kptr_restrict=0 (see Section 12 below), it creates a peculiar combination:

  • A normal user with sufficient privilege can dump the running kernel's complete symbol table at full virtual addresses
  • But cannot match those symbols to source-level constructs (struct field names, member offsets) without the debug info
  • And cannot rely on the kernel's own ABI-version tracking to detect mismatched modules

The debug info isn't shipped, so reverse-engineering structure layouts requires examining the binary kernel image directly. Section 12 documents what such an examination reveals.


12. Symbol-Level Forensics on the Running EFG Kernel

The crash at nf_tables_init_net+0x18 told us that the running kernel's internal layout differs from any combination of public sources we could build against. To quantify how far it diverges, we extracted the kernel image from the EFG and compared its symbol table against the symbol tables of vanilla Linux 5.15.72 and our Marvell BSP build.

12.1 — Extracting the Running Kernel

The EFG's kernel image is on disk at /boot/vmlinuz-5.15.72-ui-cn9670:

$ ls -la /boot/vmlinuz-5.15.72-ui-cn9670
-rw-r--r-- 1 root root 12071956 ... /boot/vmlinuz-5.15.72-ui-cn9670

$ file /boot/vmlinuz-5.15.72-ui-cn9670
gzip compressed data, max compression, from Unix, original size 28811776

$ gunzip -c /boot/vmlinuz-5.15.72-ui-cn9670 > efg-vmlinuz
$ binwalk efg-vmlinuz | head -3
DECIMAL    HEXADECIMAL  DESCRIPTION
0          0x0          Linux kernel ARM64 image, load offset: 0x0,
                        image size: 29818880 bytes, little endian, 64k page size

$ strings -a efg-vmlinuz | grep "Linux version"
Linux version 5.15.72-ui-cn9670 (bdd@builder) 
(gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld 2.35.2) 
#5.15.72 SMP Wed Apr 15 23:39:47 CST 2026

The kallsyms symbol table is dumped via /proc/kallsyms:

$ wc -l /proc/kallsyms
130789 /proc/kallsyms

$ head -2 /proc/kallsyms
ffff800008000000 T _text
ffff800008010000 T _stext

We note that kallsyms is unrestricted — full virtual addresses are visible. On most production systems, kernel.kptr_restrict is set to 1 or 2, which causes kallsyms to either redact or zero out the address column. The EFG ships with kptr_restrict=0. This is a security observation in its own right (it makes ROP and KASLR-bypass attacks easier), but for our purposes it provided complete ground-truth symbol data.
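Whether a kallsyms dump carries usable addresses can be judged from the address column alone. A small checker over saved dumps (the /proc/kallsyms field layout is address, type, name; the sample lines below are illustrative):

```python
def kallsyms_redacted(dump: str) -> bool:
    """True if every address in a /proc/kallsyms dump is zeroed out,
    as seen by unprivileged readers when kptr_restrict >= 1."""
    addrs = [line.split()[0] for line in dump.strip().splitlines()]
    return all(set(a) == {"0"} for a in addrs)

efg_dump = "ffff800008000000 T _text\nffff800008010000 T _stext\n"
restricted_dump = "0000000000000000 T _text\n0000000000000000 T _stext\n"

print(kallsyms_redacted(efg_dump))          # → False (full addresses visible)
print(kallsyms_redacted(restricted_dump))   # → True
```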

12.2 — Three-Way Symbol Comparison

We extracted the symbol tables from each source:

# Symbols in EFG's running kernel
$ awk '{print $3}' /tmp/efg-kallsyms.txt | sort -u > /tmp/efg-syms.txt

# Symbols in our Marvell BSP build
$ nm ~/efg-build/marvell-bsp/linux-yocto-cnxk-5.15/vmlinux \
    | awk '{print $3}' | sort -u > /tmp/bsp-syms.txt

# Symbols in vanilla 5.15.72
$ nm ~/efg-build/vanilla-5.15.72/linux-5.15.72/vmlinux \
    | awk '{print $3}' | sort -u > /tmp/vanilla-syms.txt

$ wc -l /tmp/*-syms.txt
 115998 /tmp/bsp-syms.txt
 120399 /tmp/efg-syms.txt
 112581 /tmp/vanilla-syms.txt

The diff: symbols present in the EFG kernel but absent from BOTH vanilla 5.15.72 AND the Marvell BSP build:

$ comm -23 /tmp/efg-syms.txt \
    <(sort -u /tmp/vanilla-syms.txt /tmp/bsp-syms.txt) \
    | grep -vE "^(\.L[0-9]+|\.LC[0-9]+|\.LBE|\.LFE|\.LFB|\.Letext|\.Ldebug|\.Lframe|__compound_literal\.|__func__\.|__warned\.|CSWTCH\.)" \
    > /tmp/efg-unique-real-syms.txt

$ wc -l /tmp/efg-unique-real-syms.txt
6357 /tmp/efg-unique-real-syms.txt

After filtering out compiler-generated local labels (which vary across every build of every kernel and carry no information), 6,357 unique symbols exist in the EFG's kernel that are present in neither vanilla Linux 5.15.72 nor Marvell's published OCTEON BSP.
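The comm pipeline is just set arithmetic. The same computation in Python, on toy data (the vanilla/BSP symbol names here are stand-ins, and the label filter is abridged from the full grep pattern above):

```python
import re

# Toy reconstruction of the three-way comparison done with comm(1):
# keep EFG symbols present in NEITHER public tree, then drop
# compiler-generated local labels.
efg     = {"nf_tables_init_net", "tdts_shell_dpi_l3_skb", ".LC42", "vprintk"}
vanilla = {"nf_tables_init_net", "vprintk"}
bsp     = {"nf_tables_init_net", "vprintk", "otx2_nix_config"}

unique = efg - (vanilla | bsp)               # the comm -23 step
label = re.compile(r"^\.(L|LC)\d")           # local-label filter (abridged)
unique = {s for s in unique if not label.match(s)}

print(sorted(unique))   # → ['tdts_shell_dpi_l3_skb']
```

On the real inputs the same subtraction and filter leave the 6,357 symbols discussed below.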

12.3 — Categorization of the Unique Symbols

Grouping the unique symbols by name pattern reveals what Ubiquiti added:

Category                                                Symbol count  Examples
tdts_* (Trend Micro Deep-packet Threat Surveillance)    116           tdts_shell_dpi_l3_skb, tdts_shell_dpi_register_mt
tm_* (Trend Micro shared)                               33            tm_crypto_* family
ubnthal_* (Ubiquiti HAL)                                45            ubnthal_get_controller_host, ubnthal_get_cputype
ubnt_* (Ubiquiti utilities)                             additional    ubnt_blk_wp_callback, ubnt_mtd_partition_read
HTTP protocol decoder (kernel-space)                    dozens        BuildHTTP_request_KeywordTries, Create_HTTP_Protocol_Decoder
H.323 protocol decoder (kernel-space)                   dozens        DecodeQ931, DecodeMultimediaSystemControlMessage
nf_*dpi* (DPI conntrack extensions)                     several       nf_conntrack_dpi_init, nf_ct_ext_dpi_destroy, nf_dpi_proc_dir
dpi_* (Deep Packet Inspection engine)                   dozens        __kstrtab_dpi_main, related classification entry points
wg_* (WireGuard, partly upstream)                       113           wg_*
Firmware signing key blobs                              a few         UDMENT_CN9670_FW_KEY, UXG_AL324_FW_KEY
A note on terminology: throughout this section, "DPI" refers to Deep Packet Inspection — the application-layer traffic-classification feature that powers the UniFi dashboard's per-application traffic statistics and threat management. This is distinct from Marvell's hardware DPI block (DMA Packet Interface, also abbreviated DPI), which is a PCIe DMA engine on the OCTEON SoC and shows up in the kernel image as register-name strings like DPI_DMA_CONTROL and DPI_REQQ_INT. Those Marvell hardware-driver symbols are present in the public BSP and don't appear in the 6,357-symbol delta. The dpi_*, tdts_*, nf_*dpi*, and xt_dpi symbols below are the inspection-software layer Ubiquiti added on top.

Some of these are unsurprising (ubnthal_* is a clean abstraction layer; WireGuard was upstream by 5.6 but Ubiquiti may have backported aspects). Others are deeply diagnostic.

12.4 — Conntrack Extension for DPI: The Smoking Gun

The most consequential finding is in the nf_* namespace:

nf_conntrack_dpi_fini
nf_conntrack_dpi_init
nf_ct_ext_dpi_destroy
nf_dpi_proc_dir

The Linux conntrack subsystem has an extension framework (include/net/netfilter/nf_conntrack_extend.h) that allows kernel modules to attach per-flow metadata to each struct nf_conn. Adding a new extension type requires coordinated changes in several places:

  • enum nf_ct_ext_id in nf_conntrack_extend.h (adding a new value)
  • Registration of a corresponding extension-type descriptor, so the framework knows the new extension's size, alignment, and destructor
  • Anywhere code iterates over extension types

The presence of nf_ct_ext_dpi_destroy is direct evidence that Ubiquiti has added a new conntrack extension (NF_CT_EXT_DPI or similar) to track DPI metadata per flow.

This change is precisely the kind that would alter struct nf_conn layout and per-net data structure layout — exactly the kind of change that would explain why nf_tables.ko built against any public source crashes when it tries to register a pernet_operations against the running kernel.

12.5 — tdts and t_miner: Closed-Source Kernel Modules

Examined more closely, the tdts namespace exposes kernel symbols:

__ksymtab_tdts_shell_dpi_l2_eth
__ksymtab_tdts_shell_dpi_l3_data
__ksymtab_tdts_shell_dpi_l3_skb
__ksymtab_tdts_shell_dpi_register_mt
__ksymtab_tdts_shell_dpi_unregister_mt
__ksymtab_dpi_main

The __ksymtab_* and __kstrtab_* symbols are how the kernel records what symbols a module exports. The names dpi_l2_eth, dpi_l3_data, and dpi_l3_skb indicate functions for handling Ethernet frames at layer 2 and IPv4/IPv6 packets at layer 3. The _register_mt and _unregister_mt suffixes look like registration entry points for netfilter xt_match extensions ("mt" abbreviating match).

The runtime panic dump in Section 11 showed these modules tagged tdts(PO) and t_miner(PO) — proprietary, out-of-tree.

The "tdts" name strongly suggests Trend Micro Deep-packet Threat Surveillance (the expansion behind the tdts abbreviation), part of Trend Micro's Smart Protection Network ("TMSPN") technology. Trend Micro licenses their threat-detection engine to network device vendors as a closed-source kernel module. The tm_crypto(O) and t_miner(PO) modules in the same panic trace fit the pattern: t_miner is a content-pattern matcher, tm_crypto is the encrypted-traffic analyzer.

These modules are not Ubiquiti's own code. They are licensed proprietary code from Trend Micro that Ubiquiti has integrated into their firmware. They link directly against kernel symbols, as evidenced by the xt_dpi(O) netfilter match registered in the kernel's tainted-module list.

12.6 — Kernel-Embedded Application-Layer Decoders

The unique symbols also reveal that Ubiquiti has embedded application-layer protocol decoders directly in the kernel:

BuildHTTP_request_KeywordTries
Close_HTTP_Request_Connection
Create_HTTP_Protocol_Decoder
Free_HTTP_Protocol_Decoder
HTTP_Connection_Lost_Count
HTTP_Req_Count
Init_HTTP_Protocol_Decoder
NormalizeURI
Parse_HTTP_Request
ScanHTTPVersion
ScanRequestHeaders
URINormalize

DecodeMultimediaSystemControlMessage
DecodeQ931
DecodeRasMessage
_AdmissionConfirm
_AdmissionRequest
_Alerting_UUIE

The HTTP decoder symbols (camelCase, with _HTTP_ infix) appear to be from a Trend Micro protocol-parsing library running in kernel space. The H.323/Q.931 decoder symbols are similarly out-of-place for a kernel — these would normally live in userspace.

Running parsers for HTTP, H.323, and similar attacker-controllable formats inside the kernel is a substantial security risk. A bug in any of these decoders becomes a kernel vulnerability. Mainstream Linux distributions and other vendors deliberately keep this kind of code in userspace (Suricata, Snort, etc.) for exactly this reason.

12.7 — What the 6,357 Symbol Delta Means

To put 6,357 symbols in perspective:

  • Vanilla 5.15.72 has 112,581 unique symbols
  • Marvell's published BSP adds 3,417 net new symbols on top (a 3% increase)
  • Ubiquiti's running kernel has 6,357 symbols beyond Marvell's BSP — a further 5.5% increase

Phrased differently: roughly 1 in 19 symbols in the EFG's running kernel did not come from any source publicly available to a security researcher, GPL-rights-exercising customer, or independent third party.

This is the kernel that handles your VLAN traffic, your firewall rules, your VPN keys, and your DPI inspection. The behavior of this kernel cannot be audited from outside because the source for 5% of it is not published. The technical analysis in Section 11 demonstrates that this 5% includes substantial netfilter modifications.


13. The GPL Compliance Question

13.1 — What the GPL Requires

The Linux kernel is licensed under GPL-2.0. That license imposes specific obligations on anyone who distributes a binary derived from GPL-licensed source. The relevant provisions, summarized:

  1. The complete corresponding source code must be made available to recipients of the binary, under the same license (GPL-2.0 §3).
  2. Changes to GPL'd source files must themselves be GPL-licensed (GPL-2.0 §2, the "viral" clause).
  3. Linking proprietary modules against GPL kernel symbols is a contested legal area. Linus Torvalds and the Linux Foundation's longstanding position is that modules that use only EXPORT_SYMBOL (not EXPORT_SYMBOL_GPL) interfaces and "can plausibly be shown to be independent" may be distributed under non-GPL licenses, but there is no clean legal answer here. The Free Software Foundation's position is stricter: any kernel module is a derived work.
  4. A written offer to provide source must accompany the binary distribution, valid for at least three years.
  5. Derived works that combine GPL and proprietary code in linked form typically must be GPL-licensed in their entirety.

13.2 — Where Ubiquiti Stands on These Obligations

13.2.1 — Has Ubiquiti released the kernel source?

Ubiquiti previously maintained an open-source download page at ui.com/download/open-source, but that page no longer exists. As of this writing (May 2026), Ubiquiti's main website does not host any GPL source code archives that we could locate. The Ubiquiti GitHub organization (https://github.com/ubiquiti) contains only two repositories: support-tools and freeswitch. Neither contains kernel sources or firmware sources for any current product.

This is not the first time Ubiquiti's GPL compliance has been questioned. The Wikipedia article on Ubiquiti documents a recurring pattern:

  • 2015: Ubiquiti was accused of violating GPL terms for code in their products. Specifically, customers requested the source for the GPL-licensed U-Boot bootloader and Ubiquiti refused, making it impractical for customers to fix a security issue. The source was eventually released after sustained public pressure.
  • 2019: Ubiquiti was again reported to be in violation of GPL.
  • 2026 (current): The open-source download page that previously hosted source archives has been removed entirely.

For an EFG owner attempting to exercise their GPL rights today, the channels are:

  • The Ubiquiti support email ([email protected]), which redirects GPL requests to a separate address
  • A specific email for source requests: [email protected]
  • Community forum posts (which historically receive no substantive Ubiquiti response on GPL questions)
  • Third-party archives like github.com/unifi-hackers/unifi-gpl and github.com/CodeFetch/Ubiquiti-UBNT-airOS, which contain partial GPL sources that researchers have extracted from firmware images or obtained through pressure

A formal request for the complete kernel source has been filed via [email protected], the email address Ubiquiti's support team directed users to. The request specifies:

  • The full kernel source tree corresponding to the running kernel version
  • The build configuration (/proc/config.gz)
  • The complete set of patches applied on top of the base kernel
  • The Marvell-specific drivers (octeontx2_pf, octeontx2_vf, octeontx2_af, rvu_*, NIX, CPT, SSO, NPA)
  • Any other GPL components

The request is pending. Ubiquiti's response (or non-response) to this request is itself a data point.

13.2.2 — What we now know is missing

Section 12 documents 6,357 unique kernel symbols in the running EFG kernel that are not present in either vanilla Linux 5.15.72 or the complete published Marvell OCTEON CN9K BSP. These include:

  • Symbols indicating modifications to core netfilter conntrack data structures (nf_ct_ext_dpi_destroy, nf_conntrack_dpi_init)
  • A 116-symbol tdts namespace exposing kernel functions to a closed-source DPI engine
  • HTTP and H.323 application-layer protocol decoders embedded in the kernel
  • A 45-symbol Ubiquiti hardware abstraction layer

For Ubiquiti to be in compliance with GPL-2.0, the source of the changes producing these symbols must be available — at minimum to anyone who has purchased an EFG and exercises their GPL rights to request it.

13.2.3 — The proprietary kernel modules

Section 11 documented the panic trace's Modules linked in list, which included:

xt_geoip(O) nf_app(PO) t_miner(PO) tdts(PO) tm_crypto(O) 
xt_dyn_random xt_dpi(O) ui_lcm(O) ubnthal(PO) ubnt_common(PO)

The (PO) taint flags are the kernel's own classification: P means the module was loaded with a MODULE_LICENSE() declaration that is not one of the GPL-compatible strings, and O means it was built out-of-tree. The kernel taints itself when such modules are loaded specifically because their presence calls the kernel's GPL status into question.

Among these:

  • tdts and t_miner are almost certainly licensed proprietary code from Trend Micro. They register xt_match netfilter hooks and export functions like tdts_shell_dpi_l3_skb. They link directly against GPL kernel symbols (the __kstrtab_* and __ksymtab_* infrastructure exists for this purpose).
  • nf_app, xt_dpi, xt_geoip are likely Ubiquiti's own proprietary netfilter extensions that integrate with the DPI engine.
  • ubnthal, ubnt_common, ui_lcm are Ubiquiti's hardware abstraction layer.

The legal status of these modules is contested in general terms. The specific question for Ubiquiti is: are these modules "derived works" of the kernel? The Free Software Foundation says any kernel module is. Linus Torvalds has historically said it depends on whether the module uses EXPORT_SYMBOL_GPL interfaces and on whether the module has independent existence outside of Linux.

For tdts specifically: Trend Micro markets the underlying technology as portable across operating systems (it runs on Windows, FreeBSD, etc.), which would weigh in favor of "independent existence" under Torvalds's standard. For nf_app, xt_dpi, and ubnthal: these are by name and design Ubiquiti-specific kernel-only modules; they have no plausible existence independent of Ubiquiti's Linux distribution. Under either FSF's or Torvalds's standard, nf_app, xt_dpi, and ubnthal would appear to be derived works of the kernel and therefore subject to GPL.

13.2.4 — The most concerning finding

The closed-source modules link against GPL kernel symbols using EXPORT_SYMBOL and EXPORT_SYMBOL_GPL exports. Some of those exports — particularly conntrack extension registration — were added by Ubiquiti's own kernel patches (per Section 12).

In other words: Ubiquiti modified the kernel (a GPL'd derived work, requiring source release) specifically to add GPL'd interfaces that proprietary modules would link against. Whether this is a GPL violation depends on the resolution of the GPL-vs-proprietary-module question, but it is a structurally significant observation: the proprietary modules and the kernel patches are designed to work together as a single integrated system. The kernel cannot be replaced without breaking the proprietary modules; the proprietary modules cannot run on any other kernel.

That tight integration is what FSF would call "a single program in two pieces" — a derived work. Under that interpretation, the entire firmware would need to be GPL-licensed, and the proprietary modules would be in violation.

13.3 — What This Means for EFG Owners

If you own an EFG, you have a legal right under GPL-2.0 to request the complete source code of the kernel running on your device. That includes:

  • The base kernel source, with full version history
  • All patches applied by Ubiquiti and any third parties
  • The build configuration (.config)
  • Any installation/build scripts necessary to reconstruct the binary
  • The kernel modules whose source is GPL

This right cannot be waived by EULA. If Ubiquiti refuses to provide this source, that refusal is a violation of GPL-2.0 §3, and the appropriate path forward is:

  1. Make a written request to [email protected] specifying the firmware version
  2. If no response within 30 days, escalate to Ubiquiti's legal department
  3. If still no response, contact the Software Freedom Conservancy at [email protected] — they handle GPL enforcement on behalf of multiple Linux kernel copyright holders
  4. The Conservancy can pursue compliance via the kernel-enforcement program

13.4 — Why This Matters Beyond One Vendor

The EFG is a flagship enterprise router from a publicly-traded networking vendor (Ubiquiti, NYSE: UI). It is sold to enterprises, cloud providers, government agencies, and home users. The firmware running on it includes 6,357 kernel symbols that no customer can audit because the source is not published.

Network device firmware is some of the most security-sensitive software in any infrastructure. The kernel running on a firewall or router decides what packets enter and leave the network. Bugs and backdoors in that kernel directly affect every device behind it.

GPL-2.0 was specifically designed to ensure that customers and security researchers can audit the software running on the devices they own. Vendor compliance with the license is not a courtesy — it is a precondition for the trust the GPL ecosystem makes possible.

The findings in this document — that even Marvell's complete public BSP source is insufficient to build modules that work on the EFG, that 6,357 symbols are unique to Ubiquiti's kernel, and that closed-source modules with (PO) taint flags are integrated with the netfilter subsystem — are exactly the kind of findings that demonstrate why GPL compliance is important. The license requires that this kind of analysis be unnecessary, because the source should be available.


14. Direct Vendor Engagement: What Ubiquiti Has Already Been Told

Many of the findings in this document have already been raised with Ubiquiti through their official channels. The vendor's responses are themselves part of the record.

14.1 — The performance issue, raised approximately one year ago

The author of this document opened a support ticket with Ubiquiti approximately one year prior to publication, describing the inter-VLAN performance bottleneck on the EFG and proposing the architectural fix in detail — specifically, recommending that Ubiquiti adopt the DPDK + VPP + Suricata-on-DPDK reference architecture that Marvell themselves publish for the OCTEON CN9K silicon family.

The ticket has not received a substantive engineering response. It remains effectively open without resolution.

This means the central technical recommendation of this document — that the EFG can deliver substantially higher throughput by adopting the dataplane architecture its silicon vendor publishes — was already in Ubiquiti's hands a year ago, with implementation guidance, and was not acted upon.

14.2 — The security architecture, raised through the bounty program

Section 11.1 of this document catalogues the security configuration choices in the EFG's running kernel:

  • module.sig_enforce=0 — modules can be loaded without signature verification
  • CONFIG_MODULE_SIG not set — the kernel was not even built with signing infrastructure
  • No lockdown= argument on the kernel command line — Secure Boot LSM is not engaged
  • CONFIG_SECURITY_LOCKDOWN_LSM not set in the kernel build
  • Overlayfs root filesystem with a writable upper layer — kernel-loadable code can be persisted
  • kernel.kptr_restrict=0 — the full kallsyms table with virtual addresses is exposed

Combined with the kernel's CONFIG_MODVERSIONS=N setting (Section 11.4), this means: any process with CAP_SYS_MODULE (root, including any context that escalates to root) can load arbitrary kernel code, and there is no in-kernel mechanism to detect or prevent that loading. The watchdog will reboot the device on a kernel panic, but a successfully-loaded malicious module that doesn't crash the kernel would persist indefinitely.

Separately, the author identified additional security findings on the EFG — notably the presence of private cryptographic key material accessible via the firmware image (per the *_FW_KEY strings observed in Section 12.3's symbol analysis, alongside other findings not detailed here for responsible disclosure reasons).

These findings were submitted through Ubiquiti's HackerOne bug bounty program — the formal, documented channel for security disclosure to the vendor.

Ubiquiti rejected the submission. The stated reason: the attacker would require network access to exploit the issue.

This rationale does not survive scrutiny when applied to a network gateway:

  • A network gateway is, by definition, on the network. Network access to the device is the universal precondition for any attack against it.
  • The threat model that a security-conscious gateway is designed to defend against is precisely "an attacker who has gained network access" — whether that's a compromised endpoint behind the gateway, a hostile guest device on the same VLAN, or an internal lateral-movement scenario in an enterprise breach.
  • Gateway vendors with mature security postures (Cisco, Juniper, Palo Alto, Fortinet, Arista, etc.) routinely accept and remediate vulnerabilities under this threat model. CVEs against these products list "network adjacent" or "network reachable" as the qualifying attack vector, not a disqualifying one.
  • The official CVSS v3.1 scoring system explicitly defines "Adjacent Network" (AV:A) and "Network" (AV:N) as valid attack vectors. A vendor declining to engage with vulnerabilities in those classes is declining to engage with most of the vulnerability landscape for their product category.

The rejection is therefore not just a technical disagreement — it is a stated position on what kinds of attacks Ubiquiti considers in scope for their bounty program. By that stated standard, an attacker who has already established a foothold on the network behind the EFG is not a threat the EFG considers itself responsible for defending against. That is an unusual posture for a $2,000 device sold and marketed as an enterprise security gateway.

14.3 — The pattern this establishes

Putting these data points together with the GPL findings in Section 13:

| Issue raised | Channel | Year | Vendor response |
|---|---|---|---|
| Inter-VLAN performance, with DPDK fix recommendation | Standard support | ~1 year ago | No substantive engineering response |
| Security configuration / private key exposure | HackerOne bug bounty | Recent | Rejected: "requires network access" |
| GPL kernel source release | Email to [email protected] | Pending | Pending |
| GPL kernel source release | Public web page | Historical | Page removed |

The historical context is also relevant: Ubiquiti was publicly accused of GPL violations in 2015 and again in 2019, and the pattern has continued.

The findings in this document are not surprising vendor disclosures. They are issues that engineering, security, and licensing teams within the vendor have either been told about or are demonstrably aware of and have chosen not to act on. The reason this document exists in public form is that the channels designed for these conversations — support tickets, bug bounty programs, GPL compliance contacts — have not produced action.


15. Conclusion

This investigation began as a performance analysis: why does a $2,000 enterprise router with two 25 GbE SFP28 ports deliver only ~1 Gbps of single-stream inter-VLAN throughput, and ~3 Gbps of single-stream PPPoE WAN throughput? The lab data is unambiguous. The bottlenecks are software-architectural choices, not hardware limitations:

  1. The kernel network stack on a single core has a ~5 Gbps single-stream ceiling when offloads are off, regardless of CPU vendor.
  2. Hardware offloads are disabled by default on the EFG. Enabling them is a 4-7× improvement on otherwise-identical configurations.
  3. The 5-deep iptables FORWARD chain pattern the EFG ships with costs roughly half of single-stream throughput when offloads are also off.
  4. nftables flowtable, a kernel feature available since Linux 4.16 and shipped enabled by every major distribution, is not even compiled into the EFG's kernel. Adding it gives a 3-7× single-stream improvement.
  5. DPDK + VPP on the same silicon — using software stacks that Marvell themselves publish — would deliver 15-25× the throughput. The Cortex-A72-class cores in the Octeon CN9670 can sustain 6-12 Gbps per core in a userspace dataplane. The chip has 18 of those cores.
  6. PPPoE forwarding is single-cored in stock Linux because of how ppp_generic is structured. The fix exists in DPDK and was being upstreamed at time of writing.

These are not exotic or research-grade fixes. Three of them are configuration changes. One requires loading a kernel module that's already in mainline. The most architecturally significant — DPDK + VPP — uses Marvell's own published reference architecture. The hardware was designed for this; the firmware just doesn't use it.

The conntrack helper toggle Ubiquiti recently shipped in the UniFi controller (Section 9 Finding 5, Section 10 Fix 7) is informative beyond its narrow effect. It exposes the FTP/H.323/SIP/PPTP/TFTP helpers as administrator-controllable. The toggle's existence proves Ubiquiti's engineering team is actively reasoning about per-flow netfilter overhead — they identified that helpers cost something, and shipped a workaround to let users disable them. They did not ship the proper fix, which is the kernel's flowtable infrastructure, even though the proper fix would address every architectural finding in this document and the partial fix addresses only one. That is a choice, not an oversight.

Section 11 documented our attempt to apply the most surgical of these fixes — adding the missing nftables flowtable kernel modules — to a real production EFG. Two builds were attempted:

  • Vanilla Linux 5.15.72 from kernel.org → byte-perfect vermagic match → kernel panic at nf_tables_init_net+0x18
  • Marvell's complete published OCTEON BSP source (linux-yocto branch v5.15/standard/cn-sdkv5.15/octeon) → byte-perfect vermagic match → kernel panic at the identical instruction

The fact that both crashes occurred at the same function offset proves that the ABI mismatch is not introduced by Marvell's BSP patches. It is introduced by something Ubiquiti has applied on top of Marvell's BSP — patches Ubiquiti has not published.

Section 12 quantified that delta: 6,357 kernel symbols exist in the running EFG kernel that are present in neither vanilla Linux 5.15.72 nor Marvell's complete public BSP. Approximately 1 in 19 symbols in the EFG's kernel is unique to Ubiquiti's build and not derivable from any public source. These include:

  • Conntrack extension types for proprietary DPI integration (nf_ct_ext_dpi_destroy, nf_conntrack_dpi_init)
  • A 116-symbol tdts namespace exposing kernel internals to a closed-source Trend Micro DPI engine
  • HTTP and H.323 application-layer protocol decoders running in kernel space
  • A 45-symbol Ubiquiti hardware abstraction layer

Section 13 addressed what these findings mean for GPL-2.0 compliance:

  • Ubiquiti has shipped a substantially modified Linux kernel without publishing the corresponding source
  • The proprietary kernel modules tdts, t_miner, nf_app, xt_dpi, ubnthal, and ubnt_common link against GPL kernel symbols and operate as integrated components of the running kernel
  • Specifically, nf_app, xt_dpi, and ubnthal have no existence independent of Ubiquiti's Linux integration and would be derived works under either FSF's or Linus Torvalds's interpretation of the GPL
  • Ubiquiti's open-source download page has been removed; their GitHub presence does not contain firmware sources
  • This continues a documented pattern — Ubiquiti was publicly accused of GPL violations in 2015 (resolved only after sustained pressure) and again in 2019
  • A formal request has been filed via the channel Ubiquiti's support team specified

The GPL exists specifically so that customers can audit and modify the software running on devices they own. The fact that this analysis required reverse-engineering kernel symbol tables from a binary firmware image — when the GPL requires the source be available on request — is itself the finding.

Section 14 documented direct vendor engagement: a performance ticket open with Ubiquiti for approximately one year recommending the DPDK fix (no substantive engineering response), a security disclosure submitted through Ubiquiti's HackerOne bug bounty program (rejected on the grounds that exploitation requires network access — a position that does not survive scrutiny when applied to a network gateway), and the GPL request now pending. The findings in this document are not novel disclosures to the vendor; they are issues the vendor has been told about, through the channels designed for these conversations, and has chosen not to act on.

What enterprise customers should ask Ubiquiti

If you are evaluating or already operating EFG/UDM/UXG hardware, the questions to put to your Ubiquiti account team are:

  1. Performance: When will inter-VLAN single-stream throughput on the EFG match the marketed 25 GbE port speeds for normal enterprise workloads (TCP, MTU 1500, with stateful firewall rules)?
  2. Roadmap: Does Ubiquiti's roadmap include adopting DPDK-based dataplanes (which Marvell's reference architecture for this silicon recommends and supports)?
  3. Configuration: Will Ubiquiti expose nftables flowtable, hardware offload, and conntrack helper toggles as administrator-controllable settings before any DPDK migration?
  4. GPL compliance: Will Ubiquiti publish the complete kernel source corresponding to current EFG firmware versions, including all patches, build configuration, and the source of nf_app, xt_dpi, ubnthal, and ubnt_common?

The first three are about getting the performance you paid for. The fourth is about knowing what's running on your network.

What home and prosumer users should know

The EFG, UDM-Pro-Max, UXG-Lite, UXG-Pro, and other Ubiquiti gateways share substantial portions of this kernel and firmware design. The performance characteristics documented here for the EFG are likely to apply, with proportional differences in absolute numbers, across the product line.

If your home or small-office workload is dominated by single-stream throughput (a single VPN tunnel, a single large file transfer, a single backup job), you are likely bottlenecked by the issues described above, regardless of how fast your internet connection or LAN switch is.

The most impactful workaround available without firmware changes is to enable hardware offloads where Ubiquiti's UI exposes the toggle. Beyond that, the architectural fix is in Ubiquiti's hands.

16. Appendix: Full Data Sets

A.1 — Complete Test Matrix

| # | NIC | Forwarder | MTU | Offloads | Rules | Single-stream | Notes |
|---|---|---|---|---|---|---|---|
| 1 | virtio | kernel | 9000 | on | none | 16.9 Gbps | naïve baseline |
| 2 | virtio | kernel | 9000 | off | none | 17.2 Gbps | jumbo hides per-packet cost |
| 3 | virtio | kernel | 1500 | off | none | 4.95 Gbps | EFG-realistic baseline; 1 core 100% soft |
| 4 | virtio | kernel | 1500 | off | + ct module | 4.84 Gbps | trivial overhead |
| 5 | virtio | kernel | 1500 | off | + simple ct rule | 4.64 Gbps | 4% drop |
| 6 | virtio | kernel | 1500 | off | EFG 5-chain replica | 2.36 Gbps | smoking gun |
| 7 | virtio | kernel | 1500 | off | EFG (8 streams) | 11.4 Gbps | agg scales with cores |
| A | virtio | kernel | 1500 | off | flowtable | 7.05 Gbps | flowtable alone, 3× over EFG |
| B | virtio | kernel | 1500 | on | flowtable | 17.4 Gbps | one-line config improvement |
| K1 | ConnectX VF | kernel | 1500 | on | none | 25.3 Gbps | real silicon baseline |
| K2 | ConnectX VF | kernel | 1500 | on | EFG 5-chain | 21.1 Gbps | GRO hides per-packet cost |
| K3 | ConnectX VF | kernel | 1500 | off | none | 4.74 Gbps | matches virtio with offloads off |
| K4 | ConnectX VF | kernel | 1500 | off | EFG 5-chain | 4.70 Gbps | I/O is the bottleneck here |
| V0 | virtio | VPP/DPDK | 1500 | off | n/a | 6.78 Gbps | DPDK with virtio-pmd; bottlenecked by vhost-net |
| V1 | ConnectX VF | VPP/DPDK | 1500 | client off | n/a | 15.7 Gbps | wire-packet processing |
| V2 | ConnectX VF | VPP/DPDK | 1500 | client on | n/a | 35.6 Gbps | headline number |

A.2 — EFG-Replica nftables Ruleset

#!/usr/sbin/nft -f
flush ruleset

table inet filter {
    chain alien_chain {
        counter
        ip protocol tcp counter
        ip saddr 10.0.0.0/8 counter
    }
    chain tor_chain {
        counter
        ip protocol tcp counter
        tcp flags & (syn|ack) == ack counter
    }
    chain ips_chain {
        counter
        ip protocol tcp counter
        meta l4proto tcp counter
        tcp dport { 1-65535 } counter
    }
    chain ubios_chain {
        counter
        ip protocol tcp counter
        ct state established counter
    }
    chain user_chain {
        counter
        ct state established,related counter
        ip saddr 10.10.10.0/24 ip daddr 10.10.20.0/24 counter
    }

    chain forward {
        type filter hook forward priority 0; policy accept;
        jump alien_chain
        jump tor_chain
        jump ips_chain
        jump ubios_chain
        jump user_chain
    }
}

table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100;
        oifname "enp6s18" masquerade
    }
}

A.3 — flowtable Configuration

#!/usr/sbin/nft -f
flush ruleset

table inet filter {
    flowtable f {
        hook ingress priority 0
        devices = { enp6s19, enp6s20 }
    }

    chain forward {
        type filter hook forward priority 0; policy accept;
        ip protocol { tcp, udp } flow add @f
        ct state established,related accept
    }
}

table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100;
        oifname "enp6s18" masquerade
    }
}

A.4 — VPP startup.conf (ConnectX-6 Dx)

unix {
    nodaemon
    log /var/log/vpp/vpp.log
    full-coredump
    cli-listen /run/vpp/cli.sock
    gid vpp
}

api-trace { on }
api-segment { gid vpp }
socksvr { default }

cpu {
    main-core 0
    corelist-workers 1
}

buffers {
    buffers-per-numa 32768
    default data-size 2048
}

dpdk {
    dev 0000:01:00.0 {
        name lab-vlan10
        num-rx-queues 1
        num-tx-queues 1
    }
    dev 0000:02:00.0 {
        name lab-vlan20
        num-rx-queues 1
        num-tx-queues 1
    }
}

plugins {
    plugin default { enable }
    plugin dpdk_plugin.so { enable }
}
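After VPP starts with this configuration, both DPDK devices should appear under the names assigned above. A quick sanity check using standard VPP CLI commands (the interface names are the ones this startup.conf assigns):

```shell
# Confirm VPP bound both ConnectX-6 Dx ports, then bring them up.
vppctl show hardware-interfaces            # lab-vlan10 / lab-vlan20 should be listed
vppctl set interface state lab-vlan10 up
vppctl set interface state lab-vlan20 up
vppctl show runtime                        # per-node vector rates, as excerpted in A.5
```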

A.5 — VPP show runtime (during V2 test, 35.6 Gbps)

Thread 1 vpp_wk_0 (lcore 1)
Time 257.0, vector rate 3.5586e5 in/out, packets/sec
              Name           Calls       Vectors    Packet-Clocks  Vectors/Call
dpdk-input    polling   2683609353    91446442         4.25e3         .03
ethernet-input  active   12518445    91446442         9.41e1        7.30
ip4-input-no-checksum    12136093    91446437         3.98e1        7.54
ip4-lookup     active   12136093    91446437         5.23e1        7.54
ip4-rewrite    active   12136093    91446437         3.86e1        7.54
lab-vlan20-output active 10310280   89229310         1.21e1        8.65
lab-vlan20-tx  active   10310280    89229310         3.79e1        8.65

VPP per-packet end-to-end cost on Zen 4: ~80 cycles (ethernet-input + ip4-input + ip4-lookup + ip4-rewrite + interface-output + tx) ≈ 16 nanoseconds per packet at 5 GHz. Theoretical ceiling on this pipeline: ~700+ Gbps single-core.
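The arithmetic behind that ceiling can be reproduced directly. A back-of-envelope sketch, using the ~80-cycle cost from the A.5 runtime dump and assuming a 5 GHz clock and 1500-byte frames:

```python
# Single-core throughput ceiling implied by the VPP per-packet cycle cost.
CPU_HZ = 5.0e9            # Zen 4 boost clock (assumption from the text)
CYCLES_PER_PACKET = 80    # summed VPP node costs from "show runtime"
FRAME_BYTES = 1500        # MTU-sized frames (assumption)

pps = CPU_HZ / CYCLES_PER_PACKET      # packets per second one core can process
gbps = pps * FRAME_BYTES * 8 / 1e9    # equivalent line rate at that frame size

print(f"{pps / 1e6:.1f} Mpps = {gbps:.0f} Gbps single-core")  # 62.5 Mpps = 750 Gbps
```

At minimum-size 64-byte frames the same 62.5 Mpps budget still corresponds to ~32 Gbps, which is why the per-packet cycle count, not raw bandwidth, is the figure that matters.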

A.6 — EFG Live Diagnostics (representative excerpts)

$ uname -a
Linux EFG-Home-SP 5.15.72-ui-cn9670 #5.15.72 SMP Wed Apr 15 23:39:47 CST 2026 aarch64

$ iptables -L FORWARD -n -v --line-numbers
Chain FORWARD (policy ACCEPT)
1     555K  775M   ALIEN
2    2764K 4489M   TOR
3     238M  354G   IPS
4     874M 1342G   UBIOS_FORWARD_JUMP

$ nft list flowtables
[empty]

$ lsmod | grep nf_flow_table
[empty]

$ ps -eo pid,pcpu,comm --sort=-pcpu | head -8
4098469 39.6 dpi-flow-stats
   3139 12.5 ubios-udapi-ser
  66687  7.8 java
   4891  7.0 conntrackd
2491041  6.9 Suricata-Main
   5505  6.2 mcad
   8596  3.9 unifi-core

$ sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 10485760

$ lsmod | grep nf_conntrack | grep -v '^nf_conntrack '
nf_conntrack_tftp     262144  1 nf_nat_tftp
nf_conntrack_pptp     327680  1 nf_nat_pptp
nf_conntrack_h323     327680  1 nf_nat_h323
nf_conntrack_ftp      327680  1 nf_nat_ftp

A.7 — Module Build Artifacts (Experiment in Section 11)

Cross-compilation environment:

  • Host: Threadripper Pro 7995WX, Ubuntu 24.04 LTS VM, 16 vCPU, 32 GB RAM
  • Toolchain: gcc-10-aarch64-linux-gnu 10.5.0 from Ubuntu universe repo
  • Kernel source: linux-5.15.72.tar.xz from kernel.org (verified SHA256)
  • Build configuration: EFG's exposed /proc/config.gz plus three module enables for NF_TABLES, NF_FLOW_TABLE, NF_FLOW_TABLE_INET
  • LOCALVERSION: -ui-cn9670 (matching the EFG's published version string)
  • Build time: 1 minute 52 seconds (16-thread parallel build)

Modules produced:

net/netfilter/nf_tables.ko          (10.3 MB)
net/netfilter/nf_flow_table.ko       (1.8 MB)
net/netfilter/nf_flow_table_inet.ko  (495 KB)

Vermagic verification (build host):

$ for ko in nf_tables.ko nf_flow_table.ko nf_flow_table_inet.ko; do
    strings $ko | grep ^vermagic
done
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64

Vermagic verification (EFG, in-tree module):

$ modinfo nf_conntrack_ftp | grep vermagic
vermagic: 5.15.72-ui-cn9670 SMP mod_unload aarch64

Match: exact, character-for-character.

Kernel panic on load attempt (insmod ./nf_tables.ko):

Unable to handle kernel NULL pointer dereference at virtual address 0x0000000000000120
ESR = 0x96000005, EC = 0x25: DABT (current EL), IL = 32 bits
FSC = 0x05: level 1 translation fault
[0000000000000120] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] SMP
Code: 910003fd b9432021 f9000bf3 f9455400 (f8615813)
Kernel panic - not syncing: Oops: Fatal exception

Recovery: watchdog hard-reboot, ~2 minute downtime, no permanent damage. Failover to secondary gateway functioned correctly throughout.

Root cause: CONFIG_MODVERSIONS is disabled in the EFG's kernel config, so symbol-CRC verification did not catch the binary ABI mismatch between vanilla 5.15.72 and Ubiquiti's patched 5.15.72-ui-cn9670 build at module load time. The module linked successfully against the running kernel but encountered mismatched struct layouts during init, dereferencing a NULL pointer in the netfilter subsystem.
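Whether a given .ko carries per-symbol CRCs can be checked offline with standard kmod/binutils tools. A sketch (run against the built module; with CONFIG_MODVERSIONS off, as here, both commands come back empty because the module has no __versions section):

```shell
# With MODVERSIONS enabled these would list one CRC per imported symbol;
# empty output means vermagic is the only load-time compatibility check.
modprobe --dump-modversions nf_tables.ko
readelf -S nf_tables.ko | grep __versions
```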

GPL source request status: filed with [email protected] requesting the complete corresponding source code for kernel 5.15.72-ui-cn9670, including all Ubiquiti and Marvell patches, build configuration, toolchain version, and packaging scripts. Outcome will determine whether the experiment can be re-attempted with a kernel tree that produces ABI-compatible modules.


All measurements were taken on a single physical machine over a continuous test session. Configuration files, scripts, and raw iperf3 outputs are available on request.

A.8 — BSP Build Artifacts (Section 11.5–11.6)

Build environment (same VM as A.7):

  • Ubuntu 24.04 LTS, 16 vCPU, 32 GB RAM
  • gcc-10-aarch64-linux-gnu 10.5.0
  • linux-yocto repository, branch v5.15/standard/cn-sdkv5.15/octeon
  • Repository URL: https://git.yoctoproject.org/linux-yocto.git

Tree state:

$ git branch --show-current
v5.15/standard/cn-sdkv5.15/octeon

$ git log --oneline -3
7f33f19a49e6 (HEAD) Merge branch 'v5.15/standard/base' into v5.15/standard/cn-sdkv5.15/octeon
65333c3a0bcd Merge tag 'v5.15.203' into v5.15/standard/base
b9d57c40a767 Linux 5.15.203

Modifications to make HEAD identify as 5.15.72:

$ sed -i 's/^SUBLEVEL = .*/SUBLEVEL = 72/' Makefile
$ touch .scmversion   # suppress dirty marker
$ make kernelrelease
5.15.72-ui-cn9670

Configuration (using EFG's /proc/config.gz as base):

CONFIG_LOCALVERSION="-ui-cn9670"
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NF_TABLES_IPV6=y
CONFIG_NF_FLOW_TABLE=m
CONFIG_NF_FLOW_TABLE_INET=m
CONFIG_NF_FLOW_TABLE_IPV4=m
CONFIG_NF_FLOW_TABLE_IPV6=m
CONFIG_NF_FLOW_TABLE_PROCFS=y
# CONFIG_DEBUG_INFO_BTF is not set
# CONFIG_MODULE_SIG is not set

Build output:

$ time make -j16
real    1m59s
user    23m50s
sys     4m33s

$ for ko in $(find . -name 'nf_tables.ko' -o -name 'nf_flow_table*.ko' | sort); do
    echo "=== $(basename $ko) ==="
    strings $ko | grep -E '^(vermagic|name|depends|description)='
  done

=== nf_flow_table_ipv4.ko ===
description=Netfilter flow table support
depends=nf_flow_table,nf_tables
name=nf_flow_table_ipv4
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64

=== nf_flow_table_ipv6.ko ===
description=Netfilter flow table IPv6 module
depends=nf_flow_table,nf_tables
name=nf_flow_table_ipv6
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64

=== nf_flow_table.ko ===
description=Netfilter flow table module
depends=
name=nf_flow_table
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64

=== nf_flow_table_inet.ko ===
description=Netfilter flow table mixed IPv4/IPv6 module
depends=nf_flow_table,nf_tables
name=nf_flow_table_inet
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64

=== nf_tables.ko ===
depends=
name=nf_tables
vermagic=5.15.72-ui-cn9670 SMP mod_unload aarch64

Crash trace from EFG load attempt:

[ 3368.013405] Unable to handle kernel NULL pointer dereference at virtual address 0
[ 3368.022216] Mem abort info:
[ 3368.025005]   ESR = 0x96000005
[ 3368.028072]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 3368.033402]   FSC = 0x05: level 1 translation fault
[ 3368.074382] Modules linked in: nf_tables(+) wireguard libchacha20poly1305 ...
                xt_geoip(O) nf_app(PO) t_miner(PO) tdts(PO) tm_crypto(O)
                xt_dyn_random ip6table_nat xt_conntrack xt_connmark xt_TCPMSS pppoe
                pppox bonding xt_dpi(O) ip6table_mangle iptable_mangle ip6table_filter
                ip6_tables uio_pdrv_genirq ui_lcm(O) ifb ppp_generic slhc
                ubnthal(PO) ubnt_common(PO) drm drm_panel_orientation_quirks
[ 3368.121977] CPU: 3 PID: 211748 Comm: insmod Tainted: P W O 5.15.72-ui-cn9670 #5.15.72
[ 3368.130936] Hardware name: Marvell OcteonTX CN96XX board (DT)
[ 3368.143638] pc : nf_tables_init_net+0x18/0x94 [nf_tables]
[ 3368.149059] lr : ops_init+0x3c/0x120
[ 3368.227314] x2 : ffff00019027b300 x1 : 0000000000000000 x0 : 0000000000000000
[ 3368.234825]  nf_tables_init_net+0x18/0x94 [nf_tables]
[ 3368.238053]  ops_init+0x3c/0x120
[ 3368.242840]  register_pernet_operations+0xec/0x240
[ 3368.247195]  register_pernet_subsys+0x2c/0x50
[ 3368.252609]  nf_tables_module_init+0x24/0x100 [nf_tables]
[ 3368.297899] ---[ end trace d3e1e407900e8e95 ]---
[ 3368.316500] Kernel panic - not syncing: Oops: Fatal exception

The HA failover handled the brief outage; service downtime was approximately 8 seconds.

A.9 — Symbol Comparison Methodology (Section 12)

# Step 1: Extract EFG kernel image (already gzip-compressed PE/COFF aarch64 image)
# from EFG: /boot/vmlinuz-5.15.72-ui-cn9670 (12 MB)
$ gunzip -c /boot/vmlinuz-5.15.72-ui-cn9670 > efg-vmlinuz
$ binwalk efg-vmlinuz | head -3
0    0x0    Linux kernel ARM64 image, image size: 29818880 bytes

# Step 2: Capture running symbol table (kallsyms is unrestricted on EFG)
# from EFG:
$ cat /proc/kallsyms > /tmp/efg-kallsyms.txt
$ wc -l /tmp/efg-kallsyms.txt
130789 /tmp/efg-kallsyms.txt

# Step 3: Build vanilla 5.15.72 vmlinux (full build, not just modules)
$ cd ~/efg-build/vanilla-5.15.72/linux-5.15.72
$ make -j16 vmlinux

# Step 4: BSP vmlinux (already built for module experiment in 11.5)

# Step 5: Three-way symbol comparison
$ awk '{print $3}' /tmp/efg-kallsyms.txt | sort -u > /tmp/efg-syms.txt
$ nm ~/efg-build/marvell-bsp/linux-yocto-cnxk-5.15/vmlinux 2>/dev/null \
    | awk '{print $3}' | sort -u > /tmp/bsp-syms.txt
$ nm ~/efg-build/vanilla-5.15.72/linux-5.15.72/vmlinux 2>/dev/null \
    | awk '{print $3}' | sort -u > /tmp/vanilla-syms.txt

$ wc -l /tmp/*-syms.txt
 115998 /tmp/bsp-syms.txt
 120399 /tmp/efg-syms.txt
 112581 /tmp/vanilla-syms.txt

# Step 6: Find symbols in EFG kernel but not in either public source
$ comm -23 /tmp/efg-syms.txt \
    <(sort -u /tmp/vanilla-syms.txt /tmp/bsp-syms.txt) \
    | grep -vE "^(\.L[0-9]+|\.LC[0-9]+|\.LBE|\.LFE|\.LFB|\.Letext|\.Ldebug|\.Lframe|__compound_literal\.|__func__\.|__warned\.|CSWTCH\.)" \
    > /tmp/efg-unique-real-syms.txt

$ wc -l /tmp/efg-unique-real-syms.txt
6357

Filter rationale: The grep -vE pattern excludes compiler-generated local labels (.L<N>, .LC<N>, .LBE<N>, etc.) which differ across every build of every kernel and carry no information about kernel structure. The remaining 6,357 symbols are real exported names, function names, and global variable names.
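The same set logic can be expressed in a few lines of Python. A toy sketch, with made-up symbol names standing in for the *-syms.txt files and an abridged version of the filter regex above:

```python
import re

# Toy stand-ins for the three symbol lists extracted with awk/nm above.
efg_syms = {"tdts_engine_init", "ubnthal_get_board", "tcp_v4_rcv",
            ".L123", "__func__.0"}
vanilla_syms = {"tcp_v4_rcv"}
bsp_syms = {"tcp_v4_rcv"}

# Compiler-generated local labels carry no structural information; drop them.
LOCAL_LABEL = re.compile(r"^(\.L|__func__\.|__warned\.|CSWTCH\.)")

# Symbols present in the EFG kernel but in neither public source tree.
unique = sorted(s for s in efg_syms - (vanilla_syms | bsp_syms)
                if not LOCAL_LABEL.match(s))
print(unique)  # ['tdts_engine_init', 'ubnthal_get_board']
```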

Top-level breakdown by name prefix:

$ awk -F'_' '{print $1}' /tmp/efg-unique-real-syms.txt | grep -v "^\." \
    | sort | uniq -c | sort -rn | head -20

   2646 (no prefix or various)
    799 drm
    195 bond
    116 tdts
    113 wg
    104 my
     66 fsv
     59 ppp
     51 mlxsw
     46 shell
     45 ubnthal
     44 proc
     44 get
     42 dev
     42 bonding
     33 tm
     32 nf
     30 tcp
     29 pppoe
     27 ppu

Note: the drm count includes graphics-driver code that may have come from a source other than vanilla or the BSP (Ubiquiti drives the EFG's front-panel LCD through a MediaTek display panel). The wg (WireGuard) count likely reflects an upstream backport. The tdts, tm, ubnthal, and DPI-related nf counts are the diagnostic ones.

A.10 — GPL Source Request

The following text was sent to [email protected]:

Subject: GPL Source Request — Enterprise Fortress Gateway (EFG) Kernel Source

I am the owner of an Ubiquiti Enterprise Fortress Gateway (EFG) running firmware version [version], with kernel version 5.15.72-ui-cn9670. Per the terms of GPL-2.0, I am formally requesting the complete corresponding source code for this firmware's GPL-licensed components, including but not limited to:

  1. The complete Linux kernel source tree corresponding to 5.15.72-ui-cn9670, including:
    • The base kernel source
    • All patches applied by Ubiquiti and any third parties (Marvell, Trend Micro, etc.)
    • The kernel build configuration (.config)
  2. The Marvell OCTEON CN9670 BSP drivers (octeontx2_pf, octeontx2_vf, octeontx2_af, rvu_*, NIX, CPT, SSO, NPA)
  3. Source code for any GPL-licensed kernel modules including those tagged with the GPL/GPL-compatible MODULE_LICENSE() declarations
  4. The device tree files (.dts, .dtsi) used by the firmware
  5. The build system, packaging recipes, and toolchain specification (compiler version, flags) sufficient to reproduce the binary
  6. Any other GPL components in the firmware (busybox, systemd, etc.)

Per GPL-2.0 §3, this source must be made available under the same license, in a form accessible to me. Acceptable delivery: a downloadable archive, a public git repository link, or physical media at cost.

[contact details]

The escalation path documented in Section 13.3 applies if no response is received.

@ftmiranda

Fantastic summary! I wish Ubiquiti engineering and product management would actually READ all this!

@mmx01

mmx01 commented May 2, 2026

Kernel change is probably not viable, but are other settings customizable? Since they reside in software, can we tune/change the behavior, and at least make the changes persist across reboots?

I was thinking about the Beast for inter-VLAN routing in exactly the same scenario: large file transfers for storage/video processing/rendering, etc. Not IOPS-heavy, but raw transfer throughput counts.

@joydashy

joydashy commented May 2, 2026

Thanks for the writeup! I really wonder why Ubiquiti is hampering their hardware like this.

@galvesribeiro
Author

> Fantastic summary! I wish Ubiquiti engineering and product management would actually READ all this!

They have all of this in MUCH more detail. I had a 10+-month-old support ticket where I shared multiple documents like this one, including a DPDK implementation. They don't do it because, well, they don't want to.

@galvesribeiro
Author

> Kernel change is probably not viable, but are other settings customizable? Since they reside in software can we tune/change the behavior to at least persist reboots?

Why not? They have the kernel from Marvell. They can recompile it and ship it in a UniFi OS update. As a matter of fact, the current kernel IS custom-built. And no, we cannot make the changes ourselves, because most of them depend on kernel modules and kernel options that they explicitly disabled.

> I was thinking about Beast for inter-vlan routing in exactly same scenario - large file transfers for storage/video processing/rendering etc. Not IOPS heavy but raw transfer throughput counts.

Yes, single streams are used in large storage replication, databases with batch/heavy workloads, file transfers to a NAS, you name it. All support tells us is "run iperf3 with multiple streams". Well, I did, as you saw in this post, and they don't show anywhere near the marketed numbers even in that case.

They've chosen to be slow. That is what is mind-blowing.

@mmx01

mmx01 commented May 2, 2026

> Why not? They have the kernel from Marvell. They can recompile and update it with an UnifiOS update. As a matter of fact the current kernel IS custom built. Also no, we can not make the changes because most of them depends on kernel modules and kernel options where explicitly disabled by them.

Not speaking of them; in case they do nothing, what is possible for us? Can we push this hardware on our own? We've gotten to the point of compiling/loading modules even on ESXi, even when not natively supported. I think I know the answer, but... shame. There are not a lot of options for this bandwidth, and this device is not cheap either. One would expect more, for a reason.

@galvesribeiro
Author

> Why not? They have the kernel from Marvell. They can recompile and update it with an UnifiOS update. As a matter of fact the current kernel IS custom built. Also no, we can not make the changes because most of them depends on kernel modules and kernel options where explicitly disabled by them.

> Not speaking of them, speaking in case they do nothing, what is possible for us.

Oh! There is nothing we can do, unfortunately.

First, because the filesystem is built on an overlay, and parts of it are either read-only or ephemeral and will be overwritten at startup.

Second, we would need a way to overwrite that filesystem with a custom-built kernel that has the proper device tree for the SoC compiled in, along with the drivers and modules. Doable, with a lot of work, on the devices that keep the OS on NVMe, but it's a big amount of work. Not to mention that after every single firmware update we would have to redo it all. Hard to maintain something like that.

@mmx01

mmx01 commented May 2, 2026

Sadly I touched a bit on the device trees for embedded devices with DR9074-6E where without fusing modules defaulted to 5G over 6G while actually being tri-band. Also IPQ6018, very quickly we go to NDA zone for any docs/toolchain.

Maybe someone will figure out a way to use an external I2C flash as a boot device, but at this price point we should not even have to go there. Thanks for your work; it saves quite some $$$$.

@galvesribeiro
Author

galvesribeiro commented May 2, 2026

The kernel source is under the GPL, so anything they tie into it should be open source. Ironically, I went to look for the kernel source download on their site and it is not there anymore, and I found this 10-year-old issue where people were stuck with an "under processing" reply from the UI team for a year: https://community.ui.com/questions/Requesting-kernel-source-code/e63d2c10-e1a9-45ce-b8cf-859670fcf216?reply=16

Now they have this on their site:

Certain releases are no longer available due to security and/or regulatory requirements. We always recommend running the latest software to ensure optimal network performance and security. If you require an unlisted release, please contact Ubiquiti Support.

Following your suggestion, I did another check on the EFG and reviewed the gates that could potentially block loading custom modules. To my surprise, it seems to be possible:

  • Kernel config exposed: ✅ OPEN
    /proc/config.gz works. We have the exact kernel build configuration.
  • Module signing: ✅ OPEN
    sig_enforce = 0. Unsigned modules will load. This is the killer feature.
  • Lockdown: ✅ OPEN
    /sys/kernel/security/lockdown doesn't exist (the file would show a value if the lockdown LSM were built in and active). The kernel is not in lockdown mode.
  • Filesystem: ✅ WORKABLE
    Root is overlayfs with /mnt/.rwfs/data as upperdir, mounted rw. You can write to /lib/modules/... and the changes persist in the overlay's /persistent partition. They survive reboots until a firmware update.
  • Boot chain: ⚠️ PARTIALLY GATED
    ATF (ARM Trusted Firmware) and U-Boot are present, with TrustZone visible (Trusted OS resident on physical CPU 0x0). This means kernel/initrd images are likely signature-verified at boot. But we don't need to touch that — the kernel is already running and accepts unsigned modules. We don't need to replace the kernel; we just need to add modules to it.
  • Required modules: ❌ NOT SHIPPED
    nf_flow_table.ko is not in /lib/modules. So we have to build it ourselves.
  • Toolchain: ⚠️ PARTIAL
    make is present. No gcc or cc. We'd need to cross-compile on another machine.

So it is possible to get this to work if we get the kernel source. I guess I need to open a support ticket.

@galvesribeiro
Author

Let's see where we get. I'll check the code once I get my hands on it and see what I can do.

image

@scyto

scyto commented May 3, 2026

Nice. Just a note: on my EFG, my testing was able to get 6 Gbps LAN <> VLAN using iperf3 (but only with multiple streams). This was back when the EFG released in 2024; I haven't tried recently. At the time I raised it with support, they said it was by design and tried to palm me off with some nonsense reason. I let it go, as I chose to run a flat LAN.

But I love this analysis; thanks for doing it.

I have a Beast; I am a little unsure how to repro on that... I forgot what I did 2 years ago...

I just asked support to reopen my enterprise support ticket from 2 years ago on exactly this issue and provided this gist

@galvesribeiro
Author

I've just updated the post with sections 11 onward. I tried to build the modules from both the vanilla kernel and the Marvell BSP, but neither worked: both attempts ended in a kernel panic. This led me to a surprise: Ubiquiti made what appear to be GPL-violating changes, and they haven't made the code available.

@mmx01

mmx01 commented May 3, 2026

Really interesting, and it actually seems viable in the end, just at varying levels of difficulty!

Persistence between reboots would already be good enough; we could live with the annoyance of redoing the mods after FW updates, or even script/automate them, and no reasonable person should have those on auto-update :)

As I mentioned, I was not successful with a similar OSS issue with Qualcomm; they publish bits and pieces, pretending to comply with the GPL. With some of the documentation you can even compile things successfully, but suddenly there is no firmware blob for this or that, mainly drivers/modules for hardware-acceleration features. That was for the OpenWrt adoption of certain Qualcomm devices. A limited NSS backport was the best outcome so far, or no HW acceleration at all. This, or a years-old QSDK... QDTE for device tree configs, etc.

Staying tuned for UniFi's feedback about this.

@galvesribeiro
Author

Well, I'm not trying to replace the kernel itself, so at runtime the blobs are not that important. If we get the source with their patches (which we should, for the reasons I explained in the post update), then that should be enough to produce modules that are ABI-compatible.

@scyto

scyto commented May 3, 2026

Allowing arbitrary modules to be loaded is a little concerning. An attacker could use your approach to crash the box, and eventually find a way to inject malicious code without a crash, no? (I mean, great that it allowed your analysis, but that needs to be fixed too, right?)

@galvesribeiro
Author

> the allow arbitrary modules to be loaded is a little concerning - an attacker could use you approach to crash the box and find a way to inject malicious code without a crash eventually, no? (i mean great it allowed your analysis, but that needs to be fixed too, right?)

Oh yeah, absolutely. But I tried to report similar security issues through their disclosure program before, and all I was told was "it is not a problem, the attacker would need SSH access to the device, so we are not rewarding this one", so I gave up.

At least for the purposes of this research it was a good thing, but it is really concerning that the most security-critical device on your network has neither secure boot nor kernel module signature validation, and that the whole FS, although using an overlay, is fully RW, even the sensitive parts.

@mmx01

mmx01 commented May 3, 2026

Let's not push them before we manage to achieve anything with this sweet project :)

@galvesribeiro
Author

galvesribeiro commented May 3, 2026 via email

@mmx01

mmx01 commented May 3, 2026

Interesting. Has anyone put out any debug output from a Beast yet? Kernel, options? Probably not too many of them around.

@scyto

scyto commented May 3, 2026

On the unneeded conntrack helpers that supposedly can't be turned off: I knew something was bothering me about that...

Did you run your test after disabling them here, and confirm they were still loaded?

It seems to work for me (this is on a Beast, FW 5.1.10):

root@Home:~# lsmod | grep nf_conntrack | grep -v '^nf_conntrack '
root@Home:~# 

Though I note conntrack still shows a count, so there must be more than these helper modules?

root@Home:~# sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 1048576
root@Home:~# sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 2090

This is in the settings tab of the gateway's properties (I think they moved it here in the last 6 months; it used to be in the old firewall settings, IIRC. As a professional product manager, no, I have no clue why they would move firewall settings here instead of keeping them, ya know, in the firewall UI, smh).

image

@galvesribeiro
Author

@scyto "To Caesar what is Caesar's": indeed, there it is. It is hidden in such a weird spot, as you said, that I hadn't found it. Disabling it does make the modules go away. I've updated the post with it. Thank you!

@galvesribeiro
Author

I also added some mentions of their bug bounty program on HackerOne and my previous experience with it.
