Kernel panic on Linux 7.1 — root cause: MediaTek MT7925 Wi‑Fi driver list corruption

Date: 2026-05-31
Machine: HP ZBook Ultra G1a 14" (AMD Strix Halo / Ryzen AI Max), BIOS X89 Ver. 01.04.03 (2025-12-03)
Crashing kernel: linux-image-7.1-amd64 version 7.1~rc5-1~exp1 (Debian experimental release candidate)
Working kernel (reverted to): 6.19.14+deb14-amd64


Verdict

The panics are a software bug in the in‑tree MediaTek mt7925/mt76 Wi‑Fi driver in the 7.1‑rc5 kernel — not a hardware fault. The driver corrupts an internal linked list (sta_poll_list) while processing Wi‑Fi TX‑status reports; the kernel's list‑hardening check catches the corruption and, because it happens in interrupt/NAPI context, it escalates to an unrecoverable panic.

Reverting to 6.19.14 is the correct mitigation — that kernel does not have the regression.

The smoking gun

From the EFI‑pstore crash dump (/var/lib/systemd/pstore/…, archived copies in /tmp/panic-report/):

slab kmalloc-8k start ffff8a7ae2bf2000 pointer offset 4160 size 8192
list_add corruption. prev->next should be next (ffff8a74cfe088f8),
                     but was ffff8a7ae2bf3040. (prev=ffff8a7ae2bf3040).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:32!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 12  Comm: napi/phy0-0   Tainted/Not tainted  7.1-amd64  Debian 7.1~rc5-1~exp1
RIP: 0010:__list_add_valid_or_report+0xa6/0xb0

Call Trace:
  __list_add_valid_or_report          <- list-hardening BUG (CONFIG_DEBUG_LIST)
  mt76_wcid_add_poll        [mt76]            <- adds station to dev->sta_poll_list
  mt7925_mac_add_txs.part.0 [mt7925_common]   <- while handling a TX-status report
  mt7925_rx_check           [mt7925_common]
  mt76_dma_rx_poll          [mt76]
  mt792x_poll_rx            [mt792x_lib]
  __napi_poll → napi_threaded_poll_loop → kthread   (threaded NAPI, softirq ctx)

Kernel panic - not syncing: Fatal exception in interrupt

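Mechanism: mt76_wcid_add_poll() adds a station's poll_list node to sta_poll_list, but the list is already inconsistent at that point: the report shows prev->next equal to prev itself (ffff8a7ae2bf3040) instead of the expected next entry (ffff8a74cfe088f8). A self-pointing node is what a freshly initialized list_head looks like, so the entry appears to have been re-initialized while still linked and then added again, i.e. a double-add. The list sanity check at lib/list_debug.c:32 (CONFIG_DEBUG_LIST) detects this and executes ud2 (invalid opcode 0f 0b). Because the fault occurs inside the threaded-NAPI RX poll, which runs with bottom halves disabled and therefore counts as interrupt context, the Oops becomes "Kernel panic - not syncing: Fatal exception in interrupt".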
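The hardening message itself states which invariant failed, so it can be classified mechanically. As a minimal sketch, the hypothetical helper below (classify_list_add_report is an illustrative name, not part of any kernel tooling) reads a log on stdin, extracts the three pointers from a "list_add corruption" report and says whether prev->next equals prev. A self-pointing node is what INIT_LIST_HEAD() leaves behind, so "self-linked" is consistent with a node that was re-initialized while still on the list. It does not prove it.

```shell
# Hypothetical helper, not kernel tooling: classify a "list_add corruption"
# report read from stdin by comparing the reported pointers.
# If the input holds several reports, the last one wins.
classify_list_add_report() {
    # The report may be wrapped over two lines, so join all lines first;
    # the trailing echo terminates the joined line for sed.
    fields=$( { tr '\n' ' '; echo; } | sed -n \
        's/.*prev->next should be next (\([0-9a-f]*\)), *but was \([0-9a-f]*\)\. *(prev=\([0-9a-f]*\)).*/\1 \2 \3/p' )
    [ -n "$fields" ] || return 1   # no report found in the input
    set -- $fields                 # $1 = expected next, $2 = prev->next, $3 = prev
    if [ "$2" = "$3" ]; then
        echo "self-linked prev=$3 expected-next=$1"
    else
        echo "foreign-pointer prev=$3 prev->next=$2 expected-next=$1"
    fi
}
```

Fed the two report lines from the dump above, it prints "self-linked prev=ffff8a7ae2bf3040 expected-next=ffff8a74cfe088f8".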

Why journalctl showed nothing: a panic in interrupt context never lets journald flush to disk. The only record is the firmware's EFI‑pstore, which systemd-pstore archived under /var/lib/systemd/pstore/. journalctl --list-boots only shows the 6.19 recovery boots, not the 7.1 crash boots.
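To find the archived records without opening every file by hand, something like the following could be used. It is only a sketch: find_pstore_reports is a hypothetical name, the default directory is the systemd-pstore archive mentioned above, and the default search string is the first words of the hardening message.

```shell
# Hypothetical helper: list files under a pstore archive directory that
# contain a given kernel message (fixed string, searched recursively).
find_pstore_reports() {
    dir=${1:-/var/lib/systemd/pstore}
    pattern=${2:-list_add corruption}
    grep -rlF -- "$pattern" "$dir" 2>/dev/null | LC_ALL=C sort
}
```

The archive directory is root-only, so the function has to run in a root shell to see anything there.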

Trigger

Both panics correlate with Wi-Fi roaming: repeated re-association between two BSSIDs of the same SSID (f4:92:bf:2d:45:55 and f4:92:bf:2e:45:55) immediately precedes the corruption. Station-table / MLO-link churn during roaming exercises the buggy TXS → poll-list path.

Ruled out

  • RustDesk (mouce-library-fake-mouse, RustDesk UInput Keyboard) floods the log with x86/split lock detection … bus_lock trap warnings and set the W taint in one crash. This is harmless noise.
  • The clean reproduction rules RustDesk out and points at the Wi-Fi driver: a second crash was marked Not tainted and panicked only 248 s after boot with the identical RIP and mt76/mt7925 call trace, with no RustDesk and no prior warning.
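Whether two dumps really share the same RIP and the same driver modules can be checked mechanically. A rough sketch, assuming each dump has been saved as a text file in the format of the crash excerpt above; crash_signature and same_crash are hypothetical helper names.

```shell
# Hypothetical helper: print the faulting symbol and the modules named in a dump.
crash_signature() {
    # Faulting symbol from the RIP line, without segment and offsets.
    sed -n 's/.*RIP: [0-9a-f]*:\([A-Za-z0-9_.]*\)+0x.*/\1/p' "$1" | head -n 1
    # Module names shown in square brackets in the call trace, de-duplicated.
    grep -o '\[[a-z][a-z0-9_]*\]' "$1" | sed 's/[][]//g' | LC_ALL=C sort -u
}

# Succeeds when two dump files produce the same signature.
same_crash() {
    [ "$(crash_signature "$1")" = "$(crash_signature "$2")" ]
}
```

This compares only the symbol and the module set, not the full trace, so a match is a quick sanity check rather than proof that both crashes took the same path.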

Hardware / environment

Wi-Fi: MediaTek MT7925 (Filogic 360, Wi-Fi 7), PCI 14c3:7925, drivers mt7925e / mt7925_common / mt792x_lib / mt76
GPU: AMD Strix Halo Radeon 8050S/8060S (1002:1586)
NVMe: SanDisk WD_BLACK SN7100
Kernel install: 7.1-rc5 installed 2026-05-29 18:08 (upgraded rc4→rc5) from Debian experimental

⚠️ The metapackage linux-image-amd64 itself was upgraded to 7.1~rc5-1~exp1, so apt is tracking experimental — the next apt upgrade will pull another RC kernel and could reintroduce the crash.

Upstream status

  • The bug class is known and being actively fixed, but NOT merged into 7.0 or 7.1‑rc5.
    • Zac Bowling, "wifi: mt76: mt7925: MLO stability fixes" (PATCH v7 0/6, linux‑wireless/LKML, 2026‑01‑29) — patch 1/6 = "fix double wcid initialization race condition", the strongest match for this double‑add corruption.
    • Out‑of‑tree fixes + DKMS: https://github.com/zbowling/mt7925 (still maintained precisely because the fixes aren't upstream yet).
  • Already in 7.1 (so NOT the fix here): "do not add non‑sta wcid entries to the poll list" (AUTOSEL 6.16) and Aug‑2025 "fix list corruption" patches.
  • This exact signature is undocumented: not in zbowling's KNOWN_ISSUES.md, not OpenWrt mt76 #909 (paging fault in mt7925_mac_sta_add — different) or #1023 (firmware‑load strnlen overflow — different). The clean mt7925_mac_add_txs → mt76_wcid_add_poll list_add trace appears to be a new data point. (Debian's RC kernels build with CONFIG_DEBUG_LIST, which is why the corruption surfaces as a precise BUG here rather than a random later crash.)

Recommendations

  1. Stay on 6.19.14 (already done). Set it as the GRUB default so a stray reboot doesn't land on 7.1.
  2. Stop experimental from auto‑installing RC kernels — remove 7.1 and/or pin linux-image-amd64 back to stable/trixie:
    sudo apt remove linux-image-7.1-amd64 linux-headers-7.1-amd64
    # then fix apt sources/preferences so linux-image-amd64 tracks stable, not experimental
    
  3. Confirm the match (optional, decisive): test Zac Bowling's v7 series / DKMS on 7.1‑rc5.
    • Crash stops → it is the double‑wcid‑init race; report "confirmed" to help it merge.
    • Crash persists → distinct unfixed variant; file a fresh upstream report.
  4. Reporting: don't open a generic "mt7925 crashes" bug (covered). Do:
    • contribute this backtrace to the active effort (github.com/zbowling/mt7925 issue, and/or the linux‑wireless thread; CC mt76 maintainers Felix Fietkau / Lorenzo Bianconi and MediaTek Sean Wang / Deren Wu);
    • file a Debian bug (reportbug linux-image-7.1-amd64) — an experimental RC kernel hard‑panicking on common HP hardware; they can hold the RC or backport the fix.
  5. If you must run 7.1, the only reliable workaround until patched is to avoid the mt7925 RX path (blacklist mt7925e, use a USB Wi‑Fi adapter). No tunable fixes the list corruption itself.
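Recommendations 2 and 5 come down to two small configuration fragments. The sketch below only prints them, so they can be reviewed before being written anywhere. The function names and the target file names in the comments are hypothetical, and the pin assumes the RC kernels come from a suite that APT sees as a=experimental.

```shell
# Print an APT preferences stanza that stops kernel packages from the
# experimental suite being selected for installation.
# Possible target (hypothetical name): /etc/apt/preferences.d/no-experimental-kernel
print_kernel_pin() {
    printf '%s\n' \
        'Package: linux-image-* linux-headers-*' \
        'Pin: release a=experimental' \
        'Pin-Priority: -1'
}

# Print a modprobe.d fragment that keeps mt7925e from being auto-loaded.
# Possible target (hypothetical name): /etc/modprobe.d/blacklist-mt7925e.conf
print_mt7925_blacklist() {
    printf '%s\n' 'blacklist mt7925e'
}
```

After installing the pin, apt policy linux-image-amd64 shows which version APT would now pick. A negative priority only stops experimental versions from being installed; it does not downgrade a package that is already installed. Likewise, blacklist only prevents automatic loading by alias, so an explicit modprobe mt7925e still loads the driver.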

Evidence locations

  • Raw firmware crash records: /var/lib/systemd/pstore/<epoch>/…/dmesg.txt (root‑only)
  • World‑readable copies: /tmp/panic-report/crash-*.txt
  • Collection script: /tmp/panic-investigate.sh
  • Full extracted dump: /tmp/dump.log