Purpose. This is the pre-Phase-3 research artifact mandated by docs/CYW43-REWRITE.md §5.9.2. It documents the exact iovar, wire-format, chip-compatibility, and reliability-context for PMKSA cache management on the CYW43439 chip as used in Raspberry Pi Pico W.
Status. Produced 2026-04-20 from first-party sources (Linux kernel tree, pico-sdk issue tracker, cyw43-driver upstream). Sign-off for Phase 3 PMKSA coding: ready.
TL;DR
The PMKSA iovar on CYW43439 is pmkid_info, plain (not bsscfg:-prefixed). Verified from Linux brcmfmac source.
Our blob is firmware 7.95.61 from 2023-01-11 (identical to Embassy's shipped firmware; verified from strings output on both). Its WLC version is below 12.0, so we use the legacy API; V2 (WLC ≥ 12.0) and V3 (WLC ≥ 13.0) require newer firmware. The legacy API is adequate for everything we need.
Correction note 2026-04-20: an earlier draft of this document stated our blob was 43439A0_7_95_49_00 (2018-era). That was the pico-sdk reference's bundled version; our actual src/cyw43/firmware/43439A0_combined.bin is 7.95.61. The 43439A0_combined.bin filename is just a label.
Legacy payload: exactly 356 bytes = __le32 npmk + 16 × {u8 bssid[6]; u8 pmkid[16]}. No padding, no bsscfg prefix.
Flush = memset to zero, send the whole 356-byte buffer. That's it.
Important framing correction. The reliability issue most commonly reported on Pico W (pico-sdk #2153) is not a stale-PMKSA problem — it is an ICV_ERROR event flood triggered by missed rekeys under power-save mode. That issue was fixed upstream by adding pend_rejoin on event 49 (cyw43-driver PR #130, merged Jan 2025). Our current Zig driver is missing that handler. The rewrite already commits to handling it via the join state machine (§6.1).
PMKSA clear_on_boot is still worth doing — cheap (one 356-byte iovar write at end of wifiOn), addresses an independent 802.11 failure mode (AP-side cache-key drift after AP reboot), has no documented downside. But it is not the silver-bullet fix for the most-cited Pico W reliability bugs. The plan now reflects both fixes, and orders them: ICV_ERROR handler (high value, free) first, PMKSA clear (medium value, cheap) second.
brcmf_fil_iovar_data_set is brcmfmac's standard iovar-set path; it does not apply a bsscfg prefix. Contrast with e.g. bsscfg:event_msgs in cyw43-driver, where the bsscfg prefix is part of the name string. For pmkid_info the name is literally "pmkid_info" with no prefix.
Direction: both read and write (brcmf_fil_iovar_data_set for SET, brcmf_fil_iovar_data_get for GET). For clear-on-boot we only need SET.
Maps to WLC command: WLC_SET_VAR (263) for SET, WLC_GET_VAR (262) for GET. Same wire framing as every other iovar in the reference cyw43-driver.
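As a minimal sketch of that shared framing, assuming the usual layout of iovar name bytes, one NUL terminator, then the raw payload: the helper name `iovar_pack` and its signature are hypothetical and are not taken from either driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lay out a SET_VAR/GET_VAR buffer as
 * name bytes, one NUL terminator, then the raw payload.
 * Returns the total length, or 0 if the output buffer is too small. */
size_t iovar_pack(uint8_t *out, size_t out_cap,
                  const char *name,
                  const uint8_t *payload, size_t payload_len)
{
    size_t name_len = strlen(name) + 1;   /* include the NUL */

    if (name_len + payload_len > out_cap) {
        return 0;
    }
    memcpy(out, name, name_len);
    if (payload_len > 0) {
        memcpy(out + name_len, payload, payload_len);
    }
    return name_len + payload_len;
}
```

For the clear-on-boot case the name is the plain string "pmkid_info" (10 characters plus NUL) followed by the 356-byte payload, so the packed buffer would be 367 bytes under this assumption.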
2. Chip compatibility
CYW43439 is explicitly recognized by brcmfmac as CY_CC_43439_CHIP_ID. From S3 (feature.c):
if (drvr->bus_if->chip != BRCM_CC_43430_CHIP_ID &&
    drvr->bus_if->chip != BRCM_CC_4345_CHIP_ID &&
    drvr->bus_if->chip != BRCM_CC_43454_CHIP_ID &&
    drvr->bus_if->chip != CY_CC_43439_CHIP_ID)
        brcmf_feat_iovar_data_set(ifp, BRCMF_FEAT_GSCAN, ...);
The chip is in the "legacy family" group along with 43430 / 4345 / 43454 — they share firmware lineage. Any feature supported on one is almost always supported across the group. pmkid_info has been present in brcmfmac's PMKSA flow for years (the legacy API predates both V2 and V3), so it is expected to be present on every firmware vintage shipped with these chips.
Firmware with WLC version ≥ 12.0 supports PMKID_V2.
Firmware with WLC version ≥ 13.0 supports PMKID_V3.
Firmware with WLC version < 12.0 supports legacy only.
Our blob vintage is 7.95.61 (Jan 2023, confirmed via version string dump). Firmware 7.x families ship with WLC versions well below 12 — so we are on the legacy PMKID API. The pico-sdk shipping cyw43-driver makes no attempt to negotiate V2/V3 at all, consistent with legacy being the only applicable API for this chip family.
Conclusion: our chip+blob uses the legacy API. V2 is never implemented by brcmfmac anyway (TODO: implement PMKID_V2 throughout cfg80211.c). V3 is the future-compatible path but not relevant to us.
2.2 Negotiation probe (optional)
brcmf_feat_is_enabled(ifp, BRCMF_FEAT_PMKID_V3) is the host-side flag. The firmware itself does not announce capability; the driver reads the WLC version via wlc_ver iovar and sets the feature flag accordingly. Our Zig port does not need to replicate this — we can hard-code legacy for the committed blob, and treat any firmware-upgrade as an R17 revalidation.
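The host-side decision described above reduces to a threshold check on the WLC major version. The sketch below is illustrative only: the enum and function names are hypothetical, and the thresholds are the ones listed in this section (12 for V2, 13 for V3).

```c
#include <stdint.h>

/* Hypothetical names; thresholds as listed in this section. */
typedef enum {
    PMKID_API_LEGACY,
    PMKID_API_V2,
    PMKID_API_V3
} pmkid_api_t;

/* Pick the PMKID API generation from the WLC major version. */
pmkid_api_t pmkid_api_for_wlc_major(uint16_t wlc_ver_major)
{
    if (wlc_ver_major >= 13) {
        return PMKID_API_V3;
    }
    if (wlc_ver_major >= 12) {
        return PMKID_API_V2;
    }
    return PMKID_API_LEGACY;
}
```

Because the committed blob is pinned, a port can skip this probe and hard-code the legacy result, as the paragraph above suggests.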
Alignment: the __le32 npmk is naturally 4-byte aligned; u8 bssid[6] and u8 pmkid[16] are byte-array fields so no padding. The C compiler emits exactly 356 bytes for this struct on every platform we care about.
3.1 Operation semantics (legacy API, from S1)
| Operation | How it's built |
| --- | --- |
| Flush (clear-all) | memset(&pmk_list, 0, sizeof(pmk_list)), i.e. npmk = 0 and all 352 bytes of the entry array zero. Send. |
| Add an entry | Find slot where bssid matches (or first free slot up to npmk). Write bssid + pmkid (16 bytes). If new slot, npmk += 1. Send the whole 356-byte list. |
| Remove an entry | Find slot with matching bssid. Shift all subsequent entries down by one. Zero last slot. npmk -= 1. Send the whole 356-byte list. |
All operations rewrite the whole list. This is not an efficiency concern at our scale (356 bytes, once per join/deauth) but is worth noting: there is no per-entry add/delete wire op in the legacy API. Each operation builds a complete list view and sends it atomically.
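The three operations can be sketched against an in-memory list as follows. This is an independent illustration written from the description in this section, not code from any driver; the type and function names are hypothetical, and `npmk` is held in host byte order here (it must be little-endian on the wire). The static assertions check the 22-byte entry and 356-byte list sizes.

```c
#include <stdint.h>
#include <string.h>

#define PMK_MAX 16

typedef struct {
    uint8_t bssid[6];
    uint8_t pmkid[16];
} pmk_entry_t;

typedef struct {
    uint32_t    npmk;            /* host order here; LE on the wire */
    pmk_entry_t entry[PMK_MAX];
} pmk_list_t;

_Static_assert(sizeof(pmk_entry_t) == 22, "entry must be 22 bytes");
_Static_assert(sizeof(pmk_list_t) == 356, "list must be 356 bytes");

/* Flush: npmk = 0 and every entry zeroed. */
void pmk_flush(pmk_list_t *l)
{
    memset(l, 0, sizeof *l);
}

/* Add or update. Returns 0 on success, -1 if the list is full. */
int pmk_add(pmk_list_t *l, const uint8_t bssid[6], const uint8_t pmkid[16])
{
    uint32_t i;

    for (i = 0; i < l->npmk; i++) {
        if (memcmp(l->entry[i].bssid, bssid, 6) == 0) {
            break;                      /* existing slot: overwrite */
        }
    }
    if (i == PMK_MAX) {
        return -1;
    }
    memcpy(l->entry[i].bssid, bssid, 6);
    memcpy(l->entry[i].pmkid, pmkid, 16);
    if (i == l->npmk) {
        l->npmk++;                      /* new slot was appended */
    }
    return 0;
}

/* Remove. Returns 0 on success, -1 if the BSSID is not present. */
int pmk_remove(pmk_list_t *l, const uint8_t bssid[6])
{
    uint32_t i;

    for (i = 0; i < l->npmk; i++) {
        if (memcmp(l->entry[i].bssid, bssid, 6) == 0) {
            break;
        }
    }
    if (i == l->npmk) {
        return -1;
    }
    /* Shift later entries down by one, then zero the vacated last slot. */
    memmove(&l->entry[i], &l->entry[i + 1],
            (l->npmk - 1 - i) * sizeof(pmk_entry_t));
    memset(&l->entry[l->npmk - 1], 0, sizeof(pmk_entry_t));
    l->npmk--;
    return 0;
}
```

After any of these calls the caller would send the whole 356-byte list, since the legacy API has no per-entry wire operation.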
3.2 Endianness
npmk is __le32 — little-endian u32. Must be serialized as LE from the host.
bssid and pmkid are byte arrays — no endianness.
There is NO length or version prefix in the legacy API (contrast with V2/V3 which have __le16 version + __le16 length headers).
3.3 The complete wire payload for clear-on-boot
356 bytes, all zero. No version field, no length field. The firmware interprets npmk = 0 as "flush all entries."
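A byte-level sketch of building that payload, with the count field written explicitly in little-endian order so the result does not depend on host endianness. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PMK_WIRE_LEN = 4 + 16 * (6 + 16) };   /* 356 bytes */

/* Write a 32-bit value least-significant byte first. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Build the legacy clear-all payload: npmk = 0, then 16 zeroed entries.
 * Returns the payload length, or 0 if the buffer is too small. */
size_t pmk_build_flush(uint8_t *out, size_t cap)
{
    if (cap < PMK_WIRE_LEN) {
        return 0;
    }
    memset(out, 0, PMK_WIRE_LEN);
    put_le32(out, 0);                  /* npmk = 0, no header before it */
    return PMK_WIRE_LEN;
}
```

For the flush case the explicit `put_le32` is redundant with the `memset`, but keeping it shows where a non-zero count would be serialized for add and remove.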
802.11 scenario: the CYW43439 firmware can retain a PMKSA cache entry across a host reboot if the host chip does not power-cycle the CYW43 (our case — the chip stays powered, only the RP2040 restarts with watchdog/UF2). On reconnect, the chip attempts fast-reauth using the cached PMKID. If the AP has since forgotten the PMKSA (AP reboot, session timeout, config change), the AP refuses the fast-reauth; the chip retries with stale state; reliability degrades.
Clearing the firmware cache at wifiOn (i.e. at every host boot) forces the first join in a session to be a clean 4-way handshake, side-stepping the "AP-forgot-us-but-chip-thinks-it-knows-us" state drift.
This scenario is inferred from 802.11 fundamentals, not documented as a specific reported Pico W bug.
4.2 What clear_on_boot does NOT address
The most-reported reliability issue on Pico W is NOT a stale-PMKSA issue. From S5 (pico-sdk #2153):
Symptom: ASYNC(0000,49,0,0,0) events flood the UART; device stops responding; WLAN.isconnected() still returns True.
Event 49 = CYW43_EV_ICV_ERROR = "integrity check value error on received frame."
Root cause: under power-save mode, the chip sleeps through a group-key rekey exchange. The AP gives up. The chip wakes with an outdated GTK, so every group-addressed (broadcast/multicast) frame it receives fails decryption. The chip never realises the link is dead because it missed, or never received, the deauth.
Three lines. On ICV_ERROR, queue a rejoin. The poll loop executes cyw43_ll_wifi_rejoin() which re-issues WLC_SET_SSID and forces a full fresh association.
This fix may or may not be present in our reference commit (dd7568… April 2024 vintage); that depends on the exact commit date. Regardless, our current Zig driver (src/cyw43/protocol/events.zig) drops event 49 via else => {} and does nothing about it. This is the most impactful reliability gap in our current driver.
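The queue-then-rejoin shape described above can be sketched as follows. This is not the upstream patch: the struct, the function names, and the counter that stands in for the real rejoin call are all hypothetical. The point it illustrates is that a flood of event 49 collapses into a single deferred rejoin.

```c
#include <stdbool.h>
#include <stdint.h>

#define EV_ICV_ERROR 49u   /* event number given in the text */

typedef struct {
    bool     pend_rejoin;    /* deferred action, serviced by the poll loop */
    unsigned rejoin_count;   /* stand-in for issuing the real rejoin */
} link_state_t;

/* Event-callback context: only record that a rejoin is wanted. */
void on_async_event(link_state_t *s, uint32_t event_type)
{
    if (event_type == EV_ICV_ERROR) {
        s->pend_rejoin = true;
    }
}

/* Poll-loop context: perform the deferred rejoin at most once per flag. */
void poll_once(link_state_t *s)
{
    if (s->pend_rejoin) {
        s->pend_rejoin = false;
        s->rejoin_count++;   /* a real driver would re-issue the join here */
    }
}
```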
4.3 Secondary issue
From S7 (pico-sdk #1373): first connection after reboot fails; second succeeds. Root cause: AP still has us in its association table; refuses a new connection until the stale entry times out or we explicitly issue cyw43_wifi_leave() before reboot.
Fix (observed): retry with 12 s timeout, OR call cyw43_wifi_leave() in a shutdown hook. The former is already the reference's happy-path under auto_reconnect. The latter is a host-level integration concern (bindings/wifi.zig).
PMKSA clear_on_boot does NOT fix this. Even with a clean PMKSA cache, the AP's own association table is the obstacle, not any key material.
// In ll/boot.zig or ctrl/state.zig near the end of wifiOn().
//
// Clear firmware-side PMKSA cache so the first join of this boot session
// does a clean 4-way handshake. Prevents "AP-forgot-us-but-chip-cached"
// state drift after AP power-cycle.
fn pmksaClearOnBoot(driver: *Driver) WifiError!void {
    // Legacy API payload: __le32 npmk = 0, 16 × {u8 bssid[6]; u8 pmkid[16]}
    // Total 356 bytes, all zero.
    var buf: [356]u8 = @splat(0);

    // Send as plain iovar (no bsscfg prefix).
    try ioctl.setIovar(driver, "pmkid_info", &buf);
}
Placement: at the end of wifiOn(), after WLC_UP has completed successfully and before any join() call is permitted. Driver.state transitions from wifi_up to wifi_up_pmksa_cleared; join() requires the latter.
Error handling:
Firmware returns 0 → great, done. State advances.
Firmware returns non-zero → record the status; this is an R17-adjacent signal (either the iovar is unsupported on our blob — unexpected given S3 — or the blob is newer than expected and we need V2/V3). For the P3b hardware-verification gate: a non-zero response is a Phase 3 validation failure. For post-verification runtime: log a warning and continue (degraded reliability rather than refusing to boot).
5.2 cache_in_boot (Phase 3 P3c, STRETCH)
In-boot caching adds:
A host-side (bssid, pmkid) → slot_index map, bounded at 16 entries (matches BRCMF_MAXPMKID).
On successful joined transition: query firmware for the current PMKID via pmkid_info GET; copy into host cache.
On DEAUTH_IND / DISASSOC_IND / ICV_ERROR for the current BSSID: evict that entry from the host cache and re-send the updated list to firmware.
Important: the PMKID produced by the 4-way handshake is firmware-generated, not host-computed. The host cache is a mirror; the firmware's view is authoritative at the wire level. We send the merged list back to firmware so the two views stay consistent.
Host cache is NOT persisted across reboots in V1 — flash-write support is not yet implemented (ISSUES.md open item #3). Persistent PMKSA is Phase B material at the earliest.
5.3 Phase 3 ordering
The plan §8.3 should be updated to reflect the research-derived ordering:
Highest-priority reliability fix: ICV_ERROR (event 49) → pend_rejoin. This is a 3-line addition in the join state machine's event dispatcher. Our plan already commits to exhaustive event decoding — this specifically must include event 49 with the queue-rejoin action.
Second priority: PMKSA clear_on_boot. One 356-byte pmkid_info write at the end of wifiOn (medium value, cheap).
Stretch: PMKSA cache_in_boot with DEAUTH eviction. Skip for P3; revisit in Phase B if soak data suggests reconnect-latency is a user-visible issue.
All three land in the Phase 3 commit range; the first two are gates for Phase 4 cutover.
6. Verification plan (on-hardware)
6.1 Does the iovar work on our blob?
30-minute test:
Build a minimal Phase-1-ish test harness (just boot + wifiOn + issue pmkid_info).
Inspect the iovar response status.
Expected: 0 (success). Anything else → consult S3's WLC version compatibility matrix and §5.9.4 fallback in the plan.
6.2 Does clear_on_boot change observed behavior?
Evidence of effectiveness (per §11.3 negative-proof gate):
Timing: reconnect latency after AP power-cycle, new-driver + clear_on_boot vs old-driver (which never issues the iovar). Measure association-to-keyed time across ≥10 trials. New driver should show latency consistent with full 4-way handshake — not the "retry-then-succeed" pattern seen in S7.
On-wire (preferred, requires monitor-mode NIC): capture 802.11 frames during reconnect. New driver's Association Request should have no PMKID in the RSN Information Element. Old driver may include a stale PMKID.
AP-log evidence (preferred if hostapd is the AP): hostapd logs should show no "PMKSA cache entry found for ..." message for our BSSID on the reconnect.
6.3 Does the ICV_ERROR handler work?
This is separate from PMKSA but worth noting:
Reproduce the pico-sdk #2153 scenario: put the chip in aggressive PM, wait for rekey, observe event 49 flood.
With the handler present: the first event 49 triggers pend_rejoin; the driver reconnects in ~5 s.
Without the handler: event 49 floods forever (current state of our Zig driver).
7. Open questions (deliberately deferred)
Q1: Does firmware auto-populate PMKSA cache on a successful handshake, without us calling pmkid_info SET? Best guess: yes, based on brcmfmac design. Verification: issue pmkid_info GET after a join and see whether npmk > 0. If yes, firmware is self-populating; we only need clear_on_boot to prevent carry-over. Phase 3 experiment.
Q2: Does clear_on_boot need to fire on every wifiOn, or only on the first one after a host reboot? The chip stays powered across a host-reboot-via-watchdog on Pico W, so firmware state is preserved — meaning clear_on_boot on every wifiOn is the right choice. (There is no Pico W scenario where the chip has been freshly powered but we skip the clear; both cases converge on "send the iovar at wifiOn.")
Q3: For an extreme-aggressive clear_on_every_join policy, is there a cost? Semantically: every join becomes a full 4-way handshake, so fast-reauth is lost. On a scale where joins happen at the rate of "once per session, maybe once per reconnect," the cost is negligible. On a theoretical mass-roam scenario (many APs, rapid roaming between them within a boot), fast-reauth matters — but that is not Pico W's use case.
Endianness: npmk is little-endian; rest is byte-arrays
Blob compatibility: CYW43439 explicitly supported via legacy API in brcmfmac; WLC<12.0 firmware uses legacy only
License boundary: brcmfmac is GPL-2.0; this research document cites algorithmic behavior and struct layouts as protocol evidence, not copied code. Zig implementation should be written from this spec without referencing brcmfmac source directly.
Hardware verification plan defined (§6)
Reliability-context framing corrected: PMKSA complements, does not replace, the ICV_ERROR handler
Ready for Phase 3 coding. The plan's §5.9.4 fallback path is still appropriate if outcome (1) does not pan out on hardware, but the expected outcome is (1) — the iovar works on our blob and clear_on_boot is a ~20-line addition.
Research conducted 2026-04-20. See docs/CYW43-REWRITE.md §5.9 for how these findings integrate with the larger rewrite plan.
State: plan produced, peer-reviewed (GPT-5.4, conversation pico-cyw43-rewrite-plan-2026). Ready to execute in subsequent coding sessions.
Goal: replace src/cyw43/ with a reference-quality pure-Zig driver that matches or exceeds pico-sdk/lib/cyw43-driver reliability.
Non-goals (this milestone): Bluetooth, SDIO transport, lwIP glue, AP-mode as default path. All enumerated as deferred phases in §8.
Primary regression gate: -Dengine=js UF2 byte-identical to .preflight-baseline/pico-preintegration.uf2 throughout the rewrite.
Secondary regression gate: wire-byte equivalence of the new driver against captured old-driver SPI traces at init / scan / join / idle (see §7.2).
Scope in lines: audit of ~5,400 reference C; delivery of ~2,800–3,500 Zig across ~18 new files. See §4.
Effort: 5 focused coding sessions of 6–10 hours each, across 2–4 weeks of hardware-iteration calendar time. See §8.
Highest risks: wire-format misalignment (endianness, padding, unaligned access on Cortex-M0+); event-ordering assumptions that differ from firmware’s actual delivery order; PMKSA iovar discovery on our blob vintage (now a mandatory deliverable — §5.9, R6); auth-retry improvements colliding with firmware-side retry logic. See §9.
Reading order. Read this document top-to-bottom before writing any code. Sections build on each other. If you are executing the plan, do not skip §3 (Zig idiom style guide) or §10.3 (attribution mechanics). The go/no-go checklist in §11 is a hard gate, not a suggestion.
Companion references.
AGENTS.md § "CYW43 Gotchas" — 7 hard-won gotchas (#15–#21) that apply to any driver.
ISSUES.md #25 — the 180 s UART corruption burst; the rewrite must collect data sufficient to identify the offending event.
docs/CYW43-PMKSA-RESEARCH.md — first-party-verified spec for the pmkid_info iovar (pre-Phase-3 research completed).
docs/NANORUBY.md + src/ruby/nanoruby/UPSTREAM.md — the template for this plan’s structure and for the vendoring-change-tracking discipline applied in §10.
Plan revision log. Key decisions made during planning, in reverse chronological order:
| Date | Decision | Record |
| --- | --- | --- |
| 2026-04-20 | GPT-5.4 final sign-off (peer-review turn 6) with 12 targeted tightenups all applied. Plan is "proceed to Phase 1" per peer. Tightenups: (a) WPA3 bool → Wpa3Mode enum with .auto default + fallback table; (b) event-mask blob-coupling note; (c) yieldDuringLongOp non-reentrancy rules (5 explicit); (d) EventLog tuple-key precision ({event_type, status, reason, auth_type, ifidx}); (e) PMKSA clear_on_boot lifecycle pinned to 5-step wifiOn flow with race-window rationale; (f) §6.1.5 rejoin-storm coalescing rule (prevents livelock under crypto-error bursts); (g) three-flag semantics pin-downs for WPA2/WPA3/open + failure-latch clearing; (h) logger-disabled soak variant for ISSUES.md #25 (separates event existence from UART print perturbation); (i) fixture-metadata sidecar schema with mandatory SHA checks (closes R17 loop); (j) §8.0.4 no-optimization-before-parity rule; (k) §8.1 host-interrupt-pin behavioral verification task; (l) §8.0.2 checkpoints labeled required vs conditional. | This row + 12 sections named. |
| 2026-04-20 | Unknown-event logging mechanism consolidated into authoritative §6.1.4. Replaces scattered fragments in §3.2, §3.4, §4.1, §5.8, §6.1 event-table last row, §8.2, and §11.3 with a single spec: full UnknownEvent struct (10 fields), 89-entry event-name table, EventLog ring buffer (16 entries, 5-s coalescing window), concrete log-line formats, Driver.getEventLog() query API, wifi events UART shell command, and specific ISSUES.md #25 resolution procedure. This is the component that closes the loop on any remaining spurious events.<br>Three-flag link-state model (auth_ok, join_ok, keyed) adopted, replacing pico-sdk's wifi_join_state bitmask. Cleaner to test; aligns with Embassy+soypat. | §6.1 |
| 2026-04-20 | WPA3-SAE mandated as Phase 3 deliverable, not optional. Firmware supports sae+mfp; reference driver has the join flow; no cost to include. wpa3_mode: Wpa3Mode = .auto by default per §5.10 (replaces earlier enable_wpa3: bool proposal after GPT-5.4 turn-6 review flagged bool-vs-enum conflation). | §5.10 |
| 2026-04-20 | Seven additional events added to the handler: MIC_ERROR (17), UNICAST_DECODE_ERROR (50), MULTICAST_DECODE_ERROR (51), PSM_WATCHDOG (41), PMKID_CACHE (21), GTK_PLUMBED (84), BCNLOST_MSG (31). Several are ignored by every reference driver; we're a step-change.<br>PSK_SUP reason=14 → IGNORE rule added. This fixes a real latent bug in earlier drafts; without it, roam events trigger spurious rejoins. | §6.1, §7.1 test matrix |
| 2026-04-20 | PMKID_CACHE event used for cache_in_boot sync (not polling). | §5.9.5 |
| 2026-04-20 | PMKSA reframed from time-boxed enhancement to hard Phase 3 deliverable (peer-review-approved override). | §5.9 |
| 2026-04-20 | PMKSA research artifact produced with verified iovar spec. | docs/CYW43-PMKSA-RESEARCH.md |
| (initial) | Peer-reviewed rewrite plan committed. | §1–12 core structure |
Section 1 — Executive summary
The pico project runs firmware on Raspberry Pi Pico W (RP2040 + CYW43439 2.4 GHz WiFi/BT combo). WiFi is provided today by src/cyw43/ — a ~2,200-line Zig driver that associates and carries TCP/TLS/MQTT in a happy-path workload but has documented reliability gaps:
| Gap | Symptom | Root cause | Fix in this plan |
| --- | --- | --- | --- |
| No auto-reconnect on DEAUTH/DISASSOC | Dropped link stays down until next TX fails. | Event handler has else => {} on most paths. | Exhaustive event decoder + join state machine §6.1. |
| Minimal PSK retry | Transient key-exchange edge misses are terminal. | No pend_rejoin deferred-action mechanism. | pend-flag machinery §6.1. |
| Stale AP state | Repeated 4-way handshake failures after router power-cycle. | | |
| | Silent link-death after ~minutes of PM2; isconnected() lies. | Event 49 dropped via else => {}. | Event 49 + MIC/UNICAST/MCAST_DECODE_ERROR + PSM_WATCHDOG in same pend_rejoin class (§6.1). |
| Spurious rejoin during roams | Unnecessary reassociation when AP hands off. | PSK_SUP reason=14 treated as failure. | Explicit IGNORE rule (§6.1 table). |
| WPA3-only APs break (AGENTS.md gotcha #28) | DEAUTH type=6 repeated on WPA3 networks. | No SAE join flow; mfp=1 hard-coded. | WPA3-SAE implementation (§5.10). |
What this plan produces. Not code. A document that a subsequent coding session — with access to this plan, AGENTS.md, the reference C, and the user-ai MCP — can follow step-by-step to produce a new src/cyw43_new/ pure-Zig driver that:
is file-by-file structured around protocol layers, not around the C source’s historical decomposition (§4);
ports behavior, not bit layout, of the reference state machines (§6);
decodes every event type exhaustively from day 1 (§6.4 + §9 mitigation for ISSUES.md #25);
surfaces a clean Zig instance API over a compat façade that preserves existing bindings/wifi.zig calls during migration (§5);
is validated against golden SPI-wire traces before any hardware cutover (§7.2);
carries rigorous attribution of reference-driver lineage (§10).
Scope cuts. Out of scope for this rewrite (each is a potential future phase, explicitly not required now):
SDIO transport (Pico W is SPI-only).
Bluetooth HCI (WiFi-only rewrite).
lwIP integration (we have our own TCP/IP in src/net/).
AP-mode as a default-reachable path. Internal architecture must preserve bsscfgidx / interface-id plumbing so AP-mode can land in a future phase without an API-wide refactor.
What "done" looks like. -Dcyw43=new is the default. The old tree is deleted. bindings/wifi.zig still works. The driver survives the acceptance matrix in §7.3, including router power-cycle (H2 — PMKSA clear_on_boot is what makes this pass) and forced deauth (H3). The JS-mode UF2 is still byte-identical to the pre-rewrite baseline.
Section 2 — Deep audit of the reference C driver
The reference is misc/pico-sdk/lib/cyw43-driver/ at commit dd7568229f3bf7a37737b9e1ef250c26efe75b23 (April 2024), under LICENSE.RP. Non-SPI / non-WiFi files (cyw43_sdio.*, cyw43_bthci_uart.c, cyw43_lwip.c, cyw43_stats.*, cyw43_btbus.h) are audited to the extent needed to carve the SPI+WiFi seam; they are not ported.
2.1 cyw43.h (733 lines — public API header)
Purpose. Public C API consumed by the pico-sdk integrator (cyw43_arch_*). Defines the cyw43_t top-level state struct and the full function surface.
Top-level struct cyw43_t (lines 108–152):
cyw43_ll_t cyw43_ll; — the low-level opaque state (array of u32 words, size CYW43_LL_STATE_SIZE_WORDS).
uint8_t itf_state; — bitmask: bit 0 = STA up, bit 1 = AP up.
Global state. Reference exposes cyw43_t cyw43_state;, void (*cyw43_poll)(void);, uint32_t cyw43_sleep; as extern (lines 154–156). In Zig we do not mirror this. The new driver is an explicit Driver instance passed into every function that needs it. The cyw43_poll function pointer shape is replaced by a method on the driver; cyw43_sleep becomes a field. See §3.8 (globals) and §5.
2.2 cyw43_ll.h (322 lines — low-level API)
Purpose. The "low-level" layer: the set of operations the mid-level driver (cyw43_ctrl.c) calls into. Also defines the on-wire event struct and scan-result struct.
IOCTL commands (lines 56–64). Bottom bit encodes SET vs GET: (cmd & 1) ? SDPCM_SET : SDPCM_GET, actual WLC_* command is cmd >> 1. Constants:
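The SET/GET bit encoding described here can be written as three small helpers. The helper names are hypothetical; the two WLC command numbers are the ones the PMKSA research gives for WLC_GET_VAR (262) and WLC_SET_VAR (263).

```c
#include <stdbool.h>
#include <stdint.h>

#define WLC_GET_VAR 262u
#define WLC_SET_VAR 263u

/* Pack a WLC command number and a SET/GET direction into one value:
 * the bottom bit is the direction, the remaining bits are the command. */
static inline uint32_t ioctl_encode(uint32_t wlc_cmd, bool is_set)
{
    return (wlc_cmd << 1) | (is_set ? 1u : 0u);
}

/* Recover the WLC command number. */
static inline uint32_t ioctl_wlc_cmd(uint32_t cmd)
{
    return cmd >> 1;
}

/* Recover the direction: true means SET, false means GET. */
static inline bool ioctl_is_set(uint32_t cmd)
{
    return (cmd & 1u) != 0;
}
```

Under this encoding a SET of WLC_SET_VAR is (263 << 1) | 1 = 0x20f and a GET of WLC_GET_VAR is 262 << 1 = 0x20c.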
Event types. The on-wire event_type field is big-endian u32. Table below enumerates every event our decoder specifically handles; the full 89-entry name table appears in §12.3. Events not in this table are decoded but log-only (with rate-limiting per §8.2).
Scope expansion from original reference. The pico-sdk C driver handles a narrow event set (~10 event types). The plan expands to 20+ after cross-referencing Embassy, soypat, and Linux brcmfmac for events the reference C ignores but which carry real reliability signal. The "Source" column indicates which reference surfaced the handling pattern:
All other events (32 ROAM_PREP, 37 ROAM_START, 36 JOIN_START, 38 ASSOC_START, 35 RESET_COMPLETE, 40 RADIO, etc.) are log-only with rate-limiting. Unknown event types fall through to the unknown-event handler per §2.4.9 and §8.2.
Event status values (lines 85–95). Generic across event types. SUCCESS=0, FAIL=1, TIMEOUT=2, NO_NETWORKS=3, ABORT=4, NO_ACK=5, UNSOLICITED=6, ATTEMPT=7, PARTIAL=8, NEWSCAN=9, NEWASSOC=10.
PSK_SUP supplicant states (lines 98–112). Carried in the event status for PSK_SUP:
Reference uses status 4 | 8 | 10 with reason 15 (SUP_WPA_PSK_TMO) as trigger for pend_rejoin. Any other non-KEYED non-timeout is terminal-BADAUTH.
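A sketch of that reference rule as a pure classifier, which makes it easy to table-test. The enum and function names are hypothetical; status 6 is taken to be the KEYED state, as in the state diagram in §2.3. The plan's own reason=14 IGNORE rule is deliberately not included, since this models the reference behavior only.

```c
#include <stdint.h>

typedef enum {
    PSK_ACTION_KEYED,         /* handshake complete */
    PSK_ACTION_PEND_REJOIN,   /* timeout mid-handshake: retry */
    PSK_ACTION_BADAUTH        /* anything else: terminal */
} psk_action_t;

/* Classify a PSK_SUP event by its status and reason fields. */
psk_action_t classify_psk_sup(uint32_t status, uint32_t reason)
{
    if (status == 6u) {
        return PSK_ACTION_KEYED;
    }
    if ((status == 4u || status == 8u || status == 10u) && reason == 15u) {
        return PSK_ACTION_PEND_REJOIN;
    }
    return PSK_ACTION_BADAUTH;
}
```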
Roam reasons (115–123), prune reasons (126–144), supplicant failure reasons (147–162). Complete tables — the new Zig code must include them verbatim because log decoding depends on them.
These are wire-format values. The low byte is wsec (the WLC_SET_WSEC value — 4 = AES, 2 = TKIP, 6 = AES|TKIP); the upper bytes encode WPA/WPA2/WPA3 auth flags used elsewhere.
Scan-result struct (cyw43_ev_scan_result_t, 216–227). 48 bytes including bssid[6], ssid_len, ssid[32], channel (u16, top byte is flags), auth_mode (1/2/4 bitmask = WEP/WPA/WPA2), rssi (i16). The layout has several _0[5], _1[2], _2[5], _3 padding fields — the wire order and byte offsets matter, do not re-pack.
Async event struct (cyw43_async_event_t, 230–242). Fixed header (flags, event_type, status, reason, 30 bytes reserved, interface, 1 byte reserved), then a union containing either a scan_result or (in the reference) other typed payloads. On-wire fields flags/event_type/status/reason are big-endian, decoded by the parser.
LL API functions (270–318):
cyw43_ll_init / cyw43_ll_deinit
cyw43_ll_bus_init — SPI bringup + fw/nvram/clm upload
cyw43_ll_bus_sleep — sleep/wake (KSO on SDIO; simpler on SPI)
cyw43_ll_process_packets — drain RX queue
cyw43_ll_ioctl — raw ioctl
cyw43_ll_send_ethernet — TX ethernet
cyw43_ll_wifi_on — enable country + basic iovars + WLC_UP
cyw43_ll_wifi_pm / _get_pm — power-save
cyw43_ll_wifi_scan — escan
cyw43_ll_wifi_join — start association (fills last_ssid_joined)
cyw43_ll_wifi_set_wpa_auth — switch to WPA1 (for WPA2→WPA1 fallback on PRUNE)
cyw43_ll_wifi_rejoin — re-issue SET_SSID with cached last_ssid_joined
cyw43_ll_wifi_get_bssid / _get_mac / _update_multicast_filter
cyw43_ll_wifi_ap_init / _set_up / _get_stas — AP mode (deferred)
cyw43_ll_gpio_set / _get — CYW43 GPIO (LED is gpio 0)
cyw43_ll_has_work / _bt_has_work — "anything pending?"
cyw43_ll_write_backplane_reg/mem, _read_backplane_reg/mem — BT use only in ref
Mid-level callbacks (309–312). Integrator must provide:
A join succeeds when wifi_join_state == (ACTIVE | AUTH | LINK | KEYED), i.e. 0x0e01. At that point cyw43_ctrl.c:434-438 clears the flag bits back to ACTIVE and calls cyw43_cb_tcpip_set_link_up(STA).
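The completion check can be sketched as a small accumulator. The individual bit values below are an assumption chosen to be consistent with the 0x0e01 total stated above; the function name is hypothetical and the `link_up` output stands in for the cyw43_cb_tcpip_set_link_up(STA) call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments; only their OR (0x0e01) is stated in the text. */
#define JOIN_ACTIVE 0x0001u
#define JOIN_AUTH   0x0200u
#define JOIN_LINK   0x0400u
#define JOIN_KEYED  0x0800u
#define JOIN_ALL    (JOIN_ACTIVE | JOIN_AUTH | JOIN_LINK | JOIN_KEYED)

_Static_assert(JOIN_ALL == 0x0e01u, "flags must combine to 0x0e01");

/* OR in one progress flag. When all four are present, collapse the
 * state back to ACTIVE alone and report that the link came up. */
uint32_t join_state_update(uint32_t state, uint32_t flag, bool *link_up)
{
    state |= flag;
    *link_up = false;
    if (state == JOIN_ALL) {
        state = JOIN_ACTIVE;
        *link_up = true;
    }
    return state;
}
```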
State transitions driven by cyw43_cb_process_async_event (333–439):
+---------+
| idle | wifi_join_state = 0
+----+----+
| cyw43_wifi_join()
v
+---------+
| ACTIVE |
+----+----+
|
+-------+-----------+-----------+-----------+
| | | |
EV_AUTH EV_LINK EV_PSK_SUP EV_SET_SSID
ok/fail (flags&1) (status) (status)
| | | |
|AUTH |LINK s=6:|KEYED s=0: noop
status=6: s=4|8|10 r=15
ignore pend_rejoin
else: BADAUTH else: BADAUTH
s=3 r=0: NONET
else: FAIL
if all four bits set (ACTIVE|AUTH|LINK|KEYED) → tcpip_set_link_up()
state = ACTIVE alone
|
v
+---------+
|JOINED | wifi_join_state = ACTIVE only, link up
+----+----+
|
+--------+--------+---------+
| | |
EV_DEAUTH_IND EV_DISASSOC EV_ICV_ERROR
(reason=2: (locally (always pend_rejoin)
wrong passwd) disassoc)
pend_disassoc=t; state=0; pend_rejoin=t
next poll issues tcpip_set_
WLC_DISASSOC link_down
| | |
v v v
... back to idle / rejoining ...
EV_PRUNE (status=0 reason=8): RSN mismatch at AP — try WPA1:
pend_rejoin = true
pend_rejoin_wpa = true (poll issues cyw43_ll_wifi_set_wpa_auth first)
The order matters: pend_disassoc first (cleans slate), then pend_rejoin_wpa (switches auth mode), then pend_rejoin (starts new SET_SSID).
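A sketch of servicing the three flags in that order. The struct, enum, and function names are hypothetical; instead of issuing ioctls, the function records which actions it would take so the ordering can be checked.

```c
#include <stdbool.h>

typedef enum {
    ACT_DISASSOC = 1,     /* would issue WLC_DISASSOC */
    ACT_SET_WPA_AUTH,     /* would switch the auth mode */
    ACT_REJOIN            /* would re-issue SET_SSID */
} pend_action_t;

typedef struct {
    bool pend_disassoc;
    bool pend_rejoin_wpa;
    bool pend_rejoin;
} pend_flags_t;

/* Clear each set flag and append its action, always in the fixed order
 * disassoc, WPA-auth switch, rejoin. Returns the number of actions. */
int service_pending(pend_flags_t *p, pend_action_t out[3])
{
    int n = 0;

    if (p->pend_disassoc) {
        p->pend_disassoc = false;
        out[n++] = ACT_DISASSOC;
    }
    if (p->pend_rejoin_wpa) {
        p->pend_rejoin_wpa = false;
        out[n++] = ACT_SET_WPA_AUTH;
    }
    if (p->pend_rejoin) {
        p->pend_rejoin = false;
        out[n++] = ACT_REJOIN;
    }
    return n;
}
```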
What cyw43_ll_wifi_rejoin does (cyw43_ll.c:2184): re-issues WLC_SET_SSID on the 36-byte last_ssid_joined buffer cached at join time. Does not re-send the PMK, does not re-set WSEC — the firmware keeps those from the original join call.
Why this is the reliability win. Every reconnect scenario (deauth, disassoc, ICV error, router power-cycle, password-related, RSN-mismatch, mid-edge PSK timeout) is funneled through this single deferred-rejoin mechanism. Our current Zig driver lacks the machinery — a single deauth is terminal.
2.3.2 Scan state machine
Much simpler (333–351):
wifi_scan_state:
0 = idle
1 = scanning (set by cyw43_wifi_scan)
2 = complete (set when ESCAN_RESULT event arrives with status=0)
per-result (status=8):
call wifi_scan_cb(env, &ev->u.scan_result); continue
on complete (status=0):
wifi_scan_state = 2
Gotcha. The reference declares wifi_scan_state as volatile (cyw43.h:115) because it’s written from event-callback context and polled from cyw43_wifi_scan_active. In our Zig port the same must be true — use an atomic or explicit volatile load/store; the driver must not require a lock to read scan state.
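One way to meet the no-lock requirement is a C11 atomic for the state word; the Zig port would use its own atomic load/store equivalents. The names below are hypothetical, and the per-result callback for status 8 is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { SCAN_IDLE = 0, SCAN_RUNNING = 1, SCAN_COMPLETE = 2 };

typedef struct {
    atomic_int scan_state;   /* written in event context, read by pollers */
} scan_ctx_t;

/* Called by the API that starts a scan. */
void scan_start(scan_ctx_t *c)
{
    atomic_store(&c->scan_state, SCAN_RUNNING);
}

/* Called from event context for each ESCAN_RESULT event.
 * status 8 is a partial result; status 0 marks completion. */
void scan_on_escan_result(scan_ctx_t *c, unsigned status)
{
    if (status == 0u) {
        atomic_store(&c->scan_state, SCAN_COMPLETE);
    }
}

/* Lock-free query usable from any context. */
bool scan_is_active(scan_ctx_t *c)
{
    return atomic_load(&c->scan_state) == SCAN_RUNNING;
}
```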
A sparse 89-entry table mapping event type → string. Only populated for event types the reference handles; all others print as decimal. The new Zig code must include every event name in the table, not just handled ones — for diagnostics (ISSUES.md #25).
2.4 cyw43_ll.c (2,435 lines — wire protocol)
The heavy reading. Broken down by responsibility below.
sz is the byte length of the data phase for writes, and the requested read length for reads. fn is 0/1/2 (2 bits used).
Byte ordering on wire. Before the host switches the CYW43 SPI block to 32-bit mode (pre-SPI_BUS_CONTROL write), the chip expects WORD_LENGTH_16 + ENDIAN_BIG — which produces the pattern {b[1], b[0], b[3], b[2]} when we write a u32 host-little-endian. That’s what cyw43_put_swap32/cyw43_get_swap32 (cyw43_spi.c:40-48) do: a two-u16 byte swap rather than a full endian reverse. After the switch (host writes WORD_LENGTH_32 | ENDIAN_BIG | ...), subsequent accesses use plain cyw43_put_le32 byte order.
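The half-word swap is small enough to show in full. This is an illustrative helper with a hypothetical name; it swaps the two bytes inside each 16-bit half, which yields the {b[1], b[0], b[3], b[2]} pattern when the result is stored little-endian.

```c
#include <stdint.h>

/* Swap the bytes within each 16-bit half of a 32-bit word.
 * This is not a full 32-bit endian reversal. */
uint32_t swap16x2(uint32_t x)
{
    return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}
```

For example, 0x44332211 becomes 0x33441122, and applying the swap twice returns the original value.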
The endianness matrix (referenced across the plan):
Field
Direction
Order
gSPI command word, pre-32-bit-mode
host → CYW43
half-word-swapped LE ({b1,b0,b3,b2})
gSPI command word, post-32-bit-mode
host → CYW43
plain LE
Register data (post-mode-switch)
both
plain LE
Backplane register byte-addressable writes
host → CYW43
LE per byte
SDPCM header fields (size, size_com, …)
both
LE
CDC header fields (cmd, len, flags, status)
both
LE
BDC header (4 bytes of flags/priority/flags2/data_offset)
both
individual bytes
Async event flags/event_type/status/reason
CYW43 → host
BE (decoder calls be16toh / be32toh)
Iovar u32 args (bsscfg:... prefix, etc.)
host → CYW43
LE
Scan result struct fields
CYW43 → host
LE
Ethernet frame payload
both
network order (BE for ethertype)
All on-wire reads/writes in the new driver must pass through the endianness-aware helpers in §3.11; no @ptrCast of a packed struct over a raw []u8 buffer (see §3.5).
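As a sketch of what such helpers look like on the decode side, the readers below assemble values byte by byte, so they work on unaligned buffers and on any host endianness. The names are hypothetical and are not the ones defined in §3.11.

```c
#include <stdint.h>

/* Read a big-endian 16-bit value (e.g. async event flags). */
static inline uint16_t rd_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value (e.g. async event_type, status, reason). */
static inline uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a little-endian 16-bit value (e.g. SDPCM size fields). */
static inline uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Read a little-endian 32-bit value (e.g. CDC header fields). */
static inline uint32_t rd_le32(const uint8_t *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The same four bytes decode differently depending on the row of the matrix: {0x00, 0x00, 0x00, 0x31} is event type 49 when read big-endian, and 0x31000000 when read little-endian.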
2.4.3 Backplane window management (349–393)
cyw43_set_backplane_window(self, addr):
    addr = addr & ~BACKPLANE_ADDR_MASK    // top bits only
    if addr == self->cur_backplane_window: return
    for each of HIGH/MID/LOW bytes whose value differs:
        write that byte of SDIO_BACKPLANE_ADDRESS_{HIGH,MID,LOW}
    self->cur_backplane_window = addr
Critical architectural invariant. The backplane window registers are write-only from SPI (AGENTS.md gotcha #17). The software cache (cur_backplane_window) is the only authoritative record. Every backplane access in the new driver must go through a single Backplane.setWindow(addr) call-site; no ad-hoc writes to HIGH/MID/LOW elsewhere. The plan enforces this as an invariant; ll/boot.zig and the firmware-upload loop must not bypass it.
cyw43_read_backplane / cyw43_write_backplane (366/381) call setWindow, mask off window bits, set the 4-byte-access flag (SBSDIO_SB_ACCESS_2_4B_FLAG = 0x8000) on the remaining address, do the access, then restore the window to CHIPCOMMON_BASE_ADDRESS as a known baseline.
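A minimal C sketch of the window cache described above, under stated assumptions: the 0x7fff window mask and the three register addresses are assumed values, and backplane_t, write_u8 and count_write are hypothetical names used only for illustration. Only the 0x8000 access flag comes from the text.

```c
#include <stdint.h>

#define BACKPLANE_ADDR_MASK          0x7fffu   /* assumed: 32 KiB window */
#define SBSDIO_SB_ACCESS_2_4B_FLAG   0x8000u   /* from the text above */
#define SDIO_BACKPLANE_ADDRESS_LOW   0x1000au  /* assumed register address */
#define SDIO_BACKPLANE_ADDRESS_MID   0x1000bu  /* assumed register address */
#define SDIO_BACKPLANE_ADDRESS_HIGH  0x1000cu  /* assumed register address */

typedef struct {
    uint32_t cur_window;  /* software cache: the hardware copy cannot be read back */
    void (*write_u8)(void *ctx, uint32_t reg, uint8_t val);  /* hypothetical transport hook */
    void *ctx;
} backplane_t;

/* Move the window, writing only the address bytes that actually changed. */
static void backplane_set_window(backplane_t *bp, uint32_t addr) {
    addr &= ~BACKPLANE_ADDR_MASK;
    if (addr == bp->cur_window) return;
    if ((addr ^ bp->cur_window) & 0xff000000u)
        bp->write_u8(bp->ctx, SDIO_BACKPLANE_ADDRESS_HIGH, (uint8_t)(addr >> 24));
    if ((addr ^ bp->cur_window) & 0x00ff0000u)
        bp->write_u8(bp->ctx, SDIO_BACKPLANE_ADDRESS_MID, (uint8_t)(addr >> 16));
    if ((addr ^ bp->cur_window) & 0x0000ff00u)
        bp->write_u8(bp->ctx, SDIO_BACKPLANE_ADDRESS_LOW, (uint8_t)(addr >> 8));
    bp->cur_window = addr;
}

/* In-window offset with the 4-byte-access flag set. */
static uint32_t backplane_local_addr(uint32_t addr) {
    return (addr & BACKPLANE_ADDR_MASK) | SBSDIO_SB_ACCESS_2_4B_FLAG;
}

/* Fake transport for experiments: counts register writes. */
static void count_write(void *ctx, uint32_t reg, uint8_t val) {
    (void)reg; (void)val;
    ++*(int *)ctx;
}
```

Routing every access through one function like backplane_set_window is what keeps the cache authoritative; a stray direct write to HIGH/MID/LOW would silently desynchronise it.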
2.4.4 Bus init + firmware upload (cyw43_ll_bus_init, 1424–1794)
Step-by-step sequence (SPI path only):
Call cyw43_spi_init and cyw43_spi_gpio_setup / _reset (port-provided).
Poll SPI_READ_TEST_REGISTER (addr 0x0014) for value 0xFEEDBEAD, using the pre-mode-switch byte-swapped read (read_reg_u32_swap), up to 10 × 1 ms.
Write SPI_BUS_CONTROL (addr 0x0000) with WORD_LENGTH_32 | ENDIAN_BIG | HIGH_SPEED_MODE | WAKE_UP | (4 << (8*SPI_RESPONSE_DELAY)) | (INTR_WITH_STATUS << (8*SPI_STATUS_ENABLE)) — this is a single 32-bit write that replaces four byte-sized fields atomically. Use write_reg_u32_swap because we’re still in pre-mode-switch byte order. After this write cyw43_spi_set_polarity(self, 0) is called (port-specific PIO polarity reset).
Set SPI_RESP_DELAY_F1 = CYW43_BACKPLANE_READ_PAD_LEN_BYTES (16 for SPI on Pico W).
Clear pending SPI_INTERRUPT_REGISTER bits.
Enable a specific set of interrupts: F2_F3_FIFO_RD_UNDERFLOW | F2_F3_FIFO_WR_OVERFLOW | COMMAND_ERROR | DATA_ERROR | F2_PACKET_AVAILABLE | F1_OVERFLOW.
Set ALP: write SDIO_CHIP_CLOCK_CSR with SBSDIO_ALP_AVAIL_REQ (0x08); poll for SBSDIO_ALP_AVAIL (0x40) with 10 × 1 ms. On SPI the ALP-force bits aren’t set (unlike SDIO).
Clear ALP request (write 0 to SDIO_CHIP_CLOCK_CSR).
Firmware sanity check (cyw43_check_valid_chipset_firmware, 395–418). Read last 800 bytes of fw blob, find the 16-byte DVID trailer, find the "Version: " string in the ~500 bytes before it. This is a sanity check, not cryptographic validation.
Firmware upload via cyw43_download_resource: 64-byte chunks to backplane addr 0, with CYW43_WRITE_BYTES_PAD(len) (4-byte align on SPI). AGENTS.md gotcha #15: must use 64-byte chunks; larger chunks silently corrupt the upload. Gotcha #16: payload words must be LE-packed.
NVRAM upload at (CYW43_RAM_SIZE - 4 - wifi_nvram_len) = 0x8_0000 - 4 - len, then write ((~(len/4) & 0xffff) << 16) | (len/4) to (CYW43_RAM_SIZE - 4). This is the "nvram header" the firmware checks at boot.
Poll SDIO_CHIP_CLOCK_CSR for SBSDIO_HT_AVAIL (0x80) — up to 1000 × 1 ms. Firmware-dependent ~29 ms.
Write SDIO_INT_HOST_MASK = I_HMB_SW_MASK (0xf0).
SPI: lower F2 watermark to 32 (SPI_F2_WATERMARK).
Poll SPI_STATUS_REGISTER for STATUS_F2_RX_READY (bit 5) — up to 1000 × 1 ms.
KSO setup (1713–1736). Write SDIO_WAKEUP_CTRL |= SBSDIO_WCTRL_WAKE_TILL_HT_AVAIL, set SDIOD_CCCR_BRCM_CARDCAP = CMD_NODEC, write SDIO_CHIP_CLOCK_CSR = SBSDIO_FORCE_HT (keep HT), set SDIO_SLEEP_CSR |= SBSDIO_SLPCSR_KEEP_SDIO_ON. Then write SDIO_PULL_UP = 0xf to put SPI interface block to sleep.
Clear pad pulls. Write SDIO_PULL_UP = 0, read back.
Clear residual DATA_UNAVAILABLE bit in SPI_INTERRUPT_REGISTER.
cyw43_ll_bus_sleep(false) — the first wake-before-access to transition bus_is_up to true.
CLM upload — call cyw43_clm_load (see §2.4.5).
Iovar writes: bus:txglom = 0, apsta = 1.
If mac provided, cyw43_write_iovar_n("cur_etheraddr", 6, mac, STA).
Returns 0 on success; ~400 ms typical.
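The NVRAM placement arithmetic from the upload step above can be checked with a short C sketch. The function names are hypothetical, and the sketch assumes the NVRAM length is already a multiple of 4 bytes.

```c
#include <stdint.h>

#define CYW43_RAM_SIZE 0x80000u  /* 0x8_0000 per the text above */

/* Backplane address where the NVRAM image starts (just below the length word). */
static uint32_t nvram_load_addr(uint32_t nvram_len) {
    return CYW43_RAM_SIZE - 4u - nvram_len;
}

/* Word written to (CYW43_RAM_SIZE - 4): the length in 32-bit words in the low
 * half, and its one's complement in the high half. */
static uint32_t nvram_length_word(uint32_t nvram_len) {
    uint32_t words = nvram_len / 4u;
    return ((~words & 0xffffu) << 16) | words;
}
```

For a 1024-byte NVRAM image this gives a load address of 0x7fbfc and a length word of 0xfeff0100.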
Startup timing invariant (1886–1891). cyw43_ll_wifi_on enforces cyw43_hal_ticks_us() - self->startup_t0 >= 150000 (150 ms). Reference comments say missing this causes SDIOIT/OOB WL_HOST_WAKE IRQs to misbehave in bus-sleep mode. Preserve this delay; treat it as "early-bringup stability timing not fully characterised" and don’t drop below it.
2.4.5 CLM upload (cyw43_clm_load, 1351–1396)
CLM (Country Locale Module) is a ~8–9 KB blob appended to the firmware blob, loaded via the clmload iovar. Upload in 1024-byte chunks (1024+512 on SDIO). Each chunk is preceded by a 20-byte header:
offset size field
0 8 "clmload\x00"
8 2 flag (u16 LE)
10 2 type = 2 (u16 LE)
12 4 len (u32 LE)
16 4 CRC (always 0)
20 N chunk bytes
Flag bits: DLOAD_HANDLER_VER = 1<<12, DL_BEGIN = 2, DL_END = 4. First chunk sets DL_BEGIN, last chunk sets DL_END, all chunks set DLOAD_HANDLER_VER.
After upload, issue clmload_status as GET_VAR with a 19-byte buffer; first u32 of response should be 0 on success.
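As an illustration of the 20-byte chunk header and flag rules above, here is a hedged C sketch. The function clm_chunk_header and its parameters are hypothetical; the field offsets and flag values are the ones listed in the text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLM_HEADER_LEN     20
#define DLOAD_HANDLER_VER  (1u << 12)
#define DL_BEGIN           2u
#define DL_END             4u
#define DL_TYPE_CLM        2u

static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill the header for the chunk covering [offset, offset + chunk_len). */
static void clm_chunk_header(uint8_t hdr[CLM_HEADER_LEN], size_t offset,
                             size_t chunk_len, size_t total_len) {
    unsigned flag = DLOAD_HANDLER_VER;                   /* set on every chunk */
    if (offset == 0) flag |= DL_BEGIN;                   /* first chunk */
    if (offset + chunk_len >= total_len) flag |= DL_END; /* last chunk */
    memcpy(hdr, "clmload", 8);                           /* 7 chars + NUL */
    put_le16(hdr + 8, (uint16_t)flag);
    put_le16(hdr + 10, (uint16_t)DL_TYPE_CLM);
    put_le32(hdr + 12, (uint32_t)chunk_len);
    put_le32(hdr + 16, 0);                               /* CRC field, always 0 */
}
```

A blob that fits in one chunk gets both DL_BEGIN and DL_END, so its flag field is 0x1006.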
Host TX sequence number is wwd_sdpcm_packet_transmit_sequence_number, increments per packet.
Device publishes bus_data_credit in the header of every RX packet (byte 9).
Stall condition: wlan_flow_control != 0 OR last_bus_data_credit == tx_seq. In other words, credits are one-byte unsigned modular; when they catch up we stall.
Stall recovery: enter 1-second busy-wait loop. On SDIO, poke SDIO_TO_SB_MAILBOX with bit 3 every 100 ms to kick the device. On SPI this poke is a no-op — the only way credits come back is via RX packet decoding. Therefore the SPI stall-loop must:
repeatedly call cyw43_ll_sdpcm_poll_device to drain RX;
for each RX ASYNCEVENT, dispatch cyw43_cb_process_async_event (do not dispatch DATA — reentrancy hazard: sending another ethernet frame in response while in the middle of sending this one would corrupt the TX buffer);
timeout after 1 s, return -ETIMEDOUT.
Credit arithmetic is modulo 256 (credit = header->bus_data_credit - last_bus_data_credit; accept if credit <= 20, reject otherwise to tolerate out-of-order or stale credits). Preserve this in Zig with wrapping subtraction on u8 operands (the -% operator), which reduces modulo 256 without an explicit mask.
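The credit rule and stall condition above can be sketched in C as below. The struct and function names are hypothetical; the 20-credit acceptance window and the stall test are taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t tx_seq;        /* next host TX sequence number */
    uint8_t last_credit;   /* last accepted bus_data_credit */
    uint8_t flow_control;  /* wlan_flow_control from the last RX header */
} sdpcm_tx_state_t;

/* Accept a credit value from an RX header if it is at most 20 ahead of the
 * last accepted one, measured modulo 256. Returns true when accepted. */
static bool sdpcm_update_credit(sdpcm_tx_state_t *s, uint8_t rx_credit) {
    uint8_t delta = (uint8_t)(rx_credit - s->last_credit);  /* wraps mod 256 */
    if (delta > 20) return false;                           /* stale or out of order */
    s->last_credit = rx_credit;
    return true;
}

/* TX must wait while the device asserts flow control or credits are exhausted. */
static bool sdpcm_tx_stalled(const sdpcm_tx_state_t *s) {
    return s->flow_control != 0 || s->last_credit == s->tx_seq;
}
```

The cast back to uint8_t is what makes a jump from 250 to 5 count as 11 credits forward rather than a negative difference.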
Header packing for TX:
size = SDPCM_HEADER_LEN + payload.len
size_com = ~size & 0xffff
sequence = tx_seq++
channel_and_flags = kind (CONTROL or DATA)
next_length = 0
header_length = 12 + (DATA ? 2 : 0) // the 2 bytes are BDC-align padding for DATA
wireless_flow_control = 0
bus_data_credit = 0
reserved[2] = 0
Writes go to WLAN_FUNCTION (F2), addr 0, with CYW43_WRITE_BYTES_PAD(size) (4-byte align for SPI, 64-byte align for SDIO).
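A short C sketch of the TX header packing listed above. The byte offsets assume the 12-byte header is laid out in the order the fields are listed, and the channel numbering (CONTROL = 0, ASYNCEVENT = 1, DATA = 2) is an assumption; sdpcm_pack_header and pad4 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SDPCM_HEADER_LEN 12
enum { SDPCM_CONTROL = 0, SDPCM_ASYNCEVENT = 1, SDPCM_DATA = 2 };  /* assumed numbering */

/* Write the 12-byte SDPCM header into out; returns the size field value. */
static size_t sdpcm_pack_header(uint8_t out[SDPCM_HEADER_LEN], uint8_t channel,
                                uint8_t seq, size_t payload_len) {
    uint16_t size = (uint16_t)(SDPCM_HEADER_LEN + payload_len);
    uint16_t size_com = (uint16_t)~size;      /* one's complement check word */
    memset(out, 0, SDPCM_HEADER_LEN);         /* flow control, credit, reserved = 0 */
    out[0] = (uint8_t)size;
    out[1] = (uint8_t)(size >> 8);
    out[2] = (uint8_t)size_com;
    out[3] = (uint8_t)(size_com >> 8);
    out[4] = seq;                             /* sequence */
    out[5] = channel;                         /* channel_and_flags */
    out[6] = 0;                               /* next_length */
    out[7] = (uint8_t)(SDPCM_HEADER_LEN + (channel == SDPCM_DATA ? 2 : 0));
    return size;
}

/* SPI write length: round up to a multiple of 4 bytes. */
static size_t pad4(size_t n) {
    return (n + 3u) & ~(size_t)3u;
}
```

The receiver can validate the first four bytes with the same identity the RX path uses: size XOR size_com must equal 0xffff.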
2.4.7 CDC (IOCTL) header
Header layout (struct ioctl_header_t, 719–724):
offset size field
0 4 cmd (WLC_*, LE)
4 4 len (lower 16: output len, upper 16: input len) LE
8 4 flags LE: [31:16] ioc_id, [15:12] interface, [1] SET (value 0x2; clear = GET)
12 4 status LE, 0 from host; device sets on response
Response matching. wwd_sdpcm_requested_ioctl_id is incremented per send; sdpcm_process_rx_packet extracts id = (ioctl_header->flags & 0xffff0000) >> 16 and matches against the last-sent id, dropping mismatches.
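The flags packing and id matching can be illustrated with the C sketch below. It assumes the GET/SET kind is encoded as 0 and 2 in the low bits; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CDC_KIND_GET 0u  /* assumed encoding */
#define CDC_KIND_SET 2u  /* assumed encoding */

/* Compose the CDC flags word: ioctl id in [31:16], interface in [15:12], kind in the low bits. */
static uint32_t cdc_pack_flags(uint16_t id, uint8_t iface, uint32_t kind) {
    return ((uint32_t)id << 16) | (((uint32_t)iface & 0xfu) << 12) | kind;
}

/* Extract the ioctl id the device echoed back. */
static uint16_t cdc_flags_id(uint32_t flags) {
    return (uint16_t)((flags & 0xffff0000u) >> 16);
}

/* A response is ours only if its id equals the id of the last request sent. */
static bool cdc_response_matches(uint32_t rx_flags, uint16_t last_sent_id) {
    return cdc_flags_id(rx_flags) == last_sent_id;
}
```

Dropping mismatched ids is what lets a late response from a timed-out ioctl be discarded instead of being mistaken for the answer to the next request.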
IOCTL dispatch (cyw43_do_ioctl, 1154–1185):
send_ioctl(kind, cmd, payload, iface)
start = now()
while (now() - start < CYW43_IOCTL_TIMEOUT_US /* 500 ms */):
    ret = poll_device()
    if CONTROL matching id:
        copy response back to payload
        return 0
    elif ASYNCEVENT:
        dispatch event (recursion-safe)
    elif DATA:
        dispatch ethernet RX (recursion-safe)
    else: warn
return -ETIMEDOUT
Reentrancy hazard. A dispatched ASYNCEVENT can (indirectly) call back into the driver’s ioctl surface (e.g. PRUNE → pend_rejoin → poll_func → cyw43_ll_wifi_set_wpa_auth → another ioctl). The reference avoids this by queueing pend flags rather than calling directly. The new Zig driver must preserve this queue-not-call discipline — no event handler is allowed to call doIoctl synchronously.
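The queue-not-call discipline can be sketched in C as follows. This is a simplified model, not the reference code: the flag names, driver_t, and the ioctls_issued counter (a stand-in for real ioctl traffic) are hypothetical. The event numbers (PRUNE = 23, ICV_ERROR = 49) and the PRUNE reason 8 trigger come from elsewhere in this document.

```c
#include <stdint.h>

#define EV_PRUNE        23u
#define EV_ICV_ERROR    49u
#define PEND_REJOIN     (1u << 0)
#define PEND_REJOIN_WPA (1u << 1)

typedef struct {
    uint32_t pend;      /* work queued by event handlers */
    int in_ioctl;       /* nonzero while an ioctl wait loop is draining RX */
    int ioctls_issued;  /* stand-in for real ioctl traffic */
} driver_t;

/* Event handler: may run inside the ioctl wait loop, so it only records intent. */
static void on_event(driver_t *d, uint32_t event_type, uint32_t reason) {
    if (event_type == EV_PRUNE && reason == 8) d->pend |= PEND_REJOIN_WPA;
    if (event_type == EV_ICV_ERROR) d->pend |= PEND_REJOIN;
}

/* Poll step: the only place queued work is turned into ioctls. */
static void poll_pending(driver_t *d) {
    if (d->in_ioctl) return;           /* never nest ioctls */
    if (d->pend & PEND_REJOIN_WPA) {
        d->pend &= ~PEND_REJOIN_WPA;
        d->ioctls_issued++;            /* e.g. set WPA auth */
    }
    if (d->pend & PEND_REJOIN) {
        d->pend &= ~PEND_REJOIN;
        d->ioctls_issued++;            /* e.g. rejoin the cached SSID */
    }
}
```

The handler never touches the ioctl path, so an event dispatched from inside the ioctl wait loop cannot start a second ioctl on top of the first.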
The data_offset is firmware-variable; the decoder must honour it.
2.4.9 Async event pipeline
Wire framing. Events ride in ASYNCEVENT channel SDPCM packets. Payload after BDC is an ethernet-format frame with ethertype 0x886C (Broadcom custom). First 24 bytes of the ethernet payload are the Broadcom header:
offset 12..14 ethertype = 0x886C (BE)
offset 19..22 Broadcom OUI = 0x00_0010_18 (BE)
offset 24.. event_header_t:
[0] uint16 be version
[2] uint16 be flags
[4] uint32 be event_type
[8] uint32 be status
[12] uint32 be reason
[16] uint32 be auth_type
[20] uint32 be datalen
[24] uint8[6] src_addr
[30] uint16 be datalen2
... (more fields not used by reference)
Reference code (592–621, cyw43_ll_parse_async_event) does something subtle:
// buf = &spid_buf[46], alignment only 2 bytes.
// Copy word-by-word 2 half-words into each u32 slot of buf[-2..]
for i in ((len+3) >> 2) downto 1:
    *d++ = s[0] | s[1] << 16
    s += 2
// After relocation, buf[-2..] is word-aligned.
ev = &buf[-2]
ev->flags = be16toh(ev->flags)
ev->event_type = be32toh(ev->event_type)
ev->status = be32toh(ev->status)
ev->reason = be32toh(ev->reason)
Do not port the relocation trick. The new Zig code must parse bytes directly using LE/BE helpers; no struct-cast over unaligned buffer. This is the correct path on Cortex-M0+ which does hard-fault on unaligned word access (unlike the reference’s apparent tolerance on host-test builds). See §3.5.
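A byte-wise alternative to the relocation trick might look like the C sketch below. The offsets follow the layout given above (ethertype at 12, event header at 24); event_info_t and parse_async_event are hypothetical names, and the sketch decodes only the four fields the reference byte-swaps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t flags;
    uint32_t event_type;
    uint32_t status;
    uint32_t reason;
} event_info_t;

static uint16_t rd_be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode an event frame by reading individual bytes, so the buffer may sit
 * at any alignment. Returns false for short frames or a wrong ethertype. */
static bool parse_async_event(const uint8_t *frame, size_t len, event_info_t *out) {
    if (len < 24 + 16) return false;                 /* need header through reason */
    if (rd_be16(frame + 12) != 0x886c) return false; /* Broadcom ethertype */
    const uint8_t *ev = frame + 24;
    out->flags      = rd_be16(ev + 2);
    out->event_type = rd_be32(ev + 4);
    out->status     = rd_be32(ev + 8);
    out->reason     = rd_be32(ev + 12);
    return true;
}
```

No pointer to a wider type is ever formed, so the decoder cannot fault on an odd buffer offset.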
2.4.10 Scan result IE parsing (cyw43_ll_wifi_parse_scan_result, 538–590)
The escan result wraps a larger cyw43_scan_result_internal_t (503–528) at offset 48 from the event. After the fixed fields, IEs live at offset ie_offset with length ie_length. Walk them:
The bounds check is mandatory. Malformed IE lists (length fields that overrun ie_top) must not crash the decoder. In Zig this is a while with explicit if (ie_ptr + 2 + ie_len > ie_top) break;.
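A bounds-checked IE walk could be sketched in C like this. It uses the standard 802.11 information-element framing (one id byte, one length byte, then the body); find_ie and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Search an IE list for wanted_id. Returns a pointer to the IE body and
 * stores its length in *body_len, or returns NULL if the id is absent or
 * the list is malformed before the id is reached. */
static const uint8_t *find_ie(const uint8_t *ies, size_t ies_len,
                              uint8_t wanted_id, uint8_t *body_len) {
    size_t pos = 0;
    while (pos + 2 <= ies_len) {                    /* room for id + length */
        uint8_t id  = ies[pos];
        uint8_t len = ies[pos + 1];
        if (pos + 2 + (size_t)len > ies_len) break; /* body would overrun the list */
        if (id == wanted_id) {
            *body_len = len;
            return &ies[pos + 2];
        }
        pos += 2 + (size_t)len;
    }
    return NULL;
}
```

Working with offsets into a length-bounded buffer, rather than raw pointer comparisons against ie_top, keeps the overrun check free of pointer-arithmetic overflow.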
Set country (20-byte payload: "country\0" + country & 0xffff + rev + country & 0xffff). Reference uses specific rev override for CYW43_COUNTRY_WORLDWIDE on SDIO.
Set event mask. 19 bytes of 0xff, then clear specific bits (events 19, 20, 40, 44, 54, 71). Sent via bsscfg:event_msgs iovar with u32 bsscfgidx = 0 prefix. The wire format: "bsscfg:event_msgs\0" + <4-byte LE u32 bsscfgidx> + <19-byte mask>, for a total of 41 bytes. The 18 + 4 + 19 in the C source looks like an off-by-one but it’s correct: the name buffer is declared 18 bytes in spid_buf (including the NUL), 4 bytes for bsscfgidx, 19 for mask.
cyw43_delay_ms(50).
WLC_UP ioctl with no payload.
cyw43_delay_ms(50).
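The 41-byte bsscfg:event_msgs payload described above can be assembled as in the following C sketch. The bit numbering (event e lives in byte e/8, bit e%8, least-significant bit first) is an assumption, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EVENT_MSGS_NAME_LEN 18  /* "bsscfg:event_msgs" + NUL */
#define EVENT_MASK_LEN      19
#define EVENT_MSGS_TOTAL    (EVENT_MSGS_NAME_LEN + 4 + EVENT_MASK_LEN)  /* 41 */

/* Build name + LE u32 bsscfgidx (0) + 19-byte mask; returns the total length. */
static size_t build_event_msgs_iovar(uint8_t out[EVENT_MSGS_TOTAL]) {
    static const uint8_t disabled[] = { 19, 20, 40, 44, 54, 71 };
    uint8_t *mask = out + EVENT_MSGS_NAME_LEN + 4;

    memcpy(out, "bsscfg:event_msgs", EVENT_MSGS_NAME_LEN); /* 17 chars + NUL */
    memset(out + EVENT_MSGS_NAME_LEN, 0, 4);               /* bsscfgidx = 0, LE */
    memset(mask, 0xff, EVENT_MASK_LEN);                    /* enable everything */
    for (size_t i = 0; i < sizeof disabled; i++) {
        uint8_t e = disabled[i];
        mask[e / 8] &= (uint8_t)~(1u << (e % 8));          /* then clear one bit */
    }
    return EVENT_MSGS_TOTAL;
}
```

With these assumptions, mask byte 2 becomes 0xe7 (events 19 and 20 cleared) and mask byte 8 becomes 0x7f (event 71 cleared).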
2.4.12 Join (cyw43_ll_wifi_join, 2051–2177)
Pre-conditions: wifi already on.
For WPA2-PSK (typical):
ampdu_ba_wsize = 8 (iovar)
WLC_SET_WSEC = auth_type & 0xff // 4 = AES
"bsscfg:sup_wpa" = {0, 1} // bsscfgidx=0, supplicant on
"bsscfg:sup_wpa2_eapver" = {0, -1} // EAP version: auto
"bsscfg:sup_wpa_tmo" = {0, 5000} // supplicant timeout 5s
// Set PMK (actually passphrase in PMK format):
WLC_SET_WSEC_PMK with 68-byte buf:
    [0..2] LE u16: key_len
    [2..4] LE u16: 1 (WSEC_PASSPHRASE flag)
    [4..68] key bytes (64-byte field, zero-padded)
// 2ms delay before this ioctl — firmware-required
WLC_SET_INFRA = 1
WLC_SET_AUTH = 0 (open; SAE would be 3 for WPA3)
"mfp" = 1 (MFP_CAPABLE for WPA2/WPA3; MFP_NONE for WPA1/open)
WLC_SET_WPA_AUTH = 0x80 (WPA2_AUTH_PSK)
// Cache ssid for rejoin:
last_ssid_joined[0..4] = LE u32 ssid_len
last_ssid_joined[4..] = ssid bytes
if bssid specified:
    use "join" iovar with 70-byte payload including chanspec for channel
else:
    WLC_SET_SSID = 36-byte payload from last_ssid_joined
For WPA3-SAE-PSK: use "sae_password" iovar (130 bytes) instead of WLC_SET_WSEC_PMK. Set WLC_SET_AUTH = 3 (AUTH_TYPE_SAE).
For WPA1 fallback (triggered by PRUNE reason=8): cyw43_ll_wifi_set_wpa_auth just writes WLC_SET_WPA_AUTH = 4 (WPA_AUTH_PSK).
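The WLC_SET_WSEC_PMK buffer from the WPA2-PSK sequence can be built as in this C sketch. It assumes the key field occupies the 64 bytes after the two u16 fields (68 bytes total); build_wsec_pmk is a hypothetical name, and the sketch bounds the key by the field size only, without enforcing passphrase policy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WSEC_PMK_BUF_LEN  68
#define WSEC_MAX_KEY_LEN  64
#define WSEC_PASSPHRASE   1u

/* Fill the 68-byte payload: LE u16 key_len, LE u16 flags, then the key,
 * zero-padded. Returns 0 on success, -1 if the key does not fit. */
static int build_wsec_pmk(uint8_t out[WSEC_PMK_BUF_LEN],
                          const uint8_t *key, size_t key_len) {
    if (key_len == 0 || key_len > WSEC_MAX_KEY_LEN) return -1;
    memset(out, 0, WSEC_PMK_BUF_LEN);
    out[0] = (uint8_t)key_len;          /* key_len, little-endian */
    out[1] = (uint8_t)(key_len >> 8);
    out[2] = (uint8_t)WSEC_PASSPHRASE;  /* flags: treat key as a passphrase */
    out[3] = 0;
    memcpy(out + 4, key, key_len);
    return 0;
}
```

Zeroing the whole buffer first means no bytes from a previous, longer passphrase can leak into the padding.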
No PMKID caching in the reference. The reference driver does not implement PMKSA caching. Every reconnect goes through the full 4-way handshake, which is also why a stale-PMK scenario (AP power-cycle while firmware still has our old PMKID) causes repeated handshake failures in practice. The new driver mandates PMKSA management as a Phase 3 deliverable (§5.9) — clear_on_boot as the default, cache_in_boot as a stretch. See §5.9 and §9 risk R6.
if not had_successful_packet:
    if host_interrupt_pin != active: return -1
cyw43_ll_bus_sleep(false)
if not had_successful_packet:
    spi_int = read_u16(SPI_INTERRUPT_REGISTER)
    if spi_int != last_spi_int:
        if spi_int & BUS_OVERFLOW_UNDERFLOW: warn; stat++
    // (optional CYW43_CLEAR_SDIO_INT block)
    if spi_int: write_u16(SPI_INTERRUPT_REGISTER, spi_int) // clear
    last_spi_int = spi_int
    if not (spi_int & F2_PACKET_AVAILABLE): return -1
// Read bus status, retry up to 1000x on 0xFFFFFFFF (bus not ready)
bus_gspi_status = read_u32(SPI_STATUS_REGISTER) (retry loop)
if bus_gspi_status & GSPI_PACKET_AVAILABLE:
    bytes_pending = (bus_gspi_status >> 9) & 0x7FF
    if invalid (0, oversize, underflow):
        write_u8(SPI_FRAME_CONTROL, 1) // reset frame state
        had_successful_packet = false
        return -1
else: return -1
read_bytes(WLAN_FUNCTION, 0, bytes_pending, spid_buf)
// First 4 bytes are hdr[0]=size, hdr[1]=size_com with XOR check
check hdr[0] ^ hdr[1] == 0xffff
return sdpcm_process_rx_packet(spid_buf, ...)
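The two validation steps in the poll sequence (decoding the pending length from the status word, then checking the size/size_com pair) can be sketched in C as below. The GSPI_PACKET_AVAILABLE bit position is an assumption, the function names are hypothetical, and the retry loop on 0xFFFFFFFF is reduced to a single rejection for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define GSPI_PACKET_AVAILABLE (1u << 8)  /* assumed bit position */
#define F2_PKT_LEN_SHIFT      9
#define F2_PKT_LEN_MASK       0x7ffu

/* Returns the number of bytes waiting in F2, or -1 if there is no usable packet. */
static int f2_bytes_pending(uint32_t status, uint32_t max_len) {
    if (status == 0xffffffffu) return -1;              /* bus not ready */
    if (!(status & GSPI_PACKET_AVAILABLE)) return -1;
    uint32_t n = (status >> F2_PKT_LEN_SHIFT) & F2_PKT_LEN_MASK;
    if (n == 0 || n > max_len) return -1;              /* caller resets frame state */
    return (int)n;
}

/* First four bytes of an SDPCM packet: LE size followed by its complement. */
static bool sdpcm_size_ok(const uint8_t *buf) {
    uint16_t size     = (uint16_t)(buf[0] | (buf[1] << 8));
    uint16_t size_com = (uint16_t)(buf[2] | (buf[3] << 8));
    return (uint16_t)(size ^ size_com) == 0xffffu;
}
```

A caller would treat -1 from f2_bytes_pending on an otherwise available packet as the "invalid" branch and write SPI_FRAME_CONTROL before returning.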
2.4.14 cyw43_ll_process_packets (1126–1150)
Drains RX until poll_device returns no-packet, dispatching events/ethernet per-packet. This is the main loop call-path invoked by cyw43_poll_func whenever cyw43_ll_has_work() is true.
2.4.15 KSO / bus sleep (1248–1343)
KSO mode is the "keep SDIO on" protocol. Required on SDIO for bus-sleep integration with chip clock gating; on SPI it has weaker applicability but is still used in the reference.
For the new Zig driver, the plan is conservative: preserve wake-before-access semantics on every TX ioctl and RX poll, without literally tying it to the KSO register sequence. If a reduced sleep protocol (e.g. just the SPI WAKE_UP bit toggle) works on Pico W, that’s an acceptable implementation of the contract. But the contract is "the device must be awake before F2 access" — and the burden of proof is on "it works without KSO" (observe stability under bus-sleep workloads in §7.3).
cyw43_spi.h: SPI register addresses (0x0000–0x001f for function-0 regs), SPI_STATUS bit layout, SPI interrupt bit layout. Most of this is already mirrored in our src/cyw43/regs.zig — the new tree will consolidate it into src/cyw43_new/bus/regs.zig.
On the byte-swap accessors. read_reg_u32_swap / write_reg_u32_swap (cyw43_spi.c:62, 76) are used exactly twice in the init path: once to read the test register before mode switch, and once to write the mode-switch value itself. After that, all access uses the plain cyw43_read_reg_u32. The new Zig code should name these readReg32Swapped / writeReg32Swapped and restrict them to the boot path. Everywhere else uses plain LE.
2.6 cyw43_config.h (223 lines — tunables)
Central port-integration header. Defines defaults for every tunable; the port overrides via CYW43_CONFIG_FILE / cyw43_configport.h. Key tunables to preserve (by name) in our Zig Config struct (§5.1):
| C macro | Default | Zig Config field | Use |
| --- | --- | --- | --- |
| CYW43_USE_SPI | 0 | hard-coded true (pico W only) | transport selection |
| CYW43_IOCTL_TIMEOUT_US | 500000 | ioctl_timeout_us: u32 = 500_000 | ioctl wait |
| CYW43_SLEEP_MAX | 50 | sleep_max_ticks: u32 = 50 | bus-sleep countdown |
| CYW43_RESOURCE_VERIFY_DOWNLOAD | 0 | verify_firmware: bool = false | debug |
| CYW43_BACKPLANE_READ_PAD_LEN_BYTES | 16 (SPI) | compile-time constant | read pad size |
| CYW43_BUS_MAX_BLOCK_SIZE | 64 (SPI) | compile-time constant | fw upload chunk |
| CYW43_USE_OTP_MAC | 0 | use_otp_mac: bool = false | MAC source |
| CYW43_GPIO | 0 | enable_gpio: bool = true | LED iovar on |
Logging/debug macros (CYW43_DEBUG, CYW43_VDEBUG, CYW43_PRINTF, CYW43_WARN) in the C reference map in the Zig port to functions on the Config.logger interface — see §5.2.
2.7 cyw43_country.h
117 lines, trivial — defines CYW43_COUNTRY(A, B, REV) as a 3-byte packed value and enumerates common country codes. Port verbatim into src/cyw43_new/ctrl/country.zig.
2.8 Summary of reliability/recovery gaps in our current Zig driver
The audit surfaces the following behaviors present in the reference (or in Embassy/soypat/brcmfmac for events the C reference doesn't handle) but absent or defective in src/cyw43/:
#
Gap
Source
Severity
G1
Join state machine (three-flag model per §6.1; originates in bit-flag form at cyw43_ctrl.c:56-65, 383-438)
Legend for source column: rows G1–G16 originate in the pico-sdk C reference and are the core port. Rows G17–G28 are additions sourced from cross-referencing Embassy Rust + soypat Go + Linux brcmfmac (details in §10.4.1). G17 and G18 are the two most impactful reliability fixes; G26 is a real bug in our earlier draft caught by reading Embassy.
Section 3 — Impedance-mismatch catalog
Each category: C idiom → Zig 0.16 idiom. Every concrete example is a pattern to replicate across the rewrite. References to ZIG-0.16.0-REFERENCE.md are by section heading, not page.
Rule: never carry a [*]u8 across function boundaries. If you need a subrange of a buffer, pass a []u8 slice; the compiler preserves length; indexing is bounds-checked in Debug. For multi-hop functions where the inner function needs to know the offset origin, pass (buf: []u8, origin: usize) and let the inner compute buf[origin..].
The *anyopaque context pointer is typed to the opaque "environment" owning the callback. Callees do const typed_ctx: *MyStuff = @ptrCast(@alignCast(ctx));.
For simple single-method hooks (e.g. Config.logger):
Avoid *const fn (...) anyerror!void in hot paths — error union type inference across boundaries is a compilation-time cost; prefer explicit error sets (WifiError).
3.4 enum with explicit wire values
C:
#define CYW43_EV_ESCAN_RESULT (69)
Zig:
pub const EventKind = enum(u16) {
    set_ssid = 0,
    join = 1,
    auth = 3,
    deauth = 5,
    deauth_ind = 6,
    assoc = 7,
    disassoc = 11,
    disassoc_ind = 12,
    link = 16,
    prune = 23,
    psk_sup = 46,
    icv_error = 49,
    escan_result = 69,
    csa_complete_ind = 80,
    assoc_req_ie = 87,
    assoc_resp_ie = 88,
    _, // non-exhaustive: other values are decoded as Event.unknown
};
The _ trailing discriminant makes the enum non-exhaustive — indispensable for event types where firmware may emit values we don’t name.
3.5 Bit-packed registers / wire structs
DO NOT use packed struct over raw SPI RX buffers. Two reasons:
Alignment. ARM Cortex-M0+ hard-faults on unaligned u16/u32 loads. The CYW43 RX buffer frequently delivers event payloads at odd offsets (e.g. spid_buf + 46). @ptrCast([*]u8, buf) + @ptrCast(*EventHeader, ...) + field access = potential fault.
Endianness. Many fields are big-endian on the wire (event headers). packed struct in Zig assumes host endianness; you’d need separate Be mirrors and manual conversion, which negates the benefit.
std.mem.readInt(T, buf[n..m], .big | .little) is the Zig 0.16 idiom. It takes a compile-time-known slice length and emits efficient code with no unaligned access.
packed struct is acceptable for host-side register composition (e.g. SPI command word) where the value starts as a u32 and the bit layout helps readability. But even then prefer bit-shift composition — it’s clearer and is exactly what the reference uses.
3.6 void * opaque state → generic parameter or concrete driver struct
C:
void *cb_data;
int (*cyw43_cb_process_async_event)(void *cb_data, const cyw43_async_event_t *);
A single Driver struct replaces the opaque cyw43_t; callbacks become methods on that struct.
3.7 printf-style logging → project logger
Freestanding build forbids std.debug.print. The driver gets a Logger interface in Config (§5.2). Minimum operations: puts([]const u8), putHex32(u32), putDec(u32), putBytes([]const u8). The log call-sites in src/cyw43/ already use this pattern via fmt.puts etc.; preserve the convention.
No formatted output (std.fmt.bufPrint) in the driver — call the primitives directly. Format strings drag in the full formatter and explode code size.
3.8 Global mutable state → file-scope var on the driver struct
Zig: the driver is an instance type. For the pico integration which has exactly one CYW43 chip, the binding site (src/bindings/wifi.zig) owns a single var driver: Driver = undefined; and passes &driver into every call. The driver itself does not declare a module-scope singleton.
Rationale: easier testing, cleaner dependency graph, no implicit ordering between the driver and the integrator’s init.
Compile-time behavior (e.g. whether to compile verification code) is a comptime field accessed via if (config.verify_firmware) { ... } inside the driver functions. With config stored in the driver struct, the runtime check is a branch on a bool field; for zero-cost dispatch the field can be promoted to a comptime parameter of a generic wrapper — but that’s an optimisation, not required.
3.10 Volatile access + memory barriers
SPI register reads return device-observable state; the PIO peripheral MMIO backing them is already volatile in src/platform/hal.zig. At the driver layer we do not need @as(*volatile T, ...) — that’s a transport-layer concern. The PIO SPI transport in src/cyw43_new/transport/pio_spi.zig is responsible for volatile access; the driver-layer code treats the returned value as a plain u32.
No memory barriers required on M0+ single-core; existing HAL regWrite is sufficient.
pub const WifiError = error{
    // I/O / bus
    SpiTestFailed,
    BusInitTimeout,
    BackplaneUnreachable,
    HtClockTimeout,
    F2NotReady,
    FirmwareSanityFailed,
    ClmLoadFailed,
    // IOCTL
    IoctlTimeout,
    IoctlResponseMismatch,
    IoctlKindInvalid,
    // SDPCM
    SdpcmCreditStall,
    SdpcmFrameTooLarge,
    SdpcmFrameMalformed,
    // WiFi control
    NotInitialized,
    NotStaActive,
    InvalidSsidLen,
    InvalidKeyLen,
    UnsupportedAuthType,
    PmksaNotCleared, // join() called before PMKSA clear completed (§5.9)
    PmksaIovarUnsupported, // blob vintage does not support the iovar; see §5.9.4
    // Join
    JoinTimeout,
    JoinBadAuth,
    JoinNoNetwork,
    JoinFail,
    // Scan
    ScanInProgress,
    ScanNotStarted,
    ScanResultTruncated,
    // Ethernet
    EthFrameTooLarge,
};
Error union return: fn join(self: *Driver, cfg: JoinConfig) WifiError!void. Call-sites use try.
3.12 Byte-order conversions
std.mem.readInt(T, slice, .big | .little) and std.mem.writeInt(T, slice, value, .big | .little) are the 0.16 idioms. For the occasional inline half-word byte swap (pre-mode-switch gSPI command words), write a small helper fn writeSwapped32(dst: *[4]u8, val: u32) void that produces {b1, b0, b3, b2}.
Section 4 — Proposed Zig module architecture
Parallel tree at src/cyw43_new/. Existing src/cyw43/ is frozen during the rewrite (§8.1) — no changes except the minimum needed to keep the -Dcyw43=old build green.
Total estimated size: ~4,050 Zig lines across 24 files, summing the per-file estimates in the tree below.
src/cyw43_new/
├── cyw43.zig (~120 LOC — public API re-export; compat façade)
├── types.zig (~180 LOC — public enums, Event, ScanResult, LinkState)
├── config.zig (~120 LOC — Config struct, Logger, HostHooks interfaces)
├── errors.zig (~60 LOC — WifiError union, Result helpers)
├── firmware.zig (~50 LOC — @embedFile split; blob split helpers)
├── UPSTREAM.md — C-driver SHA tracking + M1..Mn local mod log
├── LICENSE-REFERENCE.md — LICENSE.RP copy + attribution notes
│
├── transport/
│ ├── spi.zig (~80 LOC — Spi vtable interface)
│ └── pio_spi.zig (~420 LOC — RP2040 PIO implementation; replaces src/cyw43/transport/pio_spi.zig)
│
├── bus/
│ ├── regs.zig (~220 LOC — SPI/backplane/SDPCM/CDC/event register & field defs)
│ ├── cmd.zig (~40 LOC — gSPI command word packing + byte-swap helpers)
│ ├── bus.zig (~170 LOC — u8/u16/u32 reg access, readBytes/writeBytes with pad handling)
│ └── backplane.zig (~120 LOC — window cache, bpRead/bpWrite, bpReadBlock/bpWriteBlock)
│
├── ll/
│ ├── frame.zig (~260 LOC — SDPCM + CDC + BDC headers pack/parse; credit arithmetic)
│ ├── ioctl.zig (~200 LOC — doIoctl, setIovar/getIovar, iovar_u32, bsscfg helpers)
│ ├── events.zig (~320 LOC — exhaustive event decoder; Event tagged union; name table)
│ ├── scan.zig (~200 LOC — escan request + IE walker + result dispatch)
│ ├── boot.zig (~420 LOC — cyw43_ll_bus_init equivalent: SPI setup, fw/nvram upload, core reset, ALP/HT wait)
│ ├── clm.zig (~80 LOC — CLM chunked upload)
│ └── power.zig (~140 LOC — bus sleep/wake; KSO implementation hidden behind "wake-before-access" contract)
│
├── ctrl/
│ ├── state.zig (~100 LOC — Driver struct, core mutable state, pend flags)
│ ├── poll.zig (~140 LOC — poll loop: drain RX, process pend_disassoc/rejoin_wpa/rejoin)
│ ├── join.zig (~280 LOC — wifi_join + state machine + last_ssid cache + timeout)
│ ├── link.zig (~100 LOC — link up/down transitions, integrator callback dispatch)
│ └── country.zig (~130 LOC — country code table, direct port of cyw43_country.h)
│
└── hal.zig (~100 LOC — HostHooks interface: readIrqPin, ensureAwake, delayMs, ticksUs)
4.1 Per-file responsibility notes
cyw43.zig — pub const Driver, pub const init, pub const deinit. Re-exports common types. Defines the compat façade: pub fn joinWpa2Compat(...) wrapping the new instance API to match the old module-level API shape. Kept intentionally thin.
types.zig — public-facing. Event, EventKind, ScanResult, LinkState, Country, AuthType, PmValue, ItfId. Internal wire structs do not live here — they live next to their encoders/decoders (e.g. SDPCM header in ll/frame.zig).
config.zig — Config, Logger, HostHooks, Transport. The public "how to wire up the driver" surface.
errors.zig — WifiError. Isolated so other modules can @import("errors.zig").WifiError.
firmware.zig — @embedFiles of 43439A0_combined.bin and 43439A0_nvram.bin. Splits the combined blob into firmware-prefix + CLM-suffix with a known-at-compile-time offset.
transport/spi.zig — interface: transferRx, transferTx, setPolarity, reset. Per §3.3 uses the ctx: *anyopaque + vtable pattern.
transport/pio_spi.zig — the one concrete implementation we ship. Directly ports the existing src/cyw43/transport/pio_spi.zig with cleanups (that file is 406 LOC and sound; ~80% line-for-line preservation).
bus/regs.zig — all on-wire constants. Replaces src/cyw43/regs.zig and expands coverage (event-type enum, CDC flag shifts, SDPCM constants).
bus/cmd.zig — packCmd(write, incr, fn, addr, sz) u32, writeSwapped32, readSwapped32. Isolated so the swap pattern has a single owner.
bus/backplane.zig — Backplane struct with cur_window: u32 cache. Methods: setWindow, read32, write32, readBlock, writeBlock. Single owner of window state per §2.4.3 invariant.
ll/ioctl.zig — doIoctl(driver, kind, cmd, iface, payload) WifiError!usize, plus convenience wrappers: setIoctlU32, setIovar, setIovarU32, setBsscfgIovarU32, readIovarU32. Implements the 500 ms timeout + RX drain loop.
ll/events.zig — decodeEvent(bytes) Event. Includes the full event-name table (all 89 slots in the reference + any additional observed names; unknowns return Event.unknown{ raw_type, status, reason, len }). Event is the tagged union in §3.2.
ll/power.zig — ensureAwake(driver), allowSleep(driver). Implements the KSO sequence internally but the public contract is "wake before access, allow sleep when idle". Gives us freedom to drop KSO if SPI-only behavior proves sufficient (§2.4.15).
ctrl/poll.zig — pollOnce(driver). The port of cyw43_poll_func. Drives pend-action processing and RX drain.
ctrl/join.zig — joinWpa2(driver, JoinConfig), rejoin(driver), plus the state-machine transitions driven by events. The events module dispatches into functions here; callers never interact with raw states.
ctrl/link.zig — thin module. Derived link-status projection + callback emission only. No independent association policy lives here; all state transitions are owned by ctrl/join.zig + ctrl/state.zig. link.zig translates the current join_state + itf_state + IP-readiness flag into the LinkState enum consumed by HostHooks.onLinkUp / onLinkDown. If a future contributor is tempted to put an "if link degraded and no event in N seconds, do X" policy here, that policy belongs in ctrl/join.zig instead.
ctrl/country.zig — country code table. Port of cyw43_country.h as a Zig Country enum(u24) with a helper toWire(self) u32.
hal.zig — the HostHooks interface: readIrqPin(), ensureAwake(), delayMs(n), ticksUs(), ticksMs().
4.2 Comparison against current src/cyw43/ structure
| Current path | Target path | Disposition |
| --- | --- | --- |
| cyw43.zig (69 LOC) | cyw43.zig + types.zig | Rewritten. Public API expands. |
| device.zig (375 LOC) | Split across ll/* and ctrl/state.zig | Deleted. Responsibilities cleanly factored. |
| regs.zig (182 LOC) | bus/regs.zig (expanded to ~220 LOC) | Ported and extended. |
| types.zig (23 LOC) | types.zig + errors.zig | Ported, expanded, split. |
| board.zig (65 LOC) | Keep in pico integration layer. Not a driver concern. | Moves to src/bindings/wifi.zig or src/platform/boards.zig. |
| transport/bus.zig (162 LOC) | bus/bus.zig | Ported with cleanups. |
| transport/pio_spi.zig (406 LOC) | transport/pio_spi.zig | Ported ~80% as-is. |
| control/boot.zig (307 LOC) | ll/boot.zig | Rewritten to match C reference sequence exactly. |
| control/ioctl.zig (219 LOC) | ll/frame.zig + ll/ioctl.zig | Split. Credit arithmetic and frame shape cleanly separated. |
| control/join.zig (72 LOC) | ctrl/join.zig (~280 LOC) | Expanded ~4×. Adds state machine + retry + last_ssid cache. |
| control/scan.zig (138 LOC) | ll/scan.zig | Ported + IE walker added. |
| control/gpio.zig (24 LOC) | Merged into ctrl/state.zig or a small ctrl/gpio.zig | |
No cycles. bus/* does not depend on ll/*; ll/* does not depend on ctrl/* (events dispatch up via callback). This is the dependency-inversion the current tree gets mostly right; the rewrite tightens it.
Section 5 — Public API design
The public surface exposed to src/bindings/wifi.zig, src/net/*.zig, and any other pico code. Designed from first principles; §5.10 gives the compat façade preserved for migration.
5.1 Config struct
pub const Config = struct {
    country: Country = .worldwide,
    default_pm: u32 = PmValue.performance,
    ioctl_timeout_us: u32 = 500_000,
    sleep_max_ticks: u32 = 50,
    verify_firmware: bool = false, // verify blob upload after write; adds ~40 ms to boot
    use_otp_mac: bool = false, // if true, use MAC from OTP instead of caller-supplied
    enable_gpio: bool = true, // enable CYW43 GPIO iovar (required for LED on Pico W)
    wpa3_mode: Wpa3Mode = .auto, // WPA3-SAE policy; see §5.10 for fallback semantics
    auto_reconnect: bool = true, // use join state machine's pend_rejoin on transient failures
    pmksa_policy: PmksaPolicy = .clear_on_boot, // mandatory Phase 3 deliverable; see §5.9
    log: Logger,
    hooks: HostHooks,
    transport: *anyopaque, // Spi vtable pointer
    transport_vt: *const Spi.VTable,
};
pub const HostHooks = struct {
    ctx: *anyopaque,
    vt: *const VTable,
    pub const VTable = struct {
        readIrqPin: *const fn (ctx: *anyopaque) bool, // returns "active"
        delayMs: *const fn (ctx: *anyopaque, ms: u32) void,
        delayUs: *const fn (ctx: *anyopaque, us: u32) void,
        ticksMs: *const fn (ctx: *anyopaque) u32,
        ticksUs: *const fn (ctx: *anyopaque) u32,
        onLinkUp: *const fn (ctx: *anyopaque, itf: ItfId) void,
        onLinkDown: *const fn (ctx: *anyopaque, itf: ItfId) void,
        onEthernetRx: *const fn (ctx: *anyopaque, itf: ItfId, frame: []const u8) void,
        onEvent: ?*const fn (ctx: *anyopaque, event: *const Event) void = null,
        /// Called during long-running driver operations (firmware upload,
        /// CLM load, credit-stall-wait, ioctl timeout loop). Replaces the
        /// reference C driver's CYW43_EVENT_POLL_HOOK macro (cyw43_ll.c:434).
        /// The host must feed watchdog, run its own cooperative scheduler,
        /// and return quickly; target budget ≤ 1 ms per call.
        /// Called from driver context; must not call back into the driver.
        yieldDuringLongOp: ?*const fn (ctx: *anyopaque) void = null,
    };
};
The onEvent hook is optional — if set, the driver dispatches every decoded event to the host. The pico integration sets it so DEAUTH-triggered reconnect logic can live in the pico layer (as opposed to only inside the driver); see §5.8.
yieldDuringLongOp is also optional but strongly recommended for embedded integrators. Call sites inside the driver:
ll/boot.zig::busInit — once per 64-byte firmware chunk (~3,500 calls over ~400 ms).
ll/clm.zig::clmLoad — once per 1024-byte CLM chunk (~10 calls).
ll/frame.zig::waitForCredit — every poll iteration during credit stall.
ll/ioctl.zig::doIoctl — every 1 ms wait iteration.
Pico integration implements this as watchdog.feed(); scheduler.poll(); led.poll(); — same set of work as the superloop's main iteration per docs/NANORUBY.md §A4. Without this hook, firmware upload can starve the watchdog on a Pico W configured for aggressive (<1 s) watchdog timeout.
Non-reentrancy rules (MANDATORY — integrators will get this wrong otherwise):
Must not call back into any CYW43 driver API. No driver.sendEthernet(), no driver.doIoctl(), no driver.pollOnce(), no driver.getEventLog(). The driver is in a partial-state inside its long op; reentry corrupts state.
Must not mutate driver-owned buffers or state. The integrator's ctx is the integrator's state; touching it is fine. Touching anything reachable through the driver instance is not.
Allowed operations only: feed watchdog, poll a non-driver scheduler, update a GPIO (LED), service a non-driver IRQ handler's posted-work queue, increment host-side counters.
Must return quickly. Budget ≤ 1 ms per call. Not a place for network polling or flash I/O.
Idempotent on spurious call. The driver may call this more often than strictly necessary (e.g. once per chunk even if some chunks are fast). Integrator must tolerate 3,500+ calls over 400 ms of firmware upload.
Violating rule 1 is the most likely integration bug. A reviewer checking new integrator code should grep yieldDuringLongOp implementations for any symbol that starts with cyw43. / driver. and reject immediately.
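A conforming hook can be very small. The following is a minimal sketch, assuming a hypothetical integrator-owned `HostCtx` struct (its name and fields are illustrative and do not come from the driver); the counters stand in for `watchdog.feed()` and `led.poll()`. The body touches only integrator state, which is what the grep check above is meant to confirm.

```zig
const std = @import("std");

/// Hypothetical integrator-owned state; none of these names come from the driver.
const HostCtx = struct {
    watchdog_feeds: u32 = 0,
    yield_calls: u32 = 0,
    led_on: bool = false,
};

/// Matches the hook signature `fn (ctx: *anyopaque) void`.
/// Only integrator state is touched: no driver symbol appears in the body.
fn yieldDuringLongOp(ctx: *anyopaque) void {
    const host: *HostCtx = @ptrCast(@alignCast(ctx));
    host.watchdog_feeds += 1; // stand-in for watchdog.feed()
    host.yield_calls += 1;
    // Toggle every 256 calls; stand-in for led.poll().
    if (host.yield_calls % 256 == 0) host.led_on = !host.led_on;
}
```

Because the hook only increments counters and flips a flag, it is cheap enough to be called once per firmware chunk and is safe to call more often than strictly needed.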
join is synchronous: it polls internal state until join_state == joined or timeout. During the polling, it drives hooks.delayMs(1) and internal RX drain. This is fine for pico’s cooperative model.
5.7 State queries
pub fn linkStatus(self: *const Driver) LinkState; // LinkState enum below
pub fn rssi(self: *Driver) WifiError!i32;
pub fn macAddr(self: *const Driver) [6]u8;
pub fn bssid(self: *Driver) WifiError![6]u8;
pub fn currentAuth(self: *const Driver) ?AuthType;
pub const LinkState = enum {
    down,
    associating,
    associated_no_ip, // up at L2; integrator knows about L3
    up,               // integrator has called markIpReady
    degraded,         // beacon loss / signal warning but still associated
    reconnecting,
    fail_badauth,
    fail_nonet,
    fail_general,
};
5.8 Event subscription
The HostHooks.onEvent callback fires for every decoded event (not just those the driver handles internally). Pico uses this for:
Externalizing DEAUTH-triggered reconnect policy beyond what auto_reconnect implements.
Surfacing RSSI warnings and roam events to user-facing JS/Ruby.
The hook runs in poll context, not IRQ — safe to allocate, log, etc.
5.9 PMKSA cache (mandatory Phase 3 deliverable; improvement over reference)
The reference C driver does not materially implement host-side PMKSA cache management. Neither does our current Zig driver. Research (see docs/CYW43-PMKSA-RESEARCH.md) establishes both the mechanism and the scope of what PMKSA addresses.
PMKSA handling is a hard Phase 3 deliverable. Cutover (Phase 4) does not flip the default build until clear_on_boot is landed and verified on hardware (§7.3 H2).
Framing correction from research. The most-reported Pico W reliability issue (pico-sdk #2153) is not a PMKSA-cache problem. It is an ICV_ERROR (event 49) flood triggered by missed rekey exchanges under power-save mode. The fix is a 3-line addition to the event handler (queue pend_rejoin on event 49) — exactly the pattern our join state machine (§6.1) already commits to. This is the highest-value reliability fix in the rewrite, and it is orthogonal to PMKSA. The rewrite gets both.
PMKSA clear_on_boot addresses an independent failure mode: when the CYW43 firmware retains a PMKSA cache entry across a host watchdog-reset (chip stays powered, only RP2040 restarts) and then the AP has since forgotten the PMKSA (AP reboot, session timeout, config change), the chip attempts fast-reauth with stale state and the AP refuses. Clearing the firmware cache at every wifiOn forces the first join to be a clean 4-way handshake, side-stepping this drift.
Together, both mechanisms cover the reliability surface:
| Fix | Cost | Addresses | Evidence |
| --- | --- | --- | --- |
| Exhaustive event decoder + event 49 → pend_rejoin | 3 lines in event dispatcher | pico-sdk #2153 (ICV_ERROR flood under PM) | cyw43-driver PR #130 (Jan 2025) |
| PMKSA clear_on_boot | ~20 LOC, one 356-byte iovar at wifiOn end | AP-side PMKSA drift across host reboot (802.11-fundamental) | Inferred from 802.11 spec + brcmfmac behavior |
Both ship in Phase 3. See §5.9.1 for PMKSA API surface; §6.1 for the event handler (which includes event 49 per the state-machine success criteria).
5.9.1 API surface
pub const PmksaPolicy = enum {
    // Active management modes: one of these must be chosen for Phase 4 cutover.
    clear_on_boot, // default. Wipe firmware PMKID cache at every wifiOn so the
                   // first join is a clean 4-way handshake, preventing
                   // firmware-vs-AP state drift after router power-cycle.
    cache_in_boot, // clear_on_boot + host maintains a (BSSID → PMKID) cache for
                   // the remainder of this boot. Enables fast reauth (skip 4-way
                   // handshake on reconnect to same AP within a boot session).
                   // Evicts on DEAUTH/DISASSOC per §5.9.3.
    // Escape hatch only: present so a debug build can A/B against reference
    // behavior. Not for production.
    disabled,      // no explicit PMKSA management; rely on firmware defaults.
                   // Behavior matches the reference C driver; expect stale-PMK
                   // failures after AP power-cycle.
};
// in Config:
pmksa_policy: PmksaPolicy = .clear_on_boot,
// Additional config for cache_in_boot mode:
pmksa_cache_capacity: usize = 4, // number of (BSSID, PMKID) entries retained
Default is clear_on_boot. cache_in_boot is a Phase 3 stretch goal (ship if budget allows; defer to Phase B if not) — but clear_on_boot is non-negotiable.
5.9.2 Iovar research (COMPLETED — see docs/CYW43-PMKSA-RESEARCH.md)
The pre-Phase-3 research mandated here has been completed in this planning session and is captured at docs/CYW43-PMKSA-RESEARCH.md. Summary of verified findings (Phase 3 coding may proceed from these facts without further research):
Iovar name: pmkid_info (plain, NOT bsscfg:-prefixed). Confirmed from Linux kernel brcmfmac/cfg80211.c at function brcmf_update_pmklist.
Chip compatibility: CYW43439 is explicitly handled by brcmfmac as CY_CC_43439_CHIP_ID alongside legacy-family siblings BCM43430 / BCM4345 / BCM43454 (confirmed from brcmfmac/feature.c).
API version: our blob is firmware 7.95.61 from 2023-01-11 (verified via strings src/cyw43/firmware/43439A0_combined.bin). Firmware 7.x pre-dates WLC version 12.0 — so we use the legacy API. V2 is never implemented in brcmfmac; V3 requires WLC ≥ 13.0. The 43439A0_combined.bin filename is a label; the inside is 7.95.61 (identical to Embassy's bundled firmware). Earlier drafts of this plan incorrectly said 7.95.49.00 — that was the pico-sdk reference's bundled version at our dd7568 audit SHA, not what our repo actually ships. Corrected 2026-04-20.
MAXPMKID = 16, PMKID_LEN = 16. 802.11 standard values.
Flush operation: zero the whole 356-byte buffer and send it. npmk = 0 means "clear all entries."
Endianness: only npmk is LE-encoded; everything else is byte-arrays.
License boundary: brcmfmac is GPL-2.0. The research document reads brcmfmac behavior as protocol evidence (not copied code). Zig implementation written from the research spec without referencing brcmfmac source directly. §10 rules apply.
Commit docs/CYW43-PMKSA-RESEARCH.md is the Phase 3 P3a pre-work artifact. §11.3 go/no-go checkbox "pre-work artifact present" is satisfied by its existence in the repo.
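The legacy payload layout summarized above (LE u32 npmk followed by 16 fixed 22-byte slots, 356 bytes total) can be sketched as follows. This is an illustration written from the findings listed here, not the Phase 3 implementation; `PmkEntry` and `encodePmkList` are hypothetical names. Passing an empty list yields the all-zero flush buffer.

```zig
const std = @import("std");

const MAXPMKID = 16;
const PMKID_LEN = 16;
const ENTRY_LEN = 6 + PMKID_LEN; // bssid[6] + pmkid[16], no padding
pub const PMKID_INFO_LEN = 4 + MAXPMKID * ENTRY_LEN; // 4 + 16 * 22 = 356

/// Hypothetical host-side entry type for illustration.
pub const PmkEntry = struct { bssid: [6]u8, pmkid: [PMKID_LEN]u8 };

/// Encode the legacy list: little-endian u32 npmk, then 16 fixed slots.
/// Unused slots stay zero, so an empty list is the flush payload.
pub fn encodePmkList(entries: []const PmkEntry) [PMKID_INFO_LEN]u8 {
    std.debug.assert(entries.len <= MAXPMKID);
    var buf = [_]u8{0} ** PMKID_INFO_LEN;
    const n: u32 = @intCast(entries.len);
    // Only npmk is endian-sensitive; everything else is raw byte arrays.
    buf[0] = @truncate(n);
    buf[1] = @truncate(n >> 8);
    buf[2] = @truncate(n >> 16);
    buf[3] = @truncate(n >> 24);
    for (entries, 0..) |e, i| {
        const off = 4 + i * ENTRY_LEN;
        @memcpy(buf[off .. off + 6], &e.bssid);
        @memcpy(buf[off + 6 .. off + ENTRY_LEN], &e.pmkid);
    }
    return buf;
}
```

The same encoder would serve the cache_in_boot "set-list with that entry removed" path, since the firmware always receives the whole fixed-size list.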
5.9.3 Runtime behavior spec
clear_on_boot (the default, mandatory):
Lifecycle point (precise). The clear runs in this exact position within wifiOn():
wifiOn(country) flow:
1. cyw43_ll_wifi_on prerequisites (country, antenna, iovars, 150ms gate)
2. Event mask set via bsscfg:event_msgs ──── events can now arrive
3. WLC_UP ioctl ──── interface is up
4. cyw43_delay_ms(50) ──── reference's post-UP settle
5. >>> PMKSA clear_on_boot here <<< ──── our insertion point
6. Driver state transitions to wifi_up_pmksa_cleared
7. wifiOn() returns
Too early (before WLC_UP): firmware may not accept the iovar before the interface is up; response status 0xffffffe2 (NOTASSOCIATED) observable.
Too late (after first join attempt): the whole point is to ensure the first join sees an empty cache. If the clear lands mid-join or post-join, it's worse than useless — it discards the PMKID we just established.
Our chosen point: after WLC_UP + 50ms settle (firmware is fully up and accepting iovars) but before wifiOn() returns (so join() cannot possibly be called yet — caller doesn't even have control).
join() checks state == wifi_up_pmksa_cleared and errors with WifiError.PmksaNotCleared if the clear has not completed — this prevents a caller from racing wifiOn() and join() in a way that would allow stale PMK reuse on the first association.
Issue the PMKSA-clear iovar: pmkid_info SET with npmk = 0 in a fully zeroed 356-byte buffer (per research doc).
Failure policy on a supposedly-supported blob. A non-zero response from the iovar call is a Phase 3 validation failure — it must not be silently tolerated during the P3b hardware-verification gate. For runtime after verification has passed, an unexpected non-zero response logs a warning and continues (degraded reliability rather than refusing to boot — devices need to remain debuggable). If the failure is on a blob that research already flagged as unsupported, fall through to §5.9.4 fallback; do not silently continue.
No per-join action required in this mode; firmware starts each join with an empty PMKID cache.
cache_in_boot (stretch):
On every successful joined transition (§6.1), compute or request the PMKID for the current BSSID and add (bssid, pmkid) to the host-side cache.
PMKID is produced by the firmware as part of the 4-way handshake. The iovar to read the current-session PMKID is (per brcmfmac) pmkid_info with GET direction, returning the whole cache list.
On DEAUTH_IND / DISASSOC_IND / EV_ICV_ERROR for the current BSSID: evict that BSSID's entry from the host cache and from firmware (via del_pmksa iovar, or set-list with that entry removed). Eviction is the critical correctness step — without it, the cache causes the reliability issue it was supposed to solve.
Cache is bounded (pmksa_cache_capacity, default 4). Eviction policy on overflow: LRU (last-added is always kept; oldest is dropped).
Cache is not persisted across reboots in V1. Persistent PMKSA requires flash-write support and bindings/storage.zig flash-write isn't implemented yet (ISSUES.md open item #3). Mark as future work in UPSTREAM.md.
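A bounded host-side cache with explicit eviction might look like the sketch below. It is an assumption-laden illustration: `PmksaCache`, `put`, `get`, and `evict` are hypothetical names, the firmware-side SET that would accompany each change is omitted, and capacity is assumed to be at least 1. Overflow drops the oldest entry, as described above.

```zig
const std = @import("std");

pub const Bssid = [6]u8;
pub const Pmkid = [16]u8;

/// Hypothetical bounded (BSSID -> PMKID) cache; capacity must be >= 1.
pub fn PmksaCache(comptime capacity: usize) type {
    return struct {
        const Self = @This();
        const Slot = struct { bssid: Bssid, pmkid: Pmkid };

        slots: [capacity]Slot = undefined,
        len: usize = 0, // slots[0] is oldest, slots[len - 1] is newest

        fn indexOf(self: *const Self, bssid: Bssid) ?usize {
            for (self.slots[0..self.len], 0..) |s, i| {
                if (std.mem.eql(u8, &s.bssid, &bssid)) return i;
            }
            return null;
        }

        /// Remove an entry (DEAUTH / DISASSOC / ICV_ERROR path). True if found.
        pub fn evict(self: *Self, bssid: Bssid) bool {
            const i = self.indexOf(bssid) orelse return false;
            var j = i;
            while (j + 1 < self.len) : (j += 1) self.slots[j] = self.slots[j + 1];
            self.len -= 1;
            return true;
        }

        /// Insert or refresh; when full, the oldest entry is dropped first.
        pub fn put(self: *Self, bssid: Bssid, pmkid: Pmkid) void {
            _ = self.evict(bssid);
            if (self.len == capacity) {
                const oldest = self.slots[0].bssid; // copy before mutating slots
                _ = self.evict(oldest);
            }
            self.slots[self.len] = .{ .bssid = bssid, .pmkid = pmkid };
            self.len += 1;
        }

        pub fn get(self: *const Self, bssid: Bssid) ?Pmkid {
            const i = self.indexOf(bssid) orelse return null;
            return self.slots[i].pmkid;
        }
    };
}
```

With the default `pmksa_cache_capacity` of 4, a linear scan is cheaper than any indexed structure and needs no allocator.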
disabled (debug only):
No iovars sent. Behavior identical to cyw43-driver. Used only for A/B debugging against the reference.
5.9.4 Fallback if the iovar is absent on our blob vintage
Pre-Phase-3 research might reveal that our 2023-era 7.95.61 blob lacks the pmkid_info iovar entirely. (Extremely unlikely — 7.95.61 is recent, the chip family has had pmkid_info for years per brcmfmac, and Embassy's shipped firmware is identical to ours.) The Phase 3 task must still test this on real hardware (issue the iovar, check response status; ~30 minutes of work). Three outcomes:
(1) Iovar works as specified: land §5.9.3 as designed. (Expected outcome.)
(2) Iovar returns firmware error: upgrade the blob vintage. soypat ships 7.95.62 (Apr 2023), which is the newest public vintage for this chip family. This triggers risk R17 (re-run golden traces + hardware matrix against the new blob) but unblocks PMKSA. Document the vintage upgrade in UPSTREAM.md.
(3) Blob upgrade is infeasible: in this specific failure mode only, the plan may fall back to an alternate primitive that must itself be documented in brcmfmac or WHD as clearing the same supplicant-side cache state the missing iovar would have cleared. The research document must name the alternate iovar/command and cite the source evidence before Phase 3 accepts this fallback path. Ad-hoc sequences invented locally (e.g. "maybe toggling bsscfg:sup_wpa flushes the cache") are not acceptable as an outcome-(3) implementation; they re-introduce the exact "invent-undocumented-behavior" risk that the pre-work research was meant to eliminate. If no alternate primitive exists in either reference, the correct answer is outcome (2): upgrade the blob.
cache_in_boot is deferred to Phase B in the outcome-(3) scenario.
Outcome (1) is expected. Outcome (2) is acceptable but triggers R17. Outcome (3) exists only as a documented escape hatch backed by external evidence; the researcher must not preemptively assume it.
5.9.5 cache_in_boot synchronization via PMKID_CACHE event
Original §5.9.3 design polled firmware via pmkid_info GET after each join to refresh the host-side cache. After examining Embassy and soypat event enums, a cleaner design uses the firmware-originated PMKID_CACHE event (type 21) as the sync trigger:
// ctrl/pmksa.zig (sketch):
// Firmware emits EV_PMKID_CACHE whenever its internal PMKSA state changes
// (new cache entry after 4-way handshake, entry expired, etc.). We mirror
// to host-side cache reactively rather than polling.
fn onPmkidCacheEvent(self: *Driver, ev: *const Event) void {
    _ = ev; // the event is only a trigger; its payload is not used
    // Read firmware's current cache via pmkid_info GET.
    var buf: [356]u8 = undefined;
    self.ioctl.getIovar("pmkid_info", &buf) catch return;
    self.pmksa_cache.syncFromFirmware(&buf);
    self.stats.pmksa_cache_syncs += 1;
}
Benefits over polling:
No redundant GETs after non-state-changing events.
Cache is exactly consistent with firmware state (no race window).
Observable in logs: every PMKSA cache change is traceable.
Still ships DEAUTH_IND / DISASSOC_IND / ICV_ERROR → evict-and-SET to proactively remove entries on known-bad sessions (belt + suspenders).
This adjustment is recorded here but the cache_in_boot mode remains a Phase 3 stretch (§5.9.3). clear_on_boot does not require this event; it's a Phase-3-P3c-only consideration.
5.10 WPA3-SAE support (Phase 3 deliverable)
Decision: YES, implement WPA3-SAE as a shipped feature in Phase 3.
5.10.1 Feasibility check — all four preconditions satisfied
Same string contains mfp (Management Frame Protection — mandatory for WPA3).
Protocol knowledge
pico-sdk C reference implements SAE join at cyw43_ll.c:2065-2111 via sae_password iovar + WLC_SET_AUTH=3 (AUTH_TYPE_SAE) + wpa_auth = CYW43_WPA3_AUTH_SAE_PSK (0x40000).
Cross-verified in other drivers
Embassy added WPA3 support in PR #3323 (merged). Capability observed in misc/embassy/cyw43/src/control.rs via grep for sae.
No blob upgrade needed. No chip feature gap. The only reason earlier drafts marked WPA3 as "optional" was conservative scope-cutting, not a feature gap.
5.10.2 Wire-level implementation spec
Extend the join flow (§2.4.12) with a WPA3 branch:
if auth_type ∈ {WPA3_SAE_AES_PSK, WPA3_WPA2_AES_PSK}:
wpa_auth = CYW43_WPA3_AUTH_SAE_PSK // 0x40000
(for WPA3_WPA2 transition mode: wpa_auth |= CYW43_WPA2_AUTH_PSK)
auth_cmd = AUTH_TYPE_SAE // 3 for WLC_SET_AUTH
mfp_val = MFP_REQUIRED // 2 for pure WPA3
// MFP_CAPABLE (1) for WPA3+WPA2 mixed
key_length_max = 128 // CYW43_WPA_SAE_MAX_PASSWORD_LEN
# Use "sae_password" iovar, not WLC_SET_WSEC_PMK:
buf[0..2] = LE u16 key_len
buf[2..130] = key bytes (zero-padded to 128)
cyw43_delay_ms(2) # firmware needs prep time
iovar_set("sae_password", buf, 130, STA)
For WPA3-SAE: WLC_SET_WSEC gets auth_type & 0xff (same as WPA2 path — AES is 4).
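The 130-byte `sae_password` buffer from the pseudocode above (LE u16 length, then the key zero-padded to 128 bytes) can be built as in this sketch. `encodeSaePassword` and `SaeError` are hypothetical names; the iovar call itself and the 2 ms delay are left out.

```zig
const std = @import("std");

pub const SAE_MAX_PASSWORD_LEN = 128; // key_length_max in the spec above
pub const SAE_BUF_LEN = 2 + SAE_MAX_PASSWORD_LEN; // 130

pub const SaeError = error{PasswordTooLong};

/// Build the sae_password payload: LE u16 key_len, then key bytes, zero-padded.
pub fn encodeSaePassword(password: []const u8) SaeError![SAE_BUF_LEN]u8 {
    if (password.len > SAE_MAX_PASSWORD_LEN) return SaeError.PasswordTooLong;
    var buf = [_]u8{0} ** SAE_BUF_LEN;
    const len: u16 = @intCast(password.len);
    buf[0] = @truncate(len); // low byte first (little-endian)
    buf[1] = @truncate(len >> 8);
    @memcpy(buf[2 .. 2 + password.len], password);
    return buf;
}
```

Rejecting over-long passwords before touching the buffer keeps the fixed-size payload from ever being truncated silently.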
5.10.3 Config surface
// in Config:
pub const Wpa3Mode = enum {
    off,        // skip WPA3 iovars entirely; reject wpa3_* AuthTypes at join()
    wpa2_only,  // compile WPA3 code but default to WPA2 for all joins unless
                // caller explicitly passes a wpa3_* AuthType
    auto,       // DEFAULT: try SAE first if AuthType is wpa3_*; on specific
                // failure signals (AUTH FAIL reason=16 auth_type=3, or
                // sequence of 3 SAE timeouts), attempt WPA2 fallback IFF
                // wpa3_wpa2_aes_psk (transition-mode) was the AuthType.
                // Pure wpa3_sae_aes_psk does NOT fall back to WPA2.
    prefer_sae, // always prefer SAE; never fall back (for WPA3-only AP)
};
wpa3_mode: Wpa3Mode = .auto,
Fallback semantics (.auto mode):
| AuthType | Initial attempt | On WPA3 failure | Failure signals that trigger fallback |
| --- | --- | --- | --- |
| open / wpa_tkip_psk / wpa2_aes_psk / wpa2_mixed_psk | Always WPA2 path | n/a | n/a |
| wpa3_wpa2_aes_psk (transition) | SAE | WPA2 | AUTH FAIL reason=16 auth_type=3, or 3 consecutive SAE timeouts in one join call |
| wpa3_sae_aes_psk (pure) | SAE | No fallback; returns JoinBadAuth | Same signals, but fallback not attempted |
Fallback decision is made entirely inside ctrl/join.zig without exposing the mode-switch to the caller (the caller asked for wpa3_wpa2_aes_psk specifically because they want either to work). Fallback counts against the same join timeout_ms budget; only one fallback attempt per join call.
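The fallback table can be expressed as a pure decision function, sketched below. All names (`SaeFailure`, `Decision`, `onSaeFailure`) are hypothetical, and two points are assumptions rather than statements from this plan: that any mode other than `.auto` never falls back, and that a second failure after the single permitted fallback simply gives up.

```zig
const std = @import("std");

pub const AuthType = enum { open, wpa_tkip_psk, wpa2_aes_psk, wpa2_mixed_psk, wpa3_sae_aes_psk, wpa3_wpa2_aes_psk };
pub const Wpa3Mode = enum { off, wpa2_only, auto, prefer_sae };

/// Hypothetical summary of what the join loop has observed so far.
pub const SaeFailure = struct {
    auth_fail_reason16_type3: bool = false,
    consecutive_sae_timeouts: u8 = 0,
};

pub const Decision = enum { keep_waiting, fallback_wpa2, give_up };

/// Decide what to do after an SAE-path observation within one join call.
pub fn onSaeFailure(mode: Wpa3Mode, auth: AuthType, f: SaeFailure, already_fell_back: bool) Decision {
    const signalled = f.auth_fail_reason16_type3 or f.consecutive_sae_timeouts >= 3;
    if (!signalled) return Decision.keep_waiting;
    switch (auth) {
        // Transition mode: one WPA2 attempt, and only in .auto (assumption).
        .wpa3_wpa2_aes_psk => {
            if (mode == .auto and !already_fell_back) return Decision.fallback_wpa2;
            return Decision.give_up;
        },
        // Pure WPA3 never falls back.
        .wpa3_sae_aes_psk => return Decision.give_up,
        // Non-WPA3 AuthTypes never enter the SAE path.
        else => return Decision.keep_waiting,
    }
}
```

Keeping the decision free of driver state makes each table row directly testable with synthetic inputs.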
Rationale for .auto as default, not a plain bool = true: WPA3 support is binary at the firmware level but trinary at the policy level (do we force SAE, allow fallback, or refuse WPA3 entirely?). GPT-5.4 turn-6 review flagged that a plain boolean conflates the two — .auto gives integrators explicit control without requiring them to reason about WPA3 semantics.
AuthType enum in types.zig already enumerates WPA3 (per §2.2 and §5.1):
When wpa3_mode = .off, passing one of the WPA3 values to join() returns WifiError.UnsupportedAuthType. This provides a flash-saving escape hatch (~1–2 KB of SAE code+config) for WPA2-only deployments. When .wpa2_only, the code is compiled but the driver defaults to the WPA2 path unless the caller explicitly passes a wpa3_* AuthType.
5.10.4 AGENTS.md gotcha #28 status
AGENTS.md §CYW43 gotchas #28 currently documents: "WPA3/mixed-mode APs break WPA2 join." This was accurate for the old driver which has mfp=1 (MFP_CAPABLE) hard-coded and doesn't do the SAE handshake. The new driver implements SAE properly and should close this gotcha — WPA3 APs become first-class supported. Plan a final gotcha-list update in Phase 4.
Join AP configured WPA3-only; 4-way handshake completes using SAE. Existing table entry; now mandatory.
H13 (WPA3 failed-password detection): AUTH FAIL reason=16 auth_type=3 observed; returns JoinBadAuth within 5 s (§6.1 WPA3-specific rule).
H14 (WPA3-WPA2 transition AP): Join AP in transition mode; observe which auth path the firmware chose; both WPA2 fallback and WPA3 primary are acceptable.
WPA3 validation is NOT a Phase 4 cutover blocker (phasing: ships with Phase 3 code + validated in Phase 3 soak; additional hardening in Phase 4). -Dcyw43=new_shadow builds must compile WPA3 code by default; any regressed WPA2-only test indicates a WPA3 implementation leaking into the WPA2 path and must be fixed.
RX goes through HostHooks.onEthernetRx — no recv() API in the driver (async / poll-driven).
5.13 Country / regulatory
pub fn setCountry(self: *Driver, country: Country) WifiError!void;
// exposed as a field in Config; setCountry is only for runtime changes
5.14 Compat façade
File cyw43.zig exports (alongside the new API) the pre-rewrite module-level surface so src/bindings/wifi.zig compiles unchanged during migration:
// Legacy module-level API kept as compatibility facade.
// Delegates to the default-configured Driver instance owned by this module.
// Phase 4 of the migration removes this section.
pub var default_driver: Driver = undefined;
pub var default_initted: bool = false;
pub fn init(board: Board) WifiError!void {
    if (!default_initted) {
        default_driver = try Driver.init(...);
        default_initted = true;
    }
}
pub fn ledSet(on: bool) WifiError!void { return default_driver.gpioSet(CYW43_GPIO_LED, on); }
pub fn joinWpa2(ssid: []const u8, key: []const u8) WifiError!void { /* delegate */ }
pub fn service() void { default_driver.pollOnce(); }
pub fn getIpAddress() [4]u8 { /* unchanged behavior */ }
pub fn hasIpAddress() bool { /* unchanged behavior */ }
// etc.
Section 6 — State-machine design
Four state machines, each with ASCII diagram + transition enumeration.
6.1 Join state machine
Model choice — three-flag, not bitmask. After comparing Embassy and soypat drivers (misc/embassy/cyw43/src/runner.rs:1171-1240, misc/cyw43439/ioctl.go:520-610), the plan adopts the three-flag link-state model rather than pico-sdk's wifi_join_state bitmask. The three flags independently track each phase of association; computed link state is simply:
link_up = join_ok and (!secure_network or keyed)
Cleaner to reason about, easier to test, and matches how two independent clean-room ports converged. The pico-sdk bitmask approach (§2.3.1) is the behavioral reference for which events transition which flag — not the storage model.
pub const JoinState = enum {
    idle,
    scanning,
    associating,      // join() issued, awaiting events
    joined,           // auth_ok && join_ok && (!secure || keyed)
    rejoining,        // pend_rejoin processing
    disassoc_pending, // pend_disassoc processing
    failed_badauth,   // terminal unless retry_badauth enabled
    failed_nonet,     // terminal unless retry_nonet enabled
    failed_general,   // retried on transient-backoff schedule
};
pub const JoinFlags = struct {
    /// auth_ok: 802.11 authentication / SAE handshake succeeded.
    /// - WPA2: EV_AUTH status=SUCCESS.
    /// - WPA3 SAE: EV_AUTH status=SUCCESS (emitted after SAE, before 4-way).
    /// - Open: EV_AUTH status=SUCCESS (firmware still emits for open nets).
    auth_ok: bool = false,
    /// join_ok: association/link-layer complete (EV_JOIN status=SUCCESS).
    /// NOT "full success"; use isJoined().
    join_ok: bool = false,
    /// keyed: 4-way handshake + GTK installed.
    /// - WPA2/WPA3: EV_PSK_SUP status=UNSOLICITED flags=0 reason=0.
    /// - Open: NEVER set (no PSK_SUP for open networks); isJoined()
    ///   accounts for this via `secure` parameter.
    keyed: bool = false,
    pub fn isJoined(self: JoinFlags, secure: bool) bool {
        return self.join_ok and self.auth_ok and (!secure or self.keyed);
    }
    pub fn clear(self: *JoinFlags) void {
        self.* = .{};
    }
};
Open network join. secure=false → isJoined = join_ok && auth_ok. PSK_SUP is never received; keyed stays false forever and that is correct. §7.1 test matrix item 4 includes an open-network synthetic event sequence that MUST reach joined without any PSK_SUP event.
WPA3 vs WPA2 auth_ok. Both paths use EV_AUTH status=SUCCESS to set the flag; the sequence before it differs but the outcome event is the same. Integrator-visible AuthType tells us which path is in play; auth_ok is the same bool either way.
Out-of-order flag updates (JOIN before AUTH): order-independent by design. Test matrix includes the permutation.
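The order-independence claim for AUTH and JOIN, and the open-network case, can be exercised with synthetic event sequences. The sketch below re-declares `JoinFlags` so it stands alone; the `Ev` enum, `apply`, and `joinedAfter` are hypothetical test scaffolding, and `keyed` is gated on `auth_ok` as the state diagram in this section specifies.

```zig
const std = @import("std");

const JoinFlags = struct {
    auth_ok: bool = false,
    join_ok: bool = false,
    keyed: bool = false,

    fn isJoined(self: JoinFlags, secure: bool) bool {
        return self.join_ok and self.auth_ok and (!secure or self.keyed);
    }
};

/// Hypothetical synthetic events, one per flag.
const Ev = enum { auth_success, join_success, psk_sup_keyed };

fn apply(flags: *JoinFlags, ev: Ev) void {
    switch (ev) {
        .auth_success => flags.auth_ok = true,
        .join_success => flags.join_ok = true,
        // Per the state diagram, keyed is only accepted once auth_ok is set.
        .psk_sup_keyed => {
            if (flags.auth_ok) flags.keyed = true;
        },
    }
}

/// Feed one ordering of events and report whether the result counts as joined.
fn joinedAfter(order: []const Ev, secure: bool) bool {
    var flags = JoinFlags{};
    for (order) |ev| apply(&flags, ev);
    return flags.isJoined(secure);
}
```

A secure join therefore succeeds with AUTH and JOIN in either order followed by PSK_SUP, while an open join succeeds with no PSK_SUP at all.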
Failure-latch clearing on new join. Critical. join() entry point clears ALL stale state (flags + pend-flags + threshold counters) before accepting new join intent:
Missing any of these clears causes "previous session's failure contaminates new session" intermittent bugs. Regression guard: test matrix exercises failed-then-retried join and asserts flags == {} + all pend-flags false at start of second attempt.
Link-down event while already rejoining → coalesced by §6.1.5 below; does not restart backoff timer.
State diagram:
[idle]
| join(...)
▼
[associating]
events drive flag updates (order-independent):
auth_ok ← (EV_AUTH status=SUCCESS) OR (EV_AUTH status=UNSOLICITED ignored)
join_ok ← EV_JOIN status=SUCCESS
keyed ← EV_PSK_SUP status=UNSOLICITED flags=0 reason=0 (AND auth_ok)
when flags.isJoined(secure_network) → [joined]
|
├── "link-down class" events → clear flags, [rejoining] or [idle]
│ ├── EV_LINK status=0 flags=0 (reason=1: loss of signal; reason=2: controlled shutdown)
│ ├── EV_DEAUTH status=SUCCESS (AP deauthed us)
│ ├── EV_DISASSOC
│ ├── EV_DEAUTH_IND reason=2 (bad password → pend_disassoc, not rejoin)
│ └── [WPA3] EV_AUTH status=FAIL reason=16 auth_type=3
│
├── "crypto drift class" events → pend_rejoin
│ ├── EV_ICV_ERROR (49) ── single event is trigger
│ ├── EV_MIC_ERROR (17) ── single event is trigger
│ ├── EV_UNICAST_DECODE_ERROR (50) ── single event is trigger
│ └── EV_MULTICAST_DECODE_ERROR (51) ── threshold (3 within 5 s)
│
├── "supplicant failure class" → pend_rejoin
│ └── EV_PSK_SUP status∈{4,8,10} reason=15 (timeout)
│
├── "firmware distress" → pend_rejoin + log warn
│ └── EV_PSM_WATCHDOG (41)
│
├── "RSN mismatch" → pend_rejoin_wpa (WPA1 fallback) + pend_rejoin
│ └── EV_PRUNE status=0 reason=8
│
├── positive heartbeat — NO state change, update health counter
│ └── EV_GTK_PLUMBED (84): group key successfully installed
│
└── IGNORE — must not trigger rejoin
├── EV_PSK_SUP reason=14 (CCX_FAST_ROAM — common during roams)
└── EV_AUTH status=UNSOLICITED (unsolicited auth packet noise)
[rejoining]
| pend_rejoin_wpa processed first (if set) → set_wpa_auth(WPA1)
| pend_rejoin processed → WLC_SET_SSID w/ cached last_ssid_joined
| backoff_index++ if reconnect.enabled; otherwise [failed_general]
▼
[associating] (reconnect flow)
Special: AUTH bad→good recovery
EV_AUTH status=SUCCESS while state==failed_badauth → back to associating, preserve flags.
Special: EV_SET_SSID decoding at top level
status=SUCCESS → join_ok=true (open/WPA3)
status=NO_NETWORKS (3) reason=0 → [failed_nonet]
any other status → [failed_general]
Event-to-transition table (authoritative; every row is specified in the event decoder per §6.4 ban on else => {}):
| Event | Status | Reason | Flags | Auth type | Action |
| --- | --- | --- | --- | --- | --- |
| AUTH | SUCCESS | - | - | - | auth_ok = true; if in failed_badauth → associating |
EventLog.record(UnknownEvent{…}, …) — see §6.1.4 authoritative spec
6.1.1 Multicast decode-error rate limiting
Embassy comment (paraphrased): single mcast decode failures occur naturally on mixed-client networks (different clients using different encryption, broadcast traffic encrypted for the original connecting client). A single event is not a reliable rejoin trigger. Implementation:
// Simple token-bucket style: 3 errors within 5 s triggers pend_rejoin.
mcast_err_count: u8 = 0,
mcast_err_window_start_ms: u32 = 0,
fn onMulticastDecodeError(self: *Driver, now_ms: u32) void {
    // Wrapping subtraction keeps the window check valid across u32 tick rollover.
    if (now_ms -% self.mcast_err_window_start_ms > 5_000) {
        self.mcast_err_count = 1;
        self.mcast_err_window_start_ms = now_ms;
    } else {
        self.mcast_err_count +%= 1;
        if (self.mcast_err_count >= 3) {
            self.pend_rejoin = true;
            self.mcast_err_count = 0;
        }
    }
}
Unicast decode errors DO trigger pend_rejoin on first event — they indicate our session key is wrong for traffic directed specifically at us.
6.1.2 PSM_WATCHDOG handling
Firmware's internal Protocol State Machine watchdog fires when the microcode is stuck. None of Embassy/soypat/pico-sdk handle this event explicitly — all three define it in their enum and ignore it. Our plan treats it as a loud diagnostic AND a pend_rejoin trigger:
// Increment stats counter, log at WARN level with uptime.
self.stats.psm_watchdog_events += 1;
log.puts("[cyw43] WARN: firmware PSM_WATCHDOG event; forcing rejoin\n");
self.pend_rejoin = true;
Rationale: if firmware has hit its own watchdog, we cannot rely on it recovering on its own. A forced rejoin is the safest response.
6.1.3 GTK_PLUMBED as positive health signal
This event fires after every successful group-key rekey. Most embedded drivers don't track it, but it is the single most reliable "firmware and AP are still talking correctly" signal available. Track as:
last_healthy_ms: u32 = 0, // updated on GTK_PLUMBED or successful ioctl
fn onGtkPlumbed(self: *Driver) void {
    self.last_healthy_ms = self.hooks.ticksMs();
    self.stats.gtk_rekeys += 1;
}
The last_healthy_ms counter complements the crypto-error-class events — if we see MIC/ICV errors but last_healthy_ms is recent, the error might be an ordinary glitch; if last_healthy_ms is old, the errors are more concerning.
This subsection is the single source of truth for how the driver handles events not in the §2.2 specific-handler table. It supersedes earlier scattered fragments in §3.2, §3.4, §4.1, §5.8, §6.1 event-table's last row, §8.2 P2 validation, and §11.3 go/no-go — those remain valid pointers, but the spec lives here.
Design goals (ranked):
Never drop an event silently — every unknown arrives somewhere observable.
Never let unknown events flood UART (they carry the potential to cause the issue they're meant to diagnose, per ISSUES.md #25's 180 s UART-corruption hypothesis).
Make ISSUES.md #25 resolvable: the next session's Phase 2 soak must be able to identify which event fires at ~180 s cadence by running one shell command.
Integrator-controllable: pico may want to push events to JS/Ruby, persist to flash, or dump on shell command. The driver records locally and exposes a query API.
Event.unknown struct (single source of truth)
/// Replaces the earlier §3.2 placeholder with a full field set.
pub const UnknownEvent = struct {
    event_type: u32,        // raw BE→host-decoded event type value
    event_name: []const u8, // from 89-entry name table; "unknown" if >= 89
    status: u32,
    reason: u32,
    flags: u16,
    auth_type: u32,
    ifidx: u8,              // interface index from the event wrapper
    bsscfgidx: u8,          // bsscfg index if present; 0 otherwise
    payload_len: u16,       // length of wrapper payload past the 32-byte header
    payload_prefix: [16]u8, // first 16 bytes of wrapper payload (zero-padded)
};
pub const Event = union(EventKind) {
    // ... specific variants for handled events ...
    unknown: UnknownEvent,
};
Every unknown event gets this full field set. The fields match exactly what GPT-5.4's peer review (turn 1, landmine D) specified as the minimum for resolving R14 + ISSUES.md #25.
pub const EventLog = struct {
    pub const CAP = 16;               // max distinct {type,status,reason} tuples
    pub const WINDOW_MS: u32 = 5_000; // rate-limit coalesce window
    pub const Entry = struct {
        unknown: UnknownEvent,   // full data from the first occurrence
        first_seen_ms: u32,
        last_seen_ms: u32,
        count: u16,              // total occurrences (including first)
        emitted_in_window: bool, // true after the first-seen log line fired
    };
    entries: [CAP]Entry = undefined,
    count: u8 = 0,
    total_unknown_count: u32 = 0, // lifetime; never reset
    window_start_ms: u32 = 0,
    pub fn record(self: *EventLog, ev: UnknownEvent, now_ms: u32, log: Logger) void;
    pub fn rollWindow(self: *EventLog, now_ms: u32, log: Logger) void;
    pub fn dumpAll(self: *const EventLog, log: Logger) void;
    pub fn clear(self: *EventLog) void;
};
Dedupe key — precise definition:
const DedupeKey = struct {
    event_type: u32,
    status: u32,
    reason: u32,
    auth_type: u32,
    ifidx: u8,
};
// flags, bsscfgidx, payload_prefix are NOT part of the key.
// Rationale: flags/bsscfgidx may vary slightly across otherwise-identical
// events; payload bytes definitely vary. Including them would defeat
// coalescing. Auth_type IS included because a WPA2 vs WPA3 event with
// otherwise-identical fields is meaningfully different diagnostic data.
// Ifidx is included because STA vs AP events should never coalesce.
record() algorithm:
1. total_unknown_count += 1
2. Compute key = DedupeKey{ event_type, status, reason, auth_type, ifidx }.
3. Search entries[0..count] for a matching key:
- match found:
- entry.last_seen_ms = now_ms
- entry.count += 1 (saturating at u16 max)
- (silent — no log line for duplicates within window)
- no match, and count < CAP:
- Append new entry with first_seen_ms = last_seen_ms = now_ms, count = 1
- Store the full UnknownEvent (including flags/bsscfgidx/payload_prefix)
for later inspection — only the KEY is shared across duplicates
- Emit first-seen log line (see format below)
- entry.emitted_in_window = true
- no match, and count >= CAP:
- Emit one "log full" line once per window (guarded by total_unknown_count)
- Do not store the event; total_unknown_count was already incremented in step 1
rollWindow() algorithm (called once per Driver.pollOnce()):
1. If (now_ms - window_start_ms) < WINDOW_MS: return.
2. For each entry with count > 1 AND emitted_in_window:
Emit summary line: "coalesced (<count>×) over last 5s"
3. Reset all entries (or: compact — keep high-count entries for visibility).
4. window_start_ms = now_ms.
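The two algorithms above can be condensed into a small model for unit testing. This is a simplified sketch, not the specified `EventLog`: `MiniLog` and `Key` are hypothetical names, the Logger is replaced by a `lines_emitted` counter, the full `UnknownEvent` payload and the "log full" line are omitted, and the window roll always resets all entries (the first of the two options in step 3).

```zig
const std = @import("std");

/// Same fields as the dedupe key defined above.
const Key = struct { event_type: u32, status: u32, reason: u32, auth_type: u32, ifidx: u8 };

const MiniLog = struct {
    const CAP = 16;
    const WINDOW_MS: u32 = 5_000;
    const Entry = struct { key: Key, count: u16, first_seen_ms: u32, last_seen_ms: u32 };

    entries: [CAP]Entry = undefined,
    len: u8 = 0,
    total: u32 = 0, // lifetime count, never reset
    window_start_ms: u32 = 0,
    lines_emitted: u32 = 0, // stand-in for Logger output

    fn record(self: *MiniLog, key: Key, now_ms: u32) void {
        self.total += 1;
        for (self.entries[0..self.len]) |*e| {
            if (std.meta.eql(e.key, key)) {
                e.last_seen_ms = now_ms;
                e.count +|= 1; // saturates at u16 max
                return; // duplicate within window: silent
            }
        }
        if (self.len < CAP) {
            self.entries[self.len] = .{ .key = key, .count = 1, .first_seen_ms = now_ms, .last_seen_ms = now_ms };
            self.len += 1;
            self.lines_emitted += 1; // first-seen line
        }
        // Table full: the event is only reflected in the lifetime total.
    }

    fn rollWindow(self: *MiniLog, now_ms: u32) void {
        if (now_ms -% self.window_start_ms < WINDOW_MS) return;
        for (self.entries[0..self.len]) |e| {
            if (e.count > 1) self.lines_emitted += 1; // one coalesced summary
        }
        self.len = 0;
        self.window_start_ms = now_ms;
    }
};
```

Under this model a burst of identical events produces at most two lines per window: the first-seen line and one coalesced summary.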
Emitted log-line format (single-line, ≤ 112 chars):
[cyw43] evt ??? type=41(PSM_WATCHDOG) status=0 reason=0 flags=0x0000 ifidx=0 bsscfg=0 plen=6 payload=01 00 00 00 ab cd ...
[cyw43] evt ??? type=41(PSM_WATCHDOG) coalesced 3× over last 5s
Ring-buffer full:
[cyw43] evt ??? event log full (16 distinct tuples); total unknowns this boot: 142
Integration points
Logger: record() always emits to Logger if provided; otherwise silently accumulates in the ring buffer. No-logger mode is supported (freestanding-strict builds may skip the pretty-formatter to save flash).
HostHooks.onEvent: fires for every decoded event, including Event.unknown. Receives the full UnknownEvent struct. Integrators who want richer handling (persist to flash, push to MQTT, expose via HTTP) hook here.
Pico UART shell command (Phase 3 deliverable): wifi events dumps the ring buffer via EventLog.dumpAll(). Implementation goes in src/bindings/wifi.zig (not the driver itself — the driver exposes the API; the shell is pico-integration layer).
Specific use in ISSUES.md #25 resolution
The 180 s UART-corruption burst is hypothesized to be an unhandled event firing periodically. With §6.1.4 in place, Phase 2 validation becomes mechanical:
Run 30-min soak under active TCP workload (reproduces pico-sdk ISSUES.md #25). Logger enabled.
Issue wifi events shell command after soak.
Look at ring buffer for entries with count in the range 5–10 (for a 30-min run at 180 s cadence).
Those entries' type fields are the candidate events.
Cross-reference against event_names table to identify. Likely candidates based on research: ROAM (19), BCNLOST_MSG (31), PSM_WATCHDOG (41), or TXFAIL (20).
Logger-disabled confirmation run (per GPT-5.4 turn-6 review, addresses the circularity of using UART to diagnose UART corruption):
a. Rerun the same 30-min soak with Logger set to a no-op but EventLog still recording.
b. Observe: does the UART still corrupt at 180 s cadence? (Note: something other than our event decoder is using UART — puts from pico-level superloop, etc. — so UART still carries normal traffic we can observe for bursts.)
c. If corruption persists: the event exists (ring buffer has the data) but it is not the cause of UART corruption — SOMETHING ELSE is happening on the SPI bus around the same cadence. Pivot to alt-instrumentation paths below.
d. If corruption disappears: our own log-line emission at first-seen was the proximate cause. Ring buffer still tells us which event; we decide whether to log it differently or ignore it silently.
Once identified, decide: (a) decode specifically and ignore, (b) decode specifically and act, or (c) disable via event-mask bit clear. Document decision in UPSTREAM.md.
If the ring buffer is empty after soak, ISSUES.md #25's "unhandled-event hypothesis" is falsified — the burst is something else (e.g. SPI bus re-sync pattern, flash-XIP contention with CYW43 traffic, UART DMA underrun). Phase 2 gate then flips from "identify the event" to "identify the actual cause via alternative instrumentation." Either outcome is progress.
Invariants a reviewer must check
Decoder never falls through to else => {}. Unknown events always land in EventLog.record().
record() never allocates, never blocks, never calls back into the driver. Safe to call from any decode context.
Ring buffer is bounded by CAP — cannot overflow memory regardless of event flood rate.
Rate-limiter is bounded by WINDOW_MS — UART output rate ≤ 1 line per distinct tuple per 5 s window + 1 coalescing summary per window.
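The bounded-table and rate-limit invariants above can be modelled on the host in a few lines. This is a minimal Python sketch, not the Zig implementation: the class name `EventLogModel`, the `record()` return value (a list of lines to emit), and the exact line wording are assumptions made for illustration. Only `CAP = 16` and the 5 s window come from this section.

```python
from dataclasses import dataclass

CAP = 16          # max distinct (type, status, reason) tuples
WINDOW_MS = 5000  # coalescing window


@dataclass
class Entry:
    count: int
    window_start_ms: int
    suppressed: int = 0


class EventLogModel:
    """Host-side model of the unknown-event log (hypothetical API)."""

    def __init__(self):
        self.entries = {}
        self.total_unknown = 0
        self.full_reported = False

    def record(self, now_ms, ev_type, status, reason):
        """Return the log lines this call would emit (possibly none)."""
        self.total_unknown += 1
        key = (ev_type, status, reason)
        entry = self.entries.get(key)
        if entry is None:
            if len(self.entries) >= CAP:
                # Table is full: never grow, report the condition once.
                if self.full_reported:
                    return []
                self.full_reported = True
                return [f"event log full ({CAP} distinct tuples)"]
            self.entries[key] = Entry(count=1, window_start_ms=now_ms)
            return [f"evt type={ev_type} status={status} reason={reason}"]
        entry.count += 1
        if now_ms - entry.window_start_ms < WINDOW_MS:
            entry.suppressed += 1  # inside the window: count only, no output
            return []
        lines = []
        if entry.suppressed:
            lines.append(f"evt type={ev_type} coalesced {entry.suppressed}x")
        lines.append(f"evt type={ev_type} status={status} reason={reason}")
        entry.window_start_ms = now_ms
        entry.suppressed = 0
        return lines
```

In this model a flood of one tuple produces at most two lines per window (one summary, one fresh line), and a flood of new tuples stops growing the table at `CAP`.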
With the event-handler expansion in §6.1 (MIC_ERROR, ICV_ERROR, UNICAST_DECODE_ERROR, MULTICAST_DECODE_ERROR threshold, PSM_WATCHDOG, PSK_SUP timeout, and multiple link-down-class events), a flaky link can fire many pend_rejoin triggers in quick succession. Without coalescing, each trigger could reset the backoff timer or stack another pending rejoin — livelocking the reconnect policy under noise.
Rule: while join_state == .rejoining, additional pend_rejoin triggers are coalesced, not stacked. Concretely:
fn requestRejoin(self: *Driver, trigger: RejoinTrigger) void {
    self.stats.rejoin_triggers_by_class[@intFromEnum(trigger)] += 1;
    self.last_rejoin_trigger = trigger; // for diagnostics
    if (self.pend_rejoin) return; // already pending; coalesce
    if (self.join_state == .rejoining) return; // already processing
    self.pend_rejoin = true;
    // Backoff timer is NOT reset by this call; it only advances when the
    // previous rejoin attempt actually completes (success or failure).
}
pub const RejoinTrigger = enum {
    icv_error, mic_error, unicast_decode_error, multicast_decode_error_threshold,
    psk_sup_timeout, psm_watchdog, deauth, disassoc, ap_reboot_detected,
    app_requested,
};
What this preserves:
stats counters still record EVERY trigger event (so diagnosis can see "we got 47 ICV_ERRORs in the last 5 minutes" even if only one rejoin was attempted).
last_rejoin_trigger shows which event class caused the currently-processing rejoin — useful for UPSTREAM.md M-entries when we see new failure patterns.
Backoff schedule (§6.1 ReconnectPolicy.backoff_ms) advances cleanly: one entry per completed rejoin attempt, not one per trigger event.
What this prevents:
Livelock under noisy crypto errors.
Backoff timer getting repeatedly stomped back to 1 s.
Multiple pend_rejoin being "true" when only one rejoin is semantically possible at a time.
Test matrix §7.1 item 4 regression: drive a sequence of 10 ICV_ERROR events in rapid succession and assert exactly ONE rejoin attempt issued, with stats.rejoin_triggers_by_class[icv_error] == 10.
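The regression described above can be prototyped on the host before the Zig test exists. The following is a small Python model of the coalescing rule, written as a sketch: `DriverModel`, `poll()` and `complete()` are hypothetical names standing in for the driver's poll loop and rejoin completion, and only two trigger classes are listed.

```python
from enum import IntEnum


class RejoinTrigger(IntEnum):
    ICV_ERROR = 0
    MIC_ERROR = 1


class DriverModel:
    """Hypothetical host-side model of the pend_rejoin coalescing rule."""

    def __init__(self):
        self.counts = [0] * len(RejoinTrigger)
        self.last_trigger = None
        self.pend_rejoin = False
        self.rejoining = False
        self.attempts = 0

    def request_rejoin(self, trigger):
        self.counts[trigger] += 1      # every trigger is counted
        self.last_trigger = trigger
        if self.pend_rejoin or self.rejoining:
            return                     # coalesce: nothing is stacked
        self.pend_rejoin = True

    def poll(self):
        # Start at most one rejoin attempt per pending flag.
        if self.pend_rejoin and not self.rejoining:
            self.pend_rejoin = False
            self.rejoining = True
            self.attempts += 1

    def complete(self):
        self.rejoining = False
```

Driving ten `ICV_ERROR` triggers through this model, with `poll()` called anywhere in the sequence, yields one attempt and a counter of ten, which is the property the Zig test is meant to assert.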
Transition-order discipline. Success flags (auth_ok, join_ok, keyed) are order-independent — the state machine accepts them in any order and promotes to joined when the composite condition is met. Failure classification and deferred-action scheduling remain event-specific: a DEAUTH_IND reason=2 always maps to pend_disassoc, not pend_rejoin; a PRUNE reason=8 always sets pend_rejoin_wpa first. Test matrix §7.1 item 4 includes ordered, out-of-order, AUTH-recovery, crypto-drift, and roam-noise (reason=14) sequences.
Timeouts. join() with a timeout_ms (default 15 s) polls until joined or failed_*, then returns. Internally, assoc_pending + rejoin_pending both count against the same timeout budget. Three rejoin_pending entries inside a single join() call exhaust auto-retry; further recovery is only triggered by new events from within another join() call.
Auto-reconnect policy. Fully configurable:
pub const ReconnectPolicy = struct {
    enabled: bool = true,
    backoff_ms: []const u32 = &.{ 1_000, 2_000, 4_000, 8_000, 16_000 }, // retry schedule
    retry_transient: bool = true, // DEAUTH_IND, DISASSOC, ICV_ERROR, PSK timeout → retry
    retry_badauth: bool = false, // permanent auth failure; default OFF to avoid retry spam
    retry_nonet: bool = false, // SSID not found; default OFF (probably out of range)
};

// in Config (Zig struct fields take no `pub` qualifier):
reconnect: ReconnectPolicy = .{},
Backoff reset triggers:
Any successful transition to joined clears the backoff index to 0.
A manual join(...) call with a new SSID clears the index (treated as a fresh session).
A manual leave() → future join(...) clears the index.
Special-cased failures:
failed_badauth / failed_nonet: gated behind retry_badauth / retry_nonet respectively. Default OFF — a permanent auth failure should not spin the CPU retrying. User code can respond to the onLinkDown callback with explicit reconfiguration.
failed_general: retried on the standard backoff schedule (transient IO/timing faults).
Exhausted backoff (past last entry): driver stays in reconnecting, heartbeat-pings once per last-backoff-interval. Host retains ability to trigger manual recovery.
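To make the reset and exhaustion rules concrete, here is a minimal Python model of the schedule. It is a sketch under stated assumptions: the names `BackoffModel` and `should_retry` are illustrative, and treating `failed_general` as governed by `retry_transient` is an assumption, since the text only says it is retried on the standard schedule.

```python
BACKOFF_MS = (1_000, 2_000, 4_000, 8_000, 16_000)


class BackoffModel:
    """One schedule entry is consumed per completed, failed rejoin attempt."""

    def __init__(self, schedule=BACKOFF_MS):
        self.schedule = schedule
        self.index = 0

    def next_delay_ms(self):
        # Past the last entry the delay stays at the final interval.
        return self.schedule[min(self.index, len(self.schedule) - 1)]

    def on_attempt_failed(self):
        self.index += 1

    def on_joined(self):
        self.index = 0  # any successful transition to joined resets the index

    @property
    def exhausted(self):
        return self.index >= len(self.schedule)


def should_retry(failure, enabled=True, retry_transient=True,
                 retry_badauth=False, retry_nonet=False):
    """Gate a retry on the failure class (failure names follow the text above)."""
    if not enabled:
        return False
    if failure == "failed_badauth":
        return retry_badauth
    if failure == "failed_nonet":
        return retry_nonet
    return retry_transient  # assumption: covers failed_general as well
```

With the defaults, a wrong password is not retried at all, while transient failures walk the schedule and then stay at 16 s until a join succeeds.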
associating → associated_no_ip: internal join_state == .joined AND flags.isJoined(secure_network) (see §6.1 three-flag model).
associated_no_ip → up: host calls markIpReady().
up → degraded: EV_BCNLOST_MSG (31) — beacon-loss event from firmware. This is the specific trigger (was "(future) heuristic" in earlier drafts; research identified the concrete event).
degraded → up: any successful RX frame OR EV_GTK_PLUMBED event (positive health signal per §6.1.3).
up|degraded → reconnecting: any "link-down class" event (§6.1 table) with auto_reconnect enabled.
* → down: leave() called, or reconnect exhausted and auto_reconnect disabled.
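The transitions listed above fit in a small lookup table, which is convenient for host-side tests. This Python sketch uses the state names from the list; the event names (`joined`, `ip_ready`, `bcnlost`, `rx_frame`, `gtk_plumbed`, `link_down`, `leave`) are shorthand invented here, and mapping a link-down event with auto-reconnect disabled straight to `down` is an assumption.

```python
TRANSITIONS = {
    ("associating", "joined"): "associated_no_ip",
    ("associated_no_ip", "ip_ready"): "up",
    ("up", "bcnlost"): "degraded",
    ("degraded", "rx_frame"): "up",
    ("degraded", "gtk_plumbed"): "up",
}


def step(state, event, auto_reconnect=True):
    """Return the next link state; unknown (state, event) pairs leave it unchanged."""
    if event == "leave":
        return "down"
    if event == "link_down" and state in ("up", "degraded"):
        # Assumption: without auto-reconnect the link simply goes down.
        return "reconnecting" if auto_reconnect else "down"
    return TRANSITIONS.get((state, event), state)
```

Events that have no entry for the current state are ignored, so a stray `gtk_plumbed` while `up` leaves the state untouched.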
6.3 SDPCM TX queue state machine
[idle_no_credits] (tx_seq == last_credit)
| event/data RX updates last_credit
▼
[has_credits]
| cyw43_send_ioctl / _send_ethernet called
| increment tx_seq
| write frame to WLAN_FUNCTION
▼
[waiting_credits_or_flow]
| poll_device drains RX packets
| each RX updates last_credit and wlan_flow_control
▼
[has_credits] (if last_credit != tx_seq and !wlan_flow_control)
|
▼
[idle]
Stall recovery. If waiting_credits_or_flow exceeds 1 s, the stall times out with WifiError.SdpcmCreditStall. During the wait, RX drain processes events only — no data RX callback invocation (reentrancy hazard — §2.4.6).
Credit arithmetic. tx_seq and last_credit are u8 with wraparound. hasCredit() = (last_credit -% tx_seq) != 0. The "accept only if credit_delta <= 20" check in the reference (cyw43_ll.c:845-848) protects against stale/misordered credit headers; replicate it.
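The wraparound arithmetic is easy to get wrong, so a short host-side model helps when writing the Zig tests. The Python sketch below masks with `0xFF` to mimic Zig's `-%` on `u8`; the function names are illustrative, and how the accept rule and `hasCredit()` are combined in the driver is not modelled here.

```python
MAX_CREDIT_DELTA = 20


def has_credit(tx_seq, last_credit):
    """(last_credit -% tx_seq) != 0 on 8-bit values."""
    return ((last_credit - tx_seq) & 0xFF) != 0


def accept_credit(last_credit, delivered):
    """Return the new last_credit, ignoring credits more than 20 ahead."""
    delta = (delivered - last_credit) & 0xFF
    return delivered if delta <= MAX_CREDIT_DELTA else last_credit
```

For example, a delivered credit of 0x05 with `last_credit = 0xFD` has a wrapped delta of 0x08 and is accepted, while 0x10 against 0xF0 has a delta of 0x20 and is ignored.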
6.4 IOCTL dispatch state machine
[send]
| build SDPCM+CDC+payload, increment ioctl_id
| wait for credits (§6.3)
| transmit via bus.writeBytes
▼
[pending_response]
| poll_device in 1 ms tick up to 500 ms
|
├─ RX CONTROL with matching ioc_id ─► copy response, [complete]
├─ RX CONTROL with mismatch ─────── ─► drop, keep waiting
├─ RX ASYNCEVENT ──────────────────► dispatch via events module
├─ RX DATA ────────────────────────► dispatch via eth RX hook
└─ timeout 500 ms ────────────────── ─► [timeout]
▼
[complete] or [timeout]
Reentrancy discipline. The event dispatch inside pending_response is allowed to update driver state (including pend_rejoin flags) but must not call doIoctl synchronously. Reentrant-ioctl is forbidden. Events that want to trigger an ioctl use the pend-flag mechanism; the poll loop processes pends after returning from the outer ioctl.
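The dispatch loop and the reentrancy rule can be exercised with a scripted receive queue. This is a Python sketch of the state machine above, not the driver: `IoctlModel`, its one-entry-per-tick `rx_script`, and the 16-bit width of `ioctl_id` are assumptions made for illustration.

```python
class IoctlModel:
    TIMEOUT_TICKS = 500  # 1 ms ticks, per the diagram above

    def __init__(self, rx_script):
        # One entry per tick: None (nothing received) or (kind, value).
        self.rx = iter(rx_script)
        self.ioctl_id = 0
        self.in_flight = False
        self.pend_rejoin = False
        self.dropped = 0

    def do_ioctl(self):
        if self.in_flight:
            raise RuntimeError("reentrant ioctl is forbidden")
        self.in_flight = True
        try:
            self.ioctl_id = (self.ioctl_id + 1) & 0xFFFF  # assumed 16-bit id
            for _ in range(self.TIMEOUT_TICKS):
                pkt = next(self.rx, None)
                if pkt is None:
                    continue
                kind, value = pkt
                if kind == "control":
                    if value == self.ioctl_id:
                        return "complete"
                    self.dropped += 1       # mismatched id: drop, keep waiting
                elif kind == "event":
                    self.on_event(value)
            return "timeout"
        finally:
            self.in_flight = False

    def on_event(self, name):
        # Allowed: update driver state. Not allowed: call do_ioctl().
        if name == "ICV_ERROR":
            self.pend_rejoin = True
```

An event handler that needs an ioctl sets a flag such as `pend_rejoin`; the caller services the flag after `do_ioctl()` has returned.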
Section 7 — Testing strategy
Three-layer test strategy: host, mock-transport, hardware.
7.1 Host-side tests (zig test)
Target host, not firmware. Exercises the decoder/encoder logic that doesn’t require hardware:
Endianness round-trips. For each header kind (SDPCM, CDC, BDC, event wrapper), encode a struct → bytes → decode → compare fields. Specifically:
SDPCM header with size=0x1234: bytes should be {0x34,0x12,0xcb,0xed,...} (size_com = ~size & 0xffff).
Event wrapper: event_type=69 encoded as BE {0x00,0x00,0x00,0x45}.
IE walker. Feed: (a) empty IE list, (b) single RSN IE (type=48), (c) WPA vendor-specific IE (type=221 with \x00\x50\xF2\x01), (d) WEP via capability bit, (e) malformed length field that overruns buffer. Verify auth_mode output matches reference bit-encoding.
Credit arithmetic. hasCredit(tx_seq, last_credit) for: (0,0)=false, (0,1)=true, (255,0)=true, (100,120)=true, (100,90)=false (requires the 20-delta accept rule). Named wrap tests (mandatory; separate from the basic table): credit_wrap_forward exercises tx_seq=0xFE, then four successive sends; credits must be accepted as {0xFF, 0x00, 0x01, 0x02} in sequence without a false-stall. credit_wrap_stale exercises a delivered credit of 0x05 when last_credit=0xFD; the credit_delta = 0x08 <= 20 rule accepts it. credit_wrap_stale_reject exercises a delivered credit of 0x10 when last_credit=0xF0 and tx_seq=0x00; the delta is 0x20 > 20 and must be rejected.
Join state machine. Drive a synthetic event stream and verify JoinState + three flags (auth_ok, join_ok, keyed) match the §6.1 event-to-transition table. Sequences to test:
Happy path: AUTH ok → JOIN ok → PSK_SUP KEYED → joined.
PSK_SUP reason=14 (roam noise) — must NOT trigger rejoin. Regression guard against the bug Embassy documented at runner.rs:1216.
WPA3 AUTH FAIL reason=16 auth_type=3 (with wpa3_mode != .off) — link-down, not badauth-retry. Under .auto + wpa3_wpa2_aes_psk AuthType, triggers WPA2 fallback attempt per §5.10.
Crypto drift class: single ICV_ERROR, MIC_ERROR, UNICAST_DECODE_ERROR each independently trigger pend_rejoin. 3 × MULTICAST_DECODE_ERROR within 5 s triggers pend_rejoin; 2 events within 5 s does NOT (boundary).
PSM_WATCHDOG single event triggers pend_rejoin + stats increment.
GTK_PLUMBED updates last_healthy_ms counter but does not change JoinState or flags.
BCNLOST_MSG while up → degraded; next successful RX → up.
Event decoder on captured payloads. 10–20 hex-encoded event payloads captured from real hardware, decoded and asserted. Tests belong in src/cyw43_new/tests/events.zig. Because these captures don't exist at Phase 1 start, §7.1.1 below specifies the procedure for producing them in Phase 2.
Once Phase 2 lands the event pipeline (§6.1 decoder), shadow-mode logging on real hardware produces the hex captures that §7.1 item 5 needs for regression testing.
Capture tool: add to src/cyw43_new/tests/hardware_bringup.zig (the Phase 1 -Dcyw43=new_shadow test exe) an event-dump mode:
// Compile with -DCAPTURE_EVENTS=1 to enable.
// Dumps each event as a hex line over UART:
//   EVT_HEX: aabbccdd... <event_name> <status=N reason=M flags=F>
fn onEventDump(ctx: *anyopaque, ev: *const Event, raw_wrapper_bytes: []const u8) void {
    _ = ctx;
    console.puts("EVT_HEX: ");
    for (raw_wrapper_bytes[0..@min(raw_wrapper_bytes.len, 48)]) |b| {
        console.putHex8(b);
    }
    console.puts(" ");
    console.puts(@tagName(std.meta.activeTag(ev.*)));
    console.puts("\n");
}
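Turning a saved UART log into test fixtures needs only a few lines on the host. The Python sketch below assumes the line shape the dump function above produces (`EVT_HEX: `, contiguous two-digit hex, a space, then the event tag); `parse_evt_hex` and `collect` are hypothetical helper names, and lines that fail to parse are skipped on the assumption that they are corrupted or unrelated UART output.

```python
def parse_evt_hex(line):
    """Return (raw_bytes, event_name) for one EVT_HEX line, else None."""
    prefix = "EVT_HEX: "
    if not line.startswith(prefix):
        return None
    parts = line[len(prefix):].split()
    if len(parts) < 2:
        return None
    try:
        raw = bytes.fromhex(parts[0])
    except ValueError:
        return None  # corrupted capture line: skip rather than guess
    return raw, parts[1]


def collect(log_text):
    """Extract every well-formed event capture from a UART log."""
    parsed = (parse_evt_hex(line) for line in log_text.splitlines())
    return [p for p in parsed if p is not None]
```

Each returned pair can be written out as a fixture for the event-decoder tests.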
Capture scenarios (run each, save picocom UART log):
| Scenario | Expected events observed |
|---|---|
| Scan-only | ESCAN_RESULT × N, CSA_COMPLETE_IND |
| WPA2 clean join | SET_SSID, AUTH, JOIN, LINK, PSK_SUP KEYED |
| WPA2 wrong password | SET_SSID, AUTH FAIL, DEAUTH_IND reason=2 |
| SSID not found | SET_SSID status=3 reason=0 |
| WPA3 join | AUTH (SAE), SET_SSID, JOIN, LINK, PSK_SUP KEYED |
| Router power-cycle | DEAUTH / DISASSOC, BCNLOST_MSG, then reconnect sequence |
| Forced hostapd DEAUTH | DEAUTH_IND reason (varies by AP) |
| Power-save ICV_ERROR | repeated ICV_ERROR (49), MIC_ERROR possibly, after induced rekey-miss |
This gives us a regression net over every real-world event shape we've observed. Captures accumulate across phases; by end of Phase 3 we have a corpus covering every event in §2.2's handler table.
6. SPI command word packing. packCmd(true, true, 2, 0, 64) should produce 0xC0000040; byte-swapped form should produce {0x00,0x40,0x00,0xc0} (half-word swap).
All of these live in their respective .zig module’s test "..." blocks. The host build already supports zig build test and has the 78+ nanoruby tests as baseline (§11 gates).
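The expected bytes in the endianness round-trip item can be cross-checked on the host with Python's `struct` module. This is a minimal sketch that covers only the two fields quoted above (the SDPCM size/complement pair and the big-endian event type); the helper names are illustrative and the rest of each header is out of scope.

```python
import struct


def sdpcm_size_fields(size):
    """First four SDPCM header bytes: size and its 16-bit complement, little-endian."""
    return struct.pack("<HH", size, ~size & 0xFFFF)


def event_type_be(event_type):
    """Event wrapper event_type as a big-endian 32-bit field."""
    return struct.pack(">I", event_type)


def decode_sdpcm_size(header):
    """Decode and validate the size/complement pair; raise on mismatch."""
    size, size_com = struct.unpack_from("<HH", header)
    if (size ^ size_com) != 0xFFFF:
        raise ValueError("SDPCM size complement mismatch")
    return size
```

The same values can then be pasted into the Zig `test` blocks as expected byte arrays.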
7.2 Mock-transport integration tests
A Zig mock implementation of the Spi interface in §5.1 that plays back canned transaction logs. This is the wire-format conformance gate parallel to byte-identity.
7.2.1 Log format
Plain-text, line-oriented, #-prefixed comments. Each line is one SPI transaction seen from the host side:
Labels appear in assertion-failure messages so diff output is navigable.
7.2.2 Golden transcripts needed
| Path | Captured from | Scope |
|---|---|---|
| tests/golden/boot_bus_init.log | old driver | SPI test reg through CLM load + WLC_UP |
| tests/golden/scan_escan.log | old driver | escan iovar + all-channels results |
| tests/golden/join_wpa2_psk.log | old driver | Full WPA2 join: passphrase → SET_SSID → PSK_SUP KEYED |
| tests/golden/join_wpa3_sae.log | new driver | WPA3 SAE join (Phase 3 deliverable; captured from new driver in shadow mode) |
| tests/golden/reconnect_icv_error.log | new driver | Induced power-save ICV_ERROR + auto-rejoin |
| tests/golden/reconnect_after_deauth.log | new driver | Hostapd-injected DEAUTH + auto-rejoin |
| tests/golden/pmksa_clear_on_boot.log | new driver | pmkid_info iovar at end of wifiOn |
| tests/golden/eth_tx_tcp_ack.log | old driver | Single TCP ACK via BDC |
Phase 1 produces the first three. Phase 2/3 produce the rest.
7.2.3 Logic-analyzer capture procedure
Hardware: Saleae Logic 2 (16-channel, 24 MHz minimum) is the reference tool. DSLogic, sigrok with fx2lafw, or any 4+ channel analyzer ≥ 50 MS/s also works.
Pin mapping on Pico W (CYW43 SPI + control lines):
| Signal | GP# | Analyzer channel |
|---|---|---|
| WL_CLK | GP29 (adapt if board differs) | Ch0 |
| WL_DIO (bidirectional) | GP24 | Ch1 |
| WL_CS | GP25 | Ch2 |
| WL_IRQ (= WL_SDIO_1) | GP24 when CS high | (shared with DIO) |
| WL_REG_ON | GP23 | Ch3 |
| GND | n/a | probe ground |
Sample rate: 24 MHz minimum. Pico W PIO-SPI runs at 33 MHz theoretical max but 24 MHz + digital-filter-in-Logic is sufficient.
Trigger: rising edge on WL_REG_ON channel. Captures the full bring-up from chip power-on.
Capture duration:
boot_bus_init: ~600 ms after trigger (covers firmware upload and CLM load).
scan_escan: ~3 s (active scan + 2.4G result delivery).
join_wpa2_psk: ~5 s (association + 4-way handshake).
reconnect_*: ~30 s with induced disruption in middle.
Export: File → Export data → CSV, "Time" + all channels, "Full precision time". Column order Time [s], WL_CLK, WL_DIO, WL_CS, WL_REG_ON.
7.2.4 tools/spi_trace_to_mock.py specification
Phase 1 deliverable. Python 3.9+, standard library only. CLI:
spi_trace_to_mock.py <input.csv> <output.log>
[--start-offset <seconds>] # skip initial noise
[--label <prefix>] # prefix every ## group label
Logic:
Parse CSV, build a time-sorted event list.
Detect gSPI command frames by CS-low + CLK sequences. The first 4 bytes on DIO after CS-low are the command word per §2.4.2.
Parse command word: {write, incr, fn, addr, sz}. Follow with sz bytes of data (TX if write, RX if read — direction inferable from cmd-word bit 31).
Emit a log line per transaction. Group with ## <label> at natural SPI inactivity gaps (>1 ms).
Special-case pre-mode-switch byte-swap: if the SPI_BUS_CONTROL write hasn't happened yet, apply the {b1,b0,b3,b2} swap so the log reflects logical values, not wire-byte-order.
The tool is intentionally one-way (capture → log). Reverse direction ("log → test-driver") is the mock SPI framework itself, §7.2.5.
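Two of the logic steps above are self-contained enough to sketch ahead of the full tool: grouping transactions at inactivity gaps and undoing the pre-mode-switch byte order. The Python below is an illustration only; the function names and the `(start_s, end_s, payload)` tuple shape are assumptions, and command-word parsing per §2.4.2 is deliberately left out.

```python
def unswap16(word):
    """Undo the pre-mode-switch {b1,b0,b3,b2} ordering for one 4-byte word.

    The swap exchanges the bytes inside each 16-bit half, so applying it
    twice returns the original bytes.
    """
    w = bytes(word)
    if len(w) != 4:
        raise ValueError("expected exactly 4 bytes")
    return bytes((w[1], w[0], w[3], w[2]))


def group_by_gap(transactions, gap_s=1e-3):
    """Split time-sorted (start_s, end_s, payload) tuples at gaps > gap_s."""
    groups = []
    prev_end = None
    for start_s, end_s, payload in transactions:
        if prev_end is None or start_s - prev_end > gap_s:
            groups.append([])          # a new "## <label>" group starts here
        groups[-1].append((start_s, end_s, payload))
        prev_end = end_s
    return groups
```

Each group returned by `group_by_gap` would get one `## <label>` header in the emitted log.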
7.2.5 Mock SPI framework sketch (Zig)
Concrete skeleton for the Phase 1 test harness. src/cyw43_new/tests/mock_spi.zig:
pub const MockSpi = struct {
    log_lines: []const LogLine,
    cursor: usize = 0,
    rx_buf_next: ?[]const u8 = null, // staged for the next transferRx

    pub const LogLine = struct {
        direction: enum { tx, rx },
        fn_id: u8,
        addr: u32,
        len: u32,
        bytes: []const u8,
        label: []const u8 = "",
        source_line: u32, // for assertion-failure messages
    };

    /// Implements the Spi vtable interface from §5.1.
    pub fn transferTx(ctx: *anyopaque, cmd_word: u32, payload: []const u8) WifiError!void {
        const self: *MockSpi = @ptrCast(@alignCast(ctx));
        if (self.cursor >= self.log_lines.len) return error.MockExhausted;
        const expected = self.log_lines[self.cursor];
        if (expected.direction != .tx) {
            std.debug.panic("mock: expected rx at line {d} ({s}), got tx",
                .{ expected.source_line, expected.label });
        }
        // Parse cmd_word; confirm fn/addr/len match expected.
        // Confirm payload bytes match expected.bytes; emit hex diff on mismatch.
        self.cursor += 1;
        // If next line is an rx paired with this tx (read command), stage its bytes.
        if (self.cursor < self.log_lines.len and self.log_lines[self.cursor].direction == .rx) {
            self.rx_buf_next = self.log_lines[self.cursor].bytes;
            self.cursor += 1;
        }
    }

    pub fn transferRx(ctx: *anyopaque, out: []u8) WifiError!void {
        const self: *MockSpi = @ptrCast(@alignCast(ctx));
        const staged = self.rx_buf_next orelse return error.MockUnexpectedRx;
        if (staged.len != out.len) return error.MockLenMismatch;
        @memcpy(out, staged);
        self.rx_buf_next = null;
    }

    pub fn asSpi(self: *MockSpi) Spi {
        return .{ .ctx = self, .vt = &mock_vtable };
    }
};
Failure-mode assertion messages include the golden-log source line, the expected hex, the actual hex, the first byte position where they differ, and the preceding group label:
Every golden trace and event-payload fixture carries a sidecar metadata file so future contributors can tell whether the fixture is still valid after a blob upgrade or hardware change.
Path convention: alongside the fixture, with .meta suffix.
Metadata schema (plain-text key=value, parseable by trivial tools):
# tests/golden/boot_bus_init.log.meta
capture_date = 2026-06-02
scenario = boot_bus_init
firmware_sha256 = <sha256 of src/cyw43/firmware/43439A0_combined.bin at capture time>
firmware_version = 7.95.61
nvram_sha256 = <sha256 of src/cyw43/firmware/43439A0_nvram.bin>
clm_blob_len = 984
board = pico_w_revA
board_rev = 1.3
host_cpu_freq_mhz = 125
logic_analyzer = Saleae Logic 2 Pro 16, sw v2.4.17, HW rev E
sample_rate_mhz = 24
capture_duration_ms = 620
trigger = rising_edge on WL_REG_ON
driver_source_sha = <git rev-parse HEAD of pico repo when capture taken>
driver_config = -Dcyw43=old -Dengine=js (or whatever was running)
logger_enabled = true
notes = First Phase 1 capture; baseline boot sequence.
Mandatory fields (test harness refuses to load a fixture without these): capture_date, firmware_sha256, nvram_sha256, driver_source_sha, scenario, driver_config, logger_enabled.
Validation on replay: before running a test that uses a fixture, the harness reads .meta and asserts firmware_sha256 / nvram_sha256 match current blob SHAs. If a blob was upgraded without regenerating fixtures, the test fails loudly with a clear message (not silently passes against stale data).
This closes the R17 feedback loop: blob upgrades CANNOT silently invalidate wire-format tests. Also makes the corpus readable for the next maintainer 2 years from now.
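The replay-time check described above is small enough to sketch in full. The Python below is an illustration under stated assumptions: `parse_meta` and `check_fixture` are hypothetical names, the blobs are passed in as bytes rather than read from the repository paths, and the real harness may well live in Zig instead.

```python
import hashlib

MANDATORY = (
    "capture_date", "firmware_sha256", "nvram_sha256", "driver_source_sha",
    "scenario", "driver_config", "logger_enabled",
)


def parse_meta(text):
    """Parse 'key = value' lines; '#' lines and blank lines are ignored."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split at the first '=' only
        if not sep:
            raise ValueError(f"malformed metadata line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


def check_fixture(meta_text, firmware_blob, nvram_blob):
    """Refuse a fixture with missing fields or stale blob hashes."""
    fields = parse_meta(meta_text)
    missing = [k for k in MANDATORY if k not in fields]
    if missing:
        raise ValueError(f"fixture metadata missing: {', '.join(missing)}")
    for key, blob in (("firmware_sha256", firmware_blob),
                      ("nvram_sha256", nvram_blob)):
        actual = hashlib.sha256(blob).hexdigest()
        if fields[key].lower() != actual:
            raise ValueError(f"{key} mismatch: regenerate fixtures for this blob")
    return fields
```

Splitting at the first `=` only keeps values such as `-Dcyw43=old -Dengine=js` intact.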
7.2.7 Why this is the primary wire-correctness gate
Byte-identity (§7.4) only catches regressions in the -Dcyw43=old build — it doesn't validate the new driver. The mock replay validates:
If a Zig translation has an off-by-one in any of these, replay fails loudly with hex diff. This is the gate that turns R1 (translation bug) from "medium-probability silent failure" into "low-probability loud failure" — dramatically cheaper to debug.
7.3 Hardware validation matrix
Must-pass scenarios before flipping default (§8.4). Run each on real Pico W with the NVRAM/CLM version committed to this repo. All must pass clean (no UART corruption beyond ISSUES.md #25 tolerance; no watchdog reset; no driver panic).
| # | Scenario | Steps | Pass criteria |
|---|---|---|---|
| H1 | Fresh join | Power-cycle Pico; join(ssid, key); DHCP | up within 15 s; MQTT broker ping works |
| H2 | Re-join after router reboot | H1 succeed; reboot router; wait for AP to come back | Driver auto-reconnects within 60 s of AP return |
| H3 | Re-join after explicit DEAUTH | H1 succeed; send DEAUTH from hostapd test script | Auto-reconnect within 10 s |
| H4 | Sustained traffic soak | H1 + MQTT pub every 10 s for 60 min | No UART burst beyond ISSUES.md #25 tolerance; MQTT uninterrupted |
| H5 | DHCP lease renewal | H1 + let lease expire and renew (~50% of lease time) | Lease renews without reconnect |
| H6 | Roam between APs on same SSID | Two APs same SSID, power one off mid-session | Driver roams or at worst reconnects |
| H7 | Bad password | join(ssid, "wrong") | Returns JoinBadAuth within 5 s; no retry spam |
| H8 | SSID not found | join("nonexistent", "x") | Returns JoinNoNetwork within 10 s |
| H9 | Scan during idle | scan(.{}) with running TCP session | Scan completes without disrupting session |
| H10 | Power-save cycle | Set PM2, idle 30 s, then TX | First TX after idle wakes chip cleanly |
| H11 | WPA3 (if enabled) | H1 but WPA3 SSID | up within 15 s |
| H12 | ISSUES.md #25 diagnostic | H4 + exhaustive event logging | Event type driving the burst is logged |
7.4 Regression gates (parallel to the matrix)
Every commit in the Phase 1–4 migration must pass:
-Dengine=js UF2 byte-identical to .preflight-baseline/pico-preintegration.uf2. Verify: sha256sum zig-out/firmware/pico.uf2 against the pinned hash.
-Dcyw43=old is the default during Phase 1–3 (so the "default" shipping build keeps hitting the old tree).
Build-selection isolation: the -Dcyw43=old -Dengine=js build traverses the same old-driver source set as the pre-rewrite baseline and produces identical UF2 bytes. New-driver code (src/cyw43_new/) must be unreachable from the old-driver build selection — not merely guarded behind an if, but not present at all in the root source graph of the old build.
Linker script, boot2, firmware blobs unchanged.
The build-selection isolation rule is the enforceable invariant. Any mechanism that achieves it (separate root files per build flag, module gating, or anything equivalent to how docs/NANORUBY.md §A2 separates -Dengine=js from -Dengine=ruby) is acceptable.
Byte-identity does not hold (and is not expected to hold) for:
-Dcyw43=new build (that’s the point).
Any engine other than js.
Builds with different -D flags (-DSSID, -DUSB_HOST, etc.) — those have their own baselines (see M2 entry in src/ruby/nanoruby/UPSTREAM.md).
Section 8 — Migration plan
Four phases. Each phase is landable as a small sequence of commits; inter-phase boundaries are hardware-verified checkpoints.
8.0 Cross-phase infrastructure and peer-review discipline
Before Phase 1 starts, the next session should understand the infrastructure decisions that apply across all phases.
8.0.1 build.zig integration — concrete pattern
The -Dcyw43=old|new|new_shadow flag is added with the same root-source-file selection pattern that docs/NANORUBY.md M2 used for the engine gate. Build-selection isolation (per §7.4) is achieved by making the root source file depend on the flag, not by conditional imports inside a shared root file — this is what guarantees byte-identity under -Dcyw43=old.
Concrete sketch to adapt (not literal final code — build.zig is 22 KB and has existing structure):
// build.zig additions
const Cyw43 = enum { old, new, new_shadow };
const cyw43_sel = b.option(
    Cyw43,
    "cyw43",
    "CYW43 driver selection: old (shipping), new (Phase 4 cutover), new_shadow (both compiled for dev)",
) orelse .old;

// Expose to source as a build_config import (mirror of nanoruby pattern)
const build_options = b.addOptions();
build_options.addOption(Cyw43, "cyw43", cyw43_sel);
fw_mod.addImport("build_config", build_options.createModule());

// Root source file selection: mirrors nanoruby -Dengine gate.
// This is what preserves byte-identity for -Dcyw43=old.
const fw_root = switch (cyw43_sel) {
    .old => b.path("src/main.zig"), // current, untouched
    .new => b.path("src/main_cyw43_new.zig"), // wires Driver to bindings/wifi.zig
    .new_shadow => b.path("src/main.zig"), // default root; new tree is reachable
    // only via a dedicated test exe, see below
};

// For .new_shadow, add a dedicated test exe that instantiates the new driver
// without swapping the production path:
if (cyw43_sel == .new_shadow) {
    const test_new = b.addExecutable(.{
        .name = "test-cyw43-new",
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/cyw43_new/tests/hardware_bringup.zig"),
            .target = fw_target,
            .optimize = fw_optimize,
        }),
    });
    b.installArtifact(test_new);
}
Consumer pattern (in source code):
const build_config = @import("build_config");
// compile-time branch; zero runtime cost
if (build_config.cyw43 == .new) {
    // use new driver
}
Key rules:
The .old arm's source graph is identical to pre-rewrite — no if (cyw43 == .old) branches inside main.zig itself. The flag only gates root-source-file at the build step. This is what makes byte-identity trivially preserved.
.new_shadow keeps .old as the production root and adds a dedicated test executable. Both drivers get compiled (for their respective trees), but the shipping UF2 still uses old.
.new switches the root source file to a version that wires the new driver to bindings/wifi.zig. Only flipped at Phase 4 cutover.
8.0.2 Peer-review checkpoints
The user-ai MCP's discuss tool with conversation_id: pico-cyw43-rewrite-plan-2026 is the designated peer review channel (see §1 Companion references). Invoke the peer at every checkpoint below, not just when stuck. A quick 2–3-sentence status summary is enough to trigger a sanity check; the peer has the full plan history in the conversation.
| Checkpoint | Requirement | What to review |
|---|---|---|
| Before Phase 1 starts | REQUIRED | Scope-interpretation sanity check. Does the next AI understand §8.1 as intended? |
| Before Phase 1 PR merge | Conditional (invoke if any deviation from §2.4.4 bring-up sequence or new-hardware-state observed) | Transport + boot working on hardware. |
| Before Phase 2 starts | Conditional (invoke if mock-framework design differs from §7.2.5 sketch) | Strategy for event-decoder tests with Phase-2 captures (§7.1 item 5). |
| Before Phase 2 PR merge | REQUIRED | ISSUES.md #25 findings; the unknown-event type identified (or falsification). |
| | Conditional (invoke if iovar returns non-zero on our blob; §5.9.4 fallback kicks in) | docs/CYW43-PMKSA-RESEARCH.md re-read; iovar response on our blob tested first. |
| After WPA3 wire-spec cross-ref (P3c start) | Conditional (invoke if deviating from Embassy PR #3323 pattern) | §5.10 wire-level spec cross-checked against Embassy. |
| Before Phase 4 cutover | REQUIRED | Full hardware matrix §7.3 complete. All go/no-go §11 checkboxes green. Last-look. |
| On any unexpected hardware behavior | Conditional (invoke BEFORE adding a workaround) | Post the symptom and proposed workaround to the peer. |
Required checkpoints (4): Phase-1 start, Phase-2 PR, Phase-3 start, Phase-4 cutover. Conditional (5): PR merges without issues, research re-reads aligning with prior findings, and hardware surprises. Required checkpoints are hard gates; conditional ones are escape hatches when risk is elevated.
Peer review is cheap; it has caught real bugs in this plan (the PSK_SUP reason=14 bug, the time-boxed-PMKSA ambiguity, the too-brittle byte-identity rule, the WPA3 bool-vs-enum conflation, the rejoin-storm livelock risk). Budget ~20 min per required checkpoint.
8.0.3 PIO SPI carryover
Our existing src/cyw43/transport/pio_spi.zig (406 lines) is hardware-proven — the chip bring-up sequence in §2.4.4 works on real Pico W through this transport. Phase 1 ports it with cleanups, not a rewrite:
Keep verbatim: PIO program assembly (the CYW43 gSPI timing is fixed; our PIO instructions are known-good).
Keep verbatim: Pin-mapping constants and GPIO setup.
Clean: any std.debug.print → Logger; anyerror returns → explicit WifiError; raw pointers → slices per §3.1.
Rename: the new home is src/cyw43_new/transport/pio_spi.zig with the same public surface (transferRx/transferTx/setPolarity/reset) but accessed through the Spi vtable interface (§5.1) rather than as module-global functions.
Preserve: the specific setPolarity(self, 0) call after mode-switch (cyw43_spi.c:84 equivalent). Do not optimise away — this is part of the documented bring-up.
This is intentionally ~80% line-for-line preservation, not a clean-room reimplementation. The PIO timing is firmware-and-hardware-defined; the existing code is the canonical Zig version of it. Rewriting from scratch would invite regressions in a well-tested component.
8.0.4 No optimization before parity
Rule: through end of Phase 2 (wire-byte parity established via golden-trace regression), no throughput or latency micro-optimizations are merged.
Allowed in P1/P2:
Carrying forward proven optimizations from the existing tree (e.g. the PIO SPI program — hardware-tuned, don't touch).
Compile-time constant folding that falls out naturally from comptime usage.
Dead-code elimination from unused feature flags.
NOT allowed in P1/P2:
SDPCM credit prediction / speculative TX.
Backplane-window batching beyond what §2.4.3 specifies.
Bus bypass shortcuts that skip ioctl-response wait.
Rewriting the PIO SPI program for "faster" timing.
Rationale: a bug introduced by a performance optimization is at least 10× harder to debug than one introduced by a protocol translation. P1/P2 is about establishing correctness; P3+ can consider perf if measurement demands it. In practice, the reference C driver's perf profile is adequate for Pico W workloads and our Zig port should match it; further optimization is premature.
P4 soak may reveal specific bottlenecks — at that point a performance PR is acceptable with: (a) benchmarks showing the delta, (b) golden trace regression still passing, (c) no change to semantic behavior.
8.1 Phase 1 — Parallel tree scaffolding + transport + boot
Scope:
Create src/cyw43_new/ tree.
Add -Dcyw43=old|new|new_shadow build option; old is default.
old = current src/cyw43 path; must remain byte-identical.
new_shadow = both drivers compiled; old still wired to bindings/wifi.zig; new driver exposed via zig build test-cyw43-new for off-line exercise.
new = new driver wired to bindings/wifi.zig (reserved for Phase 4).
Implement ll/boot.zig + ll/clm.zig + ll/power.zig through the CLM-load point.
Implement mock SPI transport (host-side, under src/cyw43_new/tests/).
Capture golden SPI traces from old driver: boot/bus_init.
Host-interrupt-pin behavioral verification (added per GPT-5.4 turn-6 review — this is exactly where "works in happy path" drivers get flaky later). Phase 1 validates on real hardware:
Polarity: on Pico W, WL_HOST_WAKE (via CYW43_PIN_WL_IRQ in reference; effectively a shared line with WL_DIO when CS is high) is active at what logical level? Reference says host_interrupt_pin_active = 0 for SPI mode (cyw43_ll.c:86). Verify via scoped capture of WL_IRQ during ioctl response: interrupt assertion should be clearly observable with the documented polarity.
Level vs edge: our Zig driver polls via cb_read_host_interrupt_pin() returning the current level (not edge-triggered). Verify that polling behavior tolerates missed transitions — i.e. if WL_IRQ goes active and back to idle between two has_work() checks, the subsequent poll must still discover the pending packet via SPI_INTERRUPT_REGISTER / SPI_STATUS_REGISTER read. This is what cyw43_ll.c:1007-1063 handles via the spi_int check when the pin itself isn't currently asserted. Our port must preserve this fallback.
Missed-interrupt tolerance: deliberately induce a scenario where WL_IRQ fires between polls (e.g. rapid ioctl sequence). Confirm the had_successful_packet + SPI_INTERRUPT_REGISTER fallback pattern recovers without packet loss. Reference at cyw43_ll.c:1014-1064.
Validation:
zig build -Dcyw43=old byte-identical to baseline.
zig build -Dcyw43=new_shadow compiles + links for firmware target.
zig build test includes new driver host tests; all pass.
Mock SPI replay of boot/bus_init matches golden byte-for-byte.
On hardware (via a dedicated test entry, not via bindings/wifi.zig): new driver can execute busInit end-to-end on Pico W, print OK, and halt. Takes ~400 ms.
Commit size: ~1,300 LOC added across ~10 files. 3–4 commits.
Expected time: 1–2 coding sessions (best case ~8 h uninterrupted; realistic 12–16 h including trace-capture harness + hardware firmware-upload verification).
Dependencies: none (just the repo’s current state).
Capture golden SPI traces: scan/escan_request, events/startup_event_stream.
Validation:
-Dengine=js still byte-identical (old driver untouched).
Mock-replay scan passes wire-for-wire.
New driver can: init, wifi_on, issue scan, log every event decoded.
Host unit tests pass for event decoder (§7.1 tests).
On hardware: new driver scan returns real AP list including correct auth_mode.
Event-mask effectiveness gate (P2-mandatory). The bsscfg:event_msgs write alone is not a guarantee of delivery. Verify by observing that each of the following known events is received by the decoder during a scripted sequence:
SET_SSID, AUTH, LINK, PSK_SUP (from a successful WPA2 join)
ESCAN_RESULT (from a scan)
DEAUTH_IND (induced via hostapd or iw disconnection on a test AP)
DISASSOC (from leave())
If any expected event fails to arrive, debug the mask construction (bsscfg:event_msgs payload shape — name buffer length, bsscfgidx endianness, exact mask byte count for this firmware vintage) before proceeding. Broadcom iovars fail silently — this gate catches the silent-failure class.
Rate-limit unknown-event logging per §6.1.4 authoritative spec. Ring buffer bounded at 16 distinct {type,status,reason} tuples; 5-second coalescing window; UART output capped at ~1 line per distinct tuple per window plus one coalescing summary. Unbounded logging under a flood would itself perturb UART timing and muddy the ISSUES.md #25 diagnostic (which is exactly the kind of diagnostic the mechanism is meant to resolve — self-awareness matters here).
ISSUES.md #25 diagnostic progress: run a 30-min soak with new driver in shadow mode, collect event log, identify the 180-s-cadence event (likely roam-related). Findings go into UPSTREAM.md M-entry.
Rollback: -Dcyw43=new_shadow remains experimental; all production builds unaffected.
Commit size: ~1,100 LOC added across ~7 files. 3–5 commits.
Expected time: 1–2 coding sessions (best case 16 h; event-pipeline bugs absorb time, realistic 20–30 h including mask-effectiveness debug).
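The rate-limit rule from the Phase 2 validation list (16 distinct tuples, 5-second coalescing window) can be sketched on the host as follows. This is a minimal illustration in Python, not the §6.1.4 spec itself: the class name, the oldest-first eviction policy, and the per-tuple summary line emitted when a window expires are all assumptions made for the example.

```python
from collections import OrderedDict

MAX_TUPLES = 16        # distinct (type, status, reason) tuples tracked at once
WINDOW_SECONDS = 5.0   # coalescing window per tuple


class UnknownEventLog:
    """Bounded, coalescing log for unknown events (illustrative sketch)."""

    def __init__(self, emit):
        self._emit = emit            # callable that receives one output line
        self._seen = OrderedDict()   # tuple -> [window_start, suppressed_count]

    def record(self, now, ev_type, status, reason):
        key = (ev_type, status, reason)
        entry = self._seen.get(key)
        label = f"type={ev_type} status={status} reason={reason}"
        if entry is None:
            if len(self._seen) >= MAX_TUPLES:
                # Assumed policy: evict the oldest tuple so memory stays bounded.
                self._seen.popitem(last=False)
            self._seen[key] = [now, 0]
            self._emit(f"unknown event {label}")
            return
        if now - entry[0] < WINDOW_SECONDS:
            entry[1] += 1            # inside the window: count, print nothing
            return
        if entry[1]:
            self._emit(f"coalesced {entry[1]} repeats of {label}")
        entry[0], entry[1] = now, 0  # start a new window
        self._emit(f"unknown event {label}")
```

Under a flood of one tuple, this prints one line when the tuple first appears and nothing more until the window expires, which is the property that keeps UART timing undisturbed during the ISSUES.md #25 soak.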
Implement the full join state machine (§6.1) and deferred-action processing.
P3 reliability deliverables (ordered by value/cost per docs/CYW43-PMKSA-RESEARCH.md):
P3a — ICV_ERROR handler (highest value, 3 lines). Event 49 (CYW43_EV_ICV_ERROR) sets pend_rejoin = true and schedules poll. Addresses pico-sdk #2153. This lands as part of the exhaustive event decoder commit — no separate step; it's the specific event-49 branch. Verify on hardware: reproduce the #2153 power-save-induced ICV_ERROR flood scenario; observe that the first event 49 triggers rejoin within ~5 s instead of silent flood.
P3b — PMKSA clear_on_boot (mandatory, ~20 LOC). At end of wifiOn() after WLC_UP, send the 356-byte pmkid_info iovar (see docs/CYW43-PMKSA-RESEARCH.md for exact payload). State transitions to wifi_up_pmksa_cleared; join() requires that state. Verify on hardware via H2 (router power-cycle) A/B. If the iovar is unsupported on our blob vintage (unexpected per research), execute §5.9.4 fallback.
P3c — PMKSA cache_in_boot (stretch, optional). Implement host-side cache + DEAUTH eviction per §5.9.3. If time-constrained, defer to Phase B; clear_on_boot remains the Phase 4 default either way.
Add compat façade in cyw43.zig (§5.10).
Capture golden SPI traces: join/wpa2_psk, reconnect/after_deauth, reconnect/after_icv_error, pmksa/clear_on_boot.
New driver under -Dcyw43=new_shadow passes hardware matrix H1, H2, H3, H7, H8.
ICV_ERROR validation (P3a). Reproduce pico-sdk #2153 scenario: set PM to PM_POWERSAVE, idle for >2 minutes to trigger a rekey event, observe whether event 49 fires. New driver must detect the event, queue pend_rejoin, and re-establish the link within ~10 s. Old driver under the same setup shows the event-49 flood.
PMKSA validation (P3b). H2 (router power-cycle) demonstrably succeeds with the new driver AND demonstrably failed-or-degraded with the old driver on the same physical setup (A/B). Plus the clear-effectiveness negative proof from §11.3 (on-wire / AP-log / timing evidence that the clear actually took effect).
60-min soak under MQTT workload: no watchdog, link stays up, auto-reconnect works.
Event-mask fully opened; all events decoded. Any remaining "unknown" events are logged with type+status+reason.
Rollback: revert to end-of-Phase-2 state. Hardware shipping keeps using old driver.
Commit size: ~700 LOC added across ~4 files. 3–4 commits.
Dependencies: Phase 2 complete and mock-soak-validated.
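For P3b, the legacy pmkid_info payload layout stated in docs/CYW43-PMKSA-RESEARCH.md (a __le32 count followed by 16 fixed 22-byte slots, 356 bytes total, flush = all zeros) can be sketched on the host like this. The function names are hypothetical; only the sizes and field order come from the research doc.

```python
import struct

MAX_PMKIDS = 16
BSSID_LEN = 6
PMKID_LEN = 16
ENTRY_LEN = BSSID_LEN + PMKID_LEN   # 22 bytes per slot, no padding


def build_pmkid_info(entries=()):
    """Pack the legacy payload: __le32 npmk followed by 16 fixed slots."""
    if len(entries) > MAX_PMKIDS:
        raise ValueError("at most 16 PMKSA entries fit in the legacy payload")
    out = bytearray(struct.pack("<I", len(entries)))
    for bssid, pmkid in entries:
        if len(bssid) != BSSID_LEN or len(pmkid) != PMKID_LEN:
            raise ValueError("bssid must be 6 bytes and pmkid 16 bytes")
        out += bssid + pmkid
    # Unused slots are zero-filled so the buffer is always the full size.
    out += bytes(ENTRY_LEN * (MAX_PMKIDS - len(entries)))
    return bytes(out)


def build_pmksa_flush():
    """Flush payload: npmk = 0 and every slot zeroed."""
    return build_pmkid_info(())
```

A host unit test for the Zig packer could compare its output against `build_pmksa_flush()` byte for byte before the iovar is ever sent on hardware.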
8.4 Phase 4 — Flip default, soak, retire old tree
Scope:
Change build default to -Dcyw43=new. bindings/wifi.zig now calls into the new driver via the compat façade.
Update .preflight-baseline/pico-preintegration.uf2 only when the flip happens (this is the one commit where byte-identity is intentionally broken; capture the new baseline).
After 2 weeks of daily use / soak validation without regressions: delete src/cyw43/ entirely.
At that point, the compat façade can start gradually thinning (optional Phase 5).
Validation:
Hardware matrix §7.3 passes against -Dcyw43=new as default.
2-week soak: no regression reports, no watchdog, no reconnect failure that the old driver handled.
Commit adding the new baseline is reviewed against:
Dependencies: Phase 3 passes all hardware matrix + 60-min soak.
8.5 Migration summary table
| Phase | Added LOC | Time (realistic) | Default build | Risk |
|---|---|---|---|---|
| 1 | ~1,300 | 1–2 sessions | old | Low (parallel tree, no cutover) |
| 2 | ~1,100 | 1–2 sessions | old | Low |
| 3 | ~700 | 2–3 sessions + 1–2 weeks hardware soak | old | Medium (new driver on hardware soak) |
| 4 | ~50 | 1 session + 2 weeks soak | new | High (cutover moment) |
| 5 (opt) | -2,200 (delete old) | 1 session | new | Low (cleanup) |
Hour estimates are best-case / uninterrupted. Real elapsed time stretches for bus-bringup debugging, event-mask-effectiveness debugging, and hardware iteration windows — these are the three known time-sinks in a driver rewrite.
Section 9 — Risk register
Ranked by probability × impact. Each has a mitigation and a detection strategy.
| ID | Description | P | I | Mitigation | Detection |
|---|---|---|---|---|---|
| R1 | C→Zig translation bug that compiles but fails at runtime (silent wire-format corruption) | M | H | Mock-replay golden SPI traces (§7.2) at each phase; host unit tests for encoders/decoders | Replay byte diff |
| R2 | Endianness mismatch in event wrapper (BE) vs SDPCM header (LE) | M | H | Explicit endianness matrix §2.4.2 embedded in code comments; std.mem.readInt(T, slice, .big\|.little) called with explicit endianness | Host tests on captured payloads |
| R3 | Alignment hard-fault on Cortex-M0+ from struct-cast over RX buffer | | | | |
| | 150 ms startup timing miss → intermittent boot hang | M | M | Preserve exactly; mark as "known-required timing", do not optimise away | Hangs in boot on ~10% of power-cycles |
| R6 | PMKSA iovar returns error on our blob vintage → clear_on_boot cannot be implemented as specified | Very Low | M | Pre-Phase-3 research completed (docs/CYW43-PMKSA-RESEARCH.md): iovar pmkid_info is confirmed for CYW43439 family (CY_CC_43439_CHIP_ID explicit in brcmfmac). Legacy API applies to our blob. Our actual firmware is 7.95.61 (2023-01-11, identical to Embassy's). Residual risk very low; validated via <30-min hardware test early in P3b. If iovar returns error: blob upgrade to soypat's 7.95.62 (triggers R17) OR §5.9.4 documented-alternate-primitive fallback. | H2 (router power-cycle) comparison. Firmware error response on the iovar call is observable. |
| R6b | PMKSA cache semantics wrong → stale PMKID used on reconnect, failure worse than no cache | L | M | cache_in_boot mode must evict on DEAUTH_IND, DISASSOC_IND, ICV_ERROR (§5.9.3); unit test covers eviction; cache_in_boot ships only after H3 (forced DEAUTH) demonstrates clean recovery | H3 fails on cache_in_boot but not on clear_on_boot |
| R7 | Firmware blob version mismatch with the reference's tested vintage | L | M | Pin firmware SHA to firmware/UPSTREAM.md; sanity check runs on every boot | |
| | | | | Architectural invariant §2.4.7 + §6.4; pend-flag mechanism only way to trigger ioctl from event | Stack corruption; intermittent crash |
| R11 | Backplane window cache drift (raw write to HIGH/MID/LOW somewhere) | L | H | Single-owner invariant §2.4.3; grep-enforceable in CI (no SDIO_BACKPLANE_ADDRESS outside bus/backplane.zig) | FW upload corruption; CLM load timeout |
| R12 | LICENSE.RP attribution gap for a translated file | L | M | Per-file header rule §10.3; UPSTREAM.md mapping table; CI lint that every file under src/cyw43_new/ll/ and ctrl/ has the header | Manual audit in Phase 4 |
| R13 | Hidden dependency on pico-sdk RTOS primitives (mutex/semaphore) | L | M | §2.4 audit confirms reference uses CYW43_THREAD_ENTER/EXIT only as opt-in mutex; our cooperative single-poll model is equivalent (poll serialises access) | Concurrency bug appears only under TCP retransmit + scan in parallel |
| R14 | ISSUES.md #25 root cause is not an unhandled event (hypothesis wrong) | M | L | Phase 2 event logging gives us signal either way; if hypothesis is wrong, we at least know what IS happening at 180 s | Compare before/after burst pattern with exhaustive event log |
| R15 | Byte-identity gate slips on an unintended import during new-tree development | L | H | CI check run on every commit; fail-fast | sha256sum mismatch at commit time |
| R16 | P2P/WFD action frame events appear unexpectedly even in STA-only | L | L | Event decoder's Event.unknown branch handles gracefully; no crash | |
| | | | | All on-wire format assumptions in this plan are validated only against the specific firmware/NVRAM/CLM revisions committed under src/cyw43/firmware/ (and mirrored to src/cyw43_new/firmware.zig unchanged). Changing blobs requires rerunning §7.2 golden-trace regression fixtures and §7.3 H1/H2/H3/H12 hardware scenarios. A blob swap is treated as a Phase-3-equivalent re-validation, not a drop-in replacement. | Regression: golden-trace byte mismatch on fresh capture |
Highest-residual-risk items (P+I both at least M): R1, R2, R3, R9, R14. Each of these has a specific gate in §7 that must pass before the next phase ships.
LICENSE.RP (reproduced in full at src/cyw43_new/LICENSE-REFERENCE.md) grants use on RP2040-family semiconductors. Key obligations for redistributions in source:
Retain the George Robotics / Raspberry Pi Ltd copyright notice.
Retain the list of conditions + disclaimer.
Redistributions in binary form: reproduce the notice in documentation/materials accompanying the binary.
Our rewrite is a derivative work informed by protocol behavior, control-flow, and reference identifiers. It does not incorporate verbatim copies of reference C code or translated bytecode. The Zig implementation itself is the pico project’s work and is licensed under this project’s license (pending license selection; currently unstated — flag for repo owners: choose a license before Phase 4 ship).
10.2 Top-level artifacts
At the root of src/cyw43_new/:
LICENSE-REFERENCE.md — full text of misc/pico-sdk/lib/cyw43-driver/LICENSE.RP, verbatim, with a preamble explaining that the file is reproduced for compliance under terms 2/3.
UPSTREAM.md — reference SHA (dd7568229f3bf7a37737b9e1ef250c26efe75b23), snapshot date, and the running log of intentional algorithmic deviations from the reference (M1, M2, ...). Same style as src/ruby/nanoruby/UPSTREAM.md. Phase 1 lands M1; Phases 2–4 add entries as they introduce local behaviors.
10.3 Per-file provenance header
Every Zig file under src/cyw43_new/ll/ and src/cyw43_new/ctrl/ (the modules whose algorithms are directly traceable to specific C files) begins with:
// This file is part of pico's pure-Zig CYW43 driver.
//
// Implementation informed by the protocol behavior and control-flow of
// cyw43-driver's <reference_file>.c. Any direct borrowings of identifiers,
// constants, or comments are attributed inline.
//
// Reference driver: https://github.com/georgerobotics/cyw43-driver
// Reference SHA: dd7568229f3bf7a37737b9e1ef250c26efe75b23
// Reference file/function lineage: see UPSTREAM.md § "Source mapping".
// (A single Zig file may draw from multiple reference files when the
// underlying logic is cross-cutting; the lineage table enumerates
// each borrowing.)
//
// Copyright (C) 2019-2022 George Robotics Pty Ltd (reference driver;
// licensed under LICENSE-REFERENCE.md terms)
// Copyright (C) 2026 pico project contributors (this Zig translation)
Files under src/cyw43_new/transport/, bus/, and utility modules (config.zig, errors.zig, firmware.zig, hal.zig, types.zig) carry a reduced header that cites LICENSE-REFERENCE.md without per-file lineage (they are ports of constants and port-abstractions, not translated algorithms).
10.4 Lineage mapping appendix in UPSTREAM.md
A table mapping every Zig function with algorithmic lineage → reference C function + line range. Example fragment:
The appendix is populated incrementally — each phase’s commits update it as functions land.
10.4.1 Additional consulted references (beyond the primary pico-sdk reference)
These are non-primary references consulted for cross-verification, interop details, and alternative-implementation comparison. They are not directly derived from; the Zig code is written from behavioral understanding + the cross-referenced specs. Citations appear in UPSTREAM.md M-entries where a specific claim was supported by one of these sources.
master (read from torvalds/linux via WebFetch on 2026-04-20) — paths drivers/net/wireless/broadcom/brcm80211/brcmfmac/{cfg80211.c, fwil_types.h, feature.c}
GPL-2.0 (read-only; behavioral evidence)
PMKSA iovar spec (pmkid_info), chip-compat confirmation for CYW43439 (CY_CC_43439_CHIP_ID), WLC-version feature gating. See docs/CYW43-PMKSA-RESEARCH.md.
Third-opinion reference: baremetal-heapless state-machine patterns (TinyGo), alternative PIO SPI (bus_pico_pio.go), WHD-translated protocol/event definitions under whd/. Grep-confirmed: no PMKSA support.
Infineon WHD (docs only)
Public API reference at infineon.github.io/wifi-host-driver/
Apache-2.0 (docs consulted, source not cloned)
Cross-reference for PMKSA (confirmed no public set_pmksa API exposed), whd_wifi_set_pmk signature.
Discipline when consulting these:
License boundaries per §10.3. brcmfmac is GPL-2.0: read-only, no copying. Embassy and soypat are permissive: behavioral evidence preferred over copying, but copying with attribution is legally OK.
Cite in UPSTREAM.md when a claim rests on a specific source. e.g. "M5: event-49 handler: lineage — cyw43-driver L428-432 (PR #130); cross-verified with Embassy events.rs and soypat whd/asyncevent.go."
Prefer the primary pico-sdk reference when it is definitive. The other sources are tie-breakers and gap-fillers.
Staleness. These are snapshots. Re-clone and diff if a future issue suggests one of them has fixed something we're hitting.
10.5 UPSTREAM.md M-log convention
The local modification log works exactly like src/ruby/nanoruby/UPSTREAM.md:
### M1 — Parallel-tree scaffolding (Phase 1)
Problem: n/a (initial port).
Edits: created `src/cyw43_new/` with the 10 Phase-1 files.
Upstream intent: n/a (fork by design).
Acceptance: -Dengine=js byte-identical; -Dcyw43=new_shadow compiles.
### M2 — Exhaustive event decoder (Phase 2)
Problem: reference driver's non-exhaustive event handling silently drops events,
blocking diagnosis of ISSUES.md #25.
Edits: `ll/events.zig` includes the full 89-name table and logs every unknown
event with type+status+reason+flags+ifidx+payload-hex-prefix (first 16 bytes).
Difference from reference: exhaustive vs. opportunistic.
Acceptance: event log from 30-min hardware soak identifies 180 s cadence event.
### M3 — PMKSA clear-on-boot (Phase 3, mandatory improvement over reference)
Problem: stale firmware-side PMKID state after AP power-cycle causes
repeated 4-way handshake failures (H2 scenario). Reference driver does
not implement PMKSA management.
Pre-work: `src/cyw43_new/notes/pmksa_research.md` cross-references
brcmfmac (Linux GPL, read-only reference) and Infineon WHD (Apache 2.0).
Identified iovar: plain `pmkid_info` (no bsscfg prefix) + 356-byte struct list payload.
See research doc for full wire-format citations.
Edits: `ctrl/pmksa.zig` (new, ~180 LOC), cache data structure with
LRU eviction, iovar pack/unpack, hook in `wifiOn()` for clear-on-boot
and in the event handler for DEAUTH-triggered eviction.
Acceptance: H2 passes on new driver with `.clear_on_boot`, demonstrably
fails on `-Dcyw43=old`. `.cache_in_boot` ships with DEAUTH eviction
verified via captured 802.11 fast-reauth exchange on same-BSSID
reconnect within a boot.
10.6 Handling future upstream fixes
When the reference driver publishes a fix (e.g. a CYW43 firmware vintage bump or a new iovar sequence for an edge case):
git -C misc/pico-sdk fetch && git log dd7568229f3bf7a37737b9e1ef250c26efe75b23..HEAD -- lib/cyw43-driver/ to see the delta.
Per-commit: examine the diff, decide whether it affects our port.
If yes: add an M-entry describing the upstream change and our port response.
Update the reference SHA in UPSTREAM.md only when a batch of upstream commits has been evaluated.
Section 11 — Go/no-go criteria for the next session
The coding AI that executes this plan must verify every item below before writing code. If any fails, iterate on the plan — don’t proceed.
11.1 Plan itself
This document exists at docs/CYW43-REWRITE.md and was peer-reviewed by GPT-5.4 via conversation pico-cyw43-rewrite-plan-2026. Peer critique log accessible via the user-ai MCP’s discuss tool with that conversation ID.
Every section referenced from §1 (overview) to §11 (this checklist) is populated.
Sections 3 (Zig-idiom style guide) and 10.3 (per-file attribution rules) have been re-read.
11.2 Audit depth
Every public function in cyw43.h has a row in §2.1's table.
Every state transition in cyw43_cb_process_async_event (cyw43_ctrl.c:333-439) is reflected in §2.3.1 + §6.1.
Every event type enumerated in cyw43_ll.h:67-82 has an entry in §2.2's event table.
Every major SDPCM / CDC / BDC code path in cyw43_ll.c is covered in §2.4.
The endianness matrix §2.4.2 has entries for every on-wire field the driver touches.
11.3 Reliability deliverables
Auto-reconnect on DEAUTH/DISASSOC is defined in §6.1 and surfaced as a Config.reconnect policy with explicit retry_badauth / retry_nonet gates defaulting OFF.
PMKSA is a mandatory Phase 3 deliverable per §5.9. clear_on_boot is the default and must land before Phase 4 cutover. cache_in_boot is a Phase 3 stretch; OK to defer the full cache implementation, not OK to defer clear_on_boot.
PMKSA pre-work artifact present: docs/CYW43-PMKSA-RESEARCH.md exists, cites specific brcmfmac source lines, specifies the iovar name (pmkid_info), wire payload (356 bytes = __le32 npmk + 16 × 22-byte entries), chip compatibility (CY_CC_43439_CHIP_ID in brcmfmac), and API version (legacy for our blob). Produced in the planning session alongside this document; Phase 3 P3b coding can proceed directly.
ICV_ERROR (event 49) handler present in the event dispatcher (§6.1). On receipt: set pend_rejoin = true; pend_rejoin_wpa = false; and schedule poll. Addresses pico-sdk #2153.
ICV_ERROR hardware verification: reproduce the pico-sdk #2153 power-save-induced ICV_ERROR scenario; new driver recovers within ~10 s; old driver shows silent flood.
PMKSA hardware verification: §7.3 H2 (router power-cycle) demonstrably succeeds with new driver + clear_on_boot AND demonstrably failed-or-degraded with the old driver on the same physical setup. Captured as UPSTREAM.md M-entry evidence.
Three-flag link-state model in §6.1 adopted; bitmask model abandoned per decision log (2026-04-20).
Crypto-drift class expanded beyond ICV_ERROR. §6.1 event-to-transition table includes pend_rejoin on MIC_ERROR (17), UNICAST_DECODE_ERROR (50), PSM_WATCHDOG (41). MULTICAST_DECODE_ERROR (51) gated behind the §6.1.1 threshold. All have host-side unit tests.
PSK_SUP reason=14 IGNORE rule is in §6.1 table AND in §7.1 test matrix item 4. Regression guard documented.
WPA3-SAE implemented per §5.10. wpa3_mode: Wpa3Mode = .auto default. AUTH FAIL reason=16 auth_type=3 triggers link-down (not badauth-retry). Fallback behavior per §5.10 table: wpa3_wpa2_aes_psk falls back to WPA2; wpa3_sae_aes_psk does NOT. sae_password iovar used for WPA3 join; WLC_SET_AUTH=3 for WLC auth cmd.
Beacon-loss → degraded transition (§6.2) driven by BCNLOST_MSG (31) event. GTK_PLUMBED or successful RX transitions degraded → up.
Firmware blob version recorded. firmware.zig exposes FW_VERSION = "7.95.61", FW_DATE = "2023-01-11", FW_BUILD = "abcd531 CY", FW_SHA256 = "..." as compile-time pub consts. The on-device banner line at boot should mention the firmware version so support issues have concrete data.
PMKID_CACHE event wired to cache_in_boot sync (§5.9.5) if Phase 3 P3c is pursued; otherwise explicitly deferred to Phase B with UPSTREAM.md note.
PMKSA clear-effectiveness negative proof. "We called the iovar" is insufficient. Capture observable evidence that the clear actually took effect. At least one of:
On-wire evidence: a 4-way-handshake exchange captured by an 802.11 monitor (Wireshark with monitor-mode NIC, or hostapd logs) after a reconnect that would otherwise have reused a PMKID. Absence of the PMKID TLV in the Association Request confirms cache was cleared.
AP-side evidence: hostapd log lines showing "PMKSA cache entry not found" for our BSSID on the reconnect.
Timing evidence (weakest): reconnect latency measurably longer than a cached-reauth would produce, consistent with full 4-way handshake, across at least 10 trials.
At least the timing evidence path must be captured in every case, because it requires no special tooling; the on-wire or AP-side evidence is strongly preferred if a monitor-mode capture rig or AP log access is available. Paste the evidence into UPSTREAM.md M3.
The ISSUES.md #25 diagnostic path is explicitly called out in §2.8 (G13) and §8.2 (validation + rate-limited logging).
Unknown-event logging mechanism implemented per §6.1.4 (authoritative spec). Event.unknown struct has all 10 fields enumerated. EventLog ring buffer bounded at 16 distinct tuples with 5-second coalescing window. 89-entry event_names table populated. Driver.getEventLog() + clearEventLog() API exposed. wifi events UART shell command implemented at src/bindings/wifi.zig (Phase 3). Invariants (no allocation, no blocking, no driver-callback reentry) verified by reviewer.
Event-mask effectiveness gate (§8.2) is listed as a P2 hard requirement, not a wish.
P2 sign-off: at least one induced DEAUTH_IND and one induced DISASSOC have been captured on hardware and decoded with expected status/reason fields matching the reference tables (§2.2).
P3 sign-off: reconnect policy verified on hardware for each of the four trigger classes separately: transient loss (DEAUTH/DISASSOC), BADAUTH, NONET, and manual leave+rejoin. Each transition observed to match §6.1.
Wire-format gate: the first golden-trace diff review (§7.2) includes byte-level verification of (a) packCmd output for three representative read/write/backplane commands, (b) SDPCM header bytes for one CONTROL and one DATA frame, (c) CDC header bytes for one ioctl, (d) bsscfg:event_msgs payload bytes.
State-safety gate: no callback code path reachable from HostHooks (onEvent, onEthernetRx, onLinkUp, onLinkDown) can synchronously invoke doIoctl or sendEthernet while the driver is in an active RX/IOCTL dispatch. Audited via grep + code review.
Blob-coupling record: every golden trace fixture captured in §7.2 has a metadata header recording sha256(firmware/43439A0_combined.bin) and sha256(firmware/43439A0_nvram.bin) at capture time. Re-use of a trace requires current blob hashes to match.
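The blob-coupling record closing the list above amounts to a hash comparison before a fixture is replayed. A minimal sketch, assuming the fixture metadata is available as a dict; the key names `firmware_sha256` and `nvram_sha256` are hypothetical, not an agreed fixture format.

```python
import hashlib


def blob_hashes(firmware_bytes, nvram_bytes):
    """Hashes to store in a golden-trace metadata header at capture time."""
    return {
        "firmware_sha256": hashlib.sha256(firmware_bytes).hexdigest(),
        "nvram_sha256": hashlib.sha256(nvram_bytes).hexdigest(),
    }


def trace_is_reusable(trace_header, firmware_bytes, nvram_bytes):
    """True only if both recorded hashes match the blobs in the tree now."""
    current = blob_hashes(firmware_bytes, nvram_bytes)
    return all(trace_header.get(key) == value for key, value in current.items())
```

A fixture with a missing or stale hash is rejected rather than replayed, which turns a silent blob swap into a visible re-capture requirement.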
11.4 Licensing
LICENSE-REFERENCE.md strategy defined (§10.2).
Per-file provenance header defined (§10.3).
Lineage mapping table scaffold present in §10.4.
Repo license selection flagged for repo-owner decision (§10.1). This is a true blocker for Phase 4 ship but not for Phase 1 start.
11.5 Risk register has mitigations for all H×H entries
Every idiom in §3 has been cross-referenced against ZIG-0.16.0-REFERENCE.md. In particular: std.mem.readInt with explicit endianness, callconv(std.builtin.CallingConvention.c), enum(u16) with trailing _, error unions, *anyopaque + vtable pattern.
The freestanding-stdlib discipline is preserved: no std.debug.print, no std.Io.File, no std.Io.Threaded.
11.7 Migration plan executable
Phase 1 scope (§8.1) fits in ~1 coding session without requiring hardware cutover.
Each phase has explicit validation criteria distinct from the next.
Rollback plan for each phase does not require data migration.
11.8 The hard stops
Do not write code if any of these is true:
The byte-identity baseline .preflight-baseline/pico-preintegration.uf2 does not exist or the SHA is not recorded in this repo.
The -Dcyw43=old|new|new_shadow build option has not been added (that’s Phase 1 step 1 — OK if not yet, but Phase 2+ hinges on it).
The project license is unresolved AND we are at or past Phase 4.
The firmware/NVRAM/CLM blobs under src/cyw43/firmware/ have changed since the last src/cyw43/device.zig::FW_LEN was measured, without §7.2 golden traces being re-captured. Blob changes are a Phase-3-equivalent re-validation event (R17).
12.3 Event mask (bsscfg:event_msgs) default construction
Reference starts with all 19 bytes = 0xff, then clears 6 specific bits. Event N lives at mask byte[N/8], bit N%8. The C reference (cyw43_ll.c:1895-1902) uses:
Blob-coupling note (R17 applies here). This 19-byte mask and the event-number-to-bit mapping are validated against the committed 7.95.61 firmware blob family (Cypress/Infineon CYW43439 family). A blob upgrade (e.g. to 7.95.62 or a post-WLC-12.0 vintage) may: (a) add new event numbers past 91, (b) re-assign existing event numbers (unlikely but possible), (c) change how mask bits are interpreted. Changing blobs requires re-running §7.2 golden-trace regression and verifying the mask still disables the intended events (ROAM, TXFAIL, RADIO, PROBREQ_MSG, IF, PROBRESP_MSG). Do not cargo-cult this hex into a blob upgrade without re-verification.
Sanity check against the events we specifically handle (§2.2):
All events in our §2.2 handler table have their mask bit ENABLED after the 6 standard clears:
MIC_ERROR (17) → byte[2] bit 1 = 1 ✓
PMKID_CACHE (21) → byte[2] bit 5 = 1 ✓
BCNLOST_MSG (31) → byte[3] bit 7 = 1 ✓
PSM_WATCHDOG (41) → byte[5] bit 1 = 1 ✓
ICV_ERROR (49) → byte[6] bit 1 = 1 ✓
UNICAST_DECODE_ERROR (50) → byte[6] bit 2 = 1 ✓
MULTICAST_DECODE_ERROR (51) → byte[6] bit 3 = 1 ✓
GTK_PLUMBED (84) → byte[10] bit 4 = 1 ✓
No mask changes needed to enable our new handlers; the broad default mask already delivers them. For future events added to the handler table: check this table, and if a required bit is in the cleared set, remove the corresponding CLR_EV equivalent in our mask-construction code.
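The byte/bit arithmetic in the sanity check above can be reproduced with a short host-side script. The handled-event numbers are the ones listed in this section. The numeric values for the six cleared events (19, 20, 40, 44, 54, 71) are an assumption taken from the usual Broadcom event numbering and should be confirmed against the reference source before being relied on.

```python
MASK_LEN = 19  # bytes in the event mask

# Assumed event numbers for the six events the reference disables.
CLEARED = {"ROAM": 19, "TXFAIL": 20, "RADIO": 40,
           "PROBREQ_MSG": 44, "IF": 54, "PROBRESP_MSG": 71}

# Events in our handler table, numbered as in the checklist above.
HANDLED = {"MIC_ERROR": 17, "PMKID_CACHE": 21, "BCNLOST_MSG": 31,
           "PSM_WATCHDOG": 41, "ICV_ERROR": 49, "UNICAST_DECODE_ERROR": 50,
           "MULTICAST_DECODE_ERROR": 51, "GTK_PLUMBED": 84}


def build_event_mask(cleared=CLEARED):
    """Start with every bit set, then clear byte[N // 8] bit N % 8 per event."""
    mask = bytearray([0xFF] * MASK_LEN)
    for number in cleared.values():
        mask[number // 8] &= ~(1 << (number % 8)) & 0xFF
    return bytes(mask)


def is_enabled(mask, number):
    return bool(mask[number // 8] & (1 << (number % 8)))


def disabled_handlers(mask, handled=HANDLED):
    """Names of handled events whose mask bit is off (should be empty)."""
    return [name for name, n in handled.items() if not is_enabled(mask, n)]
```

Running `disabled_handlers(build_event_mask())` after adding a new entry to `HANDLED` is a quick way to apply the "check this table" rule from the paragraph above.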
Full iovar wire format (bsscfg:event_msgs with bsscfgidx prefix):
mfp — Management Frame Protection (required for WPA3; §5.10).
sae — WPA3-Personal (SAE) supported (§5.10).
btsdio — Bluetooth-over-SDIO. Not used in this rewrite (BT out of scope per §1).
wowlpf / tko / keepalive / pktfilter — power-save and offload features available for future phases.
Alternative vintages audited but not adopted:
| Vintage | Delta | Reason not adopted |
|---|---|---|
| pico-sdk 7.95.49.00 | Older (~2021) | Behind our current blob. |
| pico-sdk 7.95.59 (1YN Murata variant) | Older module variant | Murata 1YN hardware, not Pico W. |
| soypat 7.95.62 (Apr 2023) | Newer by 3 months, no btsdio tag | Marginal improvement; cost/benefit of R17 revalidation not justified by known delta. Consider if Phase 3 hardware surfaces a 7.95.61-specific bug. |
CYW43_WIFI_FW_LEN is currently hard-coded in src/cyw43/device.zig as 231077; the new firmware.zig must expose the same length as a pub const and record the version string + SHA256 of the binary as compile-time constants so the on-device banner can report what vintage is running.
12.4.1 NVRAM variant verification checklist
Our NVRAM file is src/cyw43/firmware/43439A0_nvram.bin (742 bytes). The CYW43439 ships with module-specific NVRAMs — the wrong NVRAM results in degraded RF performance, wrong MAC OUI, or failed regulatory compliance. Phase 1 must confirm we have the correct Pico-W variant.
Verification (Phase 1, before first boot attempt):
# 1. Verify byte-for-byte or SHA256 match:
shasum -a 256 src/cyw43/firmware/43439A0_nvram.bin \
misc/embassy/cyw43-firmware/nvram_rp2040.bin
# If SHA256 matches: confirmed Pico-W variant. Done.

# 2. If SHA256 differs but files are both 742 bytes: likely the same variant with
# different text-encoding (embassy's is typically the original Cypress text NVRAM
# converted to binary form; check if ours is too).
# Use `strings src/cyw43/firmware/43439A0_nvram.bin | head -10` to see:
# the expected content starts with something like:
# "NVRAMRev=$Rev$"
# "manfid=0x2d0"
# "prodid=0x0727"
# "vendid=0x14e4"
# "devid=0x43e2" # CYW43439 device id
# "boardtype=0x0887" # Murata 1YN or similar

# 3. If text-encoding differs, spot-check key fields match rp2040.bin equivalent:
# - boardtype
# - macaddr (if baked)
# - regulatory (country/region code)
# - crystal freq (xtalfreq=37400 for Pico W)
Acceptance: either byte-identical to Embassy's nvram_rp2040.bin, or identical boardtype + xtalfreq + regulatory fields with documented differences in a Phase 1 UPSTREAM.md M-entry.
On a mismatch: do not boot Pico W with a mismatched NVRAM; RF tuning is module-specific and a wrong NVRAM can produce FCC-non-compliant emissions or cause PA overdrive. Use Embassy's nvram_rp2040.bin as the authoritative Pico W NVRAM.
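The acceptance rule (byte-identical, or identical key fields) can be automated with a small host-side comparison. A minimal sketch, assuming the NVRAM image is a sequence of key=value strings separated by NUL bytes or newlines; the key name `ccode` for the regulatory field is an assumption and should be replaced with whatever key the actual file uses.

```python
import re

# Keys assumed for illustration; "ccode" stands in for the regulatory field.
KEY_FIELDS = ("boardtype", "xtalfreq", "ccode")


def parse_nvram(blob):
    """Split an NVRAM image into key=value pairs (NUL- or newline-separated)."""
    fields = {}
    for chunk in re.split(rb"[\x00\r\n]+", blob):
        text = chunk.decode("ascii", errors="ignore").strip()
        if text.startswith("#") or "=" not in text:
            continue
        key, value = text.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


def nvram_acceptable(ours, reference, keys=KEY_FIELDS):
    """Return (ok, diffs): ok if byte-identical or all key fields agree."""
    if ours == reference:
        return True, {}
    a, b = parse_nvram(ours), parse_nvram(reference)
    diffs = {k: (a.get(k), b.get(k))
             for k in keys
             if a.get(k) is None or a.get(k) != b.get(k)}
    return not diffs, diffs
```

Any non-empty `diffs` result is the material for the Phase 1 UPSTREAM.md M-entry; a key missing from our file is reported as a difference rather than silently accepted.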
#15: SPI backplane block writes MUST use 64-byte chunks. ll/boot.zig upload loop preserves this.
#16: Bulk firmware payload words must be LE-packed. bus/bus.zig writeBytes preserves this.
#17: Backplane window registers WRITE-ONLY from SPI. bus/backplane.zig owns the cache (§2.4.3).
#18: SDPCM credit check before every IOCTL send. ll/ioctl.zig enforces via frame.hasCredit().
#19: bsscfg: iovars encode extra u32 interface index. ll/ioctl.zig::setBsscfgIovarU32 encapsulates.
#20: pollDevice must drain ALL pending packets. ctrl/poll.zig loop-until-none.
#21: BDC TX header must use version 2 (0x20). ll/frame.zig::packBdc hard-codes this with a comment citing the gotcha.
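Gotcha #15 is easy to cover with a host unit test. The sketch below only shows the 64-byte split; it does not model backplane windowing or any padding of the final chunk, and the function name is hypothetical.

```python
CHUNK_BYTES = 64  # maximum SPI backplane block write size (gotcha #15)


def backplane_chunks(payload, base_addr=0):
    """Yield (address, chunk) pairs, each chunk at most 64 bytes long."""
    for offset in range(0, len(payload), CHUNK_BYTES):
        yield base_addr + offset, payload[offset:offset + CHUNK_BYTES]
```

For the current firmware length of 231077 bytes this produces 3611 writes: 3610 full chunks and a final 37-byte chunk.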
End of plan.
Plan produced 2026-04-20 for the pico project. Peer-reviewed by GPT-5.4 in conversation pico-cyw43-rewrite-plan-2026. Execution target: subsequent coding sessions. See §11 for pre-execution checklist.
Hard-won findings from bring-up on Pico W hardware, April 2026. Cross-referenced against four implementations: Pico SDK, Embassy (Rust), PicoWi (C bit-bang), and our Zig driver.
The Pico W SPI Interface
The CYW43439 on the Pico W uses a nonstandard half-duplex SPI on a single shared data line:
| Signal | RP2040 GPIO | Function |
|---|---|---|
| WL_REG_ON | GPIO23 | Power enable (active high) |
| WL_D | GPIO24 | Shared: MOSI + MISO + IRQ |
| WL_CS | GPIO25 | Chip select (active low) |
| WL_CLK | GPIO29 | SPI clock |
GPIO24 is shared via resistor network:
SDIO_CMD (SPI MOSI) connected directly
SDIO_DATA0 (SPI MISO) connected via 470 ohm protection resistor
SDIO_DATA1 (IRQ) connected via 10K resistor
SDIO_DATA2 (mode select) determines SPI vs SDIO at power-up
Power-Up Sequence (Critical)
The DATA pin state at power-up selects SPI vs SDIO mode:
WL_DATA must be OUTPUT LOW before WL_ON goes high — this selects SPI mode
WL_ON LOW for >= 20ms (power down)
WL_ON HIGH (power up, DATA=LOW selects SPI)
Wait 250ms (SDK uses 250ms, not 50ms)
Switch DATA to input for SPI operation
If DATA floats high during power-up, the chip enters SDIO mode and will not respond to gSPI commands.
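The ordering constraint is the part that is easy to get wrong, so it is worth pinning down in a host test. A minimal sketch using a recording stand-in for the GPIO layer; `RecordingPins` and its method names are hypothetical, and only the step order and delays come from the sequence above.

```python
class RecordingPins:
    """Hypothetical pin interface that records calls instead of touching GPIO."""

    def __init__(self):
        self.log = []

    def data_output_low(self):
        self.log.append("DATA=output-low")

    def data_input(self):
        self.log.append("DATA=input")

    def wl_on(self, high):
        self.log.append("WL_ON=high" if high else "WL_ON=low")

    def sleep_ms(self, ms):
        self.log.append(f"sleep {ms} ms")


def power_up_spi_mode(pins):
    """Power-cycle the chip with DATA held low so it selects SPI mode."""
    pins.data_output_low()   # must happen before WL_ON goes high
    pins.wl_on(False)        # power down
    pins.sleep_ms(20)        # hold low for at least 20 ms
    pins.wl_on(True)         # power up; DATA=LOW selects SPI
    pins.sleep_ms(250)       # SDK uses 250 ms here
    pins.data_input()        # hand the shared line over to SPI operation
```

A test can then assert that "DATA=output-low" appears in the log before "WL_ON=high", which is the invariant that prevents the chip from entering SDIO mode.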
The CYW43439 has two internal clock states that gate what the host can do:
ALP (Active Low Power) — a slow clock sufficient for SPI bus access, register reads/writes, and backplane windowing. ALP is available shortly after power-up. With ALP, the host can read chip ID, program the backplane window, and upload firmware to RAM. But the WLAN ARM core cannot execute firmware at full speed on ALP alone.
HT (High Throughput) — the full-speed clock required for firmware execution, packet processing, and radio operation. HT becomes available only after the firmware has been uploaded, the WLAN core is released from reset, and the firmware successfully boots. The firmware itself switches the chip from ALP to HT and sets the HT_AVAIL bit in the chip clock CSR (0x1000E).
The host bring-up sequence interacts with these clocks as follows:
After power-up, request ALP by writing ALP_AVAIL_REQ (0x08) to the clock CSR
Poll until ALP_AVAIL (0x40) appears — the chip is now awake enough for bus access
Upload firmware and NVRAM, write the NVRAM token
Release the WLAN core from reset
Poll the clock CSR for HT_AVAIL (0x80) — this means the firmware has booted
Once HT is available, the firmware is running and ready for IOCTL commands
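The clock handshake above can be modeled as a tiny state machine that advances on each fresh read of the clock CSR. This is a sketch: the `bringup_*` names and the `firmware_loaded` flag are hypothetical, and only the CSR address and bit values come from the notes.

```c
#include <stdint.h>

#define CLK_CSR_ADDR      0x1000Eu  /* chip clock CSR, as given in the notes */
#define CLK_ALP_AVAIL_REQ 0x08u     /* host writes this to request ALP */
#define CLK_ALP_AVAIL     0x40u     /* chip reports ALP is running */
#define CLK_HT_AVAIL      0x80u     /* firmware has switched to HT */

/* Hypothetical bring-up states, one per waiting point in the sequence. */
typedef enum {
    BRINGUP_WAIT_ALP,       /* ALP requested, polling for ALP_AVAIL */
    BRINGUP_LOAD_FIRMWARE,  /* bus usable: upload firmware + NVRAM, release core */
    BRINGUP_WAIT_HT,        /* core released, polling for HT_AVAIL */
    BRINGUP_HT_READY        /* firmware booted; IOCTLs may be sent */
} bringup_state_t;

/* Advance the state machine given the latest CSR value.
 * firmware_loaded is nonzero once upload and core release are done. */
bringup_state_t bringup_step(bringup_state_t s, uint8_t csr, int firmware_loaded)
{
    switch (s) {
    case BRINGUP_WAIT_ALP:
        return (csr & CLK_ALP_AVAIL) ? BRINGUP_LOAD_FIRMWARE : s;
    case BRINGUP_LOAD_FIRMWARE:
        return firmware_loaded ? BRINGUP_WAIT_HT : s;
    case BRINGUP_WAIT_HT:
        return (csr & CLK_HT_AVAIL) ? BRINGUP_HT_READY : s;
    default:
        return BRINGUP_HT_READY;
    }
}
```

A real driver would wrap each waiting state in a bounded poll loop with a timeout. The step function keeps only the decision logic, which makes it easy to test without hardware.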
"HT-ready" in this project's documentation means: the firmware has booted and is running at full speed, ready to accept control commands.
gSPI Command Word Format
32-bit command, packed as a C bitfield on little-endian ARM:
gSPI sends command and data bytes in LITTLE-ENDIAN order (LSByte first), with MSbit-first within each byte.
For command word 0x4000A004 (read, incr, func0, addr=0x14, len=4):
Memory on LE ARM: [04, A0, 00, 40]
Wire order: 04 first, then A0, then 00, then 40
Each byte sent bit 7 first
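A minimal sketch of the command word and its wire order, written with shifts rather than a bitfield so it does not depend on compiler layout. The field positions used here (bit 31 write, bit 30 increment, bits 29:28 function, bits 27:11 address, bits 10:0 length) are the commonly documented gSPI layout and reproduce the worked example above. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a gSPI command word: [31] write, [30] incr, [29:28] function,
 * [27:11] address, [10:0] length in bytes. */
uint32_t gspi_make_cmd(bool write, bool incr, uint32_t fn,
                       uint32_t addr, uint32_t len)
{
    return ((uint32_t)write << 31) | ((uint32_t)incr << 30) |
           ((fn & 0x3u) << 28) | ((addr & 0x1FFFFu) << 11) | (len & 0x7FFu);
}

/* Split a command word into the byte order seen on the wire (LSByte first).
 * Each byte is then clocked out MSbit-first by the bus layer. */
void gspi_wire_bytes(uint32_t cmd, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(cmd >> (8 * i));
}
```

For the read of the test register, `gspi_make_cmd(false, true, 0, 0x14, 4)` yields 0x4000A004, and `gspi_wire_bytes` produces 04, A0, 00, 40.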
This was confirmed by cross-referencing three independent implementations:
PicoWi (C bit-bang, definitive proof)
spi_write((uint8_t*)&msg, 32); // sends raw struct bytes, byte 0 first
On LE ARM, byte 0 of a u32 is the LSByte. PicoWi's spi_write starts from byte 0, sending MSbit-first within each byte.
Pico SDK (PIO + DMA)
buf[0] = SWAP32(make_cmd(false, true, fn, reg, 4));
// DMA with BSWAP=true transfers to PIO TX FIFO
SWAP32 is ARM rev16 (swap bytes within each halfword)
DMA BSWAP is a full byte reverse for 32-bit transfers (0xAABBCCDD -> 0xDDCCBBAA)
Combined: rev16 then bswap32 produce the correct byte order in the PIO FIFO
PIO shifts MSBit-first, producing LSByte-first on wire
Critical correction: DMA BSWAP for 32-bit words is a full byte reverse, NOT rev16. This was the source of initial confusion. The RP2040 datasheet says "the two bytes of the two halfwords are each reversed" which is misleading — for word transfers it's a complete reversal.
Embassy (Rust PIO)
Uses DMA with byte-swap and shift_out.direction = ShiftDirection::Left (MSB-first).
The net effect matches: LSByte-first on wire.
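The two byte transforms discussed above are easy to confuse, so a host-side sketch helps when checking values by hand. The names `swap16x2` and `byte_reverse32` are illustrative: the first mirrors what the notes call SWAP32 / rev16, the second is the full reverse attributed to DMA BSWAP on 32-bit transfers.

```c
#include <stdint.h>

/* rev16: swap the two bytes inside each 16-bit half.
 * 0xAABBCCDD -> 0xBBAADDCC */
uint32_t swap16x2(uint32_t v)
{
    return ((v & 0xFF00FF00u) >> 8) | ((v & 0x00FF00FFu) << 8);
}

/* Full byte reverse of a 32-bit word.
 * 0xAABBCCDD -> 0xDDCCBBAA */
uint32_t byte_reverse32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Applying both in sequence swaps the two halfwords (0xAABBCCDD becomes 0xCCDDAABB), which is a quick way to tell the combined path apart from either transform alone.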
Implication for our Zig PIO driver
Since our PIO shifts bit 31 first (MSB-first), we must swapEndian (full byte reverse) the command word before pushing to the TX FIFO:
txPut(swapEndian(cmd)); // 0x4000A004 -> 0x04A00040 -> PIO sends 04,A0,00,40
Response Byte Order
Responses also arrive in LE byte order. The PIO captures 32 bits MSB-first into the ISR. After swapEndian, the correct host-native value is recovered.
For the test register:
Wire: AD BE ED FE (LSByte of 0xFEEDBEAD first)
PIO ISR: 0xADBEEDFE
After swapEndian: 0xFEEDBEAD
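The test-register example can be reproduced on a host with a small model of the capture path. This sketch assumes an input shift register that shifts left and takes each new bit at the LSB, so the first byte on the wire ends up in bits 31:24. The function names are hypothetical.

```c
#include <stdint.h>

/* Model a 32-bit MSbit-first capture: bits are shifted in from the right,
 * so the first wire byte lands in the most significant byte. */
uint32_t pio_capture_word(const uint8_t wire[4])
{
    uint32_t isr = 0;
    for (int byte = 0; byte < 4; byte++)
        for (int bit = 7; bit >= 0; bit--)
            isr = (isr << 1) | ((uint32_t)(wire[byte] >> bit) & 1u);
    return isr;
}

/* Recover the host-native value from the captured word (full byte reverse). */
uint32_t gspi_response_to_host(uint32_t isr)
{
    return (isr >> 24) | ((isr >> 8) & 0x0000FF00u) |
           ((isr << 8) & 0x00FF0000u) | (isr << 24);
}
```

Feeding the wire bytes AD BE ED FE through `pio_capture_word` gives 0xADBEEDFE, and `gspi_response_to_host` turns that into 0xFEEDBEAD.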
SPI Clock Phase (SPI Mode)
CYW43 gSPI uses CPOL=0, CPHA=0 (SPI Mode 0) with a half-duplex twist:
TX Phase (host to device)
Host drives data while CLK is LOW
CYW43 samples on CLK RISING edge
RX Phase (device to host) — SUBTLE
The device drives data on the CLK rising edge. The host must sample after the data settles:
| Implementation | Sample point | PIO instruction |
| --- | --- | --- |
| PicoWi (bit-bang) | Before CLK cycle (CLK is LOW from previous) | read; CLK high; CLK low |
| Pico SDK (high speed) | CLK LOW (falling edge) | in pins, 1 side 0 |
| Embassy (low speed) | CLK HIGH (after rising) | in pins, 1 side 1 |
| Embassy (high speed) | CLK LOW (falling edge) | in pins, 1 side 0 |
At low SPI speeds (~1 MHz), either edge works because data is stable for a long time. At high speeds (>30 MHz), falling-edge sampling is preferred.
Turnaround (Direction Switch)
After the 32-bit command, the host releases the DATA line and the CYW43 starts driving it for the response. The turnaround gap between TX and RX is implementation-dependent:
| Implementation | Built-in gap clocks | Configurable? |
| --- | --- | --- |
| Pico SDK spi_gap01_sample0 | 1 (nop side 1) | No |
| Pico SDK spi_gap010_sample1 | 2 | No |
| PicoWi (bit-bang) | 0 (just a usdelay) | N/A |
| Embassy (overclock) | 2 | No |
| Embassy (high speed) | 1 | No |
| Embassy (low speed) | 1 (nop side 0) | No |
The CYW43's SPI_RESP_DELAY_Fx registers add additional device-side delay. These must be coordinated with the host turnaround:
Before bus config: RESP_DELAY defaults to 0. Use minimal host turnaround.
After bus config: Set RESP_DELAY to match host turnaround.
For backplane reads (function 1), additional response padding is inserted before the response data. The SDK defines CYW43_BACKPLANE_READ_PAD_LEN_BYTES = 16 for SPI (4 words), but the current proven Zig path uses 4-byte padding because the SPI_RESP_DELAY_F1 write was not yet shown to take effect reliably. Treat 16 bytes as the reference/SDK behavior, and 4 bytes as the currently working implementation detail.
PIO Pin Configuration (RP2040-specific)
Side-set drives value, NOT output enable
PIO side-set controls the pin value but does NOT set the output enable. You must explicitly set pindirs for the CLK pin:
// SET_BASE targets data_pin: set data OE
execImmediate(pioSet(DST_PINDIRS, 1));
// Temporarily retarget SET_BASE to clk_pin: set clock OE
hal.regWrite(pinctrl_addr, modified_pinctrl_with_clk_as_set_base);
execImmediate(pioSet(DST_PINDIRS, 1));
hal.regWrite(pinctrl_addr, original_pinctrl); // restore
Without this, the CLK pin stays as input and no clock signal reaches the CYW43.
PINCTRL must include IN_BASE
The in pins, 1 instruction reads from IN_BASE, not from OUT_BASE or SET_BASE. If IN_BASE is not set to the data pin, reads sample the wrong GPIO.
FSTAT bit positions
FSTAT register for SM0:
Bit 0: RXFULL
Bit 8: RXEMPTY <-- use this for "has data" check
Bit 16: TXFULL
Bit 24: TXEMPTY
Common bug: checking RXFULL (bit 0) instead of RXEMPTY (bit 8). On an empty FIFO, RXFULL=0 which makes drainRx() loop forever.
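A short sketch of FIFO status helpers that encode the bit positions above, so the RXFULL/RXEMPTY mix-up cannot recur at call sites. The helper names are hypothetical. The `sm` offset assumes each FSTAT field holds one bit per state machine, which matches the SM0 positions listed above.

```c
#include <stdbool.h>
#include <stdint.h>

#define FSTAT_RXFULL_LSB   0u
#define FSTAT_RXEMPTY_LSB  8u
#define FSTAT_TXFULL_LSB   16u
#define FSTAT_TXEMPTY_LSB  24u

/* True when the RX FIFO of state machine `sm` holds at least one word.
 * This tests RXEMPTY == 0, not RXFULL == 1. */
bool pio_rx_has_data(uint32_t fstat, unsigned sm)
{
    return (fstat & (1u << (FSTAT_RXEMPTY_LSB + sm))) == 0;
}

/* True when the TX FIFO of state machine `sm` can accept another word. */
bool pio_tx_has_room(uint32_t fstat, unsigned sm)
{
    return (fstat & (1u << (FSTAT_TXFULL_LSB + sm))) == 0;
}
```

A drain loop then reads `while (pio_rx_has_data(fstat, 0))` and terminates as soon as the FIFO is empty.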
Pull Configuration
| Implementation | DATA pin pull |
| --- | --- |
| Pico SDK | Pull-DOWN |
| Embassy | No pull |
| PicoWi | External pull-up on module |
The SDK uses pull-down. The CYW43 module may have its own pull-ups. For debugging, pull-down is recommended — it distinguishes "line undriven" (reads 0) from "device driving high" (reads 1).
Register Access Patterns
Function 0 (bus core) reads
No response padding
RESP_DELAY applies directly
Function 1 (backplane) reads
4 extra padding bytes before response data
SDK: if (func == BACKPLANE_FUNCTION) msg.hdr.len += 4; and reads 4 extra bytes
All register reads use incr=true
The SDK sets the auto-increment bit for ALL register reads, not just block transfers.
The SDK uses read_reg_u32_swap() for the initial test register read, which applies SWAP32 (rev16) on both command and response. This is needed because the bus starts in 16-bit halfword mode, where bytes are swapped within each halfword until WORD_LENGTH_32 is configured.
Proven Zig Bring-Up Configuration
The current proven Zig path uses two distinct SPI access modes:
Pre-config (16-bit halfword mode) — swap16x2 + swapEndian on commands and responses
Post-config (32-bit word mode) — raw commands and raw 32-bit register access, with bulk payload words packed little-endian
The mode switch in bus.initBus() is:
Phase 1: readReg32Swap() reads the test register and verifies 0xFEEDBEAD
Phase 2: writeReg32Swap() enables WORD_LENGTH_32
Phase 3: all subsequent access uses raw helpers (cmdReadRaw, cmdWriteRaw)
The proven RP2040 PIO program matches the SDK spi_gap01_sample0 shape:
0: out pins, 1 side 0 ; TX bit, CLK LOW
1: jmp x--, 0 side 1 ; CLK HIGH, loop
2: set pindirs, 0 side 0 ; turnaround: DATA=input
3: nop side 1 ; 1 gap clock
4: in pins, 1 side 0 ; RX sample, CLK LOW
5: jmp y--, 4 side 1 ; CLK HIGH, loop
Operational notes from the proven path:
host preloads X = tx_bits - 1 and Y = rx_bits - 1 for each transfer
autopull/autopush use 32-bit thresholds
current proven backplane read path uses 1 padding word (4 bytes)
backplane block writes use 64-byte chunks
STATUS_ENABLE remains disabled on the current Zig path because it prepends a status word to every response and complicates parsing during bring-up
the current proven clock pad config matches the SDK: 12 mA drive + fast slew
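The 64-byte chunking and little-endian payload packing noted above can be sketched as follows. The function names are hypothetical, and zero-padding a short final chunk is an assumption of this sketch rather than something the notes specify.

```c
#include <stddef.h>
#include <stdint.h>

#define BP_CHUNK_BYTES 64u  /* SPI backplane block size from the notes */

/* Pack up to one chunk of payload bytes into 32-bit words, little-endian:
 * src[0] becomes bits 7:0 of words[0]. A short tail is zero-padded
 * (assumption). Returns the number of words written. */
size_t fw_pack_chunk_le(const uint8_t *src, size_t len,
                        uint32_t words[BP_CHUNK_BYTES / 4])
{
    if (len > BP_CHUNK_BYTES)
        len = BP_CHUNK_BYTES;
    size_t nwords = (len + 3) / 4;
    for (size_t w = 0; w < nwords; w++) {
        uint32_t v = 0;
        for (size_t b = 0; b < 4 && 4 * w + b < len; b++)
            v |= (uint32_t)src[4 * w + b] << (8 * b);
        words[w] = v;
    }
    return nwords;
}

/* Number of chunks needed to upload an image of image_len bytes. */
size_t fw_chunk_count(size_t image_len)
{
    return (image_len + BP_CHUNK_BYTES - 1) / BP_CHUNK_BYTES;
}
```

Packing by explicit shifts, instead of casting the byte buffer to `uint32_t *`, keeps the result independent of host endianness and alignment.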
Key Test Values
| Register | Address | Expected Value |
| --- | --- | --- |
| SPI_TEST_REGISTER | 0x14 | 0xFEEDBEAD |
| SPI_TEST_RW | 0x18 | Write/readback |
| CHIPCOMMON_CHIPID | backplane 0x18000000 | raw word 0x1545A9AF; low 16 bits 0xA9AF = 43439 decimal = CYW43439 |
Firmware Blobs Required
The current Zig build uses two embedded files:
43439A0_combined.bin (~227 KB) — combined WLAN firmware + CLM blob in the SDK/Embassy combined layout
43439A0_nvram.bin (~742 B) — board-specific config (antenna, crystal, power)
In 32-bit word mode, ALL SPI transfers are 32-bit words, even for 8-bit register accesses.
Empirically working path (proven on Pico W hardware):
Write: cmdWriteRaw(cmd, &[_]u32{@as(u32, val)}) — value in LSByte of u32
Read: @truncate(result[0]) — extract LSByte from raw PIO result
The CYW43 direct backplane registers (0x1000x range) appear to handle byte-lane positioning internally. An earlier hypothesis held that 8-bit values needed val << 24 (MSByte positioning). It was investigated and is not required: the LSByte path works empirically for all tested registers, including the backplane window bytes and the clock CSR.
The critical companion fix was PIO TXSTALL wait: without waiting for the PIO shift engine to finish before CS release, write-only transactions could be truncated on the wire, causing register writes to silently fail.
firmware verify OK (231KB Embassy-matched pair, 64-byte chunks, LE packing)
HT clock OK after firmware upload
F2 ready wait before first IOCTL
MAC read via cur_etheraddr iovar: 28:CD:C1:10:3E:1B
CLM upload via clmload iovar: status 0
LED blink via gpioout iovar: visually confirmed
Wi-Fi UP via WLC_DOWN → country → event_msgs → WLC_UP
Wi-Fi scan via escan iovar: 56 ESCAN_RESULT events, real SSIDs discovered
The CHIPCOMMON_CHIPID register uses the standard Broadcom Silicon Backplane format:
bits [15:0] = chip ID (decimal chip number as hex: 43439 = 0xA9AF)
bits [19:16] = chip revision (0x5 for our CYW43439)
bits [31:20] = package/other info
Full raw word 0x1545A9AF breaks down as: ID=0xA9AF, rev=0x5, pkg=0x154.
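This encoding is standard across the Broadcom Silicon Backplane family, with one caveat: four-digit parts store the part number as a hex literal (BCM4329 stores 0x4329), while five-digit parts store the decimal part number (CYW43439 stores 43439 = 0xA9AF). The BCM43438 value recorded in these notes, 0xA99E, does not equal 43438 decimal (0xA9AE) and should be treated as unverified. The marketing name 0x4373 is NOT the chipcommon register value.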
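The field split described above can be checked with a few lines of C. The struct and function names are illustrative, and the bit ranges are exactly the ones listed in these notes.

```c
#include <stdint.h>

/* Decoded view of the CHIPCOMMON_CHIPID word (field names are illustrative). */
typedef struct {
    uint16_t id;   /* bits 15:0, chip ID */
    uint8_t  rev;  /* bits 19:16, chip revision */
    uint16_t pkg;  /* bits 31:20, package/other info */
} chipid_t;

chipid_t chipid_decode(uint32_t raw)
{
    chipid_t c;
    c.id  = (uint16_t)(raw & 0xFFFFu);
    c.rev = (uint8_t)((raw >> 16) & 0xFu);
    c.pkg = (uint16_t)(raw >> 20);
    return c;
}
```

For the raw word 0x1545A9AF this gives id 0xA9AF (43439 decimal), rev 0x5, and pkg 0x154.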
Bugs Found During Bring-Up
CLK pin OE not set — side-set drives value only; must explicitly set pindirs
FSTAT RXEMPTY vs RXFULL — bit 8, not bit 0; wrong check causes infinite drain loop
Command byte order — must be LE on wire; requires swapEndian before PIO TX
swap16x2 is required before WORD_LENGTH_32 — the initial 16-bit halfword mode swaps bytes within each halfword; the test-register path must account for this.
1-bit alignment behavior differs at low speed — the SDK gap program can produce a 1-bit response shift around ~1 MHz, while Embassy's no-gap path works there. At >30 MHz, the proven path is the SDK-style gap program.
DATA pin must be LOW at power-up — selects SPI mode; floating high = SDIO mode
DMA BSWAP is full byte reverse — not rev16 as RP2040 docs misleadingly suggest
STATUS_ENABLE prepends a status word — enabling it adds an extra 32-bit word to every response. The current proven Zig path leaves it disabled.
PIO TXSTALL wait required for write-only transactions — without waiting for FDEBUG.TXSTALL, CS releases before final bits leave the wire, causing backplane window writes to silently fail. Copied from the SDK's write-only PIO path.
Backplane window write order matters — must be HIGH/MID/LOW (matching SDK), not LOW/MID/HIGH.
CHIPCOMMON_CHIPID uses decimal chip number in low 16 bits — CYW43439 reports 0xA9AF in low 16 bits, not 0x4373.
SPI backplane block writes must use 64-byte chunks — SDK defines CYW43_BUS_MAX_BLOCK_SIZE = 64 for SPI. This is a hardware constraint of the CYW43's SPI-to-backplane bridge FIFO. Writing 512-byte blocks silently corrupts firmware uploads even if small synthetic test writes appear fine.
Bulk firmware payload words must be little-endian packed — the final firmware boot blocker was a host-side byte swap inside each 32-bit bulk payload word. Full-image verification caught this at offset 0x1000: expected 0x00801BD4, got 0xD41B8000. Little-endian payload packing fixed the upload and allowed the firmware to boot.
Backplane window registers are write-only from SPI — cannot read back 0x1000A/B/C to verify. Track window state in software and only write changed bytes. Force-write all three bytes after any error recovery. The SDK resets to CHIPCOMMON_BASE_ADDRESS after each backplane access.
SDK documents 16-byte SPI backplane read padding, but the current proven path uses 4 bytes — keep this distinction explicit in docs until the SPI_RESP_DELAY_F1 configuration path is independently verified.
Firmware and CLM must be a matched pair — using a 224KB firmware with a 984-byte CLM from a different release produced clmload_status=3 (BCME_BADOPTION). Switching to Embassy's matched pair (231KB FW + 984B CLM from the same wb43439A0_7_95_49_00_combined.h) gave status 0.
F2 ready wait is required before first IOCTL — after HT clock, poll STATUS_F2_RX_READY (bit 5 of SPI status register) before sending any IOCTLs. Without this, the first IOCTL times out.
Event mask must be configured before scan — set event_msgs iovar to enable ESCAN_RESULT delivery. Without this, scan events are never generated and the scan poll loop sees zero packets.
pollDevice() must drain all pending packets — event and control responses can arrive back-to-back. Reading only one packet per poll loses the second.
BDC TX header must use version 2 (0x20) — version 0 silently drops data-channel frames; both Pico SDK and Embassy use BDC v2.
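Two of the findings above, the HIGH/MID/LOW write order and software tracking of the write-only window registers, can be combined in one planning function. This is a sketch under assumptions: the mapping of 0x1000A/B/C to LOW/MID/HIGH and the 32 KiB window mask follow the SDK's conventions as understood here, and the `bp_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BP_WIN_LOW   0x1000Au  /* window address bits 15:8 (assumed mapping) */
#define BP_WIN_MID   0x1000Bu  /* window address bits 23:16 */
#define BP_WIN_HIGH  0x1000Cu  /* window address bits 31:24 */
#define BP_WIN_MASK  0x7FFFu   /* offset within a 32 KiB window (assumed) */

typedef struct { uint32_t reg; uint8_t val; } bp_write_t;
typedef struct { uint32_t window; int valid; } bp_tracker_t;

/* Plan the register writes needed before accessing `addr`.
 * Emits HIGH, then MID, then LOW, skipping bytes that already match the
 * cached window. If the cache is invalid, all three bytes are written.
 * Returns the number of entries stored in out[]. */
size_t bp_plan_window(bp_tracker_t *t, uint32_t addr, bp_write_t out[3])
{
    static const struct { uint32_t reg; unsigned shift; } regs[3] = {
        { BP_WIN_HIGH, 24 }, { BP_WIN_MID, 16 }, { BP_WIN_LOW, 8 },
    };
    uint32_t want = addr & ~(uint32_t)BP_WIN_MASK;
    size_t n = 0;
    for (size_t i = 0; i < 3; i++) {
        uint8_t new_byte = (uint8_t)(want >> regs[i].shift);
        uint8_t old_byte = (uint8_t)(t->window >> regs[i].shift);
        if (!t->valid || new_byte != old_byte) {
            out[n].reg = regs[i].reg;
            out[n].val = new_byte;
            n++;
        }
    }
    t->window = want;
    t->valid = 1;
    return n;
}

/* Call after any bus error so the next access rewrites all three bytes. */
void bp_invalidate(bp_tracker_t *t)
{
    t->valid = 0;
}
```

The planner updates the cache optimistically, so the caller should call `bp_invalidate` if any of the planned writes fails.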