@pdp7
Created April 28, 2026 01:00
MPAM QoS features and MPAM support in Linux resctrl


A technical survey of Arm's Memory Partitioning and Monitoring (MPAM) extension and the multi-year effort to expose it through the Linux resctrl filesystem. Written against this tree (Linux v7.0-base, RISC-V CBQRI development branch); file paths and message-IDs cite primary sources where possible.


1. Executive summary

MPAM is Arm's optional architectural extension for memory-system QoS. It tags every memory transaction with a PARTID (partition ID, up to 16 bits) and a PMG (performance monitoring group, up to 8 bits). It also provides MMIO-mapped controllers ("Memory System Components", MSCs) that consume those tags to enforce capacity and bandwidth policies and to count traffic. The architecture is published as the Memory System Resource Partitioning and Monitoring (MPAM), for Armv8-A specification (ARM DDI 0598B/db).

Linux exposes MPAM through resctrl, the same pseudo-filesystem originally introduced for Intel RDT. Reaching that point took roughly seven years of cross-arch refactoring. James Morse (Arm) drove most of it, and Joey Gouly, Ben Horgan, and Rohit Mathew contributed later. The work culminated in four milestones:

Milestone | Series | Lands
resctrl becomes cross-arch (data-struct split, CDP merge, IPI rework, bytes-cleanup, fs/move) | 2018–2025 prep series, terminating in [PATCH v12] x86/resctrl: Move the resctrl filesystem code to /fs/resctrl | v6.16 (mid-2025)
arm64 cpufeature MPAM detection + KVM trap | Joey Gouly v6 (Oct 30, 2024) | v6.13
The basic MPAM driver (no userspace yet) | "arm_mpam: Add basic mpam driver" v1→v6 (Aug–Nov 2025) | v7.0
MPAM↔resctrl glue (the user-visible piece) | "arm_mpam: resctrl..." RFC→v6 (Dec 2025–Mar 2026) | v7.1

In this tree (v7.0-base), the basic driver is present at drivers/resctrl/mpam_devices.c (2725 lines) but is gated behind CONFIG_EXPERT and explicitly does not yet select ARCH_HAS_CPU_RESCTRL (Kconfig comment "does nothing yet"). The wiring lives in the v7.1 glue series.


2. What MPAM provides (the architecture)

2.1 Tagging model

Every CPU thread carries two MPAM identifiers in its execution context:

  • PARTID — selects which partition policy applies. Width is IMPLEMENTATION_DEFINED, up to 16 bits. Resources are configured by writing policy registers indexed by PARTID.
  • PMG — a "performance monitoring group" sub-index, up to 8 bits. PARTID selects the policy; (PARTID, PMG) together act as the filter for monitoring counters. PMG is independent of PARTID for control purposes and is shared across the whole machine.

These propagate alongside the transaction through the SoC interconnect.

The thread's tags come from system registers per Exception Level (MPAM0_EL1, MPAM1_EL1, MPAM2_EL2, MPAM3_EL3). EL2/EL3 have hooks for trapping or remapping (MPAMHCR_EL2, virtualisation registers MPAMVPM0–7_EL2 / MPAMVPMV_EL2). KVM saves/restores these during world switches.
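The sketch below illustrates the tagging model. It packs a thread's tags into the 64-bit value a kernel would load into MPAM0_EL1/MPAM1_EL1. It assumes the field layout PARTID_I [15:0], PARTID_D [31:16], PMG_I [39:32], PMG_D [47:40], where instruction fetches and data accesses carry separate tags. Check that layout against the Arm ARM before relying on it. The helper names are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MPAM0_EL1/MPAM1_EL1 field positions; verify against the Arm ARM. */
#define MPAM_PARTID_I_SHIFT 0   /* PARTID_I: bits [15:0]  */
#define MPAM_PARTID_D_SHIFT 16  /* PARTID_D: bits [31:16] */
#define MPAM_PMG_I_SHIFT    32  /* PMG_I:    bits [39:32] */
#define MPAM_PMG_D_SHIFT    40  /* PMG_D:    bits [47:40] */

/* Hypothetical helper: build the per-thread register value.
 * Instruction fetches use the _I fields, data accesses the _D fields. */
static inline uint64_t mpam_pack_tags(uint16_t partid_i, uint16_t partid_d,
				      uint8_t pmg_i, uint8_t pmg_d)
{
	return ((uint64_t)partid_i << MPAM_PARTID_I_SHIFT) |
	       ((uint64_t)partid_d << MPAM_PARTID_D_SHIFT) |
	       ((uint64_t)pmg_i << MPAM_PMG_I_SHIFT) |
	       ((uint64_t)pmg_d << MPAM_PMG_D_SHIFT);
}

/* Without CDP, instruction and data accesses carry the same tags. */
static inline uint64_t mpam_pack_tags_nocdp(uint16_t partid, uint8_t pmg)
{
	return mpam_pack_tags(partid, partid, pmg, pmg);
}

/* Extract the data-side PARTID (bits [31:16]). */
static inline uint16_t mpam_partid_d(uint64_t reg)
{
	return (uint16_t)(reg >> MPAM_PARTID_D_SHIFT);
}

/* Extract the data-side PMG (bits [47:40]). */
static inline uint8_t mpam_pmg_d(uint64_t reg)
{
	return (uint8_t)(reg >> MPAM_PMG_D_SHIFT);
}
```

With PARTID 0x12 and PMG 3 and no CDP, both halves carry the same values. The packed result is 0x0000030300120012.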

2.2 Hardware containers

MSC (Memory System Component)     ← physical block: a cache, DDR controller, IOMMU, …
 ├── RIS  (Resource Instance Selector)   ← logical slice within an MSC
 │     └── controls / monitors for one resource type
 ├── partsel registers                  ← MPAMCFG_PART_SEL: pick PARTID
 ├── monsel  registers                  ← MSMON_CFG_MON_SEL: pick a monitor
 └── interrupts (errors, overflow)

An MSC is the firmware-described physical container. A RIS is the smallest controllable unit, used when a single MSC virtualises several caches or several memory channels. The Linux driver groups RIS-es into a component (all RIS-es for one resource on one topology level — e.g. an L2 slice) and groups components into a class (all L2s, all L3s, all memory).
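The sketch below shows the class → component → RIS grouping described above. The structures are hypothetical and heavily simplified; the driver's real ones live in drivers/resctrl/mpam_internal.h and carry much more state. Components are keyed by a topology id, such as a PPTT cache-id or a NUMA node.

```c
#include <assert.h>
#include <stddef.h>

enum res_class_type { CLASS_L2, CLASS_L3, CLASS_MEMORY };

#define MAX_COMPONENTS   8
#define MAX_RIS_PER_COMP 4

/* One resource instance: which MSC, and which RIS inside it. */
struct ris_ref { int msc_id; int ris_idx; };

struct res_component {
	int comp_id;                          /* topology id: cache-id or NUMA node */
	int nr_ris;
	struct ris_ref ris[MAX_RIS_PER_COMP];
};

struct res_class {
	enum res_class_type type;
	int nr_comps;
	struct res_component comps[MAX_COMPONENTS];
};

/* Attach one RIS to the component for comp_id, creating the component on
 * first use. Returns 0 on success, -1 if a fixed-size table is full. */
static int class_add_ris(struct res_class *c, int comp_id, int msc_id, int ris_idx)
{
	struct res_component *comp = NULL;

	for (int i = 0; i < c->nr_comps; i++)
		if (c->comps[i].comp_id == comp_id)
			comp = &c->comps[i];

	if (!comp) {
		if (c->nr_comps == MAX_COMPONENTS)
			return -1;
		comp = &c->comps[c->nr_comps++];
		comp->comp_id = comp_id;
		comp->nr_ris = 0;
	}
	if (comp->nr_ris == MAX_RIS_PER_COMP)
		return -1;
	comp->ris[comp->nr_ris++] = (struct ris_ref){ .msc_id = msc_id, .ris_idx = ris_idx };
	return 0;
}
```

For example, suppose two memory-controller MSCs (20 and 21) sit behind NUMA node 0 and a third (22) sits behind node 1. Registering all three in a memory class yields two components: node 0 with two RIS entries and node 1 with one.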

2.3 Allocation features (control)

Programmed through the MPAMCFG_* registers after writing the target PARTID into MPAMCFG_PART_SEL:

Feature | Register | Meaning
CPBM (cache portion bitmap) | MPAMCFG_CPBM | Bitmap of "ways"/portions a PARTID may use. Equivalent of x86 CAT's CBM.
CMAX (cache max-capacity) | MPAMCFG_CMAX | Soft cap on cache footprint (not way-based).
CMIN (cache min-capacity) | MPAMCFG_CMIN | Reservation.
MBW_PBM (memory bandwidth portion BM) | MPAMCFG_MBW_PBM | Bandwidth portion bitmap, analogous to AMD MBA bitmap.
MBW_MAX (memory bandwidth max) | MPAMCFG_MBW_MAX | Max as a fraction of total. Equivalent of x86/AMD MBA max-throttle.
MBW_MIN (memory bandwidth min) | MPAMCFG_MBW_MIN | Reservation.
MBW_PROP (proportional stride) | MPAMCFG_MBW_PROP | Weighted-share scheduler input.
PRI / DSPRI / INTPRI | MPAMCFG_PRI | Internal/downstream priority for arbitration.
CCAP (cache capacity) | MPAMCFG_CCAP | Same idea as CMAX, finer-grain partitions.
Reset value | n/a | A PARTID's policies on reset are typically "unrestricted".

Not every MSC implements every feature; the driver discovers feature presence by reading MPAMF_IDR and friends (see drivers/resctrl/mpam_internal.h, register layout block).
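MBW_MAX expresses a limit as a fraction of total bandwidth, while resctrl's MB schema is a percentage, so some conversion is needed. The sketch below assumes the control is a 16-bit fixed-point fraction of which only the top `wd` bits are implemented. The value of `wd` would be read from the MSC's ID registers. The rounding, and the saturation of 100% to all-ones, are illustrative choices. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: convert a percentage (0..100) to a left-aligned MBW_MAX
 * field with `wd` implemented bits (assumed layout, see lead-in). */
static uint16_t pc_to_mbw_max(unsigned int pc, unsigned int wd)
{
	uint32_t frac;

	assert(wd >= 1 && wd <= 16 && pc <= 100);
	frac = (pc * (1u << wd) + 50) / 100;   /* round to the nearest step */
	if (frac > (1u << wd) - 1)
		frac = (1u << wd) - 1;         /* 100% saturates to all-ones */
	return (uint16_t)(frac << (16 - wd));  /* implemented bits are the MSBs */
}

/* Hypothetical inverse: report the programmed limit back as a percentage. */
static unsigned int mbw_max_to_pc(uint16_t field, unsigned int wd)
{
	uint32_t frac = field >> (16 - wd);

	assert(wd >= 1 && wd <= 16);
	return (frac * 100 + (1u << (wd - 1))) >> wd;
}
```

With an 8-bit implementation, 50% becomes 0x8000 and 100% becomes 0xFF00. Round-tripping through a narrow field loses precision, which is why resctrl advertises a bandwidth granularity (info/MB/bandwidth_gran) to userspace.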

2.4 Monitoring features (counting)

Accessed through the MSMON_* registers after writing the monitor number into MSMON_CFG_MON_SEL:

Feature | Register | Meaning
CSU (cache storage usage) | MSMON_CSU | "How many bytes of this cache does PARTID/PMG occupy right now?" Equivalent of x86 LLC occupancy (CMT).
MBWU (memory bandwidth usage) | MSMON_MBWU | Counter of bytes traversing the MSC for a (PARTID, PMG). Equivalent of x86 MBM_TOTAL/_LOCAL.
OFLOW interrupt | MSMON_OFLOW_* | Optional overflow signalling.

MBWU counters come in three IMPLEMENTATION_DEFINED widths:
  • 31-bit: one 32-bit register, top bit reserved.
  • 44-bit: long counter, two 32-bit reads.
  • 63-bit: "LWD" long-wide counter, two 32-bit reads.

The driver tracks overflow of the 31-bit counter and saves/restores counter state across power-management transitions. See Rohit Mathew's two patches "Probe for long/lwd mbwu counters" and "Use long MBWU counters if supported".
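Whatever the width, software has to turn raw readings into a monotonic byte count. The sketch below assumes a counter wraps at most once between two reads, so subtracting modulo 2^width absorbs the wrap. A 31-bit byte counter wraps after 2 GiB of traffic and therefore needs frequent sampling; the 44- and 63-bit variants relax that. The helper and struct names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widths named in the text; which one an MSC has is discovered at probe time. */
enum mbwu_width { MBWU_31 = 31, MBWU_44 = 44, MBWU_63 = 63 };

/* Hypothetical helper: bytes counted between two raw reads of a `width`-bit
 * counter, assuming at most one wrap in between. Masking the difference to
 * `width` bits makes a wrap look like an ordinary increment. */
static uint64_t mbwu_delta(uint64_t prev, uint64_t now, unsigned int width)
{
	assert(width >= 1 && width < 64);
	return (now - prev) & ((1ULL << width) - 1);
}

/* Hypothetical per-monitor state extending a narrow counter to 64 bits. */
struct mbwu_state {
	unsigned int width;
	uint64_t prev_raw;   /* last raw hardware value */
	uint64_t total;      /* running 64-bit byte count reported upward */
};

static void mbwu_update(struct mbwu_state *s, uint64_t raw)
{
	s->total += mbwu_delta(s->prev_raw, raw, s->width);
	s->prev_raw = raw;
}
```

Consider a 31-bit counter read at 0x7FFFFFF0 and then at 0x10. The second read contributes 0x20 bytes rather than a huge bogus value.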

A monitor in MPAM is a filter slot, not per-PARTID storage. An MSC has a finite number of filter slots, and the architecture requires only one. This is the counter-assignment model (cf. AMD ABMC) that motivates the resctrl "assignable counters" hooks (resctrl_arch_config_cntr, resctrl_arch_cntr_read, resctrl_arch_reset_cntr).
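The toy model below makes the filter-slot idea concrete. Each slot on an MSC holds a (PARTID, PMG) filter and a byte counter, and every transaction is offered to every configured slot. The match_partid/match_pmg flags stand in for filtering on PARTID alone or on both tags. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical monitor slot: a filter plus a counter. */
struct mon_slot {
	bool in_use;
	bool match_partid, match_pmg;
	uint16_t partid;
	uint8_t pmg;
	uint64_t bytes;
};

/* A memory transaction as seen by the MSC. */
struct txn { uint16_t partid; uint8_t pmg; uint32_t bytes; };

static bool mon_matches(const struct mon_slot *m, const struct txn *t)
{
	if (m->match_partid && m->partid != t->partid)
		return false;
	if (m->match_pmg && m->pmg != t->pmg)
		return false;
	return true;
}

/* Offer one transaction to every configured slot; matching slots count it. */
static void msc_observe(struct mon_slot *slots, size_t n, const struct txn *t)
{
	for (size_t i = 0; i < n; i++)
		if (slots[i].in_use && mon_matches(&slots[i], t))
			slots[i].bytes += t->bytes;
}
```

With N slots, at most N (PARTID, PMG) combinations can be watched at once. That scarcity is what the counter-assignment hooks manage.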

2.5 The mpam-fb firmware interface (the locking constraint)

A platform may expose its MSCs through a still-alpha firmware specification ("mpam-fb", proxied via PCC). On those systems the OS cannot access the MMIO directly. It must write a request and wait for a firmware interrupt before the next access. That wait is too slow to perform inside an IPI handler, and this has two consequences. First, the 2022–2024 series had to convert resctrl's synchronous IPI-based reads to a work queue. Second, the driver's locking around part_sel_lock / mon_sel_lock is mutex-based rather than spinlock-based. James Morse's cover letters note that this constraint will not be revisited, even if it makes the day-1 locking look "very strange".
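The select-then-access pattern is why a lock is needed at all. MPAMCFG_PART_SEL is shared state, so selecting a PARTID and writing its configuration must form one critical section. The sketch below models this in userspace with a pthread mutex, a lock that may be held across a sleep, as an mpam-fb round trip requires. struct mock_msc and its helpers are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NR_PARTID 16

/* Hypothetical mock of one MSC's configuration interface. */
struct mock_msc {
	pthread_mutex_t part_sel_lock;  /* sleeping lock, like a kernel mutex */
	uint16_t part_sel;              /* stands in for MPAMCFG_PART_SEL */
	uint32_t cpbm[NR_PARTID];       /* per-PARTID state behind MPAMCFG_CPBM */
};

/* On an mpam-fb system each access could block until firmware answers;
 * here they are plain stores. */
static void msc_write_part_sel(struct mock_msc *m, uint16_t partid)
{
	m->part_sel = partid;
}

static void msc_write_cpbm(struct mock_msc *m, uint32_t bitmap)
{
	m->cpbm[m->part_sel] = bitmap;  /* acts on whichever PARTID is selected */
}

/* Select and write under one lock hold. Otherwise another thread could
 * re-select a PARTID between the two accesses, and the bitmap would land on
 * the wrong one. */
static void mpam_set_cpbm(struct mock_msc *m, uint16_t partid, uint32_t bitmap)
{
	assert(partid < NR_PARTID);
	pthread_mutex_lock(&m->part_sel_lock);
	msc_write_part_sel(m, partid);
	msc_write_cpbm(m, bitmap);
	pthread_mutex_unlock(&m->part_sel_lock);
}
```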


3. ARM architecture integration in Linux

Item | Source
CPU feature bit ARM64_MPAM | arch/arm64/include/asm/cpucaps.h, decoded via ID_AA64PFR0_EL1.MPAM and ID_AA64PFR1_EL1.MPAM_frac
Boot-time discovery & override | arch/arm64/kernel/pi/idreg-override.c (arm64.nompam cmdline), arch/arm64/kernel/cpufeature.c
MPAMIDR_EL1 reads | deferred init in arch/arm64/kernel/cpuinfo.c
EL2 setup (MPAM trap defaults) | arch/arm64/include/asm/el2_setup.h
KVM register mapping (VNCR offsets) | arch/arm64/include/asm/vncr_mapping.h (MPAM1_EL1, MPAMHCR_EL2, MPAMVPMV_EL2, MPAMVPM0–7_EL2)
KVM ID-reg masking | hides MPAM bits from guests when the host can't context-switch them (Joey Gouly's v6 series)

All of the above landed in v6.13 (the Joey Gouly series 20241030160317.2528209-1-joey.gouly@arm.com).
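Detection comes down to reading two ID-register fields. The sketch below decodes the MPAM version from raw ID_AA64PFR0_EL1/ID_AA64PFR1_EL1 values. It assumes the MPAM field sits at bits [43:40] of PFR0 and MPAM_frac at bits [19:16] of PFR1; verify both against the Arm ARM or arch/arm64/tools/sysreg. Version 0.1 is encoded as MPAM=0, MPAM_frac=1, so a zero major field alone does not mean MPAM is absent. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions (see lead-in). */
#define PFR0_MPAM_SHIFT      40  /* ID_AA64PFR0_EL1.MPAM, bits [43:40] */
#define PFR1_MPAM_FRAC_SHIFT 16  /* ID_AA64PFR1_EL1.MPAM_frac, bits [19:16] */

struct mpam_version { unsigned int major, minor; };

/* Hypothetical helper: combine both 4-bit fields into major.minor. */
static struct mpam_version mpam_decode_version(uint64_t pfr0, uint64_t pfr1)
{
	struct mpam_version v = {
		.major = (pfr0 >> PFR0_MPAM_SHIFT) & 0xf,
		.minor = (pfr1 >> PFR1_MPAM_FRAC_SHIFT) & 0xf,
	};
	return v;
}

/* MPAM is implemented if either field is non-zero (e.g. v0.1 is 0.1). */
static int mpam_present(struct mpam_version v)
{
	return v.major || v.minor;
}
```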


4. ACPI / firmware

MPAM is described to the OS through a dedicated ACPI table:

  • ACPI MPAM table v2: Rafael Wysocki's "ACPICA: Add support for Arm's MPAM ACPI table version 2" (Apr 2023). Header struct acpi_table_mpam in include/acpi/actbl*.h.
  • Parser: drivers/acpi/arm64/mpam.c (411 lines):
    • acpi_mpam_count_msc(): top-level count of MSC nodes
    • acpi_mpam_parse_resources(): per-MSC RIS list
    • acpi_mpam_register_irq(): IRQ registration helper
    • Resource references describe which cache (by PPTT cache-id) or memory domain (by proximity domain) an MSC covers.
  • PPTT prerequisites: several patches that precede the driver add cacheinfo/PPTT helpers that find a cache level by id, or fill a cpumask from a processor container or cache-id. These are reused on every arch.
  • PCC: when the platform uses mpam-fb, the parsing path follows a PCC mailbox descriptor.
  • DT: a binding existed in v1 of the basic driver but was removed in v2, pending a real DTS contribution from a platform vendor. The driver is ACPI-only as currently merged.
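Counting MSC nodes means walking a table body made of variable-length records without trusting lengths that overrun the buffer. The sketch below shows that pattern in the spirit of acpi_mpam_count_msc(). It assumes a simplified layout in which each node starts with a little-endian 16-bit length covering the whole node; this is not the real ACPI MPAM node format. count_msc_nodes() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical walk over variable-length nodes (assumed layout, see lead-in).
 * Returns the node count, or -1 if the body is malformed. */
static int count_msc_nodes(const uint8_t *body, size_t body_len)
{
	size_t off = 0;
	int count = 0;

	while (off < body_len) {
		uint16_t len;

		if (body_len - off < sizeof(len))
			return -1;                      /* truncated length field */
		len = (uint16_t)(body[off] | (body[off + 1] << 8));
		if (len < sizeof(len) || len > body_len - off)
			return -1;                      /* zero-length or overrunning node */
		off += len;
		count++;
	}
	return count;
}
```

Rejecting a zero or undersized length matters as much as the overrun check. Without it, a corrupt node would make the loop spin forever.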

5. The resctrl cross-arch refactor

resctrl was originally an x86-only filesystem under arch/x86/kernel/cpu/resctrl/. Bringing MPAM (and later RISC-V CBQRI) under the same UAPI required peeling the "fs" layer off the "arch" layer in seven sequential series, each of which landed before the next began.

Chronology

Year | Series | Notes / message-id | Status
2018 | RFC v1: first cross-arch proposal (20 patches) | 20180824104519.11203-1-james.morse@arm.com | superseded
2020 | RFC v2: reduced to 2 patches, "tip of the MPAM iceberg" | 20200214182947.39194-1-james.morse@arm.com | superseded
2020 | Misc cleanup (sparse-bitmaps, linear-MBA, get_cache_id) v1–v5 | 20200214182401.39008-1-james.morse@arm.com series | merged ~late 2020
2020 | KVM "Hide unsupported MPAM from the guest" (single) | 20200925160102.118858-1-james.morse@arm.com | dropped, revived 2024
2020 | "Set your time-machine to 2020": the cited what-is-MPAM cover | 20201030161120.227225-1-james.morse@arm.com (CDP merge v1) | superseded by v7
2021 | Merge the CDP resources v2–v7 (introduces struct resctrl_schema) | terminating 20210728170637.25610-1-james.morse@arm.com | merged ~v5.16
2021–2022 | resctrl_arch_rmid_read() returns bytes v1–v4 | terminating 20220412124419.30689-1-james.morse@arm.com | merged ~v6.1
2022–2024 | Monitored CLOSID+RMID together / arch+fs locking split / IPI→workqueue rework v1–v9 | terminating 20240213184438.16675-1-james.morse@arm.com | merged ~v6.10
2023–2024 | arm64 cpufeature MPAM detection (James → Joey Gouly v6) | 20241030160317.2528209-1-joey.gouly@arm.com | merged ~v6.13
2024–2025 | Move resctrl filesystem code to /fs/resctrl v1–v12 | terminating 20250515165855.31452-1-james.morse@arm.com | merged ~v6.16
2025 | "arm_mpam: Add basic mpam driver" RFC, v1–v6 (Ben Horgan took over from v4) | terminating 20251119122305.302149-1-ben.horgan@arm.com | merged ~v7.0
2025–2026 | "arm_mpam: resctrl..." glue series RFC–v6 (the user-visible piece) | terminating 20260313144617.3420416-1-ben.horgan@arm.com | merged ~v7.1
2026 | Glue fixes, ABMC counter-assignment paving, Reinette's quality cleanup | various | merged into v7.1-rcN

The IPI rework (the 2022–2024 row above) is the named MPAM blocker. x86's resctrl read path sent an IPI to the target CPU, which is fine for MSR-based RDT but unworkable for mpam-fb. The rework converts those reads into work scheduled on a CPU in the target domain.

A parallel Marvell-only path (Amit Singh Tomar, Jan 2024, "ARM: MPAM: add support for priority partitioning control") was redirected by Reinette Chatre and superseded by James's basic driver.


6. The cross-arch interface (include/linux/resctrl.h, 730 lines)

After the v6.16 move, fs/resctrl/{ctrlmondata.c,monitor.c,pseudo_lock.c,rdtgroup.c,internal.h,monitor_trace.h} (≈9.7k lines) is generic. Architectures plug in via a fixed set of resctrl_arch_* hooks and the struct rdt_resource / struct rdt_*_domain data model.

6.1 Resource & domain model

  • enum resctrl_res_level: RDT_RESOURCE_L3, _L2, _MBA, _SMBA, plus RISC-V's _RBWB, _MWEIGHT, _PERF_PKG. MPAM expects to register L3, L2, MBA, and MBM monitoring domains; CDP is folded into per-CTRL "schema".
  • struct rdt_resource (resctrl.h:330): capabilities (alloc_capable, mon_capable), control/monitoring scopes (cache level, package, NUMA node), embedded resctrl_cache / resctrl_membw / resctrl_mon feature blocks, RCU-protected ctrl_domains and mon_domains lists.
  • struct rdt_ctrl_domain (resctrl.h:164): group of CPUs sharing one control instance; carries staged_config[CDP_*].
  • struct rdt_l3_mon_domain (resctrl.h:198): group of CPUs sharing one monitoring instance; carries rmid_busy_llc, mbm_states, the limbo worker.
  • struct resctrl_schema (resctrl.h:366): what userspace sees in /sys/fs/resctrl/info/<NAME>/. CDP becomes two schemas (CODE / DATA) over one rdt_resource.

6.2 The resctrl_arch_* hook surface

Discovery & shape:

  • resctrl_arch_get_resource(level) → struct rdt_resource *
  • resctrl_arch_get_num_closid(r) (CLOSID == PARTID for MPAM)
  • resctrl_arch_system_num_rmid_idx() (combined CLOSID+RMID index for MPAM and RISC-V; flat for x86)
  • resctrl_arch_is_evt_configurable(evt)
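For MPAM, a PMG only has meaning inside its PARTID, so per-monitor state must be indexed by the (CLOSID, RMID) pair rather than by RMID alone. The sketch below shows such a combined index, assuming a dense closid * num_rmid + rmid layout. The kernel's per-arch encode/decode helpers may pack the bits differently, and the names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes; real values come from probing the hardware. */
struct mon_ids { uint32_t num_closid; uint32_t num_rmid; };

/* Every (closid, rmid) pair gets its own slot. */
static uint32_t num_rmid_idx(const struct mon_ids *ids)
{
	return ids->num_closid * ids->num_rmid;
}

static uint32_t rmid_idx_encode(const struct mon_ids *ids, uint32_t closid, uint32_t rmid)
{
	assert(closid < ids->num_closid && rmid < ids->num_rmid);
	return closid * ids->num_rmid + rmid;
}

static void rmid_idx_decode(const struct mon_ids *ids, uint32_t idx,
			    uint32_t *closid, uint32_t *rmid)
{
	*closid = idx / ids->num_rmid;
	*rmid = idx % ids->num_rmid;
}
```

On x86, where RMIDs are global, the same helpers would collapse to returning rmid unchanged. That is the "flat" case mentioned above.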

Task switch:

  • resctrl_arch_sync_cpu_closid_rmid(info): IPI callback run on CPUs whose default CLOSID/RMID has changed (for example, after a write to a group's cpus file), so that they reload their PARTID/PMG context (writes MPAM0_EL1/MPAM1_EL1 on arm64).

Apply / read control:

  • resctrl_arch_update_one(r, d, closid, val, t) — apply one schemata value
  • resctrl_arch_update_domains(r, closid)
  • resctrl_arch_get_config(r, d, closid, t)
  • resctrl_arch_reset_all_ctrls(r)
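The apply path has two phases. Parsing a schemata write only stages values, and resctrl_arch_update_domains() then pushes them to hardware. The sketch below simplifies that second phase: it uses one configuration type instead of the staged_config[CDP_*] array, and a counter stands in for the actual register write. The struct layouts and the function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CLOSID 8

/* Value staged by the schemata parser for one domain. */
struct staged { bool have_new_ctrl; uint32_t new_ctrl; };

struct ctrl_domain {
	struct staged staged_config;
	uint32_t hw_ctrl[NR_CLOSID];  /* what the hardware currently holds */
	unsigned int nr_writes;       /* counts simulated register writes */
};

/* Push staged values for one closid. Skip domains with nothing staged and
 * domains whose hardware value already matches. */
static void update_domains(struct ctrl_domain *doms, int n, uint32_t closid)
{
	assert(closid < NR_CLOSID);
	for (int i = 0; i < n; i++) {
		struct ctrl_domain *d = &doms[i];

		if (!d->staged_config.have_new_ctrl)
			continue;
		if (d->hw_ctrl[closid] != d->staged_config.new_ctrl) {
			d->hw_ctrl[closid] = d->staged_config.new_ctrl;
			d->nr_writes++;       /* stands in for the hardware write */
		}
		d->staged_config.have_new_ctrl = false;
	}
}
```

Skipping unchanged values matters more on MPAM than on x86, since every write may cost a PART_SEL selection or, with mpam-fb, a firmware round trip.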

Monitoring (the MPAM-shaped path):

  • resctrl_arch_rmid_read(r, hdr, closid, rmid, eid, val, arch_mon_ctx) — may sleep; for MPAM the arch_mon_ctx carries the allocated monitor filter slot (see comment at resctrl.h:549–550).
  • resctrl_arch_mon_ctx_alloc() / _free() — get a free MPAM monitor.
  • resctrl_arch_reset_rmid() / _rmid_all().

Counter-assignment (ABMC + MPAM):

  • resctrl_arch_mbm_cntr_assign_enabled(r) / _set(r, enable)
  • resctrl_arch_config_cntr() — assign / unassign a counter to an event
  • resctrl_arch_cntr_read() / resctrl_arch_reset_cntr()
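Counter assignment turns a finite set of hardware counters into a managed pool. The toy sketch below shows the bookkeeping, assuming NR_CNTRS counters per MSC, each bound to at most one (closid, rmid, event) tuple. The function names only echo the roles of resctrl_arch_config_cntr() and friends and are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CNTRS 2   /* hypothetical number of hardware counters */

/* Each counter is either free or bound to one (closid, rmid, event). */
struct cntr_binding { int in_use; uint32_t closid, rmid, evtid; };

static struct cntr_binding cntrs[NR_CNTRS];

/* Bind a free counter; returns its index, or -1 when all are busy. */
static int cntr_assign(uint32_t closid, uint32_t rmid, uint32_t evtid)
{
	for (int i = 0; i < NR_CNTRS; i++) {
		if (!cntrs[i].in_use) {
			cntrs[i] = (struct cntr_binding){ 1, closid, rmid, evtid };
			return i;
		}
	}
	return -1;
}

static void cntr_unassign(int idx)
{
	assert(idx >= 0 && idx < NR_CNTRS);
	cntrs[idx].in_use = 0;
}

/* Find the counter backing a tuple, or -1 if none is assigned (there is
 * then nothing to read for that event). */
static int cntr_lookup(uint32_t closid, uint32_t rmid, uint32_t evtid)
{
	for (int i = 0; i < NR_CNTRS; i++)
		if (cntrs[i].in_use && cntrs[i].closid == closid &&
		    cntrs[i].rmid == rmid && cntrs[i].evtid == evtid)
			return i;
	return -1;
}
```

Once the pool is exhausted, a new group can only be monitored after another group's counter is released.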

Event-config (for hardware that filters MBM events, e.g. local-vs-total):

  • resctrl_arch_mon_event_config_write() / _read() — IPI-shaped write/read

CDP, IO-allocation, pre-mount, pseudo-lock arch hooks round out the surface.

Domain lifecycle hooks called by the architecture into the FS:

  • resctrl_online_ctrl_domain() / resctrl_offline_ctrl_domain()
  • resctrl_online_mon_domain() / resctrl_offline_mon_domain()

6.3 Terminology mapping

Concept | x86 RDT | AMD | MPAM | RISC-V CBQRI
Allocation tag | CLOSID | CLOSID | PARTID | RCID
Monitoring tag | RMID | RMID | PMG | MCID
Cache-portion bitmap | CBM (CAT) | CBM (CAT) | CPBM | per-AT CBM
Code/Data split | CDP | CDP | (planned, see v7.1 glue, currently CONFIG_EXPERT) | AT (access-type)
Bandwidth control | MBA (max %) | MBA bitmap | MBW_MAX / MBW_PBM / MBW_MIN / MBW_PROP | RBWB / MWEIGHT
Bandwidth count | MBM_TOTAL / _LOCAL | MBM | MBWU | MBM_TOTAL via BC scope
LLC occupancy | CMT | CMT | CSU | (BC scope)
Counter assignment | (planned) | ABMC | filter slots (always-assign) | n/a

6.4 The MPAM mapping (per the v7.1 glue series)

  • RDT_RESOURCE_L3, _L2 → MPAM cache class with CPBM/CMAX/CCAP.
  • RDT_RESOURCE_MBA → MPAM memory class with MBW_*.
  • L3-MBM (QOS_L3_MBM_TOTAL) → MPAM MBWU on the memory class. Because MPAM only has finite filter slots, the FS uses ABMC-style counter assignment ("always-on") to back MBM.
  • LLC occupancy (QOS_L3_OCCUP_EVENT_ID) → MPAM CSU on the cache class.
  • CDP_CODE / CDP_DATA exist on MPAM but are gated behind CONFIG_EXPERT in the v7.1 glue: CDP halves the usable PARTID range, and remount semantics interact awkwardly with that.
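The PARTID halving under CDP follows from giving each resctrl CLOSID two hardware IDs. The sketch below is modelled on the x86 helper resctrl_get_config_index(), which places data at 2*closid and code at 2*closid+1. Whether the MPAM glue uses exactly this interleaving is an assumption, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum conf_type { CDP_NONE, CDP_CODE, CDP_DATA };

/* Hardware ID used for a resctrl CLOSID under a given configuration type. */
static uint32_t cdp_hw_id(uint32_t closid, enum conf_type t)
{
	switch (t) {
	case CDP_CODE:
		return closid * 2 + 1;
	case CDP_DATA:
		return closid * 2;
	default:
		return closid;
	}
}

/* With CDP on, each CLOSID consumes two PARTIDs. */
static uint32_t usable_closids(uint32_t num_partid, int cdp_enabled)
{
	return cdp_enabled ? num_partid / 2 : num_partid;
}
```

On MPAM, the two IDs would presumably end up in the separate data and instruction PARTID fields of the thread's MPAM system register. A system with 64 PARTIDs would then offer 32 resctrl groups while CDP is mounted.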

7. State of this tree (v7.0-base)

drivers/resctrl/
├── Kconfig            609 B
├── Makefile           128 B
├── mpam_devices.c    2725 lines  ← basic MPAM driver
├── mpam_internal.h    661 lines  ← register layout, lock primitives
├── test_mpam_devices.c 397 lines ← KUnit
└── riscv/                        ← CBQRI .o build dir on this tree

drivers/acpi/arm64/
└── mpam.c             411 lines  ← ACPI MPAM table parser

include/linux/
├── arm_mpam.h          66 lines  ← public driver/ACPI API
└── resctrl.h          730 lines  ← cross-arch interface

fs/resctrl/
├── ctrlmondata.c     1043 lines
├── internal.h         (15 KB)
├── Kconfig
├── monitor.c         1903 lines
├── monitor_trace.h
├── pseudo_lock.c     1099 lines
└── rdtgroup.c        4668 lines

Latest commit touching the driver in this tree:

4ad79c874e53 arm_mpam: Fix null pointer dereference when restoring bandwidth counters

arch/arm64/Kconfig's ARM64_MPAM block has the explicit comment

select ARM64_MPAM_DRIVER if EXPERT  # does nothing yet

i.e. the basic driver compiles when CONFIG_EXPERT=y but the user-visible /sys/fs/resctrl path on arm64 is still wired through the v7.1 glue series that lands the actual resctrl_arch_* implementations and selects ARCH_HAS_CPU_RESCTRL.


8. The review history of the basic driver

Version | Date | Patches | Thread msgs | Author / notes
RFC | 2025-07-11 | 36 | | James Morse
v1 | 2025-08-22 | 33 | 200 | James Morse
v2 | 2025-09-10 | 29 | 199 | James Morse; DT removed
v3 | 2025-10-17 | 29 | 108 | James Morse; minor refinements
v4 (respin) | 2025-11-07 | 33 | 147 | Ben Horgan; generic cleanup helpers extracted
v5 | 2025-11-17 | 34 | | Ben Horgan
v6 | 2025-11-19 | 34 | | Ben Horgan; final, merged

Mboxes for v1, v2, v3, and the Ben Horgan respin are saved alongside this report at mpam-lore/thread-*.mbox.

Recurring review themes across the four archived versions (v1, v2, v3, and the v4 respin):

  1. mpam-fb firmware constraint drives all locking and the IPI→workqueue rework. Accepted as immutable by v2.
  2. DT vs ACPI — DT removed in v2; will return when a platform vendor contributes real DTS.
  3. PARTID lifecycle — convergence on a single mpam_reprogram_ris_partid that handles both reset and configuration; in_reset_state flag tracks per-component state.
  4. MBWU counter widths — 31/44/63 selection, overflow tracking; Rohit Mathew's two patches plus Ben Horgan's overflow refinement in respin.
  5. Generic helper extraction: the platform_device_put cleanup, acpi_get_table_ret(), and the PPTT cache-v1 helper were split out as standalone patches in the respin, per Jonathan Cameron's repeated request.
  6. KUnit test scope — bitmap reset and props_mismatch() covered; monitor read paths under-tested because no platform exists with all counter variants.

9. Outstanding / future work

  • DT support — pending a platform contribution. The binding will return when a real DTS exists.
  • mpam-fb finalisation — the spec is still alpha. The driver has been shaped to accommodate it without a rewrite.
  • CDP for MPAM — gated behind CONFIG_EXPERT in v7.1 glue; PARTID-range halving + remount semantics are unresolved.
  • MBM with free-running counters — dropped in favour of ABMC-style always-on counter assignment to keep semantics consistent with the AMD ABMC path.
  • Monitor coverage testing — the v3 cover letter explicitly flags monitor paths as the most likely site of bugs because no test platform implements every counter variant.
  • Cross-arch ABMC counter-assignment paving — Ben Horgan's x86,fs/resctrl: Pave the way for MPAM counter assignment v1–v4 (Feb–Mar 2026) refactors arch hooks so MPAM and AMD share infrastructure.
  • Reinette Chatre's x86,fs/resctrl quality series (Apr 2026) cleans up after the move (W=12, kernel-doc, Coccinelle).

10. References & primary sources

Specifications

Canonical mailing-list anchors

  • "Set your time-machine to 2020" — [PATCH 0/9] x86/resctrl: Merge the CDP resources v1 20201030161120.227225-1-james.morse@arm.com
  • IPI → workqueue / closid+rmid combined index (the MPAM enabler on the read path) — terminating v9 20240213184438.16675-1-james.morse@arm.com
  • fs/resctrl move final form — v12 20250515165855.31452-1-james.morse@arm.com
  • arm64 cpufeature MPAM detection (final) — Joey Gouly v6 20241030160317.2528209-1-joey.gouly@arm.com
  • "arm_mpam: Add basic mpam driver" v6 — Ben Horgan 20251119122305.302149-1-ben.horgan@arm.com
  • "arm_mpam: resctrl..." glue v6 — Ben Horgan 20260313144617.3420416-1-ben.horgan@arm.com

Local archive (this directory)

  • mpam-subject-listing.txt — all 1223 lore hits with subject containing "MPAM"
  • mpam-resctrl-listing.txt — same, intersected with body containing "resctrl"
  • arm_mpam-listing.txt — 1207 hits with subject containing arm_mpam
  • cover-letters.txt — extracted cover-letter index
  • thread-2025082215...james.morse@arm.com.mbox — basic driver v1 (200 msgs)
  • thread-2025091020...james.morse@arm.com.mbox — basic driver v2 (199 msgs)
  • thread-2025101718...james.morse@arm.com.mbox — basic driver v3 (108 msgs)
  • thread-2025110712...ben.horgan@arm.com.mbox — basic driver v4 respin (147 msgs)