A technical survey of Arm's Memory Partitioning and Monitoring (MPAM) extension
and the multi-year effort to expose it through the Linux resctrl filesystem.
Written against this tree (Linux v7.0-base, RISC-V CBQRI development branch);
file paths and message-IDs cite primary sources where possible.
MPAM is Arm's optional architectural extension for memory-system QoS. It tags
every memory transaction with a PARTID (partition ID, up to 16 bits) and a
PMG (performance monitoring group, up to 8 bits), and provides MMIO-mapped controllers
("Memory System Components", MSCs) that consume those tags to enforce capacity
and bandwidth policies and to count traffic. The architecture is published as
the Memory System Resource Partitioning and Monitoring (MPAM), for Armv8-A
specification (ARM DDI 0598B/db).
Linux exposes MPAM through resctrl, the same pseudo-filesystem originally
introduced for Intel RDT. Reaching that point required ~7 years of cross-arch
refactoring driven mostly by James Morse (Arm), with later contributions from
Joey Gouly, Ben Horgan, and Rohit Mathew. The work reached four
milestones:
| Milestone | Series | Lands |
|---|---|---|
| resctrl becomes cross-arch (data-struct split, CDP merge, IPI rework, bytes-cleanup, fs/move) | 2018–2025 prep series, terminating in [PATCH v12] x86/resctrl: Move the resctrl filesystem code to /fs/resctrl | v6.16 (mid-2025) |
| arm64 cpufeature MPAM detection + KVM trap | Joey Gouly v6 (Oct 30, 2024) | v6.13 |
| The basic MPAM driver (no userspace yet) | "arm_mpam: Add basic mpam driver" v1→v6 (Aug–Nov 2025) | v7.0 |
| MPAM↔resctrl glue (the user-visible piece) | "arm_mpam: resctrl..." RFC→v6 (Dec 2025–Mar 2026) | v7.1 |
In this tree (v7.0-base), the basic driver is present at
drivers/resctrl/mpam_devices.c (2725 lines) but is gated behind
CONFIG_EXPERT and explicitly does not yet select ARCH_HAS_CPU_RESCTRL
(Kconfig comment "does nothing yet"). The wiring lives in the v7.1 glue series.
Every CPU thread carries two MPAM identifiers in its execution context:
- PARTID — selects which partition policy applies. Width is IMPLEMENTATION_DEFINED, up to 16 bits. Resources are configured by writing policy registers indexed by PARTID.
- PMG — a "performance monitoring group" sub-index, up to 8 bits. PARTID selects the policy; (PARTID, PMG) together act as the filter for monitoring counters. PMG is independent of PARTID for control purposes and is shared across the whole machine.
These propagate alongside the transaction through the SoC interconnect.
The thread's tags come from system registers per Exception Level
(MPAM0_EL1, MPAM1_EL1, MPAM2_EL2, MPAM3_EL3). EL2/EL3 have hooks for
trapping or remapping (MPAMHCR_EL2, virtualisation registers
MPAMVPM0–7_EL2 / MPAMVPMV_EL2). KVM saves/restores these during world
switches.
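As a rough illustration of how the two tags can travel together in one register value, the sketch below packs an instruction-side and a data-side PARTID/PMG pair into a 64-bit integer. The field positions (PARTID_I at bits 15:0, PARTID_D at 31:16, PMG_I at 39:32, PMG_D at 47:40) are an assumption modelled on the MPAM0_EL1 layout and should be checked against the specification. The function names are hypothetical and do not come from the kernel.

```python
# Illustrative model only: field positions are assumed, not taken from the kernel.
PARTID_I_SHIFT, PARTID_D_SHIFT = 0, 16
PMG_I_SHIFT, PMG_D_SHIFT = 32, 40
PARTID_MASK, PMG_MASK = 0xFFFF, 0xFF


def pack_mpam_tags(partid_i, pmg_i, partid_d=None, pmg_d=None):
    """Pack PARTID/PMG pairs into one 64-bit value (data side defaults to instruction side)."""
    partid_d = partid_i if partid_d is None else partid_d
    pmg_d = pmg_i if pmg_d is None else pmg_d
    for partid in (partid_i, partid_d):
        if not 0 <= partid <= PARTID_MASK:
            raise ValueError("PARTID does not fit in 16 bits")
    for pmg in (pmg_i, pmg_d):
        if not 0 <= pmg <= PMG_MASK:
            raise ValueError("PMG does not fit in 8 bits")
    return (partid_i << PARTID_I_SHIFT | partid_d << PARTID_D_SHIFT
            | pmg_i << PMG_I_SHIFT | pmg_d << PMG_D_SHIFT)


def unpack_mpam_tags(value):
    """Return (partid_i, pmg_i, partid_d, pmg_d) from a packed value."""
    return ((value >> PARTID_I_SHIFT) & PARTID_MASK,
            (value >> PMG_I_SHIFT) & PMG_MASK,
            (value >> PARTID_D_SHIFT) & PARTID_MASK,
            (value >> PMG_D_SHIFT) & PMG_MASK)
```

A real implementation would also have to respect the PARTID and PMG widths the hardware actually implements, which are narrower than the architectural maximum on many parts.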
MSC (Memory System Component) ← physical block: a cache, DDR controller, IOMMU, …
├── RIS (Resource Instance Selector) ← logical slice within an MSC
│ └── controls / monitors for one resource type
├── partsel registers ← MPAMCFG_PART_SEL: pick PARTID
├── monsel registers ← MSMON_CFG_MON_SEL: pick a monitor
└── interrupts (errors, overflow)
An MSC is the firmware-described physical container. A RIS is the smallest controllable unit, used when a single MSC virtualises several caches or several memory channels. The Linux driver groups RIS-es into a component (all RIS-es for one resource on one topology level — e.g. an L2 slice) and groups components into a class (all L2s, all L3s, all memory).
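The grouping described above can be sketched as a two-level dictionary. This is a toy model, not the driver's data structures: the `Ris` fields and the `group_ris` helper are hypothetical names chosen for illustration, and the class key (resource type plus level) is an assumption about what distinguishes one class from another.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ris:
    msc: str           # firmware-described MSC that contains this RIS
    index: int         # RIS number within the MSC
    resource: str      # e.g. "cache" or "memory"
    level: int         # cache level, or 0 for memory
    component_id: int  # e.g. a cache-id or a proximity domain


def group_ris(ris_list):
    """Group RIS-es into {class_key: {component_id: [ris, ...]}}."""
    classes = defaultdict(lambda: defaultdict(list))
    for ris in ris_list:
        class_key = (ris.resource, ris.level)   # all L2s, all L3s, all memory
        classes[class_key][ris.component_id].append(ris)
    return classes
```

In this model two RIS-es from different MSCs end up in the same component when they serve the same cache instance, which is the case the component layer exists to handle.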
Controls are accessed through the MPAMCFG_* MMIO registers after writing a
PARTID into MPAMCFG_PART_SEL:
| Feature | Register | Meaning |
|---|---|---|
| CPBM — Cache portion bitmap | MPAMCFG_CPBM | Bitmap of "ways"/portions a PARTID may use. Equivalent of x86 CAT's CBM. |
| CMAX — Cache max-capacity | MPAMCFG_CMAX | Soft cap on cache footprint (not way-based). |
| CMIN — Cache min-capacity | MPAMCFG_CMIN | Reservation. |
| MBW_PBM — Memory bandwidth portion BM | MPAMCFG_MBW_PBM | Bandwidth portion bitmap, analogous to AMD MBA bitmap. |
| MBW_MAX — Memory bandwidth max | MPAMCFG_MBW_MAX | Max as a fraction of total. Equivalent of x86/AMD MBA max-throttle. |
| MBW_MIN — Memory bandwidth min | MPAMCFG_MBW_MIN | Reservation. |
| MBW_PROP — Proportional stride | MPAMCFG_MBW_PROP | Weighted-share scheduler input. |
| PRI / DSPRI / INTPRI | MPAMCFG_PRI | Internal/downstream priority for arbitration. |
| CCAP — Cache capacity | MPAMCFG_CCAP | Same idea as CMAX, finer-grain partitions. |
| Reset value | n/a | A PARTID's policies on reset are typically "unrestricted". |
Not every MSC implements every feature; the driver discovers feature presence
by reading MPAMF_IDR and friends (see drivers/resctrl/mpam_internal.h,
register layout block).
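The select-then-access pattern for control registers can be sketched with a toy register file. Everything here is a hypothetical model: `FakeMsc` stands in for an MMIO region, register names are used as dictionary keys instead of offsets, and a `threading.Lock` plays the role of the lock that keeps the PARTID selection and the following access together.

```python
import threading


class FakeMsc:
    """Toy MSC: control values are stored per (register, selected PARTID)."""

    def __init__(self):
        self.part_sel_lock = threading.Lock()  # serialises select-then-access
        self._part_sel = 0
        self._cfg = {}                         # (register, partid) -> value

    def _write(self, reg, value):
        if reg == "MPAMCFG_PART_SEL":
            self._part_sel = value
        else:
            self._cfg[(reg, self._part_sel)] = value

    def _read(self, reg):
        # Returns 0 for a control never written in this toy model.
        return self._cfg.get((reg, self._part_sel), 0)

    def set_control(self, partid, reg, value):
        """Select a PARTID, then write one control register for it."""
        with self.part_sel_lock:
            self._write("MPAMCFG_PART_SEL", partid)
            self._write(reg, value)

    def get_control(self, partid, reg):
        """Select a PARTID, then read one control register for it."""
        with self.part_sel_lock:
            self._write("MPAMCFG_PART_SEL", partid)
            return self._read(reg)
```

Without the lock, a second caller could change the selection between the select and the access, so the write would land on the wrong PARTID. That is why the selection and the access are treated as one critical section.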
Monitors are accessed through the MSMON_* MMIO registers after writing a monitor index into MSMON_CFG_MON_SEL:
| Feature | Register | Meaning |
|---|---|---|
| CSU — Cache storage usage | MSMON_CSU | "How many bytes of this cache does PARTID/PMG occupy right now?" Equivalent of x86 LLC occupancy (CMT). |
| MBWU — Memory bandwidth usage | MSMON_MBWU | Counter of bytes traversing the MSC for a (PARTID, PMG). Equivalent of x86 MBM_TOTAL/_LOCAL. |
| OFLOW interrupt | MSMON_OFLOW_* | Optional overflow signalling. |
MBWU counters come in three IMPLEMENTATION_DEFINED width variants: 31-bit (one 32-bit register, top bit reserved), 44-bit (long counter, two 32-bit reads), and 63-bit ("LWD" long-wide counter, two 32-bit reads). The driver tracks both 31-bit overflow and counter state across power-management save/restore; see Rohit Mathew's two patches, "Probe for long/lwd mbwu counters" and "Use long MBWU counters if supported".
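To show why the counter width matters, here is a minimal sketch of extending a narrow counter into a running software total. It uses plain modular arithmetic and assumes the counter wraps at most once between reads; the driver's actual overflow handling may work differently (for example through an overflow status bit), and the names below are hypothetical.

```python
def counter_delta(prev, now, width_bits):
    """Bytes counted between two raw reads of a counter `width_bits` wide.

    Assumes the counter wrapped at most once between the two reads.
    """
    modulus = 1 << width_bits
    return (now - prev) % modulus


class MbwuAccumulator:
    """Extend a narrow hardware counter into an unbounded software total."""

    def __init__(self, width_bits):
        self.width_bits = width_bits
        self.prev_raw = 0
        self.total = 0

    def update(self, raw):
        """Fold a new raw reading into the running total and return it."""
        self.total += counter_delta(self.prev_raw, raw, self.width_bits)
        self.prev_raw = raw
        return self.total
```

A 31-bit byte counter wraps after 2 GiB of traffic, so it has to be sampled far more often than a 44-bit or 63-bit one for the single-wrap assumption to hold.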
A monitor in MPAM is a filter slot, not per-PARTID storage — there are
finite filter slots on an MSC, and the architecture only requires one. This
is the counter assignment model (cf. AMD ABMC) that motivates the resctrl
"assignable counters" hooks (resctrl_arch_config_cntr,
resctrl_arch_cntr_read, resctrl_arch_reset_cntr).
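The filter-slot model can be illustrated with a small allocator. This is a hypothetical sketch of the idea (a fixed pool of slots, each programmed with one (PARTID, PMG) filter), not the driver's allocator; the class and method names are made up for the example.

```python
class MonitorPool:
    """Fixed pool of monitor filter slots on one MSC (hypothetical model)."""

    def __init__(self, num_slots):
        self._free = list(range(num_slots))
        self._filters = {}                 # slot -> (partid, pmg)

    def alloc(self, partid, pmg):
        """Claim a slot and program its (PARTID, PMG) filter; None if exhausted."""
        if not self._free:
            return None
        slot = self._free.pop(0)
        self._filters[slot] = (partid, pmg)
        return slot

    def free(self, slot):
        """Release a slot so another (PARTID, PMG) pair can be monitored."""
        del self._filters[slot]
        self._free.append(slot)

    def filter_of(self, slot):
        """Return the (PARTID, PMG) pair a slot is currently counting."""
        return self._filters[slot]
```

With a single slot, as the architecture minimally requires, only one (PARTID, PMG) pair can be counted at a time, so software has to decide which groups get a counter rather than assuming every group always has one.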
A platform may expose its MSCs through a still-alpha firmware specification
("mpam-fb", proxied via PCC). On those systems the OS cannot poke the
MMIO directly — it must write a request and wait for a firmware interrupt
before the next access. This is too slow for IPIs, and it's the reason the
2022–2024 series had to convert resctrl's synchronous IPI-based reads to a
work queue, and the reason the driver's locking is mutex-based around
part_sel_lock / mon_sel_lock rather than spinlocks. James Morse's cover
letters note that this constraint will not be revisited, even if it makes the
locking look "very strange" on day one.
| Item | Source |
|---|---|
| CPU feature bit ARM64_MPAM | arch/arm64/include/asm/cpucaps.h, decoded via ID_AA64PFR0_EL1.MPAM and ID_AA64PFR1_EL1.MPAM_frac |
| Boot-time discovery & override | arch/arm64/kernel/pi/idreg-override.c (arm64.nompam cmdline), arch/arm64/kernel/cpufeature.c |
| MPAMIDR_EL1 reads | deferred init in arch/arm64/kernel/cpuinfo.c |
| EL2 setup (MPAM trap defaults) | arch/arm64/include/asm/el2_setup.h |
| KVM register mapping (VNCR offsets) | arch/arm64/include/asm/vncr_mapping.h (MPAM1_EL1, MPAMHCR_EL2, MPAMVPMV_EL2, MPAMVPM0–7_EL2) |
| KVM ID-reg masking | hides MPAM bits from guests when host can't context-switch them — Joey Gouly's v6 series |
All of the above landed in v6.13 (the Joey Gouly series
20241030160317.2528209-1-joey.gouly@arm.com).
MPAM is described to the OS through a dedicated ACPI table:
- ACPI MPAM table v2 — Rafael Wysocki's "ACPICA: Add support for Arm's MPAM ACPI table version 2" (Apr 2023). Header struct acpi_table_mpam in include/acpi/actbl*.h.
- Parser — drivers/acpi/arm64/mpam.c (411 lines):
  - acpi_mpam_count_msc() — top-level count of MSC nodes
  - acpi_mpam_parse_resources() — per-MSC RIS list
  - acpi_mpam_register_irq() — IRQ registration helper
  - Resource references describe which cache (by PPTT cache-id) or memory domain (by proximity domain) an MSC covers.
- PPTT prerequisites — Several patches before the driver itself land cacheinfo/PPTT helpers that find the cache level by id, or fill a cpumask from a processor container or cache-id. These are reused on every arch.
- PCC — when the platform uses mpam-fb, the parsing path follows a PCC mailbox descriptor.
- DT — A binding existed in v1 of the basic driver but was removed in v2 pending real DTS contribution from a platform vendor. The driver is ACPI-only as currently merged.
resctrl was originally an x86-only filesystem under
arch/x86/kernel/cpu/resctrl/. Bringing MPAM (and later RISC-V CBQRI) under
the same UAPI required peeling the "fs" layer off the "arch" layer in seven
sequential series, each landed before the next began.
| Year | Series | Notes / message-id | Status |
|---|---|---|---|
| 2018 | RFC v1 — first cross-arch proposal (20 patches) | 20180824104519.11203-1-james.morse@arm.com | superseded |
| 2020 | RFC v2 — reduced to 2 patches, "tip of the MPAM iceberg" | 20200214182947.39194-1-james.morse@arm.com | superseded |
| 2020 | Misc cleanup (sparse-bitmaps, linear-MBA, get_cache_id) v1–v5 | 20200214182401.39008-1-james.morse@arm.com series | merged ~late 2020 |
| 2020 | KVM "Hide unsupported MPAM from the guest" (single) | 20200925160102.118858-1-james.morse@arm.com | dropped, revived 2024 |
| 2020 | "Set your time-machine to 2020" — the cited what-is-MPAM cover | 20201030161120.227225-1-james.morse@arm.com (CDP merge v1) | superseded by v7 |
| 2021 | Merge the CDP resources v2–v7 (introduces struct resctrl_schema) | terminating 20210728170637.25610-1-james.morse@arm.com | merged ~v5.16 |
| 2021–2022 | resctrl_arch_rmid_read() returns bytes v1–v4 | terminating 20220412124419.30689-1-james.morse@arm.com | merged ~v6.1 |
| 2022–2024 | Monitored CLOSID+RMID together / arch+fs locking split / IPI→workqueue rework v1–v9 | terminating 20240213184438.16675-1-james.morse@arm.com | merged ~v6.10 |
| 2023–2024 | arm64 cpufeature MPAM detection (James → Joey Gouly v6) | 20241030160317.2528209-1-joey.gouly@arm.com | merged ~v6.13 |
| 2024–2025 | Move resctrl filesystem code to /fs/resctrl v1–v12 | terminating 20250515165855.31452-1-james.morse@arm.com | merged ~v6.16 |
| 2025 | "arm_mpam: Add basic mpam driver" RFC, v1–v6 (Ben Horgan took over from v4) | terminating 20251119122305.302149-1-ben.horgan@arm.com | merged ~v7.0 |
| 2025–2026 | "arm_mpam: resctrl..." glue series RFC–v6 (the user-visible piece) | terminating 20260313144617.3420416-1-ben.horgan@arm.com | merged ~v7.1 |
| 2026 | Glue fixes, ABMC counter-assignment paving, Reinette's quality cleanup | various | merged into v7.1-rcN |
The IPI rework (item 8) is the named MPAM blocker: x86's resctrl read path
sent an IPI to the target CPU, which is fine for MSR-based RDT but
unworkable for mpam-fb. The rework converts those reads to scheduled work
on the target domain.
A parallel Marvell-only path (Amit Singh Tomar, Jan 2024, "ARM: MPAM: add support for priority partitioning control") was redirected by Reinette Chatre and superseded by James's basic driver.
After the v6.16 move, fs/resctrl/{ctrlmondata.c,monitor.c,pseudo_lock.c,rdtgroup.c,internal.h,monitor_trace.h} (≈9.7k lines) is generic. Architectures plug in via a fixed set of resctrl_arch_* hooks and the struct rdt_resource / struct rdt_*_domain data model.
- enum resctrl_res_level — RDT_RESOURCE_L3, _L2, _MBA, _SMBA, plus RISC-V's _RBWB, _MWEIGHT, _PERF_PKG. MPAM expects to register L3, L2, MBA, and MBM monitoring domains; CDP is folded into per-CTRL "schema".
- struct rdt_resource (resctrl.h:330) — capabilities (alloc_capable, mon_capable), control/monitoring scopes (cache level, package, NUMA node), embedded resctrl_cache / resctrl_membw / resctrl_mon feature blocks, RCU-protected ctrl_domains and mon_domains lists.
- struct rdt_ctrl_domain (resctrl.h:164) — group of CPUs sharing one control instance; carries staged_config[CDP_*].
- struct rdt_l3_mon_domain (resctrl.h:198) — group of CPUs sharing one monitoring instance; carries rmid_busy_llc, mbm_states, the limbo worker.
- struct resctrl_schema (resctrl.h:366) — what userspace sees in /sys/fs/resctrl/info/<NAME>/. CDP becomes two schemas (CODE / DATA) over one rdt_resource.
Discovery & shape:
- resctrl_arch_get_resource(level) → struct rdt_resource *
- resctrl_arch_get_num_closid(r) (CLOSID == PARTID for MPAM)
- resctrl_arch_system_num_rmid_idx() (combined CLOSID+RMID index for MPAM and RISC-V; flat for x86)
- resctrl_arch_is_evt_configurable(evt)
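The "combined CLOSID+RMID index" can be pictured as flattening a two-dimensional (CLOSID, RMID) space into one integer. The sketch below is an illustrative encoding under the assumption that every CLOSID has the same number of RMIDs; the function names are hypothetical Python stand-ins, and the real arch code may use a different layout (for instance shifts instead of multiplication).

```python
def rmid_idx_encode(closid, rmid, num_rmid):
    """Flatten (CLOSID, RMID) into one index; num_rmid is RMIDs per CLOSID."""
    if not 0 <= rmid < num_rmid:
        raise ValueError("rmid out of range")
    return closid * num_rmid + rmid


def rmid_idx_decode(idx, num_rmid):
    """Inverse of rmid_idx_encode: return (closid, rmid)."""
    return divmod(idx, num_rmid)


def system_num_rmid_idx(num_closid, num_rmid):
    """Size of the flattened index space."""
    return num_closid * num_rmid
```

On an architecture where the monitoring tag is independent of the allocation tag, the index can simply be the RMID itself, which is the "flat" case mentioned above.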
Task switch:
- resctrl_arch_sync_cpu_closid_rmid(info) — IPI callback invoked on context switches to update per-CPU PARTID/PMG context (writes MPAM0_EL1 / MPAM1_EL1 on arm64).
Apply / read control:
- resctrl_arch_update_one(r, d, closid, val, t) — apply one schemata value
- resctrl_arch_update_domains(r, closid)
- resctrl_arch_get_config(r, d, closid, t)
- resctrl_arch_reset_all_ctrls(r)
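To make "apply one schemata value" concrete, the sketch below parses a line in the resctrl schemata format (for example `L3:0=ff;1=0f`) and hands each per-domain value to a callback. The helper names and the callback signature are hypothetical; the filesystem's real parser also validates values per resource, which this sketch leaves out.

```python
def parse_schemata_line(line):
    """Parse 'NAME:dom=val;dom=val' into (name, {domain_id: value_string})."""
    name, _, body = line.strip().partition(":")
    if not name or not body:
        raise ValueError("expected NAME:id=value[;id=value...]")
    domains = {}
    for entry in body.split(";"):
        dom, sep, val = entry.partition("=")
        if not sep:
            raise ValueError(f"malformed entry: {entry!r}")
        domains[int(dom)] = val.strip()
    return name.strip(), domains


def apply_schemata(line, update_one):
    """Call update_one(name, domain_id, value) for each domain in the line."""
    name, domains = parse_schemata_line(line)
    for dom_id, val in sorted(domains.items()):
        update_one(name, dom_id, val)
```

The callback plays the role of the per-domain arch hook: the generic layer decides what to write, and the architecture decides how to write it.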
Monitoring (the MPAM-shaped path):
- resctrl_arch_rmid_read(r, hdr, closid, rmid, eid, val, arch_mon_ctx) — may sleep; for MPAM the arch_mon_ctx carries the allocated monitor filter slot (see comment at resctrl.h:549–550).
- resctrl_arch_mon_ctx_alloc() / _free() — get a free MPAM monitor.
- resctrl_arch_reset_rmid() / _rmid_all().
Counter-assignment (ABMC + MPAM):
- resctrl_arch_mbm_cntr_assign_enabled(r) / _set(r, enable)
- resctrl_arch_config_cntr() — assign / unassign a counter to an event
- resctrl_arch_cntr_read() / resctrl_arch_reset_cntr()
Event-config (for hardware that filters MBM events, e.g. local-vs-total):
- resctrl_arch_mon_event_config_write() / _read() — IPI-shaped write/read
CDP, IO-allocation, pre-mount, pseudo-lock arch hooks round out the surface.
Domain lifecycle hooks called by the architecture into the FS:
- resctrl_online_ctrl_domain() / resctrl_offline_ctrl_domain()
- resctrl_online_mon_domain() / resctrl_offline_mon_domain()
| Concept | x86 RDT | AMD | MPAM | RISC-V CBQRI |
|---|---|---|---|---|
| Allocation tag | CLOSID | CLOSID | PARTID | RCID |
| Monitoring tag | RMID | RMID | PMG | MCID |
| Cache-portion bitmap | CBM (CAT) | CBM (CAT) | CPBM | (per-AT CBM) |
| Code/Data split | CDP | CDP | (planned, see v7.1 glue, currently CONFIG_EXPERT) | AT (access-type) |
| Bandwidth control | MBA (max %) | MBA bitmap | MBW_MAX / MBW_PBM / MBW_MIN / MBW_PROP | RBWB / MWEIGHT |
| Bandwidth count | MBM_TOTAL / _LOCAL | MBM | MBWU | MBM_TOTAL via BC scope |
| LLC occupancy | CMT | CMT | CSU | (BC scope) |
| Counter assignment | (planned) | ABMC | filter slots (always-assign) | n/a |
- RDT_RESOURCE_L3, _L2 → MPAM cache class with CPBM/CMAX/CCAP.
- RDT_RESOURCE_MBA → MPAM memory class with MBW_*.
- L3-MBM (QOS_L3_MBM_TOTAL) → MPAM MBWU on the memory class. Because MPAM only has finite filter slots, the FS uses ABMC-style counter assignment ("always-on") to back MBM.
- LLC occupancy (QOS_L3_OCCUP_EVENT_ID) → MPAM CSU on the cache class.
- CDP_CODE / CDP_DATA exist on MPAM but are gated behind CONFIG_EXPERT in the v7.1 glue: CDP halves the usable PARTID range, and remount semantics interact awkwardly with that.
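The PARTID-range halving under CDP follows from giving each CLOSID two PARTIDs, one for instruction fetches and one for data accesses. The sketch below shows the arithmetic; the specific pairing (CLOSID n owning PARTIDs 2n and 2n+1, with the odd one used for code) is an assumption made for illustration and is not taken from the glue series.

```python
def usable_closids(num_partid, cdp_enabled):
    """Number of CLOSIDs resctrl can offer for a given PARTID count."""
    return num_partid // 2 if cdp_enabled else num_partid


def closid_to_partids(closid, cdp_enabled):
    """Return (code_partid, data_partid) for a CLOSID.

    Assumed layout: with CDP on, CLOSID n owns PARTIDs 2n (data) and 2n+1 (code).
    """
    if not cdp_enabled:
        return closid, closid
    return 2 * closid + 1, 2 * closid
```

Because the CLOSID count changes with the CDP mount option, any state sized by CLOSID has to be consistent across a remount, which is one way to read the "remount semantics" concern above.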
drivers/resctrl/
├── Kconfig 609 B
├── Makefile 128 B
├── mpam_devices.c 2725 lines ← basic MPAM driver
├── mpam_internal.h 661 lines ← register layout, lock primitives
├── test_mpam_devices.c 397 lines ← KUnit
└── riscv/ ← CBQRI .o build dir on this tree
drivers/acpi/arm64/
└── mpam.c 411 lines ← ACPI MPAM table parser
include/linux/
├── arm_mpam.h 66 lines ← public driver/ACPI API
└── resctrl.h 730 lines ← cross-arch interface
fs/resctrl/
├── ctrlmondata.c 1043 lines
├── internal.h (15 KB)
├── Kconfig
├── monitor.c 1903 lines
├── monitor_trace.h
├── pseudo_lock.c 1099 lines
└── rdtgroup.c 4668 lines
Latest commit touching the driver in this tree:
4ad79c874e53 arm_mpam: Fix null pointer dereference when restoring bandwidth counters
arch/arm64/Kconfig's ARM64_MPAM block has the explicit comment
select ARM64_MPAM_DRIVER if EXPERT # does nothing yet
i.e. the basic driver compiles when CONFIG_EXPERT=y but the user-visible
/sys/fs/resctrl path on arm64 is still wired through the v7.1 glue series
that lands the actual resctrl_arch_* implementations and selects
ARCH_HAS_CPU_RESCTRL.
| Version | Date | Patches | Thread msgs | Author |
|---|---|---|---|---|
| RFC | 2025-07-11 | 36 | — | James Morse |
| v1 | 2025-08-22 | 33 | 200 | James Morse |
| v2 | 2025-09-10 | 29 | 199 | James Morse — DT removed |
| v3 | 2025-10-17 | 29 | 108 | James Morse — minor refinements |
| v4 (respin) | 2025-11-07 | 33 | 147 | Ben Horgan — generic cleanup helpers extracted |
| v5 | 2025-11-17 | 34 | — | Ben Horgan |
| v6 | 2025-11-19 | 34 | — | Ben Horgan — final, merged |
Mboxes for v1, v2, v3, and the Ben Horgan respin are saved alongside this
report at mpam-lore/thread-*.mbox.
Recurring review themes across the four versions:
- mpam-fb firmware constraint drives all locking and the IPI→workqueue rework. Accepted as immutable by v2.
- DT vs ACPI — DT removed in v2; will return when a platform vendor contributes real DTS.
- PARTID lifecycle — convergence on a single mpam_reprogram_ris_partid that handles both reset and configuration; in_reset_state flag tracks per-component state.
- MBWU counter widths — 31/44/63 selection, overflow tracking; Rohit Mathew's two patches plus Ben Horgan's overflow refinement in respin.
- Generic helpers extraction — platform_device_put cleanup, acpi_get_table_ret(), PPTT cache-v1 helper extracted as standalone patches in respin per Jonathan Cameron's repeated request.
- KUnit test scope — bitmap reset and props_mismatch() covered; monitor read paths under-tested because no platform exists with all counter variants.
- DT support — pending a platform contribution. The binding will return when a real DTS exists.
- mpam-fb finalisation — the spec is still alpha. The driver has been shaped to accommodate it without a rewrite.
- CDP for MPAM — gated behind CONFIG_EXPERT in v7.1 glue; PARTID-range halving + remount semantics are unresolved.
- MBM with free-running counters — dropped in favour of ABMC-style always-on counter assignment to keep semantics consistent with the AMD ABMC path.
- Monitor coverage testing — the v3 cover letter explicitly flags monitor paths as the most likely site of bugs because no test platform implements every counter variant.
- Cross-arch ABMC counter-assignment paving — Ben Horgan's x86,fs/resctrl: Pave the way for MPAM counter assignment v1–v4 (Feb–Mar 2026) refactors arch hooks so MPAM and AMD share infrastructure.
- Reinette Chatre's x86,fs/resctrl quality series (Apr 2026) cleans up after the move (W=12, kernel-doc, Coccinelle).
- Memory System Resource Partitioning and Monitoring (MPAM), for Armv8-A — Arm DDI 0598B/db https://developer.arm.com/documentation/ddi0598/db/?lang=en
- ACPI for the Memory System Resource Partitioning and Monitoring (MPAM) table, Arm DEN0065 (v2 in ACPICA April 2023).
- "Set your time-machine to 2020" — [PATCH 0/9] x86/resctrl: Merge the CDP resources v1 20201030161120.227225-1-james.morse@arm.com
- IPI → workqueue / closid+rmid combined index (the MPAM enabler on the read path) — terminating v9 20240213184438.16675-1-james.morse@arm.com
- fs/resctrl move final form — v12 20250515165855.31452-1-james.morse@arm.com
- arm64 cpufeature MPAM detection (final) — Joey Gouly v6 20241030160317.2528209-1-joey.gouly@arm.com
- "arm_mpam: Add basic mpam driver" v6 — Ben Horgan 20251119122305.302149-1-ben.horgan@arm.com
- "arm_mpam: resctrl..." glue v6 — Ben Horgan 20260313144617.3420416-1-ben.horgan@arm.com
- mpam-subject-listing.txt — all 1223 lore hits with subject containing "MPAM"
- mpam-resctrl-listing.txt — same, intersected with body containing "resctrl"
- arm_mpam-listing.txt — 1207 hits with subject containing arm_mpam
- cover-letters.txt — extracted cover-letter index
- thread-2025082215...james.morse@arm.com.mbox — basic driver v1 (200 msgs)
- thread-2025091020...james.morse@arm.com.mbox — basic driver v2 (199 msgs)
- thread-2025101718...james.morse@arm.com.mbox — basic driver v3 (108 msgs)
- thread-2025110712...ben.horgan@arm.com.mbox — basic driver v4 respin (147 msgs)