@pdp7
Last active June 3, 2026 05:18
2026-06-02-cbqri-resctrl-controls-design.md

CBQRI bandwidth allocation on the resctrl multiple-controls framework

Date: 2026-06-02
Branch: dfustini/cbqri-controls-rfc (worktree, based on reinette/resctrl/controls_rfc_v1)
Author: Drew Fustini

Status

Approach A approved. The goal is to gauge the cost of re-modeling CBQRI's two bandwidth allocations onto Reinette Chatre's "multiple controls per resource" framework, before committing to a v7 direction.

Progress:

  • The two fs/resctrl gap commits are implemented as isolated commits on this branch: 8eee0a8 (RESCTRL_CTRL_NAME_WGHT) and f51631a (default_to_min reset value).
  • The no-DEF-control risk is refuted by a static trace; see the finding below.
  • The CBQRI arch port is complete and committed as 49454e3. It builds cleanly (riscv, LLVM=1 W=1, independently re-verified, zero warnings) and boots; see the results below. Only cbqri_resctrl.c needed real API work. The arch resctrl.h header needed none.

Prototype results (2026-06-02, confirmed)

Built and booted under run-acpi-sock-1bc.sh (one BC backing both controls, the Approach A topology). The schemata file after mount:

```
MB_WGHT:72=255
 MB_MIN:72=756
     L2:64=fff;65=fff
     L3:75=ffff
```
  • No DEF control, confirmed empirically: there is no bare "MB" line, only MB_MIN and MB_WGHT. The info directories are MB_MIN, MB_WGHT, L2, L3 and L3_MON.
  • MB_MIN:72=756 is the derived default for RCID 0: mrbwb 819 minus 63 other RCIDs seeded at 1. This matches cbqri_rcid0_rbwb() and the device's max_rcids=64. L2=fff is 12 cblks and L3=ffff is 16 cblks, and all of these match the QEMU device config.
  • Writes route correctly. MB_WGHT reaches Mweight (77 is read back). MB_MIN reaches Rbwb: a value that would break sum(Rbwb) <= MRBWB is rejected on the Rbwb path.
  • default_to_min confirmed: a fresh mkdir group reads MB_MIN=1 (minimum) and MB_WGHT=255 (max).
  • No oops or warning during probe, mount, writes, or mkdir.
  • In-guest kselftest (MB_MIN, MB_WGHT, CBQRI_MBM): pass 3, fail 0, skip 0.
  • fs/resctrl needed no change beyond the two gap commits. The headline finding is confirmed: a resource with only named controls works unmodified.
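The arithmetic behind the derived defaults above can be checked with a small standalone C sketch. rcid0_rbwb_sketch() and cbm_cblks() are hypothetical helpers written for illustration. They are not the driver's cbqri_rcid0_rbwb(). The sketch assumes that every non-zero RCID is seeded with the same value and that each set bit in a capacity bitmask is one cblk.

```c
#include <assert.h>

/*
 * Hypothetical arithmetic model, not the driver's cbqri_rcid0_rbwb():
 * RCID 0 receives whatever MRBWB is left after every other RCID has
 * been seeded with `seed` bandwidth blocks.
 */
static unsigned int rcid0_rbwb_sketch(unsigned int mrbwb, unsigned int max_rcids,
				      unsigned int seed)
{
	unsigned int others;

	assert(max_rcids > 0);
	others = (max_rcids - 1) * seed;
	return mrbwb > others ? mrbwb - others : 0;
}

/* Count the capacity blocks (cblks) enabled in a capacity bitmask. */
static unsigned int cbm_cblks(unsigned long long cbm)
{
	unsigned int n = 0;

	for (; cbm; cbm &= cbm - 1)	/* clears the lowest set bit */
		n++;
	return n;
}
```

With mrbwb = 819, max_rcids = 64 and a seed of 1, rcid0_rbwb_sketch() yields 819 - 63 = 756. cbm_cblks() gives 12 for 0xfff and 16 for 0xffff, which matches the schemata output.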

One RFC API detail surfaced during the build: resctrl_arch_io_alloc_enable() gained a struct resctrl_ctrl * parameter, and the CBQRI stub was updated to match. This change is inside the driver, not in fs/resctrl.

Problem

The current CBQRI series (b4/ssqosid-cbqri-rqsc) exposes the two CBQRI bandwidth allocations by adding two new top level resources to enum resctrl_res_level:

  • RDT_RESOURCE_MB_MIN, minimum reserved bandwidth (CBQRI Rbwb)
  • RDT_RESOURCE_MB_WGHT, weighted share of unreserved bandwidth (CBQRI Mweight)

Reinette's controls RFC (reinette/resctrl/controls_rfc_v1, based on 7.1-rc2) introduces a model in which multiple allocations on the same piece of hardware are multiple named controls on a single resource, not separate resources. Her RFC exists specifically to replace the "one resource per control" pattern that the CBQRI series uses. For CBQRI to merge, it needs to converge on this generic schema rather than carry parallel resource entries.

Target framework (what resctrl_ctrl provides)

struct rdt_resource no longer embeds cache or membw properties. It holds:

```c
struct rdt_resource {
	enum resctrl_res_level	rid;
	...
	char			*name;		/* "MB", "L3", ... */
	bool			bw_delay_linear;	/* resource level */
	enum membw_throttle_mode bw_throttle_mode;	/* resource level */
	struct list_head	controls;	/* list of struct resctrl_ctrl */
};
```

Each allocation is a struct resctrl_ctrl on r->controls:

```c
struct resctrl_ctrl {
	struct list_head	entry;
	enum resctrl_scope	scope;		/* moved off the resource */
	struct list_head	domains;	/* per control domains */
	enum resctrl_ctrl_type	type;		/* BITMAP or SCALAR */
	enum resctrl_ctrl_name	name;		/* DEF, MIN, MAX */
	union {
		struct resctrl_cache	cache;
		struct resctrl_membw	membw;
	};
};
```

The schema name is built by fs from the resource name plus the control name suffix, for example resource "MB" plus control MIN gives "MB_MIN". The architecture passes an enum name, fs owns the string, so names stay consistent across architectures.
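The suffix rule can be sketched in standalone C. The enum mirrors the control names used in this note. RESCTRL_CTRL_NAME_MAX and its "_MAX" suffix are assumptions. The suffix table and build_schema_name() are hypothetical stand-ins, not the fs string table in ctrlmondata.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Control names as discussed here; WGHT is the proposed addition (gap 1). */
enum resctrl_ctrl_name {
	RESCTRL_CTRL_NAME_DEF,
	RESCTRL_CTRL_NAME_MIN,
	RESCTRL_CTRL_NAME_MAX,
	RESCTRL_CTRL_NAME_WGHT,
	RESCTRL_CTRL_NAME_NUM,
};

/* Hypothetical fs-owned suffix table; DEF contributes no suffix. */
static const char *const ctrl_name_suffix[RESCTRL_CTRL_NAME_NUM] = {
	[RESCTRL_CTRL_NAME_DEF]  = "",
	[RESCTRL_CTRL_NAME_MIN]  = "_MIN",
	[RESCTRL_CTRL_NAME_MAX]  = "_MAX",
	[RESCTRL_CTRL_NAME_WGHT] = "_WGHT",
};

/* Build "<resource><suffix>", e.g. "MB" + MIN -> "MB_MIN". 0 on success, -1 if truncated. */
static int build_schema_name(char *buf, size_t len, const char *res_name,
			     enum resctrl_ctrl_name name)
{
	int n;

	assert(name < RESCTRL_CTRL_NAME_NUM);
	n = snprintf(buf, len, "%s%s", res_name, ctrl_name_suffix[name]);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the architecture only hands over an enum value, the same table produces "MB_MIN" for CBQRI and for any other architecture that registers a MIN control on "MB".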

The architecture wraps each control in a private struct and adds it to the list. x86 uses:

```c
struct resctrl_hw_ctrl {
	struct resctrl_ctrl	r_ctrl;
	u32			msr_base;
	void			(*msr_update)(struct msr_param *m);
};
...
list_add(&hw_ctrl->r_ctrl.entry, &r->controls);
```

The x86 NOT_FOR_INCLUSION demo (commit 37557cac8045) adds three controls to MBA: the real throttle as DEF, plus dummy MIN and MAX controls. Each one dispatches to a different msr_update callback. The arch update and read callbacks now receive a struct resctrl_ctrl *ctrl so that they can tell which control they are servicing.
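The wrapper pattern and the per control callback dispatch used by the x86 demo can be modelled in userspace C. Everything in this sketch is a hypothetical reduction. list_head, list_add() and container_of() are minimal stand-ins for the kernel helpers. hw_ctrl_sketch and arch_update_one() are illustrative names. They are not the RFC's structures or the real resctrl_arch_update_one() signature.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel list and container_of helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert @item right after @head, as the kernel's list_add() does. */
static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Generic part that fs sees, reduced to two fields for this sketch. */
struct resctrl_ctrl {
	struct list_head entry;
	int name;
};

/* Hypothetical arch wrapper, analogous to resctrl_hw_ctrl. */
struct hw_ctrl_sketch {
	struct resctrl_ctrl r_ctrl;
	unsigned int last_val;	/* stands in for a hardware register */
	void (*update)(struct hw_ctrl_sketch *hw, unsigned int val);
};

static void update_store(struct hw_ctrl_sketch *hw, unsigned int val)
{
	hw->last_val = val;
}

/* fs hands the arch the generic control; the arch recovers its wrapper. */
static void arch_update_one(struct resctrl_ctrl *ctrl, unsigned int val)
{
	struct hw_ctrl_sketch *hw =
		container_of(ctrl, struct hw_ctrl_sketch, r_ctrl);

	assert(hw->update);
	hw->update(hw, val);
}
```

fs only ever sees &hw->r_ctrl. container_of() is what lets each architecture keep its private state next to the generic control without fs knowing about it: msr_base on x86, a backing BC for CBQRI.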

Design (Approach A: two named controls, no default)

CBQRI backs a single bandwidth resource named "MB" by reusing RDT_RESOURCE_MBA. It does not create a default (unnamed) control, because CBQRI has no plain "MB" throttle equivalent. It adds exactly two controls:

  • control name MIN, type SCALAR, schema "MB_MIN", backed by Rbwb
  • control name WGHT, type SCALAR, schema "MB_WGHT", backed by Mweight

Resource-level membw properties that are not per control stay on rdt_resource. bw_throttle_mode is THREAD_THROTTLE_UNDEFINED, which covers both controls (CBQRI is not throttle based), and bw_delay_linear is not used.

This preserves the exact user visible schema entries (MB_MIN and MB_WGHT) that the current series already ships, so there is no UABI change for users.

Rejected alternatives:

  • Approach B: make MIN the DEF control so that its schema is the bare "MB". This has zero framework friction, but it drops the explicit MB_MIN name and collides semantically with the x86 throttle "MB". Rejected.
  • Approach C, add a no-op DEF placeholder plus MIN and WGHT. Exposes a bare "MB" schema entry that CBQRI cannot honor. Rejected.

Arch glue

Introduce a CBQRI control wrapper analogous to resctrl_hw_ctrl:

```c
struct cbqri_hw_ctrl {
	struct resctrl_ctrl	r_ctrl;
	struct cbqri_controller	*bc;	/* backing bandwidth controller */
};
```

Both the MIN and WGHT controls point at the same backing BC, since cbqri_resctrl_pick_bw_alloc() already picks one BC to back both allocations.

resctrl_arch_update_one(), resctrl_arch_get_config() and resctrl_arch_reset_all_ctrls() stop switching on r->rid and instead dispatch on ctrl->name (MIN applies Rbwb, WGHT applies Mweight). This is cleaner than the current rid switch because both controls share one rid.
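The following sketch shows the name-based dispatch against a toy in-memory BC instead of real CBQRI MMIO registers. The types and update_one_sketch() are hypothetical, and the real resctrl_arch_update_one() signature in the RFC may differ. The sum(Rbwb) <= MRBWB rejection is omitted for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum ctrl_name { CTRL_NAME_DEF, CTRL_NAME_MIN, CTRL_NAME_WGHT };

/* Toy model of one bandwidth controller; a real BC is an MMIO device. */
#define SKETCH_RCIDS 4
struct bc_sketch {
	unsigned int rbwb[SKETCH_RCIDS];	/* reserved bandwidth blocks */
	unsigned int mweight[SKETCH_RCIDS];	/* weight for unreserved share */
};

struct resctrl_ctrl {
	enum ctrl_name name;
};

/* Hypothetical wrapper: MIN and WGHT both point at the same BC. */
struct cbqri_hw_ctrl_sketch {
	struct resctrl_ctrl r_ctrl;
	struct bc_sketch *bc;
};

/* Apply one value for one RCID, picking the register from the control name. */
static int update_one_sketch(struct resctrl_ctrl *ctrl, unsigned int rcid,
			     unsigned int val)
{
	struct cbqri_hw_ctrl_sketch *hw =
		container_of(ctrl, struct cbqri_hw_ctrl_sketch, r_ctrl);

	assert(rcid < SKETCH_RCIDS);
	switch (ctrl->name) {
	case CTRL_NAME_MIN:
		hw->bc->rbwb[rcid] = val;
		return 0;
	case CTRL_NAME_WGHT:
		hw->bc->mweight[rcid] = val;
		return 0;
	default:
		return -EINVAL;	/* CBQRI registers no DEF control */
	}
}
```

Both wrappers share one rid and one BC, so the control name is the only thing that can tell the two writes apart. This is why switching on ctrl->name replaces the old r->rid switch.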

Domains

Domains move from the resource to the control (ctrl->domains). CBQRI's two controls share one BC, so both per control domain lists describe the same hardware. Domain online and offline run per control.

Cache side (mechanical)

L2 and L3 cache allocation map to a single DEF control of type BITMAP holding cbm_len, shareable_bits, min_cbm_bits. This is the framework's natural default case and is a mechanical move of fields from res->cache to a single control. Low risk.
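The single BITMAP DEF control carries the fields named above. The sketch below groups them in one struct and uses them to screen a written mask. cache_ctrl_sketch and cbm_valid_sketch() are hypothetical. Contiguity rules differ between architectures and are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Per control cache fields moved off res->cache (hypothetical layout). */
struct cache_ctrl_sketch {
	unsigned int cbm_len;		/* number of capacity blocks */
	unsigned long shareable_bits;
	unsigned int min_cbm_bits;
};

/* Accept a mask only if it fits in cbm_len bits and sets enough bits. */
static bool cbm_valid_sketch(const struct cache_ctrl_sketch *c, unsigned long val)
{
	unsigned long full;
	unsigned int bits = 0;

	assert(c->cbm_len > 0 && c->cbm_len < 8 * sizeof(unsigned long));
	full = (1UL << c->cbm_len) - 1;
	if (val & ~full)
		return false;		/* bits beyond cbm_len */
	for (unsigned long v = val; v; v &= v - 1)
		bits++;
	return bits >= c->min_cbm_bits;
}
```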

Field and call mapping

| Today (separate resources) | Controls model |
|---|---|
| RDT_RESOURCE_MB_MIN, RDT_RESOURCE_MB_WGHT | enum entries dropped; reuse RDT_RESOURCE_MBA, name "MB", two controls |
| res->name = "MB_MIN" / "MB_WGHT" | ctrl->name = RESCTRL_CTRL_NAME_MIN / _WGHT, fs builds the string |
| res->schema_fmt = RESCTRL_SCHEMA_RANGE | ctrl->type = RESCTRL_CTRL_SCALAR |
| res->membw.{min_bw,max_bw,bw_gran} | same fields on each ctrl->membw |
| res->ctrl_scope, res->ctrl_domains | ctrl->scope, ctrl->domains (per control) |
| arch update/get/reset switch on r->rid | arch update/get/reset switch on ctrl->name |
| cache props on res->cache | single DEF control, type BITMAP |

Upstream gaps and the key risk

Two properties do not exist in the RFC yet. They are items to raise with Reinette rather than to carry as CBQRI-local patches:

  1. RESCTRL_CTRL_NAME_WGHT. Must be added to enum resctrl_ctrl_name and the fs string table in ctrlmondata.c. MIN already exists. This is small and generic, exactly the extension pattern the RFC was designed for.

  2. Per control reset and default value. The current series carries default_to_min (commit ed4eea9) so that new groups reset to min_bw. The RFC has no such field: resctrl_get_default_ctrlval() returns membw.max_bw for SCALAR controls. MB_MIN must reset to min_bw. Otherwise the new group's Rbwb pushes the total past MRBWB at mkdir, which breaks the CBQRI invariant sum(Rbwb) <= MRBWB. This needs a generic per control reset value property on resctrl_ctrl or resctrl_membw.
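The reset-value problem can be made concrete with a hypothetical sketch of a per control default. default_ctrlval_sketch() mirrors the described behaviour of resctrl_get_default_ctrlval(), which returns max_bw for SCALAR. It adds an assumed default_to_min opt-in. rbwb_fits() is an illustrative check of sum(Rbwb) <= MRBWB, not CBQRI driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced membw fields; default_to_min is the proposed per control flag. */
struct membw_sketch {
	unsigned int min_bw;
	unsigned int max_bw;
	bool default_to_min;
};

/* SCALAR default: max_bw as in the RFC today, min_bw when the control opts in. */
static unsigned int default_ctrlval_sketch(const struct membw_sketch *m)
{
	return m->default_to_min ? m->min_bw : m->max_bw;
}

/* Does giving one RCID `val` keep sum(Rbwb) <= MRBWB, given the other RCIDs' sum? */
static bool rbwb_fits(unsigned int others_sum, unsigned int val, unsigned int mrbwb)
{
	assert(others_sum <= mrbwb);
	return val <= mrbwb - others_sum;
}
```

With the prototype numbers, RCID 0 holds 756 and the 62 RCIDs other than the new group's hold 1 each, for a total of 818. That leaves exactly 1 block for the new group, so only the min_bw default fits. Any max_bw above 1 is rejected. The value 819 is assumed here for illustration.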

Key risk: Approach A gives a resource with no DEF control. The concern was that the RFC's fs code obtains resource-global properties from the default control, and that several paths test ctrl->name == RESCTRL_CTRL_NAME_DEF.

Finding (static trace, 2026-06-02): largely refuted. Every read path that fetches the default control already guards for NULL and returns empty output rather than dereferencing. See rdt_default_ctrl_show() (rdtgroup.c:1014), rdt_min_bw_show() (1178), rdt_bw_gran_show() (1221), and resctrl_get_cache_ctrl() and resctrl_get_mba_sc_ctrl() (ctrlmondata.c:348, 366). The schemata write path resolves controls by name via resctrl_resource_ctrl_get() (ctrlmondata.c:373), so it does not need a default. The allocation init path rdtgroup_init_alloc() (rdtgroup.c:4032) iterates for_each_resource_ctrl(), so both named controls are initialized. The only path that mattered is the reset value at rdtgroup.c:4015: resctrl_get_default_ctrlval() returns max_bw for SCALAR. That is gap 2 above. No fs change is needed to tolerate a resource with no DEF control, only a change to the reset value behaviour. The compiling and booting prototype confirms this empirically.
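The NULL-guarded lookup pattern that the trace relies on can be sketched as follows. ctrl_get() and min_bw_show_sketch() are hypothetical stand-ins that mimic the described behaviour of resctrl_resource_ctrl_get() and rdt_min_bw_show(). An array stands in for r->controls. This is not the RFC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum ctrl_name { NAME_DEF, NAME_MIN, NAME_WGHT };

struct ctrl_sketch {
	enum ctrl_name name;
	unsigned int min_bw;
};

struct res_sketch {
	const char *name;
	struct ctrl_sketch *ctrls;	/* array stands in for r->controls */
	size_t nr_ctrls;
};

/* Look a control up by name; NULL when the resource does not register it. */
static struct ctrl_sketch *ctrl_get(struct res_sketch *r, enum ctrl_name name)
{
	assert(r);
	for (size_t i = 0; i < r->nr_ctrls; i++)
		if (r->ctrls[i].name == name)
			return &r->ctrls[i];
	return NULL;
}

/* Guarded show: emit an empty string instead of dereferencing a missing DEF. */
static int min_bw_show_sketch(struct res_sketch *r, char *buf, size_t len)
{
	struct ctrl_sketch *def = ctrl_get(r, NAME_DEF);

	if (!def) {
		if (len)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "%u\n", def->min_bw);
}
```

For the CBQRI "MB" resource, which registers only MIN and WGHT, the DEF lookup returns NULL and the info file is simply empty. The by-name lookup still finds both named controls.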

Conversion impact on the current series

| Current commit | Fate under controls model |
|---|---|
| 97ae9e5 Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT | dropped |
| 62fa613 Add resctrl_is_membw() helper | dropped or reworked, membw-ness becomes ctrl->type |
| ed4eea9 default_to_min at reset | reworked into upstream item 2 |
| f4f458 MB_WGHT via Mweight | rewritten as the WGHT control |
| da7a94f MB_MIN via Rbwb | rewritten as the MIN control |
| BC probe and device ops commits | rewritten to register two controls and dispatch by ctrl |
| cache and L3 mon commits | mechanical, single DEF control |

Most work concentrates in drivers/resctrl/cbqri_resctrl.c arch glue. The net fs/resctrl footprint is smaller than today (two enum entries removed, only the WGHT name and the reset value property added), which aligns with the minimize-fs/resctrl-changes principle.

Validation plan (prototype)

  1. Reuse RDT_RESOURCE_MBA, register "MB" with MIN and WGHT controls in cbqri_resctrl.c, no DEF control.
  2. Add RESCTRL_CTRL_NAME_WGHT and a per control reset value as the minimal fs changes needed to build.
  3. Build with LLVM=1 W=1 via build-matrix.sh.
  4. Boot under QEMU, mount resctrl, confirm MB_MIN and MB_WGHT appear in schemata, confirm writes reach Rbwb and Mweight, confirm reset behaviour.
  5. Record whether the no-DEF-control path works unmodified or needs an fs change. This is the headline finding for the on-list discussion.
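Step 4 requires picking individual entries out of the schemata file. The parser below is a hypothetical userspace helper, not part of kselftest or the RFC. It only handles single-domain SCALAR lines with decimal values, such as MB_MIN:72=756. Bitmap resources print hex masks and may list several domains, so they are out of scope.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one single-domain schemata line such as " MB_MIN:72=756".
 * Leading spaces are skipped. Returns 0 and fills name/dom/val, or -1.
 */
static int parse_schemata_line(const char *line, char *name, size_t name_len,
			       unsigned long *dom, unsigned long *val)
{
	const char *colon, *eq;
	char *end;
	size_t n;

	assert(line && name && name_len);
	while (*line == ' ')
		line++;
	colon = strchr(line, ':');
	if (!colon || colon == line)
		return -1;
	n = (size_t)(colon - line);
	if (n >= name_len)
		return -1;
	memcpy(name, line, n);
	name[n] = '\0';

	*dom = strtoul(colon + 1, &end, 10);	/* domain id */
	if (end == colon + 1 || *end != '=')
		return -1;
	eq = end;
	*val = strtoul(eq + 1, &end, 10);	/* decimal control value */
	if (end == eq + 1 || (*end != '\0' && *end != '\n'))
		return -1;
	return 0;
}
```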

Conversion cost (measured against the staged port)

All CBQRI files are staged on this branch. The conversion surface is narrow.

Independent of the resctrl restructure, and brought in cleanly with no change needed: arch/riscv/* (Ssqosid detect, srmcfg CSR, qos.c), drivers/acpi/riscv/rqsc.*, drivers/resctrl/cbqri_devices.c, cbqri_internal.h, include/linux/riscv_cbqri.h, and the driver Kconfig and Makefile.

fs/resctrl side: no CBQRI specific changes needed beyond the two gap commits already landed. The series' RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT enum entries and the resctrl_is_membw() helper are dropped. The get_num_closid NULL guard is re-evaluated against the RFC, which already returns per resource closid through rdt_resource_final.

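Expected to need API adaptation, in two files only: drivers/resctrl/cbqri_resctrl.c and arch/riscv/include/asm/resctrl.h. In the committed port, only cbqri_resctrl.c needed real API work and the header needed none. Concrete deltas: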

  • resctrl_arch_get_cdp_enabled(rid) becomes (struct rdt_resource *r)
  • resctrl_arch_set_cdp_enabled(rid, en) becomes (r, struct resctrl_ctrl *ctrl, en)
  • resctrl_arch_update_one() and resctrl_arch_get_config() gain struct resctrl_ctrl *ctrl and switch on ctrl->name instead of r->rid
  • resctrl_online_ctrl_domain() and the offline variant gain ctrl, domains move from res->ctrl_domains to ctrl->domains
  • introduce struct cbqri_hw_ctrl { struct resctrl_ctrl r_ctrl; struct cbqri_controller *hw; } and replace writes to res->membw, res->cache, res->ctrl_scope and res->schema_fmt with per control fields
  • cbqri_resctrl_control_init() builds the controls list: the bandwidth resource reuses RDT_RESOURCE_MBA named "MB" with two controls MIN and WGHT (MIN sets membw.default_to_min), each cache resource gets one DEF control of type BITMAP
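The control set described in the last bullet can be written down as a descriptor table. The layout below is a hypothetical illustration of what cbqri_resctrl_control_init() registers, not its actual code. count_ctrls() exists only to make the "MB has no DEF control" property checkable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum ctrl_type { CTRL_BITMAP, CTRL_SCALAR };
enum ctrl_name { CTRL_DEF, CTRL_MIN, CTRL_WGHT };

/* One row per control the driver would register (hypothetical layout). */
struct ctrl_desc {
	const char *res;	/* resource name */
	enum ctrl_name name;
	enum ctrl_type type;
	bool default_to_min;
};

static const struct ctrl_desc cbqri_ctrls_sketch[] = {
	{ "MB", CTRL_MIN,  CTRL_SCALAR, true  },	/* backed by Rbwb */
	{ "MB", CTRL_WGHT, CTRL_SCALAR, false },	/* backed by Mweight */
	{ "L2", CTRL_DEF,  CTRL_BITMAP, false },
	{ "L3", CTRL_DEF,  CTRL_BITMAP, false },
};

/* Count controls on a resource; name_or_any < 0 matches every name. */
static size_t count_ctrls(const char *res, int name_or_any)
{
	size_t n = 0;

	assert(res);
	for (size_t i = 0; i < ARRAY_SIZE(cbqri_ctrls_sketch); i++) {
		const struct ctrl_desc *d = &cbqri_ctrls_sketch[i];

		if (strcmp(d->res, res) != 0)
			continue;
		if (name_or_any < 0 || (int)d->name == name_or_any)
			n++;
	}
	return n;
}
```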

Estimated effort: one file rewrite plus header signature edits, then a riscv build loop and a QEMU boot. Static analysis found no structural blocker. The prototype build and boot have since confirmed this empirically (see Prototype results above).

Open questions for Reinette

  • Will the framework accept a resource with only named controls and no DEF control, or should that be added as a generic capability?
  • Is a per control reset and default value (beyond the SCALAR max_bw default) acceptable to add to resctrl_ctrl?
  • What is the preferred path for the WGHT name: add it to the shared enum now, or gate it behind the arch that needs it?