Magellan Pilot 6 — magellan-gallery (first file-handling / media-output plugin shape)

Run ID: 2026-04-24T12-19-02_magellan-gallery
Plugin: magellan-gallery v1.0.0 (gallery CPT + lightbox + shortcode + Gutenberg block)
Ecosystem: core (no baseline plugins)
Driver: Chrome DevTools MCP with --experimental-page-id-routing
Dispatch: 6 charters + recon in one concurrent wave (Sonnet-default, blind greybox, ISSUES.md stripped)
Wallclock: ~27 min end-to-end

TL;DR — new shape, strong performance, new drift class surfaced

Recall is 8/10 strict on a blind run against a plugin shape (gallery/media/output-rendering) the harness has never seen. That is a strong generalization signal, but the drift first identified under Amendment E-ext in the Pilot 5 rerun (the Tester files a hypothesis as a Question instead of running the empirical probe) has now appeared in sibling rules (Amendment D beforeunload, generic a11y probes). The drift is a probe-class problem, not E-ext-specific.

Pilot	Shape	Mode	Original	Rerun
1 (backups)	artifact-producer	Opus	10/10	—
2 (contact-forms)	form/email	Opus	7/10	10/10
3 (members)	role/restriction	Opus	5/10	10/10
4 (seo-toolkit)	metadata/rendering	Sonnet	4/10	10/10
5 (pay)	WC payment gateway	Sonnet	8/10	9/10 / 10 lenient
6 (gallery)	file-handling / media	Sonnet	8/10	—

Reliability — PQIP totals

8/10 planted caught + 17 bonus findings. 27 Problems, 5 Questions, 12 Improvements, 10 Praises.

Severity: 2 critical, 13 major, 11 minor, 1 trivial.

Per-charter PQIP

Charter	P	Q	I	!	Duration	Tool uses
gallery-post-type-admin	5	1	2	2	8m 48s	82
frontend-gallery-render	9	1	2	2	11m 19s	95
settings-page	3	0	2	2	9m 04s	82
shortcode-and-block	2	1	1	2	6m 29s	63
gallery-scale	4	1	2	1	7m 09s	51
cross-feature-lifecycle	4	1	3	1	6m 47s	58
Totals	27	5	12	10	49m 36s serial	431

Before/after against the 10 planted issues

#	Planted issue	Verdict	Amendment fired
1	Grid gap (CSS)	missed	—
2	Admin bar overlap on lightbox	missed	—
3	Block absent from inserter	caught-exact	Amendment 2 (absence-of-feature) — flagged in recon, converted to Critical
4	Draft gallery leaks via shortcode	caught-exact (×2 Testers)	Amendment 4 (cross-feature MANDATORY)
5	No uninstall hook	caught-exact	Amendment 2 — grep evidence
6	Meta box beforeunload missing	caught-bundled with drift (filed as Question)	Amendment D — fired soft
7	`loading="lazy"` hardcoded / toggle dead	caught-exact (×3 Testers)	Amendment 2 + Amendment C (enumerate root cause)
8	Masonry layout broken (jQuery dependency)	missed	—
9	Escape key doesn't close lightbox	missed	—
10	1-image carousel: NaN cycling	caught-bundled with drift (filed as Question)	generic a11y probe — fired soft

6 caught-exact + 2 caught-bundled-with-drift + 4 missed = 8/10 strict.

Bonus catches beyond the answer key:

CRITICAL: fatal PHP TypeError on string-typed _mgl_image_ids (data-corruption bug recon found, Tester confirmed)
MAJOR: no current_user_can in save handler
MAJOR: columns unbounded (9999 accepted, breaks grid)
MAJOR: layout field no whitelist (arbitrary strings render as CSS classes)
MAJOR: duplicate id="mgl-lightbox" when multiple galleries on one page (invalid HTML)
MAJOR: gallery images default to empty alt (WCAG 1.1.1 failure)
MAJOR: lightbox buttons have no aria-label (WCAG 4.1.2)
MAJOR: noscript fallback absent (JS-off = empty gallery)
MAJOR: deleted attachment renders <img src="">
MAJOR: frontend enqueue fires on every page (even gallery-less)
MAJOR: mgl_enable_webp is labeled "(Stub)" — classic claim-vs-reality
Plus the recon-flagged "Add Post" submenu label as trivial

Token consumption — aggregate

	Manager (Opus 4.7)	Subagents (Sonnet 4.6)	Total
Agents	1	1 recon + 6 Testers + 1 classifier	9
Messages	114	681	795
Fresh input	6,803	723	7,526
Output	184,976	75,300	260,276
Cache-create 5m	0	1,537,368	1,537,368
Cache-create 1h	208,741	0	208,741
Cache-read	43,982,847	57,678,172	101,661,019
Total tokens	44,383,367	59,291,563	103,674,930
Cost	$28.74	$26.02	$54.75

Cache reads account for 98.1% of all tokens. The Manager cost is somewhat high ($28.74) because Pilot 5, the Pilot 5 rerun, and Pilot 6 all ran in one long main conversation: the 1h cache amortizes, but the context keeps growing.

By pricing category

Category	Tokens	Cost	%
Cache-read	101,661,019	$39.68	72.5%
Cache-create 5m	1,537,368	$6.99	12.8%
Output	260,276	$5.96	10.9%
Cache-create 1h	208,741	$2.09	3.8%
Fresh input	7,526	$0.04	0.1%
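
As a cross-check, the percentage column can be recomputed from the per-category costs, and the cache-read token share from the per-category token counts. This is a minimal sketch that only uses figures from the table above; the variable and function names are illustrative. Note that the rounded rows sum to $54.76, one cent above the reported total, which is presumably a per-row rounding effect.

```python
from decimal import Decimal, ROUND_HALF_UP

# Cost (USD) and token count per pricing category, copied from the table above.
costs = {
    "Cache-read": Decimal("39.68"),
    "Cache-create 5m": Decimal("6.99"),
    "Output": Decimal("5.96"),
    "Cache-create 1h": Decimal("2.09"),
    "Fresh input": Decimal("0.04"),
}
tokens = {
    "Cache-read": 101_661_019,
    "Cache-create 5m": 1_537_368,
    "Output": 260_276,
    "Cache-create 1h": 208_741,
    "Fresh input": 7_526,
}


def share_pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    ratio = Decimal(part) / Decimal(whole) * 100
    return ratio.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


total_cost = sum(costs.values())
shares = {name: share_pct(cost, total_cost) for name, cost in costs.items()}
cache_read_token_share = share_pct(tokens["Cache-read"], sum(tokens.values()))

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"cache-read share of tokens: {cache_read_token_share}%")
```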

Note on the token-usage window

The capture script's ±10 min window buffer included the Pilot 5 rerun's escape-analysis classifier ($1.82) in the "subagents" count. Subtracting that, the 7 actual Pilot 6 subagents (1 recon + 6 Testers) totaled $24.20, so the real Pilot 6 cost is ≈ $28.74 Manager + $24.20 subagents = $52.94, with the remaining $1.82 being spillover classification from the prior pilot.
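
The spillover adjustment is plain subtraction; a short sketch using only the dollar figures quoted in this report (the variable names are illustrative):

```python
from decimal import Decimal

manager = Decimal("28.74")
subagents_captured = Decimal("26.02")    # as captured, includes the Pilot 5 rerun classifier
classifier_spillover = Decimal("1.82")   # escape-analysis classifier from the prior pilot

# Remove the spillover to get the cost of the 7 Pilot 6 subagents alone.
pilot6_subagents = subagents_captured - classifier_spillover
pilot6_total = manager + pilot6_subagents

print(pilot6_subagents, pilot6_total)  # 24.20 52.94
```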

Token + duration per Pilot-6 subagent

Session	Role	Duration	Tool uses	Msgs	Input	Output	cc5m	cr	Cost
recon	scout	4m 58s	50	68	74	8,992	191,158	4,016,805	$2.06
shortcode-and-block	Tester	6m 29s	63	97	103	7,758	159,448	7,964,348	$3.10
cross-feature-lifecycle	Tester	6m 47s	58	83	89	12,066	215,698	6,929,183	$3.07
gallery-scale	Tester	7m 09s	51	72	78	10,484	137,789	5,489,551	$2.32
gallery-post-type-admin	Tester	8m 48s	82	116	122	10,187	408,059	11,101,371	$5.01
settings-page	Tester	9m 04s	82	121	127	11,157	259,896	11,309,121	$4.54
frontend-gallery-render	Tester	11m 19s	95	124	130	14,656	165,320	10,867,793	$4.10
Productive totals		54m 34s serial	481	681	723	75,300	1,537,368	57,678,172	$24.20

Concurrent-wave compression: recon ran serially first, then the 6 Testers ran in parallel. Recon took 4m 58s; the 6-Tester wave then ran ~11m 19s (bounded by the longest Tester). Combined wallclock is ~16 min for ~55 min of sequential work (recon + 6 Testers). Compression ratio: ~3.4× for the full sequence, or ~4.4× for the Tester wave alone (49m 36s sequential / ~11m 19s wallclock).
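
The compression ratios can be recomputed from the per-subagent durations in the table above. The sketch below assumes the wave's wallclock equals its longest Tester and that recon adds its full duration in front of the wave; the helper name is illustrative.

```python
import re


def to_seconds(text):
    """Parse a duration such as '11m 19s' into seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


recon = to_seconds("4m 58s")
testers = [
    to_seconds(d)
    for d in ("6m 29s", "6m 47s", "7m 09s", "8m 48s", "9m 04s", "11m 19s")
]

tester_serial = sum(testers)             # Testers run back to back
wave_wallclock = max(testers)            # parallel wave is bounded by the longest Tester
full_serial = recon + tester_serial      # recon + Testers, all sequential
full_wallclock = recon + wave_wallclock  # recon first, then the parallel wave

wave_ratio = tester_serial / wave_wallclock
full_ratio = full_serial / full_wallclock

print(f"Tester wave: {wave_ratio:.1f}x, full sequence: {full_ratio:.1f}x")
```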

Notable per-Tester observations

frontend-gallery-render was richest: 9 Problems. Deep audit of escape chain, a11y, noscript, deleted-attachment, duplicate lightbox ID, and carousel off-by-one. Still only 11m 19s — cheap per finding.
shortcode-and-block was fastest at 6m 29s: the block-absent finding + draft-leak finding were both source-analysis + quick DOM probe.
Recon's 50 tool uses produced 5 directly-filable observations, all of which became filed Problems downstream. Recon-to-charter handoff discipline held.

Cost efficiency

Denominator	Value
Total cost / planted caught (8)	$6.84 per planted bug (or $6.62 with classifier spillover subtracted)
Total cost / Problem filed (27)	$2.03 per Problem (or $1.96 without spillover)
Total cost / all PQIP items (54)	$1.01 per PQIP item (or $0.98 without spillover)

Higher than Pilot 5 rerun's $4.77/planted and Pilot 4 rerun's $5.50/planted, but in the same range. Not a step-change: the harness is holding cost per planted bug roughly steady across different plugin shapes.
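
The per-item figures follow from dividing a cost basis by each denominator. A minimal sketch, using the $54.75 aggregate total and the $52.94 spillover-subtracted total from the token-usage note; the function and dictionary names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_item(cost, count):
    """Cost per item, rounded to whole cents."""
    return (cost / count).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


bases = {
    "total": Decimal("54.75"),
    "spillover subtracted": Decimal("52.94"),
}
denominators = {
    "planted caught": 8,
    "Problems filed": 27,
    "all PQIP items": 54,
}

cost_table = {
    (basis, name): per_item(cost, count)
    for basis, cost in bases.items()
    for name, count in denominators.items()
}

for (basis, name), value in cost_table.items():
    print(f"{basis} / {name}: ${value}")
```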

Amendment firing matrix on the 17 current rules

Amendment	Fired?	Where	Notes
1. Empty / one / many states	✓	6/6 coverage notes	clean
2. Absence-of-feature	✓✓✓	LOAD-BEARING — 4 distinct findings: dead lazy, dead webp, no uninstall, no cap check	strongest convergence on this shape
3. Plugin-native writes	✓	Testers used UI / media picker where possible; direct DB only for scale + state-variety seeds	clean
4. Cross-feature MANDATORY	✓✓	cross-feature-lifecycle + draft-leak caught by 2 charters	clean
5. UI-path before "missing"	✓	block-absent finding verified via inserter search + registry object	clean
A. Inline counters	—	no fuel (no counters on this plugin)	correct non-fire
B. State variety	✓	gallery-post-type-admin probed string-typed `_mgl_image_ids` (vs array) — exactly the recipe	caught the CRITICAL fatal
C. Enumerate root-cause	✓	lazy toggle dead → sibling-propagated to webp toggle dead + dead capability check + dead uninstall	clean
D. Unsaved-work protection	~ drift	beforeunload probed but filed as Question rather than Problem	drift — same pattern as E-ext in Pilot 5 rerun
E. Admin two-tab concurrent	—	no fuel (no admin-form concurrent-edit bug on this plugin)	correct non-fire
E-ext. Rapid-double-submit	✓	gallery-post-type-admin + settings-page both ran empirical probe; save is idempotent, filed as Praise	clean fire after Pilot 5 rerun tightening landed via rule text
F. View-source HTML	✓	shortcode-and-block used view-source to confirm block absence + frontend-gallery-render used raw fetch	clean
G. DDL column types	—	3 Testers explicitly recorded non-applicability — "no custom DB tables in plugin → Amendment G correctly does not fire"	generalization test PASSED — zero overfire
Reinf 5 empty-state MANDATORY	✓	6/6 coverage notes	clean
Reinf 8 cross-feature MANDATORY	✓	5/6 coverage notes	clean
pqip.propagate-sibling-features	✓	lazy-dead → webp-dead → uninstall-missing chain	clean
pqip.UI-path-before-claim	✓	no over-claims filed	clean

Fired actively: 13/17. Correctly did not fire: 3/17 (A, E, G — no fuel). Drift: 1 (D — same drift pattern as E-ext in Pilot 5 rerun).

Amendment G generalization PASSED

Three Testers on this plugin explicitly recorded that Amendment G was non-applicable:

"Amendment G: no custom DB tables in plugin (post-meta only); rule correctly does not fire"
"Amendment G coverage note: Plugin has no custom DB tables in schema. Gallery state is in wp_postmeta using WP core schema, which is appropriate for the value semantics the plugin stores."
"Amendment G verdict: Plugin has no custom DB tables (only WP core tables in schema). Amendment G (DDL column type inspection) correctly does not fire."

Zero overfire on wp_postmeta storage. Firing there would have been wrong: meta values are strings/longtext by core schema, and that is correct for the data. The rule stayed in its lane.

The drift class becomes systemic

In Pilot 5 rerun, Amendment E-ext fired as a Question rather than a Problem because the Tester source-inspected instead of executing the empirical probe. I proposed tightening the rule text but the tightening was never shipped (grep "empirical probe must" in skills/tester-mindset/SKILL.md returns 0 hits).
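
The "was the tightening shipped" check above is a plain substring search over the skill file. A minimal Python equivalent of that grep, where the function names are illustrative and the phrase and path are the ones quoted above:

```python
from pathlib import Path


def count_matching_lines(text, phrase):
    """Count lines containing `phrase` (case-sensitive, like a plain grep)."""
    return sum(1 for line in text.splitlines() if phrase in line)


def phrase_in_file(path, phrase):
    """True if at least one line of the file at `path` contains `phrase`."""
    text = Path(path).read_text(encoding="utf-8")
    return count_matching_lines(text, phrase) > 0


# Usage against the file named in this report (path relative to the repo root):
#   phrase_in_file("skills/tester-mindset/SKILL.md", "empirical probe must")
```

A zero count only shows that this exact wording is absent; a tightening phrased differently would need its own search string.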

In Pilot 6 the same drift pattern repeats in sibling rules:

Amendment D (unsaved-work): Tester noted missing beforeunload listener via DOM inspection → filed as Question.
Generic a11y probe on 1-image carousel: Tester observed the NaN cycling via source reading → filed as Question.

Two new drift cases, same class. The drift is not E-ext-specific — it's a generic "empirical-is-mandatory" gap across all probe-class amendments.

Proposed new Amendment I (cross-amendment sweep)

Add a global "empirical discipline" rule at the top of the probe-class section:

Every probe-class amendment is discharged only by executing the probe empirically. Source inspection, registry inspection, DOM snapshot inspection, and evidence of absence in code are all valid PRELUDES to a filed finding — they help identify what to probe. They do NOT discharge the rule. The rule is discharged only when:

1. You execute the empirical reproducer the rule specifies (rapid-double-submit, beforeunload trigger, keyboard close, empty state, two-tab, etc.).
2. You observe the behavior directly through a browser-driver verb call OR a side-effect count (DB row count, wp post meta get, HTTP request count).
3. You file the result with the empirical evidence: a Problem if the probe demonstrates a bug, a Praise if the probe demonstrates correct behavior, a Question only if the empirical probe is architecturally blocked (environment unreachable, hook not firable, etc.).

Source-inspected evidence of absence (e.g., "grep onbeforeunload returns 0 matches") is supporting evidence for a filed Problem, not a substitute for the empirical probe. When you file a Question citing source-inspection alone, you are skipping the rule.

Ships as a global policy at the top of skills/tester-mindset/SKILL.md, before the individual probe-class amendments. Targets the drift class, not any single amendment.
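
The proposed rule reduces to a small decision function. The sketch below is one hedged reading of the rule text above, not harness code; `ProbeResult`, `Filing`, and `classify` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Filing(Enum):
    PROBLEM = "Problem"
    PRAISE = "Praise"
    QUESTION = "Question"
    NOT_DISCHARGED = "rule not discharged"


@dataclass
class ProbeResult:
    executed_empirically: bool             # the reproducer the rule specifies was run
    observed_directly: bool                # via a driver verb call or a side-effect count
    bug_demonstrated: bool = False         # the probe showed incorrect behavior
    architecturally_blocked: bool = False  # e.g. environment unreachable, hook not firable


def classify(result):
    """Map a probe outcome to the filing allowed under the proposed rule."""
    if result.architecturally_blocked:
        return Filing.QUESTION
    if not (result.executed_empirically and result.observed_directly):
        # Source or DOM inspection alone is a prelude, not a discharge.
        return Filing.NOT_DISCHARGED
    return Filing.PROBLEM if result.bug_demonstrated else Filing.PRAISE


# The Pilot 6 beforeunload case: absence noted by inspection, probe never run.
inspection_only = ProbeResult(executed_empirically=False, observed_directly=False)
print(classify(inspection_only).value)  # rule not discharged
```

Under this reading, a Question backed only by source inspection never appears as a valid outcome, which is the behavior the amendment is meant to enforce.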

Amendment H (proposed)

For the Escape-key / keyboard-close miss on lightbox (Issue 9): extend the generic overlay-UI probe rule to mandate a keyboard-close check on any dismissable overlay (lightbox, modal, popup, drawer, menu). Simple rule, applies to every frontend plugin that renders an overlay.
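
A keyboard-close check is small enough to express as a single probe function. The sketch below assumes a driver object exposing `press` and `is_visible`; both are hypothetical stand-ins for whatever verbs the browser driver actually provides, and `FakeLightbox` exists only so the example runs without a browser. The `#mgl-lightbox` selector is taken from the duplicate-id finding above.

```python
class FakeLightbox:
    """Stand-in driver for demonstration; a real run would use the browser driver."""

    def __init__(self, closes_on_escape):
        self.open = True
        self.closes_on_escape = closes_on_escape

    def press(self, key):
        # Simulate the overlay's key handler.
        if key == "Escape" and self.closes_on_escape:
            self.open = False

    def is_visible(self, selector):
        return self.open


def escape_closes_overlay(driver, selector):
    """Press Escape on an open overlay and report whether it closed."""
    if not driver.is_visible(selector):
        raise RuntimeError("overlay must be open before probing keyboard close")
    driver.press("Escape")
    return not driver.is_visible(selector)


print(escape_closes_overlay(FakeLightbox(closes_on_escape=True), "#mgl-lightbox"))   # True
print(escape_closes_overlay(FakeLightbox(closes_on_escape=False), "#mgl-lightbox"))  # False
```

A `False` result is direct empirical evidence for a Problem; a `True` result supports a Praise.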

Recon-to-charter handoff — perfect discipline

All 5 recon findings became filed Problems:

Recon finding	Filed where	Verdict
Fatal on string-typed `_mgl_image_ids`	gallery-post-type-admin P1	CRITICAL, filed
Block absent from inserter	shortcode-and-block P1	CRITICAL, filed
Empty alt on images	frontend-gallery-render P2	MAJOR, filed
Lightbox no aria-label	frontend-gallery-render P1	MAJOR, filed
`display:none` + no noscript	frontend-gallery-render P3	MAJOR, filed

Zero silent drops. This is the cleanest recon-to-charter handoff in the pilot history — the Phase 2 → Phase 3 discipline held.

Next steps

Ship Amendment I (empirical-probe-is-mandatory cross-amendment rule) + Amendment H (keyboard-close on overlay UIs). Both target generalizable bug/drift classes, not gallery-specific fixes.
Defer magellan-gallery rerun — run H + I against Pilot 7 for a cleaner attribution test.
Pilot 7 candidate: magellan-speed (caching / perf — tests SFDPOT Time dimension, never exercised) OR a plugin with a REST route (REST surface uncovered).

Cross-pilot state

Six pilots, four reruns, consistent convergence to ≥ 8/10 blind on amended harness:

Sonnet + amendments validated across four plugin shapes (members / seo-toolkit / pay / gallery)
Amendment G validated as non-overfiring on a plugin without custom DB tables (strongest generalization signal)
Amendment 2 (absence-of-feature) is reliably the load-bearing amendment across diverse shapes
Drift class across probe-amendments identified — single rule fix (Amendment I) should resolve it across all future pilots

The loop continues to compound. Every amendment from prior pilots still fires where applicable; no regressions.

Artifacts

Final report: runs/2026-04-24T12-19-02_magellan-gallery/final-report.md
Escape analysis: runs/2026-04-24T12-19-02_magellan-gallery/escape-analysis.md
Token usage: runs/2026-04-24T12-19-02_magellan-gallery/token-usage.json
Manifest: runs/2026-04-24T12-19-02_magellan-gallery/manifest.json
6 session reports: runs/2026-04-24T12-19-02_magellan-gallery/sessions/<slug>/report.json
Static analysis: runs/2026-04-24T12-19-02_magellan-gallery/static-analysis.md
Recon: runs/2026-04-24T12-19-02_magellan-gallery/recon.md
Coverage plan: runs/2026-04-24T12-19-02_magellan-gallery/coverage.md

alopezari/magellan-gallery-pilot-6.md
