Run ID: 2026-04-24T12-19-02_magellan-gallery
Plugin: magellan-gallery v1.0.0 — gallery CPT + lightbox + shortcode + Gutenberg block
Ecosystem: core (no baseline plugins)
Driver: Chrome DevTools MCP with --experimental-page-id-routing
Dispatch: 6 charters + recon in one concurrent wave (Sonnet-default, blind greybox — ISSUES.md stripped)
Wallclock: ~27 min end-to-end
Recall 8/10 strict on a blind run against a plugin shape (gallery/media/output-rendering) the harness has never seen. That is a strong generalization signal. However, the drift first identified in the Pilot 5 rerun under Amendment E-ext (Tester files hypotheses as Questions rather than running the empirical probe) has now spread to sibling rules (Amendment D beforeunload, generic a11y probes). The drift is a probe-class problem, not E-ext-specific.
| Pilot | Shape | Mode | Original | Rerun |
|---|---|---|---|---|
| 1 (backups) | artifact-producer | Opus | 10/10 | — |
| 2 (contact-forms) | form/email | Opus | 7/10 | 10/10 |
| 3 (members) | role/restriction | Opus | 5/10 | 10/10 |
| 4 (seo-toolkit) | metadata/rendering | Sonnet | 4/10 | 10/10 |
| 5 (pay) | WC payment gateway | Sonnet | 8/10 | 9/10 / 10 lenient |
| 6 (gallery) | file-handling / media | Sonnet | 8/10 | — |
8/10 planted caught + 17 bonus findings. 27 Problems, 5 Questions, 12 Improvements, 10 Praises.
Severity: 2 critical, 13 major, 11 minor, 1 trivial.
| Charter | P | Q | I | ! | Duration | Tool uses |
|---|---|---|---|---|---|---|
| gallery-post-type-admin | 5 | 1 | 2 | 2 | 8m 48s | 82 |
| frontend-gallery-render | 9 | 1 | 2 | 2 | 11m 19s | 95 |
| settings-page | 3 | 0 | 2 | 2 | 9m 04s | 82 |
| shortcode-and-block | 2 | 1 | 1 | 2 | 6m 29s | 63 |
| gallery-scale | 4 | 1 | 2 | 1 | 7m 09s | 51 |
| cross-feature-lifecycle | 4 | 1 | 3 | 1 | 6m 47s | 58 |
| Totals | 27 | 5 | 12 | 10 | 49m 36s serial | 431 |
| # | Planted issue | Verdict | Amendment fired |
|---|---|---|---|
| 1 | Grid gap (CSS) | missed | — |
| 2 | Admin bar overlap on lightbox | missed | — |
| 3 | Block absent from inserter | caught-exact | Amendment 2 (absence-of-feature) — flagged in recon, converted to Critical |
| 4 | Draft gallery leaks via shortcode | caught-exact (×2 Testers) | Amendment 4 (cross-feature MANDATORY) |
| 5 | No uninstall hook | caught-exact | Amendment 2 — grep evidence |
| 6 | Meta box beforeunload missing | caught-bundled with drift (filed as Question) | Amendment D — fired soft |
| 7 | `loading="lazy"` hardcoded / toggle dead | caught-exact (×3 Testers) | Amendment 2 + Amendment C (enumerate root cause) |
| 8 | Masonry layout broken (jQuery dependency) | missed | — |
| 9 | Escape key doesn't close lightbox | missed | — |
| 10 | 1-image carousel: NaN cycling | caught-bundled with drift (filed as Question) | generic a11y probe — fired soft |
6 caught-exact + 2 caught-bundled-with-drift + 4 missed = 8/10 strict.
Bonus catches beyond the answer key:
- CRITICAL: fatal PHP TypeError on string-typed `_mgl_image_ids` (data-corruption bug recon found, Tester confirmed)
- MAJOR: no `current_user_can` in save handler
- MAJOR: columns unbounded (9999 accepted, breaks grid)
- MAJOR: layout field no whitelist (arbitrary strings render as CSS classes)
- MAJOR: duplicate `id="mgl-lightbox"` when multiple galleries on one page (invalid HTML)
- MAJOR: gallery images default to empty alt (WCAG 1.1.1 failure)
- MAJOR: lightbox buttons have no aria-label (WCAG 4.1.2)
- MAJOR: noscript fallback absent (JS-off = empty gallery)
- MAJOR: deleted attachment renders `<img src="">`
- MAJOR: frontend enqueue fires on every page (even gallery-less)
- MAJOR: `mgl_enable_webp` is labeled "(Stub)": classic claim-vs-reality
- Plus the recon-flagged "Add Post" submenu label as trivial
| | Manager (Opus 4.7) | Subagents (Sonnet 4.6) | Total |
|---|---|---|---|
| Agents | 1 | 7 (1 recon + 6 Testers) + 1 classifier | 9 |
| Messages | 114 | 681 | 795 |
| Fresh input | 6,803 | 723 | 7,526 |
| Output | 184,976 | 75,300 | 260,276 |
| Cache-create 5m | 0 | 1,537,368 | 1,537,368 |
| Cache-create 1h | 208,741 | 0 | 208,741 |
| Cache-read | 43,982,847 | 57,678,172 | 101,661,019 |
| Total tokens | 44,383,367 | 59,290,563 | 103,673,930 |
| Cost | $28.74 | $26.02 | $54.75 |
98.1% of all tokens were cache reads. The Manager cost is on the high side ($28.74) because Pilot 5, the Pilot 5 rerun, and Pilot 6 all ran from one long main conversation: the 1h cache amortizes repeated reads, but the context keeps growing.
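The cache-read share can be recomputed from the usage table. A minimal sketch, using only the reported "Cache-read" and "Total tokens" totals; the function and constant names are illustrative, not part of the capture script:

```python
# Totals copied from the "Total" column of the usage table above.
CACHE_READ_TOKENS = 101_661_019
REPORTED_TOTAL_TOKENS = 103_673_930


def cache_read_share(cache_read: int, total: int) -> float:
    """Return cache-read tokens as a percentage of all tokens."""
    return 100.0 * cache_read / total


share = cache_read_share(CACHE_READ_TOKENS, REPORTED_TOTAL_TOKENS)
print(f"cache-read share: {share:.1f}%")
```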
| Category | Tokens | Cost | % |
|---|---|---|---|
| Cache-read | 101,661,019 | $39.68 | 72.5% |
| Cache-create 5m | 1,537,368 | $6.99 | 12.8% |
| Output | 260,276 | $5.96 | 10.9% |
| Cache-create 1h | 208,741 | $2.09 | 3.8% |
| Fresh input | 7,526 | $0.04 | 0.1% |
The capture script's ±10 min window buffer pulled the Pilot 5 rerun's escape-analysis classifier ($1.82) into the "subagents" count. Subtracting that, the 7 actual Pilot 6 subagents (1 recon + 6 Testers) totaled $24.20, so the real Pilot 6 cost is ≈ $28.74 Manager + $24.20 subagents = $52.94, with the remaining $1.82 being spillover classification from the prior pilot.
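The spillover adjustment is plain subtraction and addition. A small sketch with the dollar figures from the text, using `Decimal` so the cents stay exact (variable names are illustrative):

```python
from decimal import Decimal

manager_cost = Decimal("28.74")          # Manager (main conversation)
subagents_reported = Decimal("26.02")    # "Subagents" column as captured
classifier_spillover = Decimal("1.82")   # Pilot 5 rerun classifier caught by the window buffer

# Subagent cost attributable to Pilot 6 (1 recon + 6 Testers).
pilot6_subagents = subagents_reported - classifier_spillover

# Pilot 6 cost net of spillover.
pilot6_total = manager_cost + pilot6_subagents

print(pilot6_subagents, pilot6_total)
```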
| Session | Role | Duration | Tool uses | Msgs | Input | Output | cc5m | cr | Cost |
|---|---|---|---|---|---|---|---|---|---|
| recon | scout | 4m 58s | 50 | 68 | 74 | 8,992 | 191,158 | 4,016,805 | $2.06 |
| shortcode-and-block | Tester | 6m 29s | 63 | 97 | 103 | 7,758 | 159,448 | 7,964,348 | $3.10 |
| cross-feature-lifecycle | Tester | 6m 47s | 58 | 83 | 89 | 12,066 | 215,698 | 6,929,183 | $3.07 |
| gallery-scale | Tester | 7m 09s | 51 | 72 | 78 | 10,484 | 137,789 | 5,489,551 | $2.32 |
| gallery-post-type-admin | Tester | 8m 48s | 82 | 116 | 122 | 10,187 | 408,059 | 11,101,371 | $5.01 |
| settings-page | Tester | 9m 04s | 82 | 121 | 127 | 11,157 | 259,896 | 11,309,121 | $4.54 |
| frontend-gallery-render | Tester | 11m 19s | 95 | 124 | 130 | 14,656 | 165,320 | 10,867,793 | $4.10 |
| Productive totals | | 54m 34s serial | 481 | 681 | 723 | 75,300 | 1,537,368 | 57,678,172 | $24.20 |
Concurrent-wave compression: recon ran first (4m 58s), then the 6 Testers ran in parallel, so the wave took ~11m 19s (bounded by the longest Tester). Combined wallclock was ~16-17 min for 54m 34s of sequential work (recon + 6 Testers). Compression ratio: ~3.3× for the full sequence, or ~4.4× for the Tester wave alone (49m 36s sequential / ~11m 19s wallclock).
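The compression ratios can be recomputed from the per-session durations in the table above. This is a sketch under one simplifying assumption: the parallel wave's wallclock equals the longest Tester's duration, with no dispatch overhead. The helper name is illustrative.

```python
def to_seconds(duration: str) -> int:
    """Parse a duration such as '11m 19s' into seconds."""
    minutes, seconds = duration.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


RECON = "4m 58s"
TESTERS = ["6m 29s", "6m 47s", "7m 09s", "8m 48s", "9m 04s", "11m 19s"]

tester_secs = [to_seconds(d) for d in TESTERS]
recon_secs = to_seconds(RECON)

serial_wave = sum(tester_secs)      # Testers run back to back
parallel_wave = max(tester_secs)    # Testers run concurrently (assumed bound)

wave_ratio = serial_wave / parallel_wave
full_ratio = (recon_secs + serial_wave) / (recon_secs + parallel_wave)

print(f"Tester wave: {wave_ratio:.1f}x, full sequence: {full_ratio:.1f}x")
```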
- `frontend-gallery-render` was richest: 9 Problems. Deep audit of escape chain, a11y, noscript, deleted-attachment, duplicate lightbox ID, and carousel off-by-one. Still only 11m 19s, so cheap per finding.
- `shortcode-and-block` was fastest at 6m 29s: the block-absent finding + draft-leak finding were both source-analysis + quick DOM probe.
- Recon's 50 tool uses produced 5 directly-filable observations, all of which became filed Problems downstream. Recon-to-charter handoff discipline held.
| Denominator | Value |
|---|---|
| Total cost / planted caught (8) | $6.84 per planted bug ($6.62 with classifier spillover subtracted) |
| Total cost / Problem filed (27) | $2.03 per Problem ($1.96 without spillover) |
| Total cost / all PQIP items (54) | $1.01 per PQIP item ($0.98 without spillover) |
Net of spillover, $6.62 per planted bug is higher than Pilot 5 rerun's $4.77/planted and Pilot 4 rerun's $5.50, but in the same range. Not a step-change, but holding: the harness is maintaining cost discipline across different plugin shapes.
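The per-item costs follow from dividing a run total by each denominator. A minimal sketch, assuming the two totals reported earlier in this report ($54.75 as captured, $52.94 net of classifier spillover); the names are illustrative:

```python
TOTAL_COST = 54.75   # as captured, including classifier spillover
NET_COST = 52.94     # Manager + the 7 Pilot 6 subagents only

DENOMINATORS = {
    "planted caught": 8,
    "Problems filed": 27,
    "PQIP items": 54,
}


def cost_per_item(total: float, count: int) -> float:
    """Cost per item in dollars, rounded to cents."""
    return round(total / count, 2)


for label, count in DENOMINATORS.items():
    gross = cost_per_item(TOTAL_COST, count)
    net = cost_per_item(NET_COST, count)
    print(f"{label}: ${gross:.2f} gross, ${net:.2f} net of spillover")
```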
| Amendment | Fired? | Where | Notes |
|---|---|---|---|
| 1. Empty / one / many states | ✓ | 6/6 coverage notes | clean |
| 2. Absence-of-feature | ✓✓✓ | LOAD-BEARING: 4 distinct findings (dead lazy, dead webp, no uninstall, no cap check) | strongest convergence on this shape |
| 3. Plugin-native writes | ✓ | Testers used UI / media picker where possible; direct DB only for scale + state-variety seeds | clean |
| 4. Cross-feature MANDATORY | ✓✓ | cross-feature-lifecycle + draft-leak caught by 2 charters | clean |
| 5. UI-path before "missing" | ✓ | block-absent finding verified via inserter search + registry object | clean |
| A. Inline counters | — | no fuel (no counters on this plugin) | correct non-fire |
| B. State variety | ✓ | gallery-post-type-admin probed string-typed `_mgl_image_ids` (vs array), exactly the recipe | caught the CRITICAL fatal |
| C. Enumerate root-cause | ✓ | lazy toggle dead → sibling-propagated to webp toggle dead + dead capability check + dead uninstall | clean |
| D. Unsaved-work protection | **~ drift** | beforeunload probed but filed as Question rather than Problem | drift: same pattern as E-ext in Pilot 5 rerun |
| E. Admin two-tab concurrent | — | no fuel (no admin-form concurrent-edit bug on this plugin) | correct non-fire |
| E-ext. Rapid-double-submit | ✓ | gallery-post-type-admin + settings-page both ran empirical probe; save is idempotent, filed as Praise | clean fire after Pilot 5 rerun tightening landed via rule text |
| F. View-source HTML | ✓ | shortcode-and-block used view-source to confirm block absence + frontend-gallery-render used raw fetch | clean |
| G. DDL column types | — | 3 Testers explicitly recorded non-applicability: "no custom DB tables in plugin → Amendment G correctly does not fire" | generalization test PASSED, zero overfire |
| Reinf 5 empty-state MANDATORY | ✓ | 6/6 coverage notes | clean |
| Reinf 8 cross-feature MANDATORY | ✓ | 5/6 coverage notes | clean |
| pqip.propagate-sibling-features | ✓ | lazy-dead → webp-dead → uninstall-missing chain | clean |
| pqip.UI-path-before-claim | ✓ | no over-claims filed | clean |
Fired actively: 13/17. Correctly did not fire: 3/17 (A, E, G — no fuel). Drift: 1 (D — same drift pattern as E-ext in Pilot 5 rerun).
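The fire tally can be reproduced by counting the status of each row in the amendment table. A short sketch; the status labels `fired`, `no-fuel`, and `drift` are shorthand introduced here for the ✓, — and ~ drift marks:

```python
from collections import Counter

# One entry per row of the amendment table above.
STATUS = {
    "1": "fired", "2": "fired", "3": "fired", "4": "fired", "5": "fired",
    "A": "no-fuel", "B": "fired", "C": "fired", "D": "drift",
    "E": "no-fuel", "E-ext": "fired", "F": "fired", "G": "no-fuel",
    "Reinf 5": "fired", "Reinf 8": "fired",
    "pqip.propagate-sibling-features": "fired",
    "pqip.UI-path-before-claim": "fired",
}

tally = Counter(STATUS.values())
total_rules = len(STATUS)

print(f"fired {tally['fired']}/{total_rules}, "
      f"no fuel {tally['no-fuel']}/{total_rules}, drift {tally['drift']}")
```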
Three Testers on this plugin explicitly recorded that Amendment G was non-applicable:
- "Amendment G: no custom DB tables in plugin (post-meta only); rule correctly does not fire"
- "Amendment G coverage note: Plugin has no custom DB tables in schema. Gallery state is in wp_postmeta using WP core schema, which is appropriate for the value semantics the plugin stores."
- "Amendment G verdict: Plugin has no custom DB tables (only WP core tables in schema). Amendment G (DDL column type inspection) correctly does not fire."
Zero overfire on wp_postmeta storage (which WOULD have been wrong — meta values are strings/longtext by core schema, and that's correct for the data). The rule stayed in its lane.
In Pilot 5 rerun, Amendment E-ext fired as a Question rather than a Problem because the Tester source-inspected instead of executing the empirical probe. I proposed tightening the rule text but the tightening was never shipped (grep "empirical probe must" in skills/tester-mindset/SKILL.md returns 0 hits).
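The "was the tightening shipped" check can be scripted so it runs before each pilot. A minimal sketch that mirrors the grep above; the function names are hypothetical, and a missing file is treated as "not shipped" by assumption:

```python
from pathlib import Path


def count_matching_lines(text: str, phrase: str) -> int:
    """Count lines containing `phrase` (case-sensitive, like plain grep -c)."""
    return sum(1 for line in text.splitlines() if phrase in line)


def rule_text_shipped(skill_path: str, phrase: str) -> bool:
    """Return True if the skill file exists and mentions `phrase` at least once."""
    path = Path(skill_path)
    if not path.is_file():
        return False  # assumption: an absent file counts as not shipped
    return count_matching_lines(path.read_text(encoding="utf-8"), phrase) > 0


if __name__ == "__main__":
    print(rule_text_shipped("skills/tester-mindset/SKILL.md", "empirical probe must"))
```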
In Pilot 6 the same drift pattern repeats in sibling rules:
- Amendment D (unsaved-work): Tester noted missing `beforeunload` listener via DOM inspection → filed as Question.
- Generic a11y probe on 1-image carousel: Tester observed the NaN cycling via source reading → filed as Question.
Two new drift cases, same class. The drift is not E-ext-specific — it's a generic "empirical-is-mandatory" gap across all probe-class amendments.
Add a global "empirical discipline" rule at the top of the probe-class section:
Every probe-class amendment is discharged only by executing the probe empirically. Source inspection, registry inspection, DOM snapshot inspection, and evidence of absence in code are all valid PRELUDES to a filed finding: they help identify what to probe. They do NOT discharge the rule. The rule is discharged only when:
- You execute the empirical reproducer the rule specifies (rapid-double-submit, beforeunload trigger, keyboard close, empty state, two-tab, etc.)
- You observe the behavior directly through a browser-driver verb call OR a side-effect count (DB row count, `wp post meta get`, HTTP request count)
- You file the result with the empirical evidence: a Problem if the probe demonstrates a bug, a Praise if the probe demonstrates correct behavior, a Question only if the empirical probe is architecturally blocked (environment unreachable, hook not firable, etc.)

Source-inspected evidence of absence (e.g., "`grep onbeforeunload` returns 0 matches") is supporting evidence for a filed Problem, not a substitute for the empirical probe. When you file a Question citing source-inspection alone, you are skipping the rule.
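The filing decision in the proposed rule reduces to a small decision function. This is a sketch of the rule as worded above, not harness code; `Filing` and `classify_probe_result` are hypothetical names:

```python
from enum import Enum


class Filing(Enum):
    PROBLEM = "Problem"
    PRAISE = "Praise"
    QUESTION = "Question"


def classify_probe_result(probe_executed: bool,
                          bug_observed: bool = False,
                          probe_blocked: bool = False) -> Filing:
    """Map an empirical probe outcome to the filing type the rule allows."""
    if probe_executed:
        # Empirical evidence exists: file what the probe demonstrated.
        return Filing.PROBLEM if bug_observed else Filing.PRAISE
    if probe_blocked:
        # Only an architecturally blocked probe may be filed as a Question.
        return Filing.QUESTION
    # Source inspection alone does not discharge the rule.
    raise ValueError("rule not discharged: run the empirical probe before filing")
```

Under this sketch, the two Pilot 6 drift cases (source or DOM inspection only, probe neither executed nor blocked) would raise instead of producing a Question.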
Ships as a global policy at the top of skills/tester-mindset/SKILL.md, before the individual probe-class amendments. Targets the drift class, not any single amendment.
For the Escape-key / keyboard-close miss on lightbox (Issue 9): extend the generic overlay-UI probe rule to mandate a keyboard-close check on any dismissable overlay (lightbox, modal, popup, drawer, menu). Simple rule, applies to every frontend plugin that renders an overlay.
All 5 recon findings became filed Problems:
| Recon finding | Filed where | Verdict |
|---|---|---|
| Fatal on string-typed `_mgl_image_ids` | gallery-post-type-admin P1 | CRITICAL, filed |
| Block absent from inserter | shortcode-and-block P1 | CRITICAL, filed |
| Empty alt on images | frontend-gallery-render P2 | MAJOR, filed |
| Lightbox no aria-label | frontend-gallery-render P1 | MAJOR, filed |
| `display:none` + no noscript | frontend-gallery-render P3 | MAJOR, filed |
Zero silent drops. This is the cleanest recon-to-charter handoff in the pilot history — the Phase 2 → Phase 3 discipline held.
- Ship Amendment I (empirical-probe-is-mandatory cross-amendment rule) + Amendment H (keyboard-close on overlay UIs). Both target generalizable bug/drift classes, not gallery-specific fixes.
- Defer magellan-gallery rerun — run H + I against Pilot 7 for a cleaner attribution test.
- Pilot 7 candidate: `magellan-speed` (caching / perf; tests SFDPOT Time dimension, never exercised) OR a plugin with a REST route (REST surface uncovered).
Six pilots, four reruns, consistent convergence to ≥ 8/10 blind on amended harness:
- Sonnet + amendments validated across four plugin shapes (members / seo-toolkit / pay / gallery)
- Amendment G validated as non-overfiring on a plugin without custom DB tables (strongest generalization signal)
- Amendment 2 (absence-of-feature) is reliably the load-bearing amendment across diverse shapes
- Drift class across probe-amendments identified — single rule fix (Amendment I) should resolve it across all future pilots
The loop continues to compound. Every amendment from prior pilots still fires where applicable; no regressions.
- Final report: `runs/2026-04-24T12-19-02_magellan-gallery/final-report.md`
- Escape analysis: `runs/2026-04-24T12-19-02_magellan-gallery/escape-analysis.md`
- Token usage: `runs/2026-04-24T12-19-02_magellan-gallery/token-usage.json`
- Manifest: `runs/2026-04-24T12-19-02_magellan-gallery/manifest.json`
- 6 session reports: `runs/2026-04-24T12-19-02_magellan-gallery/sessions/<slug>/report.json`
- Static analysis: `runs/2026-04-24T12-19-02_magellan-gallery/static-analysis.md`
- Recon: `runs/2026-04-24T12-19-02_magellan-gallery/recon.md`
- Coverage plan: `runs/2026-04-24T12-19-02_magellan-gallery/coverage.md`