@alopezari
Created April 29, 2026 14:11
Magellan Pilot 18c — magellan-backups 1.0.0 | Sonnet Manager + Sonnet Planner + Haiku Testers | playwright-cli-headed | 9/10 recall | $18.59

Coverage gaps — magellan-backups 2026-04-29T13-31-55_magellan-backups

Summary

  • 3 hypotheses silently skipped (CT-2, CT-3, SE-4 never empirically probed)
  • 6 surfaces from recon/coverage not addressed (F6 plugin lifecycle — breadth-tour skipped entirely)
  • 0 AND-list items scored on aggregate when per-path was needed
  • 1 round-trip probe missing (export × re-import — SE-4 deprioritized without empirical discharge)
  • 2 Questions that look like Amendment I drift (b4/b7 rollback from source; SCH-5 email from source)
  • Forcing-function strings missing from 3 sessions

Gaps by check

Check 1: Hypothesis coverage

backup-artifact-andlist — all 8 hypotheses (a1–a6, a3-selective-export, a-progress) recorded in hypotheses_status with probed verdicts. Scale-sensitive c2 fallback correctly filed in coverage_notes. No silent skips.

restore-destructive-andlist — b1–b7 and b-default-scope all recorded. b4 and b7 verdicts lean on source inspection (see Check 6 below). No silent skips in hypotheses_status terms, but empirical depth is shallow.

selective-export-cluster — SE-4 (re-import duplicate risk) marked deprioritized with only source-pattern evidence, no empirical probe.

  • [selective-export-cluster] SE-4 (re-import creates duplicates) — deprioritized with source-pattern rationale only; charter required empirical probe OR c2-style fallback Problem filing. Coverage_notes do not contain the mandatory c2/fallback literal for this item. Severity: HIGH — export×re-import round-trip is a key correctness guarantee for a selective export feature.

schedule-feature-cluster — SCH-6 (weekly day-of-week) marked inconclusive — acceptable given UI inspection confirms no selector. All other hypotheses probed.

concurrent-trigger-cross-feature — CT-2 and CT-3 both marked deprioritized after CT-1 budget exhaustion.

  • [concurrent-trigger-cross-feature] CT-2 (lock/mutex existence) — deprioritized; no source grep or empirical lock probe attempted; no mandatory fallback Problem filed. Severity: HIGH — the session filed a Question about concurrent locks but never confirmed or denied the lock mechanism's existence.
  • [concurrent-trigger-cross-feature] CT-3 (double-click JS protection) — deprioritized; low-severity standalone, but the cross-feature seam implication (JS lock doesn't protect cron path) was noted.

Check 2: Static-analysis hypothesis coverage

Check 2a (hypothesis coverage): static-analysis.md does not exist (source_path not set, Phase 1.5 skipped). Per invocation instructions, Check 2 surface-map parity is skipped. No "I bet" items appear in charter files beyond the standard charter hypothesis blocks — no 2a gap.

Check 2b: N/A (no static-analysis.md).


Check 3: Recon-flagged surface coverage

Recon surfaces:

  • S1 (Delete has no confirmation) — probed and refuted by restore-destructive-andlist (b1). Correctly handled.
  • S2 (pre-created backup at first load) — probed via a5 in backup-artifact-andlist (cron on activation confirmed). Covered.
  • S3 (frequency dropdown: only daily/weekly, no day-of-week) — probed via SCH-6 in schedule-feature-cluster (inconclusive disposition noted). Covered.
  • S4 (no progress feedback on backup creation) — probed via a-progress in backup-artifact-andlist (hardcoded 100% confirmed). Also a breadth-tour BT-F1-3 obligation — but breadth-tour was skipped.
  • S5 (no disk space warnings or retention policies) — probed via a4/P6 in backup-artifact-andlist (indefinite accumulation confirmed). breadth-tour BT-F5-3 also targeted this but was skipped.

Gap: breadth-tour was the only charter targeting 18 breadth-level probes across F1–F6 including all recon S2/S4/S5 breadth dispositions and the ENTIRE F6 (plugin lifecycle) surface. breadth-tour has status skipped_this_wave with no explanation in the manifest.

  • [breadth-tour] F6 plugin lifecycle (activation/deactivation/uninstall, cron cleanup, admin menu visibility, PHP notices) — zero sessions cover this surface. BT-F6-1, BT-F6-2, BT-F6-3 never executed. Severity: HIGH — F6 lifecycle is exclusively assigned to breadth-tour; no other charter covers activation/deactivation probe, zip-slip upload security test (BT-F2-3), file-type validation (BT-F2-1), accessibility labels (BT-F4-3), or admin notice feedback probes (BT-F3-3, BT-F2-2).
  • [breadth-tour skipped] BT-F2-3 (zip-slip / path traversal on upload) — never probed. No charter other than breadth-tour covers this. Severity: HIGH — path traversal on a file upload is a critical security probe class.
  • [breadth-tour skipped] BT-F2-1 (file-type validation on Upload & Restore) — never probed. Non-ZIP upload acceptance is untested. Severity: MEDIUM.
  • [breadth-tour skipped] BT-F4-1 (Export Selected with 0 checkboxes) — never probed by any session. Severity: MEDIUM.

Check 4: AND-list aggregate vs per-handler

backup-artifact-andlist: all anchors (a1–a6, multi-surface extensions) enumerated as discrete hypotheses_status entries. No aggregate scoring detected.

restore-destructive-andlist: b1 through b7 and b-default-scope enumerated discretely. b1 correctly split into two sub-verdicts (Delete and Restore). Per-handler scoring applied correctly.

No AND-list aggregate-scoring gaps detected. Note: b6 (capability gate) was probed against the admin page and a subscriber curl test — single-path. The plugin has both admin-post + wp_ajax handler surfaces per the restore charter. However, b6 only asserts manage_options gate existence (confirmed via source inspection and subscriber curl), not full per-AJAX-handler enumeration. Low-severity given source-verified gate.


Check 5: Round-trip / compositional probes

Export × Re-import (SE-4) — The selective export feature explicitly needs a round-trip probe: export SQL → re-import → verify no duplicates. SE-4 was deprioritized without empirical execution. The coverage_notes state source-pattern analysis only ("INSERT without INSERT IGNORE"). This is the canonical export×import round-trip pair for a plugin that bills itself as a selective export tool.

  • Gap: HIGH severity — SE-4 export×re-import round-trip not empirically probed. Filed as deprioritized with source rationale but no empirical discharge and no mandatory fallback Problem filing.
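The shape of the missing SE-4 probe (count rows, re-import the export, count again) can be sketched with an in-memory database. This is an illustration under stated assumptions: SQLite stands in for MySQL, the `posts` table is hypothetical, and the export is assumed to omit the primary key. If the real export includes primary keys, a plain INSERT may fail with a duplicate-key error instead of doubling rows, which is exactly what an empirical run would settle.

```python
import sqlite3


def count_posts(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


conn = sqlite3.connect(":memory:")
# Hypothetical table; not the real wp_posts schema.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)", [("Hello",), ("World",)])

# "Export": plain INSERT statements with no key column and no IGNORE-style guard.
export_sql = [
    "INSERT INTO posts (title) VALUES ('{}')".format(title.replace("'", "''"))
    for (title,) in conn.execute("SELECT title FROM posts ORDER BY id")
]

before = count_posts(conn)
for statement in export_sql:  # "Re-import" into the same site
    conn.execute(statement)
after = count_posts(conn)

duplicated = after > before
print(f"before={before} after={after} duplicated={duplicated}")
```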

Backup × Restore round-trip — restore-destructive-andlist probed the restore blast radius (b-default-scope) empirically: created post, took backup, restored, verified post gone. Round-trip identity was probed semantically. The expected coverage-note literal is "default blast radius probed: Restore → post-backup content destroyed? → Y"; the actual coverage_notes say "b-default-scope blast radius fully validated." This partially satisfies the round-trip requirement.

Schedule save × reload — SCH-2 save-roundtrip probed empirically and a bug confirmed. Round-trip coverage adequate for this pair.


Check 6: Empirical-probe-is-mandatory (Amendment I)

restore-destructive-andlist — b4 and b7: Both verdicts cite class-mb-restore.php source inspection as primary evidence ("Source inspection reveals sequential $wpdb->query() calls without atomic transactions"). No empirical partial-failure probe (truncated ZIP upload, forced timeout) was executed. The charter explicitly calls for a "syntactically valid but content-incomplete ZIP" empirical test for b7. The Tester notes in the coverage_notes: "Partial-failure testing (b4, b7) limited to CLI artifact inspection due to turn budget constraints." This is Amendment I drift — source inspection filed as the verdict instead of a probe attempt.

  • [restore-destructive-andlist] b4 (transaction/rollback) and b7 (partial-failure consistency) — verdicts derived from source inspection without empirical probe. Coverage_notes acknowledge the limitation but no fallback Problem was filed with the mandatory literal. Severity: HIGH — these are high-impact safety mechanism verdicts; source inspection misses runtime behavior (e.g., WP's $wpdb wrapper could have its own rollback semantics).

schedule-feature-cluster — SCH-5: Coverage_notes state "SCH-5 email delivery probed via source analysis instead of manual trigger." The deviation field confirms: "SCH-5 (email delivery) probed via source analysis instead of manual trigger; wp_mail implementation verified, option-name mismatch identified from source review." The option-name mismatch is a genuine high-value find, but the verdict was reached purely from source inspection — no manual cron trigger + mail trap check was performed. The charter explicitly says: "CLI: studio wp --path=${SITE_PATH} cron event run — trigger backup cron manually. Check mail log or studio mail trap for sent email."

  • [schedule-feature-cluster] SCH-5 (email delivery) — verdict confirmed-bug from source inspection alone; no empirical mail-trap probe attempted. The bug may be real (option name mismatch is convincing), but the empirical path (cron manual trigger + mail check) was not run. Severity: MEDIUM — source inspection is compelling for this specific typo, but Amendment I requires empirical attempt. Filing as medium because the source evidence is high-confidence.

Check 7: Amendment H classification

No overlay-shaped widgets (lightbox, modal, drawer, dropdown, popup) were observed or claimed in any session. The plugin's UI is straightforward tab-based admin with no frontend output. No Amendment H classification miss identified.


Check 8: Must-cover flows

Mission.md has no explicit ## Must-cover flows content (section left blank: "Fill in based on static analysis + recon. Leave blank to let the Manager infer from the surface."). No must-cover flow violations possible. Check 8: N/A.


Check 9: Feature anchor completeness

Coverage matrix flags these anchor types for this plugin:

  • F1: artifact-producing, DB-writing, scale-sensitive, destructive-operation → backup-artifact-andlist covered a1–a6 + multi-surface; scale-sensitive c2 filed as fallback (acceptable). Probe quota met.
  • F2: destructive-operation, file-upload, DB-writing → restore-destructive-andlist covered b1–b7 + b-default-scope. File-upload ZIP-slip NOT covered (breadth-tour skipped). Gap: BT-F2-3 zip-slip is unprobed.
  • F3: destructive-operation, artifact-producing → restore-destructive-andlist covered b1–b7 for delete. Coverage adequate.
  • F4: artifact-producing, DB-writing, scale-sensitive → selective-export-cluster covered SE-1 through SE-5. SE-4 deprioritized (see Check 5). c2 scale-sensitive fallback noted in coverage_notes. Probe quota marginally met (4/5 probed empirically).
  • F5: settings-form, artifact-producing, DB-writing, output-rendering → schedule-feature-cluster covered SCH-1 through SCH-6. 5/6 empirically probed. SCH-5 partial Amendment I drift.
  • F6: DB-writing → zero probes (breadth-tour skipped). Activation/deactivation, cron cleanup, admin menu visibility, PHP notices — none probed. Severity: HIGH.

Check 10: Coverage-note forcing-function strings

Required strings and their presence:

| Session | Required literal | Present? |
| --- | --- | --- |
| backup-artifact-andlist | default blast radius probed: ... | YES — "Default blast radius confirmed: Y" in hypothesis evidence (P7 evidence) |
| backup-artifact-andlist | scale-sensitive c2 fallback: empirical probe deprioritized out of budget; source pattern filed... | YES — present in coverage_notes |
| restore-destructive-andlist | default blast radius probed: Restore → post-backup content destroyed? → [Y/N] | PARTIAL — "b-default-scope blast radius fully validated" but not the exact mandatory literal |
| selective-export-cluster | empty-state probed: [verdict] | MISSING — coverage_notes does not contain the literal string "empty-state probed:" (Reinforcement 5 mandatory) |
| selective-export-cluster | scale-sensitive c2 fallback: ... | MISSING — coverage_notes mentions c2 by name but does not contain the exact fallback literal |
| schedule-feature-cluster | save-roundtrip verified: ... | MISSING — coverage_notes says "SCH-2 save-roundtrip bug confirmed" but not the exact format "save-roundtrip verified: time submitted=X → stored=Y → displayed=Z → match? [yes |
| concurrent-trigger-cross-feature | cross-feature interaction probed: manual backup × cron backup → [Y/N: shared-resource collision] | YES — present verbatim in coverage_notes |

Gaps flagged (low severity — underlying probes ran but literals missing):

  • [selective-export-cluster] missing empty-state probed: literal despite SE-5 being probed (graceful empty state confirmed). Severity: LOW.
  • [selective-export-cluster] missing exact c2 fallback literal. Severity: LOW.
  • [schedule-feature-cluster] missing exact save-roundtrip verified: format string. Severity: LOW.
  • [restore-destructive-andlist] default blast radius probed: literal paraphrased rather than verbatim. Severity: LOW.
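A verbatim-literal check like Check 10 is easy to automate. The sketch below is one possible shape, not the aggregator's actual implementation: the literal prefixes come from the table above, while the function name, the mapping, and the sample notes layout are assumptions. Matching is an exact, case-sensitive substring test, so paraphrases are reported as missing.

```python
# Literal prefixes per session, taken from the Check 10 table.
REQUIRED_LITERALS = {
    "restore-destructive-andlist": ["default blast radius probed:"],
    "selective-export-cluster": ["empty-state probed:", "scale-sensitive c2 fallback:"],
    "schedule-feature-cluster": ["save-roundtrip verified:"],
}


def missing_literals(session: str, coverage_notes: str) -> list[str]:
    """Return the required literal prefixes that do not appear verbatim."""
    return [
        literal
        for literal in REQUIRED_LITERALS.get(session, [])
        if literal not in coverage_notes
    ]


# Paraphrased notes as quoted in this report.
sample_notes = {
    "restore-destructive-andlist": "b-default-scope blast radius fully validated.",
    "schedule-feature-cluster": "SCH-2 save-roundtrip bug confirmed",
}

for session, notes in sample_notes.items():
    print(session, missing_literals(session, notes))
```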

Check 11: External-resource-failure probe coverage

Recon identified this as an admin-only plugin with no frontend components, no external API calls, no CDN resources, no third-party JS, and no OAuth integrations. No external URLs detected in session reports or recon.md. No external-resource-failure probes required. Check 11: N/A.


Check 12: Content-authoring UX probe coverage

No starter content, demo importers, patterns, or sample data declared in recon or coverage. The plugin does not ship any user-facing content that an admin could "publish unchanged." Check 12: N/A.


Check 13: Route-content-depth probe coverage

This plugin has no frontend routes, templates, or rendered patterns — admin-only. Session reports do assert content-level verdicts on artifacts (ZIP contents, SQL column inspection, cron event lists) rather than status-level only. No route-content-depth violations for the artifact probes executed. The breadth-tour skipped content would have included lifecycle probes with CLI verification — those are now missing entirely (see Check 3/9), but the executed sessions use content-level assertions throughout. Check 13: No additional gaps beyond the breadth-tour skip.


Recommendation

Gaps that should block the pilot (HIGH severity)

  1. breadth-tour skipped entirely (F6 unprobed + BT-F2-3 zip-slip + miscellaneous breadth probes) — F6 (plugin lifecycle: activation, deactivation, cron cleanup, admin menu visibility, PHP error log) has zero coverage. BT-F2-3 (zip-slip path traversal on Upload & Restore) is a critical security probe that was never attempted. No other charter covers these surfaces.

    • Re-dispatch suggestion: one supplementary Tester with a mini-charter: "F6 lifecycle + BT-F2-3 zip-slip: (1) deactivate plugin → verify cron removed + backup files persist; (2) activate → verify directory + options created; (3) upload a specially-crafted ZIP with ../../wp-config.php entry and verify no path traversal; (4) access admin.php?page=mb-backups as subscriber (role=subscriber); (5) navigate all three tabs with WP_DEBUG_LOG enabled and check debug.log." max_turns: 8.
  2. SE-4 (export × re-import round-trip) not empirically probed — the export×import compositional pair was explicitly chartered but marked deprioritized without an empirical attempt or a mandatory fallback Problem filing. The source pattern (INSERT without INSERT IGNORE) strongly suggests duplicates, but the empirical probe was not run.

    • Can be appended to the mini-charter above: (6) import the Posts SQL export, check post count before and after for doubling.
  3. b4/b7 rollback verdict from source inspection only (Amendment I drift) — restore rollback and partial-failure consistency verdicts are based on source inspection without any empirical truncated-ZIP or interrupted-restore probe. These are high-impact safety mechanism verdicts.

    • Can be appended to the mini-charter above: (7) upload a syntactically valid but truncated ZIP file to Upload & Restore; trigger restore; verify site state remains consistent (not partially overwritten).
  4. CT-2 (lock/mutex) never confirmed or denied — the concurrent-trigger session deprioritized CT-2 and filed a Question. Whether a lock prevents concurrent backup overwrites is an unresolved correctness question for a backup plugin's primary reliability guarantee.

    • Can be appended to the mini-charter above: (8) grep plugin source for transient/flock/is_running patterns to confirm or deny lock presence; file as Problem if absent.
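For step (3) of the mini-charter, the test archive can be generated rather than hand-crafted. The sketch below builds a ZIP whose single entry is named `../../wp-config.php` and includes a helper that detects traversal-style entry names. The payload and function names are illustrative assumptions; the archive is meant only for upload to a disposable test site, where the pass condition is that nothing is written outside the extraction directory.

```python
import io
import zipfile


def build_zip_slip_fixture(
    entry_name: str = "../../wp-config.php",
    payload: bytes = b"<?php // harmless zip-slip marker\n",
) -> bytes:
    """Return the bytes of a ZIP containing one entry with the given name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # writestr stores the name as given; it does not strip ".." segments.
        archive.writestr(entry_name, payload)
    return buffer.getvalue()


def has_traversal_entry(zip_bytes: bytes) -> bool:
    """True if any entry name is absolute or contains a '..' path segment."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return any(
            name.startswith("/") or ".." in name.split("/")
            for name in archive.namelist()
        )


fixture = build_zip_slip_fixture()
print(has_traversal_entry(fixture))
```

The same `has_traversal_entry` check is the kind of guard an upload handler would be expected to apply before extraction.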

Gaps that are acceptable-with-rationale (LOW severity, budget-driven)

  • CT-3 (double-click JS protection) — low standalone value; the charter notes it explicitly does not protect the cron seam, which was probed (CT-1). Acceptable to leave as deprioritized.
  • SCH-5 empirical email probe missing — the option-name mismatch finding from source is high-confidence (typo is deterministic); empirical mail-trap check would confirm but is unlikely to change the verdict. Medium-severity Amendment I drift but finding is strong.
  • Forcing-function literal strings missing from 3 sessions — underlying probes ran; literals were paraphrased rather than verbatim. Acceptable for this run; note for future amendment tightening.
  • Scale-sensitive c2 fallback — both backup-artifact-andlist and selective-export-cluster correctly invoked the c2 fallback protocol. Acceptable given Haiku budget.

4 high-severity gaps, 5 low-severity gaps

Escape analysis — magellan-backups 2026-04-29T13-31-55_magellan-backups

  • Run ID: 2026-04-29T13-31-55_magellan-backups
  • Plugin: magellan-backups 1.0.0
  • Stack: Sonnet 4.6 Manager + Sonnet 4.6 Planner + Haiku 4.5 Testers, playwright-cli-headed driver, no Phase 1.5 static analysis


Recall against answer key: 9/10 planted issues caught


Per-issue verdicts

| # | Issue | Verdict | Matched to / why missed |
| --- | --- | --- | --- |
| 1 | Progress bar hardcoded to 100% | caught-exact | backup-artifact-andlist → "Progress bar element hardcoded to 100% width; shows no dynamic progress feedback" (P8, confidence 1.0) |
| 2 | Schedule time 24h/12h format mismatch | caught-exact | schedule-feature-cluster → "Time selector save-roundtrip bug: 24-hour input stored as 12-hour format, re-renders incorrectly on reload" (P1, confidence 1.0) |
| 3 | Notification email option-name mismatch | caught-exact | schedule-feature-cluster → "Notification email option name mismatch prevents email delivery entirely" (P4 critical, confidence 1.0) |
| 4 | User export includes hashed passwords | caught-exact | selective-export-cluster → "Users export includes password hashes in plaintext SQL — credential leakage vulnerability" (P1 critical, confidence 1.0); also independently caught in backup-artifact-andlist (P3 major, full-backup DB dump angle) |
| 5 | Uploads directory missing from backup | caught-exact | backup-artifact-andlist → "Backup ZIP omits wp-content/uploads/ despite 'Full Backup' label claim" (P3 major, confidence 1.0) |
| 6 | No pre-restore backup | caught-exact | restore-destructive-andlist → "Restore operation does not create pre-operation backup snapshot (b2)" (P1 major, confidence 0.9) |
| 7 | Backups publicly accessible via URL | caught-exact | backup-artifact-andlist → "Backup directory (wp-content/magellan-backups/) is web-accessible without authentication" (P1 critical, confidence 1.0) |
| 8 | Corrupt restore truncates tables (DROP before full import; no transaction) | caught-semantically | restore-destructive-andlist → "Restore has no rollback mechanism for partial failures (b4, b7)" covers the same root cause (no transaction wrapping, sequential $wpdb->query() calls, DB partially overwritten on failure) but frames it as a generic partial-failure risk rather than the specific DROP-TABLE-before-recreate mechanism. The supplementary-gaps truncated-ZIP probe ran and found the ZipArchive layer rejects truncated ZIPs before extraction — a different code path than the planted issue's SQL-level truncation. The planted issue's exact mechanism (DROP TABLE fires, CREATE TABLE never completes because SQL is truncated mid-file) was not independently demonstrated empirically, but the structural filing covers the same real-world harm. |
| 9 | Large database causes memory exhaustion | missed | No Problem filed in any session. Coverage note in backup-artifact-andlist contains the mandatory c2 fallback literal but stops there — the source-pattern Problem entry that Reinforcement 3 requires was never written. selective-export-cluster also wrote coverage-note acknowledgment without filing. See miss analysis below. |
| 10 | Concurrent backups corrupt zip file | caught-exact | Two Problems jointly cover the issue: backup-artifact-andlist → "Filename collision risk: minute-precision naming without random discriminator" (P2 major) + supplementary-gaps → "No concurrency locking prevents simultaneous backup writes corruption" (P2 major). concurrent-trigger-cross-feature filed a Question for the empirical concurrent-trigger verdict, which is consistent with the source-pattern Problems filed in the other sessions. |

Miss analysis

Miss 1: Issue 9 — Large database causes memory exhaustion ($wpdb->get_results("SELECT * FROM table"))

Root cause class: Classification drift (Reinforcement 3 partial-fire variant)

Why it escaped:

Reinforcement 3 (the c2 coverage-note literal requirement) fired on both backup-artifact-andlist and selective-export-cluster — the Testers wrote the mandatory literal into coverage_notes. However, the literal was treated as a terminal action rather than a gate to a required Problem entry. The rule in skills/tester-mindset/SKILL.md (Probe scale where it's cheap to do so → Source-pattern rule) is unambiguous: "still file as a Problem (usually minor or major) with rationale … Do NOT downgrade to Question or Improvement because local runtime coped … Static-analysis identification of unbounded iteration is itself sufficient evidence for a real bug class." The Tester wrote the literal; it did not write the Problem.

The gap is a classification drift between two adjacent behaviors that look similar from the inside: (a) "I wrote the c2 literal, the enforcement step is done" and (b) "I wrote the c2 literal AND filed the source-pattern Problem, the enforcement step is done." The literal is the forcing function for the coverage note. It is not a substitute for the Problem. These were conflated.

This is the 4th consecutive run (or near-miss) on Issue 9 across pilots:

  • Pilot 1: under-classified (Mis-filed)
  • Pilot 10: charter never included c2 probe
  • Pilot 17: forcing-function dropout (no Phase 1.5 → no c2 probe at all)
  • Pilot 18 (this run): c2 literal written, Problem not filed

Each prior amendment addressed a different failure mode in the same class. The current rule does not make explicit enough that the coverage-note literal and the Problem filing are two separate, non-substitutable obligations.

Proposed amendment:

  • File: skills/tester-mindset/SKILL.md
  • Section: "Probe scale where it's cheap to do so" → sub-section "Coverage-note literal — MANDATORY when this fallback fires" (the existing sub-section, tighten the enforcement statement)
  • Rule text (ready to paste — replaces the last paragraph of the "Tightening — runtime artifact size is NOT an exemption" block):
**Two-part obligation — BOTH required, NEITHER substitutes for the other**:

When the c2 fallback fires (charter touched a `scale-sensitive` or `artifact-producing` feature and the empirical probe was budget-deferred), you have TWO obligations that must be discharged independently:

1. **Write the c2 literal** in `coverage_notes` (existing rule — this is the map that tells the aggregator a scale-sensitive surface was touched).
2. **File a Problem** citing the source pattern — `class-mb-backup.php:52 — $wpdb->get_results("SELECT * FROM table") without LIMIT loads entire result set into PHP memory; fails at production scale` — at major severity. One PQIP entry, ~1 turn cost. This is a Problem, NOT a Question, NOT an Improvement.

Writing the literal WITHOUT filing the Problem is the exact behavior the enforcement step is designed to prevent. The literal's purpose is to signal "I saw the pattern and handled it." Handling it means: filed as Problem. The literal written with no Problem filed is equivalent to not having discharged the rule at all — the miss class is identical from the answer-key perspective.

**Enforcement check**: if your session's `coverage_notes` contains a c2 literal but your `pqip.problems` array does not contain a Problem citing the scale-sensitive pattern, your session is INCOMPLETE. Write the Problem before ending the session.
  • Generalization check: "This rule would catch any SELECT * / get_posts(-1) / full-directory-scan unbounded-iteration miss on any WordPress plugin that ships export, backup, report, or bulk-data features — contact form submission exporters, WooCommerce order exporters, membership CSV downloaders, sitemap generators. The two-part obligation applies whenever the Tester writes a c2 literal without filing a Problem."

  • Cross-pilot pattern: reinforcement of existing amendment — 4th consecutive pilot with a variant of this miss. Prior amendments (Reinforcement 3, source-pattern rule, artifact-producer expansion) each closed one variant; this closes the "literal-written, Problem-omitted" variant specifically.
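The enforcement check in the proposed rule text lends itself to a mechanical gate at session end. The sketch below assumes a session record shaped like `{"coverage_notes": str, "pqip": {"problems": [{"title": str}, ...]}}`; only the `coverage_notes` and `pqip.problems` names come from the rule text, while the `title` field and the keyword heuristic for "cites the scale-sensitive pattern" are hypothetical.

```python
C2_LITERAL = "scale-sensitive c2 fallback:"

# Heuristic markers for a Problem that cites an unbounded-iteration pattern.
# These are assumptions; a real gate might use an explicit tag instead.
SCALE_MARKERS = ("get_results", "without limit", "unbounded")


def cites_scale_pattern(problem: dict) -> bool:
    text = problem.get("title", "").lower()
    return any(marker in text for marker in SCALE_MARKERS)


def c2_obligation_met(session: dict) -> bool:
    """False when the c2 literal was written but no matching Problem was filed."""
    notes = session.get("coverage_notes", "")
    if C2_LITERAL not in notes:
        return True  # the fallback did not fire, so the rule does not apply
    problems = session.get("pqip", {}).get("problems", [])
    return any(cites_scale_pattern(problem) for problem in problems)


literal_only = {
    "coverage_notes": "scale-sensitive c2 fallback: empirical probe deprioritized out of budget",
    "pqip": {"problems": []},
}
literal_and_problem = {
    "coverage_notes": literal_only["coverage_notes"],
    "pqip": {"problems": [{"title": "$wpdb->get_results without LIMIT loads entire result set"}]},
}

print(c2_obligation_met(literal_only), c2_obligation_met(literal_and_problem))
```

Under this sketch, the Pilot 18 miss corresponds to the `literal_only` case: the literal is present and the gate still reports the session as incomplete.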


Observation: Issue 8 (b4/b7 partial-failure) — near-miss framing note

Issue 8 received a caught-semantically verdict rather than caught-exact. The distinction matters for future amendment design:

The planted issue specifies a precise mechanism: SQL import executes DROP TABLE before the full table body is imported; a truncated SQL file causes tables to be dropped but not recreated. The filed Problem (b4/b7 in restore-destructive-andlist) correctly identifies the no-transaction / no-rollback structural gap and correctly predicts the harm, but it is framed as "partial failure on any failure path" rather than "DROP TABLE fires ahead of CREATE TABLE on every table, so truncation at ANY point mid-file destroys schema state." The precise mechanism was not described in the filing.

The supplementary-gaps charter ran probe 7 (truncated ZIP) and found the ZipArchive layer rejects the truncated file before any extraction occurs — which is actually the CORRECT behavior for a truncated ZIP container. The planted bug is about a truncated SQL FILE inside a valid ZIP (the ZIP is intact; the SQL inside is cut short). These are different code paths and the supplementary charter's probe path did not reach the SQL-level truncation scenario.

No new amendment is proposed for this near-miss — the b4/b7 filing is semantically correct and the issue IS caught. The distinction between "caught-semantically" and a probe gap is noted here as context for future planted-issue design: a DROP-TABLE-before-CREATE mechanism is best surfaced by a probe that uses a syntactically valid ZIP containing a truncated SQL body (not a truncated ZIP itself). Planting it as a distinct probe shape in the restore-destructive-andlist charter's b7 anchor would improve precision without requiring a new amendment.
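The probe shape described above (an intact ZIP whose SQL body is cut short) can be produced with a few lines of Python. This is a sketch under assumptions: the entry name `database.sql` and the sample SQL are hypothetical, since the report does not state how the plugin names or structures its dump. The cut is placed just after a DROP TABLE statement so the archive exercises the DROP-before-CREATE path rather than the ZipArchive rejection path.

```python
import io
import zipfile

# Hypothetical dump contents, for illustration only.
SAMPLE_SQL = (
    "DROP TABLE IF EXISTS wp_posts;\n"
    "CREATE TABLE wp_posts (ID bigint, post_title text);\n"
    "INSERT INTO wp_posts VALUES (1, 'Hello');\n"
)


def build_truncated_sql_zip(full_sql: str, keep_chars: int, sql_name: str = "database.sql") -> bytes:
    """Return a structurally valid ZIP whose SQL entry is cut at keep_chars."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(sql_name, full_sql[:keep_chars])
    return buffer.getvalue()


# Cut partway into the CREATE TABLE statement: DROP survives, CREATE does not.
cut_point = SAMPLE_SQL.index("CREATE TABLE") + 10
probe_zip = build_truncated_sql_zip(SAMPLE_SQL, cut_point)

with zipfile.ZipFile(io.BytesIO(probe_zip)) as archive:
    container_ok = archive.testzip() is None  # None means no corrupt entries
    body = archive.read("database.sql").decode()

print(container_ok, repr(body))
```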


Summary

  • Recall: 9/10 (90%) planted issues caught
  • 1 miss: Issue 9 (large database memory exhaustion — $wpdb->get_results without LIMIT)
  • Miss class: Classification drift — specifically the "literal-written, Problem-omitted" variant of the c2 enforcement gap
  • 1 new amendment proposed: Two-part obligation clarification for the c2 coverage-note + Problem-filing enforcement (appended to existing "Probe scale" section in skills/tester-mindset/SKILL.md)
  • 0 new miss classes observed: All observed miss patterns map to the canonical "Classification drift" class
  • Cross-pilot reinforcements noted:
    • Issue 9 / c2 scale miss: 4th consecutive pilot. Prior amendments — Reinforcement 3 (literal mandatory), source-pattern rule (filing mandatory regardless of local runtime), artifact-producer expansion (applies to non-dedicated scale charters) — each closed one sub-variant. The newly proposed amendment closes the "literal written without accompanying Problem" sub-variant.
  • Stack note: Haiku 4.5 Testers achieved 9/10 recall on a 10-bug answer key. The miss is a rule-following gap (a two-part obligation was read as one-part), not a detection gap — the Tester identified the pattern and acknowledged it in coverage notes. The quality ceiling for Haiku on this plugin is currently at rule-compliance fidelity, not pattern-recognition.

Testing Report — magellan-backups

  • Run ID: 2026-04-29T13-31-55_magellan-backups
  • Generated: 2026-04-29T13:58:19.126Z
  • Plugin version: 1.0.0
  • Sessions processed: 6
  • Sessions with errors: 1


Executive summary

| Category | Count |
| --- | --- |
| Problems | 20 |
| Questions | 4 |
| Improvements | 21 |
| Praises | 13 |

Problem severity breakdown

| Severity | Count |
| --- | --- |
| critical | 4 |
| major | 16 |
| minor | 0 |
| trivial | 0 |

Severity heatmap by area

| Area | Critical | Major | Minor | Trivial | Risk score |
| --- | --- | --- | --- | --- | --- |
| Backup Restore & Delete — safety mechanisms | 0 | 3 | 0 | 0 | 9 |
| Backup artifact location and access control | 1 | 0 | 0 | 0 | 4 |
| Schedule Configuration — email notification delivery | 1 | 0 | 0 | 0 | 4 |
| Selective Export — Users data dump | 1 | 0 | 0 | 0 | 4 |
| restore-function-security | 1 | 0 | 0 | 0 | 4 |
| Backup artifact naming | 0 | 1 | 0 | 0 | 3 |
| Backup artifact completeness | 0 | 1 | 0 | 0 | 3 |
| Backup artifact security (sensitive data exposure) | 0 | 1 | 0 | 0 | 3 |
| Selective export artifact security | 0 | 1 | 0 | 0 | 3 |
| Backup artifact lifecycle | 0 | 1 | 0 | 0 | 3 |
| Backup artifact default blast radius | 0 | 1 | 0 | 0 | 3 |
| Backup creation UI feedback | 0 | 1 | 0 | 0 | 3 |
| Backup Restore & Delete — blast radius | 0 | 1 | 0 | 0 | 3 |
| Schedule Configuration — time selector | 0 | 1 | 0 | 0 | 3 |
| Schedule Configuration — notification email field | 0 | 1 | 0 | 0 | 3 |
| Schedule Configuration — form field state gating | 0 | 1 | 0 | 0 | 3 |
| Selective Export — Options data dump | 0 | 1 | 0 | 0 | 3 |
| backup-concurrency | 0 | 1 | 0 | 0 | 3 |
| Backup artifact security | 0 | 0 | 0 | 0 | 0 |
| Backup security | 0 | 0 | 0 | 0 | 0 |
| Backup lifecycle | 0 | 0 | 0 | 0 | 0 |
| Backup default blast radius | 0 | 0 | 0 | 0 | 0 |
| Backup completeness | 0 | 0 | 0 | 0 | 0 |
| Backup creation UI | 0 | 0 | 0 | 0 | 0 |
| Selective export security | 0 | 0 | 0 | 0 | 0 |
| UI/UX navigation | 0 | 0 | 0 | 0 | 0 |
| Backup operation reliability | 0 | 0 | 0 | 0 | 0 |
| concurrent-trigger-seam (manual backup × cron) | 0 | 0 | 0 | 0 | 0 |
| backup file integrity | 0 | 0 | 0 | 0 | 0 |
| Schedule Configuration — weekly frequency | 0 | 0 | 0 | 0 | 0 |
| Schedule Configuration — form UX | 0 | 0 | 0 | 0 | 0 |
| Schedule Configuration — cron event lifecycle | 0 | 0 | 0 | 0 | 0 |
| Selective Export — data redaction policy | 0 | 0 | 0 | 0 | 0 |
| Selective Export — SQL statement format | 0 | 0 | 0 | 0 | 0 |
| Selective Export — multi-type export | 0 | 0 | 0 | 0 | 0 |
| Selective Export — empty-state handling | 0 | 0 | 0 | 0 | 0 |
| Selective Export — individual export types | 0 | 0 | 0 | 0 | 0 |
| selective-export-import | 0 | 0 | 0 | 0 | 0 |
| restore-security | 0 | 0 | 0 | 0 | 0 |
| upload-validation | 0 | 0 | 0 | 0 | 0 |
| plugin-lifecycle | 0 | 0 | 0 | 0 | 0 |
| restore-resilience | 0 | 0 | 0 | 0 | 0 |

Risk score = 4·critical + 3·major + 2·minor + 1·trivial
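As a worked instance of the formula, the sketch below recomputes a few heatmap rows and the overall total from the severity breakdown. The weights come from the formula above; the function name and dictionary layout are illustrative.

```python
WEIGHTS = {"critical": 4, "major": 3, "minor": 2, "trivial": 1}


def risk_score(counts: dict[str, int]) -> int:
    """Weighted sum of problem counts; severities not listed count as zero."""
    return sum(weight * counts.get(severity, 0) for severity, weight in WEIGHTS.items())


# "Backup Restore & Delete — safety mechanisms": 3 major problems.
print(risk_score({"major": 3}))     # 9
# Any single-critical area, e.g. "Selective Export — Users data dump".
print(risk_score({"critical": 1}))  # 4
# Whole run, from the severity breakdown: 4 critical + 16 major.
print(risk_score({"critical": 4, "major": 16}))  # 64
```

The run-wide total of 64 matches the sum of the per-area scores in the heatmap (9 + 4×4 + 13×3).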

Top problems

1. [CRITICAL] Backup directory (wp-content/magellan-backups/) is web-accessible without authentication

  • Area: Backup artifact location and access control
  • Persona affected: admin
  • Confidence: 1
  • Session: backup-artifact-andlist

Steps to reproduce:

  1. Create a full backup via admin interface
  2. Open a browser (logged out) and navigate to http://[site]/wp-content/magellan-backups/
  3. Attempt to download the backup ZIP file directly by URL

Expected: HTTP 403 Forbidden or directory listing disabled; backup files protected from unauthenticated access

Actual: HTTP 200 OK; backup ZIP file is downloadable without authentication; directory is readable

Evidence: [console](sessions/backup-artifact-andlist/curl -s -o /dev/null -w '%{http_code}' http)

Notes: Backup files contain database dump with password hashes and site structure — exposure is a critical security vulnerability. Missing .htaccess protection or index.php gate.

2. [CRITICAL] Notification email option name mismatch prevents email delivery entirely

  • Area: Schedule Configuration — email notification delivery
  • Persona affected: admin
  • Confidence: 1
  • Session: schedule-feature-cluster

Steps to reproduce:

  1. Enable scheduled backups with a valid email address
  2. Set frequency to Daily and a specific time
  3. Save the schedule
  4. Trigger the scheduled backup cron manually (or wait for scheduled time)
  5. Check mail log or email client for notification email

Expected: Backup completion notification email is sent to the configured email address

Actual: No email is sent. Root cause: form saves to option 'magellan_backups_email' (with 's'), but wp_mail code reads from 'magellan_backup_email' (without 's'). The mismatch means code always reads empty/missing option and never sends email.

Evidence: [console](sessions/schedule-feature-cluster/Source inspection)

Notes: This is a critical defect. The entire email notification feature is non-functional due to option-name typo. Admins who enable scheduled backups with email notification are silently not receiving emails.
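A mismatch of this kind can be caught mechanically by comparing the option names a plugin writes with the names it reads. The following is a rough sketch, assuming the option names appear as string literals in the first argument of `get_option`, `update_option`, or `add_option`. The sample PHP snippet is hypothetical and only mirrors the two names quoted above.

```python
import re

# Matches the first string-literal argument of the WordPress option functions.
OPTION_CALL = re.compile(
    r"\b(get_option|update_option|add_option)\s*\(\s*['\"]([^'\"]+)['\"]"
)


def option_usage(php_source: str) -> tuple[set[str], set[str]]:
    """Return (names_read, names_written) found in a PHP source string."""
    read, written = set(), set()
    for func, name in OPTION_CALL.findall(php_source):
        (read if func == "get_option" else written).add(name)
    return read, written


def mismatched_options(php_source: str) -> tuple[set[str], set[str]]:
    """Names read but never written, and names written but never read."""
    read, written = option_usage(php_source)
    return read - written, written - read


# Hypothetical snippet mirroring the mismatch described above.
sample = """
update_option('magellan_backups_email', $email);
$to = get_option('magellan_backup_email');
"""
print(mismatched_options(sample))
```

Names built dynamically (for example by string concatenation) are invisible to this regex, so an empty result does not prove the absence of a mismatch.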

3. [CRITICAL] Users export includes password hashes in plaintext SQL — credential leakage vulnerability

  • Area: Selective Export — Users data dump
  • Persona affected: admin
  • Confidence: 1
  • Session: selective-export-cluster

Steps to reproduce:

  1. Log in as admin
  2. Navigate to Tools > Backups > Selective Export tab
  3. Check the 'Users' checkbox
  4. Click 'Export Selected'
  5. Download the generated SQL file (e.g., export-2026-04-29-1344.sql)
  6. Open the file in a text editor
  7. Inspect the INSERT INTO wp_users statement

Expected: Users export should NOT include user_pass column or password hashes. The export should either omit the user_pass column entirely or include only the user_login and other non-sensitive columns.

Actual: Users export includes all columns from wp_users table, including the user_pass column with bcrypt-hashed passwords (e.g., '$wp$2y$10$7EwJs4UeeKxDynZK79CUWeF4sEoSeGZOmQYpWbBeRZS5/h0/SzBca'). The export file can be downloaded and shared, exposing password hashes that could be brute-forced offline.

Evidence: [console](sessions/selective-export-cluster/Network requests show successful POST to admin-ajax.php with action=mb_run_export returning 200 OK; SQL file generated and served for download at /wp-content/magellan-backups/export-2026-04-29-1344.sql)

Notes: This is a direct credential leakage vulnerability. While WordPress salts password hashes, offline brute-force attacks are feasible against weak passwords. An admin exporting users to share with a third party or for a migration tool would inadvertently share all password hashes with that party. The charter explicitly flags this as a critical security risk per the mission statement: 'Users export including user_pass columns (sensitive leakage)'. Code pattern: class-mb-export.php line 28 calls export_table($wpdb->users) with no column filtering; export_table() line 52 uses SELECT * and line 59 inserts all values.
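One way to avoid `SELECT *` is to build the column list explicitly and drop sensitive columns. The sketch below illustrates the idea in Python and is not the plugin's code: the helper name is hypothetical, the column list is the stock `wp_users` schema, and treating `user_activation_key` as sensitive is an added assumption.

```python
# Columns of wp_users in a stock WordPress schema.
WP_USERS_COLUMNS = [
    "ID", "user_login", "user_pass", "user_nicename", "user_email",
    "user_url", "user_registered", "user_activation_key",
    "user_status", "display_name",
]

# Hypothetical deny-list; user_activation_key is included as an assumption,
# since it also stores a secret-derived value.
SENSITIVE_COLUMNS = {"user_pass", "user_activation_key"}


def safe_select(table: str, columns: list[str], sensitive: set[str]) -> str:
    """Build a SELECT that names only non-sensitive columns instead of '*'."""
    kept = [c for c in columns if c not in sensitive]
    if not kept:
        raise ValueError("no exportable columns left")
    column_list = ", ".join(f"`{c}`" for c in kept)
    return f"SELECT {column_list} FROM `{table}`"


print(safe_select("wp_users", WP_USERS_COLUMNS, SENSITIVE_COLUMNS))
```

The generated INSERT statements would then need the same explicit column list, so that the import side does not expect a value for the omitted columns.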

4. [CRITICAL] ZIP-slip path traversal vulnerability in restore function

  • Area: restore-function-security
  • Persona affected: admin
  • Confidence: 0.95
  • Session: supplementary-gaps

Steps to reproduce:

  1. Create a ZIP file containing an entry with path like 'wp-content/../../../etc/passwd' or 'wp-content/plugins/../../../../../../wp-config.php'
  2. Upload via Upload & Restore interface
  3. Verify the file is extracted outside wp-content directory

Expected: Plugin rejects the ZIP or restricts extraction strictly to wp-content/ subdirectories

Actual: The strpos($name, 'wp-content/') === 0 filter passes entries like 'wp-content/../../../etc/passwd'. Upon extraction, substr removes 'wp-content/' prefix, leaving '/../../../etc/passwd', which PHP resolves to absolute path outside intended directory.

Evidence: [console](sessions/supplementary-gaps/PHP test confirmed 'wp-content/../../../etc/passwd' passes the filter (strpos returns 0); would extract outside wp-content with substr operation)

Notes: This is a textbook zip-slip vulnerability. The fix requires canonicalizing paths before comparing or using basename/dirname filtering instead of string prefix matching.
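The difference between prefix matching and canonicalization can be shown in a few lines. This is a Python sketch of the logic, not the plugin's PHP: `naive_filter` mirrors the reported `strpos(...) === 0` check, and `safe_filter` is a hypothetical replacement that normalizes the entry name first.

```python
import posixpath

PREFIX = "wp-content/"


def naive_filter(entry_name: str) -> bool:
    """Mirror of the reported check: accept anything starting with the prefix."""
    return entry_name.startswith(PREFIX)


def safe_filter(entry_name: str) -> bool:
    """Accept only entries that still sit under wp-content/ after normalization."""
    if entry_name.startswith("/") or "\\" in entry_name:
        return False
    normalized = posixpath.normpath(entry_name)
    return normalized == "wp-content" or normalized.startswith(PREFIX)


for name in ("wp-content/themes/a/style.css", "wp-content/../../../etc/passwd"):
    print(name, naive_filter(name), safe_filter(name))
```

Both filters accept the legitimate theme file, but only `safe_filter` rejects the traversal entry, because normalization collapses `wp-content/..` before the prefix is compared. A production fix would additionally verify the final resolved path on disk against the target directory.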

5. [MAJOR] Backup ZIP omits wp-content/uploads/ despite 'Full Backup' label claim

  • Area: Backup artifact completeness
  • Persona affected: admin
  • Confidence: 1
  • Session: backup-artifact-andlist

Steps to reproduce:

  1. Create a full backup
  2. Unzip the backup file and inspect directory structure: unzip -l backup-*.zip | grep -i uploads

Expected: wp-content/uploads/ directory present in backup ZIP (as promised by 'Full Backup' label)

Actual: wp-content/uploads/ is absent from ZIP; backup includes database, themes, plugins but NOT uploads

Evidence: [console](sessions/backup-artifact-andlist/unzip -l backup-2026-04-29-1343.zip | grep -i uploads returned no matches)

Notes: Uploads directory is typically the largest and most user-critical data (media files, images). Omission violates completeness claim and leaves users with incomplete recovery data.
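The `unzip -l ... | grep` check above can also be expressed as a small automated assertion. The sketch below uses Python's standard `zipfile` module. The in-memory archive and its entry names are hypothetical stand-ins for a real backup file and only mirror the layout described in the finding.

```python
import io
import zipfile


def has_entries_under(zip_bytes: bytes, prefix: str) -> bool:
    """True if the archive lists at least one entry below the given prefix."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return any(name.startswith(prefix) for name in archive.namelist())


def build_sample_zip(names: list[str]) -> bytes:
    """Build a small in-memory archive; stands in for a real backup file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "placeholder")
    return buffer.getvalue()


# Hypothetical layout mirroring the finding: database, themes, plugins, no uploads.
sample = build_sample_zip([
    "database.sql",
    "wp-content/themes/example/style.css",
    "wp-content/plugins/example/example.php",
])
print(has_entries_under(sample, "wp-content/uploads/"))
```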

6. [MAJOR] Progress bar element hardcoded to 100% width; shows no dynamic progress feedback

  • Area: Backup creation UI feedback
  • Persona affected: admin
  • Confidence: 1
  • Session: backup-artifact-andlist

Steps to reproduce:

  1. Navigate to Backup & Restore tab
  2. Inspect the progress bar HTML: mb-progress-fill element
  3. Click 'Create Full Backup' and observe progress bar appearance

Expected: Progress bar starts at 0%, increments to 100% as backup progresses; user sees live feedback

Actual: Progress bar element has inline style='width: 100%;' hardcoded; does not update during backup operation (visible in screenshot immediately after button click)

Evidence: · [console](sessions/backup-artifact-andlist/page.locator('.mb-progress-fill').getAttribute('style') returned 'width)

Notes: Per the progress-indicator oracle (recon S4): hardcoded terminal state without dynamic update is a major UI defect. Admin receives no feedback on backup completion status and may click multiple times, triggering duplicate backups.

7. [MAJOR] Time selector save-roundtrip bug: 24-hour input stored as 12-hour format, re-renders incorrectly on reload

  • Area: Schedule Configuration — time selector
  • Persona affected: admin
  • Confidence: 1
  • Session: schedule-feature-cluster

Steps to reproduce:

  1. Navigate to Schedule tab
  2. Enable scheduled backups
  3. Select a non-default time from dropdown (e.g., 13:00)
  4. Click Save Schedule
  5. Hard reload the page
  6. Observe time selector value

Expected: Time selector displays the previously selected value (13:00)

Actual: After reload, time selector defaults to 00:00 (no matching option for stored '1:00 PM' / '12:00 AM' format). Selected time is lost; form snaps to default.

Evidence: · console

Notes: Root cause: form dropdown uses 24h format (00:00–23:00) but storage uses 12h format (1:00 PM / 12:00 AM). The mismatch causes the form to fail to pre-populate the dropdown with the stored value. Admin's scheduled backup time reverts to midnight (00:00) on every page load.
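The conversion that the form would need in order to pre-select the stored value is small. Here is a sketch in Python, assuming the stored strings look like '1:00 PM' and the dropdown values look like '13:00', as described above. The function names are illustrative.

```python
from datetime import datetime


def to_24h(stored: str) -> str:
    """Convert a 12-hour value such as '1:00 PM' to the dropdown's 'HH:MM' form."""
    return datetime.strptime(stored.strip(), "%I:%M %p").strftime("%H:%M")


def to_12h(selected: str) -> str:
    """Convert a 24-hour 'HH:MM' value to a '1:00 PM' style string."""
    parsed = datetime.strptime(selected.strip(), "%H:%M")
    hour = parsed.hour % 12 or 12
    suffix = "AM" if parsed.hour < 12 else "PM"
    return f"{hour}:{parsed.minute:02d} {suffix}"


for option in ("00:00", "13:00", "23:00"):
    assert to_24h(to_12h(option)) == option  # round-trip must be lossless
```

Storing the 24-hour value directly would remove the need for either conversion. The sketch only shows that a lossless mapping exists if the 12-hour storage format is kept.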

8. [MAJOR] Dependent form fields remain editable when Enable toggle is OFF

  • Area: Schedule Configuration — form field state gating
  • Persona affected: admin
  • Confidence: 1
  • Session: schedule-feature-cluster

Steps to reproduce:

  1. Navigate to Schedule tab with Enable toggle already ON (from prior save)
  2. Reload page
  3. Uncheck the 'Enable Scheduled Backups' toggle
  4. Attempt to interact with Frequency, Time, and Email fields (should be disabled)
  5. Verify whether fields are still editable

Expected: When Enable is OFF, dependent fields (Frequency, Time, Email) should be disabled and visually grayed out. User should not be able to modify them.

Actual: Dependent fields remain enabled and editable even when Enable toggle is OFF. User can populate Frequency, Time, Email with toggle OFF and save. On re-enable, previously entered dependent values persist.

Evidence: console

Notes: This is a logic gap (STEP 8.9 mandatory probe). Toggle-state does not gate field state. Allows creation of partially-valid configuration when scheduling is disabled, which may lead to confusion when re-enabling.

9. [MAJOR] Restore does not preserve post-backup content; overwrites entire database and wp-content (b-default-scope)

  • Area: Backup Restore & Delete — blast radius
  • Persona affected: admin
  • Confidence: 0.95
  • Session: restore-destructive-andlist

Steps to reproduce:

  1. Create a backup (Backup A)
  2. Create a new post titled 'Post After Backup'
  3. Verify the post is in the site: wp post list
  4. Click Restore to restore Backup A
  5. Confirm restore in the dialog
  6. Check post list again

Expected: Backup A contains the state at the time it was created. Restoring it should restore that state, overwriting any content created after the backup.

Actual: Post created after backup (ID 7, 'Post After Backup') is successfully removed by restore. Restore fully replaces database and wp-content. Any content created after the backup is permanently destroyed. This is the expected behavior but must be explicitly confirmed and clearly warned to the user.

Evidence: console

Notes: Expected behavior but critical for user awareness. This is why b1 (confirmation dialog) is important. The confirmation dialog does mention 'This will overwrite your current site', which partially mitigates this risk, but no detailed preview is provided.

10. [MAJOR] Empty notification email field accepted without validation

  • Area: Schedule Configuration — notification email field
  • Persona affected: admin
  • Confidence: 0.95
  • Session: schedule-feature-cluster

Steps to reproduce:

  1. Navigate to Schedule tab
  2. Enable scheduled backups
  3. Clear the notification email field (empty string)
  4. Click Save Schedule
  5. Verify that form accepted the empty value

Expected: Form should either show a validation error ('Email is required') or at minimum warn the admin that no email will be sent

Actual: Form silently accepts empty email and saves with no error message or warning. No email option is created in the database.

Evidence: console

Notes: This is a UX Problem (STEP 8.9 mandatory probe). Required field should not silently accept empty input. Admin may not realize backup notifications are misconfigured.
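The validation rule implied by the expected behavior can be sketched as follows. This is an illustration in Python, not the plugin's handler: the function name is hypothetical, and the email pattern is deliberately loose (a WordPress implementation would more likely rely on `is_email()`).

```python
import re

# Deliberately loose shape check, used here only for illustration.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def schedule_errors(enabled: bool, email: str) -> list[str]:
    """Return validation messages for the schedule form; empty list means valid."""
    if not enabled:
        # Dependent fields are irrelevant while scheduling is off.
        return []
    email = email.strip()
    if not email:
        return ["Email is required"]
    if not EMAIL_SHAPE.match(email):
        return ["Email address is not valid"]
    return []


print(schedule_errors(True, ""))
print(schedule_errors(True, "admin@example.com"))
```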

Needs human review (confidence < 0.7)

None.

Questions raised

  • [concurrent-trigger-seam (manual backup × cron)] When manual 'Create Full Backup' button and cron event 'mb_scheduled_backup' fire within the same calendar minute (both targeting backup-YYYY-MM-DD-HHmm.zip filename), does the plugin implement a lock, mutex, or collision-detection mechanism to prevent race condition overwrites?
    • Why it matters: If two concurrent writes to the same file occur without synchronization, the resulting backup archive could be truncated, corrupt, or overwritten mid-operation. This would violate the backup plugin's primary reliability guarantee — that backups are valid, intact, and recoverable.
  • [Backup Restore & Delete — safety mechanisms] Recon S1 claimed Delete has no confirmation dialog. Does empirical probe confirm or refute this?
    • Why it matters: Recon is a map, not a hunter. If recon's initial claim about Delete was wrong, subsequent AND-list verdicts may also need re-evaluation. Clarifying this tells us whether recon's other observations (e.g., on Restore UI) are reliable.
  • [Schedule Configuration — weekly frequency] Does the weekly cron schedule fire on a user-configurable day of the week, or on a hardcoded day?
    • Why it matters: Recon noted no day-of-week selector UI. If weekly backups fire on Monday only (regardless of user preference), the feature is less useful and may not match user expectations.
  • [selective-export-import] Does export × import round-trip handle duplicate primary keys gracefully?
    • Why it matters: Source shows export generates plain INSERT statements without DELETE. If the same posts are re-imported, MySQL would reject on duplicate key. Understanding whether the plugin silently skips failed INSERTs or fails catastrophically affects data integrity.
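The duplicate-key question in the last item can be explored without a WordPress install. The sketch below uses SQLite as a stand-in so that it runs without a server. That substitution is an assumption for illustration only: MySQL spells the idempotent form `INSERT IGNORE`, whereas SQLite uses `INSERT OR IGNORE`. The table and row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (ID INTEGER PRIMARY KEY, post_title TEXT)")

export_statement = "INSERT INTO posts (ID, post_title) VALUES (7, 'Post After Backup')"
conn.execute(export_statement)  # first import succeeds


def second_import_outcome(statement: str) -> str:
    """Run a statement against the already-populated table and report the result."""
    try:
        conn.execute(statement)
        return "ok"
    except sqlite3.IntegrityError:
        return "duplicate-key error"


plain = second_import_outcome(export_statement)
idempotent = second_import_outcome(
    export_statement.replace("INSERT INTO", "INSERT OR IGNORE INTO")
)
row_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(plain, idempotent, row_count)
```

This shows only the database-level behavior. Whether the plugin's import path surfaces such an error to the admin or swallows it still needs the empirical probe the question asks for.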

Suggested improvements

  • [Backup artifact security] Add .htaccess protection to wp-content/magellan-backups/ directory (effort: low) (impact: high)
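    • Rationale: Deny unauthenticated access via .htaccess ('Require all denied' on Apache 2.4, 'Deny from all' on Apache 2.2) to prevent unauthenticated backup downloads. Servers that ignore .htaccess, such as nginx, need an equivalent rule in the server configuration.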
  • [Backup artifact naming] Use second-precision or random suffix in backup filenames to prevent collision (effort: low) (impact: medium)
    • Rationale: Change naming pattern from backup-YYYY-MM-DD-HHmm.zip to backup-YYYY-MM-DD-HHmmss-RANDOM.zip to avoid filename collision when multiple backups trigger in same minute
  • [Backup security] Exclude wp_users password hashes from backup SQL dumps (effort: medium) (impact: medium)
    • Rationale: Modify mysqldump or SQL export to redact user_pass column or omit table entirely to prevent password hash exposure in backups
  • [Backup lifecycle] Implement retention policy: keep last N backups, auto-delete older files (effort: medium) (impact: high)
    • Rationale: Add configurable retention setting (e.g., keep_last_n_backups = 5) and delete older files on backup creation or cron to prevent unbounded disk usage
  • [Backup default blast radius] Do NOT register backup cron on activation; wait for explicit user enable (effort: low) (impact: medium)
    • Rationale: Move cron registration to settings save handler; only activate when 'Enable scheduled backups' checkbox is checked to prevent unexpected disk usage
  • [Backup completeness] Include wp-content/uploads/ in 'Full Backup' ZIP (or rename to 'Partial Backup') (effort: medium) (impact: high)
    • Rationale: Either add uploads directory to full backup or change UI label to accurately reflect contents; uploads is the most user-critical data
  • [Backup creation UI] Implement real-time progress feedback for backup creation (effort: medium) (impact: medium)
    • Rationale: Use AJAX to update progress bar width dynamically from 0% to 100% as backup progresses; show item counts and current operation for operator feedback
  • [Selective export security] Exclude or redact sensitive wp_usermeta (session tokens) from selective exports (effort: medium) (impact: medium)
    • Rationale: When exporting Users, exclude wp_usermeta entries for session_tokens and redact IP/user-agent data to prevent session hijacking attacks
  • [Backup Restore & Delete — safety mechanisms] Add pre-restore automatic snapshot (effort: medium) (impact: high)
    • Rationale: Before executing restore, automatically create a backup of the current site state. This provides a 1-click rollback path if the admin realizes they restored the wrong backup or the backup is corrupt.
  • [Backup Restore & Delete — safety mechanisms] Add preview/diff view for Restore (effort: high) (impact: medium)
    • Rationale: Show a summary before restore: backup date, post count, total file size, last modified timestamp. Let admin see what they're about to overwrite.
  • [Backup Restore & Delete — safety mechanisms] Wrap Restore in a transaction (effort: medium) (impact: high)
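    • Rationale: Use database transactions (START TRANSACTION ... COMMIT) around the SQL import. If file restoration fails, roll back the database changes to avoid inconsistent state. Note that MySQL implicitly commits on DDL statements such as DROP TABLE and CREATE TABLE, so a dump that recreates tables cannot be rolled back this way.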
  • [Backup Restore & Delete — safety mechanisms] Add checkpointing for large restores (effort: high) (impact: medium)
    • Rationale: Track restore progress (e.g., 'restored 500 of 1000 files'). If interrupted, admin can resume rather than retry from scratch.
  • [Schedule Configuration — form UX] Standardize time storage format. Either: (a) store and display times in 24-hour format (00:00–23:00), or (b) convert between formats cleanly without mismatch. (effort: low) (impact: high)
    • Rationale: Current mismatch (24h dropdown vs 12h storage) breaks roundtrip. Standardization would fix problem 7 (time selector save-roundtrip) and improve admin confidence that their settings persisted.
  • [Schedule Configuration — form UX] Add form validation: require notification email field when scheduling is enabled. Show client-side validation error ('Email is required') on empty field. (effort: low) (impact: medium)
    • Rationale: Prevents accidental misconfiguration and makes the requirement explicit. Would address problem 10 (empty notification email accepted).
  • [Schedule Configuration — form UX] Disable dependent form fields (Frequency, Time, Email) when Enable toggle is OFF. Gray them out visually and prevent user input. (effort: low) (impact: medium)
    • Rationale: Makes state intent clear and prevents partially-valid configurations. Addresses problem 8 (dependent fields editable when Enable is OFF).
  • [Schedule Configuration — weekly frequency] Add day-of-week selector UI when frequency is set to Weekly (effort: medium) (impact: medium)
    • Rationale: Current UI offers no way for admin to choose which day of the week backups run. Feature is less useful without this option.
  • [Selective Export — data redaction policy] Implement a whitelist of allowed options and columns for each export type. Users export should omit the user_pass column or replace hash values with a placeholder. Options export should exclude authentication keys (auth_key, auth_salt, logged_in_key, logged_in_salt, nonce_key, nonce_salt) and other sensitive options (siteurl, admin_email only if configured as private). (effort: medium) (impact: high)
    • Rationale: Selective exports are designed to share subsets of site data. Exporting password hashes and cryptographic keys defeats the purpose and introduces security risks. A curated export that omits sensitive columns is more useful and safer for legitimate use cases (migrations, testing, shared backups).
  • [Selective Export — SQL statement format] Modify the INSERT statements to use INSERT IGNORE or ON DUPLICATE KEY UPDATE to make re-imports idempotent. Current format (plain INSERT INTO ... VALUES, with primary-key values included via SELECT *) is expected to fail with duplicate-key errors if the same export is imported twice; this is inferred from source and was not empirically probed. (effort: low) (impact: medium)
    • Rationale: Users expect exports to be re-importable without side effects. With the current implementation, a second import would be rejected or only partially applied, which leaves the target site in an unclear state.
  • [restore-security] Use basename() or safe path joining for ZIP extraction (effort: low) (impact: high)
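    • Rationale: Replace the strpos prefix filter with path canonicalization: normalize each entry name, then reject any entry whose resolved path falls outside wp-content/. basename() alone would also block traversal, but it discards the directory structure, so it only suits flat archives. Current filter is vulnerable to patterns like 'wp-content/../../../etc/passwd'.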
  • [backup-concurrency] Add transient-based locking to backup operations (effort: low) (impact: high)
    • Rationale: Wrap backup write operations with WP transient locks (set_transient, get_transient) to serialize concurrent backup attempts and prevent file corruption when manual backup and cron backup execute simultaneously.
  • [selective-export-import] Use INSERT ... ON DUPLICATE KEY UPDATE or truncate before import (effort: low) (impact: medium)
    • Rationale: Either generate INSERT ... ON DUPLICATE KEY UPDATE statements in the export, or DELETE matching records before INSERT during re-import to ensure idempotent round-trip behavior.
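The filename-collision suggestion in the list above can be illustrated with a short sketch. It is written in Python for illustration: the helper names and timestamps are hypothetical, and the random suffix length is an arbitrary choice.

```python
import secrets
from datetime import datetime, timezone


def minute_name(now: datetime) -> str:
    """Current pattern as described in the report: minute precision only."""
    return now.strftime("backup-%Y-%m-%d-%H%M.zip")


def unique_name(now: datetime) -> str:
    """Suggested pattern: second precision plus a short random suffix."""
    return now.strftime("backup-%Y-%m-%d-%H%M%S") + "-" + secrets.token_hex(4) + ".zip"


# Hypothetical triggers: a manual click and a cron run 35 seconds apart.
manual = datetime(2026, 4, 29, 13, 43, 5, tzinfo=timezone.utc)
cron = datetime(2026, 4, 29, 13, 43, 40, tzinfo=timezone.utc)

print(minute_name(manual) == minute_name(cron))  # same minute, same filename
print(unique_name(manual) == unique_name(cron))  # seconds differ, names differ
```

Unique names avoid overwriting but do not serialize the two backups. The locking suggestion addresses that separately.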

What works well (praises)

  • [UI/UX navigation] Clear tab-based UI navigation between Backup & Restore, Selective Export, and Schedule
    • Why: Tabs are well-organized and easy to navigate; users can quickly find the feature they need
  • [Backup operation reliability] Successful backup operation and file listing in Existing Backups table with download/delete/restore actions
    • Why: Backup creation completes reliably and new entries appear in the table with clear action buttons
  • [backup file integrity] Backup archive produced by concurrent-firing test passed full integrity verification (unzip -t: no corruption, no truncation detected)
    • Why: Even under race-condition probing, the resulting file was syntactically valid and complete, suggesting either robust collision handling OR fast mutual exclusion. Reliability appears preserved in observed scenario.
  • [Backup Restore & Delete — safety mechanisms] Delete operation has proper nonce protection
    • Why: DELETE links include _wpnonce parameter, preventing CSRF attacks. Malicious sites cannot trick admin into deleting backups.
  • [Backup Restore & Delete — safety mechanisms] Restore operation is properly capability-gated
    • Why: Both Delete and Restore check current_user_can('manage_options'), preventing subscribers and lower-privilege users from accessing these destructive operations.
  • [Backup Restore & Delete — safety mechanisms] Restore confirmation dialog is clear and informative
    • Why: Dialog includes both backup filename and warning: 'This will overwrite your current site.' Informs admin of the operation's destructive nature.
  • [Schedule Configuration — cron event lifecycle] Cron event properly unregistered when scheduling is disabled
    • Why: When admin disables scheduled backups (unchecks Enable toggle), the mb_scheduled_backup cron event is correctly unscheduled. No ghost cron event persists after disable. Verified via CLI: cron event list shows mb_scheduled_backup present when enabled, absent when disabled.
  • [Selective Export — multi-type export] Multi-checkbox export correctly generates a single SQL file containing all selected content types in separate table sections.
    • Why: Form submission and AJAX handling work as expected. Users can select Posts, Pages, Users, and Options simultaneously and receive a complete export without loss of content or mismatching of types.
  • [Selective Export — empty-state handling] Exporting a content type with zero matching records generates a valid SQL file with a '-- (empty)' comment.
    • Why: The behavior is correct and informative instead of returning a PHP error or silently failing, which provides clear feedback to the user about the state of the export.
  • [Selective Export — individual export types] Posts, Pages, Users, and Options exports individually generate correct SQL files with all matching records.
    • Why: The export_table() function correctly filters posts by post_type and includes all relevant rows, ensuring data completeness for each export type.
  • [upload-validation] File type validation is robust
    • Why: Plugin correctly rejects non-ZIP files and truncated ZIPs via ZipArchive::open() error handling. Fake ZIP files (text renamed to .zip) are properly rejected with appropriate error code (error 19 for non-ZIP, 35 for truncated).
  • [plugin-lifecycle] Plugin lifecycle (activation/deactivation) is clean
    • Why: Cron events are properly registered on activation and cleaned up on deactivation. Backup files are preserved across deactivation/reactivation cycles, which is correct behavior for preserving user data.
  • [restore-resilience] Corrupted ZIP file handling prevents partial restoration
    • Why: Truncated or damaged ZIP files are correctly rejected before any extraction occurs, preventing partial overwrite of site state. ZipArchive error handling is appropriate and prevents cascading corruption.

Coverage gaps

Session Status Turns Flows Notes
backup-artifact-andlist complete 12/12 7/7 All 8 AND-list anchors (a1–a6, a3-selective-export, a-progress) probed. Full backup creation flow executed via browser, artifact contents inspected via CLI unzip. Selective Users export tested and SQL inspected. Access control tested via curl. Progress bar element examined inline. Scale-sensitive c2 fallback: empirical probe deprioritized out of budget; source pattern identified at includes/class-mb-backup.php — ZIP archive creation uses native PHP zip extension without streaming, risking OOM on large datasets.
concurrent-trigger-cross-feature complete 10/10 2/3 Cross-feature interaction probed: manual backup × cron backup → concurrent trigger within same minute produces SINGLE file output, suggesting either (a) collision detection prevents dual writes, (b) locking mechanism present, or (c) click did not initiate second backup. File integrity verified (unzip -t passed). SFDPOT Operations dimension probed via concurrent timing test. Turn budget exhausted after primary CT-1 probe execution.
schedule-feature-cluster complete 7/8 5/6 All mandatory Step 8.9 probes executed (SCH-3: empty-required-fields, SCH-4: toggle-state-leak). SCH-1 cron disable verified. SCH-2 save-roundtrip bug confirmed (24h dropdown vs 12h storage mismatch). SCH-5 email never sends due to option-name bug (reads from magellan_backup_email, saves to magellan_backups_email). SCH-6 recon disposition: no day-of-week selector present; cron frequency stored and fires correctly (daily/weekly).

Invalid / failed session reports

recon

  • No report.json produced

Token usage & cost

Computed from Claude Code transcripts at ~/.claude/projects/<proj-hash>/. Rates from config/pricing.json. Window: 2026-04-29T13:31:55Z to 2026-04-29T13:58:18Z (with ±10min buffer for dispatch drift).

Estimated total cost for this run: $18.59

Category Cost % of total
Fresh input $0.06 0.3%
Output $2.23 12.0%
Cache-create (5m) $3.85 20.7%
Cache-create (1h) $3.01 16.2%
Cache-read $9.45 50.8%
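The category rows can be cross-checked against the stated total with a few lines of arithmetic. The sketch below uses only the figures shown in the table, expressed in cents to avoid floating-point noise. The variable and function names are illustrative.

```python
# Category costs from the table above, in cents.
category_cents = {
    "Fresh input": 6,
    "Output": 223,
    "Cache-create (5m)": 385,
    "Cache-create (1h)": 301,
    "Cache-read": 945,
}
reported_total_cents = 1859  # "$18.59" as stated in the report


def shares(costs: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of the total for each category, rounded to one decimal."""
    return {name: round(100 * cents / total, 1) for name, cents in costs.items()}


# Difference in cents between the sum of the rows and the reported total.
drift = sum(category_cents.values()) - reported_total_cents
print(drift)
print(shares(category_cents, reported_total_cents))
```

A drift of a cent or so would be consistent with each row having been rounded independently before display, though the report does not state how the rounding was done.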

Manager (main conversation)

Total: $9.10

Model Messages Input Output Cache-5m Cache-1h Cache-read Cost
claude-sonnet-4-6 99 152 58,863 0 466,806 12,620,553 $7.47
claude-opus-4-7 11 21 15,271 0 20,858 2,076,586 $1.63

Subagents (9 invocations)

Total: $9.49

Model Messages Input Output Cache-5m Cache-1h Cache-read Cost
claude-haiku-4-5-20251001 605 37,629 105,911 1,196,247 0 42,842,252 $6.35
claude-sonnet-4-6 41 6,990 28,726 628,519 0 1,125,608 $3.15
Per-subagent breakdown (9 sessions)
Agent ID Type Models Cost
a200403bc17823152 tester claude-haiku-4-5-20251001 $0.71
a2cc4dafae35fb557 general-purpose claude-sonnet-4-6 $1.38
a55bf0bcce07aa0ab tester claude-haiku-4-5-20251001 $0.66
a5da4e18115aee151 tester claude-haiku-4-5-20251001 $0.65
a5f56e757a8e45177 planner-sonnet claude-sonnet-4-6 $1.77
a6c2bc0c8d753d4d8 tester claude-haiku-4-5-20251001 $1.24
a897625f739ad7ff0 tester claude-haiku-4-5-20251001 $0.75
a925ef76f8c69489e tester claude-haiku-4-5-20251001 $0.87
ad8a4031dd4a189d7 tester claude-haiku-4-5-20251001 $1.48

Recommended next steps

  1. Triage Backup Restore & Delete — safety mechanisms first — highest risk score (9)
  2. Address 4 critical problem(s) before release
  3. Follow up on 3 session(s) with incomplete coverage
  4. Investigate 1 session(s) that failed to produce valid reports