- 3 hypotheses silently skipped / deprioritized (progress-indicator empirical probe ×2, concurrent-op b7)
- 1 charter never executed (breadth-tour: status=pending, no session directory)
- 0 surfaces from recon not addressed by any charter
- 3 AND-list items scored from source inspection without empirical probe (b3: dry-run check; b4: transaction check; b5: undo check)
- 1 round-trip probe gap (restore × pre-snapshot: empirical probe incomplete — only source-inspection used)
- 3 Questions filed with inadequate empirical probe documentation
- 2 forcing-function strings missing from sessions
backup-artifact-andlist
- progress-indicator behavior (recon S1): deprioritized; turn budget cited. This hypothesis was marked MANDATORY in the breadth-tour charter (BT1). With breadth-tour never dispatched, no session produced an empirical progress-bar probe. HIGH: the progress bar being hardcoded at 100% was already flagged by recon S1 and confirmed in restore-delete-destructive-andlist P5 (planted-bug signal), yet no session emitted the required `progress-indicator probed:` coverage-note string.
- scale-sensitive c2 fallback: deprioritized with rationale ("empirical probe at actual site size completed successfully"). Rationale is sound. LOW.
restore-delete-destructive-andlist
- b7 (concurrent destructive ops): deprioritized, rationale is sound (infeasible in 12-turn budget, architectural note logged). LOW.
- b3 (dry-run mode), b4 (transaction), b5 (undo path): verdicted as "probed" but evidence strings are pure source inspection ("Source inspection: admin.js lines 22-38…"; "Source inspection: class-mb-restore.php line 71…"). No empirical probe documented. HIGH: the source-only-Question drift rule applies; these are filed Problems with source-inspection-only evidence.
schedule-settings-cluster
- H4 (toggle × dependent fields): verdicted "inconclusive" with rationale. Acceptable given that the P1 time-format mismatch blocked full verification. LOW.
manual-cron-cross-feature
- Enable-toggle cron-gating probe (P2): `empirical-confirmation-blocked` explicitly stated; source-pattern fallback applied. Flagged as a Question (Q1) in the session. This is an empirically blocked probe with explicit rationale, which is acceptable. LOW.
Skipped — source_path was omitted from the mission; Phase 1.5 was not run. No static-analysis.md exists for this run.
All five recon surprises (S1–S5) were anchored in charters:
- S1 (progress bar) → BT1 in breadth-tour (not executed — see Check below).
- S2 (backup directory security) → a1 in backup-artifact-andlist ✓
- S3 (AJAX instantaneous / progress feedback) → deprioritized in backup-artifact-andlist ✓
- S4 (delete CSRF) → probed in restore-delete-destructive-andlist ✓
- S5 (sensitive data in exports) → probed in selective-export-artifact-andlist ✓
Gap: S1 (progress bar hardcoded at 100%) was never empirically probed. The breadth-tour charter is the only session mandated to run the progress-indicator browser probe. That charter is status: pending. HIGH.
restore-delete-destructive-andlist — b6 (capability check)
- b6 scored "refuted" (capability check passes). Evidence: source inspection of `class-mb-admin.php` line 119 + empirical test with editor user on the admin page. Only the admin page access gate was tested.
- The plugin also exposes `wp_ajax_mb_run_backup` (AJAX), `wp_ajax_mb_run_restore` (AJAX), and `admin_post_mb_delete_backup` (POST form). The b6 probe tested the admin page access gate for the delete handler, but did NOT enumerate whether `wp_ajax_mb_run_backup` and `wp_ajax_mb_run_restore` independently check `manage_options` or rely solely on the page-level gate.
- Aggregate scoring risk: HIGH. The b6 verdict was applied to the page-level gate only. The AJAX handlers are separate write paths reachable without the page gate by a low-privilege user with direct `admin-ajax.php` access. This is the classic per-path gap the AND-list was designed to catch.
backup-artifact-andlist — a1 (public access)
- a1 tested via curl to a known filename, which is a single path. The plugin serves both ZIP (full backup) and SQL (selective export) artifacts from the same directory. The `backup-artifact-andlist` a1 probe used a `.zip` URL; `selective-export-artifact-andlist` independently probed a `.sql` URL. Cross-charter, both paths are covered. OK.
Backup × Restore round-trip
- `backup-artifact-andlist` produced a backup and confirmed its contents. `restore-delete-destructive-andlist` tested restore from an existing backup. However, no session ran a full round-trip identity probe: "create backup → restore from it → verify site state matches pre-backup state." The restore session only confirmed (a) that no pre-op snapshot is taken (b2) and (b) that there is no transaction wrapping (b4, source-only). It did NOT verify that a successful restore actually returns the site to a correct prior state (correctness of the restore output). HIGH: restore correctness is the core contract of a backup plugin.
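The round-trip identity probe described above amounts to comparing a snapshot taken before the backup with one taken after the restore. A minimal sketch follows, assuming the site state can be captured as a JSON-serializable dictionary; the snapshot contents and the names `fingerprint` and `round_trip_verdict` are hypothetical and are not part of the plugin or of any session.

```python
import hashlib
import json


def fingerprint(state: dict) -> str:
    """Return a stable SHA-256 digest of a JSON-serializable state snapshot."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def round_trip_verdict(pre_backup: dict, post_restore: dict) -> str:
    """Compare the pre-backup snapshot with the post-restore snapshot."""
    if fingerprint(pre_backup) == fingerprint(post_restore):
        return "restored-state-matches"
    # Name the differing top-level keys so the finding is content-level,
    # not just a status-level "restore ran without crashing".
    differing = sorted(
        key
        for key in set(pre_backup) | set(post_restore)
        if pre_backup.get(key) != post_restore.get(key)
    )
    return "mismatch: " + ", ".join(differing)


# Hypothetical snapshots; a real probe would read these from the site itself.
pre = {"posts": ["hello-world"], "options": {"blogname": "Test Site"}}
after_restore_ok = {"posts": ["hello-world"], "options": {"blogname": "Test Site"}}
after_restore_bad = {"posts": ["hello-world"], "options": {"blogname": "Changed"}}

print(round_trip_verdict(pre, after_restore_ok))
print(round_trip_verdict(pre, after_restore_bad))
```

The probe only has evidentiary value if the site is modified between the backup and the restore; otherwise a restore that does nothing would also produce a match.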
Export × Import round-trip
- No import/re-import path exists in the plugin (selective export is one-way SQL). Not applicable.
Schedule save × reload
- Probed in `schedule-settings-cluster` (H1). Round-trip identity tested empirically. Time-format mismatch confirmed. Covered.
restore-delete-destructive-andlist:
- b3 (dry-run mode): evidence is "Source inspection: admin.js lines 22-38 and class-mb-restore.php show no preview/dry-run logic; UI inspection shows only confirm() dialog." No empirical probe that a user cannot reach a dry-run mode. Verdict: source-only.
- b4 (transaction): evidence is "Source inspection: class-mb-restore.php line 71 uses $wpdb->query() in loop without BEGIN TRANSACTION." No empirical probe inducing a mid-restore failure. Verdict: source-only.
- b5 (undo path): evidence is "Source inspection: no undo mechanism in code." No empirical probe navigating post-restore UI. Verdict: source-only.
All three are filed as Problems based purely on source inspection. Per the severity table, source-only filed Problems (not Questions) carry the same risk as source-only Questions. HIGH — three findings with no empirical corroboration; confidence claims of 0.85–0.90 not justified by evidence.
No overlay UI, modals, lightboxes, or drawers identified in recon or any session. The UI uses native browser confirm() dialogs and standard HTML form elements. No custom-widget miss likely. OK.
MISSION.md's ## Must-cover flows section is blank ("Fill in based on static analysis + recon. Leave blank to let the Manager infer from the surface."). No explicit must-cover flows declared. N/A.
| Feature | Traits | Charter | AND-list quota | Gap? |
|---|---|---|---|---|
| F1 full backup | artifact-producing, scale-sensitive, AJAX-exposed | backup-artifact-andlist | a1–a6 all probed ✓ | None |
| F2 restore | destructive-operation | restore-delete-destructive-andlist | b1–b7 (b7 deprioritized) | b3, b4, b5 source-only — HIGH |
| F3 delete | destructive-operation | restore-delete-destructive-andlist | b6 per-handler gap — HIGH | See Check 4 |
| F4 selective export | artifact-producing, scale-sensitive | selective-export-artifact-andlist | a1–a6 all probed ✓ | None |
| F5 schedule | settings-form, mode-toggle | schedule-settings-cluster | H1–H4 probed ✓ | None |
| F6 lifecycle | destructive-operation | lifecycle-cluster | H1–H4 probed ✓ | None |
| F7 backup dir access | artifact-producing | backup-artifact-andlist (a1) | a1 probed ✓ | None |
| All F1–F7 | breadth | breadth-tour | NOT EXECUTED | HIGH |
Breadth-tour charter (medium priority) was in the manifest with status: pending and no session directory. All features listed as "covered by breadth-tour" in coverage.md received zero breadth passes. Features F1–F7 collectively lack: (a) progress-bar empirical probe, (b) JS console error check across all three tabs, (c) cross-tab UI consistency check, (d) guest persona HTTP probe for F7 via browser (not curl). HIGH.
Required strings by charter type:
| Charter | Required string | Present? |
|---|---|---|
| backup-artifact-andlist | `default blast radius probed:` | YES ✓ |
| backup-artifact-andlist | `scale-sensitive c2 fallback:` | YES ✓ |
| selective-export-artifact-andlist | `scale-sensitive c2 fallback:` | YES ✓ |
| restore-delete-destructive-andlist | `empty-state probed:` | YES ✓ |
| restore-delete-destructive-andlist | `default blast radius probed:` | YES ✓ |
| manual-cron-cross-feature | `cross-feature interaction probed:` | YES ✓ |
| breadth-tour | `empty-state probed:` (mandatory per charter) | MISSING: session not run |
| breadth-tour | `progress-indicator probed:` (mandatory per charter) | MISSING: session not run |
The two missing strings are consequences of the breadth-tour not being dispatched. LOW as standalone forcing-function gap; HIGH in aggregate because the underlying probes were never run.
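The forcing-function check can be expressed as a plain substring scan over each session's coverage notes. The sketch below takes the charter-to-string mapping from the table above; the function name `missing_strings`, the shape of the notes input, and the sample notes text are assumptions made for illustration, not the format the sessions actually use.

```python
# Required coverage-note strings per charter, as listed in the table above.
REQUIRED_STRINGS = {
    "backup-artifact-andlist": [
        "default blast radius probed:",
        "scale-sensitive c2 fallback:",
    ],
    "selective-export-artifact-andlist": ["scale-sensitive c2 fallback:"],
    "restore-delete-destructive-andlist": [
        "empty-state probed:",
        "default blast radius probed:",
    ],
    "manual-cron-cross-feature": ["cross-feature interaction probed:"],
    "breadth-tour": ["empty-state probed:", "progress-indicator probed:"],
}


def missing_strings(coverage_notes):
    """Map each charter to the required strings absent from its notes."""
    missing = {}
    for charter, required in REQUIRED_STRINGS.items():
        # A charter whose session never ran has no notes at all.
        notes = coverage_notes.get(charter, "")
        absent = [s for s in required if s not in notes]
        if absent:
            missing[charter] = absent
    return missing


# Hypothetical notes: every executed charter emitted its strings,
# and breadth-tour has no entry because it was never dispatched.
sample_notes = {
    charter: " ".join(s + " (details)" for s in required)
    for charter, required in REQUIRED_STRINGS.items()
    if charter != "breadth-tour"
}

print(missing_strings(sample_notes))
```

A scan like this only confirms that the string was emitted. It says nothing about whether the probe behind the string was adequate, which is why the missing breadth-tour strings are rated by the underlying probes rather than by the strings themselves.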
Recon identified no external resource dependencies. The plugin is admin-only with no frontend rendering. No CDN assets, external APIs, Google Fonts, or third-party JS identified in recon or any session. N/A.
No starter content, sample data, demo importers, or patterns shipped by this plugin (it is a backup/restore utility with no content templates). N/A.
restore-delete-destructive-andlist — Restore operation "pass" claims for b1 (delete confirmation dialog), b6 (capability gate), empty-state, and CSRF are empirically verified with screenshots or CLI output. Content-level assertions are adequate for their scope.
lifecycle-cluster — H1–H4 verdicts are status-level ("cron event absent after deactivation") but this is appropriate for a cron/options verification scope — content-level assertion is "the option key mb_schedule_enabled=0 was verified" which counts as semantic correctness.
backup-artifact-andlist — Artifact contents examined at semantic level (unzip -p, grep for specific data). Content-level ✓.
selective-export-artifact-andlist — SQL contents examined at content level (specific INSERT rows cited). Content-level ✓.
schedule-settings-cluster — H1 probe is a full save-reload-compare round-trip with screenshots; content-level ✓.
No status-only pass claims detected across the six executed sessions. OK.
1. breadth-tour charter never dispatched (manifest `status: pending`). This is the single largest gap. Seven features were declared as "covered by breadth-tour" in coverage.md but no Tester ran it. Missing: progress-bar empirical probe (recon S1/BT1), JS console error check on all three tabs, UI consistency across tabs, and the guest-persona browser probe for F7. Re-dispatch: run breadth-tour with its existing charter file. One Tester, 30-turn budget.
2. b3/b4/b5 in restore-delete-destructive-andlist filed as Problems from source inspection only; there is no empirical probe for any of the three. Confidence values (0.80–0.90) are overstated. These are source-only claims. A targeted supplementary session should attempt: (a) trigger a mid-restore failure (e.g., corrupt the SQL mid-file), (b) verify post-restore site state matches pre-backup state.
3. b6 capability check scored on page-level gate only. AJAX handlers `wp_ajax_mb_run_backup` and `wp_ajax_mb_run_restore` were not independently tested for a `manage_options` guard. A low-privilege user with direct `admin-ajax.php` access may be able to trigger backup/restore. One targeted probe: authenticated editor user POSTs directly to `admin-ajax.php?action=mb_run_backup`.
4. Restore round-trip correctness never probed. No session verified that a restore actually returns the site to the pre-backup state (only that the operation runs without crashing). Filed b2/b3/b4/b5 cover safety-net gaps but not restore fidelity.
- b7 (concurrent destructive ops) — deprioritized with rationale; turn budget constraint documented.
- a5/scale-sensitive c2 fallback in backup-artifact-andlist — empirical probe at actual scale completed; source-pattern deferred.
- H4 inconclusive in schedule-settings-cluster — P1 time-format mismatch blocked full verification; root cause documented.
Single supplementary Tester targeting three gaps in one session:
Charter: "Restore empirical proof + AJAX b6 capability check + progress-bar BT1"
Flows: (1) trigger a mid-restore failure and observe database state; (2) as editor user, POST directly to admin-ajax.php?action=mb_run_backup — does it return 403 or execute?; (3) screenshot progress bar on page load before any interaction, confirm 100% hardcoded.
Budget: 12 turns, playwright-cli-headed.
This one session would close HIGH gaps 2, 3, and the progress-bar component of gap 1. The full breadth-tour (gap 1) should be re-dispatched separately with its existing charter.
4 high-severity gaps, 3 low-severity gaps
Two supplementary sessions executed:
- `breadth-tour` (status: complete, 18/30 turns, playwright-cli-headed)
- `supplementary-restore-ajax-progressbar` (status: complete, 12/12 turns, playwright-cli-headed)
Status: CLOSED
The breadth-tour session ran to completion (18 turns). All four missing components are now addressed:
- Progress-bar BT1: H4 verdict `confirmed-bug` (P1). Coverage note `progress-indicator probed:` present. Screenshot evidence: `screenshots/01-backup-restore-tab-load.png`, `screenshots/02-backup-progress.png`. Empirical observation confirmed: the bar renders at 100% on page load before any operation.
- JS console errors across all three tabs: Checked via Playwright. Backup & Restore, Selective Export, and Schedule tabs all returned clean (1 log message each, no errors). The deviation note explains that the tabs were consolidated into a unified pass rather than three separate flows, which is acceptable.
- UI consistency: Cross-tab navigation completed; no rendering anomalies reported.
- Guest F7 browser/HTTP probe: BT5 verdict `confirmed-bug` (P2). Curl probe to backup file URL returned HTTP 200. Evidence string present in coverage_notes.
Both mandatory forcing-function strings (empty-state probed: and progress-indicator probed:) are present in coverage_notes. The breadth-tour charter is no longer pending.
Status: PARTIALLY CLOSED — b3 and b5 upgraded; b4 still source-only
The supplementary session's H2 probe addresses b3 (dry-run) and b5 (undo):
- b3 (dry-run mode): Screenshot `04-restore-ui.png` captures the restore button interface with no dry-run affordance. Source cross-reference (class-mb-admin.php lines 70-77) corroborated. Evidence is now UI-observation + source rather than source-only. The session did not trigger a live restore attempt, but absence of the UI control is empirically observable. Confidence 0.90 remains reasonable given the corroboration. Partially upgraded: the UI screenshot counts as empirical observation; no live trigger test.
- b5 (undo path): Same session, same evidence basis. The UI screenshot shows no undo button in post-restore state; source confirms no rollback path. Same assessment as b3. Partially upgraded.
- b4 (transaction): Not addressed in either supplementary session. No new evidence. H2 groups b3 and b5 together but b4 (mid-restore transaction wrapping) has no new empirical probe. Source-only status unchanged. Still source-only.
Residual gap: b4 requires a live mid-restore failure test (e.g., corrupt SQL mid-file and observe database state) to establish empirical evidence. This is a LOW-to-MEDIUM severity residual — b4 is a "no transaction wrap" claim that, without a triggered rollback attempt, remains unverifiable.
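The shape of that mid-restore failure test can be illustrated on a throwaway database. The sketch below uses Python's built-in `sqlite3` as a stand-in for the site's MySQL database and a four-statement "dump" with one deliberately corrupt line; the table, the statements, and the `restore` function are hypothetical and do not reproduce the plugin's code. It shows the observable difference between a statement loop with and without a wrapping transaction.

```python
import sqlite3

# A hypothetical restore file; the third statement is deliberately corrupt
# (it names a table that does not exist) to force a mid-restore failure.
STATEMENTS = [
    "DELETE FROM posts",
    "INSERT INTO posts (title) VALUES ('restored-1')",
    "INSERT INTO postz (title) VALUES ('restored-2')",
    "INSERT INTO posts (title) VALUES ('restored-3')",
]


def fresh_db():
    """Create an in-memory database holding one pre-restore row."""
    # isolation_level=None puts the connection in autocommit mode, so each
    # statement takes effect immediately unless BEGIN is issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE posts (title TEXT)")
    conn.execute("INSERT INTO posts (title) VALUES ('original')")
    return conn


def restore(conn, wrap_in_transaction):
    """Run the statements in a loop and return the rows left afterwards."""
    if wrap_in_transaction:
        conn.execute("BEGIN")
    try:
        for sql in STATEMENTS:
            conn.execute(sql)
        if wrap_in_transaction:
            conn.execute("COMMIT")
    except sqlite3.OperationalError:
        # The corrupt statement lands here; only a wrapped run can undo.
        if wrap_in_transaction:
            conn.execute("ROLLBACK")
    return [row[0] for row in conn.execute("SELECT title FROM posts")]


print(restore(fresh_db(), wrap_in_transaction=False))  # partial state
print(restore(fresh_db(), wrap_in_transaction=True))   # pre-restore state
```

In the unwrapped run the original row is gone and only the statements before the corrupt line have been applied, which is the database state an empirical b4 probe would look for after a failed restore.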
Status: PARTIALLY CLOSED — source confirmed; empirical AJAX probe not completed
The supplementary H3 probe reviewed source code for ajax_backup() (class-mb-backup.php line 7) and ajax_restore() (class-mb-restore.php line 7). Both handlers call check_ajax_referer('mb_backup') followed immediately by if ( ! current_user_can( 'manage_options' ) ) wp_send_json_error( 'Unauthorized' ). The capability gate is per-handler, not page-gate-only.
However, the deviation log explicitly records: "Flow 06 (AJAX editor test) hit Playwright timeout issues; deprioritized in favor of source-code review." The empirical probe (authenticated editor-user POST directly to admin-ajax.php?action=mb_run_backup) was not executed. The gap check in Pass 1 specifically required a live per-path probe.
Source inspection is strong corroborating evidence — the guard pattern is unambiguous — but the vulnerability pattern (direct AJAX reachability bypassing page gate) was not empirically falsified. The verdict is upgraded from "untested" to "source-confirmed but not empirically falsified." For the purpose of severity classification: the source code check is sufficient to classify as LOW residual risk (not HIGH), because the guard is explicitly present in two independent handlers and is a standard WordPress pattern.
Residual: one targeted curl probe (editor user with valid nonce to admin-ajax.php?action=mb_run_backup) would fully close this. Filed as LOW.
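A sketch of that probe's request and of how its response might be read is given below. It only builds the request and classifies a response; it does not contact any site. The base URL, cookie, and nonce values are placeholders. The `_ajax_nonce` field name and the response shapes are assumptions based on WordPress defaults for `check_ajax_referer()` and `wp_send_json_error()`, and should be confirmed against the handler before relying on the classification. In particular, a handler-level rejection may arrive as HTTP 200 with `"success": false` rather than as a 403.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_probe(base_url, nonce, cookie):
    """Build (but do not send) the direct AJAX POST for an editor session."""
    body = urlencode({"action": "mb_run_backup", "_ajax_nonce": nonce})
    return Request(
        base_url.rstrip("/") + "/wp-admin/admin-ajax.php",
        data=body.encode("ascii"),
        headers={"Cookie": cookie},
        method="POST",
    )


def classify(status, body):
    """Turn an HTTP status and response body into a probe outcome label."""
    if status == 403:
        # Assumed: a failed nonce check or an explicit forbidden response.
        return "blocked-nonce-or-forbidden"
    try:
        payload = json.loads(body)
    except ValueError:
        return "unrecognized-response"
    if isinstance(payload, dict) and payload.get("success") is False:
        # Assumed shape of the handler's 'Unauthorized' JSON error.
        return "blocked-by-handler"
    if isinstance(payload, dict) and payload.get("success") is True:
        return "executed"
    return "unrecognized-response"


# Placeholder values for illustration only.
probe = build_probe("http://example.test", "PLACEHOLDER_NONCE", "PLACEHOLDER_COOKIE")
print(probe.get_method(), probe.full_url)
print(classify(200, '{"success":false,"data":"Unauthorized"}'))
```

Only an outcome of `blocked-by-handler` with a valid nonce demonstrates the per-handler capability guard; a 403 from a bad nonce shows the nonce check working, not the `manage_options` check.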
Status: STILL OPEN — inconclusive verdict, empirical probe incomplete
The supplementary H1 probe is marked verdict: probed, result: inconclusive. The session created a backup but was unable to complete the full round-trip (backup → modify → restore → verify) due to Playwright timeout on post-new.php. The session fell back to code review of class-mb-restore.php restore_from_zip(), noting that SQL import and file extraction logic appears sound.
The code review provides no empirical evidence that a restore actually returns site state to the pre-backup snapshot. Pass 1 identified this as the core contract of a backup plugin — that correctness of the restore output is the primary correctness property to verify. Source inspection of a plausible implementation does not substitute for a round-trip identity check.
This gap remains HIGH. The restore has never been empirically verified to produce correct output.
Check 1 (Hypothesis coverage)
breadth-tour: All 10 hypothesis entries have verdicts; 7 planned flows and 8 hypotheses executed (with minor consolidation deviation noted). BT1–BT7 and two "user expectation" hypotheses all have outcomes. No silently skipped hypotheses. PASS.
supplementary-restore-ajax-progressbar: H1–H4 all have verdicts. One deviation logged (Flows 03 and 06 hit Playwright timeouts; deprioritized with rationale). The deviation is acceptable — rationale documented and source fallback applied. However, H1 is inconclusive and the round-trip probe was not completed. PARTIAL — H1 inconclusive is a gap, not a violation of hypothesis-coverage per se.
Check 4 (AND-list aggregate vs per-handler)
supplementary-restore-ajax-progressbar H3: Source inspection confirmed per-handler capability gates for both wp_ajax_mb_run_backup and wp_ajax_mb_run_restore. The per-path gap from Pass 1 is addressed at the source level. Empirical live probe not completed (timeout). PARTIAL — source-resolved; not empirically falsified.
breadth-tour: No new AND-list per-handler analysis. Prior session gaps not retested here (out of scope for breadth charter). N/A.
Check 5 (Round-trip / compositional probes)
supplementary-restore-ajax-progressbar H1: Backup → modify → restore → verify round-trip not completed. Verdict: inconclusive. FAIL — gap persists.
breadth-tour: No round-trip probe in scope for this session. N/A.
Check 6 (Empirical-probe-is-mandatory)
breadth-tour: All verdicts backed by screenshots or CLI output. Progress bar (P1), public access (P2) both carry screenshot/CLI evidence. No source-only Problems filed. PASS.
supplementary-restore-ajax-progressbar:
- P1 (progress bar): screenshot `01-progress-bar.png` present. PASS.
- P2 (b3, no dry-run): evidence is source inspection + screenshot of UI showing absent control. The screenshot (`04-restore-ui.png`) constitutes empirical observation of the UI state. Borderline: no live restore trigger, but absence of an affordance is verifiable from UI rendering. MARGINAL PASS: the UI screenshot covers the absence claim; a live trigger test would be stronger.
- P3 (b5, no undo): same evidence basis as P2. MARGINAL PASS.
- b4 (transaction): NOT filed in this session; not addressed. GAP: no new evidence.
Check 10 (Coverage-note forcing-function strings)
breadth-tour:
- `empty-state probed:` PRESENT in coverage_notes ✓
- `progress-indicator probed:` PRESENT in coverage_notes ✓
supplementary-restore-ajax-progressbar:
- `progress-indicator probed:` PRESENT in coverage_notes ✓
- `ajax-b6-per-path probed:` PRESENT in coverage_notes ✓
- `restore-round-trip probed:` PRESENT in coverage_notes ✓
All required forcing-function strings now present across both sessions. PASS.
| Gap | Pass 1 | Pass 2 |
|---|---|---|
| Gap 1 — breadth-tour not dispatched | HIGH | CLOSED |
| Gap 2 — b3/b4/b5 source-only | HIGH | PARTIALLY CLOSED — b3/b5 upgraded to mixed evidence; b4 still source-only (LOW residual) |
| Gap 3 — b6 AJAX per-path | HIGH | PARTIALLY CLOSED — source-confirmed; empirical probe timed out (LOW residual) |
| Gap 4 — restore round-trip | HIGH | STILL OPEN — inconclusive verdict; empirical proof not obtained (HIGH) |
| Check | Result |
|---|---|
| Check 1 — hypothesis coverage | PASS (breadth-tour); PARTIAL — H1 inconclusive (supplementary) |
| Check 4 — per-handler AND-list | PARTIAL — source-resolved, not empirically falsified |
| Check 5 — round-trip probes | FAIL — H1 inconclusive, restore correctness unproven |
| Check 6 — empirical-probe-is-mandatory | PASS (breadth-tour); MARGINAL PASS (supplementary — b3/b5 UI screenshot; b4 no new evidence) |
| Check 10 — forcing-function strings | PASS — all required strings present in both sessions |
1 high-severity gap remaining, 2 low-severity gaps
High: restore round-trip correctness never empirically verified (Gap 4).
Low (residual): (a) b4 transaction wrapping — source-only, no live mid-restore failure test; (b) b6 AJAX per-path — source-confirmed but empirical low-privilege AJAX probe not completed due to timeout.
