- Total Scenarios: 3
- Scenario Validation Passed: 2
- Scenario Validation Failed: 1
- Scenario Validation Pass Rate: 66.7%
- Raw LLM Layer Passed: 1/2 (50.0%)
- Raw Layer Pass Rate: 1/2 (50.0%)
- Post-Processing Pass Rate (raw-validated scenarios): 0/1 (0.0%)
Post-processing detected issues (dm_notes, core_memories, state mutations) that
the raw narrative validation missed. See errors in individual scenario files.
- Post-Processing Campaign Capture Passed: 1
- Post-Processing Campaign Capture Failed: 0
- Post-Processing Campaign Capture Pass Rate: 100.0%
- Status: ✅ PASS
- Status: ❌ FAIL
- Campaign ID: A0b6WSmVa22xyVcs48VL
- Errors: `single_organic_level_up_final`: codex leveling review did not pass; output=VERDICT: FAIL
  - Blocking: Paladin spell preparation is a player-selectable level-up decision, but the modal planning choices expose only HP and Fighting Style edits. No `level_up_*` choice lets the player edit prepared spells before finish.
  - Blocking: The first modal response’s visible `Recommended package:` lists HP, Defense Fighting Style, and prepared spells, but does not clearly distinguish automatic gains from editable selections or visibly account for all automatic Level 2 Paladin gains in the package itself.
  - Passing: `level_up_now` opens the modal without committing level 2; final level commit appears only on `finish_level_up_return_to_game`.
  - Passing: Recommended HP/Fighting Style selections are prefilled, editable, and free-form Fighting Style edit keeps the modal open and updates the recommendation.
  - Passing: After finish, final state has `level=2`, `level_up_pending=false`, `level_up_in_progress=false`, no active level-up choices in the final planning block, and resumes real story choices.
  - No legacy `level_up_signal.level_up` boolean found; observed `level_up_signal` entries use `current_level` + `target_level`.
- Status: ✅ PASS
- Git HEAD: df70b184d6a7c749ecdf1d7605903c44161f267e
- Test Timestamp: 2026-05-24T04:57:50.582284+00:00
- Server PID: 66812
| Claim | File | Key Field(s) |
|---|---|---|
| Scenario validation passed: 2/3 | run.json | scenarios[].passed, scenarios[].errors |
| Campaign post-processing capture passed: 1/1 | run.json | campaign_capture_status[*].status |
| Streaming evidence normalized | streaming_evidence.json | summary., scenarios[].chunk_count_observed |
| Bundle artifact inventory | artifacts/collection_log.txt | core_files, jsonl_captures, campaigns_dir |
| MCP request/response captured | request_responses.jsonl | Full request/response pairs |
| Local server HTTP request/response captured | http_request_responses.jsonl | http_request/http_response entries |
| LLM request/response stream captured | llm_request_responses.jsonl | request/response entries (type field) |
| Gemini HTTP transport captured | gemini_http_request_responses.jsonl | http_request/http_response/transport_error entries |
| Server execution log | artifacts/server.log | Raw server output |
| Git provenance | metadata.json | git_provenance.git_head = df70b184... |
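The JSONL captures listed above can be spot-checked by tallying entries per `type` value. The following is a minimal sketch, assuming each line is one JSON object that carries a `type` field as the table indicates; the `id` key in the sample data and the helper name `count_entry_types` are hypothetical and not taken from the bundle.

```python
import json
from collections import Counter
from typing import Iterable


def count_entry_types(lines: Iterable[str]) -> Counter:
    """Tally JSONL entries by their `type` field, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate trailing or empty lines in the capture
        entry = json.loads(line)
        # Entries without a `type` are grouped under a sentinel key
        counts[entry.get("type", "<missing>")] += 1
    return counts


# Hypothetical sample standing in for llm_request_responses.jsonl
sample = [
    '{"type": "request", "id": 1}',
    '{"type": "response", "id": 1}',
    "",
    '{"type": "request", "id": 2}',
]
counts = count_entry_types(sample)
print(dict(counts))
```

Because an open file object iterates line by line, the same function could be called with a real capture, for example `count_entry_types(open("llm_request_responses.jsonl"))`. A request count that exceeds the response count would suggest an unanswered call worth checking against `artifacts/server.log`.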
| Scenario | Status | Campaign ID |
|---|---|---|
| finish_intent_prompt_and_classifier | ✅ Pass | None |
| single_organic_level_up | ❌ Fail | A0b6WSmV... |
| EVIDENCE_SIGNATURE_GUARD | ✅ Pass | N/A |
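The pass-rate figures reported for this run follow directly from the scenario statuses in the table above. As a rough sketch, assuming `run.json` exposes a `scenarios` list whose items carry a boolean `passed` field (the `name` key, the sample dictionary, and both helper names are illustrative assumptions), the arithmetic looks like this:

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a count as 'passed/total (pct%)' with one decimal place."""
    if total == 0:
        return "0/0 (n/a)"  # avoid dividing by zero on an empty run
    return f"{passed}/{total} ({passed / total * 100:.1f}%)"


def summarize(run: dict) -> dict:
    """Count passed and failed scenarios in a run.json-shaped dictionary."""
    scenarios = run.get("scenarios", [])
    passed = sum(1 for s in scenarios if s.get("passed"))
    total = len(scenarios)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "rate": pass_rate(passed, total),
    }


# Hypothetical stand-in for run.json, mirroring the three scenarios above
run = {
    "scenarios": [
        {"name": "finish_intent_prompt_and_classifier", "passed": True},
        {"name": "single_organic_level_up", "passed": False},
        {"name": "EVIDENCE_SIGNATURE_GUARD", "passed": True},
    ]
}
summary = summarize(run)
print(summary["rate"])
```

The same `pass_rate` helper reproduces the layered figures: `pass_rate(1, 2)` for the raw layer and `pass_rate(0, 1)` for post-processing over raw-validated scenarios.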
- All files in this bundle have corresponding `.sha256` checksum files
- Checksums use local basename paths so per-file verification works from each artifact directory
- ⚠️ Server warnings detected (see artifacts/server.log)
- Warning: ACTION_RESOLUTION_MISSING_FIELDS
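Per-directory checksum verification, as described in the checksum notes above, could be sketched as follows. This assumes each sidecar is named `<file>.sha256` and holds a hex digest optionally followed by the file's basename, in the style of `sha256sum` output; the bundle's actual sidecar layout is not shown here, so treat that format and the helper names as assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large captures do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_directory(directory: Path) -> dict:
    """Return {basename: bool} for every *.sha256 sidecar in one directory."""
    results = {}
    for sidecar in sorted(directory.glob("*.sha256")):
        fields = sidecar.read_text().split(maxsplit=1)
        # Fall back to the sidecar's own name when no basename is recorded
        default_name = sidecar.name[: -len(".sha256")]
        name = fields[1].strip().lstrip("*") if len(fields) > 1 else default_name
        target = directory / Path(name).name  # resolve by basename only
        expected = fields[0].lower() if fields else ""
        results[Path(name).name] = target.is_file() and sha256_of(target) == expected
    return results


# Self-contained demonstration using a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    artifact = root / "run.json"
    artifact.write_text('{"scenarios": []}')
    (root / "run.json.sha256").write_text(f"{sha256_of(artifact)}  run.json\n")
    ok = verify_directory(root)
    artifact.write_text("tampered")  # simulate a modified artifact
    tampered = verify_directory(root)

print(ok, tampered)
```

Resolving each target by basename inside the sidecar's own directory is what lets the check run from any artifact directory without knowing the bundle root.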
Proves:
- Core logic and scenario validation for test_level_up_organic
- Scenario execution pass rates (2/3)
Does NOT Prove:
- Production server behavior (tested on local server unless otherwise noted)
- Performance under load (single-request tests)
- Edge cases not covered by scenarios