- Total Scenarios: 2
- Scenario Validation Passed: 2
- Scenario Validation Failed: 0
- Scenario Validation Pass Rate: 100.0%
- Raw LLM Layer Passed: 1/1 (100.0%)
- Post-Processing Campaign Capture Passed: 1
- Post-Processing Campaign Capture Failed: 0
- Post-Processing Campaign Capture Pass Rate: 100.0%
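Each pass rate above is the ratio of passed items to total items. The sketch below is a rough way to recompute the scenario lines from run.json. It assumes that run.json holds a top-level "scenarios" list whose entries carry the boolean "passed" field named in the evidence table. This summary does not show the file's actual layout, and summarize_scenarios and summarize_file are hypothetical helpers, not part of the test harness.

```python
import json
from pathlib import Path


def summarize_scenarios(run: dict) -> dict:
    """Recompute the scenario summary lines from a parsed run.json.

    Assumption: run["scenarios"] is a list of objects with a boolean
    "passed" field (field name taken from the evidence table; the
    surrounding structure is assumed, not confirmed).
    """
    scenarios = run.get("scenarios", [])
    total = len(scenarios)
    # Count only explicit boolean True so malformed values are not treated as passes.
    passed = sum(1 for s in scenarios if s.get("passed") is True)
    failed = total - passed
    # Avoid dividing by zero when no scenarios were recorded.
    rate = (100.0 * passed / total) if total else 0.0
    return {
        "Total Scenarios": total,
        "Scenario Validation Passed": passed,
        "Scenario Validation Failed": failed,
        "Scenario Validation Pass Rate": f"{rate:.1f}%",
    }


def summarize_file(path: str) -> dict:
    # Hypothetical convenience wrapper: load run.json from disk, then summarize it.
    return summarize_scenarios(json.loads(Path(path).read_text(encoding="utf-8")))
```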
- Status: ✅ PASS
- Campaign ID: C9nl6HW5QSPIcwwCf4aA
- Status: ✅ PASS
- Git HEAD: 0cc04d6c917260fba2a097a2e9953b263da02661
- Test Timestamp: 2026-05-24T03:02:42.728022+00:00
- Server PID: 59255

| Claim | File | Key Field(s) |
|---|---|---|
| Scenario validation passed: 2/2 | run.json | scenarios[].passed, scenarios[].errors |
| Campaign post-processing capture passed: 1/1 | run.json | campaign_capture_status[*].status |
| Streaming evidence normalized | streaming_evidence.json | summary., scenarios[].chunk_count_observed |
| Bundle artifact inventory | artifacts/collection_log.txt | core_files, jsonl_captures, campaigns_dir |
| MCP request/response captured | request_responses.jsonl | Full request/response pairs |
| Local server HTTP request/response captured | http_request_responses.jsonl | http_request/http_response entries |
| LLM request/response stream captured | llm_request_responses.jsonl | request/response entries (type field) |
| Gemini HTTP transport captured | gemini_http_request_responses.jsonl | http_request/http_response/transport_error entries |
| Server execution log | artifacts/server.log | Raw server output |
| Git provenance | metadata.json | git_provenance.git_head = 0cc04d6c... |

| Scenario | Status | Campaign ID |
|---|---|---|
| dual_flag_level_up_over_cc | ✅ Pass | C9nl6HW5... |
| EVIDENCE_SIGNATURE_GUARD | ✅ Pass | N/A |
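The evidence table lists several JSONL captures. For example, llm_request_responses.jsonl distinguishes request and response entries by a type field. One quick sanity check on such a capture is to tally its entries by type. This is a minimal sketch that assumes one JSON object per line. The exact type values the harness writes are not shown in this summary, and count_entry_types and count_file are hypothetical names.

```python
import json
from collections import Counter
from typing import Iterable


def count_entry_types(lines: Iterable[str]) -> Counter:
    """Tally JSONL capture entries by their "type" field.

    Assumption: one JSON object per line. Blank lines are skipped,
    objects without "type" are counted under "<missing>", and
    non-object JSON values are counted under "<non-object>".
    """
    counts: Counter = Counter()
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Report which line is corrupt instead of failing silently.
            raise ValueError(f"line {lineno}: invalid JSON") from exc
        kind = entry.get("type", "<missing>") if isinstance(entry, dict) else "<non-object>"
        counts[kind] += 1
    return counts


def count_file(path: str) -> Counter:
    # Iterate the file handle so large captures are streamed line by line.
    with open(path, encoding="utf-8") as fh:
        return count_entry_types(fh)
```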

- All files in this bundle have corresponding .sha256 checksum files
- Checksums use local basename paths, so per-file verification works from each artifact directory
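Because each checksum records only a basename, a sidecar can be checked against the artifact sitting next to it. The sketch below assumes each sidecar is named "<file>.sha256" and contains sha256sum-style text ("<hex digest>  <basename>", with the name optional). This summary does not state the sidecar format. If the sidecars do follow the GNU coreutils format, running `sha256sum -c <file>.sha256` inside the artifact directory performs the same check. The helper names here are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar(sidecar: Path) -> bool:
    """Check one '<name>.sha256' sidecar against the artifact beside it.

    Assumption: sha256sum-style content '<hex digest>  <basename>'; if
    the name is absent, it is derived by dropping the '.sha256' suffix.
    Since names are local basenames, the artifact is resolved relative
    to the sidecar's own directory.
    """
    parts = sidecar.read_text(encoding="utf-8").strip().split(maxsplit=1)
    expected = parts[0].lower()
    # sha256sum prefixes the name with '*' in binary mode; strip it.
    name = parts[1].lstrip("*") if len(parts) > 1 else sidecar.name[: -len(".sha256")]
    return sha256_of(sidecar.parent / name) == expected


def verify_dir(directory: Path) -> dict:
    # Map each sidecar's file name to whether its artifact matches.
    return {p.name: verify_sidecar(p) for p in sorted(directory.glob("*.sha256"))}
```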

- ⚠️ Server warnings detected (see artifacts/server.log)
- Warning: CRITICAL_SAFEGUARD

Proves:
- Core logic and scenario validation for test_pr7028_modal_priority
- Scenario execution pass rates (2/2)

Does NOT Prove:
- Production server behavior (tested on local server unless otherwise noted)
- Performance under load (single-request tests)
- Edge cases not covered by scenarios