PR #7028 /es E2E Real-LLM Evidence Bundle (HEAD 0cc04d6c)

Evidence Summary: test_pr7028_modal_priority

Test Results

  • Total Scenarios: 2

  • Scenario Validation Passed: 2

  • Scenario Validation Failed: 0

  • Scenario Validation Pass Rate: 100.0%

  • Raw LLM Layer Passed: 1/1 (100.0%)

  • Post-Processing Campaign Capture Passed: 1

  • Post-Processing Campaign Capture Failed: 0

  • Post-Processing Campaign Capture Pass Rate: 100.0%

Scenario Results

dual_flag_level_up_over_cc

  • Status: ✅ PASS
  • Campaign ID: C9nl6HW5QSPIcwwCf4aA

EVIDENCE_SIGNATURE_GUARD

  • Status: ✅ PASS

Provenance Chain

  • Git HEAD: 0cc04d6c917260fba2a097a2e9953b263da02661
  • Test Timestamp: 2026-05-24T03:02:42.728022+00:00
  • Server PID: 59255

Claim → Artifact Map

| Claim | File | Key Field(s) |
|---|---|---|
| Scenario validation passed: 2/2 | run.json | scenarios[].passed, scenarios[].errors |
| Campaign post-processing capture passed: 1/1 | run.json | campaign_capture_status[*].status |
| Streaming evidence normalized | streaming_evidence.json | summary., scenarios[].chunk_count_observed |
| Bundle artifact inventory | artifacts/collection_log.txt | core_files, jsonl_captures, campaigns_dir |
| MCP request/response captured | request_responses.jsonl | Full request/response pairs |
| Local server HTTP request/response captured | http_request_responses.jsonl | http_request/http_response entries |
| LLM request/response stream captured | llm_request_responses.jsonl | request/response entries (type field) |
| Gemini HTTP transport captured | gemini_http_request_responses.jsonl | http_request/http_response/transport_error entries |
| Server execution log | artifacts/server.log | Raw server output |
| Git provenance | metadata.json | git_provenance.git_head = 0cc04d6c... |
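
The capture files listed in the map above are JSONL, so one quick way to inspect them is to tally entries by kind. The following is a minimal sketch, assuming every non-empty line is a JSON object and that the entry kind lives in a top-level `type` key. The map only mentions a type field for llm_request_responses.jsonl, so the other captures may use a different layout. `count_entry_types` is a hypothetical helper name, not part of the bundle.

```python
import json
from collections import Counter
from pathlib import Path


def count_entry_types(jsonl_path: Path) -> Counter:
    """Tally JSONL entries by their top-level "type" value.

    Assumption: one JSON object per non-empty line. Entries without a
    "type" key are counted under "<missing>".
    """
    counts: Counter = Counter()
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            entry = json.loads(line)
            counts[entry.get("type", "<missing>")] += 1
    return counts
```

Calling `count_entry_types(Path("llm_request_responses.jsonl"))` from the bundle directory would then show how many entries of each type the file holds, which can be compared against the claim in the table.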

Coverage Matrix

| Scenario | Status | Campaign ID |
|---|---|---|
| dual_flag_level_up_over_cc | ✅ Pass | C9nl6HW5... |
| EVIDENCE_SIGNATURE_GUARD | ✅ Pass | N/A |

Evidence Integrity

  • All files in this bundle have corresponding .sha256 checksum files

  • Checksums use local basename paths so per-file verification works from each artifact directory

  • ⚠️ Server warnings detected (see artifacts/server.log)

  • Warning: CRITICAL_SAFEGUARD
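
The per-file checksum convention described above can be spot-checked with a short script. This is a minimal sketch, assuming each checksum file sits next to its artifact as `<artifact name>.sha256` and follows the common `sha256sum` layout (hex digest first, then the basename). The bundle does not spell out the exact sidecar format, and `verify_sidecar` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def verify_sidecar(artifact: Path) -> bool:
    """Return True if the artifact matches the digest in its .sha256 sidecar.

    Assumptions: the sidecar is named <artifact name>.sha256, lives in the
    same directory, and its first whitespace-separated token is the hex digest.
    """
    sidecar = artifact.with_name(artifact.name + ".sha256")
    expected = sidecar.read_text(encoding="utf-8").split()[0].lower()
    actual = hashlib.sha256(artifact.read_bytes()).hexdigest()
    return actual == expected
```

Because the checksums use basename paths, the check only needs the artifact's own directory, for example `verify_sidecar(Path("artifacts/server.log"))`.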

What This Evidence Proves vs. Does NOT Prove

Proves:

  • Core logic and scenario validation for test_pr7028_modal_priority
  • Scenario execution pass rates (2/2)

Does NOT Prove:

  • Production server behavior (tested on local server unless otherwise noted)
  • Performance under load (single-request tests)
  • Edge cases not covered by scenarios
{
"test_name": "test_pr7028_modal_priority",
"run_id": "test_pr7028_modal_priority-005-20260524T030242",
"iteration": 5,
"bundle_version": "1.2.0",
"timestamp": "2026-05-24T03:02:42.728022+00:00",
"bundle_timestamp": "2026-05-24T03:02:42.728022+00:00",
"evidence_mode": "lightweight_prompt_tracking",
"evidence_mode_notes": "System instruction captured as filenames + char_count (not full text). Raw LLM request/response payloads captured in request_responses.jsonl. Server logs in artifacts/. Bundle file inventory in artifacts/collection_log.txt.",
"git_provenance": {
"git_head": "0cc04d6c917260fba2a097a2e9953b263da02661",
"git_branch": "fix/modal-lock-level-up-priority",
"merge_base": "9ce8c4bb0675c15ebd6b82a6eb2ceb994d69bc4e",
"commits_ahead_of_main": 21,
"diff_stat_vs_main": ".beads/issues.jsonl | 7 +-\n docs/design/pr-designs/pr-7028.md | 56 +++\n mvp_site/agent_prompts.py | 2 +\n mvp_site/agents.py | 362 +++++++++----------\n mvp_site/constants.py | 6 +\n mvp_site/debug_hybrid_system.py | 382 ---------------------\n mvp_site/equipment_display.py | 182 ----------\n mvp_site/game_state.py | 26 +-\n mvp_site/llm_parser.py | 3 +\n mvp_site/llm_response.py | 265 +++++++++++++-\n mvp_site/llm_service.py | 57 ---\n mvp_site/prompts/character_creation_handoff.md | 3 +\n mvp_site/prompts/god_mode_final_contract.md | 7 +\n mvp_site/prompts/level_up_instruction.md | 21 ++\n mvp_site/settings_validation.py | 18 +-\n mvp_site/tests/test_agents.py | 66 ++++\n mvp_site/tests/test_debug_hybrid_system.py | 25 --\n mvp_site/tests/test_equipment_display.py | 220 ------------\n mvp_site/tests/test_firebase_mock_mode.py | 4 +-\n mvp_site/tests/test_game_state.py | 281 ++++++++-------\n mvp_site/tests/test_json_cleanup_safety.py | 259 --------------\n mvp_site/tests/test_world_logic.py | 4 +-\n mvp_site/world_logic.py | 10 +-\n roadmap/README.md | 1 +\n roadmap/zfc-string-parsing-regex-cleanup-plan.md | 82 +++++\n ...test_campaign_creation_universal_cc_real_e2e.py | 10 +\n testing_mcp/test_god_mode_stat_hydration_real.py | 60 ++--\n testing_mcp/test_pr7028_modal_priority.py | 188 ++++++++++\n 28 files changed, 1064 insertions(+), 1543 deletions(-)",
"working_tree_dirty": false,
"working_tree_staged_changes": 0,
"working_tree_unstaged_changes": 0,
"working_tree_changed_files": [],
"working_tree_diff_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
},
"server": {
"base_url": "http://127.0.0.1:52394",
"hostname": "127.0.0.1",
"mode": "local",
"port": "52394",
"pid": 59255,
"process_cmdline": "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python -m gunicorn mvp_site.main:app --bind 0.0.0.0:52394 --workers 1 --worker-class gthread --threads 4 --timeout 600 --max-requests 1000 --access-logfile - --error-logfile - --log-level info",
"env_vars": {
"WORLDAI_DEV_MODE": "true",
"TESTING": null,
"MOCK_SERVICES_MODE": "false",
"GOOGLE_APPLICATION_CREDENTIALS": "[SET - file:serviceAccountKey.json]",
"WORLDAI_GOOGLE_APPLICATION_CREDENTIALS": "[SET - file:serviceAccountKey.json]",
"FIRESTORE_EMULATOR_HOST": null,
"PORT": "52394",
"FIREBASE_PROJECT_ID": "worldarchitecture-ai",
"GEMINI_API_KEY": "[SET - 39 chars]",
"LLM_REQUEST_RESPONSE_CAPTURE_PATH": "/tmp/worldarchitect.ai/fix_modal-lock-level-up-priority/test_pr7028_modal_priority/iteration_005/llm_request_responses_1779591673071.jsonl",
"HTTP_REQUEST_RESPONSE_CAPTURE_PATH": "/tmp/worldarchitect.ai/fix_modal-lock-level-up-priority/test_pr7028_modal_priority/iteration_005/http_request_responses_1779591673071.jsonl",
"GEMINI_HTTP_REQUEST_RESPONSE_CAPTURE_PATH": "/tmp/worldarchitect.ai/fix_modal-lock-level-up-priority/test_pr7028_modal_priority/iteration_005/gemini_http_request_responses_1779591673071.jsonl",
"MCP_TEST_PROVIDER_HTTP_CAPTURE_PATH": "/tmp/worldarchitect.ai/fix_modal-lock-level-up-priority/test_pr7028_modal_priority/iteration_005/provider_http_request_responses_1779591673071.jsonl"
},
"lsof_output": "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME\nPython 59255 jleechan 5u IPv4 0x373ef97acbe17144 0t0 TCP *:52394 (LISTEN)\nPython 59268 jleechan 5u IPv4 0x373ef97acbe17144 0t0 TCP *:52394 (LISTEN)",
"ps_output": "PID USER ELAPSED ARGS\n59255 jleechan 01:17 /opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python -m gunicorn mvp_site.main:app --bind 0.0.0.0:52394 --workers 1 --worker-class gthread --threads 4 --timeout 600 --max-requests 1000 --access-logfile - --error-logfile - --log-level info"
},
"provenance": {
"git_fetch_origin_main": {
"returncode": 0,
"stdout": null,
"stderr": "From https://github.com/jleechanorg/worldarchitect.ai\n * branch main -> FETCH_HEAD\nAuto packing the repository in background for optimum performance.\nSee \"git help gc\" for manual housekeeping.\nwarning: The last gc run reported the following. Please correct the root cause\nand remove /Users/jleechan/projects/worldarchitect.ai/.git/worktrees/worktree_babysit_bulk/gc.log\nAutomatic cleanup will not be performed until the file is removed.\n\nwarning: There are too many unreachable loose objects; run 'git prune' to remove them."
},
"git_head": "0cc04d6c917260fba2a097a2e9953b263da02661",
"git_branch": "fix/modal-lock-level-up-priority",
"merge_base": "9ce8c4bb0675c15ebd6b82a6eb2ceb994d69bc4e",
"commits_ahead_of_main": 21,
"diff_stat_vs_main": ".beads/issues.jsonl | 7 +-\n docs/design/pr-designs/pr-7028.md | 56 +++\n mvp_site/agent_prompts.py | 2 +\n mvp_site/agents.py | 362 +++++++++----------\n mvp_site/constants.py | 6 +\n mvp_site/debug_hybrid_system.py | 382 ---------------------\n mvp_site/equipment_display.py | 182 ----------\n mvp_site/game_state.py | 26 +-\n mvp_site/llm_parser.py | 3 +\n mvp_site/llm_response.py | 265 +++++++++++++-\n mvp_site/llm_service.py | 57 ---\n mvp_site/prompts/character_creation_handoff.md | 3 +\n mvp_site/prompts/god_mode_final_contract.md | 7 +\n mvp_site/prompts/level_up_instruction.md | 21 ++\n mvp_site/settings_validation.py | 18 +-\n mvp_site/tests/test_agents.py | 66 ++++\n mvp_site/tests/test_debug_hybrid_system.py | 25 --\n mvp_site/tests/test_equipment_display.py | 220 ------------\n mvp_site/tests/test_firebase_mock_mode.py | 4 +-\n mvp_site/tests/test_game_state.py | 281 ++++++++-------\n mvp_site/tests/test_json_cleanup_safety.py | 259 --------------\n mvp_site/tests/test_world_logic.py | 4 +-\n mvp_site/world_logic.py | 10 +-\n roadmap/README.md | 1 +\n roadmap/zfc-string-parsing-regex-cleanup-plan.md | 82 +++++\n ...test_campaign_creation_universal_cc_real_e2e.py | 10 +\n testing_mcp/test_god_mode_stat_hydration_real.py | 60 ++--\n testing_mcp/test_pr7028_modal_priority.py | 188 ++++++++++\n 28 files changed, 1064 insertions(+), 1543 deletions(-)",
"working_tree_staged_changes": 0,
"working_tree_unstaged_changes": 0,
"working_tree_untracked_files": 0,
"working_tree_changed_files": [],
"working_tree_diff_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"working_tree_dirty": false,
"server": {
"base_url": "http://127.0.0.1:52394",
"hostname": "127.0.0.1",
"mode": "local",
"port": "52394",
"pid": 59255,
"process_cmdline": "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python -m gunicorn mvp_site.main:app --bind 0.0.0.0:52394 --workers 1 --worker-class gthread --threads 4 --timeout 600 --max-requests 1000 --access-logfile - --error-logfile - --log-level info",
"env_vars": {
"WORLDAI_DEV_MODE": "true",
"TESTING": null,
"MOCK_SERVICES_MODE": "false",
"GOOGLE_APPLICATION_CREDENTIALS": "[SET - file:serviceAccountKey.json]",
"WORLDAI_GOOGLE_APPLICATION_CREDENTIALS": "[SET - file:serviceAccountKey.json]",
"FIRESTORE_EMULATOR_HOST": null,
"PORT": "52394",
"FIREBASE_PROJECT_ID": "worldarchitecture-ai",
"GEMINI_API_KEY": "[SET - 39 chars]",
"LLM_REQUEST_RESPONSE_CAPTURE_PATH": "/tmp/worldarchitect.ai/fix_modal-lock-level-up-priority/test_pr7028_modal_priority/iteration_005/llm_request_responses_1779591673071.jsonl",
"HTTP_REQUEST_RESPONSE_CAPTURE_PATH": "/tmp/worldarchitect.ai/fix_modal-lock-level-up-priority/test_pr7028_modal_priority/iteration_005/http_request_responses_1779591673071.jsonl",
"GEMINI_HTTP_REQUEST_RESPONSE_CAPTURE_PATH": "/tmp/worldarchitect.ai/fix_modal-lock-level-up-priority/test_pr7028_modal_priority/iteration_005/gemini_http_request_responses_1779591673071.jsonl",
"MCP_TEST_PROVIDER_HTTP_CAPTURE_PATH": "/tmp/worldarchitect.ai/fix_modal-lock-level-up-priority/test_pr7028_modal_priority/iteration_005/provider_http_request_responses_1779591673071.jsonl"
},
"lsof_output": "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME\nPython 59255 jleechan 5u IPv4 0x373ef97acbe17144 0t0 TCP *:52394 (LISTEN)\nPython 59268 jleechan 5u IPv4 0x373ef97acbe17144 0t0 TCP *:52394 (LISTEN)",
"ps_output": "PID USER ELAPSED ARGS\n59255 jleechan 01:17 /opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python -m gunicorn mvp_site.main:app --bind 0.0.0.0:52394 --workers 1 --worker-class gthread --threads 4 --timeout 600 --max-requests 1000 --access-logfile - --error-logfile - --log-level info"
},
"timestamp": "2026-05-24T03:02:42.423098+00:00",
"test_file": "/Users/jleechan/projects/worktree_babysit_bulk/testing_mcp/test_pr7028_modal_priority.py"
},
"summary": {
"total_scenarios": 2,
"passed": 2,
"failed": 0,
"campaign_capture_total": 1,
"campaign_capture_passed": 1,
"campaign_capture_failed": 0,
"raw_passed": 1,
"raw_total": 1,
"raw_pass_rate": "100.0%"
}
}
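
Two of the metadata values above can be cross-checked without any other artifact. The `working_tree_diff_sha256` value equals the SHA-256 digest of empty input, which is consistent with `working_tree_dirty` being false, and the reported pass rates follow from the `summary` counts. The following is a minimal sketch with the values copied from the JSON above. `pass_rate` is a hypothetical helper, and its one-decimal formatting is an assumption based on how the bundle prints rates.

```python
import hashlib

# Values copied from metadata.json above.
WORKING_TREE_DIFF_SHA256 = (
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
summary = {
    "total_scenarios": 2,
    "passed": 2,
    "failed": 0,
    "raw_passed": 1,
    "raw_total": 1,
}


def pass_rate(passed: int, total: int) -> str:
    """Format a pass rate with one decimal place; "n/a" when total is zero."""
    return f"{100.0 * passed / total:.1f}%" if total else "n/a"


# A clean working tree has an empty diff, so its hash is SHA-256 of b"".
diff_is_empty = hashlib.sha256(b"").hexdigest() == WORKING_TREE_DIFF_SHA256
scenario_rate = pass_rate(summary["passed"], summary["total_scenarios"])
raw_rate = pass_rate(summary["raw_passed"], summary["raw_total"])

print(diff_is_empty, scenario_rate, raw_rate)
```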