jleechan2015/README.md

Created May 20, 2026 20:20

PR #6958 level_up_entry_offer evidence, iteration_014: 3/3 PASS @ 91174d019b0bb2c677065042530f0967f1cf183c

README.md

Evidence Package: level_up_entry_offer_pr6958

Package Manifest

Test Name: level_up_entry_offer_pr6958
Run ID: level_up_entry_offer_pr6958-014-20260520T201714
Iteration: 14
Bundle Version: 1.2.0
Collected At (UTC): 2026-05-20T20:17:14.029698+00:00
Repository: worldarchitect.ai
Branch: fix/6926-review-comments
Commit: 91174d019b0bb2c677065042530f0967f1cf183c
Merge Base: f457ae58ab501c948aab8e9ff110c54899836f20
Commits Ahead of Main: 70

Git Provenance

 .beads/issues.jsonl                                |   9 +
 .github/workflows/design-doc-gate.yml              |   3 +-
 docs/design/pr-designs/pr-6958.html                | 311 ++++++++++
 docs/design/pr-designs/pr-6958.md                  | 104 ++++
 mvp_site/agents.py                                 |  29 +-
 mvp_site/llm_parser.py                             |  41 ++
 mvp_site/llm_providers/gemini_provider.py          |   9 +-
 mvp_site/prompts/level_up_instruction.md           |  34 +-
 mvp_site/prompts/planning_protocol.md              |  27 +-
 mvp_site/prompts/rewards_system_instruction.md     |  26 +-
 mvp_site/rewards_engine.py                         | 630 +++++++++++----------
 mvp_site/schemas/prompt_tool_contracts.json        |   4 +-
 mvp_site/tests/data/modal_routing_fixtures.json    |   3 +-
 mvp_site/tests/test_agents.py                      |  47 +-
 mvp_site/tests/test_canonicalize_invariants.py     |  31 +-
 mvp_site/tests/test_freeze_time_choices.py         |  67 ++-
 mvp_site/tests/test_prompts.py                     |  23 +
 mvp_site/tests/test_rewards_engine.py              | 545 ++++++++++++++++--
 mvp_site/tests/test_rewards_engine_wiring.py       |  39 +-
 mvp_site/tests/test_streaming_orchestrator.py      | 297 +++++++++-
 .../tests/test_testing_utils_centralization.py     | 129 +++--
 mvp_site/tests/test_world_logic.py                 | 152 ++++-
 mvp_site/world_logic.py                            |  98 +++-
 roadmap/README.md                                  |   1 +
 .../nextsteps-2026-05-19-pr6958-review-fixes.md    |  94 +++
 testing_mcp/lib/server_utils.py                    |   9 +-
 testing_mcp/test_level_up_entry_offer_pr6958.py    | 386 +++++++++++++
 .../test_level_up_rewards_planning_atomicity.py    |  64 ++-
 ..._level_up_rewards_planning_atomicity_browser.py |  51 +-
 29 files changed, 2674 insertions(+), 589 deletions(-)

Server Runtime

Port: 58917
PID: 73637
Command: /opt/homebrew/Cellar/python@3.11/3.11.13/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python -m gunicorn mvp_site.main:app --bind 0.0.0.0:58917 --workers 1 --worker-class gthread --threads 4 --timeout 600 --max-requests 1000 --access-logfile - --error-logfile - --log-level info
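The Port field and the `--bind` argument in the command should agree. Below is a minimal Python sketch of that cross-check. It assumes a TCP `HOST:PORT` bind passed as `--bind VALUE` or `--bind=VALUE`; the short `-b` form is not handled. `bind_port` is a hypothetical helper, not part of the test harness.

```python
import shlex
from typing import Optional


def bind_port(command: str) -> Optional[int]:
    """Return the TCP port of the first gunicorn --bind argument, or None if there is none."""
    args = shlex.split(command)
    for i, arg in enumerate(args):
        if arg == "--bind" and i + 1 < len(args):
            value = args[i + 1]
        elif arg.startswith("--bind="):
            value = arg.split("=", 1)[1]
        else:
            continue
        # HOST:PORT -> keep the text after the last colon; a non-numeric tail
        # (e.g. a unix: socket path) means there is no TCP port to report.
        _host, sep, port = value.rpartition(":")
        return int(port) if sep and port.isdigit() else None
    return None


# Cross-check against the Port field recorded above.
command = "python -m gunicorn mvp_site.main:app --bind 0.0.0.0:58917 --workers 1"
assert bind_port(command) == 58917
```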

Environment Variables

WORLDAI_DEV_MODE: true
TESTING: None
MOCK_SERVICES_MODE: false
GOOGLE_APPLICATION_CREDENTIALS: [SET - file:serviceAccountKey.json]
WORLDAI_GOOGLE_APPLICATION_CREDENTIALS: [SET - file:serviceAccountKey.json]
FIRESTORE_EMULATOR_HOST: None
PORT: 58917
FIREBASE_PROJECT_ID: worldarchitecture-ai
GEMINI_API_KEY: [SET - 39 chars]
LLM_REQUEST_RESPONSE_CAPTURE_PATH: /tmp/worldarchitect.ai/fix_6926-review-comments/level_up_entry_offer_pr6958/iteration_014/llm_request_responses_1779308040617.jsonl
HTTP_REQUEST_RESPONSE_CAPTURE_PATH: /tmp/worldarchitect.ai/fix_6926-review-comments/level_up_entry_offer_pr6958/iteration_014/http_request_responses_1779308040617.jsonl
GEMINI_HTTP_REQUEST_RESPONSE_CAPTURE_PATH: /tmp/worldarchitect.ai/fix_6926-review-comments/level_up_entry_offer_pr6958/iteration_014/gemini_http_request_responses_1779308040617.jsonl
MCP_TEST_PROVIDER_HTTP_CAPTURE_PATH: /tmp/worldarchitect.ai/fix_6926-review-comments/level_up_entry_offer_pr6958/iteration_014/provider_http_request_responses_1779308040617.jsonl
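Sensitive values are summarized rather than printed. Credential paths appear as `[SET - file:<basename>]` and the API key as `[SET - <n> chars]`. Below is a minimal Python sketch of that redaction. It assumes credential variables hold file paths and unset variables render as `None`. Which variables go into `FILE_VARS` and `SECRET_VARS` is a hypothetical choice, and so is the name `describe_env`.

```python
import os
from typing import Mapping

# Hypothetical classification of sensitive variables; the harness's real lists may differ.
FILE_VARS = {"GOOGLE_APPLICATION_CREDENTIALS", "WORLDAI_GOOGLE_APPLICATION_CREDENTIALS"}
SECRET_VARS = {"GEMINI_API_KEY"}


def describe_env(name: str, env: Mapping[str, str]) -> str:
    """Render one variable for the manifest without exposing secret values."""
    value = env.get(name)
    if value is None:
        return "None"  # unset variables are shown as None
    if name in FILE_VARS:
        # Credential files: show only the basename, never the full path or contents.
        return f"[SET - file:{os.path.basename(value)}]"
    if name in SECRET_VARS:
        # API keys: show only the length.
        return f"[SET - {len(value)} chars]"
    return value
```

Passing `os.environ` as `env` renders the live process environment.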

Files in This Bundle

- README.md - This manifest
- methodology.md - Testing methodology
- evidence.md - Evidence summary with Claim→Artifact Map and Coverage Matrix
- notes.md - Additional context, TODOs, follow-ups
- metadata.json - Machine-readable metadata
- assertions.json - Strict before/after parity assertions (if present)
- run.json - Test results
- streaming_evidence.json - Normalized streaming evidence summary
- request_responses.jsonl - Raw MCP request/response payloads (if present)
- llm_request_responses.jsonl - Raw LLM request/response payloads (if present)
- http_request_responses.jsonl - Raw local-server HTTP request/response payloads (if present)
- gemini_http_request_responses.jsonl - Raw Gemini transport HTTP traces (if present)
- artifacts/ - Additional evidence files
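Several entries are marked "(if present)", so a bundle check has to tell missing core files apart from absent optional captures. Below is a minimal Python sketch. It assumes the bundle is a local directory laid out as listed, and that every entry without "(if present)" is required. `REQUIRED`, `OPTIONAL`, and `inventory` are hypothetical names.

```python
from pathlib import Path

# Split follows the "(if present)" markers in the file list; treat it as an assumption.
REQUIRED = [
    "README.md", "methodology.md", "evidence.md", "notes.md",
    "metadata.json", "run.json", "streaming_evidence.json",
]
OPTIONAL = [
    "assertions.json", "request_responses.jsonl", "llm_request_responses.jsonl",
    "http_request_responses.jsonl", "gemini_http_request_responses.jsonl",
]


def inventory(bundle: Path) -> tuple[list[str], list[str]]:
    """Return (missing required entries, optional captures that are present)."""
    missing = [name for name in REQUIRED if not (bundle / name).is_file()]
    if not (bundle / "artifacts").is_dir():
        missing.append("artifacts/")
    present = [name for name in OPTIONAL if (bundle / name).is_file()]
    return missing, present
```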

evidence.md

Evidence Summary: level_up_entry_offer_pr6958

Test Results

Total Scenarios: 3
Scenario Validation Passed: 3
Scenario Validation Failed: 0
Scenario Validation Pass Rate: 100.0%
Raw LLM Layer Passed: 2/2 (100.0%)

⚠️ Multi-Campaign Isolation Note

This evidence bundle contains 2 campaigns:

0 shared campaign(s) reused across multiple tests
2 independent campaign(s) each used by one test only

Why: Each test uses its own campaign to prevent state bleed

Claim Scoping: Each scenario result below includes its campaign_id. A claim about a specific scenario references ONLY that scenario's campaign. Aggregate claims (e.g., "3/3 passed") span all campaigns, but each individual result remains traceable to its campaign.

Post-Processing Campaign Capture Passed: 2
Post-Processing Campaign Capture Failed: 0
Post-Processing Campaign Capture Pass Rate: 100.0%

Scenario Results

entry_offer_level_up_now_only

Status: ✅ PASS
Campaign ID: v20ZBNHIYFTZh0P0Kaym

modal_mechanic_plus_finish_freeze_time

Status: ✅ PASS
Campaign ID: l0UR1tf0k3qEiBxyX9QY

EVIDENCE_SIGNATURE_GUARD

Status: ✅ PASS

Provenance Chain

Git HEAD: 91174d019b0bb2c677065042530f0967f1cf183c
Test Timestamp: 2026-05-20T20:17:14.029698+00:00
Server PID: 73637

Claim → Artifact Map

Claim	File	Key Field(s)
Scenario validation passed: 3/3	run.json	scenarios[*].passed, scenarios[*].errors
Campaign post-processing capture passed: 2/2	run.json	campaign_capture_status[*].status
Streaming evidence normalized	streaming_evidence.json	summary.*, scenarios[*].chunk_count_observed
Bundle artifact inventory	artifacts/collection_log.txt	core_files, jsonl_captures, campaigns_dir
MCP request/response captured	request_responses.jsonl	Full request/response pairs
Local server HTTP request/response captured	http_request_responses.jsonl	http_request/http_response entries
LLM request/response stream captured	llm_request_responses.jsonl	request/response entries (type field)
Gemini HTTP transport captured	gemini_http_request_responses.jsonl	http_request/http_response/transport_error entries
Server execution log	artifacts/server.log	Raw server output
Git provenance	metadata.json	git_provenance.git_head = `91174d01...`
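The first two claims can be re-derived from run.json. Below is a minimal Python sketch, and several details in it are assumptions rather than taken from the harness:

- `scenarios` is a list of objects with a boolean `passed`.
- `campaign_capture_status` is either a mapping keyed by campaign ID or a list, and each record has a `status` string.
- A successful capture is recorded as `"passed"`.

`pass_rate`, `summarize_run`, and `load_run` are hypothetical helpers.

```python
import json
from pathlib import Path


def pass_rate(passed: int, total: int) -> str:
    # Same one-decimal percentage format as evidence.md; an empty set reports 0.0%.
    return f"{100 * passed / total:.1f}%" if total else "0.0%"


def summarize_run(run: dict) -> dict[str, str]:
    """Re-derive the scenario and campaign-capture counts claimed in evidence.md."""
    scenarios = run.get("scenarios", [])
    passed = sum(1 for s in scenarios if s.get("passed") is True)

    captures = run.get("campaign_capture_status", {})
    # Accept either a mapping keyed by campaign ID or a plain list of records.
    records = list(captures.values()) if isinstance(captures, dict) else list(captures)
    captured = sum(1 for c in records if c.get("status") == "passed")

    return {
        "scenario_validation": f"{passed}/{len(scenarios)} ({pass_rate(passed, len(scenarios))})",
        "campaign_capture": f"{captured}/{len(records)} ({pass_rate(captured, len(records))})",
    }


def load_run(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))
```

If the schema assumptions hold, `summarize_run(load_run(Path("run.json")))` should reproduce the 3/3 and 2/2 figures.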

Coverage Matrix

Scenario	Status	Campaign ID
entry_offer_level_up_now_only	✅ Pass	`v20ZBNHI...`
modal_mechanic_plus_finish_freeze_time	✅ Pass	`l0UR1tf0...`
EVIDENCE_SIGNATURE_GUARD	✅ Pass	N/A

Evidence Integrity

All files in this bundle have corresponding .sha256 checksum files
Checksums use local basename paths so per-file verification works from each artifact directory
⚠️ Server warnings detected (see artifacts/server.log)
Warning: ACTION_RESOLUTION_MISSING_FIELDS
Warning: INVENTORY_SAFEGUARD
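Each .sha256 file records a local basename, so verification can run from inside the directory that holds the artifact. Below is a minimal Python sketch. It assumes each checksum file uses the common `sha256sum` line format (`<hex digest>  <basename>`) and sits next to the file it covers as `<name>.sha256`. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file in 64 KiB chunks so large logs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(checksum_file: Path) -> bool:
    """Check one `<name>.sha256` file against the artifact in the same directory."""
    expected, _, name = checksum_file.read_text(encoding="utf-8").strip().partition(" ")
    # sha256sum separates digest and name with two spaces, or " *" in binary mode.
    name = name.strip().lstrip("*")
    target = checksum_file.parent / name  # basename resolved locally, as the bundle describes
    return bool(name) and target.is_file() and sha256_of(target) == expected.lower()


def verify_all(bundle: Path) -> dict[str, bool]:
    """Verify every .sha256 file under the bundle, keyed by its relative path."""
    return {
        p.relative_to(bundle).as_posix(): verify_checksum(p)
        for p in sorted(bundle.rglob("*.sha256"))
    }
```

If the files follow the `sha256sum` format, running `sha256sum -c server.log.sha256` from inside `artifacts/` is an equivalent shell check.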

What This Evidence Proves vs. Does NOT Prove

Proves:

Core logic and scenario validation for level_up_entry_offer_pr6958
Scenario execution pass rates (3/3)

Does NOT Prove:

Production server behavior (tested on local server unless otherwise noted)
Performance under load (single-request tests)
Edge cases not covered by scenarios

methodology.md

Methodology: level_up_entry_offer_pr6958

Test Type

Real API test against MCP server (not mock mode).

Test Mode

TESTING env var: None
MOCK_SERVICES_MODE env var: false
Mode: Real API calls via MCP HTTP JSON-RPC

Execution Environment

Server running at port 58917
Process: /opt/homebrew/Cellar/python@3.11/3.11.13/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python -m gunicorn mvp_site.main:app --bind 0.0.0.0:58917 --workers 1 --worker-class gthread --threads 4 --timeout 600 --max-requests 1000 --access-logfile - --error-logfile - --log-level info

Test Isolation Design

Multi-campaign architecture is BY DESIGN for test isolation.

Total Campaigns: 2
Shared Campaigns: 0 (used by multiple scenarios)
Independent Campaigns: 2 (single-scenario campaigns)
Isolated Tests: 0 (explicit isolated: True scenarios)
Rationale: Each test uses its own campaign to prevent state bleed

No scenarios in this run were marked isolated: True. Isolation comes from campaign separation instead: each campaign-backed scenario uses its own campaign ID, which prevents state bleed between scenarios.
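The campaign counts follow from the scenario-to-campaign mapping. Below is a minimal Python sketch. It assumes each scenario record carries a `campaign_id` and an optional `isolated` flag; a scenario such as EVIDENCE_SIGNATURE_GUARD, whose campaign is N/A, is represented here with `None`. `isolation_summary` is a hypothetical helper. Fed this run's scenarios, it reproduces the counts above.

```python
from collections import Counter


def isolation_summary(scenarios: list[dict]) -> dict[str, int]:
    """Count campaigns the way the Test Isolation Design section reports them."""
    # How many scenarios use each campaign; scenarios without a campaign are ignored.
    usage = Counter(s["campaign_id"] for s in scenarios if s.get("campaign_id"))
    return {
        "total_campaigns": len(usage),
        "shared_campaigns": sum(1 for n in usage.values() if n > 1),
        "independent_campaigns": sum(1 for n in usage.values() if n == 1),
        "isolated_tests": sum(1 for s in scenarios if s.get("isolated") is True),
    }


this_run = [
    {"name": "entry_offer_level_up_now_only", "campaign_id": "v20ZBNHIYFTZh0P0Kaym"},
    {"name": "modal_mechanic_plus_finish_freeze_time", "campaign_id": "l0UR1tf0k3qEiBxyX9QY"},
    {"name": "EVIDENCE_SIGNATURE_GUARD", "campaign_id": None},
]
print(isolation_summary(this_run))
# {'total_campaigns': 2, 'shared_campaigns': 0, 'independent_campaigns': 2, 'isolated_tests': 0}
```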

Evidence Capture

Git provenance captured at test start
Raw request/response payloads captured for each MCP call
Server runtime info captured via lsof/ps
Streaming evidence normalized into streaming_evidence.json
Raw local-server HTTP request/response payloads captured in http_request_responses.jsonl
Raw LLM request/response payloads captured in llm_request_responses.jsonl
Raw Gemini HTTP transport payloads captured in gemini_http_request_responses.jsonl
Raw LLM response text captured in server.log (artifacts/server.log)
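Each capture is JSONL, so a quick sanity check is to tally entries by type. Below is a minimal Python sketch. It assumes one JSON object per line, with a `type` field whose values include `http_request`, `http_response`, and `transport_error`, as listed for the Gemini transport capture. `count_entry_types` and `unanswered_requests` are hypothetical helpers. The pairing check further assumes each request ends in exactly one response or transport error.

```python
import json
from collections import Counter
from pathlib import Path


def count_entry_types(path: Path) -> Counter:
    """Tally JSONL entries by their `type` field, one JSON object per line."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                counts["<malformed>"] += 1  # keep going; report bad lines instead of aborting
                continue
            kind = entry.get("type", "<no type>") if isinstance(entry, dict) else "<not an object>"
            counts[kind] += 1
    return counts


def unanswered_requests(counts: Counter) -> int:
    """Requests with neither a response nor a transport error, assuming one outcome per request."""
    return counts["http_request"] - counts["http_response"] - counts["transport_error"]
```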

Evidence Mode

System instruction capture: filenames + char_count (lightweight). Raw LLM request/response payloads are captured in llm_request_responses.jsonl when raw payload capture is enabled.

Validation Criteria

Test scenarios validate that:

MCP server processes actions correctly
State updates are returned as expected
Server processes all requests successfully (validation warnings may be logged but requests succeed)

Note: Server warnings (e.g., validation, entity tracking) may appear in logs. Check artifacts/server.log for full server output.

Warning parser (used for notes.md): counts each server.log line matching `\bWARNING\b|SYSTEM WARNING:` once.
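The parser rule above amounts to a per-line regex search. Below is a minimal Python sketch that assumes server.log is read as text lines. The category tally is a hypothetical extension for names like ACTION_RESOLUTION_MISSING_FIELDS and INVENTORY_SAFEGUARD, not a documented feature of the harness.

```python
import re
from collections import Counter
from typing import Iterable

# The pattern quoted in this methodology, compiled once.
WARNING_LINE = re.compile(r"\bWARNING\b|SYSTEM WARNING:")


def count_warning_lines(lines: Iterable[str]) -> int:
    """Count matching lines; search() is used per line, so each line counts at most once."""
    return sum(1 for line in lines if WARNING_LINE.search(line))


def count_categories(lines: Iterable[str], categories: Iterable[str]) -> Counter:
    """Tally which known category names appear on warning lines."""
    wanted = list(categories)
    counts: Counter = Counter()
    for line in lines:
        if WARNING_LINE.search(line):
            for name in wanted:
                if name in line:
                    counts[name] += 1
    return counts
```

Note that `\b` makes the match whole-word, so a token such as `WARNINGS` is not counted.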

notes.md

Notes: level_up_entry_offer_pr6958

Run Information

Run ID: level_up_entry_offer_pr6958-014-20260520T201714
Iteration: 14
Bundle Version: 1.2.0
Timestamp: 2026-05-20T20:17:14.029698+00:00

Evidence Integrity

All files in this bundle have corresponding .sha256 checksum files
Checksums use local basename paths so per-file verification works from each artifact directory

Scenario Summary

Total: 3
Passed: 3
Failed: 0

Post-Processing Capture Summary

Campaigns with capture status: 2
Capture Passed: 2
Capture Failed: 0

Warning/Error Summary

Server Warnings: 72 warnings in server.log
Warning Parser: line-level regex `\bWARNING\b|SYSTEM WARNING:` (one count per matching line)
Key Warning Categories:
- ACTION_RESOLUTION_MISSING_FIELDS
- INVENTORY_SAFEGUARD
