PR #6161 Evidence Bundle: rewards_box/planning_block atomicity fix — unit tests, browser E2E, MCP red/green differential
{
"passed": true,
"test_name": "ui_level_up_rewards_planning_atomicity_browser",
"scenarios": [
{
"name": "pending_level_up_projection",
"passed": true,
"latest_choice_ids": [
"level_up_now",
"continue_adventuring",
"__custom_action__"
]
},
{
"name": "multi_level_organic_progression",
"passed": true,
"errors": [],
"campaign_id": "jAzdbQqZ2NXPrnduCwLx",
"progression": [
{
"target_level": 2,
"start_level": 1,
"seeded_xp": 400,
"triggered": true,
"post_level_up_now_choices": [
"level_up_now",
"aggressive_opening",
"defensive_stance",
"utilize_action_surge",
"continue_adventuring",
"__custom_action__"
],
"modal_steps": [
{
"step": 1,
"choice_ids": [
"level_up_now",
"aggressive_opening",
"defensive_stance",
"utilize_action_surge",
"continue_adventuring",
"__custom_action__"
]
},
{
"step": 2,
"choice_ids": [
"level_up_now",
"defensive_parry",
"action_surge_strike",
"tactical_reposition",
"continue_adventuring",
"__custom_action__"
]
},
{
"step": 3,
"choice_ids": [
"level_up_now",
"strike_knee_joint",
"action_surge_barrage",
"disarming_strike",
"continue_adventuring",
"__custom_action__"
]
},
{
"step": 4,
"choice_ids": [
"level_up_now",
"execute_killing_blow",
"tactical_restraint",
"salvage_mana_core",
"continue_adventuring",
"__custom_action__"
]
},
{
"step": 5,
"choice_ids": [
"level_up_now",
"continue_adventuring",
"__custom_action__"
]
}
],
"reached_finish": true,
"end_level": 2
},
{
"target_level": 3,
"start_level": 2,
"seeded_xp": 1000,
"triggered": true,
"post_level_up_now_choices": [
"level_up_now",
"choose_champion",
"choose_battle_master",
"choose_eldritch_knight",
"finish_level_up_return_to_game",
"continue_adventuring",
"__custom_action__"
],
"modal_steps": [
{
"step": 1,
"choice_ids": [
"level_up_now",
"choose_champion",
"choose_battle_master",
"choose_eldritch_knight",
"finish_level_up_return_to_game",
"continue_adventuring",
"__custom_action__"
]
}
],
"reached_finish": true,
"end_level": 3
}
],
"final_level": 3
}
],
"errors": [],
"campaign_id": "f3q1VXjmyO2oFa0H4886",
"user_id": "browser-test-1775786323",
"final_level": 3
}
.beads/issues.jsonl | 8 +-
evidence/pr_6161_red_green/README.md | 119 +++
evidence/pr_6161_red_green/checksums.sha256 | 33 +
evidence/pr_6161_red_green/green/evidence.md | 99 +++
.../pr_6161_red_green/green/evidence.md.sha256 | 1 +
.../green/gemini_http_request_responses.jsonl | 54 ++
.../gemini_http_request_responses.jsonl.sha256 | 1 +
.../green/http_request_responses.jsonl | 208 ++++++
.../green/http_request_responses.jsonl.sha256 | 1 +
.../green/llm_request_responses.jsonl.sha256 | 1 +
evidence/pr_6161_red_green/green/metadata.json | 105 +++
.../pr_6161_red_green/green/metadata.json.sha256 | 1 +
evidence/pr_6161_red_green/green/methodology.md | 51 ++
.../pr_6161_red_green/green/methodology.md.sha256 | 1 +
evidence/pr_6161_red_green/green/notes.md | 38 +
evidence/pr_6161_red_green/green/notes.md.sha256 | 1 +
evidence/pr_6161_red_green/green/run.json | 247 +++++++
evidence/pr_6161_red_green/green/run.json.sha256 | 1 +
evidence/pr_6161_red_green/red/doctor_report.json | 91 +++
.../red/doctor_report.json.sha256 | 1 +
evidence/pr_6161_red_green/red/evidence.md | 101 +++
evidence/pr_6161_red_green/red/evidence.md.sha256 | 1 +
.../red/gemini_http_request_responses.jsonl | 62 ++
.../red/gemini_http_request_responses.jsonl.sha256 | 1 +
.../red/http_request_responses.jsonl | 226 ++++++
.../red/http_request_responses.jsonl.sha256 | 1 +
.../red/llm_request_responses.jsonl.sha256 | 1 +
evidence/pr_6161_red_green/red/metadata.json | 111 +++
.../pr_6161_red_green/red/metadata.json.sha256 | 1 +
evidence/pr_6161_red_green/red/methodology.md | 51 ++
.../pr_6161_red_green/red/methodology.md.sha256 | 1 +
evidence/pr_6161_red_green/red/notes.md | 79 ++
evidence/pr_6161_red_green/red/notes.md.sha256 | 1 +
evidence/pr_6161_red_green/red/run.json | 471 ++++++++++++
evidence/pr_6161_red_green/red/run.json.sha256 | 1 +
mvp_site/frontend_v1/app.js | 15 +-
mvp_site/main.py | 54 +-
mvp_site/streaming_orchestrator.py | 108 ++-
mvp_site/tests/test_level_up_stale_flags.py | 813 ++++++++++++++++++++-
mvp_site/tests/test_world_logic.py | 74 ++
mvp_site/world_logic.py | 699 +++++++++++++-----
testing_mcp/lib/base_test.py | 4 +
testing_mcp/lib/server_utils.py | 135 +++-
testing_mcp/test_rewards_box_planning_block_e2e.py | 632 +++++++++++++++-
..._level_up_rewards_planning_atomicity_browser.py | 707 ++++++++++++++++++
45 files changed, 5186 insertions(+), 226 deletions(-)

PR #6161 — Evidence Bundle: rewards_box/planning_block Atomicity Fix

Commit: b5f9bc637 on branch fix/rewards-box-planning-block-atomic
Date: 2026-04-10
PR: https://github.com/jleechanorg/worldarchitect.ai/pull/6161

Evidence Tiers

Tier 1: Unit Tests (48/48 PASSED)

See 01_pytest_output.txt for full pytest -v output.

Key test classes:

  • TestLevelUpStaleFlags (4 tests) — stale flag cleanup
  • TestRewardBoxLevelUpInjection (9 tests) — injection logic
  • TestLevelUpChoiceOrdering (4 tests) — choice ordering
  • TestRewardsPendingUntilLevelUp (4 tests) — pending rewards lifecycle
  • TestRewardsBoxPlanningBlockAtomicity (27 tests) — atomicity, JSON shapes, canonical projection, sticky-false harness

Tier 2: Browser E2E Test (2/2 scenarios PASSED)

See 02_browser_test_run.json for full run.json.

Test: test_level_up_rewards_planning_atomicity_browser.py — iteration 012

  • pending_level_up_projection: PASS — verifies canonical level-up projection on polling
  • multi_level_organic_progression (level 1→3): PASS — verifies multi-level atomicity through real modal interactions

Server: local gunicorn on port 8066, real Firebase + real Gemini LLM.

Tier 3: MCP E2E Red/Green Differential

See 03_red_green_readme.md and 04_green_run.json.

  • RED (pre-fix): rewards_box absent from polling response → badge-without-buttons
  • GREEN (post-fix): rewards_box present, paired with planning_block → 3/3 scenarios PASS
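
The RED and GREEN states above can be told apart mechanically. The following is a minimal sketch, not the harness's actual assertion code. It assumes a response is a dict with optional `rewards_box` and `planning_block` keys, and that `planning_block["choices"]` is either a list of dicts carrying an `id` or a dict keyed by choice id. The function names and the returned labels are hypothetical.

```python
from typing import Any, List, Mapping, Optional


def _choice_ids(planning_block: Optional[Mapping[str, Any]]) -> List[str]:
    """Collect choice ids from a list-of-dicts or dict-keyed-by-id planning block."""
    if not planning_block:
        return []
    choices = planning_block.get("choices") or []
    if isinstance(choices, Mapping):
        return list(choices)
    return [c["id"] for c in choices if isinstance(c, Mapping) and "id" in c]


def classify_level_up_ui(response: Mapping[str, Any]) -> str:
    """Label a response by whether its level-up signals are paired (assumed schema)."""
    box = response.get("rewards_box") or {}
    box_signals = bool(box.get("level_up_available"))
    plan_signals = "level_up_now" in _choice_ids(response.get("planning_block"))
    if box_signals and plan_signals:
        return "paired"
    if plan_signals:
        return "orphan_planning_block"
    if box_signals:
        return "orphan_rewards_box"
    return "no_level_up"
```

Under these assumptions, a test would accept only `paired` or `no_level_up` on both the immediate and the polled response.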

Tier 4: Design Alignment

| Invariant | Implementation |
| --- | --- |
| Polling extraction | structured_fields-first + flattened fallback via _extract_latest_gemini_ui_fields |
| Stale revalidation | Fail-closed: _resolve_canonical_level_up_ui_pair returns (None, None) on flag contradiction |
| Atomicity parity | Both streaming (line 7186) and polling (line 7477) use _resolve_canonical_level_up_ui_pair |
| planning_block normalization | normalize_planning_block_choices on both pass-through and canonical paths |
| structured_fields persistence | rewards_box in structured_fields_utils.py extraction layer |
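
To illustrate the fail-closed behaviour in the table above, here is a simplified sketch. It is not the real `_resolve_canonical_level_up_ui_pair`: the signature, the `current_level`/`target_level` staleness check, and the list-of-dicts choice shape are all assumptions made for illustration. It also omits the synthesis of a missing half that the unit test names suggest the real resolver performs.

```python
from typing import Any, Mapping, Optional, Tuple

Pair = Tuple[Optional[Mapping[str, Any]], Optional[Mapping[str, Any]]]


def resolve_pair_sketch(
    rewards_box: Optional[Mapping[str, Any]],
    planning_block: Optional[Mapping[str, Any]],
    current_level: int,
    target_level: int,
) -> Pair:
    """Return both level-up UI halves together, or neither (hypothetical sketch)."""
    box_signals = bool(rewards_box and rewards_box.get("level_up_available"))
    choices = (planning_block or {}).get("choices") or []
    plan_signals = any(
        isinstance(c, Mapping) and c.get("id") == "level_up_now" for c in choices
    )

    # Nothing level-up related: pass the inputs through untouched.
    if not box_signals and not plan_signals:
        return rewards_box, planning_block

    # Assumed contradiction: the level was already applied, so the UI is stale.
    stale = current_level >= target_level

    # Fail closed on staleness or on a half-present pair.
    if stale or not (box_signals and plan_signals):
        return None, None
    return rewards_box, planning_block
```

The point of the design is that a caller can never observe one half of the pair: any doubt collapses to `(None, None)` rather than to a partial UI.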

Tier 5: Code Diff

See 05_diffstat.txt for full diff stat (45 files changed, +5186 -226).

{
"scenarios": [
{
"name": "atomicity_e2e",
"passed": true,
"errors": [],
"campaign_id": "3GDFTAaejZ7khpU9pTIg",
"level_up_triggered": true,
"immediate_response": {
"had_rewards_box": true,
"had_planning_block": true
},
"polled_response": {
"had_rewards_box": true,
"had_planning_block": true,
"rewards_box_level_up": true,
"planning_block_has_level_up_choices": true
},
"user_id": "test-rewards_box_planning_block_e2e-1775767545",
"user_email": "<redacted-email>"
},
{
"name": "projected_level_up_button_text",
"passed": true,
"errors": [],
"campaign_id": "Itzj9icU4hytVBqcrPse",
"pending_choice_ids": [
"level_up_now",
"continue_adventuring"
],
"modal_choice_ids": [
"level_up_now",
"begin_combat_simulation",
"explore_sector_b12",
"visit_the_vaults",
"continue_adventuring",
"finish_level_up_return_to_game"
],
"finished_choice_ids": [
"begin_combat_simulation",
"explore_sector_b12",
"visit_the_vaults",
"finish_level_up_return_to_game"
],
"finished_level": 2,
"pending_signal_present": true,
"user_id": "test-rewards_box_planning_block_e2e-1775767545",
"user_email": "<redacted-email>"
},
{
"name": "multi_level_organic_progression",
"passed": true,
"errors": [],
"campaign_id": "nZ3MPpvor03dOy6C5iZd",
"progression": [
{
"target_level": 2,
"start_level": 1,
"seeded_xp": 275,
"end_level": 2,
"triggered": true,
"completed": true,
"immediate_choice_ids": [
"level_up_now",
"continue_adventuring"
],
"polled_choice_ids": [
"level_up_now",
"continue_adventuring",
"finish_level_up_return_to_game"
],
"completion_transcript": [
{
"step": "enter_level_up",
"action": "CHOICE:level_up_now",
"choice_ids": [
"finish_level_up_return_to_game"
],
"has_planning_block": true
},
{
"step": "level_up_step_1",
"action": "CHOICE:finish_level_up_return_to_game",
"choice_ids": [
"action_surge_strike",
"disciplined_strike",
"tactical_parry"
],
"has_planning_block": true
},
{
"step": "post_finish_state_poll",
"action": "get_campaign_state",
"level": 2,
"level_up_in_progress": false,
"rewards_pending_level_up": null
}
]
},
{
"target_level": 3,
"start_level": 2,
"seeded_xp": 875,
"end_level": 3,
"triggered": true,
"completed": true,
"immediate_choice_ids": [
"level_up_now",
"continue_adventuring"
],
"polled_choice_ids": [
"level_up_now",
"continue_adventuring",
"finish_level_up_return_to_game"
],
"completion_transcript": [
{
"step": "enter_level_up",
"action": "CHOICE:level_up_now",
"choice_ids": [
"level_up_now",
"continue_adventuring",
"finish_level_up_return_to_game"
],
"has_planning_block": true
},
{
"step": "level_up_step_1",
"action": "CHOICE:finish_level_up_return_to_game",
"choice_ids": [
"aggressive_opening",
"defensive_posture",
"tactical_maneuver",
"finish_level_up_return_to_game"
],
"has_planning_block": true
},
{
"step": "post_finish_state_poll",
"action": "get_campaign_state",
"level": 3,
"level_up_in_progress": false,
"rewards_pending_level_up": null
}
]
},
{
"target_level": 4,
"start_level": 3,
"seeded_xp": 2675,
"end_level": 4,
"triggered": true,
"completed": true,
"immediate_choice_ids": [
"level_up_now",
"continue_adventuring"
],
"polled_choice_ids": [
"level_up_now",
"continue_adventuring",
"finish_level_up_return_to_game"
],
"completion_transcript": [
{
"step": "enter_level_up",
"action": "CHOICE:level_up_now",
"choice_ids": [
"level_up_now",
"asi_strength",
"asi_dexterity",
"asi_constitution",
"feat_great_weapon_master",
"continue_adventuring",
"finish_level_up_return_to_game"
],
"has_planning_block": true
},
{
"step": "level_up_step_1",
"action": "CHOICE:asi_strength",
"choice_ids": [
"level_up_now",
"continue_adventuring",
"finish_level_up_return_to_game"
],
"has_planning_block": true
},
{
"step": "level_up_step_2",
"action": "CHOICE:finish_level_up_return_to_game",
"choice_ids": [
"overpowering_strike",
"tactical_action_surge",
"disciplined_defense",
"finish_level_up_return_to_game"
],
"has_planning_block": true
},
{
"step": "post_finish_state_poll",
"action": "get_campaign_state",
"level": 4,
"level_up_in_progress": false,
"rewards_pending_level_up": null
}
]
}
],
"final_level": 4,
"user_id": "test-rewards_box_planning_block_e2e-1775767545",
"user_email": "<redacted-email>"
}
],
"summary": {
"total": 3,
"passed": 3,
"failed": 0,
"pass_rate": "3/3 (100%)",
"raw_total": 3,
"raw_passed": 3,
"raw_pass_rate": "100.0%",
"raw_data_complete": true
},
"campaign_capture_status": {
"3GDFTAaejZ7khpU9pTIg": {
"status": "success",
"attempts": 1,
"export": {
"status": "success"
}
},
"Itzj9icU4hytVBqcrPse": {
"status": "success",
"attempts": 1,
"export": {
"status": "success"
}
},
"nZ3MPpvor03dOy6C5iZd": {
"status": "success",
"attempts": 1,
"export": {
"status": "success"
}
}
}
}
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/jleechan/projects/worktree_level3/mvp_site
configfile: pytest.ini
plugins: xdist-3.8.0, anyio-4.12.1, testmon-2.2.0, django-4.7.0, mock-3.15.1, langsmith-0.7.6, timeout-2.4.0, asyncio-1.3.0, cov-7.0.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 48 items
mvp_site/tests/test_level_up_stale_flags.py::TestLevelUpStaleFlags::test_level_up_in_progress_cleared_when_new_level_up_available PASSED [ 2%]
mvp_site/tests/test_level_up_stale_flags.py::TestLevelUpStaleFlags::test_rewards_pending_cleared_on_level_up_exit PASSED [ 4%]
mvp_site/tests/test_level_up_stale_flags.py::TestLevelUpStaleFlags::test_character_creation_flags_cleared_on_level_up_exit PASSED [ 6%]
mvp_site/tests/test_level_up_stale_flags.py::TestLevelUpStaleFlags::test_routing_after_level_up_exit_with_cleared_flags PASSED [ 8%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_when_rewards_box_available_but_rewards_pending_null PASSED [ 10%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_with_multilevel_target_from_xp PASSED [ 12%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_requires_crossed_rewards_box_threshold PASSED [ 14%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_accepts_post_threshold_rewards_box_fallback PASSED [ 16%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_ignored_when_rewards_box_stale PASSED [ 18%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_rewards_box_false_no_injection PASSED [ 20%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_rewards_box_none_no_injection PASSED [ 22%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_both_rewards_pending_and_rewards_box_set PASSED [ 25%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardBoxLevelUpInjection::test_inject_levelup_choices_rewards_box_level_up_with_no_planning_block PASSED [ 27%]
mvp_site/tests/test_level_up_stale_flags.py::TestLevelUpChoiceOrdering::test_level_up_now_is_first_when_injected_with_existing_llm_choices PASSED [ 29%]
mvp_site/tests/test_level_up_stale_flags.py::TestLevelUpChoiceOrdering::test_level_up_now_is_first_when_llm_provided_it_last PASSED [ 31%]
mvp_site/tests/test_level_up_stale_flags.py::TestLevelUpChoiceOrdering::test_level_up_now_is_first_when_planning_block_is_none PASSED [ 33%]
mvp_site/tests/test_level_up_stale_flags.py::TestLevelUpChoiceOrdering::test_level_up_now_already_first_no_reorder PASSED [ 35%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsPendingUntilLevelUp::test_check_level_up_resets_stale_processed_while_level_up_outstanding PASSED [ 37%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsPendingUntilLevelUp::test_check_level_up_does_not_reset_processed_when_level_reached_same_turn PASSED [ 39%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsPendingUntilLevelUp::test_has_pending_rewards_true_when_level_up_processed_but_not_leveled PASSED [ 41%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsPendingUntilLevelUp::test_has_pending_rewards_false_when_player_reached_pending_new_level PASSED [ 43%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_atomicity_inject_when_rewards_box_level_up_but_no_planning PASSED [ 45%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_atomicity_suppress_when_rewards_box_level_up_no_planning_no_inject PASSED [ 47%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_atomicity_passes_when_both_present PASSED [ 50%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_canonical_pair_synthesizes_rewards_box_when_only_planning_signals_level_up PASSED [ 52%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_canonical_pair_keeps_rewards_box_during_level_up_in_progress PASSED [ 54%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_canonical_pair_falls_back_to_planning_signal_when_modal_flags_are_missing PASSED [ 56%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_canonical_pair_does_not_resurrect_stale_modal_from_planning_alone PASSED [ 58%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_canonical_pair_suppresses_orphaned_level_up_planning_without_canonical_signal PASSED [ 60%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_canonical_pair_keeps_rewards_box_for_advanced_level_up_modal_despite_false_flags PASSED [ 62%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_canonical_pair_keeps_rewards_box_for_finishable_modal_despite_false_flags PASSED [ 64%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_canonical_pair_preserves_finish_choice_for_active_level_up_modal PASSED [ 66%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_atomicity_passes_when_neither_present PASSED [ 68%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_atomicity_dict_format_choices_support PASSED [ 70%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_atomicity_inject_failure_fallback_suppression PASSED [ 72%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_build_level_up_rewards_box_uses_empty_loot_list_when_no_items PASSED [ 75%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_get_campaign_state_unified_returns_jsonable_dict_shapes PASSED [ 77%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_get_campaign_state_unified_projects_level_up_ui_from_game_state PASSED [ 79%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_get_campaign_state_unified_ignores_story_stale_level_up_ui PASSED [ 81%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_get_campaign_state_unified_never_returns_level_up_planning_without_rewards_box PASSED [ 83%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_get_campaign_state_unified_preserves_latest_non_level_up_planning_block PASSED [ 85%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_get_campaign_state_unified_reads_flattened_top_level_rewards_box PASSED [ 87%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_resolve_canonical_level_up_ui_pair_suppresses_stale_rewards_box_after_level_applied PASSED [ 89%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_append_canonical_level_up_story_projection_appends_synthetic_pair PASSED [ 91%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_append_canonical_level_up_story_projection_noop_on_xp_gain_without_level_up PASSED [ 93%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_resolve_canonical_level_up_ui_pair_keeps_pending_when_in_progress_false PASSED [ 95%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_maybe_trigger_level_up_modal_sets_flags_from_false PASSED [ 97%]
mvp_site/tests/test_level_up_stale_flags.py::TestRewardsBoxPlanningBlockAtomicity::test_maybe_trigger_level_up_modal_sets_flags_when_absent PASSED [100%]
=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
<frozen importlib._bootstrap>:488: DeprecationWarning: Type google._upb._message.MessageMapContainer uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14.
<frozen importlib._bootstrap>:488
<frozen importlib._bootstrap>:488: DeprecationWarning: Type google._upb._message.ScalarMapContainer uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14.
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 48 passed, 2 warnings in 1.42s ========================

PR #6161 — Red/Green Differential Evidence (Audit-Grade)

Claim class: Pipeline E2E (MCP layer) — PR #6161 eliminates orphan planning_block without paired rewards_box.level_up_available=true in both the immediate response and the polled get_campaign_state path.

Important correction to earlier summaries: this differential is MCP-layer (HTTP calls to a real local MCP server), not browser/DOM. The multi-level scenario drives modal progression via ctx.process_action("CHOICE:...") MCP calls — see testing_mcp/test_rewards_box_planning_block_e2e.py:217. It does not exercise real browser click paths. Browser-layer proof is tracked separately.

Method

Same test file (testing_mcp/test_rewards_box_planning_block_e2e.py at current fix-branch head) run against two server states, each with a real MCP server and the real Gemini API (no mocks, MOCK_SERVICES_MODE enforced off).

| | RED | GREEN |
| --- | --- | --- |
| git_head (from red/metadata.json:git_provenance, green/metadata.json:git_provenance) | 3c7a91bdcd2bdabe641259f4c9d0501024426654 | e68239f7728465ab766df8020b86dc248a2bad06 |
| git_branch | (detached, pre-fix base) | fix/rewards-box-planning-block-atomic |
| merge_base with main | 3c7a91bdc | 3c7a91bdc |
| commits ahead of main | 0 | 40 |
| server command | gunicorn mvp_site.main:app (pre-fix world_logic.py) | gunicorn mvp_site.main:app (fix-branch world_logic.py) |
| server port / pid | 8051 / 1102462 | 59367 / 1001881 |
| server worktree | <redacted-tmp-path> | <redacted-home-path> |
| run timestamp (UTC) | 2026-04-09T21:30:08Z | 2026-04-09T20:57:55Z |
| Model | gemini-3-flash-preview | gemini-3-flash-preview |
| Pass rate | 1/3 (33%) | 3/3 (100%) |

Key provenance distinction: the only meaningful delta between RED and GREEN is mvp_site/world_logic.py (and the adjacent fix files main.py and streaming_orchestrator.py). The RED worktree was created from 3c7a91bdcd and had the current testing_mcp/ test suite copied in, so the test code is identical between runs. This isolates the fix itself as the only variable; see git_provenance.working_tree_changed_files in red/metadata.json (modernized test harness files).

Per-scenario results

RED (3c7a91bdcd)

| Scenario | Result | Key errors |
| --- | --- | --- |
| atomicity_e2e | PASSED | (lucky single-snapshot pass) |
| projected_level_up_button_text | FAILED | Projected pending state did not return rewards_box.level_up_available=true; did not return canonical level-up planning choices; missing level_up_now button text |
| multi_level_organic_progression | FAILED | level_up_to_2/3: immediate response has level-up planning choices without a paired rewards_box.level_up_available=true (+ polled path variant) |

GREEN (e68239f77)

| Scenario | Result |
| --- | --- |
| atomicity_e2e | PASSED |
| projected_level_up_button_text | PASSED: projection path renders level_up_now button |
| multi_level_organic_progression | PASSED: level 1→4 progression via MCP CHOICE:* actions, campaign nZ3MPpvor03dOy6C5iZd |
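
The pass rates quoted for RED and GREEN can be recomputed from the `scenarios` array of each run.json. The sketch below assumes only the `passed` field visible in the run.json excerpts in this bundle. The function name is hypothetical, and the output mimics the `summary` block's format rather than reproducing the harness's own summarizer.

```python
from typing import Any, Dict, Iterable, Mapping


def summarize_scenarios(scenarios: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Recompute total/passed/failed and a 'p/t (n%)' string from scenario records."""
    records = list(scenarios)
    total = len(records)
    passed = sum(1 for record in records if record.get("passed") is True)
    # Round to a whole percent, as in "1/3 (33%)"; guard against an empty run.
    percent = round(100 * passed / total) if total else 0
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "pass_rate": f"{passed}/{total} ({percent}%)",
    }
```

For example, loading red/run.json with `json.load` and passing its `scenarios` list should reproduce the RED pass rate, provided the file uses the same shape as the GREEN excerpt.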

Bundle contents

evidence/pr_6161_red_green/
├── README.md                  (this file)
├── checksums.sha256           (manifest of all bundle files)
├── red/
│   ├── run.json               (test harness result, per-scenario errors)
│   ├── metadata.json          (git_head, server pid/cmdline, env, provenance)
│   ├── methodology.md         (harness-generated method notes)
│   ├── evidence.md            (harness-generated evidence report)
│   ├── doctor_report.json     (environment doctor report)
│   ├── notes.md               (harness notes)
│   ├── http_request_responses.jsonl       (HTTP traffic to MCP server)
│   ├── gemini_http_request_responses.jsonl (Gemini API calls)
│   ├── llm_request_responses.jsonl.sha256  (large llm jsonl — checksum only)
│   └── *.sha256                            (per-file checksums)
└── green/
    └── (same structure)

The per-run metadata.json, methodology.md, evidence.md, doctor_report.json, and *.sha256 files are produced directly by the test harness (testing_mcp/lib/base_test.py evidence finalizer) and are NOT post-edited for this bundle. Each run's files are cryptographically tied to the test-harness process via per-file SHA-256 sidecars.

llm_request_responses.jsonl (13–14 MB per run) is omitted from the bundle to keep the commit size reasonable; its SHA-256 is included as llm_request_responses.jsonl.sha256 in each directory so the file content remains verifiable. The full artifacts are available at:

  • RED: <redacted-tmp-path>
  • GREEN: <redacted-tmp-path>
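
A reader who obtains one of the omitted files can check it against its sidecar. This sketch assumes the sidecar follows the common `sha256sum` layout, with the hex digest as the first whitespace-separated token. The bundle does not state the sidecar format, so adjust the parsing if the harness writes something else. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-megabyte JSONL files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(path: Path, sidecar: Path) -> bool:
    """Compare a file's digest with the first token of its .sha256 sidecar (assumed format)."""
    tokens = sidecar.read_text(encoding="utf-8").split()
    expected = tokens[0].lower() if tokens else ""
    return expected == sha256_of(path)
```

Usage would look like `matches_sidecar(Path("llm_request_responses.jsonl"), Path("green/llm_request_responses.jsonl.sha256"))`, with the first path pointing at wherever the full artifact was downloaded.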

Reproduction

# RED — pre-fix worktree at PR base
git worktree add <redacted-tmp-path> 3c7a91bdcd --detach
cp -r testing_mcp/. <redacted-tmp-path>
cd <redacted-tmp-path>
ln -sf <redacted-home-path> venv
TESTING_AUTH_BYPASS=true ALLOW_TEST_AUTH_BYPASS=true \
  PYTHONPATH="$(pwd):$(pwd)/mvp_site" \
  venv/bin/python testing_mcp/test_rewards_box_planning_block_e2e.py
# Expected: 1/3 pass, with atomicity violations in scenarios 2 and 3.

# GREEN — fix branch
cd <redacted-home-path>  # on fix/rewards-box-planning-block-atomic
TESTING_AUTH_BYPASS=true ALLOW_TEST_AUTH_BYPASS=true \
  PYTHONPATH="$(pwd):$(pwd)/mvp_site" \
  venv/bin/python testing_mcp/test_rewards_box_planning_block_e2e.py
# Expected: 3/3 pass.

Scope note

This bundle establishes MCP-layer atomicity fix validation with a genuine red/green differential. It does not establish browser-layer proof. The testing_ui/test_level_up_rewards_planning_atomicity_browser.py browser test suite remains out of scope for PR #6161's merge gate as of commit e68239f77.

PR #6161 — Browser Video Evidence

Recording: Level-Up Rewards + Planning Block Atomicity

Test: testing_ui/test_level_up_rewards_planning_atomicity_browser.py
Result: 2/2 scenarios passed (pending projection + multi-level 1→3)

Artifacts (committed to PR branch)

| File | Size | Description |
| --- | --- | --- |
| evidence/pr_6161_red_green/level_up_rewards_atomicity_demo.gif | 5.8 MB | 60-second excerpt at 6fps |
| evidence/pr_6161_red_green/level_up_rewards_atomicity_demo.webm | 21 MB | Full 6m26s browser session |

What the recording shows

  1. Real browser navigates to WorldArchitect.AI campaign page
  2. Level-up event triggered — rewards_box appears with level_up_available=true
  3. planning_block choices appear co-visibly with rewards_box (atomicity verified)
  4. Multi-level progression (1→3) validated end-to-end via modal flow
  5. finish_level_up_return_to_game choice preserved throughout modal transitions

Source

Extracted from Playwright .webm recording at: /tmp/worldarchitectai/fix_rewards-box-planning-block-atomic/videos/a301ffa6bbd9b4f6ff148f5d12aacc95.webm

Command used to extract GIF:

ffmpeg -y -ss 60 -i <webm> -t 60 \
  -vf "fps=6,scale=900:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=128[p];[s1][p]paletteuse=dither=bayer" \
  level_up_rewards_atomicity_demo.gif

Harness gate added

.claude/skills/evidence-standards.md now mandates browser video whenever testing_ui/test_*.py is added or modified in a PR. Reviewer checklist:

git diff --name-only origin/main...HEAD | grep -E '^testing_ui/test_.*\.py$'

If non-empty → reject evidence as MISSING until video artifact exists.
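
The same gate can be expressed as a small function for use in a review script. This is a sketch only: it reuses the regular expression from the checklist command above, while the function names and the `OK` label are hypothetical. It takes the changed paths as input (for example, the lines printed by the `git diff --name-only` command) rather than invoking git itself.

```python
import re
from typing import Iterable, List

# Same pattern as the grep in the reviewer checklist.
UI_TEST_PATTERN = re.compile(r"^testing_ui/test_.*\.py$")


def changed_ui_tests(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that are browser tests under testing_ui/."""
    stripped = (raw.strip() for raw in changed_paths)
    return [path for path in stripped if UI_TEST_PATTERN.match(path)]


def evidence_verdict(changed_paths: Iterable[str], video_present: bool) -> str:
    """Return 'MISSING' when a browser test changed and no video artifact exists."""
    if changed_ui_tests(changed_paths) and not video_present:
        return "MISSING"
    return "OK"
```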
