CodeWorld — External Review Questions, Context, and Code Anchors

Context: CodeWorld is a prompt‑driven, multi‑variant orchestrator for agentic code generation. It emits per‑instance prompts, autostarts a tiny FastAPI ingest backend, runs agents (or a local fallback), and aggregates a reproducible scorecard. Observability flows to ArangoDB with a thin proto dashboard. Memory hooks integrate a Graph Memory service for recall and timeline context.

Inspiration: CWM: An Open‑Weights LLM for Research on Code Generation with World Models (Meta AI, Sept 24, 2025). Local copy: docs/papers/CWM_ An Open-Weights LLM for Research on Code Generation with World Models _ Research - AI at Meta.md. Our aim is to explore world‑model style signals for agentic coding by capturing observation→action episodes during runs and enabling recall‑driven guidance.

Objective: Harden the orchestrator for research‑grade iteration while keeping it thin and deterministic by default. We want principled process lifecycle, secure defaults, graceful degradation without Arango, clear contracts for prompts/outputs, and CI‑friendly smokes. We’re seeking a focused review to de‑risk architectural edges and align with the world‑model framing from CWM.

Where we’re blocked or want deeper input

Security and policy: autostarted backend exposure; /runs endpoint spawns processes; DB creation policy in hosted Arango.
Reliability without Arango: keep /stream useful; minimal RAM mode; schema choices for episodes/logs.
Concurrency + lifecycle: child process cleanup; detach safety; dashboard proc.
Canonicalization: duplicate algos module with risk of drift.
Prompt optimization rules: strictness and error surfacing.
MCP adapter ergonomics and failure modes.

1) Canonicalize `algos` module (prevent drift)

Question

Should we keep src/codeworld/algos/ as the canonical module and remove src/algos/? Any import edge cases in tests/tools to fix?

Why it matters

Two identical copies invite divergence and subtle import bugs during refactors.

Code anchors

src/codeworld/algos/multiply_variants.py
src/algos/multiply_variants.py

Acceptance

One canonical module under src/codeworld/algos/; imports updated; smokes stay green.

Example patch shape (pseudo)

diff --git a/src/algos/multiply_variants.py b/src/algos/multiply_variants.py
deleted file mode 100644
--- a/src/algos/multiply_variants.py
+++ /dev/null
@@
-# identical to src/codeworld/algos/multiply_variants.py; remove duplicate
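
Before deleting the duplicate, it may help to confirm that the two trees really are identical. The following is a minimal sketch, not part of the repo: `find_drift` is a hypothetical helper that compares same-named `.py` files byte for byte, assuming both directories exist and are flat.

```python
import filecmp
from pathlib import Path
from typing import Dict, List


def find_drift(canonical: Path, duplicate: Path) -> Dict[str, List[str]]:
    """Compare same-named .py files in two flat directories (hypothetical helper)."""
    dup_names = sorted(p.name for p in duplicate.glob("*.py"))
    common = [n for n in dup_names if (canonical / n).is_file()]
    # shallow=False forces a content comparison instead of trusting os.stat() data
    match, mismatch, errors = filecmp.cmpfiles(canonical, duplicate, common, shallow=False)
    return {
        "identical": match,
        "drifted": mismatch + errors,  # files that differ or could not be compared
        "only_in_duplicate": [n for n in dup_names if n not in common],
    }
```

Usage would be something like `find_drift(Path("src/codeworld/algos"), Path("src/algos"))`; removal is only safe as-is when `drifted` and `only_in_duplicate` are both empty.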

2) Backend lifecycle and zombie/pipe safety

Questions

Is our backend autostart/stop logic robust across platforms and detach modes?
Could stdout/stderr pipes fill and deadlock? Do we risk zombies on failures?

Why it matters

The CLI frequently spawns uvicorn and optionally a dashboard; leaks or pipe deadlocks hurt CI and long‑running workflows.

Code anchors

_start_backend src/codeworld/cli.py:660–699
Bring‑up and wait src/codeworld/cli.py:920–1010
_stop_process src/codeworld/cli.py:284–306

Concrete asks

Recommend a hardened _stop_process (process group termination, bounded read of pipes, platform nuances).
Suggest a small “proc supervisor” helper if warranted.

Snippet

# src/codeworld/cli.py:660–699 (excerpt)
def _start_backend(api_base: str, extra_env: Optional[Dict[str, str]] = None) -> Optional[subprocess.Popen]:
    # `port` and `env` are defined outside this excerpt.
    # Note: both branches open stdout/stderr pipes; nothing in this excerpt drains them.
    # Prefer uv if present; fall back to invoking uvicorn module
    uv_cmd = shutil.which("uv")
    if uv_cmd:
        return subprocess.Popen([uv_cmd, "run", "uvicorn", "codeworld.logger:app", "--host", "127.0.0.1", "--port", str(port)],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, env=env)
    return subprocess.Popen([sys.executable, "-m", "uvicorn", "codeworld.logger:app", "--host", "127.0.0.1", "--port", str(port)],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, env=env)
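
One possible shape for the hardened stop logic asked for above, as a sketch for discussion rather than the current implementation. `start_in_own_group` and `stop_process` are hypothetical names. The sketch assumes the child is started in its own session so the whole group can be signalled, and it avoids the pipe-fill question by not using `PIPE` at all (a log file would work equally well).

```python
import os
import signal
import subprocess
import sys
from typing import List


def start_in_own_group(cmd: List[str]) -> subprocess.Popen:
    """Start a child as a process-group leader with no undrained pipes."""
    kwargs = {}
    if os.name == "posix":
        kwargs["start_new_session"] = True  # child's pgid becomes its own pid
    else:
        kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP
    # DEVNULL: there is no pipe buffer that can fill up and block the child
    return subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, **kwargs
    )


def stop_process(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Terminate politely, escalate after grace_s seconds, and always reap.

    Assumes proc was started by start_in_own_group().
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited and reaped
    try:
        if os.name == "posix":
            os.killpg(proc.pid, signal.SIGTERM)  # signal the whole group
        else:
            proc.terminate()
    except ProcessLookupError:
        pass  # the group vanished between poll() and the signal
    try:
        return proc.wait(timeout=grace_s)  # bounded wait, reaps on success
    except subprocess.TimeoutExpired:
        pass
    try:
        if os.name == "posix":
            os.killpg(proc.pid, signal.SIGKILL)
        else:
            proc.kill()
    except ProcessLookupError:
        pass
    return proc.wait()  # reap so no zombie is left behind
```

On Windows this sketch only terminates the direct child; terminating grandchildren there would need a different mechanism (for example a job object), which is out of scope here.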

3) Security posture: loopback binding + `/runs` authorization

Questions

Are we guaranteed loopback‑only bindings in all autostart paths?
Should /runs require an auth token or capability to spawn work?

Why it matters

Spawning runs via HTTP without auth is unsafe on shared hosts; even if we bind to loopback by default, misconfigurations happen.

Code anchors

Bind addresses src/codeworld/cli.py:673,680 (127.0.0.1)
Spawn endpoint src/codeworld/logger.py:500–526 (/runs → codeworld.cli run)

Concrete asks

Propose a minimal token gate for /runs (env CW_RUNS_TOKEN); reject the request if the token is absent or does not match.
Hard guard that autostart only binds to loopback; document override as opt‑in.

Snippet

# src/codeworld/logger.py:500–526 (excerpt)
@app.post('/runs')
async def spawn_run(payload: dict):
    # `target`, `run_id`, and `env` are defined outside this excerpt.
    # TODO: optional token check (e.g., X-Run-Token header)
    cmd = [sys.executable, '-m', 'codeworld.cli', 'run', '--spec', str(target), '--run-id', run_id]
    subprocess.Popen(cmd, env=env)
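
A minimal sketch of the two guards asked for above, written as framework-free helpers so they can be unit-tested without the backend. The function names and the `allow_non_loopback` parameter are hypothetical; only `CW_RUNS_TOKEN` comes from this document. The sketch fails closed: an unset or empty `CW_RUNS_TOKEN` rejects every request.

```python
import hmac
import ipaddress
import os
from typing import Mapping, Optional


def runs_token_ok(presented: Optional[str], env: Mapping[str, str] = os.environ) -> bool:
    """Return True only if a token is configured and the presented one matches."""
    expected = env.get("CW_RUNS_TOKEN", "")
    if not expected or not presented:
        return False  # fail closed when either side is missing
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def is_loopback_host(host: str) -> bool:
    """True for loopback IP literals; 'localhost' is accepted by assumption."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as non-loopback


def check_bind_host(host: str, allow_non_loopback: bool = False) -> str:
    """Refuse non-loopback binds unless the caller opts in explicitly."""
    if allow_non_loopback or is_loopback_host(host):
        return host
    raise ValueError(f"refusing to bind autostarted backend to {host!r}")
```

In the `/runs` handler, the value of the `X-Run-Token` header (the name suggested in the TODO) would be passed to `runs_token_ok`, with a 401 or 403 returned on `False`.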

4) Arango policy: safe DB creation gating

Question

Creating DB/collections from _system may be blocked in hosted Arango. Should this be gated via ALLOW_DB_CREATE=1 and degrade gracefully?

Code anchors

_connect_arango() src/codeworld/logger.py:60–88 (creates DB + collections)

Concrete asks

Add ALLOW_DB_CREATE (default off). If denied, return 503 with actionable message; keep /stream and proto UI working.

Snippet

# src/codeworld/logger.py:60–88 (excerpt)
db = client.db("_system", username=user, password=password)
if not db.has_database(db_name):
    db.create_database(db_name, users=[{"username": user, "password": password, "active": True}])
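
A sketch of how the gate could wrap the excerpt above. `ensure_database` and `DatabaseUnavailable` are hypothetical names; `sys_db` stands for the `_system` handle from the excerpt and is only assumed to expose `has_database` and `create_database` as used there. The caller would map `DatabaseUnavailable` to the 503 response.

```python
import os
from typing import Any, Mapping, Optional, Sequence


class DatabaseUnavailable(RuntimeError):
    """The target database is missing and creation is not permitted."""


def ensure_database(
    sys_db: Any,
    db_name: str,
    users: Optional[Sequence[dict]] = None,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Return True if the database was created, False if it already existed."""
    if sys_db.has_database(db_name):
        return False
    if env.get("ALLOW_DB_CREATE") != "1":  # default off: never create implicitly
        raise DatabaseUnavailable(
            f"Database {db_name!r} does not exist and ALLOW_DB_CREATE is not 1; "
            "create it out of band or set ALLOW_DB_CREATE=1."
        )
    sys_db.create_database(db_name, users=list(users or []))
    return True
```

On hosted deployments the `has_database` call against `_system` may itself be denied, so the real gate would probably also need to catch the driver's permission error and treat it the same way.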

5) Arango‑down: RAM fallback for `/stream`, `/episodes`, `/logs`

Question

Should we introduce an in‑memory ring buffer when Arango is unavailable so the proto UI remains useful? If so, what is a minimal, safe design?

Code anchors

503 fallbacks: src/codeworld/logger.py:136–143, :210–233, :244–267
SSE broadcast queue: src/codeworld/logger.py:106–133

Concrete asks

Add CW_RAM_FALLBACK=1 to enable ring buffers (size N) for logs/episodes; mark responses as degraded: true.
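
One minimal design to react to, sketched under the assumption that a bounded `collections.deque` guarded by a lock is enough for a single-process backend. `RingBuffer` and `degraded_response` are hypothetical names; the `degraded: true` marker is the one proposed above.

```python
import threading
from collections import deque
from typing import Any, Deque, Dict, List


class RingBuffer:
    """Bounded, thread-safe buffer that silently drops the oldest items."""

    def __init__(self, maxlen: int = 1000) -> None:
        if maxlen <= 0:
            raise ValueError("maxlen must be positive")
        self._items: Deque[Dict[str, Any]] = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def append(self, item: Dict[str, Any]) -> None:
        with self._lock:
            self._items.append(dict(item))  # shallow copy: later caller edits do not leak in

    def tail(self, n: int) -> List[Dict[str, Any]]:
        """Return up to n most recent items, oldest first."""
        with self._lock:
            items = list(self._items)
        return items[-n:] if n > 0 else []


def degraded_response(buf: RingBuffer, limit: int = 100) -> Dict[str, Any]:
    """Response body for /logs or /episodes while Arango is unavailable."""
    return {"degraded": True, "items": buf.tail(limit)}
```

Memory stays bounded by `maxlen` items per buffer; nothing is persisted, so a backend restart empties the buffers.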

6) Variant agent concurrency and file mutation

Questions

The agent writes/edits a shared variants module; is this safe under concurrent runs? Should we isolate per‑run or lock?

Code anchors

Write helpers src/codeworld/variant_agent.py:14–46
Template emit :89–118; mutations :146–189

Concrete asks

Prefer per‑run variants file under workspace/runs/<run_id>/variants.py and import dynamically; or implement file locks.
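
A sketch of the per-run option, assuming each run writes its own `variants.py` under a runs root such as `workspace/runs/`. `load_run_variants` is a hypothetical helper, and the run-id character set it enforces is an assumption made here to keep the path inside the runs root.

```python
import importlib.util
import re
from pathlib import Path
from types import ModuleType


def load_run_variants(runs_root: Path, run_id: str) -> ModuleType:
    """Import <runs_root>/<run_id>/variants.py as an isolated module object."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", run_id):
        raise ValueError(f"unsafe run id: {run_id!r}")  # blocks '../' style traversal
    path = runs_root / run_id / "variants.py"
    mod_name = "cw_run_variants_" + run_id.replace("-", "_")
    spec = importlib.util.spec_from_file_location(mod_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load variants module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Deliberately not registered in sys.modules, so concurrent runs
    # never share or overwrite each other's module object.
    spec.loader.exec_module(module)
    return module
```

With this shape, concurrent runs mutate different files and no lock is needed; the trade-off is that variants are no longer importable by a fixed module name.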

7) POP rules strictness and error surfacing

Questions

Are the optimization and validation rules strict enough to reject malformed prompts and surface actionable errors?

Code anchors

Help examples src/codeworld/tools/prompt_opt.py:8–10
Rules file src/codeworld/rules/prompt_optimization.yaml

Concrete asks

Add smokes for missing sections; ensure CLI exits non‑zero with clear messages.
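
To make the ask concrete, here is a sketch of the behavior such a smoke would assert. The required section names and the markdown-heading convention are placeholders invented for illustration; the real list would come from the rules file, whose contents are not shown in this document.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical section names; the real ones live in prompt_optimization.yaml.
REQUIRED_SECTIONS = ("Context", "Task", "Acceptance")


def missing_sections(prompt_text: str, required: Iterable[str] = REQUIRED_SECTIONS) -> List[str]:
    """Return required section names that have no markdown heading in the prompt."""
    found = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", prompt_text, re.MULTILINE)
    }
    return [name for name in required if name.lower() not in found]


def check_prompt(prompt_text: str, source: str = "<prompt>") -> int:
    """Print one actionable error per missing section; return a process exit code."""
    missing = missing_sections(prompt_text)
    for name in missing:
        print(f"{source}: error: missing required section '{name}'", file=sys.stderr)
    return 1 if missing else 0
```

A CLI entry point would end with `sys.exit(check_prompt(text, path))`, so that CI sees a non-zero status together with the per-section messages on stderr.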

8) MCP adapter ergonomics and failure modes

Questions

Are tool names, concurrency caps, and error paths sensible when the Python side is down? What should the host expect?

Code anchors

mcp/codeworld-mcp/ (Node adapter)

Concrete asks

Provide ND smoke or doc note for CW_MAX_CONCURRENCY; ensure graceful degradation when the backend is unreachable.

9) World‑model alignment (CWM framing)

Questions

Given CWM’s emphasis on observation→action trajectories and RL in verifiable coding environments, are our episode and log schemas adequate for downstream world‑model research? What signals are missing (state hashes, interpreter traces, reward shaping)?

Code anchors

Episode ingest src/codeworld/logger.py:200–233
Score aggregation outputs workspace/runs/<run_id>/scorecard.json (runtime artifact)

Concrete asks

Propose a minimal schema extension for episodes to include execution state summaries and reward signals compatible with CWM‑style training data.
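
As a starting point for that proposal, a sketch of what one extended episode record could look like. Every field name here is hypothetical and is not the current ingest schema; the intent is only to show where a state hash, a trace summary, and a reward signal could sit next to the observation and action.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


def hash_state(state: Dict[str, Any]) -> str:
    """Stable SHA-256 over a JSON-serializable state summary (key order ignored)."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class EpisodeStep:
    """One observation -> action step with optional world-model signals."""

    run_id: str
    step: int
    observation: str
    action: str
    state_hash: Optional[str] = None       # e.g. hash_state(workspace summary)
    reward: Optional[float] = None         # e.g. fraction of tests passed
    reward_source: Optional[str] = None    # which verifier produced the reward
    trace_summary: Dict[str, Any] = field(default_factory=dict)  # interpreter/test trace digest

    def to_doc(self) -> Dict[str, Any]:
        """Plain dict, ready to be serialized as JSON or stored as a document."""
        return asdict(self)
```

All new fields are optional, so existing episode producers would remain valid while new runs start populating them.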

Repro and Validation Clues

Fast dev readiness

make project-ready

Live probe (strict backend)

READINESS_LIVE=1 STRICT_READY=1 make project-ready-live

Smokes (deterministic)

GAMIFIED_FAST_BENCH=1 uv run -q python tests/smoke/run_all.py

Release smokes (no Arango)

uv run python release_smokes/00_quick_check.py
uv run python release_smokes/10_run_from_prompt.py
uv run python release_smokes/20_emit_only_then_aggregate.py
uv run python release_smokes/30_run_from_spec.py

Expected Deliverables from Reviewer

Patches or diffs for items 1–5 (canonicalization, lifecycle, security, Arango gate, RAM fallback).
Schema proposal for episodes/logs aligned with world‑model research needs.
Notes on POP rule gaps and MCP ergonomics.

grahama1970/CODEWORLD_REVIEW_QUESTIONS.md
