Graham grahama1970

Mini‑Agent + Router Readiness: All‑Smokes Orchestration Timeout (Request for Focused Help)
Created: 2025-09-27
TTL: Please treat as private and ephemeral; delete within 15 minutes after review.

Context and Goal

Project: LiteLLM fork with env‑gated codex‑agent provider and Mini‑Agent + Readiness system.
Deploy gate we want: make project-ready-all must pass, i.e., every smoke suite (smoke, smoke_optional, ndsmoke, ndsmoke_e2e) must be green.
Current state: All failing clusters are fixed, and the individual smokes pass in isolation. The composite readiness check all_smokes still times out in the harness, even with an extended per‑check timeout.

What’s Working (evidence-based)

CodeWorld — External Review Questions, Context, and Code Anchors

Context: CodeWorld is a prompt‑driven, multi‑variant orchestrator for agentic code generation. It emits per‑instance prompts, autostarts a tiny FastAPI ingest backend, runs agents (or a local fallback), and aggregates a reproducible scorecard. Observability flows to ArangoDB with a thin proto dashboard. Memory hooks integrate a Graph Memory service for recall and timeline context.

Inspiration: CWM: An Open‑Weights LLM for Research on Code Generation with World Models (Meta AI, Sept 24, 2025). Local copy: docs/papers/CWM_ An Open-Weights LLM for Research on Code Generation with World Models _ Research - AI at Meta.md. Our aim is to explore world‑model style signals for agentic coding by capturing observation→action episodes during runs and enabling recall‑driven guidance.

Objective: Harden the orchestrator for research‑grade iteration while keeping it thin and deterministic by default. We want principled process lifecycle, secure defaults,

External Review Prompt (Extractor — Canonical v1)

Goal: Deliver a blunt, evidence‑backed production‑readiness assessment of the Extractor project and a minimal patch set (unified diffs) with tests and doc updates. Keep changes surgical. Ship safety first.

Reviewer persona & tone

Principal SRE/DevEx + AppSec; fluent with Python/uv, Typer/FastAPI, Vite/React, ArangoDB, CI.
Be terse, specific, and fail‑closed. Unverified claims must be marked 🔴 (blocking) or 🟡 (needs proof). No hand‑waving.

Project context (declare at top of your report)

Project: Extractor — Self‑Correcting Agentic Document Processing System (multi‑stage pipeline + tabbed UX)

AGENTS.md

Repository Guidelines

Based on OpenAI Prompting Guide.

Agent Quickstart (Codex CLI)

Activation: Start with the prompt:
Activate the current dir as project using serena

Here is a concise estimate, with the math, of how many full-time developers a cluster of 16 Nvidia H200 GPUs can support when running Grok’s Kimi-k2, based purely on token throughput:

Given:

Average tokens per developer per day: ~5,800,000 tokens
Seconds per day: 86,400 seconds
Per-GPU token throughput (prefill + decode combined) from Grok Kimi-k2 benchmarks: ~4,000 tokens/sec
Number of GPUs in cluster: 16
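
The figures above combine into a single division: cluster tokens per day over tokens per developer per day. A minimal sketch of that arithmetic follows. The function name and the `utilization` parameter are illustrative assumptions, not part of the original estimate; at the default of 1.0 the cluster is assumed to be fully and evenly loaded around the clock, which real workloads rarely achieve.

```python
SECONDS_PER_DAY = 86_400


def developers_supported(num_gpus, tokens_per_sec_per_gpu,
                         tokens_per_dev_per_day, utilization=1.0):
    """Return how many developers the cluster can serve on token throughput alone.

    `utilization` is a hypothetical knob (fraction of the day the GPUs are
    actually busy); 1.0 reproduces the idealized estimate.
    """
    cluster_tokens_per_day = (
        num_gpus * tokens_per_sec_per_gpu * SECONDS_PER_DAY * utilization
    )
    return cluster_tokens_per_day / tokens_per_dev_per_day


# Inputs taken from the "Given" list above.
estimate = developers_supported(
    num_gpus=16,
    tokens_per_sec_per_gpu=4_000,
    tokens_per_dev_per_day=5_800_000,
)
print(f"{estimate:.1f} developers")  # 16 * 4,000 * 86,400 / 5,800,000 ≈ 953.4
```

Under these inputs the cluster produces 5,529,600,000 tokens per day, or roughly 953 developers' worth. Lowering `utilization` scales the result linearly, which is a quick way to test how sensitive the headline number is to idle time.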

	# ======================================
	# CORE
	# ======================================
	model = "gpt-5"
	model_reasoning_effort = "high"

	# Disable all sandboxing (no filesystem/network restrictions)
	sandbox_mode = "danger-full-access"

	# Never prompt for approvals (Codex will run commands directly)

	# codex_exec.py
	"""
	Async wrapper for running `codex exec ...` with robust timeout, streaming, and termination.

	Key features:
	- Overall and idle timeouts (wall and silence).
	- Graceful shutdown (SIGTERM) → hard kill (SIGKILL) with process-group awareness.
	- Stream readers that cannot deadlock; cancellation-safe finalization.
	- Rolling capture limits to avoid unbounded memory growth.
	- Optional binary or decoded text outputs.
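
The features listed above can be illustrated with a small sketch. This is not the contents of codex_exec.py, which is truncated here; it is a hypothetical, POSIX-only example of the same ideas (overall and idle timeouts, SIGTERM then SIGKILL on the process group, and a capped capture buffer). The function name `run_with_timeouts` and all of its parameters are assumptions made for illustration.

```python
import asyncio
import os
import signal
import sys
import time


async def run_with_timeouts(cmd, overall_timeout=30.0, idle_timeout=10.0,
                            grace=2.0, max_capture=65_536):
    """Run `cmd`; stop it on wall-clock or silence timeout.

    Returns (reason, returncode, captured_tail_bytes).
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        start_new_session=True,  # child leads its own process group (POSIX)
    )
    captured = bytearray()
    deadline = time.monotonic() + overall_timeout
    reason = "exited"

    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            reason = "overall_timeout"
            break
        try:
            chunk = await asyncio.wait_for(
                proc.stdout.read(4096), timeout=min(idle_timeout, remaining)
            )
        except asyncio.TimeoutError:
            reason = "idle_timeout" if idle_timeout <= remaining else "overall_timeout"
            break
        if not chunk:  # EOF: the child closed its output
            break
        captured += chunk
        if len(captured) > max_capture:  # rolling limit: keep only the tail
            del captured[: len(captured) - max_capture]

    if reason == "exited":
        # Output closed; give the process the rest of the wall budget to exit.
        try:
            await asyncio.wait_for(
                proc.wait(), timeout=max(0.0, deadline - time.monotonic())
            )
        except asyncio.TimeoutError:
            reason = "overall_timeout"

    if proc.returncode is None:
        # Graceful stop first, then a hard kill, both aimed at the whole group.
        # With start_new_session=True the group id equals the child's pid.
        for sig in (signal.SIGTERM, signal.SIGKILL):
            try:
                os.killpg(proc.pid, sig)
            except ProcessLookupError:
                break  # already gone
            try:
                await asyncio.wait_for(proc.wait(), timeout=grace)
                break
            except asyncio.TimeoutError:
                continue
        await proc.wait()

    return reason, proc.returncode, bytes(captured)
```

A call such as `asyncio.run(run_with_timeouts([sys.executable, "-c", "print('ok')"]))` returns once the child exits. A child that goes silent for longer than `idle_timeout` is signalled even if the overall budget has not run out, which is the distinction between the wall and silence timeouts.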

	# --- Place below in .zshrc ---------

	# === Self-contained codexer with resume seed builder ===


	# --- codexer: simple conversation loader for the Codex CLI --------------------
	# Features:
	# --resume Append the last conversation (user + assistant) to the seed
	# --limit N Include only the last N lines of that conversation
	# --id SESSION_ID Resume a specific session id (instead of the most recent)

	#!/usr/bin/env python3

	"""
	LiteLLM Call - Easy async LLM batch runner with automatic image support

	WHAT IT DOES:
	- Run multiple LLM prompts in parallel for speed
	- Automatically detects and includes images from URLs or local files
	- Works with any LiteLLM-supported model (OpenAI, Anthropic, Ollama, etc.)
	- Handles all image processing automatically (compression, base64 encoding)

	#!/usr/bin/env python3
	"""
	Codebase Indexer for Semantic Code Search

	A tool for indexing code repositories into ArangoDB with semantic embeddings,
	enabling intelligent code search beyond simple text matching.

	Key Features:
	- Extracts functions/classes using tree-sitter AST parsing
	- Generates semantic embeddings using nomic-embed-code model (1024-dim)