@grahama1970
grahama1970 / reviewbundle.X3ZU3p.md
Created September 27, 2025 20:54
LiteLLM all-smokes readiness: ask + diffs (TTL 15m)

Mini‑Agent + Router Readiness — All‑Smokes Orchestration Timeout (Request for Focused Help)

Created: 2025-09-27

TTL: Please treat as private and ephemeral; delete within 15 minutes after review.

  1. Context and Goal
  • Project: LiteLLM fork with env‑gated codex‑agent provider and Mini‑Agent + Readiness system.
  • Deploy gate we want: make project-ready-all must pass, i.e., EVERY smoke (smoke, smoke_optional, ndsmoke, ndsmoke_e2e) green.
  • Current state: All failing clusters are fixed; individual smokes pass in isolation. The composite readiness check all_smokes times out in the harness despite extending per‑check timeout.
  2. What’s Working (evidence-based)
grahama1970 / CODEWORLD_REVIEW_QUESTIONS.md
Created September 27, 2025 14:03
CodeWorld — External Review Questions (context + code anchors)

CodeWorld — External Review Questions, Context, and Code Anchors

Context: CodeWorld is a prompt‑driven, multi‑variant orchestrator for agentic code generation. It emits per‑instance prompts, autostarts a tiny FastAPI ingest backend, runs agents (or a local fallback), and aggregates a reproducible scorecard. Observability flows to ArangoDB with a thin proto dashboard. Memory hooks integrate a Graph Memory service for recall and timeline context.

Inspiration: CWM: An Open‑Weights LLM for Research on Code Generation with World Models (Meta AI, Sept 24, 2025). Local copy: docs/papers/CWM_ An Open-Weights LLM for Research on Code Generation with World Models _ Research - AI at Meta.md. Our aim is to explore world‑model style signals for agentic coding by capturing observation→action episodes during runs and enabling recall‑driven guidance.

Objective: Harden the orchestrator for research‑grade iteration while keeping it thin and deterministic by default. We want principled process lifecycle, secure defaults,

grahama1970 / REVIEW_BUNDLE_PROMPT.md
Created September 26, 2025 21:49
Extractor External Review — 2025-09-26

External Review Prompt (Extractor — Canonical v1)

Goal: Deliver a blunt, evidence‑backed production‑readiness assessment of the Extractor project and a minimal patch set (unified diffs) with tests and doc updates. Keep changes surgical. Ship safety first.

Reviewer persona & tone

  • Principal SRE/DevEx + AppSec; fluent with Python/uv, Typer/FastAPI, Vite/React, ArangoDB, CI.
  • Be terse, specific, and fail‑closed. Unverified claims must be marked 🔴 (blocking) or 🟡 (needs proof). No hand‑waving.

Project context (declare at top of your report)

  • Project: Extractor — Self‑Correcting Agentic Document Processing System (multi‑stage pipeline + tabbed UX)
grahama1970 / config.toml
Created September 9, 2025 13:05
codex config.toml
# ======================================
# CORE
# ======================================
model = "gpt-5"
model_reasoning_effort = "high"
# Disable all sandboxing (no filesystem/network restrictions)
sandbox_mode = "danger-full-access"
# Never prompt for approvals (Codex will run commands directly)
grahama1970 / AGENTS.md
Last active September 6, 2025 17:15
Instructions for Codex and GPT-5

AGENTS.md

Repository Guidelines

Based on OpenAI Prompting Guide.

Agent Quickstart (Codex CLI)

  • Activation: Start with the prompt:
    Activate the current dir as project using serena
grahama1970 / codex_call.py
Last active August 29, 2025 14:26
Codexer: Async wrapper for running `codex exec` with robust supervision: supports overall and idle timeouts, graceful shutdown (SIGTERM→SIGKILL), non-deadlocking streamed I/O, rolling capture limits, optional binary/text output, safe logging with redaction & controlled environment handling. Requires Python 3.10+, loguru.
# codex_exec.py
"""
Async wrapper for running `codex exec ...` with robust timeout, streaming, and termination.
Key features:
- Overall and idle timeouts (wall and silence).
- Graceful shutdown (SIGTERM) → hard kill (SIGKILL) with process-group awareness.
- Stream readers that cannot deadlock; cancellation-safe finalization.
- Rolling capture limits to avoid unbounded memory growth.
- Optional binary or decoded text outputs.
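The preview above is cut off, so the following is not the gist's code. It is a minimal sketch of the pattern the feature list describes: an overall (wall-clock) timeout, an idle (silence) timeout, SIGTERM followed by SIGKILL on the process group, and a rolling capture limit. The function name `run_supervised` and all parameter names are hypothetical, and the sketch assumes a POSIX system.

```python
import asyncio
import os
import signal
import sys
import time


async def run_supervised(cmd, overall_timeout=10.0, idle_timeout=2.0,
                         grace=1.0, max_bytes=65536):
    """Hypothetical helper: returns (returncode, captured_tail, reason)."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        start_new_session=True,  # child leads its own process group (POSIX)
    )
    buf = bytearray()
    deadline = time.monotonic() + overall_timeout
    reason = "exited"
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            reason = "overall_timeout"
            break
        try:
            # Wait for output, but never longer than the idle or overall budget.
            chunk = await asyncio.wait_for(
                proc.stdout.read(4096), timeout=min(idle_timeout, remaining)
            )
        except asyncio.TimeoutError:
            reason = "idle_timeout" if remaining > idle_timeout else "overall_timeout"
            break
        if not chunk:  # EOF: the child closed its output
            break
        buf.extend(chunk)
        del buf[:-max_bytes]  # rolling capture: keep only the newest bytes

    if reason != "exited" and proc.returncode is None:
        try:
            os.killpg(proc.pid, signal.SIGTERM)  # graceful request first
            try:
                await asyncio.wait_for(proc.wait(), timeout=grace)
            except asyncio.TimeoutError:
                os.killpg(proc.pid, signal.SIGKILL)  # hard kill after the grace period
        except ProcessLookupError:
            pass  # the group already exited
    await proc.wait()
    return proc.returncode, bytes(buf), reason


if __name__ == "__main__":
    print(asyncio.run(run_supervised([sys.executable, "-c", "print('hello')"])))
```

Reading in bounded chunks under `asyncio.wait_for` lets one loop enforce both timeouts without a separate watchdog task. A production wrapper would also need to handle a child that closes stdout but keeps running, which this sketch does not.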
grahama1970 / codexer.sh
Last active August 27, 2025 22:26
Codexer: shell helper for codex that pipes a startup checklist plus the last Codex conversation into codex. Supports --resume, --limit, --id, --index, and --list. Finds session JSONL under ~/.codex/sessions, orders by timestamp, prints User/Assistant lines.
# --- Place below in .zshrc ---------
# === Self-contained codexer with resume seed builder ===
# --- codexer: simple conversation loader for the Codex CLI --------------------
# Features:
# --resume Append the last conversation (user + assistant) to the seed
# --limit N Include only the last N lines of that conversation
# --id SESSION_ID Resume a specific session id (instead of the most recent)
grahama1970 / 01_litellm_call.py
Last active August 18, 2025 20:41
Fast async Python CLI to batch run prompts via LiteLLM with robust image support. Supports local/remote images in prompts, pre-downloads and inlines them, cache-enabled with Redis fallback, and flexible prompt input (files, stdin, or inline). Uses Typer for CLI.
#!/usr/bin/env python3
"""
LiteLLM Call - Easy async LLM batch runner with automatic image support
WHAT IT DOES:
- Run multiple LLM prompts in parallel for speed
- Automatically detects and includes images from URLs or local files
- Works with any LiteLLM-supported model (OpenAI, Anthropic, Ollama, etc.)
- Handles all image processing automatically (compression, base64 encoding)
grahama1970 / codebase_indexer.py
Created August 15, 2025 14:21
codebase indexer for an agent
#!/usr/bin/env python3
"""
Codebase Indexer for Semantic Code Search
A tool for indexing code repositories into ArangoDB with semantic embeddings,
enabling intelligent code search beyond simple text matching.
Key Features:
- Extracts functions/classes using tree-sitter AST parsing
- Generates semantic embeddings using nomic-embed-code model (1024-dim)
grahama1970 / scratch.md
Created August 1, 2025 13:44
perplexity estimate

Here is a concise estimate, with the math, of how many full-time developers a cluster of 16 Nvidia H200 GPUs can support when running Grok’s Kimi-k2, based purely on token throughput:

Given:

  • Average tokens per developer per day: ~5,800,000 tokens
  • Seconds per day: 86,400 seconds
  • Per-GPU token throughput (prefill + decode combined) from Grok Kimi-k2 benchmarks: ~4,000 tokens/sec
  • Number of GPUs in cluster: 16

Step 1: Calculate tokens per second per developer
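The preview ends at Step 1, so the remaining steps are not shown. The sketch below works through the arithmetic using only the four figures listed under "Given". It assumes that each developer's demand is spread evenly across the full 86,400-second day and that throughput scales linearly with GPU count; both are simplifications, and the function names are illustrative.

```python
# Figures taken from the "Given" list above.
TOKENS_PER_DEV_PER_DAY = 5_800_000
SECONDS_PER_DAY = 86_400
TOKENS_PER_SEC_PER_GPU = 4_000
NUM_GPUS = 16


def tokens_per_sec_per_dev() -> float:
    """Step 1: average token demand of one developer, in tokens/sec."""
    return TOKENS_PER_DEV_PER_DAY / SECONDS_PER_DAY


def cluster_tokens_per_sec() -> int:
    """Aggregate cluster throughput, assuming linear scaling across GPUs."""
    return TOKENS_PER_SEC_PER_GPU * NUM_GPUS


def developers_supported() -> float:
    """Cluster throughput divided by per-developer demand."""
    return cluster_tokens_per_sec() / tokens_per_sec_per_dev()


if __name__ == "__main__":
    print(f"per developer: {tokens_per_sec_per_dev():.2f} tokens/sec")  # 67.13
    print(f"cluster:       {cluster_tokens_per_sec():,} tokens/sec")    # 64,000
    print(f"developers:    {developers_supported():.0f}")               # 953
```

Under these assumptions the cluster covers roughly 953 developers. Real usage is bursty and concentrated in working hours, so an estimate sized for peak load would come out lower.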