bigsnarfdude · April 7, 2026 20:38
diff --git a/rrma_diagram.txt b/rrma_diagram.txt

  ---
  RRMA v4.7 — Complete System Diagram

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                          HUMAN (you)                                        │
  │  bash v4/outer-loop.sh domains/<domain> [max_gens] [num_agents] [turns] [min]│
  └─────────────────────────┬───────────────────────────────────────────────────┘
                            │
                            ▼
  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                     outer-loop.sh  (THE GARDENER)                           │
  │                                                                             │
  │  Reads: taste.md (inherited principles from prior runs)                     │
  │  Writes: outer-loop.log, meta-blackboard.md, taste.md (appends lessons)    │
  │  Backs up: blackboard.md.genN each generation                              │
  │                                                                             │
  │  ┌─ Gen 1 only ──────────────────────────────────────┐                     │
  │  │  calibrate.sh                                      │                     │
  │  │  └→ Claude + WebSearch → calibration.md            │                     │
  │  │     (SOTA, papers, known techniques, baselines)    │                     │
  │  └────────────────────────────────────────────────────┘                     │
  │                                                                             │
  │  ┌─ Per Generation ───────────────────────────────────────────────────┐     │
  │  │                                                                     │     │
  │  │  STEP 1: launch-agents.sh ──────────────────────────────────┐      │     │
  │  │  │                                                           │      │     │
  │  │  │  Pre-flight:                                              │      │     │
  │  │  │  ├─ refresh_context.py → stoplight.md                    │      │     │
  │  │  │  │                       + recent_experiments.md         │      │     │
  │  │  │  ├─ memory_system.py seed → domain/memory/               │      │     │
  │  │  │  ├─ memory_system.py recall → memory context per agent   │      │     │
  │  │  │  ├─ Create workspace/agent0/, workspace/agent1/ ...      │      │     │
  │  │  │  │  (each seeded from best/train.py or best/config.yaml) │      │     │
  │  │  │  └─ Rotate old logs: agent0.jsonl → agent0_s1.jsonl      │      │     │
  │  │  │                                                           │      │     │
  │  │  │  Spawns (in screen sessions, 15s apart):                  │      │     │
  │  │  │  ├─ rrma-worker0  (claude agent)                         │      │     │
  │  │  │  ├─ rrma-worker1  (claude agent)                         │      │     │
  │  │  │  ├─ rrma-workerN  (claude agent)                         │      │     │
  │  │  │  └─ rrma-meta     (meta-loop.sh)                         │      │     │
  │  │  └──────────────────────────────────────────────────────────┘      │     │
  │  │                                                                     │     │
  │  │  STEP 2: Monitor Loop (every N minutes) ────────────────────┐      │     │
  │  │  │  ├─ Check: are workers still alive?                       │      │     │
  │  │  │  ├─ refresh_context.py → update stoplight + recent_exp   │      │     │
  │  │  │  ├─ diagnose.py ─────────────────────────────────┐       │      │     │
  │  │  │  │    └→ trustloop_scorer.score_domain()          │       │      │     │
  │  │  │  │    └→ Compute PQ (0-30 scale)                  │       │      │     │
  │  │  │  │    └→ Emit: decision + .nudge_data.json        │       │      │     │
  │  │  │  └────────────────────────────────────────────────┘       │      │     │
  │  │  │                                                           │      │     │
  │  │  │  Decision routing:                                        │      │     │
  │  │  │  ├─ CONTINUE    → do nothing, keep monitoring             │      │     │
  │  │  │  ├─ TOO_EARLY   → do nothing (< 8 experiments)           │      │     │
  │  │  │  ├─ NUDGE       → Claude writes observation → blackboard │      │     │
  │  │  │  │                 + constraints → program.md             │      │     │
  │  │  │  │                 (max 3 nudges, then escalate)          │      │     │
  │  │  │  ├─ STOP_HACKING→ Claude rewrites program.md             │      │     │
  │  │  │  │                 (force papers, ablations, explanations)│      │     │
  │  │  │  ├─ REDESIGN    → Claude diagnoses scaffold block         │      │     │
  │  │  │  │                 → minimal fix to program.md            │      │     │
  │  │  │  └─ STOP_DONE   → re-evaluate for unexplored dirs        │      │     │
  │  │  │                    if found → downgrade to NUDGE          │      │     │
  │  │  │                    else → final meta-blackboard + taste   │      │     │
  │  │  └──────────────────────────────────────────────────────────┘      │     │
  │  │                                                                     │     │
  │  │  STEP 3: stop-agents.sh (kill all screen sessions)                  │     │
  │  └─────────────────────────────────────────────────────────────────────┘     │
  │                                                                             │
  │  Loop back to STEP 1 for next generation (with updated program.md)          │
  └─────────────────────────────────────────────────────────────────────────────┘
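
The decision routing in STEP 2 can be sketched as a small dispatcher. This is a minimal illustration, not the code in outer-loop.sh: the action labels, the `GardenerState` class, and the `unexplored_found` flag are hypothetical names, and what "escalate" does after the third nudge is not specified in the diagram.

```python
from dataclasses import dataclass

MAX_NUDGES = 3  # from the diagram: "max 3 nudges, then escalate"


@dataclass
class GardenerState:
    """Hypothetical per-run state: how many nudges have been spent."""
    nudges_used: int = 0


def route(decision: str, state: GardenerState, unexplored_found: bool = False) -> str:
    """Map a diagnose.py decision string to an (illustrative) gardener action label."""
    if decision in ("CONTINUE", "TOO_EARLY"):
        return "keep_monitoring"
    if decision == "STOP_DONE" and unexplored_found:
        # The diagram downgrades STOP_DONE to NUDGE when unexplored directions remain.
        decision = "NUDGE"
    if decision == "NUDGE":
        if state.nudges_used >= MAX_NUDGES:
            return "escalate"  # assumption: the escalation target is left open here
        state.nudges_used += 1
        return "write_observation_and_constraints"
    if decision == "STOP_HACKING":
        return "rewrite_program_md"
    if decision == "REDESIGN":
        return "minimal_fix_to_program_md"
    if decision == "STOP_DONE":
        return "finalize_meta_blackboard_and_taste"
    raise ValueError(f"unknown decision: {decision}")
```

Keeping the nudge counter in explicit state makes the "three nudges, then escalate" rule testable without running any agents.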

  ---
  The Agents (Workers)

  ┌─────────────────────────────────────────────────────────────────┐
  │                  WORKER AGENT (1 of N)                           │
  │                  screen: rrma-workerN                            │
  │                  log: logs/agentN.jsonl                          │
  │                                                                  │
  │  READS (on startup, in order):                                   │
  │  ├─ program_static.md ← immutable rules (harness, scoring,      │
  │  │                       lifecycle). Read ONCE.                  │
  │  ├─ program.md ← dynamic guidance (constraints, regime,          │
  │  │               closed brackets). Gardener rewrites this.       │
  │  ├─ stoplight.md ← 30-line compressed run state                 │
  │  │                  (replaces 600+ line blackboard reads)        │
  │  ├─ recent_experiments.md ← last 5 experiments, structured      │
  │  ├─ best/train.py (or config.yaml) ← current best config       │
  │  ├─ meta-blackboard.md ← meta-agent reflections (if exists)     │
  │  ├─ calibration.md ← literature baseline (if exists)            │
  │  └─ [memory context] ← from memory_system.py recall             │
  │                                                                  │
  │  EXPERIMENT LOOP (repeats until max_turns):                      │
  │  │                                                               │
  │  │  1. Think: read stoplight → identify gap or hypothesis        │
  │  │  2. Edit: modify workspace/agentN/train.py (or config.yaml)  │
  │  │     ↑ ONLY edits own workspace copy — no contention           │
  │  │  3. Run: bash run.sh <name> "<description>" <design_type>     │
  │  │     └→ run.sh picks up workspace via $CLAUDE_AGENT_ID         │
  │  │     └→ GPU access serialized via flock                        │
  │  │  4. Record: append result to results.tsv                      │
  │  │     format: id  score  keep/discard  "description"           │
  │  │             agent  design  time                               │
  │  │  5. Reflect: append to shared telemetry files                 │
  │  │  6. Repeat                                                    │
  │  │                                                               │
  │  WRITES:                                                         │
  │  ├─ results.tsv ← append experiment result line                  │
  │  ├─ blackboard.md ← append findings/observations (shared)       │
  │  ├─ MISTAKES.md ← what failed + why + lesson                    │
  │  ├─ DESIRES.md ← tools/context/capabilities agents wish for     │
  │  ├─ LEARNINGS.md ← discovered facts about the domain            │
  │  └─ workspace/agentN/train.py ← edited config (ephemeral)       │
  └─────────────────────────────────────────────────────────────────┘
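
The record step (append one line to results.tsv) might look like the sketch below. The column order follows the format line in the box. The score precision, the ISO timestamp used for the `time` column, and the helper names `format_row` and `append_result` are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime, timezone


def format_row(exp_id, score, keep, description, agent, design, when=None):
    """Build one tab-separated results.tsv line (ends with a newline)."""
    when = when or datetime.now(timezone.utc)
    buf = io.StringIO()
    # csv.writer quotes the description only if it contains a tab, quote, or newline.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([
        exp_id,
        f"{score:.4f}",                      # assumption: four decimal places
        "keep" if keep else "discard",
        description,
        agent,
        design,
        when.isoformat(timespec="seconds"),  # assumption: "time" is a timestamp
    ])
    return buf.getvalue()


def append_result(path, row):
    """Append a preformatted row; the file is only ever opened in append mode."""
    with open(path, "a", encoding="utf-8", newline="") as fh:
        fh.write(row)
```

Formatting the row first and writing it with a single call keeps each experiment on exactly one line, which is what downstream parsers of an append-only file would expect.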

  ---
  The Meta-Agent

  ┌─────────────────────────────────────────────────────────────────┐
  │                  META-AGENT (meta-loop.sh)                       │
  │                  screen: rrma-meta                                │
  │                  Role: observe + reflect (NEVER directs agents)   │
  │                                                                  │
  │  Every N minutes:                                                │
  │  ├─ refresh_context.py → update stoplight + recent_experiments   │
  │  ├─ Read: stoplight.md, recent_experiments.md, best/config.yaml  │
  │  ├─ Read: previous meta-blackboard.md (if exists)               │
  │  └─ Claude (3 turns) → generate new meta-blackboard.md          │
  │                                                                  │
  │  meta-blackboard.md contains (~120 lines max):                   │
  │  ├─ Current best + config                                        │
  │  ├─ What works (ranked by impact)                                │
  │  ├─ Dead ends (grouped by category)                              │
  │  ├─ Patterns noticed (process-level)                             │
  │  ├─ Blind spots (never-tried approaches)                         │
  │  ├─ Stepping stones (non-winning but promising)                  │
  │  ├─ Surprises (expected vs actual)                               │
  │  ├─ Devil's advocate (why best score might be misleading)        │
  │  └─ Self-reflection (compare to prior cycle)                     │
  │                                                                  │
  │  Written atomically (.tmp + mv)                                  │
  └─────────────────────────────────────────────────────────────────┘
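
The ".tmp + mv" pattern used for meta-blackboard.md has a direct Python equivalent. The sketch below is not taken from meta-loop.sh (which is a shell script); `write_atomically` is a hypothetical helper that shows the same idea with `os.replace`.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(path, text):
    """Write text to a temp file in the same directory, then rename it over path."""
    path = Path(path)
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old file or the new one, never a partial
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

This matters here because worker agents may read meta-blackboard.md at any moment while the meta-agent is regenerating it.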

  ---
  TrustLoop (Behavioral IDS)

  ┌─────────────────────────────────────────────────────────────────┐
  │                  trustloop_scorer.py                              │
  │                  (Central Nervous System)                         │
  │                                                                  │
  │  INPUT: results.tsv, blackboard.md, MISTAKES.md, DESIRES.md,    │
  │         LEARNINGS.md, traces (.jsonl)                            │
  │                                                                  │
  │  PRODUCES DomainReport:                                          │
  │  ├─ Experiment Classification                                    │
  │  │   BREAKTHROUGH │ INCREMENTAL │ PLATEAU │ REGRESSION │ CRASH  │
  │  │                                                               │
  │  ├─ Novelty Score (0-1 per experiment)                           │
  │  │   70% description similarity + 30% design label match        │
  │  │                                                               │
  │  ├─ Agent Efficiency                                             │
  │  │   success rate, waste ratio, best contribution per agent      │
  │  │                                                               │
  │  ├─ Redundancy Detection                                         │
  │  │   near-duplicate configs flagged                              │
  │  │                                                               │
  │  ├─ Anomaly Detection                                            │
  │  │   crash streaks (3+), deep stagnation (30+ no breakthrough),  │
  │  │   score jumps, resource waste                                 │
  │  │                                                               │
  │  ├─ Workflow Checks (14 checks)                                  │
  │  │   agent diversity, blackboard usage, format validation        │
  │  │                                                               │
  │  ├─ Insight Extraction                                           │
  │  │   winning strategies, dead ends, recurring mistakes,          │
  │  │   unaddressed desires                                         │
  │  │                                                               │
  │  ├─ Telemetry Parsing                                            │
  │  │   structured MISTAKES, DESIRES, LEARNINGS content             │
  │  │                                                               │
  │  └─ Action Items                                                 │
  │      owner: hitl|gardener, layer: harness|program|agent|scaffold │
  │                                                                  │
  │  CONSUMED BY:                                                    │
  │  ├─ diagnose.py → PQ score + decision logic                     │
  │  ├─ refresh_context.py → stoplight + recent_experiments          │
  │  └─ trustloop_mcp.py → Claude Code inspection tools              │
  └─────────────────────────────────────────────────────────────────┘
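
The 70/30 weighting in the novelty score can be illustrated as follows. Only the two weights come from the diagram. Using `difflib.SequenceMatcher` for description similarity, and defining novelty as one minus the highest similarity to any earlier experiment, are assumptions; trustloop_scorer.py may compute it differently.

```python
from difflib import SequenceMatcher

DESC_WEIGHT = 0.7    # from the diagram: 70% description similarity
DESIGN_WEIGHT = 0.3  # from the diagram: 30% design label match


def similarity(a, b):
    """Weighted similarity of two experiments given as dicts with 'description' and 'design'."""
    desc_sim = SequenceMatcher(None, a["description"].lower(), b["description"].lower()).ratio()
    design_match = 1.0 if a["design"] == b["design"] else 0.0
    return DESC_WEIGHT * desc_sim + DESIGN_WEIGHT * design_match


def novelty(candidate, history):
    """Return a 0-1 novelty score: 1.0 means unlike anything tried before."""
    if not history:
        return 1.0
    closest = max(similarity(candidate, past) for past in history)
    return max(0.0, 1.0 - closest)
```

Under these assumptions, an experiment that repeats an earlier description under a different design label still scores only about 0.3, so relabeling alone does not make a run look new.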

  ---
  Process Quality & Decision Matrix

  ┌─────────────────────────────────────────────────────────────────┐
  │                  diagnose.py                                     │
  │                                                                  │
  │  Process Quality (PQ) 0-30:                                      │
  │  ├─ Papers cited?          +3 (>3: +3 more)                     │
  │  ├─ Explanatory reasoning? +3 (>10: +3 more)                    │
  │  ├─ Ablations?             +3 (>3: +3 more)                     │
  │  ├─ Simplifications?       +3                                    │
  │  ├─ Design diversity?      +3 (>5 unique)                        │
  │  ├─ Blackboard usage?      +3 (>100 lines)                      │
  │  ├─ Desires written?       +3                                    │
  │  └─ Learnings written?     +3 (>5)                               │
  │                                                                  │
  │  Decision Matrix:                                                │
  │  ┌──────────────┬──────────────────────────────────────────┐     │
  │  │ < 8 exps     │ TOO_EARLY                                │     │
  │  │ PQ<10, >15   │ STOP_HACKING (rewrite program.md)       │     │
  │  │ crash streak │ NUDGE (fix harness/config)               │     │
  │  │ stagnation   │ NUDGE (inject observation)               │     │
  │  │ flat+PQ≥10   │                                          │     │
  │  │  +blind spots│ REDESIGN (change scaffold)               │     │
  │  │  -blind spots│ STOP_DONE (search exhausted)             │     │
  │  │ otherwise    │ CONTINUE                                 │     │
  │  └──────────────┴──────────────────────────────────────────┘     │
  │                                                                  │
  │  Output: decision string + .nudge_data.json                      │
  │  (gardener_fixes, dead_ends, tool_issues, dominant_axis)         │
  └─────────────────────────────────────────────────────────────────┘
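
The PQ tally and the decision matrix above can be sketched as two small functions. Several points are assumptions rather than confirmed behavior of diagnose.py: the `Signals` field names are hypothetical, "PQ<10, >15" is read as "PQ below 10 with more than 15 experiments", the rows are evaluated top to bottom, and the total is clamped to 30 because the listed increments add up to 33.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical counts and flags handed to the decision logic."""
    experiments: int = 0
    papers_cited: int = 0
    explanations: int = 0
    ablations: int = 0
    simplifications: int = 0
    unique_designs: int = 0
    blackboard_lines: int = 0
    desires: int = 0
    learnings: int = 0
    crash_streak: bool = False
    stagnation: bool = False
    flat: bool = False
    blind_spots: bool = False


def process_quality(s: Signals) -> int:
    """Tally PQ from the listed criteria, +3 each, with three two-tier items."""
    pq = 0
    # Two-tier items: +3 for any, +3 more above the stated threshold.
    for count, bonus_over in ((s.papers_cited, 3), (s.explanations, 10), (s.ablations, 3)):
        if count > 0:
            pq += 3
        if count > bonus_over:
            pq += 3
    pq += 3 if s.simplifications > 0 else 0
    pq += 3 if s.unique_designs > 5 else 0
    pq += 3 if s.blackboard_lines > 100 else 0
    pq += 3 if s.desires > 0 else 0
    pq += 3 if s.learnings > 5 else 0
    return min(pq, 30)  # assumption: clamp to the stated 0-30 scale


def decide(s: Signals) -> str:
    """Walk the decision matrix rows in order and return the first match."""
    pq = process_quality(s)
    if s.experiments < 8:
        return "TOO_EARLY"
    if pq < 10 and s.experiments > 15:
        return "STOP_HACKING"
    if s.crash_streak or s.stagnation:
        return "NUDGE"
    if s.flat and pq >= 10:
        return "REDESIGN" if s.blind_spots else "STOP_DONE"
    return "CONTINUE"
```

For example, `decide(Signals(experiments=20))` returns `"STOP_HACKING"`: twenty experiments with no papers, ablations, or written reasoning is exactly the pattern the gardener is meant to interrupt.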

  ---
  Domain File Layout

  domains/<domain>/
  ├── config.yaml              ← domain configuration
  ├── run.sh                   ← harness: takes config → outputs score
  ├── solve.py                 ← (some domains) the code agents edit
  │
  ├── program_static.md        ← IMMUTABLE rules (read once by agents)
  ├── program.md               ← DYNAMIC guidance (gardener rewrites)
  │
  ├── blackboard.md            ← shared append-only state (agents write)
  ├── stoplight.md             ← AUTO-GENERATED 30-line compressed state
  ├── recent_experiments.md    ← AUTO-GENERATED last 5 experiments
  ├── meta-blackboard.md       ← meta-agent reflections
  ├── calibration.md           ← literature search (gen 1)
  │
  ├── results.tsv              ← all experiment results (append-only)
  │   format: id  score  keep/discard  "desc"  agent  design  time
  │
  ├── best/                    ← current best configuration
  │   ├── train.py (or config.yaml)
  │   └── config_hash
  │
  ├── workspace/               ← EPHEMERAL, gitignored
  │   ├── agent0/train.py      ← agent 0's isolated copy
  │   ├── agent1/train.py      ← agent 1's isolated copy
  │   └── agentN/train.py
  │
  ├── logs/
  │   ├── agent0.jsonl          ← full Claude conversation trace
  │   ├── agent1.jsonl
  │   └── agentN.jsonl
  │
  ├── memory/                   ← persistent domain memory (v4.7+)
  │
  ├── DESIRES.md               ← agent telemetry: what they wish for
  ├── MISTAKES.md              ← agent telemetry: structured failures
  ├── LEARNINGS.md             ← agent telemetry: discovered facts
  │
  └── .nudge_data.json         ← diagnose.py output for gardener
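
Given this layout, the pre-flight steps that seed each workspace from best/ and rotate old logs could be sketched like this. The real logic lives in launch-agents.sh; `seed_workspaces` and `rotate_log` are hypothetical helpers, and continuing the numbering past `_s1` (to `_s2`, `_s3`, ...) is an assumption, since the diagram only shows the first rotation.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def seed_workspaces(domain: Path, num_agents: int, filename: str = "train.py") -> list[Path]:
    """Copy best/<filename> into workspace/agent0..agentN-1 and return the copies."""
    best = domain / "best" / filename
    seeded = []
    for i in range(num_agents):
        dest = domain / "workspace" / f"agent{i}" / filename
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(best, dest)  # each agent gets an isolated copy to edit
        seeded.append(dest)
    return seeded


def rotate_log(log: Path) -> Path | None:
    """Rename agentN.jsonl to the first free agentN_s<k>.jsonl; None if no log exists."""
    if not log.exists():
        return None
    n = 1
    while (target := log.with_name(f"{log.stem}_s{n}{log.suffix}")).exists():
        n += 1
    log.rename(target)
    return target


if __name__ == "__main__":
    # Tiny demo in a throwaway directory standing in for domains/<domain>/.
    with tempfile.TemporaryDirectory() as tmp:
        domain = Path(tmp)
        (domain / "best").mkdir()
        (domain / "best" / "train.py").write_text("lr = 0.1\n")
        (domain / "logs").mkdir()
        (domain / "logs" / "agent0.jsonl").write_text("{}\n")
        print([str(p.relative_to(domain)) for p in seed_workspaces(domain, 2)])
        print(rotate_log(domain / "logs" / "agent0.jsonl").name)
```

Seeding every agent from the same best/ file is what lets workers edit freely without contention: nothing under workspace/ is shared.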

  ---
  Memory System (v4.7+)

  ┌─────────────────────────────────────────────────────────────────┐
  │                  memory_system.py                                 │
  │                                                                  │
  │  Commands:                                                       │
  │  ├─ seed <domain>     → create domain/memory/ if missing         │
  │  ├─ scan <dir>        → parse frontmatter + mtime → manifest     │
  │  ├─ retrieve <dir> <q>→ Haiku picks top-5 relevant files         │
  │  ├─ recall <dir> <q>  → scan → retrieve → verify → load         │
  │  └─ staleness <dir>   → age report                               │
  │                                                                  │
  │  Staleness levels:                                               │
  │  ├─ Fresh   ≤1 day                                               │
  │  ├─ Recent  1-7 days                                             │
  │  ├─ Aging   7-30 days  (wrapped with ⚠️ verify warning)          │
  │  └─ Stale   >30 days   (wrapped with ⚠️ verify warning)          │
  │                                                                  │
  │  Memory file format:                                             │
  │  ---                                                             │
  │  name: finding_name                                              │
  │  type: user|feedback|project|reference                           │
  │  verify_against: results.tsv|blackboard.md                       │
  │  claim: "the specific claim to verify"                           │
  │  ---                                                             │
  │  Content (max 30 lines)                                          │
  └─────────────────────────────────────────────────────────────────┘
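
The staleness levels and the memory file format can be illustrated with a short sketch. The treatment of boundary values (exactly 7 days counts as Recent, exactly 30 as Aging) and the hand-rolled frontmatter parser are assumptions; memory_system.py may draw the lines differently or use a YAML library.

```python
from datetime import timedelta


def staleness(age: timedelta) -> str:
    """Classify a memory file by age using the four levels from the diagram."""
    days = age.total_seconds() / 86400
    if days <= 1:
        return "fresh"
    if days <= 7:       # assumption: upper bounds are inclusive
        return "recent"
    if days <= 30:
        return "aging"
    return "stale"


def needs_verify_warning(level: str) -> bool:
    """Aging and Stale memories are wrapped with a verify warning."""
    return level in ("aging", "stale")


def parse_frontmatter(text: str):
    """Split a memory file into (metadata dict, body); simple 'key: value' lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter: the rest is content
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return {}, text  # no closing delimiter: treat the whole file as body
```

The `verify_against` and `claim` fields are what make the warning actionable: an agent loading an aging memory knows which file to check and which statement to re-test.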

  ---
  MCP Servers (optional, for Claude Code inspection)

  ┌──────────────────────────┐     ┌──────────────────────────────┐
  │  rrma_mcp.py (read-only) │     │  trustloop_mcp.py (traces)   │
  │                          │     │                              │
  │  Tools:                  │     │  Tools:                      │
  │  ├─ list_domains         │     │  ├─ trustloop_status         │
  │  ├─ domain_summary       │     │  ├─ trustloop_agent          │
  │  ├─ read_artifact        │     │  │   (summary/thinking/      │
  │  ├─ query_results        │     │  │    timeline modes)         │
  │  └─ check_status         │     │  ├─ trustloop_influence      │
  │                          │     │  └─ trustloop_compare         │
  └──────────────────────────┘     └──────────────────────────────┘

  ---
  v2 Legacy Components (still present)

  core/launch.sh    ← v2 launcher (git worktrees, 3 agent designs)
  core/operator.sh  ← v2 HITL controls:
                       claim, request, direct, queue, ban, fact,
                       hunch, strategy, pause, resume, repurpose

  ---
  End-to-End Data Flow (one experiment)

  Agent reads stoplight.md → forms hypothesis
    → edits workspace/agentN/train.py
    → bash run.sh exp-name "description" design_type
      → run.sh copies workspace config, runs training (flock for GPU)
      → outputs score to stdout
    → agent appends to results.tsv
    → agent appends to blackboard.md (finding)
    → agent appends to MISTAKES.md / LEARNINGS.md / DESIRES.md
    → [N minutes later] refresh_context.py regenerates stoplight.md
    → [N minutes later] meta-loop reads → updates meta-blackboard.md
    → [N minutes later] diagnose.py → trustloop_scorer → decision
    → outer-loop acts on decision (CONTINUE/NUDGE/REDESIGN/STOP)
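
The "flock for GPU" step in this flow serializes training runs across agents. run.sh presumably does this with the flock shell utility; the sketch below shows the same mechanism from Python using `fcntl.flock` (Unix only). The `gpu_lock` helper and the lock file path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def gpu_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


# Usage (hypothetical path): only one agent at a time gets past this line.
# with gpu_lock("/tmp/rrma-gpu.lock"):
#     run_training()
```

Because the lock is advisory, it only works if every process that touches the GPU goes through the same lock file, which is why the harness rather than the individual agent takes it.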

  ---
  taste.md — The Gardener's 11 Principles

  1. Less protocol = better science
  2. Config-tuning ≠ research (high score + low PQ = hacking)
  3. Simplification = maturity
  4. Plateau = mapping the basin, not failure
  5. Re-evaluate old failures post-breakthrough
  6. Plan on stagnation, not round count
  7. Watch axis lock-in (all agents same dimension)
  8. Confirmation across agents = confidence
  9. Low PQ + rising score = STOP_HACKING
  10. High PQ + flat + no blind spots = STOP_DONE
  11. High PQ + flat + blind spots = REDESIGN

  Updated automatically after each generation with new lessons learned.

	---
	RRMA v4.7 — Complete System Diagram

	┌─────────────────────────────────────────────────────────────────────────────┐
	│ HUMAN (you) │
	│ bash v4/outer-loop.sh domains/<domain> [max_gens] [num_agents] [turns] [min]│
	└─────────────────────────┬───────────────────────────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────────────────────────────────┐
	│ outer-loop.sh (THE GARDENER) │
	│ │
	│ Reads: taste.md (inherited principles from prior runs) │
	│ Writes: outer-loop.log, meta-blackboard.md, taste.md (appends lessons) │
	│ Backs up: blackboard.md.genN each generation │
	│ │
	│ ┌─ Gen 1 only ──────────────────────────────────────┐ │
	│ │ calibrate.sh │ │
	│ │ └→ Claude + WebSearch → calibration.md │ │
	│ │ (SOTA, papers, known techniques, baselines) │ │
	│ └────────────────────────────────────────────────────┘ │
	│ │
	│ ┌─ Per Generation ───────────────────────────────────────────────────┐ │
	│ │ │ │
	│ │ STEP 1: launch-agents.sh ──────────────────────────────────┐ │ │
	│ │ │ │ │ │
	│ │ │ Pre-flight: │ │ │
	│ │ │ ├─ refresh_context.py → stoplight.md + recent_experiments.md │ │
	│ │ │ ├─ memory_system.py seed → domain/memory/ │ │ │
	│ │ │ ├─ memory_system.py recall → memory context per agent │ │ │
	│ │ │ ├─ Create workspace/agent0/, workspace/agent1/ ... │ │ │
	│ │ │ │ (each seeded from best/train.py or best/config.yaml) │ │ │
	│ │ │ └─ Rotate old logs: agent0.jsonl → agent0_s1.jsonl │ │ │
	│ │ │ │ │ │
	│ │ │ Spawns (in screen sessions, 15s apart): │ │ │
	│ │ │ ├─ rrma-worker0 (claude agent) │ │ │
	│ │ │ ├─ rrma-worker1 (claude agent) │ │ │
	│ │ │ ├─ rrma-workerN (claude agent) │ │ │
	│ │ │ └─ rrma-meta (meta-loop.sh) │ │ │
	│ │ └──────────────────────────────────────────────────────────┘ │ │
	│ │ │ │
	│ │ STEP 2: Monitor Loop (every N minutes) ────────────────────┐ │ │
	│ │ │ ├─ Check: are workers still alive? │ │ │
	│ │ │ ├─ refresh_context.py → update stoplight + recent_exp │ │ │
	│ │ │ ├─ diagnose.py ─────────────────────────────────┐ │ │ │
	│ │ │ │ └→ trustloop_scorer.score_domain() │ │ │ │
	│ │ │ │ └→ Compute PQ (0-30 scale) │ │ │ │
	│ │ │ │ └→ Emit: decision + .nudge_data.json │ │ │ │
	│ │ │ └────────────────────────────────────────────────┘ │ │ │
	│ │ │ │ │ │
	│ │ │ Decision routing: │ │ │
	│ │ │ ├─ CONTINUE → do nothing, keep monitoring │ │ │
	│ │ │ ├─ TOO_EARLY → do nothing (< 8 experiments) │ │ │
	│ │ │ ├─ NUDGE → Claude writes observation → blackboard │ │ │
	│ │ │ │ + constraints → program.md │ │ │
	│ │ │ │ (max 3 nudges, then escalate) │ │ │
	│ │ │ ├─ STOP_HACKING→ Claude rewrites program.md │ │ │
	│ │ │ │ (force papers, ablations, explanations)│ │ │
	│ │ │ ├─ REDESIGN → Claude diagnoses scaffold block │ │ │
	│ │ │ │ → minimal fix to program.md │ │ │
	│ │ │ └─ STOP_DONE → re-evaluate for unexplored dirs │ │ │
	│ │ │ if found → downgrade to NUDGE │ │ │
	│ │ │ else → final meta-blackboard + taste │ │ │
	│ │ └──────────────────────────────────────────────────────────┘ │ │
	│ │ │ │
	│ │ STEP 3: stop-agents.sh (kill all screen sessions) │ │
	│ └─────────────────────────────────────────────────────────────────────┘ │
	│ │
	│ Loop back to STEP 1 for next generation (with updated program.md) │
	└─────────────────────────────────────────────────────────────────────────────┘

	---
	The Agents (Workers)

	┌─────────────────────────────────────────────────────────────────┐
	│ WORKER AGENT (1 of N) │
	│ screen: rrma-workerN │
	│ log: logs/agentN.jsonl │
	│ │
	│ READS (on startup, in order): │
	│ ├─ program_static.md ← immutable rules (harness, scoring, │
	│ │ lifecycle). Read ONCE. │
	│ ├─ program.md ← dynamic guidance (constraints, regime, │
	│ │ closed brackets). Gardener rewrites this. │
	│ ├─ stoplight.md ← 30-line compressed run state │
	│ │ (replaces 600+ line blackboard reads) │
	│ ├─ recent_experiments.md ← last 5 experiments, structured │
	│ ├─ best/train.py (or config.yaml) ← current best config │
	│ ├─ meta-blackboard.md ← meta-agent reflections (if exists) │
	│ ├─ calibration.md ← literature baseline (if exists) │
	│ └─ [memory context] ← from memory_system.py recall │
	│ │
	│ EXPERIMENT LOOP (repeats until max_turns): │
	│ │ │
	│ │ 1. Think: read stoplight → identify gap or hypothesis │
	│ │ 2. Edit: modify workspace/agentN/train.py (or config.yaml) │
	│ │ ↑ ONLY edits own workspace copy — no contention │
	│ │ 3. Run: bash run.sh <name> "<description>" <design_type> │
	│ │ └→ run.sh picks up workspace via $CLAUDE_AGENT_ID │
	│ │ └→ GPU access serialized via flock │
	│ │ 4. Record: append result to results.tsv │
	│ │ format: id score keep/discard "description" agent design time│
	│ │ 5. Reflect: append to shared telemetry files │
	│ │ 6. Repeat │
	│ │ │
	│ WRITES: │
	│ ├─ results.tsv ← append experiment result line │
	│ ├─ blackboard.md ← append findings/observations (shared) │
	│ ├─ MISTAKES.md ← what failed + why + lesson │
	│ ├─ DESIRES.md ← tools/context/capabilities agents wish for │
	│ ├─ LEARNINGS.md ← discovered facts about the domain │
	│ └─ workspace/agentN/train.py ← edited config (ephemeral) │
	└─────────────────────────────────────────────────────────────────┘

	---
	The Meta-Agent

	┌─────────────────────────────────────────────────────────────────┐
	│ META-AGENT (meta-loop.sh) │
	│ screen: rrma-meta │
	│ Role: observe + reflect (NEVER directs agents) │
	│ │
	│ Every N minutes: │
	│ ├─ refresh_context.py → update stoplight + recent_experiments │
	│ ├─ Read: stoplight.md, recent_experiments.md, best/config.yaml │
	│ ├─ Read: previous meta-blackboard.md (if exists) │
	│ └─ Claude (3 turns) → generate new meta-blackboard.md │
	│ │
	│ meta-blackboard.md contains (~120 lines max): │
	│ ├─ Current best + config │
	│ ├─ What works (ranked by impact) │
	│ ├─ Dead ends (grouped by category) │
	│ ├─ Patterns noticed (process-level) │
	│ ├─ Blind spots (never-tried approaches) │
	│ ├─ Stepping stones (non-winning but promising) │
	│ ├─ Surprises (expected vs actual) │
	│ ├─ Devil's advocate (why best score might be misleading) │
	│ └─ Self-reflection (compare to prior cycle) │
	│ │
	│ Written atomically (.tmp + mv) │
	└─────────────────────────────────────────────────────────────────┘

	---
	TrustLoop (Behavioral IDS)

	┌─────────────────────────────────────────────────────────────────┐
	│ trustloop_scorer.py │
	│ (Central Nervous System) │
	│ │
	│ INPUT: results.tsv, blackboard.md, MISTAKES.md, DESIRES.md, │
	│ LEARNINGS.md, traces (.jsonl) │
	│ │
	│ PRODUCES DomainReport: │
	│ ├─ Experiment Classification │
	│ │ BREAKTHROUGH │ INCREMENTAL │ PLATEAU │ REGRESSION │ CRASH │
	│ │ │
	│ ├─ Novelty Score (0-1 per experiment) │
	│ │ 70% description similarity + 30% design label match │
	│ │ │
	│ ├─ Agent Efficiency │
	│ │ success rate, waste ratio, best contribution per agent │
	│ │ │
	│ ├─ Redundancy Detection │
	│ │ near-duplicate configs flagged │
	│ │ │
	│ ├─ Anomaly Detection │
	│ │ crash streaks (3+), deep stagnation (30+ no breakthrough), │
	│ │ score jumps, resource waste │
	│ │ │
	│ ├─ Workflow Checks (14 checks) │
	│ │ agent diversity, blackboard usage, format validation │
	│ │ │
	│ ├─ Insight Extraction │
	│ │ winning strategies, dead ends, recurring mistakes, │
	│ │ unaddressed desires │
	│ │ │
	│ ├─ Telemetry Parsing │
	│ │ structured MISTAKES, DESIRES, LEARNINGS content │
	│ │ │
	│ └─ Action Items │
	│ owner: hitl\|gardener, layer: harness\|program\|agent\|scaffold │
	│ │
	│ CONSUMED BY: │
	│ ├─ diagnose.py → PQ score + decision logic │
	│ ├─ refresh_context.py → stoplight + recent_experiments │
	│ └─ trustloop_mcp.py → Claude Code inspection tools │
	└─────────────────────────────────────────────────────────────────┘

	---
	Process Quality & Decision Matrix

	┌─────────────────────────────────────────────────────────────────┐
	│ diagnose.py │
	│ │
	│ Process Quality (PQ) 0-30: │
	│ ├─ Papers cited? +3 (>3: +3 more) │
	│ ├─ Explanatory reasoning? +3 (>10: +3 more) │
	│ ├─ Ablations? +3 (>3: +3 more) │
	│ ├─ Simplifications? +3 │
	│ ├─ Design diversity? +3 (>5 unique) │
	│ ├─ Blackboard usage? +3 (>100 lines) │
	│ ├─ Desires written? +3 │
	│ └─ Learnings written? +3 (>5) │
	│ │
	│ Decision Matrix: │
	│ ┌──────────────┬──────────────────────────────────────────┐ │
	│ │ < 8 exps │ TOO_EARLY │ │
	│ │ PQ<10, >15 │ STOP_HACKING (rewrite program.md) │ │
	│ │ crash streak │ NUDGE (fix harness/config) │ │
	│ │ stagnation │ NUDGE (inject observation) │ │
	│ │ flat+PQ≥10 │ │ │
	│ │ +blind spots│ REDESIGN (change scaffold) │ │
	│ │ -blind spots│ STOP_DONE (search exhausted) │ │
	│ │ otherwise │ CONTINUE │ │
	│ └──────────────┴──────────────────────────────────────────┘ │
	│ │
	│ Output: decision string + .nudge_data.json │
	│ (gardener_fixes, dead_ends, tool_issues, dominant_axis) │
	└─────────────────────────────────────────────────────────────────┘

	---
	Domain File Layout

	domains/<domain>/
	├── config.yaml ← domain configuration
	├── run.sh ← harness: takes config → outputs score
	├── solve.py ← (some domains) the code agents edit
	│
	├── program_static.md ← IMMUTABLE rules (read once by agents)
	├── program.md ← DYNAMIC guidance (gardener rewrites)
	│
	├── blackboard.md ← shared append-only state (agents write)
	├── stoplight.md ← AUTO-GENERATED 30-line compressed state
	├── recent_experiments.md ← AUTO-GENERATED last 5 experiments
	├── meta-blackboard.md ← meta-agent reflections
	├── calibration.md ← literature search (gen 1)
	│
	├── results.tsv ← all experiment results (append-only)
	│ format: id score keep/discard "desc" agent design time
	│
	├── best/ ← current best configuration
	│ ├── train.py (or config.yaml)
	│ └── config_hash
	│
	├── workspace/ ← EPHEMERAL, gitignored
	│ ├── agent0/train.py ← agent 0's isolated copy
	│ ├── agent1/train.py ← agent 1's isolated copy
	│ └── agentN/train.py
	│
	├── logs/
	│ ├── agent0.jsonl ← full Claude conversation trace
	│ ├── agent1.jsonl
	│ └── agentN.jsonl
	│
	├── memory/ ← persistent domain memory (v4.7+)
	│
	├── DESIRES.md ← agent telemetry: what they wish for
	├── MISTAKES.md ← agent telemetry: structured failures
	├── LEARNINGS.md ← agent telemetry: discovered facts
	│
	└── .nudge_data.json ← diagnose.py output for gardener

	---
	Memory System (v4.7+)

	┌─────────────────────────────────────────────────────────────────┐
	│ memory_system.py │
	│ │
	│ Commands: │
	│ ├─ seed <domain> → create domain/memory/ if missing │
	│ ├─ scan <dir> → parse frontmatter + mtime → manifest │
	│ ├─ retrieve <dir> <q>→ Haiku picks top-5 relevant files │
	│ ├─ recall <dir> <q> → scan → retrieve → verify → load │
	│ └─ staleness <dir> → age report │
	│ │
	│ Staleness levels: │
	│ ├─ Fresh ≤1 day │
	│ ├─ Recent 1-7 days │
	│ ├─ Aging 7-30 days (wrapped with ⚠️ verify warning) │
	│ └─ Stale >30 days (wrapped with ⚠️ verify warning) │
	│ │
	│ Memory file format: │
	│ --- │
	│ name: finding_name │
	│ type: user\|feedback\|project\|reference │
	│ verify_against: results.tsv\|blackboard.md │
	│ claim: "the specific claim to verify" │
	│ --- │
	│ Content (max 30 lines) │
	└─────────────────────────────────────────────────────────────────┘

	---
	MCP Servers (optional, for Claude Code inspection)

	┌──────────────────────────┐ ┌──────────────────────────────┐
	│ rrma_mcp.py (read-only) │ │ trustloop_mcp.py (traces) │
	│ │ │ │
	│ Tools: │ │ Tools: │
	│ ├─ list_domains │ │ ├─ trustloop_status │
	│ ├─ domain_summary │ │ ├─ trustloop_agent │
	│ ├─ read_artifact │ │ │ (summary/thinking/ │
	│ ├─ query_results │ │ │ timeline modes) │
	│ └─ check_status │ │ ├─ trustloop_influence │
	│ │ │ └─ trustloop_compare │
	└──────────────────────────┘ └──────────────────────────────┘

	---
	v2 Legacy Components (still present)

	core/launch.sh ← v2 launcher (git worktrees, 3 agent designs)
	core/operator.sh ← v2 HITL controls:
	claim, request, direct, queue, ban, fact,
	hunch, strategy, pause, resume, repurpose

	---
	End-to-End Data Flow (one experiment)

	Agent reads stoplight.md → forms hypothesis
	→ edits workspace/agentN/train.py
	→ bash run.sh exp-name "description" design_type
	→ run.sh copies workspace config, runs training (flock for GPU)
	→ outputs score to stdout
	→ agent appends to results.tsv
	→ agent appends to blackboard.md (finding)
	→ agent appends to MISTAKES.md / LEARNINGS.md / DESIRES.md
	→ [N minutes later] refresh_context.py regenerates stoplight.md
	→ [N minutes later] meta-loop reads → updates meta-blackboard.md
	→ [N minutes later] diagnose.py → trustloop_scorer → decision
	→ outer-loop acts on decision (CONTINUE/NUDGE/REDESIGN/STOP)
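
The data flow notes that `run.sh` uses `flock` so that only one training run holds the GPU at a time. The same idea can be illustrated in Python with `fcntl.flock` (Unix only). This is a sketch of the locking pattern, not the contents of `run.sh`; `gpu_lock` and the lock path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def gpu_lock(lock_path: str):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until any other holder releases
        yield fd
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

An agent process would wrap its training call in `with gpu_lock("/tmp/gpu.lock"):` so that concurrent agents queue for the GPU instead of competing for memory.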

	---
	taste.md — The Gardener's 11 Principles

	1. Less protocol = better science
	2. Config-tuning ≠ research (high score + low PQ = hacking)
	3. Simplification = maturity
	4. Plateau = mapping the basin, not failure
	5. Re-evaluate old failures post-breakthrough
	6. Plan on stagnation, not round count
	7. Watch axis lock-in (all agents same dimension)
	8. Confirmation across agents = confidence
	9. Low PQ + rising score = STOP_HACKING
	10. High PQ + flat + no blind spots = STOP_DONE
	11. High PQ + flat + blind spots = REDESIGN

	Updated automatically after each generation with new lessons learned.

| Indicator | Points |
| --- | --- |
| Papers cited | +3 (>3 papers: +3 more) |
| Explanatory reasoning | +3 (>10 explanations: +3 more) |
| Ablations | +3 (>3 ablations: +3 more) |
| Simplifications | +3 |
| Design diversity | +3 (>5 unique designs) |
| Blackboard usage | +3 (>100 lines) |
| Desires written | +3 |
| Learnings written | +3 (>5 learnings) |
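
The indicator table above can be read as a simple additive score. The sketch below encodes one interpretation: for the three two-tier rows, any occurrence earns +3 and exceeding the threshold earns +3 more; for single-tier rows with a threshold, the +3 is earned only above it. That reading and the name `indicator_points` are assumptions, and the actual scorer may count differently.

```python
def indicator_points(papers=0, explanations=0, ablations=0, simplifications=0,
                     unique_designs=0, blackboard_lines=0, desires=0, learnings=0):
    """Sum indicator points under the interpretation described above."""
    score = 0

    # Two-tier indicators: +3 for any occurrence, +3 more above the threshold.
    for count, bonus_above in ((papers, 3), (explanations, 10), (ablations, 3)):
        if count > 0:
            score += 3
        if count > bonus_above:
            score += 3

    # Single-tier indicators: a flat +3 when the condition holds.
    if simplifications > 0:
        score += 3
    if unique_designs > 5:
        score += 3
    if blackboard_lines > 100:
        score += 3
    if desires > 0:
        score += 3
    if learnings > 5:
        score += 3
    return score
```

Under this reading the maximum is 33 points: 18 from the two-tier rows and 15 from the single-tier rows.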

| Condition | Decision | Action |
| --- | --- | --- |
| < 8 experiments | `TOO_EARLY` | Keep monitoring |
| PQ < 10, > 15 experiments | `STOP_HACKING` | Rewrite program.md (force papers, ablations, explanations) |
| Crash streak or scaffold desires | `NUDGE` | Inject observation + constraints |
| Stagnation without flatness | `NUDGE` | Inject observation + constraints |
| 3+ nudges without progress | escalate | → `REDESIGN` |
| Flat + PQ ≥ 10 + blind spots | `REDESIGN` | Diagnose scaffold block → minimal fix to program.md |
| Flat + PQ ≥ 10 + no blind spots | `STOP_DONE` | Re-evaluate; if unexplored dirs found → NUDGE; else finalize |
| Otherwise | `CONTINUE` | Keep monitoring |
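
The decision table above maps onto a short chain of checks. This sketch assumes the rows are evaluated top to bottom and that the "3+ nudges" escalation applies whenever a `NUDGE` would otherwise be issued. The function name, its parameters, and that ordering are assumptions, not the logic of `diagnose.py`.

```python
def decide(n_experiments, pq, flat, blind_spots, crash_streak=False,
           scaffold_desires=False, stagnating=False, nudges_without_progress=0):
    """Return one decision label from the table, checking rows in order."""
    if n_experiments < 8:
        return "TOO_EARLY"
    if pq < 10 and n_experiments > 15:
        return "STOP_HACKING"
    if crash_streak or scaffold_desires or (stagnating and not flat):
        # Repeated nudges that did not help escalate to a redesign.
        return "REDESIGN" if nudges_without_progress >= 3 else "NUDGE"
    if flat and pq >= 10:
        return "REDESIGN" if blind_spots else "STOP_DONE"
    return "CONTINUE"
```

For example, `decide(20, pq=14, flat=True, blind_spots=True)` falls through the early checks and returns `"REDESIGN"`.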

| Old (v4.5) | New (v4.6) | Lines |
| --- | --- | --- |
| `program.md` (monolithic 261 lines) | `program_static.md` (read once) + `program.md` (dynamic) | 98 + 95 |
| `blackboard.md` (627+ lines, re-read every cycle) | `stoplight.md` (30 lines, auto-refreshed) | 43 |
| grep results.tsv (growing) | `recent_experiments.md` (last 5, structured) | ~30 |

| Age | Level | Behavior |
| --- | --- | --- |
| ≤ 1 day | Fresh | Used as-is |
| 1–7 days | Recent | Used as-is |
| 7–30 days | Aging | Wrapped with verification warning |
| > 30 days | Stale | Wrapped with verification warning |
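
The staleness levels above reduce to a threshold lookup on file age. The table leaves the exact boundaries ambiguous (a 7-day-old file could be Recent or Aging), so this sketch assumes upper-inclusive bounds. The function names are hypothetical.

```python
import os
import time


def staleness_level(age_days: float) -> str:
    """Map an age in days to a level, treating each upper bound as inclusive."""
    if age_days <= 1:
        return "Fresh"
    if age_days <= 7:
        return "Recent"
    if age_days <= 30:
        return "Aging"
    return "Stale"


def needs_verify_warning(level: str) -> bool:
    """Aging and Stale memories are wrapped with a verification warning."""
    return level in ("Aging", "Stale")


def file_age_days(path: str, now=None) -> float:
    """Age of a file in days, based on its modification time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 86400
```

Combining the helpers, `staleness_level(file_age_days(path))` gives the level for a memory file on disk.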

| Tool | Purpose |
| --- | --- |
| `rrma_list_domains()` | List all domains |
| `rrma_domain_summary(domain)` | Quick overview (config, results count, best score) |
| `rrma_read_artifact(domain, type)` | Read: blackboard, program, results, experiments, desires, learnings, mistakes, calibration, config |
| `rrma_query_results(domain, filters)` | Grep results.tsv |
| `rrma_check_status(domain)` | Active screens, file mtimes, artifact freshness |

| Tool | Purpose |
| --- | --- |
| `trustloop_status()` | Overview: agent count, steps, thinking blocks, tool calls, experiments, best score |
| `trustloop_agent(id, mode)` | Per-agent: summary, thinking blocks, timeline |
| `trustloop_influence()` | Cross-agent influence analysis |
| `trustloop_compare(ids)` | Side-by-side agent comparison |

| # | Principle |
| --- | --- |
| 1 | Less protocol = better science. Plain blackboard beats structured CLAIM/RESPONSE. |
| 2 | Config-tuning ≠ research. High scores from parameter hacking → low PQ → stop. |
| 3 | Simplification = maturity. Dropping complexity + higher scores = understanding. |
| 4 | Plateau = mapping the basin, not failure. Long plateaus with high PQ mean agents are mapping the search space. |
| 5 | Re-evaluate old failures post-breakthrough. Context changes flip what works. |
| 6 | Plan on stagnation, not round count. Trigger replanning at < 0.5% improvement for 15+ experiments. |
| 7 | Watch axis lock-in. If all agents explore one dimension + flat scores, make others visible. |
| 8 | Confirmation is a feature. Multiple agents confirming the same thing = confidence. |
| 9 | Low PQ + rising score = hacking. Stop and force real research. |
| 10 | High PQ + flat + no blind spots = done. Search exhausted. |
| 11 | High PQ + flat + blind spots = redesign. Scaffold is blocking exploration. |
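
Principle 6 gives a concrete trigger: less than 0.5% improvement over 15 or more experiments. One way to check it, assuming higher scores are better and that improvement is measured as the relative gain of the best score in the last 15 experiments over the best score before them, is sketched below. `is_stagnating` and that definition of improvement are assumptions.

```python
def is_stagnating(scores, window=15, min_rel_gain=0.005):
    """True if the last `window` experiments improved the best score by < 0.5%.

    `scores` is the chronological list of experiment scores (higher is better).
    """
    if len(scores) <= window:
        return False  # not enough history to compare against
    best_before = max(scores[:-window])
    best_recent = max(scores[-window:])
    if best_before == 0:
        return best_recent <= 0  # avoid dividing by zero
    gain = (best_recent - best_before) / abs(best_before)
    return gain < min_rel_gain
```

Comparing against the best earlier score, and not the score immediately before the window, keeps a single noisy low result from masking a plateau.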

| File | Purpose | Format |
| --- | --- | --- |
| `DESIRES.md` | Tools, context, or capabilities agents wish they had | Free-form |
| `MISTAKES.md` | Experiments that failed | Structured: what / result / lesson |
| `LEARNINGS.md` | Discovered facts about the environment | Free-form |

| Machine | Specs | Role |
| --- | --- | --- |
| nigel | RTX 4070 Ti SUPER 16GB, Ubuntu 24.04, torch 2.10.0+cu128 | GPU experiment execution |
| Local Mac | M2 Pro 32GB | MCP servers, scoring, monitoring |