Ralph Loop: A Persistent Multi-Agent Build Orchestrator for Hermes

What It Is

Ralph Loop is a Hermes skill that turns any PRD (Product Requirements Document) into an autonomous, self-correcting build pipeline. It implements a "Ralph Wiggum Loop" -- the same high-level task is executed repeatedly against the current repository state until a strict verification condition is met (<RALPH_DONE>).

Key invariant: Progress lives 100% in files and git -- never in conversation context. Each iteration starts with fresh context, so context rot cannot accumulate across iterations, even on projects with 100+ phases.

Core Cycle

Read PRD.md + progress.json + git status
    ↓
Implement one atomic verifiable chunk (via subagent swarm)
    ↓
Run tests + linter + self-critique
    ↓
Git commit with descriptive message
    ↓
If verification passes and all phases done → <RALPH_DONE>
else → schedule next iteration
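
The cycle above can be sketched as a plain loop. This is a minimal illustration, not the orchestrator's actual code: `run_iteration`, `verify`, and `all_phases_done` are hypothetical callables standing in for the subagent swarm, the verification checks, and the PRD check, and the `"PAUSED"` return value is an assumed label.

```python
from typing import Callable

RALPH_DONE = "<RALPH_DONE>"


def ralph_loop(
    run_iteration: Callable[[int], None],
    verify: Callable[[], bool],
    all_phases_done: Callable[[], bool],
    max_iterations: int = 200,
) -> str:
    """Repeat the same task until verification passes and every phase is done.

    All three callables are hypothetical placeholders; the real orchestrator
    works from PRD.md, progress.json and git state instead.
    """
    for iteration in range(1, max_iterations + 1):
        run_iteration(iteration)  # implement one atomic, verifiable chunk
        if verify() and all_phases_done():
            return RALPH_DONE  # strict exit condition met
    return "PAUSED"  # iteration cap reached; a later run would continue
```

Because the loop keeps no state of its own, everything an iteration needs would have to come from the files and git history it reads at the start.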

Architecture

State files (external, not in context):

PRD.md -- Numbered phases with checkboxes: 1. [ ] Setup, 2. [x] Core, etc.
progress.json -- Iteration counter, completed phases, last commit SHA, auto-improve counter
ralph_log.md -- Full history of every iteration with timestamps and results

Config files (per-project overrides):

~/.hermes/skills/ralph_orchestrator/config/agents.yaml -- Default agent definitions
<project>/.ralph/agents.yaml -- Project-specific agent overrides (adds roles like "graphics")

Configurable Agent System

Each agent has:

description / when_to_use -- Routing logic for which agent handles what
personality -- Injected into subagent system prompt
toolsets -- Which tools the subagent can access (e.g., ["terminal", "file", "web"])
llm_endpoint -- Model override (default: deepseek/deepseek-v3.2)
temperature -- Sampling temperature (coder: 0.7, tester: 0.3, reviewer: 0.5)
enabled -- Toggle agents on/off per-project

Default agents:

coder -- Implements the current phase
tester -- Writes and runs tests
reviewer -- Self-critique for correctness, security, PRD compliance

Custom agents can be added per-project. For example, a game project might add a "graphics" agent for sprite/texture work.
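
One way the per-project overrides could be layered over the defaults is a per-agent shallow merge. The sketch below assumes that merge rule (the source does not state it) and uses plain dicts as stand-ins for the parsed agents.yaml files. The `merge_agents` and `enabled_agents` helpers and the toolset values are illustrative; only the temperatures come from the list above.

```python
from copy import deepcopy

# Stand-ins for the parsed default agents.yaml (toolsets are illustrative).
DEFAULT_AGENTS = {
    "coder": {"temperature": 0.7, "toolsets": ["terminal", "file"], "enabled": True},
    "tester": {"temperature": 0.3, "toolsets": ["terminal", "file"], "enabled": True},
    "reviewer": {"temperature": 0.5, "toolsets": ["file"], "enabled": True},
}


def merge_agents(defaults: dict, overrides: dict) -> dict:
    """Layer project overrides on top of defaults, one agent at a time.

    Assumed rule: fields in the override replace the same fields in the
    default; agents that exist only in the override are added as-is.
    """
    merged = deepcopy(defaults)
    for name, fields in overrides.items():
        merged.setdefault(name, {}).update(deepcopy(fields))
    return merged


def enabled_agents(agents: dict) -> list:
    """Sorted names of agents that are switched on (missing 'enabled' counts as on)."""
    return sorted(name for name, cfg in agents.items() if cfg.get("enabled", True))
```

For example, an override of `{"graphics": {"temperature": 0.9}, "reviewer": {"enabled": False}}` would add a graphics agent and switch the reviewer off while leaving its other settings untouched.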

Execution Modes

Ralph has three execution modes, auto-detected at runtime:

Mode	Detection	Behavior
Real	`delegate_task` available in `hermes_tools`	Spawns actual AI subagents via `delegate_task`
Simulation	In sandbox but `delegate_task` unavailable	Creates placeholder files, always fails verification
Standalone	Not in sandbox (direct Python)	Simulation mode

Mode detection (real_mode/phase_detector.py):

Check for the HERMES_RPC_SOCKET env var (set in the execute_code sandbox)
Try from hermes_tools import delegate_task
If both succeed → real mode; if only the env var is present → simulation; if the env var is absent → standalone
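
The detection steps can be sketched as follows. This is not the contents of phase_detector.py: the function name `detect_mode` and the returned strings are assumptions, and the mapping of failures to simulation and standalone is taken from the mode table above.

```python
import importlib
import os
from typing import Mapping, Optional


def detect_mode(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return 'real', 'simulation', or 'standalone' (labels are assumed).

    Step 1: the sandbox is recognised by the HERMES_RPC_SOCKET env var.
    Step 2: real mode additionally needs hermes_tools.delegate_task.
    """
    env = os.environ if environ is None else environ
    if "HERMES_RPC_SOCKET" not in env:
        return "standalone"  # direct Python, outside the sandbox
    try:
        tools = importlib.import_module("hermes_tools")
    except ImportError:
        return "simulation"  # in sandbox, module missing
    return "real" if hasattr(tools, "delegate_task") else "simulation"
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.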

For real subagents, run via Hermes CLI:

hermes -s ralph_orchestrator chat -q "run Ralph loop in /path/to/project"

The Python script alone (without Hermes) cannot spawn real subagents because delegate_task is not exposed in the execute_code sandbox by default.

Safety Systems

Sandbox modifications (code_execution_tool.py):

Added delegate_task to SANDBOX_ALLOWED_TOOLS (alongside terminal, read_file, etc.)
Added delegate_task stub to auto-generated hermes_tools.py in sandbox
Sandbox timeout: 900 seconds (15 min)
Max tool calls: 50 per script

Delegate task limits (delegate_tool.py):

MAX_CONCURRENT_CHILDREN = 10 (parallel subagents)
MAX_DEPTH = 2 (no recursive delegation: parent→child→grandchild blocked)
DEFAULT_MAX_ITERATIONS = 500 per subagent
Blocked tools for children: delegate_task, clarify, memory, send_message, execute_code, restart_chat

Safety manager (real_mode/safety_manager.py):

Tracks agent counts across nested delegations
Enforces 10 agents per script, 50 total across hierarchy
Singleton pattern for cross-invocation tracking
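
A minimal sketch of how such a counter could enforce the two limits is shown below. It is an illustration under assumptions, not safety_manager.py itself: the class layout, the `register` method, and the `AgentLimitExceeded` exception are hypothetical, and this version only shares state within a single Python process.

```python
class AgentLimitExceeded(RuntimeError):
    """Raised when spawning another agent would break a limit (hypothetical)."""


class SafetyManager:
    """Process-wide singleton that counts spawned agents."""

    _instance = None
    MAX_PER_SCRIPT = 10  # agents per script, from the limits above
    MAX_TOTAL = 50       # agents across the whole hierarchy

    def __new__(cls):
        # Create the shared instance once; later calls return the same object.
        if cls._instance is None:
            instance = super().__new__(cls)
            instance.total = 0
            instance.per_script = {}
            cls._instance = instance
        return cls._instance

    def register(self, script_id: str) -> None:
        """Count one new agent for script_id, or raise if a limit is hit."""
        used = self.per_script.get(script_id, 0)
        if used >= self.MAX_PER_SCRIPT:
            raise AgentLimitExceeded(f"{script_id}: per-script limit reached")
        if self.total >= self.MAX_TOTAL:
            raise AgentLimitExceeded("total agent limit reached")
        self.per_script[script_id] = used + 1
        self.total += 1
```

Checking both limits before incrementing either counter means a rejected request leaves the counts unchanged.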

Self-Improvement

Every 10 successful iterations or on any failure, Ralph appends a lesson to its own SKILL.md:

## Lesson learned (iteration 42)
- Time: 2026-03-15T...
- Verification: PASS/FAIL
- Issues: (if any)
- Summary: Auto-generated improvement note

In this way the orchestrator's own instructions evolve based on its execution experience.

Verification Checklist

Before accepting an iteration, ALL must pass:

Test suite runs without failures
Linter passes
All phases marked complete ([x])
Self-critique passes
Git working tree clean

If any check fails, the loop continues but does not commit.

Current Status

Working:

PRD parsing, progress tracking, git integration
Configurable agent system with per-project overrides
Phase detection and mode switching
Sandbox delegate_task integration (code_execution_tool patches applied)
Safety limits (concurrent agents, depth, iteration caps)
Self-improvement via SKILL.md lessons
Cron scheduling for continuous operation

Requires Hermes restart to activate:

Patches to code_execution_tool.py (adding delegate_task to SANDBOX_ALLOWED_TOOLS and hermes_tools stub generator)

Not yet tested end-to-end:

Full real-mode execution: PRD → real subagents → verification → commit → next phase

Files

~/.hermes/skills/ralph_orchestrator/
├── SKILL.md                          # Self-improving docs + lesson log
├── config/
│   └── agents.yaml                   # Default agent definitions
├── scripts/
│   └── ralph_orchestrator.py         # Main orchestrator (~1100 lines)
└── real_mode/
    ├── phase_detector.py             # Detect real/sim/standalone mode
    ├── safety_manager.py             # Agent count limits
    ├── real_subagent_swarm.py        # Bridges to delegate_task
    └── test_*.py                     # Integration tests

# Modified Hermes files (merged):
~/.hermes/hermes-agent/tools/
├── code_execution_tool.py            # Added delegate_task to SANDBOX_ALLOWED_TOOLS
└── delegate_tool.py                  # Subagent spawning (upstream)

Usage Examples

# Initialize a project
python3 ralph_orchestrator.py init "# My Project

## Phases
1. [ ] Setup project structure
2. [ ] Implement core algorithm
3. [ ] Write tests
4. [ ] Documentation"

# Run (simulation mode - for testing orchestration logic)
python3 ralph_orchestrator.py run

# Run (real mode - via Hermes CLI)
hermes -s ralph_orchestrator chat -q "run Ralph loop in /path/to/project"

# Schedule continuous operation
python3 ralph_orchestrator.py create-cron "every 10 minutes"

# Monitor
python3 ralph_orchestrator.py status
tail -f ralph_log.md

---
name: ralph_orchestrator
description: "Ralph Wiggum Loop orchestrator for long-run autonomous development. Infinite loop: PRD.md + progress.json -> subagent swarm -> git commit -> verify -> repeat until <RALPH_DONE>. Self-improves."
category: autonomous-ai-agents
---

Ralph Orchestrator

Persistent Ralph Loop for 100+ phase projects. External state only (no context rot).

What is the Ralph Wiggum Loop?

A self-referential infinite loop that executes the same high-level task against the current repository state until a strict verification condition is met. Progress lives 100% in files/git (never in conversation context). Each iteration starts with fresh context to prevent context rot.

Core cycle:

Read PRD.md + progress.json + git status + test results
    ↓
Implement one atomic verifiable chunk
    ↓
Run tests + linter + self-critique
    ↓
Git commit with descriptive message
    ↓
If verification passes and all phases done:
    Output "<RALPH_DONE>" and exit
else:
    Schedule next iteration (repeat)

Features

✅ Iteration state loaded fresh from files each cycle (PRD, progress, logs)
✅ Subagent swarm for parallel implementation/testing/review
✅ Trajectory compression to prevent context overflow
✅ Git integration with auto-commits
✅ Full verification: tests, linter, self-critique
✅ Max iteration cap (default 200) with pause/resume
✅ Auto-self-improvement: appends lessons every 10 successful iterations or on failure
✅ ralph_log.md with full iteration history
✅ Natural-language cron triggers

⚠️ IMPORTANT: Simulation Mode vs Real Execution

The Python script (ralph_orchestrator.py) runs in SIMULATION MODE by default because it cannot call delegate_task from the execute_code sandbox.

What Simulation Mode Does:

Creates placeholder files to test orchestration logic
Does NOT actually implement phases
Always fails verification to prevent false progress claims
Phases are never marked complete in simulation

For Real Implementation (with AI subagents):

# Run via Hermes CLI to enable delegate_task
hermes -s ralph_orchestrator chat -q "run Ralph loop in /path/to/project"

Why this separation exists:

delegate_task is not available in execute_code sandbox
Python script can only simulate orchestration logic
Real subagent delegation requires Hermes CLI execution

Quick Start

# 1. Initialize your project (creates PRD.md + progress.json)
cd your-project
python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py init "Your high-level PRD with numbered phases here"

# 2. Edit PRD.md to add proper phases with checkboxes:
#    1. [ ] Setup: Initialize project structure
#    2. [ ] Core: Implement main algorithm
#    3. [ ] Tests: Write unit tests

# 3. Run the Ralph loop (runs until done or max iterations)
python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py run

# 4. Monitor progress
python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py status

Command-Line Interface

ralph_orchestrator.py <command> [args] [options]

Commands:
  init "PRD content"    Create PRD.md and progress.json
  run                  Start the Ralph loop (repeats until done or --max-iterations is reached)
  status               Show current state and progress
  create-cron "schedule"  Print cron command for scheduled execution

Options:
  --repo PATH          Repository root (default: current directory)
  --max-iterations N   Maximum iterations before pause (default: 200)

Examples:
  python3 ralph_orchestrator.py init "# My Project\n\n## Phases\n1. [ ] Setup\n2. [ ] Implementation"
  python3 ralph_orchestrator.py run --max-iterations 500
  python3 ralph_orchestrator.py create-cron "every 10 minutes"

PRD.md Format

The PRD must contain numbered phases with checkboxes:

# Project PRD

## Phases

1. [ ] Setup: Initialize git repository, install dependencies, create structure
2. [ ] Core: Implement the main algorithm in src/main.py
3. [ ] API: Build REST endpoints in src/api.py
4. [ ] Tests: Write unit tests for core module (target 90% coverage)
5. [ ] Docs: Create README.md with usage examples
6. [ ] Deployment: Configure production environment

## Verification

- All tests must pass (pytest -q)
- Linter clean (flake8)
- No uncommitted changes
- Each phase must produce expected outputs (see phase descriptions)

Phases are parsed by regex: ^(\d+)\.\s+\[([ xX])\]\s+(.+?)(?::\s+(.+))?$
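
To show what that regex captures, here is a small sketch that applies it line by line with `re.MULTILINE`. The `Phase` tuple and `parse_phases` function are illustrative names and are not taken from the orchestrator's source.

```python
import re
from typing import List, NamedTuple, Optional

# Same pattern as above; MULTILINE lets ^ and $ match at each line of PRD.md.
PHASE_RE = re.compile(r"^(\d+)\.\s+\[([ xX])\]\s+(.+?)(?::\s+(.+))?$", re.MULTILINE)


class Phase(NamedTuple):
    number: int
    done: bool
    title: str
    description: Optional[str]  # None when the line has no ": description" part


def parse_phases(prd_text: str) -> List[Phase]:
    """Extract every numbered checkbox phase from the PRD text."""
    return [
        Phase(
            number=int(m.group(1)),
            done=m.group(2).lower() == "x",
            title=m.group(3),
            description=m.group(4),
        )
        for m in PHASE_RE.finditer(prd_text)
    ]
```

The lazy `(.+?)` stops at the first colon followed by whitespace, so `1. [ ] Setup: Initialize git repository` yields the title `Setup` and the remainder as the description.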

progress.json Structure

Created automatically. Tracks:

{
  "version": "1.0",
  "created_at": "2025-03-14T...",
  "iteration": 42,
  "phases": [...],
  "completed_phases": ["phase_1", "phase_2"],
  "logs": [...],
  "last_commit": "abc123...",
  "verification_passed": true,
  "auto_improve_counter": 4,
  "settings": {
    "max_iterations": 200,
    "subagent_parallel": true,
    "self_critique_enabled": true
  }
}
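
Since every iteration reloads this file, a loader with defaults and an atomic writer are natural helpers. The sketch below is an assumption about how that could look. `load_progress` and `save_progress` are hypothetical names, and only a subset of the fields above is initialised.

```python
import json
import os
import tempfile
from pathlib import Path


def load_progress(path: Path) -> dict:
    """Read progress.json, or return a fresh default state if it is missing."""
    if not path.exists():
        return {
            "version": "1.0",
            "iteration": 0,
            "completed_phases": [],
            "auto_improve_counter": 0,
        }
    return json.loads(path.read_text(encoding="utf-8"))


def save_progress(path: Path, progress: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place.

    os.replace is atomic on the same filesystem, so an interrupted run
    leaves either the old file or the new one, never a half-written file.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(progress, handle, indent=2)
    os.replace(tmp_name, path)
```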

Subagent Swarm

During each iteration, the following subagents are spawned (if Hermes integration available):

coder: Implements the current phase
tester: Designs and runs tests for the implementation
reviewer: Self-critique for correctness, performance, security, PRD compliance

Subagents run in parallel (when possible), and their outputs are compressed when context usage exceeds 50%.

Verification Checklist

Before accepting an iteration, the following must pass:

Test suite runs without failures (run_tests())
Linter passes (run_linter())
All phases marked complete (all_phases_completed())
Self-critique passes (self_critique())
Git working tree clean (git status)

If any check fails, the loop continues (does not commit).
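
The checklist amounts to a gate in which every check must return true. The sketch below illustrates that idea with hypothetical helpers: `failed_checks` runs any mapping of named checks, and `working_tree_clean` shows one possible implementation of the git check using `git status --porcelain`. Neither is taken from the orchestrator's source.

```python
import subprocess
from typing import Callable, Dict, List


def working_tree_clean(repo: str = ".") -> bool:
    """True when `git status --porcelain` prints nothing for the repository."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == ""


def failed_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of those that did not pass.

    A check that raises is treated as a failure rather than aborting the
    gate, so one broken tool cannot mask the other results.
    """
    failures = []
    for name, check in checks.items():
        try:
            passed = bool(check())
        except Exception:
            passed = False
        if not passed:
            failures.append(name)
    return failures
```

An iteration would be accepted only when `failed_checks(...)` returns an empty list, and the returned names could feed the Issues field of a lesson entry.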

Git Integration

Auto-commits after each successful iteration
Commit format: Ralph iter N: completed phase X - title
Stores last commit SHA in progress.json
Requires git repository (initialized automatically if missing)

Auto-Improvement

After every 10 successful iterations or on any failure, the skill appends a lesson to its own SKILL.md:

## Lesson learned (iteration 42)
- Time: 2025-03-14T...
- Verification: PASS/FAIL
- Issues: (if any)
- Summary: Auto-generated improvement note

This is the self-improving aspect: the orchestration skill evolves based on its own execution experience.
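
The trigger rule and the entry format can be sketched as below. The function names are hypothetical, and the sketch assumes the counter holds the number of successful iterations since the last lesson, which the source does not spell out.

```python
from datetime import datetime, timezone
from typing import List, Optional


def should_record_lesson(verification_passed: bool, successes_since_lesson: int,
                         every: int = 10) -> bool:
    """Record on any failure, or once `every` successes have accumulated."""
    if not verification_passed:
        return True
    return successes_since_lesson >= every


def format_lesson(iteration: int, verification_passed: bool, issues: List[str],
                  summary: str, now: Optional[datetime] = None) -> str:
    """Render one entry in the layout shown above."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return "\n".join([
        f"## Lesson learned (iteration {iteration})",
        f"- Time: {stamp}",
        f"- Verification: {'PASS' if verification_passed else 'FAIL'}",
        f"- Issues: {', '.join(issues) if issues else 'none'}",
        f"- Summary: {summary}",
    ])
```

Appending the returned text to SKILL.md would then be a single file write in append mode.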

Cron Scheduling

Schedule continuous operation until done:

# Natural language conversion to cron
python3 ralph_orchestrator.py create-cron "every 10 minutes"
# Output: */10 * * * * cd /path/to/project && python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py run

# Or use schedule_cronjob via Hermes
schedule_cronjob 'cd /path && python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py run' every 10m

The cron job will invoke a fresh agent session each time with no context carryover (as required by Ralph Loop). State is persisted in progress.json.
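
As an illustration of the natural-language conversion, the sketch below handles only the "every N minutes" and "every N hours" forms. It is a simplified assumption, not the script's actual parser, and `schedule_to_cron` is a hypothetical name.

```python
import re


def schedule_to_cron(schedule: str) -> str:
    """Convert 'every N minutes' or 'every N hours' into a cron time spec."""
    match = re.fullmatch(r"every\s+(\d+)\s+(minute|hour)s?", schedule.strip().lower())
    if match is None:
        raise ValueError(f"unsupported schedule: {schedule!r}")
    count, unit = int(match.group(1)), match.group(2)
    if unit == "minute":
        if not 1 <= count <= 59:
            raise ValueError("minutes must be between 1 and 59")
        return f"*/{count} * * * *"
    if not 1 <= count <= 23:
        raise ValueError("hours must be between 1 and 23")
    return f"0 */{count} * * *"  # at minute 0 of every Nth hour
```

Note that cron step values restart at each hour or day boundary, so intervals that do not divide 60 (or 24) evenly produce one shorter gap per cycle.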

Logging

Full log to ralph_log.md:

## Iteration 42 [✓] - 2025-03-14T23:10:00Z

Iteration 42 results:
- coder: completed
- tester: completed
- reviewer: completed
✓ Verification passed
✓ Committed: [abc123] Ralph iter 42: completed phase 5 - Tests...

---

Also logs to stderr with timestamps.

Limits and Safety

Maximum iterations: 200 by default (configurable via --max-iterations)
If the maximum is reached, the loop pauses and prints a message. Re-run the run command to continue.
The loop sleeps 2 seconds between iterations to avoid a tight loop
KeyboardInterrupt (Ctrl+C) saves state and exits cleanly
Git working tree is checked before commits; if dirty, iteration fails and retries
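
These limits can be pictured as a thin wrapper around the iteration step. The sketch below is an assumption about the control flow, not the script's code: `run_with_limits`, its parameters, and the returned status strings are hypothetical.

```python
import time
from typing import Callable


def run_with_limits(
    step: Callable[[int], bool],
    save_state: Callable[[], None],
    max_iterations: int = 200,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run step until it reports completion, the cap is hit, or Ctrl+C.

    State is saved on every exit path so that a later run can resume.
    """
    try:
        for iteration in range(1, max_iterations + 1):
            if step(iteration):
                save_state()
                return "done"
            sleep(delay_seconds)  # avoid a tight loop between iterations
    except KeyboardInterrupt:
        save_state()
        return "interrupted"
    save_state()
    return "paused"  # cap reached; run again to continue
```

Injecting `sleep` as a parameter makes the wrapper testable without real delays.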

Architecture Notes

External state only: No conversation memory, no in-memory accumulation across iterations. Fresh state every time via RalphState.load().

Subagent delegation: The Python script cannot use delegate_task (not available in execute_code sandbox). For real subagents, run Ralph through Hermes chat: hermes -s ralph_orchestrator chat -q "run Ralph loop in /path". The script falls back to sequential simulation.

Trajectory compression: Summarizes previous iteration context to fit within token limits. Full history lives in ralph_log.md.

Self-critique: Customizable per-phase validation. Can specify expected_files in phase dict for existence checks.

Troubleshooting

No phases found: Ensure PRD.md has numbered phases with checkboxes like 1. [ ] Phase title

Subagents not spawning: delegate_task is not available in the execute_code sandbox. For real subagents, run Ralph through Hermes chat instead of the Python script. Use: hermes -s ralph_orchestrator chat -q "run Ralph loop in /path/to/project"

Git errors: Ensure .git exists and you have commit rights. Ralph auto-initializes git if missing (first run only).

Loop not exiting: Check that all phases are marked [x] and verification passes. Use status command.

Example Workflow

# Create a new project
mkdir my-project && cd my-project
echo "# Test Project" > README.md
git init && git add . && git commit -m "Initial"

# Initialize Ralph
python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py init $'# Test PRD\n\n## Phases\n1. [ ] Add hello world\n2. [ ] Write tests\n3. [ ] Document'

# Edit PRD.md to flesh out details
$EDITOR PRD.md

# Start loop (runs in foreground)
python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py run

# In another terminal, monitor
python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py status
tail -f ralph_log.md

# When done, you'll see <RALPH_DONE> printed

Real Subagent Workflow (Hermes Chat)

The ralph_orchestrator.py script runs in the execute_code sandbox, which does not have access to delegate_task. For real subagent delegation, you must orchestrate Ralph through Hermes Agent directly:

Manual Orchestration Example

# 1. Parse PRD.md to find next incomplete phase
# 2. delegate_task for CODER: "Implement Phase X: ..."
# 3. delegate_task for TESTER: "Test Phase X implementation"
# 4. delegate_task for REVIEWER: "Review Phase X code"
# 5. Update progress.json with results
# 6. Git commit
# 7. Repeat for next phase
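
Steps 2 to 4 can be prepared as data before any delegation happens. The sketch below only builds the three task descriptions and does not call delegate_task. The keys `goal`, `context`, and `toolsets` mirror the parameters shown in the workaround command in this document, while `build_phase_tasks`, the `role` key, and the chosen toolsets are assumptions for illustration.

```python
from typing import Dict, List


def build_phase_tasks(number: int, title: str, prd_path: str = "PRD.md") -> List[Dict]:
    """Return coder, tester and reviewer task descriptions for one phase."""
    context = f"Read {prd_path} and the current repository state before starting."
    goals = {
        "coder": f"Implement Phase {number}: {title}",
        "tester": f"Test Phase {number} implementation",
        "reviewer": f"Review Phase {number} code",
    }
    return [
        {
            "role": role,  # hypothetical key, used here only for routing
            "goal": goal,
            "context": context,
            "toolsets": ["terminal", "file"],  # illustrative choice
        }
        for role, goal in goals.items()
    ]
```

Each dictionary could then be handed to whatever delegation call is available in the running Hermes session.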

Using Hermes CLI

# Run Ralph coordination from within Hermes
hermes -s ralph_orchestrator chat -q "run Ralph loop in /path/to/project"

Current Limitations

delegate_task unavailable in sandbox: The hermes_tools module in execute_code only exposes basic tools (terminal, read_file, write_file, etc.), not delegate_task.
Hermes CLI hangs non-interactively: subprocess.run(["hermes", "chat", "-q", ...]) times out because Hermes expects interactive input.
Solution: Run Ralph coordination from within Hermes chat, not from standalone Python.

Workaround Until Fix

Use the Python script for PRD parsing and progress tracking, but manually delegate tasks:

# Initialize project
python3 ~/.hermes/skills/ralph_orchestrator/scripts/ralph_orchestrator.py init "Your PRD"

# Then for each phase, manually run:
hermes chat -q "delegate_task goal='Implement Phase 1...' context='...' toolsets=['terminal','file','web']"
# ... tester, reviewer, update progress.json, commit

Notes

Designed for massive projects with 100+ phases
Works with any language/toolchain (tests/linter are extensible)
Runs without Hermes integration (simulation only) and uses real subagents when Hermes is available
For production use, consider adding custom phase_validators and CI integration
