Three distinct mental models have emerged in the PR #201 discussion for how stages, sandboxes, scripts, and agents relate to each other. They agree on many things but differ on where orchestration logic lives and what runs inside vs. outside the sandbox.
All three models agree on:
- Pre-scripts and post-scripts run outside the sandbox (post needs `git push`, `gh pr`; pre may gather data the sandbox can't access).
- The sandbox enforces network, filesystem, and process isolation.
- Tool/API servers run on the host and are made available to the sandbox.
- The workspace is created outside and mounted into the sandbox.
- One agent per sandbox invocation (no multi-agent `agents[]` list in a single sandbox).
In Model A (sandbox-run lifecycle), the sandbox-run command is the orchestrator. It reads a stage definition and executes the full lifecycle: pre-script, sandbox launch with main script, post-script. The main script runs inside the sandbox and can orchestrate multi-agent flows (e.g., a code-then-review loop) by invoking agent runtimes directly.
```mermaid
sequenceDiagram
    participant E as Entrypoint
    participant SR as sandbox-run
    participant H as Host
    participant S as Sandbox
    participant A1 as Code Agent
    participant A2 as Review Agent
    E->>SR: sandbox-run code
    SR->>SR: Read stage definition (stages/code/code.yaml)
    SR->>H: Create workspace directory
    SR->>H: Provision tool servers on host
    rect rgb(240, 248, 255)
        note right of SR: Outside sandbox
        SR->>H: Run pre.sh (clone, checkout, token gen)
        H-->>SR: Workspace prepared
    end
    rect rgb(255, 243, 224)
        note right of SR: Inside sandbox
        SR->>S: Launch sandbox (mount workspace, expose tool servers)
        S->>S: Execute main.sh
        loop Ralph loop (up to MAX_RETRIES)
            S->>A1: Invoke code agent (agents/code.md)
            A1-->>S: Code changes committed locally
            S->>A2: Invoke review agent (agents/review.md)
            A2-->>S: Review verdict (exit code)
            alt Review passed
                S-->>SR: Exit 0
            else Review failed
                S->>S: Feed review output back as context
            end
        end
        S-->>SR: Exit code
    end
    rect rgb(240, 248, 255)
        note right of SR: Outside sandbox
        SR->>H: Run post.sh (git push, gh pr create)
        H-->>SR: PR created
    end
```
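The lifecycle in the sequence diagram can be sketched as a small shell function. This is only an illustration: `run_on_host` and `run_in_sandbox` are hypothetical stand-ins for whatever sandbox-run actually uses to execute scripts, and skipping post.sh when main.sh fails is an assumption the discussion does not settle.

```shell
#!/usr/bin/env bash
# Sketch of the Model A lifecycle as sandbox-run might sequence it.
# run_on_host and run_in_sandbox are hypothetical placeholders, not real commands.
set -euo pipefail

run_on_host()    { echo "host: $*"; }
run_in_sandbox() { echo "sandbox: $*"; }

run_stage() {
  local stage_dir="$1" workspace="$2" rc=0

  # Outside sandbox: the pre-script prepares the workspace (clone, checkout, tokens).
  run_on_host "$stage_dir/pre.sh" "$workspace"

  # Inside sandbox: the main script runs against the mounted workspace.
  run_in_sandbox "$stage_dir/main.sh" "$workspace" || rc=$?

  # Outside sandbox: publish results only if the main script succeeded (assumption).
  if (( rc == 0 )); then
    run_on_host "$stage_dir/post.sh" "$workspace"
  fi
  return "$rc"
}

run_stage stages/code /tmp/workspace
```

The exit code of main.sh is passed through unchanged, so the entrypoint can tell a failed review loop from a successful run.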
```yaml
# stages/code/code.yaml
parameters:
  timeout: 120
sandbox:
  image: registry.access.redhat.com/ubi9/ubi:latest
  firewall_file: firewall.yaml
  filesystem:
    workspace: /home/agent/workspace
    readonly:
      - /home/agent/.config
scripts:
  pre: pre.sh    # runs outside sandbox, on host
  main: main.sh  # runs inside sandbox
  post: post.sh  # runs outside sandbox, on host
env:
  MODEL: claude-sonnet-4-20250514
  MAX_RETRIES: 2
  TIMEOUT_MINUTES: 90
# Files/dirs made available inside the sandbox
context:
  agents: ../../agents
```

- main.sh runs inside the sandbox. It has access to the mounted workspace, agent definitions (via `context`), and tool servers exposed to the sandbox network.
- sandbox-run is the single orchestration unit. One call to `sandbox-run code` does everything: pre, sandbox+main, post.
- Multi-agent flows are expressed in main.sh, a reviewed shell script inside the sandbox. The runner doesn't know about agent sequencing; that's the script's job.
- Pre/post scripts cannot see the sandbox internals; they operate on the workspace directory on the host before/after the sandbox runs.
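For comparison with the other models, a Model A stages/code/main.sh might look roughly like the sketch below. `invoke_agent` is a hypothetical placeholder for however the agent runtime is actually called inside the sandbox; the sketch also assumes the review agent's exit code is the verdict and its output is the feedback.

```shell
#!/usr/bin/env bash
# Sketch of a Model A main.sh (runs inside the sandbox).
# invoke_agent is a hypothetical stand-in for the real agent runtime call.
set -euo pipefail

MAX_RETRIES="${MAX_RETRIES:-2}"
FEEDBACK_FILE="$(mktemp)"

# Hypothetical: replace the body with the actual agent runtime invocation.
# Args: <agent definition> [feedback file]
invoke_agent() {
  echo "would run agent: $*"
}

# Run the code agent, then the review agent, until the review passes
# or MAX_RETRIES iterations have been used.
code_review_loop() {
  local i
  for (( i = 1; i <= MAX_RETRIES; i++ )); do
    if (( i == 1 )); then
      invoke_agent agents/code.md
    else
      # Later iterations receive the previous review output as context.
      invoke_agent agents/code.md "$FEEDBACK_FILE"
    fi
    # Assumed contract: exit code is the verdict, stdout is the feedback.
    if invoke_agent agents/review.md > "$FEEDBACK_FILE"; then
      return 0
    fi
  done
  return 1
}

code_review_loop
```

Because the loop runs entirely inside the sandbox, the feedback file never leaves it; only the final exit code reaches sandbox-run.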
In Model B (external main.sh), the main script runs outside the sandbox and calls `fullsend sandbox run` for each agent invocation. Each call spins up a separate sandbox for a single agent. The main script is the orchestrator and has full host access.
```mermaid
sequenceDiagram
    participant E as Entrypoint
    participant M as main.sh (host)
    participant SR as sandbox-run
    participant S1 as Sandbox (code)
    participant A1 as Code Agent
    participant S2 as Sandbox (review)
    participant A2 as Review Agent
    E->>M: Execute main.sh for code stage
    loop Ralph loop (up to MAX_RETRIES)
        M->>SR: fullsend sandbox run code
        SR->>SR: Read agent definition
        SR->>S1: Launch sandbox
        S1->>A1: Run code agent
        A1-->>S1: Code changes
        S1-->>SR: Exit code
        SR-->>M: Result
        M->>SR: fullsend sandbox run review
        SR->>SR: Read agent definition
        SR->>S2: Launch sandbox
        S2->>A2: Run review agent
        A2-->>S2: Review verdict
        S2-->>SR: Exit code
        SR-->>M: Result + feedback file
        alt Review passed
            M->>M: Break loop
        else Review failed
            M->>M: Feed /tmp/review-feedback.md to next iteration
        end
    end
    M->>M: git push, gh pr create
```
```bash
#!/usr/bin/env bash
set -euo pipefail

FEEDBACK="/tmp/review-feedback.md"

for i in $(seq 1 "${MAX_RETRIES:-2}"); do
  echo "=== iteration $i ==="

  # Code agent in its own sandbox; pass review feedback after the first iteration
  extra_args=()
  if (( i > 1 )); then
    extra_args=(--extra-context "$FEEDBACK")
  fi
  fullsend sandbox run code ${extra_args[@]+"${extra_args[@]}"}

  # Review agent in its own sandbox; test the exit status directly so that
  # `set -e` does not abort the script when the review fails
  if fullsend sandbox run review > "$FEEDBACK"; then
    exit 0
  fi

  echo "Review failed, feeding back to code agent..."
done

exit 1
```

- main.sh runs outside the sandbox, on the host. It has full access to host tools (`git push`, `gh pr`, etc.).
- Each `sandbox run` is one agent, one sandbox. The sandbox-run command is simple: it just runs a single agent in a single sandbox.
- Pre/post scripts may be absorbed into main.sh since main.sh already runs on the host and controls the full flow.
- The workspace is shared via the filesystem — each sandbox mounts the same workspace directory, so the review agent sees the code agent's changes.
- Orchestration logic is visible and reviewable in main.sh, but runs with full host privileges.
In Model C (CI orchestrates), there is no main script. The runner does exactly one thing: run one agent in one sandbox. Orchestration across agents, including code-then-review loops, is expressed in the CI pipeline definition (GitHub Actions, Tekton, GitLab CI). Validation loops are the one exception: they are declared in the harness config and executed by the runner as a deterministic post-agent check.
```mermaid
sequenceDiagram
    participant CI as CI Pipeline (Actions/Tekton)
    participant R as Runner
    participant H as Host
    participant S as Sandbox
    participant A as Agent
    participant V as validate.sh
    CI->>R: fullsend run triage (harness/triage.yaml)
    R->>R: Read harness/triage.yaml
    R->>H: Provision tool servers (gh-server, etc.)
    R->>H: Run pre_script
    rect rgb(255, 243, 224)
        note right of R: Validation loop (in runner)
        loop Up to max_iterations
            R->>S: Launch sandbox
            S->>A: Run triage agent
            A-->>S: Result
            S-->>R: Exit code + output
            R->>V: Run validation script (deterministic)
            alt Validation passed
                V-->>R: Exit 0
            else Validation failed
                V-->>R: Exit 1 + feedback
                R->>R: Append feedback to agent prompt
            end
        end
    end
    R->>H: Run post_script
    R-->>CI: Exit code
    note over CI: CI decides what to run next
    CI->>R: fullsend run code (harness/code.yaml)
    R->>S: Launch sandbox, run code agent
    S-->>R: Done
    R-->>CI: Exit code
    alt Code succeeded
        CI->>R: fullsend run review (harness/review.yaml)
        R->>S: Launch sandbox, run review agent
        S-->>R: Review verdict
        R-->>CI: Exit code
    end
```
```yaml
# harness/triage.yaml
agent: agents/triage.md
policy: policies/triage-write.yaml
skills:
  - skills/triage-coordination
  - skills/detect-duplicates
tools_binaries:
  - tools/claude
api_servers:
  - api-servers/gh-server
pre_script: scripts/triage-pre.sh
post_script: scripts/triage-post.sh
validation_loop:
  script: scripts/validate-triage.sh
  max_iterations: 2
  feedback_mode: append
runtime:
  timeout_minutes: 30
```

- The runner is maximally simple: one agent, one sandbox, one harness file. No multi-agent orchestration.
- Orchestration lives in CI, which already has mature sequencing, conditionals, and parallelism. A GitHub Actions workflow step runs `fullsend run code`, and a subsequent step conditionally runs `fullsend run review`.
- Validation loops are declarative, not scripted. The harness config says "run this deterministic check after the agent, retry N times with feedback." The runner handles the loop.
- Shared resources are first-class: policies, skills, tools, and API servers each live in their own top-level directory and are referenced by path from the harness YAML. No per-stage directory silos.
- No main.sh concept — the code-then-review loop is a CI concern, not a runner concern.
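To make the declarative validation loop concrete, here is a rough sketch of what the runner might do with `max_iterations` and `feedback_mode: append`. The function names `run_agent_in_sandbox` and `run_validation` are hypothetical placeholders, and the feedback wording appended to the prompt is an assumption; the harness format itself does not specify either.

```shell
#!/usr/bin/env bash
# Sketch of a runner-side validation loop for Model C.
# run_agent_in_sandbox and run_validation are hypothetical stand-ins.
set -euo pipefail

run_agent_in_sandbox() { echo "agent run with prompt: $1"; }
run_validation()       { return 0; }  # deterministic check; prints feedback on failure

# Args: <max_iterations> <initial prompt>
validation_loop() {
  local max_iterations="$1" prompt="$2" feedback i
  for (( i = 1; i <= max_iterations; i++ )); do
    run_agent_in_sandbox "$prompt"
    # Exit 0 from the validation script means the agent's result is accepted.
    if feedback="$(run_validation)"; then
      return 0
    fi
    # feedback_mode: append -> add the validator's output to the next prompt.
    prompt+=$'\n\n'"Validation feedback (iteration $i):"$'\n'"$feedback"
  done
  return 1
}

validation_loop 2 "Triage the issue."
```

The loop only ever re-runs the same agent against a deterministic check, which is what distinguishes it from the scripted code-then-review loops in the other models.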
| Dimension | A: sandbox-run lifecycle | B: External main.sh | C: CI orchestrates |
|---|---|---|---|
| Where does orchestration live? | main.sh inside the sandbox | main.sh on the host | CI pipeline definition |
| What does sandbox-run do? | Full lifecycle: pre + sandbox(main) + post | Single agent, single sandbox | Single agent, single sandbox |
| Multi-agent loops | Shell script inside sandbox | Shell script outside sandbox | CI workflow steps or declarative validation_loop |
| Pre/post scripts | Explicit in stage YAML, run by sandbox-run | Absorbed into main.sh (or explicit) | Explicit in harness YAML, run by runner |
| main.sh runs where? | Inside sandbox | Outside sandbox (host) | N/A (no main.sh) |
| Who provisions tool servers? | sandbox-run, before launching sandbox | sandbox-run per invocation, or main.sh | Runner, before launching sandbox |
| Config file unit | Per-stage directory (stages/code/) | Per-agent or per-stage | Per-agent harness (harness/code.yaml) |
| Code-review loop expressed in | stages/code/main.sh | main.sh on host | GitHub Actions / Tekton workflow |
| Complexity budget | Runner is medium; main.sh can be complex | Runner is simple; main.sh is complex | Runner is simple; CI config may be complex |
- Security of the orchestration loop. In Model A, main.sh runs inside the sandbox, so the loop logic is constrained by sandbox policy. In Model B, main.sh runs on the host with full privileges, so a bug or injection in main.sh has a broader blast radius. In Model C, CI pipelines are already hardened and access-controlled.
- Passing context between agents. Models A and B share a workspace across agent invocations (Model A in-sandbox, Model B via host mount). Model C relies on CI artifacts or a shared workspace mount between separate runner invocations. How review feedback flows back to the code agent differs in each.
- Portability across CI systems. Model C pushes orchestration into CI, meaning the loop logic must be reimplemented for each CI system (Actions, Tekton, GitLab CI). Models A and B keep orchestration in fullsend-owned scripts, portable across CI systems.
- Validation loops vs. scripted loops. Model C's declarative `validation_loop` handles the common case (deterministic check, retry with feedback) without custom scripting. Models A and B require the user to write the loop in main.sh. Are there orchestration patterns that don't fit the declarative model?
- Composability. Can these models be combined? For example, Model C's minimal runner + Model A's main.sh for stages that need custom orchestration beyond what CI or `validation_loop` can express.