Stage orchestration models

Three distinct mental models have emerged in the PR #201 discussion for how stages, sandboxes, scripts, and agents relate to each other. They agree on many things but differ on where orchestration logic lives and what runs inside vs. outside the sandbox.

Common ground

All three models agree on:

Pre-scripts and post-scripts run outside the sandbox (post needs git push, gh pr; pre may gather data the sandbox can't access).
The sandbox enforces network, filesystem, and process isolation.
Tool/API servers run on the host and are made available to the sandbox.
The workspace is created outside and mounted into the sandbox.
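One agent per sandbox invocation at the configuration level (no multi-agent agents[] list in a single sandbox definition). Model A's main.sh may still invoke more than one agent runtime inside its sandbox.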

Model A: Sandbox-run owns the full lifecycle (Ralph)

The sandbox-run command is the orchestrator. It reads a stage definition and executes the full lifecycle: pre-script, sandbox launch with main script, post-script. The main script runs inside the sandbox and can orchestrate multi-agent flows (e.g., a code-then-review loop) by invoking agent runtimes directly.

sequenceDiagram
    participant E as Entrypoint
    participant SR as sandbox-run
    participant H as Host
    participant S as Sandbox
    participant A1 as Code Agent
    participant A2 as Review Agent

    E->>SR: sandbox-run code
    SR->>SR: Read stage definition (stages/code/code.yaml)
    SR->>H: Create workspace directory
    SR->>H: Provision tool servers on host

    rect rgb(240, 248, 255)
        note right of SR: Outside sandbox
        SR->>H: Run pre.sh (clone, checkout, token gen)
        H-->>SR: Workspace prepared
    end

    rect rgb(255, 243, 224)
        note right of SR: Inside sandbox
        SR->>S: Launch sandbox (mount workspace, expose tool servers)
        S->>S: Execute main.sh

        loop Ralph loop (up to MAX_RETRIES)
            S->>A1: Invoke code agent (agents/code.md)
            A1-->>S: Code changes committed locally
            S->>A2: Invoke review agent (agents/review.md)
            A2-->>S: Review verdict (exit code)
            alt Review passed
                S-->>SR: Exit 0
            else Review failed
                S->>S: Feed review output back as context
            end
        end

        S-->>SR: Exit code
    end

    rect rgb(240, 248, 255)
        note right of SR: Outside sandbox
        SR->>H: Run post.sh (git push, gh pr create)
        H-->>SR: PR created
    end
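
The lifecycle in the diagram above can be sketched in shell. This is a minimal sketch under stated assumptions, not sandbox-run's actual implementation: run_stage and launch_sandbox are hypothetical names, and skipping post.sh after a failed main.sh is an assumption the discussion does not settle.

```shell
# Hypothetical sketch of the Model A lifecycle; not the real sandbox-run.
# launch_sandbox is an assumed helper that mounts the workspace and runs
# the given script inside the sandbox.
run_stage() {
  local stage_dir="$1" workspace="$2"

  # Outside the sandbox: pre.sh prepares the workspace on the host.
  bash "$stage_dir/pre.sh" "$workspace" || return $?

  # Inside the sandbox: main.sh runs with the workspace mounted.
  launch_sandbox "$workspace" "$stage_dir/main.sh" || return $?

  # Outside the sandbox: post.sh publishes the results.
  # Assumption: it is skipped when main.sh fails.
  bash "$stage_dir/post.sh" "$workspace"
}
```

A call such as run_stage stages/code /path/to/workspace would then cover one full stage. The function returns the exit code of the first step that fails.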

Stage definition shape

# stages/code/code.yaml
parameters:
  timeout: 120

sandbox:
  image: registry.access.redhat.com/ubi9/ubi:latest
  firewall_file: firewall.yaml
  filesystem:
    workspace: /home/agent/workspace
    readonly:
      - /home/agent/.config

scripts:
  pre: pre.sh      # runs outside sandbox, on host
  main: main.sh    # runs inside sandbox
  post: post.sh    # runs outside sandbox, on host

env:
  MODEL: claude-sonnet-4-20250514
  MAX_RETRIES: 2
  TIMEOUT_MINUTES: 90

# Files/dirs made available inside the sandbox
context:
  agents: ../../agents

Key properties

main.sh runs inside the sandbox. It has access to the mounted workspace, agent definitions (via context), and tool servers exposed to the sandbox network.
sandbox-run is the single orchestration unit. One call to sandbox-run code does everything: pre, sandbox+main, post.
Multi-agent flows are expressed in main.sh, a reviewed shell script inside the sandbox. The runner doesn't know about agent sequencing — that's the script's job.
Pre/post scripts cannot see the sandbox internals — they operate on the workspace directory on the host before/after the sandbox runs.
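
The code-then-review flow that main.sh expresses in this model could look like the sketch below. AGENT_RUNTIME is a hypothetical placeholder for whatever CLI runs one agent definition. Passing feedback as a second argument and reading the review from stdout are assumptions for illustration, not a documented interface.

```shell
# Hypothetical stages/code/main.sh body for Model A (runs inside the sandbox).
# AGENT_RUNTIME is an assumed placeholder for the CLI that runs one agent
# definition; its arguments here are illustrative, not a real interface.
ralph_loop() {
  local max_retries="${MAX_RETRIES:-2}"
  local feedback="" i

  for (( i = 1; i <= max_retries; i++ )); do
    # Code agent: gets the previous review output (empty on the first pass).
    "$AGENT_RUNTIME" agents/code.md "$feedback" || return $?

    # Review agent: stdout is the feedback, exit code is the verdict.
    if feedback="$("$AGENT_RUNTIME" agents/review.md)"; then
      return 0
    fi
  done

  # Retries exhausted without a passing review.
  return 1
}
```

Because the whole loop runs inside one sandbox, feedback can stay in a shell variable and never needs to cross the sandbox boundary.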

Model B: Main script orchestrates from outside (Ascerra)

The main script runs outside the sandbox and calls fullsend sandbox run for each agent invocation. Each call spins up a separate sandbox for a single agent. The main script is the orchestrator and has full host access.

sequenceDiagram
    participant E as Entrypoint
    participant M as main.sh (host)
    participant SR as sandbox-run
    participant S1 as Sandbox (code)
    participant A1 as Code Agent
    participant S2 as Sandbox (review)
    participant A2 as Review Agent

    E->>M: Execute main.sh for code stage

    loop Ralph loop (up to MAX_RETRIES)
        M->>SR: fullsend sandbox run code
        SR->>SR: Read agent definition
        SR->>S1: Launch sandbox
        S1->>A1: Run code agent
        A1-->>S1: Code changes
        S1-->>SR: Exit code
        SR-->>M: Result

        M->>SR: fullsend sandbox run review
        SR->>SR: Read agent definition
        SR->>S2: Launch sandbox
        S2->>A2: Run review agent
        A2-->>S2: Review verdict
        S2-->>SR: Exit code
        SR-->>M: Result + feedback file

        alt Review passed
            M->>M: Break loop
        else Review failed
            M->>M: Feed /tmp/review-feedback.md to next iteration
        end
    end

    M->>M: git push, gh pr create

Main script shape (from Ascerra's comment)

#!/usr/bin/env bash
set -euo pipefail
FEEDBACK="/tmp/review-feedback.md"

for i in $(seq 1 "${MAX_RETRIES:-2}"); do
  echo "=== iteration $i ==="

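  # Code agent in its own sandbox; from the second iteration on,
  # pass the previous review's feedback as extra context
  if (( i > 1 )); then
    fullsend sandbox run code --extra-context "$FEEDBACK"
  else
    fullsend sandbox run code
  fi

  # Review agent in its own sandbox. Testing the command in the `if`
  # keeps `set -e` from aborting the script on a failed review.
  if fullsend sandbox run review > "$FEEDBACK"; then
    exit 0
  fi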
  echo "Review failed, feeding back to code agent..."
done
exit 1

Key properties

main.sh runs outside the sandbox, on the host. It has full access to host tools (git push, gh pr, etc.).
Each sandbox run is one agent, one sandbox. The sandbox-run command is simple — it just runs a single agent in a single sandbox.
Pre/post scripts may be absorbed into main.sh since main.sh already runs on the host and controls the full flow.
The workspace is shared via the filesystem — each sandbox mounts the same workspace directory, so the review agent sees the code agent's changes.
Orchestration logic is visible and reviewable in main.sh, but runs with full host privileges.
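
Absorbing the pre and post steps into main.sh, as described above, might look like the following sketch. REPO_URL, BRANCH, and code_review_loop are hypothetical names; code_review_loop stands in for the code-then-review loop built on fullsend sandbox run. How each sandbox is pointed at the workspace is not specified in the discussion, so the sketch simply changes into that directory.

```shell
# Hypothetical Model B main.sh with the pre/post steps absorbed.
# REPO_URL, BRANCH, and code_review_loop are assumed names; code_review_loop
# stands in for the code-then-review loop that calls fullsend sandbox run.
main() {
  local workspace="$1"

  # Former pre.sh: prepare the workspace on the host.
  git clone --branch "$BRANCH" "$REPO_URL" "$workspace" || return $?

  # Both agents work on this same directory, so the review agent
  # sees the code agent's changes.
  ( cd "$workspace" && code_review_loop ) || return $?

  # Former post.sh: publish using host credentials.
  ( cd "$workspace" && git push origin HEAD && gh pr create --fill )
}
```

Everything in this function runs with host privileges, which is the trade-off the last property points at.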

Model C: CI pipeline orchestrates, runner is minimal (Maruiz)

There is no main script. The runner does exactly one thing: run one agent in one sandbox. All multi-agent orchestration, including code-then-review loops, is expressed in the CI pipeline definition (GitHub Actions, Tekton, GitLab CI). The one loop the runner does own is the validation loop, which is declared in the harness config and executed by the runner as a deterministic post-agent check.

sequenceDiagram
    participant CI as CI Pipeline (Actions/Tekton)
    participant R as Runner
    participant H as Host
    participant S as Sandbox
    participant A as Agent
    participant V as validate.sh

    CI->>R: fullsend run triage (harness/triage.yaml)
    R->>R: Read harness/triage.yaml
    R->>H: Provision tool servers (gh-server, etc.)
    R->>H: Run pre_script

    rect rgb(255, 243, 224)
        note right of R: Validation loop (in runner)
        loop Up to max_iterations
            R->>S: Launch sandbox
            S->>A: Run triage agent
            A-->>S: Result
            S-->>R: Exit code + output

            R->>V: Run validation script (deterministic)
            alt Validation passed
                V-->>R: Exit 0
            else Validation failed
                V-->>R: Exit 1 + feedback
                R->>R: Append feedback to agent prompt
            end
        end
    end

    R->>H: Run post_script
    R-->>CI: Exit code

    note over CI: CI decides what to run next

    CI->>R: fullsend run code (harness/code.yaml)
    R->>S: Launch sandbox, run code agent
    S-->>R: Done
    R-->>CI: Exit code

    alt Code succeeded
        CI->>R: fullsend run review (harness/review.yaml)
        R->>S: Launch sandbox, run review agent
        S-->>R: Review verdict
        R-->>CI: Exit code
    end

Harness definition shape (from Maruiz's comment)

# harness/triage.yaml
agent: agents/triage.md
policy: policies/triage-write.yaml
skills:
  - skills/triage-coordination
  - skills/detect-duplicates
tools_binaries:
  - tools/claude
api_servers:
  - api-servers/gh-server
pre_script: scripts/triage-pre.sh
post_script: scripts/triage-post.sh
validation_loop:
  script: scripts/validate-triage.sh
  max_iterations: 2
  feedback_mode: append
runtime:
  timeout_minutes: 30

Key properties

The runner is maximally simple: one agent, one sandbox, one harness file. No multi-agent orchestration.
Orchestration lives in CI, which already has mature sequencing, conditionals, and parallelism. A GitHub Actions workflow step runs fullsend run code, and a subsequent step conditionally runs fullsend run review.
Validation loops are declarative, not scripted. The harness config says "run this deterministic check after the agent, retry N times with feedback." The runner handles the loop.
Shared resources are first-class: policies, skills, tools, and API servers each live in their own top-level directory and are referenced by path from the harness YAML. No per-stage directory silos.
No main.sh concept — the code-then-review loop is a CI concern, not a runner concern.
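
The declarative validation loop could be executed by the runner roughly as sketched below. run_with_validation and run_agent_in_sandbox are hypothetical names, and reading feedback from the validation script's stdout is an assumption. Only script, max_iterations, and feedback_mode: append come from the harness file.

```shell
# Hypothetical sketch of the runner's validation loop in Model C.
# run_agent_in_sandbox is an assumed helper; validate_script and
# max_iterations mirror validation_loop.script and max_iterations.
run_with_validation() {
  local validate_script="$1" max_iterations="$2" prompt="$3"
  local feedback i

  for (( i = 1; i <= max_iterations; i++ )); do
    # One agent, one sandbox per iteration.
    run_agent_in_sandbox "$prompt" || return $?

    # Deterministic check: exit 0 passes; non-zero fails, with the
    # feedback assumed to arrive on stdout.
    if feedback="$(bash "$validate_script")"; then
      return 0
    fi

    # feedback_mode: append
    prompt+=$'\n\n'"$feedback"
  done

  # Iterations exhausted without passing validation.
  return 1
}
```

With the example harness, the call would be run_with_validation scripts/validate-triage.sh 2 followed by the initial prompt text.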

Comparison

| Dimension | A: sandbox-run lifecycle | B: External main.sh | C: CI orchestrates |
| --- | --- | --- | --- |
| Where does orchestration live? | main.sh inside the sandbox | main.sh on the host | CI pipeline definition |
| What does sandbox-run do? | Full lifecycle: pre + sandbox(main) + post | Single agent, single sandbox | Single agent, single sandbox |
| Multi-agent loops | Shell script inside sandbox | Shell script outside sandbox | CI workflow steps or declarative validation_loop |
| Pre/post scripts | Explicit in stage YAML, run by sandbox-run | Absorbed into main.sh (or explicit) | Explicit in harness YAML, run by runner |
| main.sh runs where? | Inside sandbox | Outside sandbox (host) | N/A (no main.sh) |
| Who provisions tool servers? | sandbox-run, before launching sandbox | sandbox-run per invocation, or main.sh | Runner, before launching sandbox |
| Config file unit | Per-stage directory (stages/code/) | Per-agent or per-stage | Per-agent harness (harness/code.yaml) |
| Code-review loop expressed in | stages/code/main.sh | main.sh on host | GitHub Actions / Tekton workflow |
| Complexity budget | Runner is medium; main.sh can be complex | Runner is simple; main.sh is complex | Runner is simple; CI config may be complex |

Trade-offs to resolve

Security of the orchestration loop. In Model A, main.sh runs inside the sandbox, so the loop logic is constrained by sandbox policy. In Model B, main.sh runs on the host with full privileges, so a bug or injection in main.sh has a broader blast radius. In Model C, the loop lives in the CI pipeline definition, which is typically already hardened and access-controlled.
Passing context between agents. Models A and B share a workspace across agent invocations (Model A in-sandbox, Model B via host mount). Model C relies on CI artifacts or a shared workspace mount between separate runner invocations. How review feedback flows back to the code agent differs in each.
Portability across CI systems. Model C pushes orchestration into CI, meaning the loop logic must be reimplemented for each CI system (Actions, Tekton, GitLab CI). Models A and B keep orchestration in fullsend-owned scripts, portable across CI systems.
Validation loops vs. scripted loops. Model C's declarative validation_loop handles the common case (deterministic check, retry with feedback) without custom scripting. Models A and B require the user to write the loop in main.sh. Are there orchestration patterns that don't fit the declarative model?
Composability. Can these models be combined? For example, Model C's minimal runner + Model A's main.sh for stages that need custom orchestration beyond what CI or validation_loop can express.
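
One way to picture the combination raised under Composability is a dispatcher that falls back to the minimal runner unless a stage ships its own main.sh. This is only a sketch of the idea. run_composed and launch_sandbox are hypothetical, and the fallback mirrors the fullsend run invocation shown in Model C's diagram.

```shell
# Hypothetical dispatcher combining Model C's runner with Model A's main.sh.
# The stages/<name>/main.sh layout is Model A's; launch_sandbox is an
# assumed helper that runs a script inside the sandbox.
run_composed() {
  local name="$1"

  if [[ -f "stages/$name/main.sh" ]]; then
    # Custom orchestration: run the stage's own script in the sandbox.
    launch_sandbox "stages/$name/main.sh"
  else
    # Common case: one agent, one sandbox, one harness file.
    fullsend run "$name"
  fi
}
```

Stages that need nothing beyond validation_loop would never add a main.sh and would stay on the simple path.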
