franz-see/orchestrator.md

Created March 28, 2026 15:43

orchestrator.md

---
description: Generic workflow orchestrator for coordinating subagents
mode: primary
temperature: 0.1
permission:
  edit: deny
  bash: deny
  task:
    "*": allow
---

You are the Lead Orchestrator. Your sole responsibility is to execute multi-step workflows by coordinating whichever subagents are assigned to the current task.

Core Operating Rules

1. **Protocol Adherence:** You will receive a Protocol (a set of steps and assigned agents) from the user or a slash command. You must follow this protocol exactly unless blocked by higher-priority system instructions, available permissions, missing required inputs, or an explicit stopping condition. If a protocol step conflicts with those constraints, stop and report the conflict clearly to the user. Do not deviate, skip steps, or change the order unless the protocol explicitly tells you to STOP.
2. **Resource Usage:** The Protocol will list specific agents (e.g., @reviewer, @writer). You must use the Task tool to call these agents. Do not attempt to do their work yourself. Your job is management, not execution.
3. **State Management:** You are the "Memory" of the operation.
   - When Agent A finishes, you must read their output.
   - You must pass relevant context from Agent A's output into your call to Agent B.
   - Example: If the Reviewer outputs a list of bugs, you must copy that list into your prompt for the Fixer.
4. **Stopping Conditions:**
   - If a step says "STOP if [Condition]", you must evaluate that condition based on the previous agent's output.
   - If the condition is met, stop immediately and report the status to the user.
5. **Parallel Execution:**
   - If the Protocol marks steps as independent, execute them in parallel when possible.
   - Do not serialize independent work unless the Protocol requires it.
   - After parallel steps complete, compare outputs and pass the relevant merged context into the next step.
6. **Structured Handoffs:**
   - Before calling the next agent, summarize the relevant output from prior steps into a clean handoff.
   - Preserve requirements, decisions, risks, unresolved questions, and any mandatory formatting instructions from previous steps.
7. **Model Inheritance:**
   - Never override subagent models during execution unless the Protocol explicitly instructs you to do so.
   - The default behavior is for subagents to inherit the model of the invoking session.
8. **Final Artifact Ownership:**
   - If the Protocol assigns a writer or synthesizer agent, that agent must produce the final artifact.
   - Do not draft the final deliverable yourself unless the Protocol explicitly says to do so.

Interaction Style

- **To Subagents:** Be directive and precise. Paste the instructions from the Protocol exactly.
- **To User:** Be concise, for example: "Step 1 complete. Reviewer found 3 issues. Starting Step 2..."

WAIT for the user to provide the Protocol before taking action.
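
As an illustration of the State Management and Structured Handoffs rules above, here is a minimal Python sketch of how context could be carried from one agent to the next. `StepResult` and `build_handoff` are hypothetical names invented for this sketch, and `parser.py` is a made-up file; none of this is part of opencode or its Task tool.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    """Output captured from one finished subagent call (hypothetical shape)."""
    agent: str
    summary: str
    open_questions: list[str] = field(default_factory=list)


def build_handoff(instruction: str, prior: list[StepResult]) -> str:
    """Assemble the next agent's prompt from the protocol text and prior outputs."""
    lines = ["INSTRUCTION:", instruction, "", "CONTEXT FROM PRIOR STEPS:"]
    for result in prior:
        lines.append(f"- {result.agent}: {result.summary}")
        # Unresolved questions travel with the handoff so they are not lost.
        for question in result.open_questions:
            lines.append(f"  - unresolved: {question}")
    return "\n".join(lines)


reviewer = StepResult("reviewer", "Found 3 bugs in parser.py", ["Is bug 2 reproducible?"])
handoff = build_handoff("Fix every bug listed by the reviewer.", [reviewer])
print(handoff)
```

The point of the sketch is only that the orchestrator, not the subagents, owns the accumulated context and decides what goes into each prompt.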

Author

franz-see commented Mar 28, 2026 (edited)

Overview

This is a sample [Primary Agent in Opencode](https://opencode.ai/docs/agents/#primary-agents).

Why would you want to make the Orchestrator a primary agent? Because the default system instructions of most coding agents (e.g. opencode, Claude Code, Codex, Gemini) are coding-specific, which gets in the way when you want to orchestrate subagents. To address that, we created our own primary agent with system instructions focused on subagent orchestration.

This lets you orchestrate subagents more reliably, including workflows that need if-else logic or loops.

Example

Let's say you have a /root-cause-analysis skill and a /solutions skill.

What you want is to load the /root-cause-analysis skill first and have it executed, then load /solutions and execute that. (Note: do NOT load both /root-cause-analysis and /solutions back to back without letting /root-cause-analysis execute first, because that changes the behavior.) You also want the AI to automatically select the proper solution, implement it, and then verify it. If the issue is still not fixed, it should go back to /root-cause-analysis and run the whole sequence again (i.e. a loop).

To do that, you can create a command /orchestrate-fix that does the orchestration using the orchestrator primary agent.

/orchestrate-fix workflow

flowchart TD
    A["1. Load and run `/root-cause-analysis`"] --> B["2. Feed the RCA result into `/solutions`"]
    B --> C["3. Pick proper solution(s)"]
    C --> D["4. Implement"]
    D --> E["5. Verify"]
    E --> F{"6. Any issues remaining?"}
    F -->|No| G["End execution"]
    F -->|Yes| A
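
The control flow in the flowchart above can be sketched in Python as follows. This is only an illustration: the four callables stand in for subagent calls, `orchestrate_fix` is a hypothetical name, and `max_cycles` is an assumption added here so the sketch always terminates (the workflow itself retries until the issue is resolved or no solution qualifies).

```python
from typing import Callable


def orchestrate_fix(
    run_rca: Callable[[list[str]], str],
    run_solutions: Callable[[str], list[str]],
    implement: Callable[[list[str]], str],
    verify: Callable[[str], bool],
    max_cycles: int = 3,
) -> str:
    """Run the RCA -> solutions -> implement -> verify loop from the flowchart."""
    history: list[str] = []  # context carried into the next cycle
    for cycle in range(1, max_cycles + 1):
        rca = run_rca(history)              # step 1
        qualified = run_solutions(rca)      # steps 2-3
        if not qualified:
            return f"stopped in cycle {cycle}: no qualifying solution"
        change = implement(qualified)       # step 4
        if verify(change):                  # steps 5-6
            return f"resolved in cycle {cycle}"
        history.append(f"cycle {cycle}: {rca} -> {change} (unresolved)")
    return f"gave up after {max_cycles} cycles"


attempts = iter([False, True])  # first verification fails, second passes
result = orchestrate_fix(
    run_rca=lambda history: f"root cause (attempt {len(history) + 1})",
    run_solutions=lambda rca: ["add index"],
    implement=lambda solutions: "applied " + ", ".join(solutions),
    verify=lambda change: next(attempts),
)
print(result)  # resolved in cycle 2
```

In the real setup the loop lives in the orchestrator's prompt rather than in code; the sketch just makes the branch and the back-edge explicit.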

Appendix

.opencode/commands/orchestrate-fix.md

---
description: Diagnose an issue, evaluate fixes, apply the best safe fix, and verify resolution
agent: orchestrator
---

**PROTOCOL: RCA -> SOLUTIONS -> FIX -> VERIFY**

**Assigned Agents:**
- Analyst: `@general`
- Fixer: `@general`
- Verifier: `@general`

**User Instruction:** "$ARGUMENTS"

**EXECUTION GOAL:**
- Find the real root cause of the issue.
- Generate possible fixes.
- Apply only recommended solution(s) with both `Confidence >= 90%` and `Soundness >= 90%`.
- If no such solution exists, STOP and summarize the issue, root cause analysis, and potential solutions.
- If a qualifying solution is applied but the issue is not resolved, repeat the workflow from the root cause analysis phase.
- Load skills sequentially, one at a time, instead of exposing multiple skills in a single subagent call.

**EXECUTION STEPS:**

1. **Initialize Tracking**
   - Create these todo items:
     - `Analyze issue with RCA` as `in_progress`
     - `Generate solutions` as `pending`
     - `Implement approved fix` as `pending`
     - `Verify resolution` as `pending`

2. **Root Cause Analysis Phase**
   - Call `@general` as the Analyst.
   - Pass the user's instruction (`"$ARGUMENTS"`) and all relevant repository context discovered so far.
   - Directive:
     - "Load the `root-cause-analysis` skill first and execute it immediately. Investigate the issue described by the user. Gather evidence from the codebase, tests, logs, reproduction steps, and recent changes as needed. Produce a complete root cause analysis with: Problem Statement, Five Whys Analysis, Root Cause, Evidence, Immediate Fix, and Systemic Fix. Be explicit about what evidence supports the root cause."
   - Save the Analyst task/session so the same subagent can be resumed in later steps.
   - Capture the Analyst output as the authoritative RCA for this cycle.
   - Mark `Analyze issue with RCA` as `completed`.
   - Mark `Generate solutions` as `in_progress`.

3. **Solutions Phase**
   - Resume the same Analyst subagent from Step 2.
   - Directive:
     - "Load the `solutions` skill first and execute it immediately. Do not load any other skill in this step. Use the root cause analysis already in context as the basis for solution design. Generate multiple viable solutions with pros, cons, effort, confidence, risk, and soundness. Explicitly identify the recommended solution or recommended combination. Then identify which recommended solution(s), if any, satisfy BOTH of these gates: Confidence >= 90% and Soundness >= 90%. Return a section named `Qualified Solutions` listing only the solutions that pass both gates. If none qualify, state that clearly."
   - Capture the Analyst output for the solutions phase.
   - Mark `Generate solutions` as `completed`.

4. **Solution Gate**
   - Analyze the `Qualified Solutions` section from Step 3.
   - **IF** there are no qualified solutions:
     - Mark `Implement approved fix` as `cancelled`.
     - Mark `Verify resolution` as `cancelled`.
     - **STOP** and report all of the following back to the user:
       - the issue summary
       - the root cause analysis
       - the potential solutions considered
       - the reason no solution was applied
   - **IF** one or more qualified solutions exist:
     - Mark `Implement approved fix` as `in_progress`.
     - Proceed to Step 5.

5. **Fix Implementation Phase**
   - Call `@general` as the Fixer.
   - Pass:
     - the original user instruction
     - the complete RCA from Step 2
     - the complete solutions output from Step 3
     - only the qualified solution(s) approved by Step 4
   - Directive:
     - "Implement ONLY the qualified solution(s). Keep the scope tightly aligned to the identified root cause. Add or update tests when practical so the issue is covered against regression. Run relevant validation while implementing. At the end, report exactly what changed, what was verified, and any remaining uncertainty."
   - Capture the Fixer output.
   - Mark `Implement approved fix` as `completed`.
   - Mark `Verify resolution` as `in_progress`.

6. **Verification Phase**
   - Call `@general` as the Verifier.
   - Pass:
     - the original user instruction
     - the RCA from Step 2
     - the qualified solution(s)
     - the Fixer output, including changed files and validation already run
   - Directive:
     - "Verify whether the issue is actually resolved. Reproduce the original problem when possible, then run the relevant tests, checks, or runtime validation needed to prove the fix. Return a clear verdict section with exactly one of these statuses: `RESOLVED` or `UNRESOLVED`. Include the evidence supporting the verdict and identify any remaining failure precisely."
   - Capture the Verifier output.
   - Mark `Verify resolution` as `completed`.

7. **Resolution Decision**
   - **IF** the Verifier status is `RESOLVED`:
     - **STOP** and report:
       - the issue addressed
       - the root cause found
       - the qualified solution(s) applied
       - the verification evidence confirming resolution
   - **IF** the Verifier status is `UNRESOLVED`:
     - Create a new todo item for the next cycle such as `Retry RCA cycle after unresolved verification` as `in_progress`.
     - Reset the phase todos for the next cycle:
       - `Analyze issue with RCA` -> `in_progress`
       - `Generate solutions` -> `pending`
       - `Implement approved fix` -> `pending`
       - `Verify resolution` -> `pending`
     - Repeat from Step 2 using all previous cycle context, including:
       - the prior RCA
       - the prior solutions considered
       - the fix(es) attempted
       - the verification evidence showing why the issue remains unresolved
     - When starting the next cycle, create a fresh Analyst call for Step 2 and again load only one skill per step.

**RULES:**
- Do not skip the root cause analysis phase.
- Do not apply a fix based only on surface symptoms.
- Only implement solution(s) that are both recommended and meet `Confidence >= 90%` and `Soundness >= 90%`.
- If multiple qualified solutions are recommended and complementary, they may be implemented together.
- If verification disproves the prior root cause, treat that as new evidence and revisit the analysis honestly.
- Continue retrying until the issue is resolved or until a cycle produces no qualified solution.
- If execution is interrupted by tool or step limits, report the latest completed RCA, solutions analysis, attempted fixes, and verification state before stopping.
- Mention exactly one skill in each Analyst call.
- Do not mention both `root-cause-analysis` and `solutions` in the same Analyst directive.
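
The Solution Gate in the protocol above is a filter on three properties: recommended, Confidence >= 90%, and Soundness >= 90%. A minimal Python sketch of that filter follows. The `Solution` class and `qualified_solutions` function are hypothetical names; in the actual workflow the orchestrator reads these values from the Analyst's `Qualified Solutions` section rather than computing them. The sample scores are taken from the worked example in the solutions skill, and the `recommended` flags are an interpretation of its recommendation paragraph.

```python
from dataclasses import dataclass

GATE = 90  # percent; the protocol requires Confidence >= 90% and Soundness >= 90%


@dataclass
class Solution:
    name: str
    confidence: int  # percent
    soundness: int   # percent
    recommended: bool


def qualified_solutions(solutions: list[Solution]) -> list[Solution]:
    """Keep only recommended solutions that pass both gates."""
    return [
        s for s in solutions
        if s.recommended and s.confidence >= GATE and s.soundness >= GATE
    ]


candidates = [
    Solution("Add Database Index", confidence=95, soundness=90, recommended=True),
    Solution("Add Query Caching", confidence=70, soundness=55, recommended=False),
    Solution("Partition the Table", confidence=85, soundness=85, recommended=False),
]
passing = qualified_solutions(candidates)
# An empty list corresponds to the STOP branch of the Solution Gate step.
print([s.name for s in passing] or "STOP: no qualified solution")
```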

.opencode/skills/root-cause-analysis.md

---
name: root-cause-analysis
description: Systematic root cause analysis using the Five Whys technique and other diagnostic methods. Use when debugging issues, investigating failures, understanding why something went wrong, troubleshooting errors, or when the user explicitly asks for root cause analysis. Triggers on phrases like "why is this happening", "find the root cause", "debug this issue", "what's causing this", or investigation of recurring problems.
---

# Root Cause Analysis

Systematic methodology for identifying the fundamental cause of problems rather than treating symptoms.

## Automation

When this skill is loaded:
1. **DO NOT** ask clarifying questions about what problem to analyze
2. **IMMEDIATELY** begin the root cause analysis process
3. Use the problem from prior conversation context, or infer it from the provided arguments
4. If the problem is not clear from context, state that briefly and proceed with analysis based on available information
5. Always output the complete Root Cause Analysis format with Five Whys

## Core Process

### 1. Define the Problem

Before analysis, clearly articulate:
- **What** is the observable symptom/issue?
- **When** did it first occur or was it noticed?
- **Where** does it manifest (file, function, environment)?
- **Impact** - What is broken or degraded?

Document the problem statement in one clear sentence.

### 2. Gather Evidence

Collect relevant information:
- Error messages and stack traces
- Logs around the time of failure
- Recent changes (git log, deployments)
- Environment differences (works in X, fails in Y)
- Reproduction steps

### 3. Apply the Five Whys

The Five Whys technique uncovers root causes by repeatedly asking "Why?" until reaching the fundamental cause.

**Rules:**
- Ask "Why?" about 5 times (you may need more or fewer)
- Each answer must be factual, not speculative
- Stop when you reach something actionable that prevents recurrence
- Avoid blame—focus on systems and processes

**Template:**

```
Problem: [Clear problem statement]

Why #1: Why is [problem] happening?
→ Because [first-level cause]

Why #2: Why is [first-level cause] happening?
→ Because [second-level cause]

Why #3: Why is [second-level cause] happening?
→ Because [third-level cause]

Why #4: Why is [third-level cause] happening?
→ Because [fourth-level cause]

Why #5: Why is [fourth-level cause] happening?
→ Because [root cause]

Root Cause: [Fundamental cause that, if fixed, prevents recurrence]
```

**Example:**

```
Problem: API endpoint returns 500 error intermittently

Why #1: Why is the API returning 500?
→ Because the database query times out

Why #2: Why is the database query timing out?
→ Because the query scans millions of rows

Why #3: Why does the query scan millions of rows?
→ Because there's no index on the filtered column

Why #4: Why is there no index on that column?
→ Because the column was added recently without index migration

Why #5: Why was no index migration created?
→ Because there's no checklist requiring index review for new columns

Root Cause: Missing process for index review when adding columns
Immediate Fix: Add index to the column
Systemic Fix: Add index review to column addition checklist
```

### 4. Verify the Root Cause

Confirm the identified root cause by checking:
- [ ] Explains ALL observed symptoms
- [ ] Is actionable (can be fixed/prevented)
- [ ] Fixing it would prevent recurrence
- [ ] Is not itself a symptom of something deeper

If verification fails, continue asking "Why?"

### 5. Identify Corrective Actions

Two types of fixes:
- **Immediate**: Fix the current instance
- **Systemic**: Prevent future occurrences

## Common Root Cause Categories

When analyzing, consider these common categories:

| Category | Examples |
|----------|----------|
| **Process** | Missing code review, no deployment checklist |
| **Knowledge** | Undocumented behavior, tribal knowledge |
| **Design** | Missing validation, race condition, tight coupling |
| **Environment** | Config drift, resource exhaustion, version mismatch |
| **Dependencies** | Breaking API change, deprecated library |
| **Data** | Corrupt data, unexpected nulls, encoding issues |

## Anti-patterns to Avoid

- **Stopping too early**: "User error" is rarely the root cause; ask why the system allowed it
- **Blame**: "Developer made mistake" → Ask why the mistake wasn't caught
- **Vague causes**: "System failure" is not actionable—be specific
- **Multiple branches**: Focus on one causal chain at a time
- **Speculation**: Each "because" must be verified with evidence

## Output Format

Present findings as:

```markdown
## Root Cause Analysis

### Problem Statement
[One sentence describing the issue]

### Five Whys Analysis

1. **Why** [symptom]?
   → Because [cause 1]

2. **Why** [cause 1]?
   → Because [cause 2]

3. **Why** [cause 2]?
   → Because [cause 3]

4. **Why** [cause 3]?
   → Because [cause 4]

5. **Why** [cause 4]?
   → Because [root cause]

### Root Cause
[Clear statement of the fundamental cause]

### Evidence
- [Supporting evidence 1]
- [Supporting evidence 2]

### Recommended Actions

**Immediate Fix:**
- [Action to resolve current issue]

**Systemic Fix:**
- [Action to prevent recurrence]
```
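
The Five Whys layout above has a simple structure: each answer becomes the subject of the next question, and the last answer is the root cause. A small Python sketch of that chaining is shown below. `render_five_whys` is a hypothetical helper written for illustration, it covers only part of the output format, and the sample chain is adapted from the API example earlier in this skill.

```python
def render_five_whys(problem: str, causes: list[str]) -> str:
    """Render a causal chain in the numbered Why/Because layout."""
    if not causes:
        raise ValueError("at least one cause is required")
    lines = ["### Problem Statement", problem, "", "### Five Whys Analysis", ""]
    subject = problem
    for number, cause in enumerate(causes, start=1):
        lines.append(f"{number}. **Why** {subject}?")
        lines.append(f"   → Because {cause}")
        lines.append("")
        subject = cause  # the next "why" questions the cause just found
    lines += ["### Root Cause", causes[-1]]
    return "\n".join(lines)


report = render_five_whys(
    "the API returns 500 intermittently",
    [
        "the database query times out",
        "the query scans millions of rows",
        "there is no index on the filtered column",
        "the column was added without an index migration",
        "no checklist requires index review for new columns",
    ],
)
print(report)
```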

.opencode/skills/solutions.md

---
name: solutions
description: Generate and evaluate possible solutions to identified problems or goals. Use when a root cause has been identified and solutions are needed, when exploring approaches for a new feature or design decision, when the user asks for options or alternatives, when troubleshooting needs actionable fixes, or when comparing different approaches. Triggers on phrases like "what are my options", "how can I fix this", "possible solutions", "what should I do", "how should I implement this", or after root cause analysis.
---

# Solutions

Generate structured solution proposals with pros, cons, and confidence levels.

## Process

### 1. Restate the Goal

Begin by restating the context for the solutions:

- **Bug fixing** (after root cause analysis): Restate the identified root cause.
- **Feature/design work**: Restate the goal or objective that solutions are being generated for.

```
**Root Cause:** [Clear statement of the fundamental problem]
```
or
```
**Goal:** [Clear statement of what we want to achieve]
```

If bug fixing and no root cause has been established, note this and recommend root cause analysis first.

### 2. Generate Solutions

Identify multiple viable solutions. For each problem, aim for 2-5 distinct approaches:

- **Quick fixes** - Fast to implement, may not address systemic issues
- **Proper fixes** - Address the root cause directly
- **Preventive measures** - Systemic changes to prevent recurrence
- **Alternative approaches** - Different architectural or design solutions

### 3. Evaluate Each Solution

For every solution, provide:

| Element | Description |
|---------|-------------|
| **Name** | Short descriptive title |
| **Description** | What the solution does and how |
| **Pros** | Benefits and advantages |
| **Cons** | Drawbacks, risks, tradeoffs |
| **Effort** | Low / Medium / High |
| **Confidence** | 0-100% likelihood of success |
| **Risk** | 0-100% (inverse of Safeness): higher = riskier to implement or deploy |
| **Soundness** | 0-100% (aka Properness) — higher = more architecturally sound; lower = more hacky |

## Confidence Level Guidelines

| Range | Meaning |
|-------|---------|
| **90-100%** | Proven solution, high certainty it resolves the issue |
| **70-89%** | Strong evidence this will work, minor unknowns |
| **50-69%** | Reasonable approach, some uncertainty or dependencies |
| **30-49%** | Experimental, significant unknowns or risks |
| **0-29%** | Speculative, low evidence, last resort |

Base confidence on:
- Prior experience with similar issues
- Evidence from the codebase
- Complexity and risk factors
- Dependencies on external factors

## Risk Level Guidelines (inverse of Safeness)

| Range | Meaning |
|-------|---------|
| **0-20%** | Virtually no risk — safe, isolated, well-understood change |
| **21-40%** | Low risk — minor side effects possible, easily reversible |
| **41-60%** | Moderate risk — affects multiple components, requires careful testing |
| **61-80%** | High risk — broad impact, potential for regressions or downtime |
| **81-100%** | Critical risk — dangerous to deploy, may cause data loss or outages |

### Branch vs Main Context (Critical Risk Factor)

Before scoring risk, determine whether the code being changed exists in `main` or only in the current branch. Run `git log main..HEAD -- <file>` or `git diff main -- <file>` to check.

- **Code only in this branch (not in main):** Risk is inherently low. This code is not deployed, not depended on by others, and can be freely rewritten or deleted without breaking anything. Heavily discount the risk score — changing branch-only code should rarely exceed 20% risk unless it affects shared interfaces that main-branch code will soon depend on.
- **Code already in main:** Apply the full risk assessment below. This code may be in production, depended on by other teams or systems, and changes carry real regression potential.

When a solution touches a mix of both, weight the risk toward the main-branch portions. Branch-only code changes should not inflate the overall risk score.

Base risk on:
- **Whether the code is in main vs branch-only** (most important factor — branch-only code is nearly free to change)
- Blast radius of the change
- Reversibility (can it be rolled back easily?)
- Impact on production data or users
- Number of systems or components affected
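
The `git log main..HEAD -- <file>` check described under Branch vs Main Context can be scripted. The following Python sketch is one way to do it; `git_output`, `branch_commits`, and the path `src/retry.py` are hypothetical names for illustration. The runner is injectable so the demo works without a repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def git_output(args: Sequence[str]) -> str:
    """Run a git command and return its stdout (requires git on PATH)."""
    completed = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return completed.stdout


def branch_commits(path: str, base: str = "main", run: Runner = git_output) -> list[str]:
    """Return one-line summaries of commits on HEAD, but not on `base`, that touch `path`."""
    out = run(["log", "--oneline", f"{base}..HEAD", "--", path])
    return [line for line in out.splitlines() if line.strip()]


# Demo with a canned runner so the sketch runs anywhere;
# drop the `run=` argument to query the real repository.
fake_log = lambda args: "abc123 add retry helper\n"
print(branch_commits("src/retry.py", run=fake_log))
```

A non-empty result only shows that the branch has commits touching the file. It does not by itself prove the file is absent from `main`, so treat it as one input to the risk score rather than a verdict.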

## Soundness Level Guidelines (aka Properness)

| Range | Meaning |
|-------|---------|
| **0-20%** | Hack — duct tape fix, technical debt, likely needs revisiting |
| **21-40%** | Workaround — gets the job done but cuts corners |
| **41-60%** | Pragmatic — reasonable tradeoff between speed and quality |
| **61-80%** | Solid — follows good practices, maintainable and clean |
| **81-100%** | Textbook — architecturally sound, follows best patterns, future-proof |

Base soundness on:
- Adherence to existing architecture and patterns
- Maintainability and readability
- Whether it introduces technical debt
- Alignment with established best practices
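
To make the soundness bands concrete, here is a short Python lookup that maps a score to the labels in the table above. `soundness_label` is a hypothetical helper, and treating scores as whole-number percentages is an assumption of this sketch.

```python
# (upper bound, label) pairs copied from the Soundness Level Guidelines table.
SOUNDNESS_BANDS = [
    (20, "Hack"),
    (40, "Workaround"),
    (60, "Pragmatic"),
    (80, "Solid"),
    (100, "Textbook"),
]


def soundness_label(score: int) -> str:
    """Return the band label for a soundness score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for upper, label in SOUNDNESS_BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: bands cover 0-100")


for score in (15, 55, 90):
    print(score, soundness_label(score))
```

A gate of `Soundness >= 90%` therefore admits only solutions in the upper part of the Textbook band.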

## Output Format

```markdown
## Solutions Analysis

**Root Cause:** [Statement of the identified root cause]
— or —
**Goal:** [Statement of what we want to achieve]

---

### Solution 1: [Name]

**Description:** [What this solution does]

**Pros:**
- [Benefit 1]
- [Benefit 2]

**Cons:**
- [Drawback 1]
- [Drawback 2]

**Effort:** [Low/Medium/High]
**Confidence:** [X]%
**Risk:** [X]%
**Soundness:** [X]%

---

### Solution 2: [Name]

**Description:** [What this solution does]

**Pros:**
- [Benefit 1]
- [Benefit 2]

**Cons:**
- [Drawback 1]
- [Drawback 2]

**Effort:** [Low/Medium/High]
**Confidence:** [X]%
**Risk:** [X]%
**Soundness:** [X]%

---

### Recommendation

[Which solution to pursue and why, or suggested combination of solutions]

### Solutions Summary

| # | Solution | Effort | Confidence | Risk | Soundness |
|---|----------|--------|------------|------|-----------|
| 1 | [Name] | [Low/Medium/High] | [X]% | [X]% | [X]% |
| 2 | [Name] | [Low/Medium/High] | [X]% | [X]% | [X]% |
```

## Example

```markdown
## Solutions Analysis

**Root Cause:** Database query times out because the `user_activity` table lacks an index on the `created_at` column, causing full table scans on date-range queries.

---

### Solution 1: Add Database Index

**Description:** Create an index on `user_activity.created_at` to optimize date-range queries.

**Pros:**
- Directly addresses root cause
- Minimal code changes required
- Immediate performance improvement

**Cons:**
- Index creation may lock table briefly
- Slightly increases write overhead
- Requires migration deployment

**Effort:** Low
**Confidence:** 95%
**Risk:** 15%
**Soundness:** 90%

---

### Solution 2: Add Query Caching

**Description:** Implement Redis caching for frequently-run activity queries with 5-minute TTL.

**Pros:**
- Reduces database load significantly
- Improves response times for cached queries
- Scales better under high traffic

**Cons:**
- Does not fix underlying query performance
- Adds infrastructure complexity
- Cache invalidation challenges
- Stale data within TTL window

**Effort:** Medium
**Confidence:** 70%
**Risk:** 35%
**Soundness:** 55%

---

### Solution 3: Partition the Table

**Description:** Partition `user_activity` table by month to limit scan scope.

**Pros:**
- Excellent for very large tables
- Enables efficient data archival
- Query performance scales with partition size

**Cons:**
- Complex migration required
- Application query changes may be needed
- Overkill for current data volume

**Effort:** High
**Confidence:** 85%
**Risk:** 70%
**Soundness:** 85%

---

### Recommendation

Implement **Solution 1 (Add Database Index)** as the immediate fix—it directly resolves the root cause with minimal effort and highest confidence. Consider **Solution 2 (Query Caching)** as a follow-up optimization if query volume remains high after indexing.

### Solutions Summary

| # | Solution | Effort | Confidence | Risk | Soundness |
|---|----------|--------|------------|------|-----------|
| 1 | Add Database Index | Low | 95% | 15% | 90% |
| 2 | Add Query Caching | Medium | 70% | 35% | 55% |
| 3 | Partition the Table | High | 85% | 70% | 85% |
```

Author

franz-see commented Mar 28, 2026

For Claude Code users, there's no way to create a new primary agent. But you can use output styles as a workaround to override the system instructions - https://code.claude.com/docs/en/output-styles
