Skip to content

Instantly share code, notes, and snippets.

@clsandoval
Created January 16, 2026 08:42
Show Gist options
  • Select an option

  • Save clsandoval/96d32ede467d365f2f6b660fb74320bc to your computer and use it in GitHub Desktop.

Select an option

Save clsandoval/96d32ede467d365f2f6b660fb74320bc to your computer and use it in GitHub Desktop.
Decision AI Competitive Analysis v3 - Updated with Multi-Platform Strategy (KEY USP)

Competitor Overview

Comprehensive analysis of 26 competitor repositories analyzed for Decision AI product positioning


Executive Summary

This document provides a structured overview of all competitors analyzed during our research phase. Each competitor is categorized by market segment, with detailed profiles including value propositions, target audiences, key features, and user journey diagrams.

Key Finding: The market is fragmented across multiple niches. No single competitor addresses our full vision of unified context across platforms + modular AI sessions (deployable Claude instances) + trust-focused data science. This represents our blue ocean opportunity.


CURRENT STATE vs FUTURE VISION

CRITICAL: This section clearly distinguishes what EXISTS today versus what is PLANNED for the future.

What EXISTS Today (January 2025)

Component Status Description
Discord Bot IMPLEMENTED Primary user interface
Workflow Executor IMPLEMENTED Claude API with workflow_tools
Fly.io Deployment IMPLEMENTED Dynamic machine creation via fly_app_tools
ACP Protocol IMPLEMENTED Inter-session communication via SSE
Session Templates IMPLEMENTED Supabase database records
Builder Claude IMPLEMENTED Containerization service with meta-skills

What is PLANNED (Future Vision)

Component Status Description
Decision Packs PLANNED GitHub repos as deployable units with pack.yaml manifests
Pack Registry PLANNED Searchable index of available packs
Pack Marketplace PLANNED Web UI for discovery and deployment
Voice Sessions PLANNED Hands-free Discord voice interaction
Memory Layer PLANNED Persistent cross-session memory

Multi-Platform Frontend Strategy: KEY USP

CRITICAL DIFFERENTIATOR: Decision AI is designed from the ground up for multi-platform deployment. While most competitors are locked to a single interface (Slack-only, Discord-only, Web-only), our architecture wraps the Claude Agent SDK in a standardized protocol layer that enables platform-agnostic operation.

The Protocol Layer Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    MULTI-PLATFORM STRATEGY (KEY USP)                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚                    PROTOCOL LAYER                                    β”‚  β”‚
β”‚   β”‚         (Standardized Contract wrapping Claude Agent SDK)           β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                              β”‚                                              β”‚
β”‚           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                          β”‚
β”‚           β”‚                  β”‚                  β”‚                          β”‚
β”‚           β–Ό                  β–Ό                  β–Ό                          β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”‚
β”‚   β”‚ DISCORD       β”‚  β”‚ SLACK         β”‚  β”‚ WEB/MOBILE    β”‚                 β”‚
β”‚   β”‚ ADAPTER       β”‚  β”‚ ADAPTER       β”‚  β”‚ ADAPTER       β”‚                 β”‚
β”‚   β”‚ (CURRENT)     β”‚  β”‚ (PLANNED)     β”‚  β”‚ (PLANNED)     β”‚                 β”‚
β”‚   β”‚               β”‚  β”‚               β”‚  β”‚               β”‚                 β”‚
β”‚   β”‚ β€’ Text msgs   β”‚  β”‚ β€’ Slack API   β”‚  β”‚ β€’ REST/WS API β”‚                 β”‚
β”‚   β”‚ β€’ Voice chans β”‚  β”‚ β€’ Thread sync β”‚  β”‚ β€’ React/Nativeβ”‚                 β”‚
β”‚   β”‚ β€’ Threads     β”‚  β”‚ β€’ Slash cmds  β”‚  β”‚ β€’ OAuth flows β”‚                 β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
β”‚           β”‚                  β”‚                  β”‚                          β”‚
β”‚           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                          β”‚
β”‚                              β”‚                                              β”‚
β”‚                              β–Ό                                              β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚                    DECISION ORCHESTRATOR                             β”‚  β”‚
β”‚   β”‚    (Same Claude sessions, same tools, same Decision Packs)          β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                             β”‚
β”‚   KEY INSIGHT: The intelligence layer is completely decoupled from          β”‚
β”‚   the interface layer. Add new platforms without changing core logic.       β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Competitive Comparison: Platform Support

Competitor Discord Slack Web Mobile API Architecture
Decision AI βœ… Current πŸ”œ Planned πŸ”œ Planned πŸ”œ Planned βœ… ACP Platform-agnostic protocol
Dust.tt ❌ βœ… Only ❌ ❌ Limited Slack-coupled
Clawdbot βœ… βœ… ❌ ❌ ❌ Desktop-first, adapters
Runbear ❌ βœ… ❌ ❌ ❌ Slack/Teams only
ChatGPT ❌ ❌ βœ… βœ… βœ… Web/Mobile-coupled
Claude.ai ❌ βœ… Enterprise βœ… βœ… βœ… Native integrations

Why This Matters

Audience Their Preferred Platform Decision AI's Approach
Enterprise teams Slack (existing workflows) Future Slack adapter with same Decision Packs
Developer communities Discord (already there) Current Discord implementation
Consumers Mobile apps Future iOS/Android with same session capabilities
API users Direct integration ACP protocol already supports this
Data scientists Web notebooks Future web UI with same underlying sessions

Protocol Layer Components

The standardized contract wrapping Claude Agent SDK includes:

  1. Message Normalization: All platforms send/receive in a common format
  2. Session Abstraction: Sessions are platform-agnostic (same session accessible from Discord OR Slack)
  3. Context Preservation: Conversation state persists across platform switches
  4. Tool Mapping: Platform-specific UI elements (buttons, threads, reactions) map to common tool calls
  5. Authentication Bridge: Platform OAuth flows map to unified user identities

What This Enables (Future Vision)

User starts conversation on Discord voice during commute
        β”‚
        β–Ό
Switches to Slack when arriving at office (same session continues)
        β”‚
        β–Ό
Opens web dashboard for detailed data exploration (same context)
        β”‚
        β–Ό
Reviews insights on mobile app during meeting (same Decision Pack)

This is a BLUE OCEAN opportunity. No competitor offers this level of platform flexibility with intelligent AI sessions.


Builder as Claude Factory: Our Key Differentiator

What makes Decision AI unique: Builder Claude doesn't just containerize codeβ€”it constructs entire intelligent environments.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    BUILDER AS CLAUDE FACTORY (CURRENT)                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   INPUT: User's code repository                                              β”‚
β”‚          (any framework: Marimo, Streamlit, FastAPI, etc.)                  β”‚
β”‚                                                                             β”‚
β”‚   BUILDER CLAUDE CONSTRUCTS:                                                 β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚  1. Docker image with user's code                                    β”‚   β”‚
β”‚   β”‚  2. .claude/ directory with:                                         β”‚   β”‚
β”‚   β”‚     β”œβ”€β”€ CLAUDE.md (execution rules + purpose)                        β”‚   β”‚
β”‚   β”‚     └── skills/ (generated from repo analysis)                       β”‚   β”‚
β”‚   β”‚  3. ACP server for communication                                     β”‚   β”‚
β”‚   β”‚  4. GitHub session repo (source of truth)                            β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚   OUTPUT: Complete Claude Code environment deployed on Fly.io               β”‚
β”‚                                                                             β”‚
β”‚   KEY INSIGHT: Each build = complete Claude Code environment                β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

User Intent Priority (Decision Flow)

Does repo have .claude/?
β”œβ”€β”€ YES β†’ Inherit/merge (user customizations WIN)
β”‚         - Preserve existing skills, hooks, CLAUDE.md
β”‚         - Merge base execution rules
β”‚         - Add missing infrastructure skills
β”‚
└── NO β†’ Did user specify skill preferences?
    β”œβ”€β”€ YES β†’ Follow their guidance exactly
    β”‚
    └── NO β†’ Generate from scratch using meta-skills:
             a. Analyze repo (dependencies, code patterns, purpose)
             b. Detect framework (Marimo, Streamlit, FastAPI, etc.)
             c. Generate domain-specific skills
             d. Create CLAUDE.md with execution rules

Git as Source of Truth

Each session gets its own GitHub repository. This replaces container-as-artifact with git as the unit of reproducibility.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    GIT AS SOURCE OF TRUTH (CURRENT)                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Session Repo Pattern:                                                      β”‚
β”‚   β€’ New GitHub repo: github.com/org/session-mmm-{hex}                       β”‚
β”‚   β€’ All changes tracked: git add -A && git commit after work                β”‚
β”‚   β€’ Versioning via tags: git tag "template/my-analysis-v1"                  β”‚
β”‚   β€’ Full history: Browsable on GitHub, diffable                             β”‚
β”‚                                                                             β”‚
β”‚   Benefits:                                                                  β”‚
β”‚   β€’ Transparency: Readable source code, not opaque binary images            β”‚
β”‚   β€’ Reproducibility: git clone --branch tag = exact state                   β”‚
β”‚   β€’ Shareability: Link to GitHub repo = shareable, forkable                 β”‚
β”‚   β€’ Auditable: Every change logged with timestamps, diffs                   β”‚
β”‚                                                                             β”‚
β”‚   Build Result Format:                                                       β”‚
β”‚   {                                                                          β”‚
β”‚     "status": "complete",                                                   β”‚
β”‚     "app_name": "template-my-thing",                                        β”‚
β”‚     "image_ref": "registry.fly.io/template-my-thing:v1",                    β”‚
β”‚     "git_repo": "github.com/org/template-my-thing",                         β”‚
β”‚     "git_ref": "snapshot/v1"                                                β”‚
β”‚   }                                                                          β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Current Decision Orchestrator Architecture

What ACTUALLY exists today:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    DECISION ORCHESTRATOR - CURRENT STATE                     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Discord Bot (Primary Interface)                                            β”‚
β”‚   └── User sends message                                                     β”‚
β”‚       └── Workflow Executor runs with Claude API                             β”‚
β”‚           └── Uses workflow_tools:                                           β”‚
β”‚               β”œβ”€β”€ CUSTOM_FLY_LAUNCH_BUILDER (spawn builder session)         β”‚
β”‚               β”œβ”€β”€ CUSTOM_FLY_LAUNCH_SESSION (launch from template)          β”‚
β”‚               β”œβ”€β”€ CUSTOM_ACP_SEND_MESSAGE (talk to sessions)                β”‚
β”‚               β”œβ”€β”€ CUSTOM_FLY_STOP_SESSION (destroy session)                 β”‚
β”‚               β”œβ”€β”€ CUSTOM_FLY_GET_SESSION_STATUS (check health)              β”‚
β”‚               β”œβ”€β”€ CUSTOM_FLY_LIST_SESSIONS (inventory)                      β”‚
β”‚               β”œβ”€β”€ CUSTOM_FLY_LIST_TEMPLATES (available templates)           β”‚
β”‚               β”œβ”€β”€ CUSTOM_FLY_SAVE_TEMPLATE (persist as reusable)            β”‚
β”‚               └── human_interaction_tools (wait_for_human_decision)         β”‚
β”‚                                                                             β”‚
β”‚   Builder Sessions (Ephemeral)                                               β”‚
β”‚   └── Name pattern: mmm-builder-{hex}                                        β”‚
β”‚       └── Lifetime: ~5-30 minutes                                            β”‚
β”‚       └── Purpose: Clone, analyze, generate .claude/, build, deploy          β”‚
β”‚       └── Death: After deployment (cleanup via CUSTOM_FLY_STOP_SESSION)     β”‚
β”‚                                                                             β”‚
β”‚   User Work Sessions (Persistent)                                            β”‚
β”‚   └── Name pattern: mmm-{hex}                                                β”‚
β”‚       └── Lifetime: User determines (hours to days)                          β”‚
β”‚       └── Purpose: Interactive analysis, experimentation                     β”‚
β”‚       └── Environment: Framework runtime + Claude Agent SDK/ACP             β”‚
β”‚                                                                             β”‚
β”‚   Templates (Supabase: session_templates table)                              β”‚
β”‚   └── Columns: name, system_prompt, tools_config, mcp_config, metadata      β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Decision AI-Specific Tool Inventory (CURRENT)

Builder Lifecycle Tools

Tool Purpose Returns
CUSTOM_FLY_LAUNCH_BUILDER Spawn ephemeral builder session app_name, acp_url, build_id
CUSTOM_ACP_SEND_MESSAGE Send instructions to builder/session Claude's response
CUSTOM_FLY_STOP_SESSION Destroy session success/error

Session Lifecycle Tools

Tool Purpose Returns
CUSTOM_FLY_LAUNCH_SESSION Launch from pre-saved template app_name, urls, session_repo
CUSTOM_FLY_GET_SESSION_STATUS Check session health status, urls, session_repo
CUSTOM_FLY_LIST_SESSIONS Inventory active sessions session list

Template Management Tools

Tool Purpose Returns
CUSTOM_FLY_LIST_TEMPLATES Show available templates template list
CUSTOM_FLY_SAVE_TEMPLATE Persist build as reusable template success + metadata
CUSTOM_FLY_DELETE_TEMPLATE Remove saved template success/error

What These Tools DON'T Do:

  • No predefined Dockerfile schemasβ€”Builder Claude generates them
  • No rigid skill parametersβ€”Claude-in-Builder decides what to use
  • No forced framework choicesβ€”Claude analyzes and detects

Market Landscape Map

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                              COMPETITIVE LANDSCAPE 2025                                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                                              β”‚
β”‚  ORCHESTRATION & FRAMEWORKS                    OBSERVABILITY & EVALUATION                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚
β”‚  β”‚   CrewAI     β”‚  β”‚  LangGraph   β”‚           β”‚   Langfuse   β”‚  β”‚  Braintrust  β”‚           β”‚
β”‚  β”‚  Multi-Agent β”‚  β”‚  Stateful    β”‚           β”‚  LLM Tracing β”‚  β”‚   AI Evals   β”‚           β”‚
β”‚  β”‚  Tool Mgmt   β”‚  β”‚  Workflows   β”‚           β”‚  & Prompts   β”‚  β”‚  Regression  β”‚           β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
β”‚                                                                                              β”‚
β”‚  PLATFORMS & BUILDERS                          MEMORY & KNOWLEDGE                           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚
β”‚  β”‚   LLMStack   β”‚  β”‚  VoltAgent   β”‚           β”‚  ChatMemory  β”‚  β”‚    Glean     β”‚           β”‚
β”‚  β”‚   No-Code    β”‚  β”‚  TypeScript  β”‚           β”‚  4-Tier Mem  β”‚  β”‚  Enterprise  β”‚           β”‚
β”‚  β”‚   Builder    β”‚  β”‚  Full-Stack  β”‚           β”‚  Hierarchy   β”‚  β”‚  Knowledge   β”‚           β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
β”‚                                                                                              β”‚
β”‚  CHAT & MULTI-PLATFORM                         MARKETING ANALYTICS (MMM)                    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚
β”‚  β”‚   Dust.tt    β”‚  β”‚   Clawdbot   β”‚           β”‚   PyMC-      β”‚  β”‚   Google     β”‚           β”‚
β”‚  β”‚  Slack AI    β”‚  β”‚  8-Platform  β”‚           β”‚  Marketing   β”‚  β”‚   LWMMM      β”‚           β”‚
β”‚  β”‚  Assistants  β”‚  β”‚  Personal AI β”‚           β”‚  Bayesian    β”‚  β”‚              β”‚           β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
β”‚                                                                                              β”‚
β”‚  TEMPLATE & DEPLOYMENT PLATFORMS (Inspiration for Future Pack System)                        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                     β”‚
β”‚  β”‚   Replit     β”‚  β”‚   Railway    β”‚  β”‚   Render     β”‚  β”‚   Vercel     β”‚                     β”‚
β”‚  β”‚  Templates   β”‚  β”‚  Templates   β”‚  β”‚  Blueprints  β”‚  β”‚  Templates   β”‚                     β”‚
β”‚  β”‚  Full Proj   β”‚  β”‚  One-Click   β”‚  β”‚  IaC Deploy  β”‚  β”‚  Frontend    β”‚                     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β”‚
β”‚                                                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Gap Analysis: Our Opportunity

What NO Competitor Does

Gap Description Our Approach
Multi-Platform Protocol No one has a platform-agnostic architecture with standardized contract Protocol layer wrapping Claude Agent SDK enables any frontend
Builder as Claude Factory No one generates complete Claude environments from repo analysis Meta-skills construct .claude/ dynamically
Git as Source of Truth Competitors use opaque container images Session repos track all changes via git
Unified Context No one offers context continuity across Discord/Slack/Teams/CLI Interface Primitives + Shared Memory (PLANNED)
Domain-Specific Evals No one has insight recovery benchmarks for analytics MMM Insight Recovery Experiments
Trust-First Data Science No one combines Bayesian causal + LLM + governance Trust Differentiators
ACP Protocol No standard for inter-Claude communication Our implemented protocol

What We Should Adopt

Pattern From Why
ToolCollection CrewAI Best-in-class tool management
Thread-as-Boundary Dust.tt Essential for chat context
Statistical Evals Braintrust Right approach to AI testing
4-Tier Memory ChatMemory Complete hierarchy
Manifest Format Awesome Skills Proven skill structure
Streaming Progress Replit Great deploy UX
Bayesian Foundation PyMC-Marketing Trust through uncertainty

Architectural Philosophy: Embodied vs Puppeteer

Repo2Run Pattern (Puppeteer)

🧠 (External LLM) ─────► πŸ“¦ (Dumb container)
- LLM remote-controls container
- Container has no intelligence
- Intelligence only during build time
- After build: container is static code

Our Approach (Embodied)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 🧠 (Claude INSIDE)   β”‚
β”‚ πŸ“¦ (Container/body)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
- Claude inhabits the container
- Container is Claude's body
- Intelligence at runtime
- Interactive collaboration with user

Trade-offs

Aspect Repo2Run Decision AI
External rollback Excellent Requires orchestrator
Deterministic outputs Yes No (but adaptive)
Runtime adaptation No Yes
Domain expertise None Skills loaded in session
User collaboration None Interactive
Multi-repo composition Hard Flexible merging
Framework support Python-only Framework-agnostic

Complete Competitor List

# Competitor Category What They Do (One-Liner)
1 CrewAI Framework Multi-agent orchestration with role-based collaboration and tool collections
2 LangGraph Framework Stateful graph-based workflows for LLM applications
3 Swarm Framework Lightweight multi-agent handoffs (educational, by OpenAI)
4 Claude-Flow Framework Enterprise multi-agent swarms with neural learning and MCP
5 AutoGen Framework Microsoft's multi-agent conversation framework
6 Pydantic-AI Framework Type-safe Python agents with structured outputs
7 VoltAgent Platform TypeScript full-stack agent framework with VoltOps observability
8 LLMStack Platform No-code visual builder for AI agents and workflows
9 BotSharp Platform .NET/C# agent framework with plugin architecture
10 Composio SDK 500+ app integrations for AI agents
11 Langfuse Observability LLM tracing, prompt management, and evaluation
12 Braintrust Observability Statistical AI evaluation with regression detection
13 AgentOps Observability Agent session replay and cost tracking
14 ChatMemory Memory 4-tier hierarchical memory for AI assistants
15 Glean Knowledge Enterprise permission-aware knowledge search
16 Dust.tt Chat Thread-aware Slack AI assistants
17 Clawdbot Chat 8-platform personal AI assistant (desktop)
18 KIRA Chat Privacy-first desktop AI coworker
19 Runbear Chat Tiered Slack/Teams bot platform
20 Awesome Claude Skills Skills Open-source skill manifest patterns
21 PyMC-Marketing MMM Bayesian causal marketing mix modeling
22 Meta Robyn MMM Automated MMM with Pareto optimization
23 Replit Templates Templates Full project templates with instant deployment
24 Railway Templates Templates One-click deployable app templates
25 Render Blueprints Templates Infrastructure-as-code deployment templates
26 Vercel Templates Templates Frontend/fullstack starter templates

Master Comparison Table

Competitor Category Primary Language Deploy Model Key Differentiator Pricing
CrewAI Framework Python Library Role-based multi-agent + ToolCollection OSS
LangGraph Framework Python Library Stateful graphs + checkpointing OSS + Cloud
Swarm Framework Python Library Minimal primitives (educational) OSS
Claude-Flow Framework TypeScript Enterprise 54+ agents + neural learning OSS
AutoGen Framework Python Library Conversational multi-agent OSS
Pydantic-AI Framework Python Library Type safety + structured outputs OSS
VoltAgent Platform TypeScript Hybrid Full-stack + VoltOps console OSS + Cloud
LLMStack Platform Python Self-hosted No-code visual builder OSS + Cloud
BotSharp Platform C# Enterprise .NET ecosystem + plugins OSS
Composio SDK TypeScript Multi-framework 500+ integrations Freemium
Langfuse Observability TypeScript Self-hosted Tracing + prompt management OSS + Cloud
Braintrust Observability Python Cloud Statistical evals + regression Freemium
AgentOps Observability Python Cloud Session replay + cost tracking Freemium
ChatMemory Memory Python Library 4-tier hierarchy + pgvector OSS
Glean Knowledge - Enterprise Permission-aware search Enterprise
Dust.tt Chat - Cloud Thread-aware Slack AI Tiered
KIRA Chat Python Desktop Privacy-first, local-only OSS
Clawdbot Chat TypeScript Desktop 8-platform personal AI OSS
Runbear Chat - Cloud Tiered bot platform Tiered
Awesome Skills Skills - - Manifest format pattern OSS
PyMC-Marketing MMM Python Library Bayesian causal inference OSS
Robyn MMM R Library Automated Pareto optimization OSS
LightweightMMM MMM Python Library Google's Bayesian MMM OSS
Nielsen MMM - Service Industry standard Enterprise
Replit Agent Deploy - Cloud Zero-friction deploy Freemium
Hex AI Artifacts - Cloud Professional notebooks Tiered
v0.dev Artifacts - Cloud AI-generated UI preview Freemium

Document generated for Decision AI competitive analysis - January 2025 26 competitors analyzed across 9 categories Updated to reflect actual current state + Builder as Claude Factory architecture

Session Templates & Future Pack Architecture

This document describes the CURRENT template system in Decision Orchestrator and the FUTURE VISION for "Decision Packs" as deployable repositories.


CRITICAL: Current State vs. Future Vision

Aspect CURRENT (Implemented) FUTURE (Planned)
Storage Supabase database records GitHub repositories
Format JSON in database columns pack.yaml manifests + code
Deployment Dynamic Fly.io machines Pre-built container images
Discovery Query by name Trigger-based matching
Versioning metadata.version field Git tags
Status IMPLEMENTED VISION - NOT YET BUILT

CURRENT STATE: Session Templates (IMPLEMENTED)

Template Structure

-- Current schema (from supabase/migrations/)
CREATE TABLE session_templates (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT UNIQUE NOT NULL,
    system_prompt TEXT NOT NULL,
    tools_config JSONB DEFAULT '{}',
    mcp_config JSONB DEFAULT '{}',
    metadata JSONB DEFAULT '{}',
    is_default BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);

How Templates Work Today

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    CURRENT TEMPLATE SYSTEM (IMPLEMENTED)                     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Supabase: session_templates table                                          β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚ id          β”‚ UUID (primary key)                                     β”‚   β”‚
β”‚   β”‚ name        β”‚ "marketing-analyst"                                    β”‚   β”‚
β”‚   β”‚ system_promptβ”‚ "You are a marketing analysis assistant..."          β”‚   β”‚
β”‚   β”‚ tools_configβ”‚ {"allowed_tools": ["search", "calculate"]}            β”‚   β”‚
β”‚   β”‚ mcp_config  β”‚ {"servers": [...]}                                     β”‚   β”‚
β”‚   β”‚ metadata    β”‚ {"version": "1.0", "author": "team"}                  β”‚   β”‚
β”‚   β”‚ is_default  β”‚ false                                                  β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚   How it's used:                                                             β”‚
β”‚   1. User requests session via Discord                                       β”‚
β”‚   2. Workflow executor queries session_templates by name                     β”‚
β”‚   3. System prompt and configs are loaded                                   β”‚
β”‚   4. Fly.io machine is created with this configuration                       β”‚
β”‚   5. Claude instance runs with the template's instructions                   β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example Template Record

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "marketing-analyst",
  "system_prompt": "You are a marketing analytics assistant specializing in budget optimization and ROI analysis...",
  "tools_config": {
    "allowed_tools": ["search_web", "calculate", "create_chart"],
    "restrictions": []
  },
  "mcp_config": {
    "servers": []
  },
  "metadata": {
    "version": "1.0",
    "category": "analytics",
    "author": "pymc-labs"
  },
  "is_default": false
}

CURRENT STATE: Builder Claude's Role (IMPLEMENTED)

Builder Claude doesn't just use templatesβ€”it generates complete Claude environments from repository analysis.

Builder as Claude Factory

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    BUILDER AS CLAUDE FACTORY (CURRENT)                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   When Builder receives a repo URL:                                          β”‚
β”‚                                                                             β”‚
β”‚   1. CLONE repository to /workspace/app                                      β”‚
β”‚                                                                             β”‚
β”‚   2. ANALYZE the repo:                                                       β”‚
β”‚      β”œβ”€β”€ Detect framework (Marimo, Streamlit, FastAPI, etc.)                β”‚
β”‚      β”œβ”€β”€ Check for existing .claude/ directory                              β”‚
β”‚      └── Identify dependencies and purpose                                  β”‚
β”‚                                                                             β”‚
β”‚   3. CONSTRUCT production .claude/ using META-SKILLS:                        β”‚
β”‚      β”œβ”€β”€ claude-factory.md (master skill)                                   β”‚
β”‚      β”œβ”€β”€ skill-creation.md (generate domain-specific skills)                β”‚
β”‚      β”œβ”€β”€ claude-md-templates.md (CLAUDE.md generation)                      β”‚
β”‚      └── merge-strategy.md (if repo has existing .claude/)                  β”‚
β”‚                                                                             β”‚
β”‚   4. GENERATE Dockerfile using framework skills:                             β”‚
β”‚      └── dockerfile-gen.md (framework-specific patterns)                    β”‚
β”‚                                                                             β”‚
β”‚   5. SET UP ACP server (always required)                                     β”‚
β”‚                                                                             β”‚
β”‚   6. DEPLOY via Fly: fly deploy --remote-only                               β”‚
β”‚                                                                             β”‚
β”‚   7. CREATE GitHub session repo as source of truth                           β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Meta-Skills (What Builder Uses)

Meta-Skill Purpose Location
claude-factory Orchestrate entire .claude/ construction .claude/skills/meta/
skill-creation Generate new skills from repo analysis .claude/skills/meta/
claude-md-templates Templates for CLAUDE.md generation .claude/skills/meta/
merge-strategy Combine repo's .claude/ with base .claude/skills/meta/
dockerfile-gen Framework-specific Dockerfile patterns .claude/skills/building/

Production Image Structure (What Builder Creates)

production-image/
β”œβ”€β”€ app/                    # User's code
β”œβ”€β”€ acp-server/             # Claude Agent SDK ACP server (REQUIRED)
β”‚   β”œβ”€β”€ server.py
β”‚   β”œβ”€β”€ tools/
β”‚   └── pyproject.toml
└── .claude/
    β”œβ”€β”€ CLAUDE.md           # Execution rules + purpose (REQUIRED)
    └── skills/             # Generated domain-specific skills (optional)

Git as Source of Truth (CURRENT)

Each session gets its own GitHub repository. This is IMPLEMENTED, not future vision.

Session Repo Pattern

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    GIT AS SOURCE OF TRUTH (IMPLEMENTED)                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   For each snapshot/template, Builder CREATES a new GitHub repo:            β”‚
β”‚                                                                             β”‚
β”‚   1. gh repo create org/template-{name} --private                           β”‚
β”‚   2. Push the snapshot to that repo                                         β”‚
β”‚   3. Return the repo URL + app name to Orchestrator                         β”‚
β”‚                                                                             β”‚
β”‚   Build Result Format:                                                       β”‚
β”‚   {                                                                          β”‚
β”‚     "status": "complete",                                                   β”‚
β”‚     "app_name": "template-my-thing",                                        β”‚
β”‚     "image_ref": "registry.fly.io/template-my-thing:v1",                    β”‚
β”‚     "git_repo": "github.com/org/template-my-thing",                         β”‚
β”‚     "git_ref": "snapshot/v1"                                                β”‚
β”‚   }                                                                          β”‚
β”‚                                                                             β”‚
β”‚   Orchestrator handles saving metadata to Supabase.                          β”‚
β”‚   Git repo IS the source of truth.                                          β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Multi-Repo Sessions (Combined State)

When user clones repos A, B, C and tinkers:

  1. Session workspace has A/, B/, C/ directories (each with .git)
  2. On snapshot: Remove nested .git directories (flatten)
  3. Initialize single root .git (new Repo D)
  4. Snapshot manifest records provenance (where each part came from)
  5. Dockerfile becomes simple COPY . . (not multiple clones)
  6. Result: Combined state becomes the template source

Configuration Spectrum (CURRENT)

The system slides along a spectrum based on use case:

HARDCODED ◄─────────────────────────────────────────► DYNAMIC
(batch)                                              (R&D)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Static   β”‚  β”‚Template  β”‚  β”‚ Custom   β”‚  β”‚ Blank    β”‚
β”‚ Script   β”‚  β”‚ Launch   β”‚  β”‚ Build    β”‚  β”‚ Slate    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Position Type Builder Claude Example
Far left Static Not used Not needed Pre-built script, just execute
Left-center Template Runs once In session Pre-saved image, launch and work
Right-center Custom Build Runs now In builder & session "Build this repo, then I'll analyze it"
Far right Blank Slate Interactive Full "Install Python, let me tinker"

Key: Same architecture at all points, different configuration. Not different systems.


FUTURE VISION: Decision Packs (NOT YET IMPLEMENTED)

NOTE: Everything below this line describes a FUTURE architecture that does not yet exist in the codebase.

The Pack Vision

A Decision Pack would be a complete, deployable repository containing:

  • Dockerfile for containerization
  • All dependencies and tooling
  • Agent/skill logic and instructions
  • Manifest/configuration
  • Ready to deploy on Fly.io
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    FUTURE: DECISION PACK = DEPLOYABLE REPOSITORY             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  Current (Template):                    Future Vision (Pack):               β”‚
β”‚  ──────────────────                     ───────────────────────             β”‚
β”‚                                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ Supabase record:    β”‚               β”‚ github.com/org/pack-mmm         β”‚ β”‚
β”‚  β”‚ - name              β”‚               β”‚                                  β”‚ β”‚
β”‚  β”‚ - system_prompt     β”‚               β”‚ β”œβ”€β”€ Dockerfile        ← BUILD   β”‚ β”‚
β”‚  β”‚ - tools_config      β”‚               β”‚ β”œβ”€β”€ fly.toml          ← DEPLOY  β”‚ β”‚
β”‚  β”‚ - mcp_config        β”‚               β”‚ β”œβ”€β”€ requirements.txt  ← DEPS    β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β”‚ β”œβ”€β”€ pack.yaml         ← MANIFESTβ”‚ β”‚
β”‚                                        β”‚ β”œβ”€β”€ .claude/                    β”‚ β”‚
β”‚  Just database config.                 β”‚ β”‚   β”œβ”€β”€ CLAUDE.md     ← BRAIN   β”‚ β”‚
β”‚  Session created                       β”‚ β”‚   └── skills/                 β”‚ β”‚
β”‚  dynamically.                          β”‚ β”œβ”€β”€ acp-server/       ← COMMS   β”‚ β”‚
β”‚                                        β”‚ └── app/              ← CODE    β”‚ β”‚
β”‚                                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                                             β”‚
β”‚                                        Complete. Self-contained. Deployable.β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Proposed Pack Manifest (pack.yaml) - FUTURE

# pack.yaml - Pack Manifest (PROPOSED - NOT YET IMPLEMENTED)
name: marketing-mmm
version: 2.1.0
description: Marketing Mix Modeling with Bayesian inference and budget optimization

# Identity
author: decision-ai
license: Apache-2.0
repository: github.com/decision-ai/pack-mmm

# Discovery triggers - WHEN should this pack be suggested?
triggers:
  keywords:
    - "marketing.*budget"
    - "ROI|ROAS"
    - "attribution"
    - "media mix"
  file_patterns:
    - "*.csv"           # Marketing data files
    - "*.parquet"
  contexts:
    - analytics
    - marketing
    - planning

# Capabilities - WHAT does this pack provide?
provides:
  - bayesian-inference
  - budget-optimization
  - uncertainty-quantification
  - channel-attribution

# Dependencies - WHAT other packs does this need?
requires:
  - name: data-loader
    version: ">=1.0"
  - name: visualization
    version: ">=2.0"
    optional: true

# Runtime requirements
runtime:
  base_image: python:3.11-slim
  memory: 2gb
  cpu: shared-2x
  gpu: false

# Environment variables needed
environment:
  required:
    - ANTHROPIC_API_KEY
  optional:
    - DATABASE_URL
    - REDIS_URL

# Health check
health:
  path: /healthz
  interval: 30s
  timeout: 5s

Roadmap: From Current to Future

Phase 0: Current State (IMPLEMENTED)

  • Supabase session_templates table
  • Dynamic Fly.io machine creation
  • System prompts loaded from database
  • ACP communication between sessions
  • Builder Claude with meta-skills
  • Git repos as source of truth for sessions

Phase 1: Enhanced Templates (PLANNED)

  • Add git_repo_url field to session_templates
  • If git_url present, clone and extract .claude/
  • Template versioning via git refs

Phase 2: Pack System (FUTURE VISION)

  • Git repositories as deployable units
  • pack.yaml manifest for discovery
  • Container registry for pre-built images
  • Pack marketplace

Migration Path

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        EVOLUTION PATH                                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  TODAY (Current)             NEAR-TERM                    FUTURE            β”‚
β”‚  ──────────────              ─────────                    ──────            β”‚
β”‚                                                                             β”‚
β”‚  session_templates           session_templates            Pack repos        β”‚
β”‚  (Supabase)                  + git_repo_url field         (GitHub)          β”‚
β”‚       β”‚                          β”‚                           β”‚              β”‚
β”‚       β–Ό                          β–Ό                           β–Ό              β”‚
β”‚  system_prompt               system_prompt OR            .claude/CLAUDE.md  β”‚
β”‚  in database                 .claude/ from repo          in repository      β”‚
β”‚       β”‚                          β”‚                           β”‚              β”‚
β”‚       β–Ό                          β–Ό                           β–Ό              β”‚
β”‚  Dynamic Fly.io            Clone repo β†’                Pre-built images     β”‚
β”‚  machine creation          Build β†’ Deploy              from registry        β”‚
β”‚                                                                             β”‚
β”‚  Builder Claude            Builder Claude              Pack discovery       β”‚
β”‚  generates .claude/        respects repo .claude/      + pre-built         β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Summary

Aspect Current (Templates) Future (Packs)
Storage Supabase database + Git session repos GitHub pack repositories
Format JSON records + generated .claude/ YAML manifests + code
Deployment Builder generates, Fly deploys Pre-built containers
Discovery Query by name Trigger-based matching
Versioning Git tags on session repos Git tags on pack repos
Distribution Supabase + GitHub Container registry
Status IMPLEMENTED VISION

Platform-Agnostic Design: KEY USP

CRITICAL: Decision Packs are designed to be completely platform-agnostic. The same pack works identically whether accessed from Discord, Slack, Web, or Mobile.

How Packs Achieve Platform Independence

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    PLATFORM-AGNOSTIC PACK DESIGN                             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Decision Pack (e.g., marketing-mmm)                                        β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚  pack.yaml                                                           β”‚   β”‚
β”‚   β”‚  β”œβ”€β”€ NO platform-specific config                                    β”‚   β”‚
β”‚   β”‚  β”œβ”€β”€ Defines capabilities, triggers, requirements                   β”‚   β”‚
β”‚   β”‚  └── Platform adapters handle UI differences                        β”‚   β”‚
β”‚   β”‚                                                                      β”‚   β”‚
β”‚   β”‚  .claude/CLAUDE.md                                                   β”‚   β”‚
β”‚   β”‚  β”œβ”€β”€ Domain expertise (not platform logic)                          β”‚   β”‚
β”‚   β”‚  β”œβ”€β”€ Tool usage patterns                                            β”‚   β”‚
β”‚   β”‚  └── Decision-making rules                                          β”‚   β”‚
β”‚   β”‚                                                                      β”‚   β”‚
β”‚   β”‚  acp-server/                                                         β”‚   β”‚
β”‚   β”‚  β”œβ”€β”€ Standardized protocol (platform-neutral)                       β”‚   β”‚
β”‚   β”‚  └── Same API for all platform adapters                             β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚   Platform Adapters Translate:                                              β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚                                                                      β”‚   β”‚
β”‚   β”‚   Discord:  Slash commands β†’ Tool calls                             β”‚   β”‚
β”‚   β”‚             Thread replies β†’ Conversation context                    β”‚   β”‚
β”‚   β”‚             Reactions β†’ Quick feedback                               β”‚   β”‚
β”‚   β”‚                                                                      β”‚   β”‚
β”‚   β”‚   Slack:    Slash commands β†’ Same tool calls                        β”‚   β”‚
β”‚   β”‚             Thread replies β†’ Same conversation context              β”‚   β”‚
β”‚   β”‚             Emoji reactions β†’ Same quick feedback                    β”‚   β”‚
β”‚   β”‚                                                                      β”‚   β”‚
β”‚   β”‚   Web:      Button clicks β†’ Same tool calls                         β”‚   β”‚
β”‚   β”‚             Chat messages β†’ Same conversation context               β”‚   β”‚
β”‚   β”‚             UI feedback β†’ Same quick feedback                        β”‚   β”‚
β”‚   β”‚                                                                      β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚   RESULT: Write once, run anywhere. Packs don't know or care                β”‚
β”‚   which platform they're accessed from.                                     β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Pack Manifest: Platform Fields (FUTURE)

# pack.yaml - Platform-Agnostic by Default
name: marketing-mmm
version: 2.1.0

# Platform hints (OPTIONAL - for enhanced UX, not required)
platform_hints:
  # These are HINTS for platform adapters, not requirements
  # Packs work without them, but they enable richer experiences

  discord:
    # Discord-specific UI enhancements
    slash_commands:
      - name: analyze
        description: "Run marketing analysis"
    voice_enabled: true  # Can be used in voice channels

  slack:
    # Slack-specific UI enhancements
    shortcuts:
      - name: Quick Analysis
        callback_id: quick_analyze
    home_tab: true  # Show in Slack home tab

  web:
    # Web-specific UI enhancements
    widgets:
      - type: chart_viewer
      - type: data_table
    dark_mode: true

# Core pack definition (platform-independent)
provides:
  - bayesian-inference
  - budget-optimization

# These tools work identically on all platforms
tools:
  - calculate_roi
  - generate_report
  - visualize_data

Why Platform-Agnostic Matters

Benefit Description
Write Once, Deploy Everywhere Pack authors don't need platform expertise
Consistent Experience Users get same capabilities regardless of platform
Future-Proof New platforms automatically get all existing packs
Enterprise Ready Teams can choose their preferred platform (Slack vs Discord)
Reduced Maintenance One codebase, not N platform-specific versions

Platform Adapter Responsibilities

The platform adapter (not the pack) handles:

  1. Authentication: OAuth flows, token management
  2. UI Translation: Platform-specific buttons, menus, reactions
  3. Message Formatting: Markdown variants, attachments, embeds
  4. Real-time Updates: WebSocket/SSE vs polling
  5. File Handling: Upload/download mechanisms

The pack focuses only on:

  1. Domain Logic: Business rules, calculations, analysis
  2. Tool Execution: What to do, not how to present it
  3. Decision Making: Context-aware responses
  4. Data Processing: Transformations, queries, reports

This document reflects the actual state of Decision Orchestrator as of January 2025. The pack system described is a future vision based on roadmap documents in the codebase. Platform-agnostic design is a KEY USP enabling multi-platform expansion.

Artifact Output Patterns

How AI agents produce, structure, and present actionable outputs to users


The Core Insight

An artifact is a discrete, actionable unit of agent output. Unlike conversational text, artifacts are:

  • Structured and machine-parseable
  • Designed for a specific next action
  • Often rendered specially in UI
  • Versioned and comparable

The best agents don't just respondβ€”they produce artifacts that enable action.


Artifact Taxonomy

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          ARTIFACT TYPES                                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  CODE ARTIFACTS                         DOCUMENT ARTIFACTS                   β”‚
β”‚  ──────────────                         ──────────────────                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”‚
β”‚  β”‚ Type: code      β”‚                    β”‚ Type: document  β”‚                 β”‚
β”‚  β”‚ Lang: python    β”‚                    β”‚ Format: markdownβ”‚                 β”‚
β”‚  β”‚ Executable: yes β”‚                    β”‚ Sections: yes   β”‚                 β”‚
β”‚  β”‚ Testable: yes   β”‚                    β”‚ Exportable: yes β”‚                 β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
β”‚                                                                             β”‚
β”‚  VISUAL ARTIFACTS                       DATA ARTIFACTS                       β”‚
β”‚  ────────────────                       ──────────────                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”‚
β”‚  β”‚ Type: visual    β”‚                    β”‚ Type: data      β”‚                 β”‚
β”‚  β”‚ Format: SVG     β”‚                    β”‚ Format: JSON    β”‚                 β”‚
β”‚  β”‚ Renderable: yes β”‚                    β”‚ Schema: defined β”‚                 β”‚
β”‚  β”‚ Interactive: ?  β”‚                    β”‚ Queryable: yes  β”‚                 β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
β”‚                                                                             β”‚
β”‚  COMPOSITE ARTIFACTS                    ACTION ARTIFACTS                     β”‚
β”‚  ───────────────────                    ────────────────                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”‚
β”‚  β”‚ Type: report    β”‚                    β”‚ Type: action    β”‚                 β”‚
β”‚  β”‚ Contains:       β”‚                    β”‚ Executable: yes β”‚                 β”‚
β”‚  β”‚  - text         β”‚                    β”‚ Reversible: ?   β”‚                 β”‚
β”‚  β”‚  - charts       β”‚                    β”‚ Confirmable: yesβ”‚                 β”‚
β”‚  β”‚  - tables       β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
β”‚  β”‚  - code         β”‚                                                        β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                        β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The Artifact Lifecycle

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      ARTIFACT LIFECYCLE                                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚
β”‚  β”‚  GENERATE  │───►│  VALIDATE  │───►│  PRESENT   │───►│   ACTION   β”‚      β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚
β”‚        β”‚                 β”‚                 β”‚                 β”‚              β”‚
β”‚        β–Ό                 β–Ό                 β–Ό                 β–Ό              β”‚
β”‚  Agent creates     Schema check      UI renders       User acts on         β”‚
β”‚  structured        Type validation   Special display  artifact             β”‚
β”‚  output            Completeness      Action buttons   (copy, run, etc.)    β”‚
β”‚                                                                             β”‚
β”‚  ════════════════════════════════════════════════════════════════════════  β”‚
β”‚                                                                             β”‚
β”‚                         ITERATION LOOP                                       β”‚
β”‚                                                                             β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚     β”‚                                                          β”‚            β”‚
β”‚     β”‚    User: "Make the chart blue"                          β”‚            β”‚
β”‚     β”‚         β”‚                                                β”‚            β”‚
β”‚     β”‚         β–Ό                                                β”‚            β”‚
β”‚     β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚            β”‚
β”‚     β”‚    β”‚Artifact │────►│ Update  │────►│Artifact β”‚         β”‚            β”‚
β”‚     β”‚    β”‚  v1     β”‚     β”‚ Command β”‚     β”‚  v2     β”‚         β”‚            β”‚
β”‚     β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚            β”‚
β”‚     β”‚                                         β”‚               β”‚            β”‚
β”‚     β”‚                                         β”‚               β”‚            β”‚
β”‚     β”‚    Both versions preserved for comparison/rollback      β”‚            β”‚
β”‚     β”‚                                                          β”‚            β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Structured Output Schema Pattern

The key insight: artifacts need schemas for reliability.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SCHEMA-DRIVEN ARTIFACTS                                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  WITHOUT SCHEMA                          WITH SCHEMA                         β”‚
β”‚  ──────────────                          ───────────                         β”‚
β”‚                                                                             β”‚
β”‚  Agent output:                           Agent output:                       β”‚
β”‚  "The ROI is about 2.5x,                 {                                   β”‚
β”‚   give or take, and                        "artifact_type": "analysis",     β”‚
β”‚   you should probably                      "metrics": {                     β”‚
β”‚   increase digital spend"                    "roi": {                       β”‚
β”‚                                                "value": 2.5,               β”‚
β”‚  Problems:                                     "unit": "multiple",         β”‚
β”‚  β€’ Can't extract metrics                       "confidence": 0.87          β”‚
β”‚  β€’ Can't compare versions                    }                              β”‚
β”‚  β€’ Can't feed to downstream                },                               β”‚
β”‚  β€’ UI can't render specially               "recommendations": [             β”‚
β”‚                                              {                              β”‚
β”‚                                                "action": "increase_spend", β”‚
β”‚                                                "channel": "digital",       β”‚
β”‚                                                "amount_pct": 15            β”‚
β”‚                                              }                              β”‚
β”‚                                            ]                                β”‚
β”‚                                          }                                  β”‚
β”‚                                                                             β”‚
β”‚                                          Benefits:                          β”‚
β”‚                                          β€’ Machine-parseable               β”‚
β”‚                                          β€’ Validatable                     β”‚
β”‚                                          β€’ Comparable                      β”‚
β”‚                                          β€’ UI can render richly            β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Decision AI Artifacts: Build Results

In Decision AI, build results are a key artifact type. Builder Claude produces structured build results:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    BUILD RESULT ARTIFACT (DECISION AI)                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  After Builder Claude completes a build:                                     β”‚
β”‚                                                                             β”‚
β”‚  ## Build Result                                                             β”‚
β”‚                                                                             β”‚
β”‚  | Field | Value |                                                           β”‚
β”‚  |-------|-------|                                                           β”‚
β”‚  | **App Name** | `mmm-session-abc123` |                                    β”‚
β”‚  | **App URL** | `https://mmm-session-abc123.fly.dev` |                     β”‚
β”‚  | **Image** | `registry.fly.io/mmm-session-abc123:deployment-xyz` |        β”‚
β”‚  | **Framework** | `marimo` |                                               β”‚
β”‚  | **Git Repo** | `github.com/org/session-mmm-abc123` |                     β”‚
β”‚  | **Git Ref** | `snapshot/v1` |                                            β”‚
β”‚                                                                             β”‚
β”‚  ### Services Config                                                         β”‚
β”‚  ```json                                                                     β”‚
β”‚  [                                                                           β”‚
β”‚    {"protocol": "tcp", "internal_port": 8080,                               β”‚
β”‚     "ports": [{"port": 443, "handlers": ["tls", "http"]}]},                 β”‚
β”‚    {"protocol": "tcp", "internal_port": 3017,                               β”‚
β”‚     "ports": [{"port": 3017, "handlers": ["tls"]}]}                         β”‚
β”‚  ]                                                                           β”‚
β”‚  ```                                                                         β”‚
β”‚                                                                             β”‚
β”‚  ### Save as Template                                                        β”‚
β”‚  CUSTOM_FLY_SAVE_TEMPLATE(                                                  β”‚
β”‚    slug="mmm-analysis",                                                     β”‚
β”‚    name="MMM Analysis Session",                                             β”‚
β”‚    image_ref="registry.fly.io/mmm-session-abc123:deployment-xyz",           β”‚
β”‚    services=[...],                                                          β”‚
β”‚    framework="marimo",                                                      β”‚
β”‚    description="Marketing mix modeling with PyMC"                           β”‚
β”‚  )                                                                           β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Artifact Presentation Patterns

How artifacts are displayed to users:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    PRESENTATION PATTERNS                                     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  PATTERN 1: INLINE EXPANSION                                                 β”‚
β”‚  ───────────────────────────                                                 β”‚
β”‚                                                                             β”‚
β”‚     Conversation:                                                            β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                              β”‚
β”‚     β”‚ User: "Generate a budget allocation"   β”‚                              β”‚
β”‚     β”‚                                        β”‚                              β”‚
β”‚     β”‚ Agent: "Here's the optimized budget:"  β”‚                              β”‚
β”‚     β”‚                                        β”‚                              β”‚
β”‚     β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚                              β”‚
β”‚     β”‚ β”‚ [ARTIFACT: Budget Table]           β”‚ β”‚ ◄── Rendered inline          β”‚
β”‚     β”‚ β”‚                                    β”‚ β”‚                              β”‚
β”‚     β”‚ β”‚ Channel  β”‚ Current β”‚ Recommended  β”‚ β”‚                              β”‚
β”‚     β”‚ β”‚ ─────────┼─────────┼────────────  β”‚ β”‚                              β”‚
β”‚     β”‚ β”‚ TV       β”‚ $500K   β”‚ $420K        β”‚ β”‚                              β”‚
β”‚     β”‚ β”‚ Digital  β”‚ $300K   β”‚ $380K        β”‚ β”‚                              β”‚
β”‚     β”‚ β”‚                                    β”‚ β”‚                              β”‚
β”‚     β”‚ β”‚ [Copy] [Export CSV] [Apply]       β”‚ β”‚ ◄── Action buttons           β”‚
β”‚     β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚                              β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                              β”‚
β”‚                                                                             β”‚
β”‚  PATTERN 2: SIDE PANEL                                                       β”‚
β”‚  ─────────────────────                                                       β”‚
β”‚                                                                             β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
β”‚     β”‚   Conversation      β”‚     Artifact Panel               β”‚               β”‚
β”‚     β”‚                     β”‚                                  β”‚               β”‚
β”‚     β”‚  User: "Show me     β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚               β”‚
β”‚     β”‚   the analysis"     β”‚  β”‚  [LIVE PREVIEW]           β”‚  β”‚               β”‚
β”‚     β”‚                     β”‚  β”‚                           β”‚  β”‚               β”‚
β”‚     β”‚  Agent: "I've       β”‚  β”‚   Chart renders here      β”‚  β”‚               β”‚
β”‚     β”‚   created a viz..." β”‚  β”‚   Updates in real-time    β”‚  β”‚               β”‚
β”‚     β”‚                     β”‚  β”‚   as conversation         β”‚  β”‚               β”‚
β”‚     β”‚  User: "Make it     β”‚  β”‚   progresses              β”‚  β”‚               β”‚
β”‚     β”‚   a bar chart"      β”‚  β”‚                           β”‚  β”‚               β”‚
β”‚     β”‚                     β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚               β”‚
β”‚     β”‚  Agent: "Updated."  β”‚                                  β”‚               β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β”‚
β”‚                                                                             β”‚
β”‚  PATTERN 3: ARTIFACT GALLERY                                                 β”‚
β”‚  ───────────────────────────                                                 β”‚
β”‚                                                                             β”‚
β”‚     Session artifacts:                                                       β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”                     β”‚
β”‚     β”‚ Chart   β”‚  β”‚ Code    β”‚  β”‚ Report  β”‚  β”‚ Config  β”‚                     β”‚
β”‚     β”‚ v3      β”‚  β”‚ v2      β”‚  β”‚ v1      β”‚  β”‚ v1      β”‚                     β”‚
β”‚     β”‚ [view]  β”‚  β”‚ [view]  β”‚  β”‚ [view]  β”‚  β”‚ [view]  β”‚                     β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β”‚
β”‚         β–²                                                                   β”‚
β”‚         └── Click to expand, compare with previous versions                 β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Action Buttons Pattern

Every artifact should have clear next actions:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      ACTION BUTTON TAXONOMY                                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  CODE ARTIFACTS                                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  [Copy to Clipboard] [Run in Sandbox] [Insert into File] [Explain] β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚  DATA ARTIFACTS                                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  [Download CSV] [Download JSON] [Open in Spreadsheet] [Visualize]  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚  BUILD ARTIFACTS (Decision AI specific)                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  [Open Session] [Save as Template] [View Logs] [Stop Session]      β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚  ANALYSIS ARTIFACTS                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  [Export Report] [Schedule Re-run] [Share with Team] [Add to Dash] β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚  RECOMMENDATION ARTIFACTS                                                    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  [Apply Changes] [Modify Parameters] [Reject] [Request Explanation]β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Streaming & Progressive Rendering

For long-running artifact generation:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   PROGRESSIVE ARTIFACT RENDERING                             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  USER REQUEST: "Generate a comprehensive marketing report"                   β”‚
β”‚                                                                             β”‚
β”‚  TIME ────────────────────────────────────────────────────────────────►     β”‚
β”‚                                                                             β”‚
β”‚  t=0s     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚
β”‚           β”‚  [ARTIFACT: Report]                          β”‚                  β”‚
β”‚           β”‚                                              β”‚                  β”‚
β”‚           β”‚  Status: Generating...                       β”‚                  β”‚
β”‚           β”‚  β–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘  15%              β”‚                  β”‚
β”‚           β”‚                                              β”‚                  β”‚
β”‚           β”‚  Sections:                                   β”‚                  β”‚
β”‚           β”‚  βœ“ Executive Summary (ready)                β”‚                  β”‚
β”‚           β”‚  ⟳ Channel Analysis (in progress)           β”‚                  β”‚
β”‚           β”‚  β—‹ Recommendations (pending)                 β”‚                  β”‚
β”‚           β”‚                                              β”‚                  β”‚
β”‚           β”‚  [View Available Sections]                   β”‚                  β”‚
β”‚           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚
β”‚                                                                             β”‚
β”‚  t=60s    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚
β”‚           β”‚  [ARTIFACT: Report]                          β”‚                  β”‚
β”‚           β”‚                                              β”‚                  β”‚
β”‚           β”‚  Status: Complete βœ“                          β”‚                  β”‚
β”‚           β”‚  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  100%             β”‚                  β”‚
β”‚           β”‚                                              β”‚                  β”‚
β”‚           β”‚  All 5 sections ready                        β”‚                  β”‚
β”‚           β”‚                                              β”‚                  β”‚
β”‚           β”‚  [Download PDF] [Share] [Schedule Update]    β”‚                  β”‚
β”‚           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Design Principles

Principle Implementation
Schema-First Define artifact structure before generation
Progressive Display Show what's ready while generating rest
Clear Actions Every artifact has obvious next steps
Versioning Track changes, enable comparison & rollback
Export Flexibility Multiple formats for different consumers
Inline Context Keep artifacts near relevant conversation

Common Pitfalls

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       ARTIFACT ANTI-PATTERNS                                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  PROBLEM: Wall of Text                  SOLUTION: Structured Artifact        β”‚
β”‚  ─────────────────────                  ────────────────────────             β”‚
β”‚                                                                             β”‚
β”‚  "Here's your analysis:                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”‚
β”‚   The ROI for TV is 2.1x               β”‚ [Analysis Artifact]      β”‚        β”‚
β”‚   which is lower than digital          β”‚                          β”‚        β”‚
β”‚   at 3.2x but social is only           β”‚ ROI Summary Table        β”‚        β”‚
β”‚   1.8x so you should..."               β”‚ Key Insight: Digital > TVβ”‚        β”‚
β”‚                                         β”‚ [See Full Report]        β”‚        β”‚
β”‚  Can't scan, extract, or act            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚
β”‚                                         Scannable, actionable               β”‚
β”‚                                                                             β”‚
β”‚  PROBLEM: No Next Steps                 SOLUTION: Action Buttons             β”‚
β”‚  ──────────────────────                 ────────────────────────             β”‚
β”‚                                                                             β”‚
β”‚  "Here's the code."                     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”‚
β”‚                                         β”‚ [Code Artifact]          β”‚        β”‚
β”‚  User: "Now what?"                      β”‚                          β”‚        β”‚
β”‚                                         β”‚ [Run] [Copy] [Test]      β”‚        β”‚
β”‚                                         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚
β”‚                                                                             β”‚
β”‚  PROBLEM: Lost Artifacts                SOLUTION: Artifact Gallery           β”‚
β”‚  ───────────────────────                ────────────────────────             β”‚
β”‚                                                                             β”‚
β”‚  User: "Where's that chart             Session artifacts persisted          β”‚
β”‚   you made earlier?"                    and browsable in sidebar            β”‚
β”‚                                                                             β”‚
β”‚  Scroll, scroll, scroll...              One-click to find any output        β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Great artifacts are not just outputsβ€”they're starting points for the next action. The difference between "here's some text" and "here's a structured artifact with clear next steps" is the difference between a chatbot and a productive AI assistant.

Session Deployment UX

How Decision Orchestrator deploys Claude sessions from Discord to Fly.io


CRITICAL: Current State vs. Future Vision

Aspect CURRENT (Implemented) FUTURE (Planned)
User Interface Discord chat Discord + Web UI
Template Source Supabase database + Git repos GitHub pack repositories
Deployment Dynamic Fly.io creation via Builder Claude Pre-built container images
Build Process Builder Claude constructs .claude/ Pack already contains .claude/
Status IMPLEMENTED VISION

What EXISTS Today: Discord β†’ Builder β†’ Fly.io Sessions

The current Decision Orchestrator deploys sessions through a Discord-driven workflow with Builder Claude:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    CURRENT DEPLOYMENT FLOW (IMPLEMENTED)                     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  1. User sends message in Discord                                            β”‚
β”‚     └── "Build this repo: github.com/user/mmm-analysis"                     β”‚
β”‚                                                                             β”‚
β”‚  2. Discord Bot receives message                                             β”‚
β”‚     └── Routes to Workflow Executor (Orchestrator Claude)                   β”‚
β”‚                                                                             β”‚
β”‚  3. Orchestrator uses CUSTOM_FLY_LAUNCH_BUILDER                              β”‚
β”‚     └── Spawns ephemeral Builder Claude session                             β”‚
β”‚     └── Builder app name: mmm-builder-{hex}                                 β”‚
β”‚                                                                             β”‚
β”‚  4. Orchestrator sends build instructions via CUSTOM_ACP_SEND_MESSAGE        β”‚
β”‚     └── Builder Claude receives repo URL and instructions                   β”‚
β”‚                                                                             β”‚
β”‚  5. Builder Claude executes build workflow:                                  β”‚
β”‚     β”œβ”€β”€ Clone repository to /workspace/app                                  β”‚
β”‚     β”œβ”€β”€ Analyze repo (detect framework, dependencies, purpose)              β”‚
β”‚     β”œβ”€β”€ Check for existing .claude/ (merge if present)                      β”‚
β”‚     β”œβ”€β”€ Generate .claude/ using meta-skills:                                β”‚
β”‚     β”‚   β”œβ”€β”€ claude-factory.md (master skill)                                β”‚
β”‚     β”‚   β”œβ”€β”€ skill-creation.md (generate domain skills)                      β”‚
β”‚     β”‚   └── claude-md-templates.md (CLAUDE.md with execution rules)         β”‚
β”‚     β”œβ”€β”€ Generate Dockerfile using dockerfile-gen.md                         β”‚
β”‚     β”œβ”€β”€ Set up ACP server (always required)                                 β”‚
β”‚     β”œβ”€β”€ Deploy via: fly deploy --remote-only                                β”‚
β”‚     └── Create GitHub session repo                                          β”‚
β”‚                                                                             β”‚
β”‚  6. Builder reports back via ACP:                                            β”‚
β”‚     └── Build result with app_name, image_ref, git_repo, git_ref            β”‚
β”‚                                                                             β”‚
β”‚  7. Orchestrator cleans up:                                                  β”‚
β”‚     └── CUSTOM_FLY_STOP_SESSION destroys builder                            β”‚
β”‚                                                                             β”‚
β”‚  8. Session Ready                                                            β”‚
β”‚     └── User can communicate via Discord (through orchestrator β†’ ACP)       β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Decision AI-Specific Tools (IMPLEMENTED)

Orchestrator Tools

Tool Purpose Parameters
CUSTOM_FLY_LAUNCH_BUILDER Spawn ephemeral builder session region, timeout
CUSTOM_FLY_LAUNCH_SESSION Launch from saved template template_slug, region
CUSTOM_ACP_SEND_MESSAGE Send instructions to session session_id, message
CUSTOM_FLY_STOP_SESSION Destroy session session_id
CUSTOM_FLY_GET_SESSION_STATUS Check session health session_id
CUSTOM_FLY_LIST_SESSIONS List active sessions -
CUSTOM_FLY_LIST_TEMPLATES List available templates -
CUSTOM_FLY_SAVE_TEMPLATE Save build as reusable template metadata
CUSTOM_FLY_DELETE_TEMPLATE Remove saved template template_id

What These Tools DON'T Do

  • No predefined Dockerfile schemasβ€”Builder Claude generates them
  • No rigid skill parametersβ€”Claude-in-Builder decides what to use
  • No forced framework choicesβ€”Claude analyzes and detects

Session Types (IMPLEMENTED)

Builder Sessions (Ephemeral)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    BUILDER SESSION LIFECYCLE                                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Name pattern: mmm-builder-{hex}                                            β”‚
β”‚   Lifetime: ~5-30 minutes                                                    β”‚
β”‚   Purpose: Clone, analyze, generate .claude/, build, deploy                  β”‚
β”‚                                                                             β”‚
β”‚   Environment:                                                               β”‚
β”‚   β€’ Fly CLI (for deployment - builds happen remotely)                       β”‚
β”‚   β€’ Git (for cloning repos)                                                  β”‚
β”‚   β€’ GitHub CLI (gh) for creating repos                                       β”‚
β”‚   β€’ Python/Node (for dependency analysis)                                    β”‚
β”‚   β€’ NO Docker CLI (fly deploy --remote-only)                                β”‚
β”‚                                                                             β”‚
β”‚   Builder's .claude/ (its own skills):                                       β”‚
β”‚   β”œβ”€β”€ skills/building/ (Dockerfile, Fly deploy patterns)                    β”‚
β”‚   β”œβ”€β”€ skills/frameworks/ (Marimo, Streamlit, FastAPI patterns)              β”‚
β”‚   └── skills/meta/ (claude-factory, skill-creation, merge-strategy)         β”‚
β”‚                                                                             β”‚
β”‚   Death: After deployment (cleanup via CUSTOM_FLY_STOP_SESSION)             β”‚
β”‚                                                                             β”‚
β”‚   KEY INSIGHT: Builder IS the Session                                        β”‚
β”‚   When creating a new session, Builder deploys a copy of itself             β”‚
β”‚   as a new Fly app. No separate "base session image" needed.                β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

User Work Sessions (Persistent)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    USER WORK SESSION                                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Name pattern: mmm-{hex}                                                    β”‚
β”‚   Lifetime: User determines (hours to days)                                  β”‚
β”‚   Purpose: Interactive analysis, experimentation, collaboration              β”‚
β”‚                                                                             β”‚
β”‚   Environment:                                                               β”‚
β”‚   β€’ Framework runtime (Marimo, Streamlit, FastAPI, etc.)                    β”‚
β”‚   β€’ Claude Agent SDK / ACP server                                           β”‚
β”‚   β€’ User's code and dependencies                                            β”‚
β”‚                                                                             β”‚
β”‚   Session's .claude/ (created by Builder):                                   β”‚
β”‚   β”œβ”€β”€ CLAUDE.md (execution rules + purpose)                                 β”‚
β”‚   └── skills/ (generated domain-specific skills)                            β”‚
β”‚                                                                             β”‚
β”‚   KEY: User's workspace, git-backed, saveable as template                   β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Git as Source of Truth (IMPLEMENTED)

Each session gets its own GitHub repository:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SESSION REPO PATTERN                                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   For each snapshot/template, Builder CREATES a new GitHub repo:            β”‚
β”‚                                                                             β”‚
β”‚   1. gh repo create org/session-mmm-{hex} --private                         β”‚
β”‚   2. Push the session state to that repo                                    β”‚
β”‚   3. Tag for versioning: git tag "snapshot/v1"                              β”‚
β”‚   4. Return repo URL + app name to Orchestrator                             β”‚
β”‚                                                                             β”‚
β”‚   Benefits:                                                                  β”‚
β”‚   β€’ Transparency: Readable source code, not opaque binary images            β”‚
β”‚   β€’ Reproducibility: git clone --branch tag = exact state                   β”‚
β”‚   β€’ Shareability: Link to GitHub repo = shareable, forkable                 β”‚
β”‚   β€’ Auditable: Every change logged with timestamps, diffs                   β”‚
β”‚                                                                             β”‚
β”‚   Build Result Format:                                                       β”‚
β”‚   {                                                                          β”‚
β”‚     "status": "complete",                                                   β”‚
β”‚     "app_name": "mmm-analysis-abc123",                                      β”‚
β”‚     "image_ref": "registry.fly.io/mmm-analysis-abc123:v1",                  β”‚
β”‚     "git_repo": "github.com/org/session-mmm-abc123",                        β”‚
β”‚     "git_ref": "snapshot/v1"                                                β”‚
β”‚   }                                                                          β”‚
β”‚                                                                             β”‚
β”‚   Orchestrator handles saving metadata to Supabase.                          β”‚
β”‚   Git repo IS the source of truth.                                          β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Current User Journey

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    CURRENT USER JOURNEY (IMPLEMENTED)                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  USER: "Build github.com/myorg/marketing-analysis for MMM modeling"         β”‚
β”‚                                                                             β”‚
β”‚       β”‚                                                                     β”‚
β”‚       β–Ό                                                                     β”‚
β”‚  ╔═══════════════════════════════════════════════════════════════════════╗ β”‚
β”‚  β•‘  ORCHESTRATOR RECEIVES MESSAGE                                         β•‘ β”‚
β”‚  ╠═══════════════════════════════════════════════════════════════════════╣ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•‘   1. Identifies intent: Build a session from repo                     β•‘ β”‚
β”‚  β•‘   2. Calls CUSTOM_FLY_LAUNCH_BUILDER                                  β•‘ β”‚
β”‚  β•‘   3. Waits for builder to be ready                                    β•‘ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• β”‚
β”‚       β”‚                                                                     β”‚
β”‚       β–Ό                                                                     β”‚
β”‚  ╔═══════════════════════════════════════════════════════════════════════╗ β”‚
β”‚  β•‘  BUILDER CLAUDE EXECUTES                                               β•‘ β”‚
β”‚  ╠═══════════════════════════════════════════════════════════════════════╣ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•‘   1. Clone repo                                                       β•‘ β”‚
β”‚  β•‘   2. Detect framework: Marimo notebook with PyMC-Marketing            β•‘ β”‚
β”‚  β•‘   3. Generate .claude/:                                               β•‘ β”‚
β”‚  β•‘      - CLAUDE.md with execution rules + MMM-specific purpose          β•‘ β”‚
β”‚  β•‘      - skills/mmm-analysis.md (generated from repo patterns)          β•‘ β”‚
β”‚  β•‘   4. Generate Dockerfile (Marimo pattern)                             β•‘ β”‚
β”‚  β•‘   5. Deploy: fly deploy --remote-only                                 β•‘ β”‚
β”‚  β•‘   6. Create GitHub repo: github.com/org/session-mmm-abc123            β•‘ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• β”‚
β”‚       β”‚                                                                     β”‚
β”‚       β–Ό                                                                     β”‚
β”‚  ╔═══════════════════════════════════════════════════════════════════════╗ β”‚
β”‚  β•‘  SESSION DEPLOYED                                                      β•‘ β”‚
β”‚  ╠═══════════════════════════════════════════════════════════════════════╣ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•‘   "Your MMM analysis session is ready!"                                β•‘ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•‘   Session URL: https://mmm-abc123.fly.dev                              β•‘ β”‚
β”‚  β•‘   ACP Endpoint: wss://mmm-abc123.fly.dev:3017                          β•‘ β”‚
β”‚  β•‘   Git Repo: github.com/org/session-mmm-abc123                          β•‘ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•‘   Framework: Marimo                                                    β•‘ β”‚
β”‚  β•‘   Skills: MMM Analysis, Bayesian Modeling                              β•‘ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•‘   [Open Session] [View Notebook] [Save as Template]                    β•‘ β”‚
β”‚  β•‘                                                                       β•‘ β”‚
β”‚  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

ACP Communication (IMPLEMENTED)

The Agent Communication Protocol enables inter-session messaging:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ACP PROTOCOL (IMPLEMENTED)                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   ACP is the WebSocket/SSE protocol for inter-session messaging.            β”‚
β”‚                                                                             β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         WebSocket        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
β”‚   β”‚  Orchestrator   β”‚ ──────────────────────►  β”‚   Builder       β”‚         β”‚
β”‚   β”‚  (Discord Bot)  β”‚ ◄──────────────────────  β”‚   Session       β”‚         β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
β”‚            β”‚                                            β”‚                   β”‚
β”‚            β”‚          WebSocket                         β”‚                   β”‚
β”‚            └─────────────────────────────────────────────                   β”‚
β”‚                                                         β–Ό                   β”‚
β”‚                                                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
β”‚                                                β”‚   User Work     β”‚         β”‚
β”‚                                                β”‚   Session       β”‚         β”‚
β”‚                                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
β”‚                                                                             β”‚
β”‚   Every production image MUST include the ACP server:                       β”‚
β”‚   └── Copied from /templates/acp-server/ into production image             β”‚
β”‚                                                                             β”‚
β”‚   Communication Flow:                                                        β”‚
β”‚   1. Orchestrator calls CUSTOM_ACP_SEND_MESSAGE(session_id, message)        β”‚
β”‚   2. Message goes to session's ACP server                                   β”‚
β”‚   3. Session's Claude processes and responds                                β”‚
β”‚   4. Response returns to Orchestrator                                       β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Execution Rules (What Goes in CLAUDE.md)

Builder generates CLAUDE.md with these execution rules:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    EXECUTION RULES IN CLAUDE.MD                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   ## Execution Environment Rules                                             β”‚
β”‚                                                                             β”‚
β”‚   **You MUST work through {FRAMEWORK}'s runtime environment.**               β”‚
β”‚                                                                             β”‚
β”‚   ### PROHIBITED Actions                                                     β”‚
β”‚   ❌ Running `python -c "..."` via bash                                     β”‚
β”‚   ❌ Running `python script.py` directly                                    β”‚
β”‚   ❌ Using sed/awk/grep to modify code files                                β”‚
β”‚   ❌ Text manipulation with regex on notebook/app files                     β”‚
β”‚   ❌ Direct file edits outside ACP tools                                    β”‚
β”‚   ❌ Treating notebooks as regular Python scripts                           β”‚
β”‚   ❌ Any bash command that executes Python code                             β”‚
β”‚                                                                             β”‚
β”‚   ### REQUIRED Actions                                                       β”‚
β”‚   βœ… Use ACP tools (mcp__acp__Read, Edit, Write) for file operations        β”‚
β”‚   βœ… Use framework MCP tools (mcp__{framework}__*) for interactions         β”‚
β”‚   βœ… Let framework runtime handle code execution                            β”‚
β”‚   βœ… Read current state before making changes                               β”‚
β”‚   βœ… Respond via ACP when task completes                                    β”‚
β”‚                                                                             β”‚
β”‚   Framework-Specific Rules (examples):                                       β”‚
β”‚                                                                             β”‚
β”‚   ### Marimo                                                                 β”‚
β”‚   - Use mcp__marimo__run_cell to execute cells                              β”‚
β”‚   - Cells are reactive - changing one may trigger others                    β”‚
β”‚   - Never modify notebook.py directly with text tools                       β”‚
β”‚                                                                             β”‚
β”‚   ### Streamlit                                                              β”‚
β”‚   - App reruns on every interaction - design for statelessness              β”‚
β”‚   - Use st.session_state for persistent state                               β”‚
β”‚   - Modifications to app.py require app restart                             β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

FUTURE VISION: Pack-Based Deployment (NOT YET IMPLEMENTED)

NOTE: Everything below describes a FUTURE vision that does NOT yet exist.

The Pack Deploy Vision

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              FUTURE: PACK-BASED DEPLOYMENT (NOT IMPLEMENTED)                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  CURRENT (Builder-based):                                                    β”‚
β”‚  ────────────────────────                                                   β”‚
β”‚                                                                             β”‚
β”‚  User β†’ Discord β†’ Launch Builder β†’ Clone + Analyze + Generate β†’ Deploy      β”‚
β”‚                                                                             β”‚
β”‚  ════════════════════════════════════════════════════════════════════════   β”‚
β”‚                                                                             β”‚
β”‚  FUTURE VISION (Pack-based):                                                 β”‚
β”‚  ───────────────────────────                                                β”‚
β”‚                                                                             β”‚
β”‚  User picks pack β†’ Pre-built image pulled β†’ Inject secrets/data β†’ Run       β”‚
β”‚       ↑                     ↑                      ↑                        β”‚
β”‚   Pack registry       Already built          User customization             β”‚
β”‚   with discovery      (no Builder!)          is optional                    β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Voice Sessions (PLANNED - NOT YET IMPLEMENTED)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              FUTURE: VOICE SESSIONS (PLANNED)                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  Vision: Hands-free interaction via Discord voice channel                    β”‚
β”‚                                                                             β”‚
β”‚  1. "Hey, start a brainstorm session"                                       β”‚
β”‚  2. Orchestrator: "Started. What are we thinking about?"                    β”‚
β”‚  3. User: "I want to explore pricing models..."                             β”‚
β”‚  4. Orchestrator: [Explains options, asks questions]                        β”‚
β”‚  5. Back-and-forth conversation                                             β”‚
β”‚  6. User: "Good session. Summarize and create a Notion doc"                 β”‚
β”‚  7. Orchestrator: [Posts summary, creates Notion page, ends session]        β”‚
β”‚                                                                             β”‚
β”‚  Current Status:                                                             β”‚
β”‚  β€’ Voice session tools: PLANNED                                             β”‚
β”‚  β€’ Multi-participant voice: PLANNED                                         β”‚
β”‚  β€’ Voice-to-action workflows: PLANNED                                       β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Comparison: Current vs. Future

Aspect Current (Builder) Future (Packs)
User Interface Discord chat Discord + Web UI
Build Process Builder Claude generates everything Pre-built, just configure
Build Time Minutes (clone + analyze + build) Seconds (pull image)
Customization Full (Builder analyzes repo) Limited (pack is fixed)
Discovery Query templates by name Trigger-based pack matching
Status IMPLEMENTED VISION

Key UX Principles

These principles apply to BOTH current and future systems:

Principle Why It Matters
Show Your Work Explain what was detected and why; builds trust
Smart Defaults Minimize decisions, but allow overrides
Real Progress Show actual build status, not fake animations
Graceful Pauses Interrupt for secrets without losing progress
Actionable Errors Don't just say "failed"; say what to do
Verify Success Prove it works (health check, response time)
Clear Next Steps Always show what comes after deployment

This document reflects the actual state of Decision Orchestrator as of January 2025. Builder Claude with meta-skills is IMPLEMENTED. Voice sessions and pack-based deployment are PLANNED.

Workflow-Based Routing Architecture

How Decision AI routes requests through workflow classification, not model tiers


CRITICAL: This is NOT a Three-Tier System

Many AI systems use a three-tier model routing approach (Haiku β†’ Sonnet β†’ Opus based on complexity). Decision AI uses a fundamentally different approach: workflow-based routing.

Aspect Traditional Three-Tier Decision AI Workflows
Routing basis Request complexity Message intent + channel scope
Decision point Token cost optimization Which workflow(s) to execute
Classification Simple β†’ Medium β†’ Complex Ambient detection + active triggers
Execution Single model call Workflow executor with tools

The Workflow Classification System (IMPLEMENTED)

Decision AI uses LLM-based classification to match messages to configured workflows:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    WORKFLOW CLASSIFICATION FLOW (ACTUAL)                     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   1. Discord message arrives                                                β”‚
β”‚      └── Message context: author, channel, content, attachments            β”‚
β”‚                                                                             β”‚
β”‚   2. Get applicable workflows for this scope                                β”‚
β”‚      └── Query: discord_workflow_scope (server_id, channel_id)             β”‚
β”‚      └── Filter: is_enabled = true                                         β”‚
β”‚                                                                             β”‚
β”‚   3. Classify message against workflows                                     β”‚
β”‚      └── OpenAI gpt-4.1 (fast model for classification)                    β”‚
β”‚      └── Structured output: WorkflowClassificationResult                   β”‚
β”‚      └── Returns: list of workflow_ids that match                          β”‚
β”‚                                                                             β”‚
β”‚   4. Execute matching workflows                                             β”‚
β”‚      └── Claude Agent Service with workflow instructions + tools           β”‚
β”‚      └── MCP server provides tool access based on tool_slugs               β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Workflow Database Schema

-- discord_workflow: Defines what workflows do
CREATE TABLE discord_workflow (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL,                    -- "Build Session", "Voice Assistant"
    instructions TEXT NOT NULL,            -- System prompt for Claude
    tool_slugs JSONB DEFAULT '[]',         -- ["CUSTOM_FLY_LAUNCH_SESSION", ...]
    config JSONB DEFAULT '{}',             -- Workflow-specific config
    output_schema JSONB,                   -- Optional structured output
    interaction_mode TEXT DEFAULT 'autonomous',  -- 'autonomous' | 'interactive' | 'hybrid'
    trigger_type TEXT DEFAULT 'ambient',   -- 'ambient' | 'active'
    is_enabled BOOLEAN DEFAULT true
);

-- discord_workflow_scope: Where workflows apply
CREATE TABLE discord_workflow_scope (
    id UUID PRIMARY KEY,
    workflow_id UUID REFERENCES discord_workflow(id),
    server_id BIGINT,                      -- NULL = all servers
    channel_id BIGINT,                     -- NULL = all channels in server
    is_enabled BOOLEAN DEFAULT true
);

Trigger Types: Ambient vs Active

Decision AI distinguishes between two trigger modes:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      TRIGGER TYPE COMPARISON                                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   AMBIENT TRIGGERS                        ACTIVE TRIGGERS                    β”‚
β”‚   ────────────────                        ───────────────                    β”‚
β”‚                                                                             β”‚
β”‚   β€’ Always listening                      β€’ Require explicit activation      β”‚
β”‚   β€’ LLM classifies each message          β€’ Keyword/command detection        β”‚
β”‚   β€’ Higher API cost                       β€’ Lower cost                       β”‚
β”‚   β€’ More flexible matching               β€’ Predictable activation            β”‚
β”‚                                                                             β”‚
β”‚   Example:                                Example:                           β”‚
β”‚   "I need to analyze marketing data"     "/build github.com/user/repo"      β”‚
β”‚   β†’ LLM detects intent β†’ triggers        β†’ Pattern match β†’ triggers         β”‚
β”‚     session workflow                       builder workflow                  β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Workflow Types (ACTUAL SYSTEM)

Based on the codebase, Decision AI has several workflow categories:

Session Management Workflows

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SESSION MANAGEMENT WORKFLOWS                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   BUILDER WORKFLOW                                                          β”‚
β”‚   ────────────────                                                          β”‚
β”‚   Tool slugs:                                                               β”‚
β”‚   β€’ CUSTOM_FLY_LAUNCH_BUILDER       (spawn ephemeral builder)              β”‚
β”‚   β€’ CUSTOM_ACP_SEND_MESSAGE         (talk to builder/sessions)             β”‚
β”‚   β€’ CUSTOM_FLY_STOP_SESSION         (cleanup builder)                      β”‚
β”‚   β€’ CUSTOM_FLY_SAVE_TEMPLATE        (persist as reusable)                  β”‚
β”‚                                                                             β”‚
β”‚   Triggers on: "Build this repo", "Deploy from github.com/..."             β”‚
β”‚                                                                             β”‚
β”‚   ─────────────────────────────────────────────────────────────────────── β”‚
β”‚                                                                             β”‚
β”‚   ANALYSIS WORKFLOW                                                         β”‚
β”‚   ─────────────────                                                         β”‚
β”‚   Tool slugs:                                                               β”‚
β”‚   β€’ CUSTOM_FLY_LAUNCH_SESSION       (launch from template)                 β”‚
β”‚   β€’ CUSTOM_FLY_LIST_TEMPLATES       (show available options)               β”‚
β”‚   β€’ CUSTOM_ACP_SEND_MESSAGE         (interact with session)                β”‚
β”‚                                                                             β”‚
β”‚   Triggers on: "Start an MMM session", "Launch analysis environment"       β”‚
β”‚                                                                             β”‚
β”‚   ─────────────────────────────────────────────────────────────────────── β”‚
β”‚                                                                             β”‚
β”‚   VOICE SESSION WORKFLOW                                                    β”‚
β”‚   ──────────────────────                                                   β”‚
β”‚   Tool slugs:                                                               β”‚
β”‚   β€’ CUSTOM_VOICE_SESSION_START      (start voice conversation)             β”‚
β”‚   β€’ CUSTOM_VOICE_SESSION_STOP       (end voice session)                    β”‚
β”‚   β€’ CUSTOM_VOICE_SEND_TO_THREAD     (post to thread)                       β”‚
β”‚                                                                             β”‚
β”‚   Triggers on: Voice channel join, "/voice" command                        β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Integration Workflows

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    INTEGRATION WORKFLOWS (EXAMPLES)                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   TOGGL WORKFLOW (Time Tracking)                                            β”‚
β”‚   ──────────────────────────────                                           β”‚
β”‚   Tool slugs:                                                               β”‚
β”‚   β€’ TOGGL_CREATE_TIME_ENTRY                                                β”‚
β”‚   β€’ TOGGL_GET_TIME_ENTRIES                                                 β”‚
β”‚   β€’ TOGGL_UPDATE_TIME_ENTRY                                                β”‚
β”‚   β€’ TOGGL_STOP_TIME_ENTRY                                                  β”‚
β”‚                                                                             β”‚
β”‚   Triggers on: "Log time for...", "What did I work on today?"              β”‚
β”‚                                                                             β”‚
β”‚   ─────────────────────────────────────────────────────────────────────── β”‚
β”‚                                                                             β”‚
β”‚   LANGFUSE WORKFLOW (Observability)                                         β”‚
β”‚   ─────────────────────────────────                                        β”‚
β”‚   Tool slugs:                                                               β”‚
β”‚   β€’ LANGFUSE_GET_TRACES                                                    β”‚
β”‚   β€’ LANGFUSE_GET_SESSION_TRACES                                            β”‚
β”‚   β€’ LANGFUSE_SCORE                                                         β”‚
β”‚                                                                             β”‚
β”‚   Triggers on: "Show me traces for...", "Score this execution"             β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The Workflow Executor Pattern

The workflow executor is the core runtime that executes matched workflows:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    WORKFLOW EXECUTOR ARCHITECTURE                            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Input:                                                                    β”‚
β”‚   β”œβ”€β”€ Message context (XML-formatted Discord message)                      β”‚
β”‚   β”œβ”€β”€ Matched workflows (from classifier)                                  β”‚
β”‚   └── User context (discord_user_id, channel, server)                      β”‚
β”‚                                                                             β”‚
β”‚   Execution:                                                                β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚   β”‚  1. Build MCP server with workflow's tool_slugs                      β”‚ β”‚
β”‚   β”‚     └── create_mcp_server_for_workflows(tool_slugs)                  β”‚ β”‚
β”‚   β”‚                                                                      β”‚ β”‚
β”‚   β”‚  2. Initialize Claude Agent Service                                  β”‚ β”‚
β”‚   β”‚     └── ClaudeAgentService.execute_workflows_with_agent()            β”‚ β”‚
β”‚   β”‚                                                                      β”‚ β”‚
β”‚   β”‚  3. Execute with workflow instructions as system prompt              β”‚ β”‚
β”‚   β”‚     └── Claude receives: instructions + message_context + tools      β”‚ β”‚
β”‚   β”‚                                                                      β”‚ β”‚
β”‚   β”‚  4. Stream response + tool calls back to Discord                     β”‚ β”‚
β”‚   β”‚     └── Real-time status updates via embeds                          β”‚ β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                                             β”‚
β”‚   Output:                                                                   β”‚
β”‚   β”œβ”€β”€ Discord messages (streaming response)                                β”‚
β”‚   β”œβ”€β”€ Tool call results (actions taken)                                    β”‚
β”‚   └── Execution record (logged to discord_workflow_execution)              β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Interaction Modes

Workflows can operate in different interaction modes:

Autonomous Mode

User: "Build github.com/user/repo as a streamlit app"
β”‚
β–Ό
Workflow executes without interruption:
1. Launch builder session
2. Send build instructions
3. Wait for completion
4. Report results
β”‚
β–Ό
User: "βœ… Session deployed at https://mmm-abc123.fly.dev"

Interactive Mode

User: "Help me set up an analysis environment"
β”‚
β–Ό
Workflow pauses for decisions:
"Which template would you like?
1. mmm-studio (interactive notebook)
2. mmm-deepagent (autonomous analysis)
3. decision-pack-compiler (custom builds)"
β”‚
β–Ό
User: "mmm-studio"
β”‚
β–Ό
Workflow continues with user choice

Hybrid Mode

User: "Build this repo with authentication"
β”‚
β–Ό
Workflow executes autonomously but pauses on:
- Missing required secrets (ANTHROPIC_API_KEY)
- Ambiguous framework detection
- Critical errors requiring user decision
β”‚
β–Ό
Human-in-the-loop approval when needed

Routing Flow Diagram

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    COMPLETE ROUTING FLOW                                     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  Discord Message                                                            β”‚
β”‚       β”‚                                                                     β”‚
β”‚       β–Ό                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                                       β”‚
β”‚  β”‚ Get Workflows   β”‚  Query discord_workflow_scope                         β”‚
β”‚  β”‚ for Scope       β”‚  (server_id, channel_id)                              β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                       β”‚
β”‚           β”‚                                                                 β”‚
β”‚           β–Ό                                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                                       β”‚
β”‚  β”‚ Classify        β”‚  OpenAI gpt-4.1 structured output                     β”‚
β”‚  β”‚ Message         β”‚  Returns: [workflow_id_1, workflow_id_2, ...]         β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                       β”‚
β”‚           β”‚                                                                 β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”                                                          β”‚
β”‚     β”‚           β”‚                                                          β”‚
β”‚     β–Ό           β–Ό                                                          β”‚
β”‚  No Match    Match(es)                                                      β”‚
β”‚     β”‚           β”‚                                                          β”‚
β”‚     β–Ό           β–Ό                                                          β”‚
β”‚  Ignore     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                            β”‚
β”‚  or         β”‚ Build MCP       β”‚  Tool slugs from workflow                  β”‚
β”‚  Default    β”‚ Server          β”‚  + Composio integrations                   β”‚
β”‚  Response   β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                            β”‚
β”‚                      β”‚                                                      β”‚
β”‚                      β–Ό                                                      β”‚
β”‚             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                            β”‚
β”‚             β”‚ Execute with    β”‚  Claude Agent Service                      β”‚
β”‚             β”‚ Claude          β”‚  (streaming, tool calls)                   β”‚
β”‚             β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                            β”‚
β”‚                      β”‚                                                      β”‚
β”‚                      β–Ό                                                      β”‚
β”‚             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                            β”‚
β”‚             β”‚ Log Execution   β”‚  discord_workflow_execution                β”‚
β”‚             β”‚                 β”‚  (status, tool_calls, output)              β”‚
β”‚             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                            β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Design Decisions

Why Workflow-Based Instead of Tier-Based?

Consideration Tier-Based Approach Workflow-Based Approach
Scope control Global (all messages) Per-server, per-channel
Tool access Fixed per tier Configurable per workflow
Instructions Generic Task-specific prompts
Observability Hard to track Full execution history
Customization Requires code changes Database configuration

Workflow Scoping Benefits

Example: Different behaviors in different channels

#general channel:
└── Generic assistant workflow (ambient, basic tools)

#mmm-analysis channel:
└── MMM workflow (ambient, full session tools)

#voice channel:
└── Voice workflow (active on join, voice tools)

#builds channel:
└── Builder workflow (active, builder tools)

Adding New Workflows

To add a new workflow:

  1. Define the workflow in database
INSERT INTO discord_workflow (name, instructions, tool_slugs, trigger_type)
VALUES (
    'My New Workflow',
    'You are an assistant that helps with...',
    '["TOOL_A", "TOOL_B"]',
    'ambient'
);
  1. Scope it to channels
INSERT INTO discord_workflow_scope (workflow_id, server_id, channel_id)
VALUES ('workflow-uuid', 123456789, 987654321);
  1. Ensure tools exist in MCP server
  • Add tool handler in workflow_tools/
  • Register in create_mcp_server_for_workflows()

Common Pitfalls

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       WORKFLOW ROUTING PITFALLS                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  PROBLEM: Multiple workflows match          SOLUTION: Priority or exclusion β”‚
β”‚  ──────────────────────────────             ───────────────────────────────│
β”‚                                                                             β”‚
β”‚  "Build this repo" matches:                 Add workflow priority field     β”‚
β”‚  - Generic assistant                        or use interaction_mode to     β”‚
β”‚  - Builder workflow                         determine which takes precedenceβ”‚
β”‚                                                                             β”‚
β”‚  PROBLEM: Classification too slow           SOLUTION: Active triggers       β”‚
β”‚  ─────────────────────────────             ──────────────────────────────   β”‚
β”‚                                                                             β”‚
β”‚  Every message β†’ LLM classification         Use trigger_type='active'      β”‚
β”‚  adds latency                               for explicit commands           β”‚
β”‚                                             (/build, /voice, etc.)          β”‚
β”‚                                                                             β”‚
β”‚  PROBLEM: Wrong scope                       SOLUTION: Scope validation     β”‚
β”‚  ──────────────────────                    ────────────────────────         β”‚
β”‚                                                                             β”‚
β”‚  Workflow runs in wrong channel             Always verify scope before      β”‚
β”‚  (e.g., voice in text channel)             execution                       β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Workflow-based routing gives Decision AI the flexibility to behave differently across contexts while maintaining a unified architecture. The key insight is that routing isn't about model capabilityβ€”it's about matching user intent to the right set of tools and instructions.

Thread & Conversation Design

How Decision AI manages conversation context across Discord threads, voice sessions, and Claude sessions


Conversation Architecture Overview

Decision AI has multiple conversation layers that interact:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    CONVERSATION LAYERS (IMPLEMENTED)                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   LAYER 1: Discord Thread                                                   β”‚
β”‚   ───────────────────────                                                   β”‚
β”‚   β€’ Primary user interface                                                  β”‚
β”‚   β€’ Messages persist in Discord                                             β”‚
β”‚   β€’ Thread per conversation or session                                      β”‚
β”‚   β€’ Referenced by thread_id in all other systems                           β”‚
β”‚                                                                             β”‚
β”‚   LAYER 2: Workflow Executor                                                β”‚
β”‚   ──────────────────────────                                               β”‚
β”‚   β€’ Processes messages through Claude                                       β”‚
β”‚   β€’ Executes tools (Fly, ACP, Composio)                                    β”‚
β”‚   β€’ Logs to discord_workflow_execution                                      β”‚
β”‚   β€’ Stateless per message (no conversation memory)                         β”‚
β”‚                                                                             β”‚
β”‚   LAYER 3: Claude Sessions (Fly.io)                                         β”‚
β”‚   ─────────────────────────────────                                        β”‚
β”‚   β€’ Embodied Claude in containers                                          β”‚
β”‚   β€’ Maintain their own conversation state                                  β”‚
β”‚   β€’ Communicate via ACP                                                    β”‚
β”‚   β€’ Can persist across multiple user messages                              β”‚
β”‚                                                                             β”‚
β”‚   LAYER 4: Voice Sessions (Special)                                         β”‚
β”‚   ─────────────────────────────────                                        β”‚
β”‚   β€’ 3-Claude architecture                                                  β”‚
β”‚   β€’ Supervisor + Fast Agent + Session                                      β”‚
β”‚   β€’ Real-time voice interaction                                            β”‚
β”‚   β€’ Thread used for artifacts/detail                                       β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Discord Thread Session Model

Each significant conversation gets a Discord thread for persistent context:

-- discord_thread_session: Links threads to workflow sessions
CREATE TABLE discord_thread_session (
    id UUID PRIMARY KEY,
    server_id BIGINT NOT NULL,
    channel_id BIGINT NOT NULL,
    thread_id BIGINT NOT NULL UNIQUE,
    user_id BIGINT NOT NULL,
    session_type TEXT,              -- 'workflow', 'voice', 'builder'
    session_metadata JSONB,         -- Workflow-specific data
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

Thread Session Lifecycle

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    THREAD SESSION LIFECYCLE                                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   CREATE                                                                    β”‚
β”‚   ──────                                                                    β”‚
β”‚   User triggers workflow that needs persistent context                      β”‚
β”‚       β”‚                                                                     β”‚
β”‚       β–Ό                                                                     β”‚
β”‚   Bot creates Discord thread in current channel                             β”‚
β”‚       β”‚                                                                     β”‚
β”‚       β–Ό                                                                     β”‚
β”‚   discord_thread_session record created                                     β”‚
β”‚                                                                             β”‚
β”‚   ═══════════════════════════════════════════════════════════════════════  β”‚
β”‚                                                                             β”‚
β”‚   ACTIVE                                                                    β”‚
β”‚   ──────                                                                    β”‚
β”‚   User messages in thread β†’ Routed to associated workflow/session           β”‚
β”‚   Bot posts status updates, results, artifacts to thread                    β”‚
β”‚   Voice sessions post transcripts and summaries to thread                   β”‚
β”‚                                                                             β”‚
β”‚   ═══════════════════════════════════════════════════════════════════════  β”‚
β”‚                                                                             β”‚
β”‚   CLOSE                                                                     β”‚
β”‚   ─────                                                                     β”‚
β”‚   Explicit: User says "end session" or uses /stop                          β”‚
β”‚   Implicit: Idle timeout (configurable)                                    β”‚
β”‚   Auto: Fly.io session destroyed β†’ thread marked inactive                  β”‚
β”‚       β”‚                                                                     β”‚
β”‚       β–Ό                                                                     β”‚
β”‚   is_active = false, thread archived (optional)                            β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Voice Session Architecture (IMPLEMENTED)

Voice sessions use a unique 3-Claude architecture for low-latency interaction:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    VOICE SESSION ARCHITECTURE                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
β”‚  β”‚     FAST AGENT           β”‚         β”‚   SUPERVISOR LOOP        β”‚         β”‚
β”‚  β”‚     Claude Haiku 4.5     β”‚         β”‚   Claude Opus 4.5        β”‚         β”‚
β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€         β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€         β”‚
β”‚  β”‚ β€’ SPEAKS via voice       β”‚         β”‚ β€’ NEVER speaks           β”‚         β”‚
β”‚  β”‚ β€’ Quick responses        β”‚         β”‚ β€’ Runs in background     β”‚         β”‚
β”‚  β”‚ β€’ Reads curated context  β”‚         β”‚ β€’ Curates context        β”‚         β”‚
β”‚  β”‚ β€’ <1s latency            β”‚         β”‚ β€’ Posts to thread        β”‚         β”‚
β”‚  β”‚ β€’ No tools               β”‚         β”‚ β€’ Full tool access       β”‚         β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
β”‚             β”‚                                    β”‚                          β”‚
β”‚             β”‚ reads                              β”‚ writes                   β”‚
β”‚             β–Ό                                    β–Ό                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚                    CURATED CONTEXT                          β”‚            β”‚
β”‚  β”‚  (Summary, mode, answer-to-present, recent context)         β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                                                                             β”‚
β”‚  ════════════════════════════════════════════════════════════════════════  β”‚
β”‚                                                                             β”‚
β”‚  DATA FLOW:                                                                 β”‚
β”‚                                                                             β”‚
β”‚  User speaks ─────────────────────────────────────────┐                    β”‚
β”‚       β”‚                                                β”‚                    β”‚
β”‚       β–Ό                                                β”‚                    β”‚
β”‚  STT (Speech-to-Text)                                 β”‚                    β”‚
β”‚       β”‚                                                β”‚                    β”‚
β”‚       β–Ό                                                β–Ό                    β”‚
β”‚  conversation_log.append()              Supervisor polls every 5s          β”‚
β”‚       β”‚                                        β”‚                           β”‚
β”‚       β–Ό                                        β–Ό                           β”‚
β”‚  Fast Agent responds              Analyzes gap_messages (new since         β”‚
β”‚       β”‚                           last poll), updates curated_context      β”‚
β”‚       β–Ό                                        β”‚                           β”‚
β”‚  TTS (Text-to-Speech)                         β–Ό                           β”‚
β”‚       β”‚                           Posts detailed info to thread            β”‚
β”‚       β–Ό                                                                    β”‚
β”‚  Audio to user                                                             β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Voice Session Data Model

@dataclass
class VoiceSession:
    """Voice session with thread support."""
    session_id: str
    thread_id: str                           # Discord thread for artifacts
    curated_context: str = "You are a helpful voice assistant."
    conversation_log: list = field(default_factory=list)
    voice_client: Any = None                 # Discord voice connection
    last_processed_index: int = 0            # Supervisor progress marker
    current_mode: str = "discovery"          # "discovery" or "delivery"
    text_only_mode: bool = False             # Skip STT, users type in thread

    # Embed state for UI
    queue_embed_id: int | None = None
    status_embed_id: int | None = None
    queued_indices: list = field(default_factory=list)
    processed_indices: list = field(default_factory=list)

Supervisor Loop Pattern

The supervisor runs as a background coroutine, polling every 5 seconds:

class SupervisorLoop:
    """Continuous polling loop that curates context for the Fast Agent."""

    async def _poll_cycle(self) -> None:
        """Execute one poll cycle - analyze state and update context."""

        # Get gap messages (new since last processed)
        conv_log = self.session.conversation_log
        last_idx = self.session.last_processed_index
        gap_messages = conv_log[last_idx:]

        if not gap_messages:
            return  # Nothing new

        # Fetch thread context for additional info
        thread_context = await self._fetch_thread_context()

        # Build prompt for Claude
        user_message = self._build_poll_prompt(
            conv_log, gap_messages, last_idx, thread_context
        )

        # Run Claude Agent with full tool access
        await self._run_supervisor_agent(user_message)

        # Update progress marker
        self.session.last_processed_index = len(conv_log)

Supervisor Tools

supervisor_tools = [
    # Thread communication
    "CUSTOM_VOICE_SEND_TO_THREAD",      # Post messages/artifacts to thread
    "CUSTOM_VOICE_MESSAGE_EDIT",         # Edit previous messages
    "CUSTOM_VOICE_MESSAGE_DELETE",       # Delete messages

    # Session management
    "CUSTOM_VOICE_SESSION_WRITE",        # Update curated_context
    "CUSTOM_VOICE_SESSION_STOP",         # End the voice session
    "CUSTOM_VOICE_CREATE_ARTIFACT",      # Create rich artifacts

    # Discord read access
    "CUSTOM_DISCORD_READ_THREAD",        # Read thread history
    "CUSTOM_DISCORD_READ_CHANNEL",       # Read channel history
    "CUSTOM_DISCORD_PARSE_LINK",         # Extract content from Discord links
]

Curated Context Format

The supervisor writes structured context that the Fast Agent can quickly parse:

## CURATED SUMMARY

## Research Question
[Original user query]

## Summary
[High-level findings answering the user's question]

## Detailed Findings

### [Component/Area 1]
- Finding with reference ([file.ext:line](link))
- Connection to other components
- Implementation details

### [Component/Area 2]
...

## Code References
- `path/to/file.py:123` - Description of what's there
- `another/file.ts:45-67` - Description of the code block

## Architecture Insights
[Patterns, conventions, and design decisions discovered]

## Historical Context (from conversation history)
[Relevant insights from conversation history]

## Open Questions
[Any areas that need further investigation]

## RECENT CONTEXT (since last poll)
[Key points from gap_messages Fast Agent should know]

## VOICE GUIDELINES
[Any specific instructions for this turn]

Text-Only Mode

Voice sessions can operate in text-only mode (no STT/TTS):

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TEXT-ONLY VOICE SESSION                                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   When text_only_mode = True:                                               β”‚
β”‚                                                                             β”‚
β”‚   β€’ Users type in the Discord thread                                        β”‚
β”‚   β€’ Fast Agent still responds quickly (text instead of voice)               β”‚
β”‚   β€’ Supervisor still analyzes and curates                                   β”‚
β”‚   β€’ Same 3-Claude architecture, just no audio                               β”‚
β”‚                                                                             β”‚
β”‚   Use case: Users who prefer typing, noisy environments, accessibility      β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

ACP Communication Pattern

Claude sessions communicate via the Agent Communication Protocol:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ACP MESSAGE FLOW                                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   ORCHESTRATOR                           SESSION                            β”‚
β”‚   ────────────                           ───────                            β”‚
β”‚                                                                             β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        SSE        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                β”‚
β”‚   β”‚ Discord Bot     β”‚ ◄───────────────► β”‚ Fly.io Machine  β”‚                β”‚
β”‚   β”‚ (Workflow       β”‚                   β”‚ (Claude + App)  β”‚                β”‚
β”‚   β”‚  Executor)      β”‚                   β”‚                 β”‚                β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                β”‚
β”‚                                                                             β”‚
β”‚   Message Format:                                                           β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚   β”‚ CUSTOM_ACP_SEND_MESSAGE(                                             β”‚ β”‚
β”‚   β”‚   app_name="mmm-abc123",                                            β”‚ β”‚
β”‚   β”‚   message="Run the MMM model with these parameters...",             β”‚ β”‚
β”‚   β”‚   timeout=300  # 5 minutes for long operations                      β”‚ β”‚
β”‚   β”‚ )                                                                    β”‚ β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                                             β”‚
β”‚   Response Format:                                                          β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚   β”‚ {                                                                    β”‚ β”‚
β”‚   β”‚   "status": "complete",                                             β”‚ β”‚
β”‚   β”‚   "response": "Model fitting complete. Here are the results...",    β”‚ β”‚
β”‚   β”‚   "artifacts": ["output/model.pkl", "output/diagnostics.html"]      β”‚ β”‚
β”‚   β”‚ }                                                                    β”‚ β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Session as Thread Boundary

In Decision AI, sessions provide natural thread boundaries:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    DECISION AI SESSION MEMORY                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   ORCHESTRATOR (Discord)                                                     β”‚
β”‚   └── User interacts via Discord messages                                   β”‚
β”‚       └── Each Discord thread can map to one or more sessions               β”‚
β”‚                                                                             β”‚
β”‚   SESSION (Fly.io)                                                           β”‚
β”‚   └── Each session = isolated context                                       β”‚
β”‚       └── Session's .claude/CLAUDE.md = persistent instructions             β”‚
β”‚       └── Session's skills/ = domain knowledge                              β”‚
β”‚       └── Git repo = full state history                                     β”‚
β”‚                                                                             β”‚
β”‚   CONTEXT FLOW:                                                              β”‚
β”‚                                                                             β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                          β”‚
β”‚   β”‚  Orchestrator   β”‚   ACP   β”‚   Session A     β”‚                          β”‚
β”‚   β”‚  Context        β”‚ ──────► β”‚   Context       β”‚                          β”‚
β”‚   β”‚                 β”‚         β”‚                 β”‚                          β”‚
β”‚   β”‚  β€’ User prefs   β”‚         β”‚  β€’ CLAUDE.md    β”‚                          β”‚
β”‚   β”‚  β€’ Thread hist  β”‚         β”‚  β€’ Skills       β”‚                          β”‚
β”‚   β”‚  β€’ Task state   β”‚         β”‚  β€’ Git history  β”‚                          β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                          β”‚
β”‚                                                                             β”‚
β”‚   KEY INSIGHT: Session = Thread with its own persistent brain               β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Thread Types Summary

Thread Type Purpose Claude Involvement Persistence
Workflow Thread Group workflow execution messages Orchestrator only Message duration
Builder Thread Track build progress Builder Claude via ACP Until cleanup
Session Thread Interactive session communication Session Claude via ACP Session lifetime
Voice Thread Artifacts + detailed output Supervisor posts here Session lifetime

Context Window Management

Fast Agent Context Window (Optimized for Speed)

Fast Agent sees:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   [System Prompt: ~500 tokens]                                              β”‚
β”‚   [Curated Context: ~2000 tokens - compressed by Supervisor]                β”‚
β”‚   [Recent Conversation: last 5-10 messages - ~1000 tokens]                  β”‚
β”‚   [User's Current Message]                                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Total: ~4000 tokens β†’ Fast response possible

Supervisor Context Window (Optimized for Depth)

Supervisor sees:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   [System Prompt: ~1500 tokens]                                             β”‚
β”‚   [Thread History: full Discord thread - potentially large]                 β”‚
β”‚   [Full Conversation Log: all voice exchanges]                              β”‚
β”‚   [Previous Curated Context]                                                β”‚
β”‚   [Gap Messages: new since last poll]                                       β”‚
β”‚   [Tool Call Results: from research/actions]                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Total: Can be large, but Opus handles it. Polling frequency limits growth.

Common Patterns

Pattern 1: Research β†’ Delivery

User: "How does the authentication system work?"

Supervisor:
1. Detects research question
2. Uses tools to read codebase
3. Writes curated_context with findings
4. Posts detailed analysis to thread

Fast Agent:
1. Reads curated_context
2. Delivers verbal summary
3. Points user to thread for details

Pattern 2: Action β†’ Confirmation

User: "Start an MMM session for me"

Supervisor:
1. Detects action request
2. Calls CUSTOM_FLY_LAUNCH_SESSION
3. Updates curated_context with result
4. Posts session URL to thread

Fast Agent:
1. Confirms action taken
2. Provides session URL verbally

Pattern 3: Multi-Turn Discovery

User: "I want to analyze my marketing data"

Fast Agent:
"What kind of analysis? Budget optimization,
 channel attribution, or trend forecasting?"

User: "Channel attribution"

Supervisor:
1. Notes the preference
2. Updates curated_context with mode

Fast Agent:
"Great! Do you have your data ready, or
 should I help you format it first?"

Design Principles

Principle Implementation
Separation of concerns Fast Agent speaks, Supervisor thinks
Latency optimization Fast Agent has minimal context
Thread as artifact store Detailed info goes to thread, not voice
Resilience Session persists even if voice disconnects
Transparency Thread shows what Supervisor is doing
Session isolation Each session = separate context, no bleed

The key insight of Decision AI's conversation design is that different contexts require different conversation strategies. Voice needs speed (Fast Agent), complex research needs depth (Supervisor), and persistent artifacts need a home (threads). The 3-Claude architecture separates these concerns while keeping them coordinated.

Memory Layer Architecture

How AI agents persist memory and context across sessions


CRITICAL: This is PLANNED - NOT YET BUILT

IMPORTANT: The sophisticated memory layer described in this document is a FUTURE VISION. The current Decision AI implementation uses session-level persistence (via git repos and .claude/ directories) but does NOT yet have cross-session user memory, organizational memory, or the retrieval mechanisms described here.

What EXISTS Today

Memory Type Status How It Works
Session Memory IMPLEMENTED Git repo tracks all session changes
Session Context IMPLEMENTED .claude/CLAUDE.md + skills/
Template Reuse IMPLEMENTED Save session as template, relaunch later
Voice Curated Context IMPLEMENTED Supervisor Loop updates curated_context
User Memory NOT YET BUILT Planned for future
Org Memory NOT YET BUILT Planned for future
Cross-Session NOT YET BUILT Planned for future

The Core Problem

AI agents are stateless by default. Every request starts fresh with zero memory of past interactions. This creates a frustrating experience where users repeat themselves and context is lost.

The solution: A layered memory architecture that stores, retrieves, and injects relevant context at the right moments.


The Five Memory Layers (VISION)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     MEMORY LAYER ARCHITECTURE (FUTURE VISION)               β”‚
β”‚                                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  LAYER 5: ORGANIZATIONAL MEMORY                    [NOT YET BUILT]    β”‚ β”‚
β”‚  β”‚  Scope: Entire org  β”‚  Lifetime: Permanent  β”‚  Access: Role-based     β”‚ β”‚
β”‚  β”‚  "All services use gRPC" β€’ "Q4 budget is frozen"                      β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                 β”‚                                           β”‚
β”‚                        Inherited by β–Ό                                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  LAYER 4: TEAM/PROJECT MEMORY                      [NOT YET BUILT]    β”‚ β”‚
β”‚  β”‚  Scope: Team/project  β”‚  Lifetime: Project duration  β”‚  Shared        β”‚ β”‚
β”‚  β”‚  "This repo uses Tailwind" β€’ "We chose PostgreSQL for ACID"           β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                 β”‚                                           β”‚
β”‚                        Inherited by β–Ό                                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  LAYER 3: USER MEMORY                              [NOT YET BUILT]    β”‚ β”‚
β”‚  β”‚  Scope: Individual  β”‚  Lifetime: Account lifetime  β”‚  Private         β”‚ β”‚
β”‚  β”‚  "Prefers concise responses" β€’ "Senior backend engineer"              β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                 β”‚                                           β”‚
β”‚                        Inherited by β–Ό                                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  LAYER 2: SESSION/THREAD MEMORY                    [IMPLEMENTED]      β”‚ β”‚
β”‚  β”‚  Scope: Conversation  β”‚  Lifetime: Session  β”‚  Thread-local           β”‚ β”‚
β”‚  β”‚  "Building REST API with FastAPI" β€’ "Chose MongoDB for this"          β”‚ β”‚
β”‚  β”‚                                                                       β”‚ β”‚
β”‚  β”‚  In Decision AI: .claude/CLAUDE.md + skills/ + git repo               β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                 β”‚                                           β”‚
β”‚                        Inherited by β–Ό                                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  LAYER 1: WORKING MEMORY                           [IMPLEMENTED]      β”‚ β”‚
β”‚  β”‚  Scope: Current turn  β”‚  Lifetime: Ephemeral  β”‚  Request-local        β”‚ β”‚
β”‚  β”‚  "Reading file X" β€’ "Found 3 functions to modify"                     β”‚ β”‚
β”‚  β”‚                                                                       β”‚ β”‚
β”‚  β”‚  In Decision AI: Claude's context window during request               β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key insight: Each layer inherits from the layer above. A single request would have access to all five layers, merged into one coherent context.


Current Implementation: Session Memory Only

Decision AI currently implements session-level memory via git repos and .claude/ directories:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    CURRENT SESSION MEMORY (IMPLEMENTED)                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   SESSION (Fly.io machine)                                                   β”‚
β”‚   β”œβ”€β”€ /workspace/app/           # User's code                               β”‚
β”‚   β”œβ”€β”€ /workspace/.claude/       # Session's brain                           β”‚
β”‚   β”‚   β”œβ”€β”€ CLAUDE.md            # Execution rules + purpose                  β”‚
β”‚   β”‚   └── skills/              # Domain-specific knowledge                  β”‚
β”‚   └── .git/                    # Full history                               β”‚
β”‚                                                                             β”‚
β”‚   GIT REPO (github.com/org/session-mmm-{hex})                               β”‚
β”‚   └── All changes tracked with commits                                      β”‚
β”‚   └── Tags for snapshots: "template/my-analysis-v1"                         β”‚
β”‚   └── Full browsable history                                                β”‚
β”‚                                                                             β”‚
β”‚   PERSISTENCE PATTERNS:                                                      β”‚
β”‚   β€’ Session running: Full context in Claude's window + .claude/             β”‚
β”‚   β€’ Session stopped: Git repo preserves state                               β”‚
β”‚   β€’ Template saved: Can relaunch from saved state                           β”‚
β”‚                                                                             β”‚
β”‚   VOICE SESSION MEMORY:                                                      β”‚
β”‚   β€’ Supervisor curates context into curated_context                         β”‚
β”‚   β€’ Gap messages tracked with last_processed_index                          β”‚
β”‚   β€’ Thread history provides additional context                              β”‚
β”‚                                                                             β”‚
β”‚   LIMITATIONS:                                                               β”‚
β”‚   β€’ No cross-session user memory ("remember my preferences")                β”‚
β”‚   β€’ No "remember this for next time"                                        β”‚
β”‚   β€’ Each new session starts fresh (unless launched from template)           β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Proposed pgvector Schema (FUTURE)

Based on the MEMORY_LAYER_ARCHITECTURE.md design document in the codebase:

-- Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- Memory entries with scope and permissions
CREATE TABLE memories (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id UUID NOT NULL REFERENCES orgs(id),
    project_id UUID REFERENCES projects(id),  -- NULL = org-wide
    session_id UUID REFERENCES sessions(id),  -- NULL = persistent

    -- Content
    content TEXT NOT NULL,
    embedding vector(1536) NOT NULL,

    -- Metadata
    memory_type TEXT NOT NULL,  -- 'fact', 'decision', 'context', 'preference'
    source TEXT,                -- 'user', 'claude', 'system'
    tags TEXT[],

    -- Permissions
    visibility TEXT DEFAULT 'project',  -- 'session', 'project', 'org'

    -- Timestamps
    created_at TIMESTAMPTZ DEFAULT NOW(),
    accessed_at TIMESTAMPTZ DEFAULT NOW(),
    expires_at TIMESTAMPTZ  -- NULL = permanent
);

-- Optimized index for vector search within scope
CREATE INDEX idx_memories_org_embedding
    ON memories USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

Permission Model (FUTURE)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         ORG SCOPE                           β”‚
β”‚  visibility = 'org'                                         β”‚
β”‚  Accessible to all projects/sessions in org                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚                   PROJECT SCOPE                      β”‚   β”‚
β”‚  β”‚  visibility = 'project'                              β”‚   β”‚
β”‚  β”‚  Accessible to all sessions in project               β”‚   β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚   β”‚
β”‚  β”‚  β”‚              SESSION SCOPE                   β”‚    β”‚   β”‚
β”‚  β”‚  β”‚  visibility = 'session'                      β”‚    β”‚   β”‚
β”‚  β”‚  β”‚  Only accessible to current session          β”‚    β”‚   β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Scoped Retrieval (FUTURE)

The retrieval function respects permission boundaries:

CREATE OR REPLACE FUNCTION search_memories(
    p_org_id UUID,
    p_project_id UUID,
    p_session_id UUID,
    p_query_embedding vector(1536),
    p_limit INT DEFAULT 10,
    p_memory_types TEXT[] DEFAULT NULL,
    p_min_similarity FLOAT DEFAULT 0.7
)
RETURNS TABLE (
    id UUID,
    content TEXT,
    memory_type TEXT,
    similarity FLOAT,
    scope TEXT
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        m.id,
        m.content,
        m.memory_type,
        1 - (m.embedding <=> p_query_embedding) as similarity,
        CASE
            WHEN m.session_id = p_session_id THEN 'session'
            WHEN m.project_id = p_project_id THEN 'project'
            ELSE 'org'
        END as scope
    FROM memories m
    WHERE
        m.org_id = p_org_id
        AND (
            (m.visibility = 'session' AND m.session_id = p_session_id)
            OR (m.visibility = 'project' AND (m.project_id = p_project_id OR m.project_id IS NULL))
            OR m.visibility = 'org'
        )
        AND (m.expires_at IS NULL OR m.expires_at > NOW())
        AND (p_memory_types IS NULL OR m.memory_type = ANY(p_memory_types))
        AND (1 - (m.embedding <=> p_query_embedding)) >= p_min_similarity
    ORDER BY
        CASE
            WHEN m.session_id = p_session_id THEN 1
            WHEN m.project_id = p_project_id THEN 2
            ELSE 3
        END,
        m.embedding <=> p_query_embedding
    LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;

Memory Types (FUTURE)

Type Description Default Visibility TTL
fact Learned information project permanent
decision Architectural choices project permanent
context Conversation context session 24h
preference User preferences project permanent
policy Org-wide rules org permanent

Retrieval Patterns (FUTURE)

1. Session Context Retrieval

# Get relevant context for current conversation
memories = await store.search(
    query="user preferences for code style",
    org_id=org_id,
    project_id=project_id,
    session_id=session_id,
    types=['preference', 'context'],
    limit=5
)

2. Project Knowledge Base

# Search project documentation and decisions
memories = await store.search(
    query="authentication implementation",
    org_id=org_id,
    project_id=project_id,
    types=['decision', 'fact'],
    limit=10
)

3. Org-Wide Policies

# Find org-level standards and policies
memories = await store.search(
    query="security requirements",
    org_id=org_id,
    types=['policy'],
    limit=5
)

Implementation Roadmap

Phase What's Added Dependencies Status
Current Session memory via git repos + .claude/ - IMPLEMENTED
Phase 1 User preferences table in Supabase User accounts PLANNED
Phase 2 pgvector extension for semantic search Supabase Pro PLANNED
Phase 3 Memory extraction from conversations LLM pipeline PLANNED
Phase 4 Cross-session memory injection Retrieval function PLANNED
Phase 5 Org-wide memory with permissions Multi-tenancy FUTURE

Design Principles

  1. Layer appropriately - Store at the narrowest scope that makes sense
  2. Extract selectively - Not everything is worth remembering
  3. Retrieve relevantly - Only inject what helps the current task
  4. Decay gracefully - Old memories may be wrong
  5. Respect privacy - Users control their own data
  6. Resolve conflicts - Clear rules when memories contradict

Common Anti-Patterns

Problem Symptom Solution
Memory Hoarding Slow retrieval, noisy context Selective extraction
Stale Memories Wrong preferences applied Decay policies, validation
Scope Leakage User A sees User B's data Enforce scope at query time
Over-Retrieval 100 memories for simple question Tiered injection
No User Control "What does it know about me?" Memory dashboard

Memory transforms AI from a stateless tool into an intelligent partner. The goal isn't perfect recallβ€”it's the right memory at the right time.

Currently, Decision AI implements session-level memory. Cross-session memory is planned for future development based on the pgvector schema design already in the codebase.

Tool Collection Pattern

Organizing and managing tool access in Decision AI


The Core Problem

Tools are atomic. But agents need organized access to many tools.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              NAIVE: Flat List                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  [read, write, search, git, api, db, test, deploy...]       β”‚
β”‚                                                              β”‚
β”‚  Problems:                                                   β”‚
β”‚  β€’ Token overhead (each tool = ~100 tokens)                 β”‚
β”‚  β€’ Decision fatigue for the LLM                             β”‚
β”‚  β€’ No semantic grouping                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              BETTER: Tool Collection                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  files/     code/      search/     deploy/                  β”‚
β”‚  β”œβ”€read     β”œβ”€run      β”œβ”€grep      β”œβ”€push                   β”‚
β”‚  β”œβ”€write    β”œβ”€lint     β”œβ”€glob      β”œβ”€rollback               β”‚
β”‚  └─delete   └─test     └─semantic  └─scale                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Decision AI Tool Inventory (IMPLEMENTED)

Decision AI has a specific set of tools organized by context and role:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    DECISION AI TOOLS BY CONTEXT                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  ╔═══════════════════════════════════════════════════════════════════════╗ β”‚
β”‚  β•‘  ORCHESTRATOR CONTEXT (workflow_tools MCP)                            β•‘ β”‚
β”‚  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• β”‚
β”‚                                                                             β”‚
β”‚  Session Management:                                                         β”‚
β”‚  β”œβ”€β”€ CUSTOM_FLY_LAUNCH_SESSION      # Launch from saved template           β”‚
β”‚  β”œβ”€β”€ CUSTOM_FLY_GET_SESSION_STATUS  # Check session health                 β”‚
β”‚  β”œβ”€β”€ CUSTOM_FLY_LIST_SESSIONS       # List active sessions                 β”‚
β”‚  └── CUSTOM_FLY_STOP_SESSION        # Destroy session                      β”‚
β”‚                                                                             β”‚
β”‚  Builder Lifecycle:                                                          β”‚
β”‚  β”œβ”€β”€ CUSTOM_FLY_LAUNCH_BUILDER      # Spawn ephemeral builder session      β”‚
β”‚  └── CUSTOM_ACP_SEND_MESSAGE        # Send instructions to any session     β”‚
β”‚                                                                             β”‚
β”‚  Template Management:                                                        β”‚
β”‚  β”œβ”€β”€ CUSTOM_FLY_LIST_TEMPLATES      # Show available templates             β”‚
β”‚  β”œβ”€β”€ CUSTOM_FLY_SAVE_TEMPLATE       # Save build as reusable template      β”‚
β”‚  └── CUSTOM_FLY_DELETE_TEMPLATE     # Remove saved template                β”‚
β”‚                                                                             β”‚
β”‚  Image Management:                                                           β”‚
β”‚  └── CUSTOM_FLY_LIST_IMAGES         # Show available Docker images         β”‚
β”‚                                                                             β”‚
β”‚  Human Interaction:                                                          β”‚
β”‚  β”œβ”€β”€ wait_for_human_decision        # Present options, wait for choice     β”‚
β”‚  └── wait_for_human_input           # Free-form input from user            β”‚
β”‚                                                                             β”‚
β”‚  ╔═══════════════════════════════════════════════════════════════════════╗ β”‚
β”‚  β•‘  BUILDER CONTEXT (Claude Code tools)                                  β•‘ β”‚
β”‚  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• β”‚
β”‚                                                                             β”‚
β”‚  File Operations:                                                            β”‚
β”‚  β”œβ”€β”€ Read           # Read file contents                                   β”‚
β”‚  β”œβ”€β”€ Write          # Write new files                                      β”‚
β”‚  β”œβ”€β”€ Edit           # Edit existing files                                  β”‚
β”‚  β”œβ”€β”€ Glob           # Find files by pattern                                β”‚
β”‚  └── Grep           # Search file contents                                 β”‚
β”‚                                                                             β”‚
β”‚  Shell:                                                                      β”‚
β”‚  └── Bash           # Execute shell commands                               β”‚
β”‚                                                                             β”‚
β”‚  Deployment (via Bash):                                                      β”‚
β”‚  β”œβ”€β”€ fly deploy --remote-only   # Deploy to Fly.io                         β”‚
β”‚  β”œβ”€β”€ gh repo create             # Create GitHub repos                      β”‚
β”‚  └── gh auth status             # Check GitHub auth                        β”‚
β”‚                                                                             β”‚
β”‚  **NOTE**: No Docker CLI - builds happen remotely on Fly's infrastructure  β”‚
β”‚                                                                             β”‚
β”‚  ╔═══════════════════════════════════════════════════════════════════════╗ β”‚
β”‚  β•‘  VOICE SUPERVISOR CONTEXT                                             β•‘ β”‚
β”‚  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• β”‚
β”‚                                                                             β”‚
β”‚  Thread Communication:                                                       β”‚
β”‚  β”œβ”€β”€ CUSTOM_VOICE_SEND_TO_THREAD    # Post messages/artifacts to thread   β”‚
β”‚  β”œβ”€β”€ CUSTOM_VOICE_MESSAGE_EDIT      # Edit previous messages              β”‚
β”‚  └── CUSTOM_VOICE_MESSAGE_DELETE    # Delete messages                     β”‚
β”‚                                                                             β”‚
β”‚  Session Management:                                                         β”‚
β”‚  β”œβ”€β”€ CUSTOM_VOICE_SESSION_WRITE     # Update curated_context              β”‚
β”‚  β”œβ”€β”€ CUSTOM_VOICE_SESSION_STOP      # End the voice session               β”‚
β”‚  └── CUSTOM_VOICE_CREATE_ARTIFACT   # Create rich artifacts               β”‚
β”‚                                                                             β”‚
β”‚  Discord Read Access:                                                        β”‚
β”‚  β”œβ”€β”€ CUSTOM_DISCORD_READ_THREAD     # Read thread history                 β”‚
β”‚  β”œβ”€β”€ CUSTOM_DISCORD_READ_CHANNEL    # Read channel history                β”‚
β”‚  └── CUSTOM_DISCORD_PARSE_LINK      # Extract content from Discord links  β”‚
β”‚                                                                             β”‚
β”‚  ╔═══════════════════════════════════════════════════════════════════════╗ β”‚
β”‚  β•‘  SESSION CONTEXT (inside user work sessions)                          β•‘ β”‚
β”‚  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• β”‚
β”‚                                                                             β”‚
β”‚  ACP Tools:                                                                  β”‚
β”‚  β”œβ”€β”€ mcp__acp__Read     # Read files via ACP                              β”‚
β”‚  β”œβ”€β”€ mcp__acp__Write    # Write files via ACP                             β”‚
β”‚  └── mcp__acp__Edit     # Edit files via ACP                              β”‚
β”‚                                                                             β”‚
β”‚  Framework Tools (varies by framework):                                      β”‚
β”‚  β”œβ”€β”€ mcp__marimo__run_cell      # Run Marimo notebook cells               β”‚
β”‚  β”œβ”€β”€ mcp__marimo__get_state     # Get variable values                     β”‚
β”‚  └── (other framework-specific tools)                                      β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

ACP Protocol: Inter-Session Communication

The Agent Communication Protocol (ACP) enables Claude sessions to talk to each other:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ACP PROTOCOL FLOW                                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   ORCHESTRATOR                               SESSION                         β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚   β”‚ Discord Bot     β”‚    WebSocket (SSE)    β”‚ Fly.io Machine  β”‚            β”‚
β”‚   β”‚ workflow_tools  β”‚ ◄───────────────────► β”‚ Claude + App    β”‚            β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                                                                             β”‚
β”‚   CUSTOM_ACP_SEND_MESSAGE Parameters:                                        β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚ {                                                                    β”‚  β”‚
β”‚   β”‚   "app_name": "mmm-abc123",        # Target session                  β”‚  β”‚
β”‚   β”‚   "message": "Run the analysis",   # Instructions for session Claude β”‚  β”‚
β”‚   β”‚   "timeout": 300                   # Wait up to 5 minutes            β”‚  β”‚
β”‚   β”‚ }                                                                    β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                             β”‚
β”‚   Response Format:                                                           β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚ {                                                                    β”‚  β”‚
β”‚   β”‚   "status": "complete",                                             β”‚  β”‚
β”‚   β”‚   "response": "Analysis complete. Results saved to output/",        β”‚  β”‚
β”‚   β”‚   "artifacts": ["output/model.pkl", "output/report.html"]           β”‚  β”‚
β”‚   β”‚ }                                                                    β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                             β”‚
β”‚   USE CASES:                                                                 β”‚
β”‚   β€’ Orchestrator β†’ Builder: "Clone this repo, deploy as Streamlit app"     β”‚
β”‚   β€’ Orchestrator β†’ Session: "Run this analysis with these parameters"      β”‚
β”‚   β€’ Supervisor β†’ Session: "Execute code in Marimo notebook"                β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Fly.io Session Management Tools

The CUSTOM_FLY_* tools manage the lifecycle of ephemeral sessions:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    FLY.IO SESSION LIFECYCLE                                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   TEMPLATE APPS (hardcoded sources):                                         β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚ mmm-studio:                                                          β”‚  β”‚
β”‚   β”‚   fly_app: pymc-mmm-studio                                          β”‚  β”‚
β”‚   β”‚   features: [marimo notebook, Claude assistant, ACP shim]            β”‚  β”‚
β”‚   β”‚                                                                      β”‚  β”‚
β”‚   β”‚ mmm-deepagent:                                                       β”‚  β”‚
β”‚   β”‚   fly_app: pymc-mmm-deepagent                                       β”‚  β”‚
β”‚   β”‚   features: [autonomous analysis, real-time progress, Svelte UI]     β”‚  β”‚
β”‚   β”‚                                                                      β”‚  β”‚
β”‚   β”‚ decision-pack-compiler:                                              β”‚  β”‚
β”‚   β”‚   fly_app: decision-pack-compiler                                   β”‚  β”‚
β”‚   β”‚   features: [Docker-in-Docker, Claude assistant, git, Fly deploy]    β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                             β”‚
β”‚   LAUNCH FLOW:                                                               β”‚
β”‚   1. CUSTOM_FLY_LAUNCH_SESSION(template="mmm-studio")                       β”‚
β”‚      β”œβ”€β”€ Create new Fly app: mmm-{random_hex}                              β”‚
β”‚      β”œβ”€β”€ Allocate IPs (shared_v4 + v6)                                     β”‚
β”‚      β”œβ”€β”€ Fetch image from template app                                     β”‚
β”‚      β”œβ”€β”€ Create machine with env vars (ANTHROPIC_API_KEY, etc.)           β”‚
β”‚      β”œβ”€β”€ Wait for machine to start                                         β”‚
β”‚      └── Return: app_name, URLs (marimo, acp)                              β”‚
β”‚                                                                             β”‚
β”‚   MANAGEMENT:                                                                β”‚
β”‚   β€’ CUSTOM_FLY_GET_SESSION_STATUS(app_name) β†’ state, region, URLs          β”‚
β”‚   β€’ CUSTOM_FLY_LIST_SESSIONS() β†’ all active mmm-* apps                     β”‚
β”‚   β€’ CUSTOM_FLY_STOP_SESSION(app_name) β†’ destroy app and resources          β”‚
β”‚                                                                             β”‚
β”‚   MACHINE SPECS:                                                             β”‚
β”‚   β€’ cpu_kind: "shared" (default) or "performance"                          β”‚
β”‚   β€’ cpus: 1, 2, 4, or 8                                                    β”‚
β”‚   β€’ memory_mb: 256 to 16384                                                β”‚
β”‚   β€’ region: iad (default), ord, lax, etc.                                  β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Tool Design Philosophy

Decision AI tools are intentionally minimal and declarative:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TOOL DESIGN PHILOSOPHY                                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  TOOLS DON'T:                           CLAUDE DECIDES:                     β”‚
β”‚  ────────────                           ───────────────                     β”‚
β”‚                                                                             β”‚
β”‚  β€’ Define Dockerfile schemas            β€’ How to structure Dockerfile       β”‚
β”‚  β€’ Specify skill parameters             β€’ What skills to generate           β”‚
β”‚  β€’ Force framework choices              β€’ Which framework detected          β”‚
β”‚  β€’ Hardcode build patterns              β€’ Best build approach for repo      β”‚
β”‚  β€’ Require specific outputs             β€’ How to format results             β”‚
β”‚                                                                             β”‚
β”‚  WHY: Builder Claude uses meta-skills to make intelligent decisions,        β”‚
β”‚  not rigid tool parameters. Tools are capabilities, not constraints.        β”‚
β”‚                                                                             β”‚
β”‚  EXAMPLE - Repo2Run Style (Puppeteer):                                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ build(repo_url, framework="streamlit", python="3.11", ...)         β”‚   β”‚
β”‚  β”‚ β†’ Rigid parameters dictate every choice                             β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β”‚  EXAMPLE - Decision AI Style (Embodied):                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ CUSTOM_ACP_SEND_MESSAGE(                                            β”‚   β”‚
β”‚  β”‚   app_name="mmm-builder-abc",                                       β”‚   β”‚
β”‚  β”‚   message="Clone github.com/user/repo and deploy as appropriate"    β”‚   β”‚
β”‚  β”‚ )                                                                    β”‚   β”‚
β”‚  β”‚ β†’ Claude analyzes repo, uses meta-skills, makes decisions          β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

MCP Integration Pattern

Decision AI uses MCP (Model Context Protocol) for tool composition:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  UNIFIED TOOL VIEW                             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚   Local     β”‚  β”‚    MCP:     β”‚  β”‚    MCP:     β”‚            β”‚
β”‚  β”‚   Tools     β”‚  β”‚   ACP       β”‚  β”‚  Framework  β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚         β”‚                β”‚                β”‚                    β”‚
β”‚         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚                          β–Ό                                     β”‚
β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”‚
β”‚              β”‚  Unified View      β”‚                            β”‚
β”‚              β”‚                    β”‚                            β”‚
β”‚              β”‚  read_file         β”‚ (local)                    β”‚
β”‚              β”‚  mcp__acp__Read    β”‚ (mcp:acp)                  β”‚
β”‚              β”‚  mcp__marimo__*    β”‚ (mcp:framework)            β”‚
β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β”‚
β”‚                                                                β”‚
β”‚  Adapters transform external MCP tools into uniform format     β”‚
β”‚                                                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Composio Integration (Third-Party Tools)

Decision AI can integrate with external services via Composio:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    COMPOSIO INTEGRATION                                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  Available Integrations:                                                     β”‚
β”‚  β”œβ”€β”€ Toggl     β†’ Time tracking (TOGGL_CREATE_TIME_ENTRY, etc.)             β”‚
β”‚  β”œβ”€β”€ Langfuse  β†’ Observability (LANGFUSE_GET_TRACES, SCORE)                β”‚
β”‚  β”œβ”€β”€ GitHub    β†’ Repository operations                                      β”‚
β”‚  └── (others)  β†’ As configured in Composio                                  β”‚
β”‚                                                                             β”‚
β”‚  How It Works:                                                               β”‚
β”‚  1. Composio wraps external APIs as MCP tools                              β”‚
β”‚  2. Tools registered in workflow_tools MCP server                          β”‚
β”‚  3. Claude calls them like any other tool                                  β”‚
β”‚  4. Composio handles auth, rate limiting, retries                          β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Role-Based Tool Composition

Different contexts get different tool sets:

Master Collection
       β”‚
       β”œβ”€β”€β–Ί Orchestrator [CUSTOM_FLY_*, CUSTOM_ACP_*, human_*]
       β”‚    └── Can spawn/destroy sessions, communicate via ACP
       β”‚
       β”œβ”€β”€β–Ί Builder      [Read, Write, Bash, fly, gh]
       β”‚    └── Can analyze repos, generate files, deploy
       β”‚
       β”œβ”€β”€β–Ί Supervisor   [CUSTOM_VOICE_*, CUSTOM_DISCORD_*]
       β”‚    └── Can post to threads, curate context
       β”‚
       └──► Session      [mcp__acp__*, mcp__{framework}__*]
            └── Can edit files, run framework code

Token Management

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              PROGRESSIVE LOADING                             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                              β”‚
β”‚   PROBLEM: 50 tools Γ— 100 tokens = 5000 tokens of overhead  β”‚
β”‚                                                              β”‚
β”‚   SOLUTION: Load only what's needed for the context         β”‚
β”‚                                                              β”‚
β”‚   Orchestrator context: ~10 tools (FLY_*, ACP_*, human_*)   β”‚
β”‚   Builder context: ~8 tools (Read, Write, Bash, etc.)       β”‚
β”‚   Session context: ~5 tools (acp__*, framework__*)          β”‚
β”‚                                                              β”‚
β”‚   Each context is optimized for its specific role           β”‚
β”‚                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Principles

Principle Implementation
Role-based composition Orchestrator vs Builder vs Session tools
Namespace isolation Avoid MCP tool name conflicts
Declarative tools Claude decides, tools execute
ACP as glue Inter-session communication standard
Token awareness Load only what's needed

Anti-Patterns

Bad Good
Flat list of 100+ tools Role-based collections
All tools always loaded Context-specific loading
Tools dictate strategy Claude decides, tools execute
Tight MCP coupling Adapter pattern
Complex tool parameters Simple, declarative tools

Trust Differentiators

Trust-building mechanisms for AI agent operations


The Trust Stack

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              LAYER 5: GOVERNANCE                         β”‚
β”‚         Who approved? What policies apply?               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚              LAYER 4: ACCOUNTABILITY                     β”‚
β”‚         Full audit trail. Reproducible results.          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚              LAYER 3: UNCERTAINTY                        β”‚
β”‚         Confidence intervals. Known limitations.         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚              LAYER 2: EXPLAINABILITY                     β”‚
β”‚         Why this answer? What factors?                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚              LAYER 1: TRANSPARENCY                       β”‚
β”‚         What data? What methodology?                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

All layers must be satisfied for trusted operation.


Decision AI Trust Implementation (CURRENT)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    DECISION AI TRUST MECHANISMS                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  GIT AS AUDIT TRAIL                                                          β”‚
β”‚  ─────────────────                                                          β”‚
β”‚  Every session gets a GitHub repo that tracks:                              β”‚
β”‚  β€’ All file changes with diffs                                              β”‚
β”‚  β€’ Timestamps for every commit                                              β”‚
β”‚  β€’ Full browsable history                                                   β”‚
β”‚  β€’ Ability to revert to any state                                           β”‚
β”‚                                                                             β”‚
β”‚  EXECUTION RULES IN CLAUDE.MD                                                β”‚
β”‚  ────────────────────────────                                               β”‚
β”‚  Sessions have explicit rules that Claude MUST follow:                      β”‚
β”‚                                                                             β”‚
β”‚  ### PROHIBITED Actions                                                      β”‚
β”‚  ❌ Running `python -c "..."` via bash                                      β”‚
β”‚  ❌ Running `python script.py` directly                                     β”‚
β”‚  ❌ Using sed/awk/grep to modify code files                                 β”‚
β”‚  ❌ Text manipulation with regex on notebook/app files                      β”‚
β”‚  ❌ Direct file edits outside ACP tools                                     β”‚
β”‚  ❌ Any bash command that executes Python code                              β”‚
β”‚                                                                             β”‚
β”‚  ### REQUIRED Actions                                                        β”‚
β”‚  βœ… Use ACP tools (mcp__acp__Read, Edit, Write) for file operations         β”‚
β”‚  βœ… Use framework MCP tools (mcp__{framework}__*) for interactions          β”‚
β”‚  βœ… Let framework runtime handle code execution                             β”‚
β”‚  βœ… Read current state before making changes                                β”‚
β”‚  βœ… Respond via ACP when task completes                                     β”‚
β”‚                                                                             β”‚
β”‚  SESSION ISOLATION                                                           β”‚
β”‚  ─────────────────                                                          β”‚
β”‚  Each session is:                                                            β”‚
β”‚  β€’ Separate Fly.io machine (isolated container)                             β”‚
β”‚  β€’ Separate git repo (isolated history)                                     β”‚
β”‚  β€’ Separate .claude/ (isolated instructions)                                β”‚
β”‚  No cross-contamination between sessions.                                   β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Permission as Trust Boundary

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Claude wants to:        Permission check:      OK?    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   Read file          ───► Is path allowed?       [Y/N]  β”‚
β”‚   Write file         ───► User approved?         [Y/N]  β”‚
β”‚   Execute command    ───► Command allowlisted?   [Y/N]  β”‚
β”‚   Access network     ───► Domain permitted?      [Y/N]  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key: Permission layer is the ONLY enforcement point.


Trust Escalation Patterns

Ask First (Default)

Claude: "May I write to x.py?"
User:   "Yes"
Claude: [writes file]

Pre-Authorized

User sets: "Allow writes to /app/*"
Claude: [writes to /app/x.py - auto-approved]

Yolo Mode (High Trust)

User sets: "Trust Claude fully"
Claude: [any operation - auto-approved]

Risk increases with trust level.


Governance Hierarchy

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              SETTINGS.JSON (Org policy)                  β”‚
β”‚   Blocked: ["rm -rf", "sudo"]                           β”‚
β”‚   Required approval: ["network", "exec"]                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚ overrides
                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         .claude/settings.json (Project)                  β”‚
β”‚   Allow: ["npm install", "pytest"]                      β”‚
β”‚   Deny: ["npm publish"]                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚ overrides
                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           SESSION PERMISSIONS (Runtime)                  β”‚
β”‚   "Always allow read in /src"                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Rule: Lower levels can only RESTRICT, never EXPAND.


Transparency Model

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Claude's View  β”‚              β”‚  User's View    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€              β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Tool: write     β”‚   ────────►  β”‚ Claude wants to β”‚
β”‚ Path: /app/x.py β”‚              β”‚ write to x.py   β”‚
β”‚ Content: ...    β”‚              β”‚                 β”‚
β”‚                 β”‚              β”‚ [Allow] [Deny]  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

User sees WHAT, not just THAT.


Audit Trail

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    AUDIT LOG                             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  2024-01-15 10:23:45                                    β”‚
β”‚  β”œβ”€ Tool: bash                                          β”‚
β”‚  β”œβ”€ Command: npm install lodash                         β”‚
β”‚  β”œβ”€ Permission: pre-authorized (allowlist)              β”‚
β”‚  └─ Result: success                                     β”‚
β”‚                                                          β”‚
β”‚  2024-01-15 10:24:12                                    β”‚
β”‚  β”œβ”€ Tool: write                                         β”‚
β”‚  β”œβ”€ Path: /src/utils.js                                 β”‚
β”‚  β”œβ”€ Permission: user-approved (session)                 β”‚
β”‚  └─ Diff: +15 lines, -3 lines                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Everything is logged. In Decision AI, git history provides this audit trail.


Sandboxing: Defense in Depth

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    HOST SYSTEM                           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚                   FLY.IO MACHINE                   β”‚  β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚  β”‚
β”‚  β”‚  β”‚                 CONTAINER                    β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  β”‚           CLAUDE SESSION               β”‚  β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  β”‚  β€’ Limited filesystem (git-tracked)    β”‚  β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  β”‚  β€’ Network via ACP only               β”‚  β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  β”‚  β€’ No privileged ops                   β”‚  β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  β”‚  β€’ Framework runtime sandbox           β”‚  β”‚  β”‚  β”‚
β”‚  β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  β”‚  β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Even if Claude "escapes" permissions, Fly.io container contains damage.


Decision AI Specific: Embodied vs Puppeteer Trust

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TRUST IMPLICATIONS OF EMBODIED APPROACH                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  PUPPETEER APPROACH (Repo2Run-style)                                         β”‚
β”‚  ───────────────────────────────────                                        β”‚
β”‚  🧠 External LLM β†’ πŸ“¦ Dumb container                                        β”‚
β”‚                                                                             β”‚
β”‚  Trust model:                                                                β”‚
β”‚  β€’ External LLM has full control                                            β”‚
β”‚  β€’ Container is passive                                                      β”‚
β”‚  β€’ Easy to snapshot/rollback (external can capture state)                   β”‚
β”‚  β€’ Deterministic behavior                                                   β”‚
β”‚                                                                             β”‚
β”‚  EMBODIED APPROACH (Decision AI)                                             β”‚
β”‚  ───────────────────────────────                                            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                                   β”‚
β”‚  β”‚ 🧠 Claude INSIDE     β”‚                                                   β”‚
β”‚  β”‚ πŸ“¦ Container (body)  β”‚                                                   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                   β”‚
β”‚                                                                             β”‚
β”‚  Trust model:                                                                β”‚
β”‚  β€’ Claude has autonomy within session                                       β”‚
β”‚  β€’ Trust enforced via:                                                      β”‚
β”‚    - CLAUDE.md execution rules (what Claude believes it should do)          β”‚
β”‚    - ACP tool restrictions (what Claude can actually do)                    β”‚
β”‚    - Fly.io container isolation (blast radius containment)                  β”‚
β”‚    - Git tracking (full audit trail)                                        β”‚
β”‚  β€’ Requires orchestrator for external rollback                              β”‚
β”‚  β€’ Adaptive behavior (can reason about problems)                            β”‚
β”‚                                                                             β”‚
β”‚  TRADE-OFF:                                                                  β”‚
β”‚  Embodied is more capable but requires more sophisticated trust controls    β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

User Control Primitives

Control Meaning
Allow once This action, this time
Allow always This action type, this session
Allow pattern Matching actions, forever
Deny Block this action
Deny pattern Block all matching, forever
Review Show what happened
Undo Reverse last action (git revert in Decision AI)

Granular control at every level.


The Trust Equation

EFFECTIVE TRUST = MODEL CAPABILITY Γ— PERMISSION BOUNDARY

High capability + Tight boundary = Safe power
High capability + No boundary    = Dangerous
Low capability  + Tight boundary = Limited but safe

Decision AI approach: High capability (full Claude Sonnet/Opus), explicit boundaries (CLAUDE.md rules + ACP tools + container isolation).


Key Principles

Principle Implementation
Show work Methodology + data sources (git history)
Quantify uncertainty Confidence intervals in analysis
Acknowledge limits Explicit caveats in CLAUDE.md
Enable reproduction Full git history + template relaunch
Human oversight Orchestrator can intervene via ACP

Anti-Patterns

Bad Better
"The answer is X" "X with Y% confidence"
Black box Full git transparency
No audit trail Git tracks everything
Auto-everything Risk-based approval
Overpromising Honest limitations in CLAUDE.md
Trust by default Explicit permission boundaries

Evaluation Framework

Testing and evaluation methodology for Decision AI


Why AI Evaluation is Different

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   TRADITIONAL SOFTWARE   β”‚      AI SYSTEMS          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   Deterministic          β”‚   Probabilistic          β”‚
β”‚   assert == expected     β”‚   result varies          β”‚
β”‚                          β”‚                          β”‚
β”‚   Binary correctness     β”‚   Degrees of quality     β”‚
β”‚   pass/fail              β”‚   good/better/best       β”‚
β”‚                          β”‚                          β”‚
β”‚   Exact matching         β”‚   Semantic equivalence   β”‚
β”‚   "hello" == "hello"     β”‚   "hello" β‰ˆ "hi there"  β”‚
β”‚                          β”‚                          β”‚
β”‚   Known edge cases       β”‚   Unbounded input space  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Decision AI Evaluation Strategy

Decision AI has multiple evaluation concerns across its components:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    DECISION AI EVALUATION MATRIX                             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  BUILDER EVALUATION                                                          β”‚
β”‚  ─────────────────                                                          β”‚
β”‚  Does Builder correctly:                                                     β”‚
β”‚  β€’ Detect frameworks? (Marimo, Streamlit, FastAPI, etc.)                    β”‚
β”‚  β€’ Generate valid Dockerfiles?                                              β”‚
β”‚  β€’ Generate appropriate .claude/CLAUDE.md?                                  β”‚
β”‚  β€’ Respect existing .claude/ in repos?                                      β”‚
β”‚  β€’ Create working deployments?                                              β”‚
β”‚  β€’ Include ACP server in production images?                                 β”‚
β”‚                                                                             β”‚
β”‚  SESSION EVALUATION                                                          β”‚
β”‚  ─────────────────                                                          β”‚
β”‚  Does session Claude correctly:                                              β”‚
β”‚  β€’ Follow execution rules in CLAUDE.md?                                     β”‚
β”‚  β€’ Use ACP tools instead of raw Python?                                     β”‚
β”‚  β€’ Use framework-specific MCP tools?                                        β”‚
β”‚  β€’ Respond via ACP when done?                                               β”‚
β”‚  β€’ Handle long-running tasks with progress updates?                         β”‚
β”‚                                                                             β”‚
β”‚  VOICE EVALUATION                                                            β”‚
β”‚  ────────────────                                                           β”‚
β”‚  Does voice architecture correctly:                                          β”‚
β”‚  β€’ Achieve low-latency responses (Fast Agent)?                              β”‚
β”‚  β€’ Curate context effectively (Supervisor)?                                 β”‚
β”‚  β€’ Post appropriate artifacts to thread?                                    β”‚
β”‚  β€’ Handle mode transitions (discovery ↔ delivery)?                          β”‚
β”‚                                                                             β”‚
β”‚  ORCHESTRATOR EVALUATION                                                     β”‚
β”‚  ──────────────────────                                                     β”‚
β”‚  Does orchestrator correctly:                                                β”‚
β”‚  β€’ Route "build X" to builder launch?                                       β”‚
β”‚  β€’ Route messages to appropriate workflows?                                 β”‚
β”‚  β€’ Clean up sessions after completion?                                      β”‚
β”‚  β€’ Handle errors gracefully?                                                β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Insight Recovery Benchmark

The Insight Recovery evaluation measures how well the voice system captures and surfaces insights during research conversations:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    INSIGHT RECOVERY BENCHMARK                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  WHAT IT MEASURES:                                                           β”‚
β”‚  Given a research conversation, how many of the key insights are:           β”‚
β”‚  1. Captured by the Supervisor                                              β”‚
β”‚  2. Written to curated_context                                              β”‚
β”‚  3. Posted to thread as artifacts                                           β”‚
β”‚  4. Delivered verbally by Fast Agent                                        β”‚
β”‚                                                                             β”‚
β”‚  TEST METHODOLOGY:                                                           β”‚
β”‚                                                                             β”‚
β”‚  1. PREPARE TEST CONVERSATION                                                β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚     β”‚ User: "How does the authentication system work?"                     β”‚β”‚
β”‚     β”‚ Fast Agent: "Let me look into that for you..."                       β”‚β”‚
β”‚     β”‚ [Supervisor researches, finds 5 key components]                     β”‚β”‚
β”‚     β”‚ User: "What about the JWT implementation?"                           β”‚β”‚
β”‚     β”‚ [Supervisor finds 3 more insights]                                  β”‚β”‚
β”‚     β”‚ User: "Can you summarize?"                                           β”‚β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β”‚                                                                             β”‚
β”‚  2. DEFINE EXPECTED INSIGHTS                                                 β”‚
β”‚     Ground truth: 8 key insights that should be recovered                  β”‚
β”‚     β€’ Auth middleware location                                              β”‚
β”‚     β€’ JWT secret configuration                                              β”‚
β”‚     β€’ Token refresh mechanism                                               β”‚
β”‚     β€’ ... etc.                                                              β”‚
β”‚                                                                             β”‚
β”‚  3. MEASURE RECOVERY                                                         β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚     β”‚ Curated Context Recovery:  6/8 (75%)                                β”‚β”‚
β”‚     β”‚ Thread Artifact Recovery:  7/8 (87.5%)                              β”‚β”‚
β”‚     β”‚ Verbal Summary Recovery:   5/8 (62.5%)                              β”‚β”‚
β”‚     β”‚                                                                      β”‚β”‚
β”‚     β”‚ Combined Insight Recovery Score: 75%                                 β”‚β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β”‚                                                                             β”‚
β”‚  SCORING CRITERIA:                                                           β”‚
β”‚  β€’ Exact match: 1.0 (insight captured accurately)                          β”‚
β”‚  β€’ Partial match: 0.5 (insight mentioned but incomplete)                   β”‚
β”‚  β€’ Missing: 0.0 (insight not recovered)                                    β”‚
β”‚                                                                             β”‚
β”‚  ACCEPTABLE THRESHOLDS:                                                      β”‚
β”‚  β€’ Curated Context: >70% recovery                                          β”‚
β”‚  β€’ Thread Artifacts: >80% recovery                                          β”‚
β”‚  β€’ Verbal Summary: >60% recovery (constrained by time)                     β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Insight Recovery Test Cases

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    INSIGHT RECOVERY TEST SUITE                               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  TEST 1: Codebase Exploration                                                β”‚
β”‚  ─────────────────────────────                                              β”‚
β”‚  Scenario: User asks about a codebase they haven't seen                     β”‚
β”‚  Expected insights: Architecture, key files, patterns, dependencies         β”‚
β”‚  Measure: How many architectural insights are captured                      β”‚
β”‚                                                                             β”‚
β”‚  TEST 2: Multi-Turn Research                                                 β”‚
β”‚  ───────────────────────────                                                β”‚
β”‚  Scenario: User asks 5+ follow-up questions, drilling deeper                β”‚
β”‚  Expected insights: Accumulated context from all turns                      β”‚
β”‚  Measure: Does Supervisor maintain coherent picture across turns            β”‚
β”‚                                                                             β”‚
β”‚  TEST 3: Mode Transition                                                     β”‚
β”‚  ────────────────────────                                                   β”‚
β”‚  Scenario: User transitions from discovery to delivery                      β”‚
β”‚  Expected: Summary captures key decisions, next steps                       β”‚
β”‚  Measure: Quality of transition summary                                     β”‚
β”‚                                                                             β”‚
β”‚  TEST 4: Context Window Pressure                                             β”‚
β”‚  ────────────────────────────────                                           β”‚
β”‚  Scenario: Long conversation (50+ messages)                                 β”‚
β”‚  Expected: Important early insights not lost                                β”‚
β”‚  Measure: Recovery rate of insights from first 10 messages                  β”‚
β”‚                                                                             β”‚
β”‚  TEST 5: Contradiction Handling                                              β”‚
β”‚  ──────────────────────────────                                             β”‚
β”‚  Scenario: Later finding contradicts earlier insight                        β”‚
β”‚  Expected: Supervisor updates context, notes change                         β”‚
β”‚  Measure: Does final context reflect correct information                    β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Builder Verification Tests

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    BUILDER VERIFICATION TESTS                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  TEST 1: Framework Detection                                                 β”‚
β”‚  Input: Various repos (Marimo, Streamlit, FastAPI, Gradio, etc.)            β”‚
β”‚  Expected: Correct framework identified                                      β”‚
β”‚  Scorer: Exact match on detected framework                                  β”‚
β”‚                                                                             β”‚
β”‚  TEST 2: Dockerfile Generation                                               β”‚
β”‚  Input: Repo with requirements.txt                                          β”‚
β”‚  Expected: Valid Dockerfile that builds                                     β”‚
β”‚  Scorer: fly deploy --remote-only succeeds                                  β”‚
β”‚                                                                             β”‚
β”‚  TEST 3: Existing .claude/ Preservation                                      β”‚
β”‚  Input: Repo with custom .claude/CLAUDE.md                                  β”‚
β”‚  Expected: Merged output contains user's custom instructions                β”‚
β”‚  Scorer: Contains check + semantic similarity                               β”‚
β”‚                                                                             β”‚
β”‚  TEST 4: ACP Server Inclusion                                                β”‚
β”‚  Input: Any repo                                                            β”‚
β”‚  Expected: Production image includes ACP server                             β”‚
β”‚  Scorer: ACP endpoint responds on deployed image                            β”‚
β”‚                                                                             β”‚
β”‚  TEST 5: Git Repo Creation                                                   β”‚
β”‚  Input: Build request                                                       β”‚
β”‚  Expected: GitHub repo created with proper structure                        β”‚
β”‚  Scorer: gh repo view succeeds, has expected files                          β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Session Compliance Tests

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SESSION COMPLIANCE TESTS                                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  TEST 1: Execution Rule Compliance                                           β”‚
β”‚  ────────────────────────────────                                           β”‚
β”‚  Input: Task that could be done with raw Python                             β”‚
β”‚  Expected: Claude uses ACP tools, not python -c                             β”‚
β”‚  Scorer: No Bash(python) calls in trace                                     β”‚
β”‚                                                                             β”‚
β”‚  TEST 2: Framework Tool Usage                                                β”‚
β”‚  ─────────────────────────────                                              β”‚
β”‚  Input: "Run this cell in the notebook"                                     β”‚
β”‚  Expected: Uses mcp__marimo__run_cell                                       β”‚
β”‚  Scorer: Correct MCP tool called                                            β”‚
β”‚                                                                             β”‚
β”‚  TEST 3: ACP Response                                                        β”‚
β”‚  ────────────────────                                                       β”‚
β”‚  Input: Task via ACP message                                                β”‚
β”‚  Expected: Session responds via ACP when complete                           β”‚
β”‚  Scorer: ACP response received within timeout                               β”‚
β”‚                                                                             β”‚
β”‚  TEST 4: Progress Updates                                                    β”‚
β”‚  ─────────────────────                                                      β”‚
β”‚  Input: Long-running task (>30s)                                            β”‚
β”‚  Expected: Progress updates sent periodically                               β”‚
β”‚  Scorer: Multiple ACP messages before completion                            β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Evaluation Pipeline

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    EVALUATION PIPELINE                                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   TEST DATASET                                                               β”‚
β”‚   [Case 1] [Case 2] [Case 3] ... [Case N]                                   β”‚
β”‚        β”‚                                                                    β”‚
β”‚        β–Ό                                                                    β”‚
β”‚   SYSTEM UNDER TEST                                                          β”‚
β”‚   Run inputs, capture: output, latency, tokens, cost                        β”‚
β”‚        β”‚                                                                    β”‚
β”‚        β–Ό                                                                    β”‚
β”‚   SCORERS                                                                    β”‚
β”‚   β”œβ”€β”€ Exact Match (deterministic)                                          β”‚
β”‚   β”œβ”€β”€ Semantic (embedding-based)                                           β”‚
β”‚   β”œβ”€β”€ LLM-as-Judge (model-based)                                           β”‚
β”‚   └── Domain-Specific (framework detection, build success, etc.)           β”‚
β”‚        β”‚                                                                    β”‚
β”‚        β–Ό                                                                    β”‚
β”‚   AGGREGATION                                                                β”‚
β”‚   Combine scores, compare baseline, detect regressions                      β”‚
β”‚        β”‚                                                                    β”‚
β”‚        β–Ό                                                                    β”‚
β”‚   REPORTING                                                                  β”‚
β”‚   PR comments, dashboards, alerts                                           β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Scorer Types

Exact Match (Deterministic)

String match:   output == expected
JSON match:     json_equal(output, expected)
Contains:       expected in output

Semantic (Embedding-based)

Similarity:     cosine(embed(output), embed(expected))
Entailment:     does output logically follow?

LLM-as-Judge (Model-based)

Factuality:     "Is this factually correct?"
Helpfulness:    "Does this answer the question?"
Safety:         "Does this contain harmful content?"

Domain-Specific (Custom)

Framework detection:   Correctly identified framework?
Dockerfile validity:   Generated Dockerfile builds?
CLAUDE.md quality:     Contains required sections?
Tool compliance:       Used ACP tools, not raw python?
Build success:         Deployment healthy?
Insight recovery:      Key insights captured?

Regression Detection

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   BASELINE vs CURRENT                                                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Metric               Baseline   Current    Status                         β”‚
β”‚   ──────────────       ────────   ───────    ──────                         β”‚
β”‚   Framework Detection  100%       95%        ↓ WARN                         β”‚
β”‚   Build Success        90%        92%        ↑ GOOD                         β”‚
β”‚   Deploy Time          2.3min     4.1min     ↓ FAIL                         β”‚
β”‚   Insight Recovery     72%        78%        ↑ GOOD                         β”‚
β”‚   Tool Compliance      98%        99%        ↑ GOOD                         β”‚
β”‚   Cost/build           $0.02      $0.03      ↓ WARN                         β”‚
β”‚                                                                             β”‚
β”‚   Use statistical significance. Don't alert on noise.                       β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Golden Dataset Composition

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              GOLDEN TEST DATASET                                             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   HAPPY PATH (40%)                                                          β”‚
β”‚   β€’ Typical repos, expected use cases                                       β”‚
β”‚   β€’ Simple Marimo notebooks                                                 β”‚
β”‚   β€’ Standard FastAPI apps                                                   β”‚
β”‚   β€’ Normal voice conversations                                              β”‚
β”‚                                                                             β”‚
β”‚   EDGE CASES (25%)                                                          β”‚
β”‚   β€’ Multi-framework repos                                                   β”‚
β”‚   β€’ Missing requirements.txt                                                β”‚
β”‚   β€’ Unusual directory structures                                            β”‚
β”‚   β€’ Long voice sessions (context pressure)                                  β”‚
β”‚                                                                             β”‚
β”‚   ADVERSARIAL (15%)                                                         β”‚
β”‚   β€’ Conflicting Dockerfiles                                                 β”‚
β”‚   β€’ Malformed .claude/ directories                                          β”‚
β”‚   β€’ Very large repos                                                        β”‚
β”‚   β€’ Contradictory research findings                                         β”‚
β”‚                                                                             β”‚
β”‚   REGRESSION (20%)                                                          β”‚
β”‚   β€’ Cases from past build failures                                          β”‚
β”‚   β€’ Production incidents                                                    β”‚
β”‚   β€’ Known edge cases                                                        β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

CI/CD Integration

[Code Change] ──► [PR Created]
                       β”‚
                       β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  FAST EVAL      β”‚  < 5 minutes
              β”‚  β€’ Smoke tests  β”‚
              β”‚  β€’ 20 cases     β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β–Ό                     β–Ό
       [PASS]              [FAIL: Block PR]
            β”‚
            β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚  FULL EVAL      β”‚  Async
   β”‚  β€’ All scorers  β”‚
   β”‚  β€’ 150 cases    β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
            β”‚
            β–Ό
   Results posted to PR

Metrics That Matter

Metric What it measures Decision AI specific
Pass rate % tasks completed successfully Build success rate
Insight recovery % key insights captured Voice system quality
Token efficiency Tokens per successful task Tokens per build
Time to solve Wall clock per task Deploy time
Tool compliance % correct tool usage ACP vs raw python
Consistency Variance across identical tasks Same repo = same result

Human Evaluation

When to use:

Automated Only            Human Required
──────────────            ──────────────
β€’ Format correctness      β€’ CLAUDE.md quality
β€’ Build success           β€’ Skill appropriateness
β€’ Deployment health       β€’ User experience
β€’ Safety filters          β€’ Edge case decisions
β€’ Tool compliance         β€’ Insight relevance

Key Principles

Principle Why
Statistical significance Don't alert on noise
Multi-dimensional Multiple scorers, not single metric
Regression focus Catch degradation early
Dataset versioning Content-addressed test sets
Fast feedback Quick evals block PRs
Insight recovery Voice quality is measurable

Evaluation is about confidence that the system works. In Decision AI, we measure builder reliability, session compliance, and voice insight recovery to ensure quality across all components.

Decision AI Implementation Roadmap

Executive Summary

This document provides a realistic roadmap based on the ACTUAL current state of Decision AI and the FUTURE vision from thoughts/ documents in the codebase.

KEY INSIGHT: The original roadmap described building from scratch. However, significant infrastructure ALREADY EXISTS, including Builder Claude with meta-skills, workflow routing, and voice session architecture. This updated roadmap acknowledges what's built and focuses on what's needed next.


Current State vs Future Vision

Component Status Notes
Discord Bot IMPLEMENTED Primary user interface
Workflow Executor IMPLEMENTED Claude API with workflow_tools
Workflow Classification IMPLEMENTED LLM-based routing to workflows
Fly.io Tools IMPLEMENTED CUSTOM_FLY_* tools for lifecycle management
ACP Protocol IMPLEMENTED Inter-session WebSocket communication
Session Templates IMPLEMENTED Supabase database records
Builder Claude IMPLEMENTED Meta-skills for .claude/ construction
Git as Source of Truth IMPLEMENTED Session repos track all changes
Builder Lifecycle IMPLEMENTED Spawn, use, cleanup ephemeral builders
Voice Session Architecture IMPLEMENTED 3-Claude: Fast Agent + Supervisor + Session
Supervisor Loop IMPLEMENTED Background polling, context curation
Protocol Layer IMPLEMENTED Standardized contract wrapping Claude Agent SDK
Discord Adapter IMPLEMENTED Platform adapter for Discord
Decision Packs PLANNED GitHub repos as deployable units
Pack Registry PLANNED Searchable index
Memory Layer PLANNED Cross-session persistence
Slack Adapter PLANNED Platform adapter for Slack (enterprise)
Web/Mobile Adapter PLANNED Browser and mobile interfaces

Current Architecture (What EXISTS)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    DECISION AI - CURRENT STATE                               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   Discord User (text or voice)                                              β”‚
β”‚        β”‚                                                                    β”‚
β”‚        β–Ό                                                                    β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚                    DISCORD BOT                                       β”‚  β”‚
β”‚   β”‚                                                                      β”‚  β”‚
β”‚   β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                       β”‚  β”‚
β”‚   β”‚  β”‚ Text Workflow    β”‚    β”‚ Voice Workflow   β”‚                       β”‚  β”‚
β”‚   β”‚  β”‚ Executor         β”‚    β”‚ Executor         β”‚                       β”‚  β”‚
β”‚   β”‚  β”‚ β€’ Classify msg   β”‚    β”‚ β€’ Fast Agent     β”‚                       β”‚  β”‚
β”‚   β”‚  β”‚ β€’ Route to flow  β”‚    β”‚ β€’ Supervisor     β”‚                       β”‚  β”‚
β”‚   β”‚  β”‚ β€’ Execute tools  β”‚    β”‚ β€’ Thread output  β”‚                       β”‚  β”‚
β”‚   β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                       β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚            β”‚                           β”‚                                    β”‚
β”‚            β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”‚
β”‚            β–Ό                                                        β–Ό       β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”‚
β”‚   β”‚ BUILDER SESSION β”‚    β”‚  WORK SESSION   β”‚    β”‚ SUPABASE        β”‚       β”‚
β”‚   β”‚ (mmm-builder-*) β”‚    β”‚  (mmm-*)        β”‚    β”‚ (templates,     β”‚       β”‚
β”‚   β”‚                 β”‚    β”‚                 β”‚    β”‚  workflows)     β”‚       β”‚
β”‚   β”‚ β€’ Clone repo    β”‚    β”‚ β€’ User's code   β”‚    β”‚                 β”‚       β”‚
β”‚   β”‚ β€’ Analyze       β”‚    β”‚ β€’ Framework     β”‚    β”‚ β€’ Metadata      β”‚       β”‚
β”‚   β”‚ β€’ Gen .claude/  β”‚    β”‚ β€’ ACP server    β”‚    β”‚ β€’ System promptsβ”‚       β”‚
β”‚   β”‚ β€’ Deploy        β”‚    β”‚ β€’ Git-tracked   β”‚    β”‚ β€’ Configs       β”‚       β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β”‚
β”‚            β”‚                      β”‚                                        β”‚
β”‚            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                        β”‚
β”‚                      β”‚                                                     β”‚
β”‚                      β–Ό                                                     β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚                    GITHUB SESSION REPOS                              β”‚  β”‚
β”‚   β”‚                    github.com/org/session-mmm-{hex}                  β”‚  β”‚
β”‚   β”‚                    β€’ All changes tracked                             β”‚  β”‚
β”‚   β”‚                    β€’ Tags for templates                              β”‚  β”‚
β”‚   β”‚                    β€’ Full browsable history                          β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implemented Components

Core Infrastructure

Component Location Description
Discord Bot app/src/services/discord/ Primary user interface
Workflow Executor app/src/services/workflow_executor.py Claude API with tool execution
Workflow Classification app/src/llm_calls/classify_workflows.py LLM-based message β†’ workflow routing
Fly.io Tools app/src/services/workflow_tools/fly_app_tools.py Session lifecycle management
ACP Tools app/src/services/workflow_tools/acp_tools.py Inter-session communication
Human Interaction app/src/services/workflow_tools/human_interaction.py User decision prompts

Builder Claude Meta-Skills

Meta-Skill Purpose Status
claude-factory Orchestrate entire .claude/ construction βœ… IMPLEMENTED
skill-creation Generate skills from repo analysis βœ… IMPLEMENTED
claude-md-templates Generate CLAUDE.md with execution rules βœ… IMPLEMENTED
merge-strategy Combine repo's .claude/ with base βœ… IMPLEMENTED
dockerfile-gen Framework-specific Dockerfile patterns βœ… IMPLEMENTED
fly-deploy Fly.io deployment patterns βœ… IMPLEMENTED

Voice Session Architecture

Component Location Description
Voice Session Manager app/src/services/voice/session_manager.py Manages VoiceSession state
Fast Agent app/src/services/voice/fast_agent.py Low-latency Claude Haiku responses
Supervisor Loop app/src/services/voice/supervisor_loop.py Background context curation
Voice Tools app/src/services/workflow_tools/voice_tools.py Thread posting, context writing

What's NOT Built Yet

Based on thoughts/ documents and analysis:

Component Status Description
Decision Packs ❌ NOT YET Pre-built GitHub repos with pack.yaml manifests
Pack Registry ❌ NOT YET Searchable index of available packs
Pack Discovery ❌ NOT YET Trigger-based pack matching
Memory Layer ❌ NOT YET Cross-session user/org memory (pgvector)
Web UI ❌ NOT YET Browser-based interface
Multi-Participant Voice ❌ NOT YET Collaborative voice sessions

Implementation Roadmap

Phase 0: Foundation (COMPLETED βœ…)

Already implemented:

  • βœ… Discord bot interface
  • βœ… Workflow executor with Claude API
  • βœ… Workflow classification (LLM-based routing)
  • βœ… Fly.io deployment tools (CUSTOM_FLY_*)
  • βœ… ACP inter-session communication
  • βœ… Supabase database (templates, workflows, sessions)
  • βœ… Human interaction tools
  • βœ… Builder Claude with meta-skills
  • βœ… Git session repos as source of truth
  • βœ… Voice session architecture (3-Claude)
  • βœ… Supervisor Loop for context curation
  • βœ… Configuration spectrum (hardcoded β†’ dynamic)

Phase 1: Stabilization & Polish (2-3 Weeks)

Goal: Harden existing implementation, improve reliability

Task Description Priority
Error Handling Improve builder and voice error recovery HIGH
Build Caching Don't rebuild if image already exists MEDIUM
Template UX Better template listing/selection in Discord MEDIUM
Voice Reliability Handle STT/TTS failures gracefully HIGH
Monitoring Fly.io logs, health checks, alerts HIGH
Documentation User-facing docs for Discord commands MEDIUM

Phase 2: Enhanced Templates (2-3 Weeks)

Goal: Move templates from pure database to hybrid (DB + Git)

Task Description Dependencies
Git URL Field Add git_repo_url to session_templates Migration
Template Cloning If git_url present, clone and use .claude/ Builder
Template Versioning Track template versions with git refs Git repos
Template Sharing Export templates as git repositories Builder
-- Migration: Add git support to templates
ALTER TABLE session_templates
ADD COLUMN git_repo_url TEXT,
ADD COLUMN git_ref TEXT DEFAULT 'main',
ADD COLUMN use_git_config BOOLEAN DEFAULT FALSE;

Phase 3: Voice Enhancement (3-4 Weeks)

Goal: Improve voice experience based on usage patterns

Task Description Priority
Insight Recovery Benchmark and improve insight capture HIGH
Mode Detection Automatic discovery ↔ delivery transitions MEDIUM
Context Compression Handle longer conversations better MEDIUM
Multi-User Voice Multiple participants in voice channel LOW
Voice Commands "Claude, search for..." style triggers MEDIUM
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    PHASE 3: VOICE ENHANCEMENT                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  INSIGHT RECOVERY IMPROVEMENTS:                                              β”‚
β”‚  β”œβ”€β”€ Add structured output to Supervisor                                   β”‚
β”‚  β”œβ”€β”€ Track insight provenance (where did this come from?)                  β”‚
β”‚  β”œβ”€β”€ Deduplicate similar insights                                          β”‚
β”‚  └── Score insights by relevance/importance                                β”‚
β”‚                                                                             β”‚
β”‚  MODE DETECTION:                                                             β”‚
β”‚  β”œβ”€β”€ "I want to explore" β†’ discovery mode                                  β”‚
β”‚  β”œβ”€β”€ "Give me the summary" β†’ delivery mode                                 β”‚
β”‚  β”œβ”€β”€ Questions β†’ discovery                                                 β”‚
β”‚  └── Requests β†’ delivery                                                   β”‚
β”‚                                                                             β”‚
β”‚  CONTEXT COMPRESSION:                                                        β”‚
β”‚  β”œβ”€β”€ Summarize older conversation chunks                                   β”‚
β”‚  β”œβ”€β”€ Keep recent 10 messages in full                                       β”‚
β”‚  └── Use embeddings for semantic deduplication                             β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Phase 4: Pack Foundation (4-6 Weeks)

Goal: Implement the "Decision Pack" vision from research docs

Task Description Dependencies
Pack Schema Define pack.yaml manifest format None
Pack CLI decision-pack create, deploy, list Pack schema
Pack Builder Build containers from pack repos Builder
Pack Registry JSON index of available packs GitHub repos
Pack Repository Structure:

pack-{name}/
β”œβ”€β”€ pack.yaml           # Manifest with triggers, deps
β”œβ”€β”€ Dockerfile          # Container definition
β”œβ”€β”€ fly.toml            # Fly.io config
β”œβ”€β”€ .claude/
β”‚   β”œβ”€β”€ CLAUDE.md       # System prompt + execution rules
β”‚   └── skills/         # Additional skills
β”œβ”€β”€ acp-server/         # ACP communication
└── app/                # Application code

DIFFERENCE FROM CURRENT:
Current: Builder generates everything from any repo
Packs: Pre-defined, pre-built, instant launch

Phase 5: Memory Layer (6-8 Weeks)

Goal: Cross-session memory persistence

Task Description Dependencies
pgvector Setup Enable vector extension in Supabase Supabase Pro
Memory Extraction LLM-based extraction from conversations Pipeline
Memory Retrieval Scoped search with permissions Database
Memory Injection RAG-based context enhancement Retrieval
-- From MEMORY_LAYER_ARCHITECTURE.md design
CREATE TABLE memories (
    id UUID PRIMARY KEY,
    org_id UUID NOT NULL,
    project_id UUID,
    session_id UUID,
    content TEXT NOT NULL,
    embedding vector(1536) NOT NULL,
    memory_type TEXT NOT NULL,  -- 'fact', 'decision', 'preference'
    visibility TEXT DEFAULT 'project',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    expires_at TIMESTAMPTZ
);

Phase 6: Multi-Platform Expansion (KEY USP) (6-8 Weeks)

Goal: Extend beyond Discord to Slack and Web interfaces

CRITICAL DIFFERENTIATOR: This phase leverages our platform-agnostic architecture. The Protocol Layer wrapping Claude Agent SDK enables rapid platform expansion.

Task Description Dependencies
Protocol Layer Stabilization Finalize standardized contract between adapters and orchestrator ACP
Slack Adapter Enterprise-ready Slack integration Protocol Layer
Web REST/WS API Browser-accessible interface Protocol Layer
Mobile API Foundation iOS/Android compatible endpoints Web API
Cross-Platform Sessions Same session accessible from multiple platforms Protocol Layer
Authentication Bridge Unified identity across platforms OAuth
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    PHASE 6: MULTI-PLATFORM EXPANSION (KEY USP)               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   PROTOCOL LAYER (Already Implemented Conceptually):                         β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚  Standardized Contract wrapping Claude Agent SDK                     β”‚   β”‚
β”‚   β”‚  β€’ Message normalization                                            β”‚   β”‚
β”‚   β”‚  β€’ Session abstraction                                              β”‚   β”‚
β”‚   β”‚  β€’ Tool call mapping                                                β”‚   β”‚
β”‚   β”‚  β€’ Context preservation                                             β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                              β”‚                                              β”‚
β”‚           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”‚
β”‚           β–Ό                  β–Ό                  β–Ό                  β–Ό       β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚ DISCORD       β”‚  β”‚ SLACK         β”‚  β”‚ WEB           β”‚  β”‚ MOBILE    β”‚  β”‚
β”‚   β”‚ (Week 0-2)    β”‚  β”‚ (Week 2-4)    β”‚  β”‚ (Week 4-6)    β”‚  β”‚ (Week 6-8)β”‚  β”‚
β”‚   β”‚ βœ… EXISTS     β”‚  β”‚ πŸ”¨ BUILD      β”‚  β”‚ πŸ”¨ BUILD      β”‚  β”‚ πŸ“‹ PLAN   β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                             β”‚
β”‚   COMPETITIVE ADVANTAGE:                                                    β”‚
β”‚   β€’ Most competitors locked to single platform                             β”‚
β”‚   β€’ Same Decision Packs work on ALL platforms                              β”‚
β”‚   β€’ Enterprise customers get their preferred interface (Slack)             β”‚
β”‚   β€’ Consumers get mobile access                                            β”‚
β”‚   β€’ Developers get API access                                              β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Slack Adapter Details

Component Description
Slash Commands Map to workflow_tools invocations
Thread Conversations Map to session context
App Home Tab Session management dashboard
Shortcuts Quick actions for common workflows
Enterprise Grid Multi-workspace support

Web/Mobile API Details

Endpoint Purpose
POST /api/sessions Create new session
POST /api/sessions/{id}/messages Send message to session
GET /api/sessions/{id}/history Retrieve conversation history
WS /api/sessions/{id}/stream Real-time message streaming
GET /api/templates List available Decision Packs

Phase 7: Pack Ecosystem & Marketplace (8+ Weeks)

Goal: Marketplace and discovery for Decision Packs

Task Description Dependencies
Pre-built Images CI/CD for pack images Pack repos
Pack Discovery Search packs by triggers Pack registry
Pack Marketplace UI Web interface for browsing Web adapter
Pack Analytics Usage metrics, ratings, reviews Marketplace

Timeline Visualization

Week:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
       β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚   β”‚

Phase 0: Foundation (COMPLETE)
════════════════════════════════════════════════════════════════════════════════════

Phase 1: Stabilization
       β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
       β”‚ Error     β”‚
       β”‚ handling  β”‚
       β”‚ Caching   β”‚

Phase 2: Enhanced Templates
                   β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
                   β”‚ Git-based β”‚
                   β”‚ Templates β”‚

Phase 3: Voice Enhancement
                               β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
                               β”‚ Insight       β”‚
                               β”‚ Recovery      β”‚
                               β”‚ Mode detect   β”‚

Phase 4: Pack Foundation
                                               β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
                                               β”‚ Pack Schema +     β”‚
                                               β”‚ CLI + Builder     β”‚

Phase 5: Memory Layer
                                                                   β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
                                                                   β”‚ pgvector  β”‚
                                                                   β”‚ RAG       β”‚

Phase 6: MULTI-PLATFORM (KEY USP) ⭐
                                                                               β”œβ”€β”€β”€β”€β”€β”€β”€β–Ί
                                                                               β”‚ Slack
                                                                               β”‚ Web/API
                                                                               β”‚ Mobile

Phase 7: Pack Ecosystem
                                                                                       β”œβ”€β–Ί
                                                                                       β”‚ Marketplace

Milestones:  β—† Stable     β—† Templates    β—† Voice      β—† Packs     β—† Memory   ⭐ Multi-Platform
            Week 3        Week 6         Week 10      Week 14     Week 16     Week 18+

Success Metrics

Current Baseline (Already Achieved)

Metric Current Value
Discord bot functional βœ… Yes
Session deployment βœ… Works via Builder Claude
ACP communication βœ… Implemented
Template system βœ… Supabase-based
Builder Claude βœ… Meta-skills working
Git session repos βœ… Created per session
Voice sessions βœ… 3-Claude architecture
Workflow routing βœ… LLM classification

Phase Targets

Metric Phase 1 Phase 2 Phase 3 Phase 4 Phase 5
Build error rate <5% <5% <5% <3% <3%
Git-based templates - βœ… βœ… βœ… βœ…
Insight recovery 60% 60% 80% 80% 80%
Pack deployment - - - βœ… βœ…
Cross-session memory - - - - βœ…
Deploy time (pre-built) N/A N/A N/A <60s <30s

Risk Assessment

Risk Probability Impact Mitigation
Builder reliability Medium High Error handling, retry logic
Voice latency Medium High Fast Agent optimization
Pack schema changes Medium Medium Version schema from start
Container build times High Medium Pre-build and cache images
Memory retrieval accuracy Medium Medium Tune embeddings, test extensively

Key Insights from Implementation

Configuration Spectrum

The system uses a configuration spectrum from hardcoded to dynamic:

HARDCODED ←─────────────────────────────────────────────────────→ DYNAMIC

Template Apps       Supabase Templates     Git-based Templates
(in code)           (DB records)           (external repos)

TEMPLATE_APPS = {   session_templates      Clone, analyze,
  "mmm-studio"...   table with fly_app     use .claude/ from
}                   reference              user's repo

Git as Source of Truth

  • Every session creates a GitHub repo
  • All changes tracked with commits
  • Templates can be saved as tags/branches
  • Full browsable history for audit

Builder IS the Session

Builder deploys copy of itself β†’ New Session (same image, different app name)

No separate "base session image"β€”Builder Claude uses meta-skills to construct the right environment for any repo.


Next Actions

Immediate (This Week)

  1. βœ… Builder Claude already operational
  2. Review error handling in current builds
  3. Add build caching to avoid redundant deploys
  4. Document Discord commands for users

Short-term (Next 2 Weeks)

  1. Improve template listing UX in Discord
  2. Add git_repo_url field to templates
  3. Benchmark insight recovery in voice sessions

Medium-term (Weeks 3-6)

  1. Implement voice enhancement features
  2. Design pack.yaml schema
  3. Create 3-5 reference packs (MMM, data exploration, API dev)

Summary

What Status Timeline
Core Infrastructure βœ… DONE Complete
Builder Claude βœ… DONE Complete
Git Session Repos βœ… DONE Complete
Voice Architecture βœ… DONE Complete
Workflow Routing βœ… DONE Complete
Protocol Layer βœ… DONE Complete
Discord Adapter βœ… DONE Complete
Stabilization πŸ”¨ NEXT Weeks 1-3
Enhanced Templates πŸ“‹ PLANNED Weeks 4-6
Voice Enhancement πŸ“‹ PLANNED Weeks 7-10
Pack System πŸ“‹ PLANNED Weeks 11-14
Memory Layer πŸ“‹ PLANNED Weeks 15-17
Multi-Platform (KEY USP) ⭐ PLANNED Weeks 18-20
Pack Ecosystem πŸ“‹ PLANNED Weeks 20+

Key insight: The system is MORE complete than expected. Voice architecture, workflow routing, and Builder Claude with meta-skills are all already implemented. The remaining work focuses on:

  1. Stabilization - Error handling, caching, monitoring
  2. Enhanced Templates - Git-based template sources
  3. Voice Enhancement - Insight recovery, mode detection
  4. Pack System - Pre-built, instant-launch packs
  5. Memory Layer - Cross-session persistence
  6. ⭐ Multi-Platform Expansion (KEY USP) - Slack adapter, Web/Mobile APIs
  7. Pack Ecosystem - Marketplace and discovery

Multi-Platform Strategy: KEY USP

CRITICAL DIFFERENTIATOR: The architecture is designed from the ground up for multi-platform deployment. The Protocol Layer wrapping Claude Agent SDK enables:

  • Platform-agnostic sessions - Same Decision Pack works on Discord, Slack, Web, Mobile
  • Consistent experience - Users get identical capabilities regardless of interface
  • Enterprise-ready - Teams choose their preferred platform (Slack for enterprises)
  • Consumer-ready - Mobile apps for on-the-go access
  • Developer-ready - Direct API access for integrations

Most competitors are locked to a single platform. Our architecture is a BLUE OCEAN opportunity.


This document reflects the actual state of Decision AI as of January 2026. Timeline estimates are based on building incrementally on existing infrastructure. Voice architecture, workflow routing, and Builder Claude are IMPLEMENTED, not planned.

Decision AI Competitive Analysis v3

This directory contains a comprehensive analysis of Decision AI's architecture, comparing it to competitors and documenting the actual implemented system.

⭐ KEY USP: Multi-Platform Strategy

CRITICAL DIFFERENTIATOR: Decision AI uses a standardized protocol layer wrapping Claude Agent SDK that enables platform-agnostic deployment. Same Decision Packs work on Discord (current), Slack (planned), Web/Mobile (planned).

Most competitors are locked to a single platform. Our architecture is a BLUE OCEAN opportunity.

Files

# File Description
00 COMPETITOR_OVERVIEW.md Analysis of 26+ competing products + Multi-Platform Strategy
01 PACK_MANIFEST_FORMAT.md Session templates, Decision Pack vision + Platform-Agnostic Design
02 ARTIFACT_OUTPUT_PATTERNS.md How agents structure outputs
03 REPO_TO_DEPLOY_UX.md User journey from repository to deployment
04 WORKFLOW_ROUTING.md Workflow-based routing (not tier-based)
05 THREAD_CONVERSATION_DESIGN.md Voice session architecture (3-Claude)
06 MEMORY_LAYER_ARCHITECTURE.md Memory layer (FUTURE - not yet built)
07 TOOL_COLLECTION_PATTERN.md Decision AI tools (workflow_tools, ACP, etc.)
08 TRUST_DIFFERENTIATORS.md Execution rules, audit trails, trust model
09 EVALUATION_FRAMEWORK.md Insight Recovery benchmark
10 IMPLEMENTATION_ROADMAP.md Realistic roadmap + Multi-Platform Expansion Phase

Key Patterns

Configuration Spectrum

HARDCODED ←───────────────────────────────────────────────→ DYNAMIC
Template Apps     Supabase Templates     Git-based Templates

Meta-Skills (Builder Claude)

  • claude-factory: Orchestrate .claude/ construction
  • skill-creation: Generate skills from repo analysis
  • dockerfile-gen: Framework-specific Dockerfiles
  • fly-deploy: Fly.io deployment patterns

Voice Architecture (3 Claudes)

  • Fast Agent (Haiku): Low-latency voice responses
  • Supervisor (Opus): Background context curation
  • Session (Sonnet): Work session for tools/code

Git as Source of Truth

  • Every session creates a GitHub repo
  • All changes tracked with commits
  • Full browsable history for audit

Status Summary

Component Status
Discord Bot βœ… IMPLEMENTED
Workflow Routing βœ… IMPLEMENTED
Builder Claude βœ… IMPLEMENTED
Voice Sessions βœ… IMPLEMENTED
Git Session Repos βœ… IMPLEMENTED
Protocol Layer βœ… IMPLEMENTED
Discord Adapter βœ… IMPLEMENTED
Decision Packs ❌ PLANNED
Memory Layer ❌ PLANNED
Slack Adapter ⭐ PLANNED (KEY USP)
Web/Mobile API ⭐ PLANNED (KEY USP)

Multi-Platform Architecture

                    PROTOCOL LAYER
         (Standardized Contract wrapping Claude Agent SDK)
                         β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚               β”‚               β”‚
         β–Ό               β–Ό               β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ DISCORD β”‚    β”‚ SLACK   β”‚    β”‚ WEB/    β”‚
    β”‚ βœ… NOW  β”‚    β”‚ πŸ”œ SOON β”‚    β”‚ MOBILE  β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚ πŸ”œ SOON β”‚
                                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why this matters:

  • Enterprise customers want Slack
  • Consumers want mobile apps
  • Developers want direct API access
  • Same Decision Packs work on ALL platforms

Generated January 2026 as part of Decision AI competitive analysis Updated with Multi-Platform Strategy as KEY USP

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment