Comprehensive analysis of 26 competitor repositories for Decision AI product positioning
Executive Summary
This document provides a structured overview of all competitors analyzed during our research phase. Each competitor is categorized by market segment, with detailed profiles including value propositions, target audiences, key features, and user journey diagrams.
Key Finding: The market is fragmented across multiple niches. No single competitor addresses our full vision of unified context across platforms + modular AI sessions (deployable Claude instances) + trust-focused data science. This represents our blue ocean opportunity.
CURRENT STATE vs FUTURE VISION
CRITICAL: This section clearly distinguishes what EXISTS today versus what is PLANNED for the future.
What EXISTS Today (January 2025)
| Component | Status | Description |
|---|---|---|
| Discord Bot | IMPLEMENTED | Primary user interface |
| Workflow Executor | IMPLEMENTED | Claude API with workflow_tools |
| Fly.io Deployment | IMPLEMENTED | Dynamic machine creation via fly_app_tools |
| ACP Protocol | IMPLEMENTED | Inter-session communication via SSE |
| Session Templates | IMPLEMENTED | Supabase database records |
| Builder Claude | IMPLEMENTED | Containerization service with meta-skills |
What is PLANNED (Future Vision)
| Component | Status | Description |
|---|---|---|
| Decision Packs | PLANNED | GitHub repos as deployable units with pack.yaml manifests |
| Pack Registry | PLANNED | Searchable index of available packs |
| Pack Marketplace | PLANNED | Web UI for discovery and deployment |
| Voice Sessions | PLANNED | Hands-free Discord voice interaction |
| Memory Layer | PLANNED | Persistent cross-session memory |
Builder as Claude Factory: Our Key Differentiator
What makes Decision AI unique: Builder Claude doesn't just containerize code—it constructs entire intelligent environments.
┌─────────────────────────────────────────────────────────────────────────────┐
│ BUILDER AS CLAUDE FACTORY (CURRENT) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ INPUT: User's code repository │
│ (any framework: Marimo, Streamlit, FastAPI, etc.) │
│ │
│ BUILDER CLAUDE CONSTRUCTS: │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 1. Docker image with user's code │ │
│ │ 2. .claude/ directory with: │ │
│ │ ├── CLAUDE.md (execution rules + purpose) │ │
│ │ └── skills/ (generated from repo analysis) │ │
│ │ 3. ACP server for communication │ │
│ │ 4. GitHub session repo (source of truth) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ OUTPUT: Complete Claude Code environment deployed on Fly.io │
│ │
│ KEY INSIGHT: Each build = complete Claude Code environment │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
User Intent Priority (Decision Flow)
Does repo have .claude/?
├── YES → Inherit/merge (user customizations WIN)
│ - Preserve existing skills, hooks, CLAUDE.md
│ - Merge base execution rules
│ - Add missing infrastructure skills
│
└── NO → Did user specify skill preferences?
├── YES → Follow their guidance exactly
│
└── NO → Generate from scratch using meta-skills:
a. Analyze repo (dependencies, code patterns, purpose)
b. Detect framework (Marimo, Streamlit, FastAPI, etc.)
c. Generate domain-specific skills
d. Create CLAUDE.md with execution rules
Git as Source of Truth
Each session gets its own GitHub repository. This replaces container-as-artifact with git as the unit of reproducibility.
┌─────────────────────────────────────────────────────────────────────────────┐
│ GIT AS SOURCE OF TRUTH (CURRENT) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Session Repo Pattern: │
│ • New GitHub repo: github.com/org/session-mmm-{hex} │
│ • All changes tracked: git add -A && git commit after work │
│ • Versioning via tags: git tag "template/my-analysis-v1" │
│ • Full history: Browsable on GitHub, diffable │
│ │
│ Benefits: │
│ • Transparency: Readable source code, not opaque binary images │
│ • Reproducibility: git clone --branch tag = exact state │
│ • Shareability: Link to GitHub repo = shareable, forkable │
│ • Auditable: Every change logged with timestamps, diffs │
│ │
│ Build Result Format: │
│ { │
│ "status": "complete", │
│ "app_name": "template-my-thing", │
│ "image_ref": "registry.fly.io/template-my-thing:v1", │
│ "git_repo": "github.com/org/template-my-thing", │
│ "git_ref": "snapshot/v1" │
│ } │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
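A consumer of the build result would want to validate the fields shown above before acting on them. The sketch below uses exactly the field set from the example; the validation logic itself is an assumption, not the orchestrator's actual code.

```python
import json

# Field set taken from the Build Result Format example above
REQUIRED_FIELDS = {"status", "app_name", "image_ref", "git_repo", "git_ref"}

def parse_build_result(payload: str) -> dict:
    """Validate a Builder build-result message (illustrative sketch)."""
    result = json.loads(payload)
    missing = REQUIRED_FIELDS - result.keys()
    if missing:
        raise ValueError(f"build result missing fields: {sorted(missing)}")
    if result["status"] != "complete":
        raise ValueError(f"build not complete: {result['status']}")
    return result
```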
Summary of differentiators versus the market:

| Differentiator | Market Gap | Our Answer |
|---|---|---|
| Builder as Claude Factory | No one generates complete Claude environments from repo analysis | Meta-skills construct .claude/ dynamically |
| Git as Source of Truth | Competitors use opaque container images | Session repos track all changes via git |
| Unified Context | No one offers context continuity across Discord/Slack/Teams/CLI | Interface Primitives + Shared Memory (PLANNED) |
| Domain-Specific Evals | No one has insight recovery benchmarks for analytics | MMM Insight Recovery Experiments |
| Trust-First Data Science | No one combines Bayesian causal + LLM + governance | Trust Differentiators |
| ACP Protocol | No standard for inter-Claude communication | Our implemented protocol |
What We Should Adopt
| Pattern | From | Why |
|---|---|---|
| ToolCollection | CrewAI | Best-in-class tool management |
| Thread-as-Boundary | Dust.tt | Essential for chat context |
| Statistical Evals | Braintrust | Right approach to AI testing |
| 4-Tier Memory | ChatMemory | Complete hierarchy |
| Manifest Format | Awesome Skills | Proven skill structure |
| Streaming Progress | Replit | Great deploy UX |
| Bayesian Foundation | PyMC-Marketing | Trust through uncertainty |
Architectural Philosophy: Embodied vs Puppeteer
Repo2Run Pattern (Puppeteer)
🧠 (External LLM) ─────► 📦 (Dumb container)
- LLM remote-controls container
- Container has no intelligence
- Intelligence only during build time
- After build: container is static code
Our Approach (Embodied)
┌──────────────────────┐
│ 🧠 (Claude INSIDE) │
│ 📦 (Container/body) │
└──────────────────────┘
- Claude inhabits the container
- Container is Claude's body
- Intelligence at runtime
- Interactive collaboration with user
Trade-offs
| Aspect | Repo2Run | Decision AI |
|---|---|---|
| External rollback | Excellent | Requires orchestrator |
| Deterministic outputs | Yes | No (but adaptive) |
| Runtime adaptation | No | Yes |
| Domain expertise | None | Skills loaded in session |
| User collaboration | None | Interactive |
| Multi-repo composition | Hard | Flexible merging |
| Framework support | Python-only | Framework-agnostic |
Complete Competitor List
| # | Competitor | Category | What They Do (One-Liner) |
|---|---|---|---|
| 1 | CrewAI | Framework | Multi-agent orchestration with role-based collaboration and tool collections |
| 2 | LangGraph | Framework | Stateful graph-based workflows for LLM applications |
| 3 | Swarm | Framework | Lightweight multi-agent handoffs (educational, by OpenAI) |
| 4 | Claude-Flow | Framework | Enterprise multi-agent swarms with neural learning and MCP |
| 5 | AutoGen | Framework | Microsoft's multi-agent conversation framework |
| 6 | Pydantic-AI | Framework | Type-safe Python agents with structured outputs |
| 7 | VoltAgent | Platform | TypeScript full-stack agent framework with VoltOps observability |
| 8 | LLMStack | Platform | No-code visual builder for AI agents and workflows |
| 9 | BotSharp | Platform | .NET/C# agent framework with plugin architecture |
| 10 | Composio | SDK | 500+ app integrations for AI agents |
| 11 | Langfuse | Observability | LLM tracing, prompt management, and evaluation |
| 12 | Braintrust | Observability | Statistical AI evaluation with regression detection |
| 13 | AgentOps | Observability | Agent session replay and cost tracking |
| 14 | ChatMemory | Memory | 4-tier hierarchical memory for AI assistants |
| 15 | Glean | Knowledge | Enterprise permission-aware knowledge search |
| 16 | Dust.tt | Chat | Thread-aware Slack AI assistants |
| 17 | Clawdbot | Chat | 8-platform personal AI assistant (desktop) |
| 18 | KIRA | Chat | Privacy-first desktop AI coworker |
| 19 | Runbear | Chat | Tiered Slack/Teams bot platform |
| 20 | Awesome Claude Skills | Skills | Open-source skill manifest patterns |
| 21 | PyMC-Marketing | MMM | Bayesian causal marketing mix modeling |
| 22 | Meta Robyn | MMM | Automated MMM with Pareto optimization |
| 23 | Replit Templates | Templates | Full project templates with instant deployment |
| 24 | Railway Templates | Templates | One-click deployable app templates |
| 25 | Render Blueprints | Templates | Infrastructure-as-code deployment templates |
| 26 | Vercel Templates | Templates | Frontend/fullstack starter templates |
Master Comparison Table
| Competitor | Category | Primary Language | Deploy Model | Key Differentiator | Pricing |
|---|---|---|---|---|---|
| CrewAI | Framework | Python | Library | Role-based multi-agent + ToolCollection | OSS |
| LangGraph | Framework | Python | Library | Stateful graphs + checkpointing | OSS + Cloud |
| Swarm | Framework | Python | Library | Minimal primitives (educational) | OSS |
| Claude-Flow | Framework | TypeScript | Enterprise | 54+ agents + neural learning | OSS |
| AutoGen | Framework | Python | Library | Conversational multi-agent | OSS |
| Pydantic-AI | Framework | Python | Library | Type safety + structured outputs | OSS |
| VoltAgent | Platform | TypeScript | Hybrid | Full-stack + VoltOps console | OSS + Cloud |
| LLMStack | Platform | Python | Self-hosted | No-code visual builder | OSS + Cloud |
| BotSharp | Platform | C# | Enterprise | .NET ecosystem + plugins | OSS |
| Composio | SDK | TypeScript | Multi-framework | 500+ integrations | Freemium |
| Langfuse | Observability | TypeScript | Self-hosted | Tracing + prompt management | OSS + Cloud |
| Braintrust | Observability | Python | Cloud | Statistical evals + regression | Freemium |
| AgentOps | Observability | Python | Cloud | Session replay + cost tracking | Freemium |
| ChatMemory | Memory | Python | Library | 4-tier hierarchy + pgvector | OSS |
| Glean | Knowledge | - | Enterprise | Permission-aware search | Enterprise |
| Dust.tt | Chat | - | Cloud | Thread-aware Slack AI | Tiered |
| KIRA | Chat | Python | Desktop | Privacy-first, local-only | OSS |
| Clawdbot | Chat | TypeScript | Desktop | 8-platform personal AI | OSS |
| Runbear | Chat | - | Cloud | Tiered bot platform | Tiered |
| Awesome Skills | Skills | - | - | Manifest format pattern | OSS |
| PyMC-Marketing | MMM | Python | Library | Bayesian causal inference | OSS |
| Robyn | MMM | R | Library | Automated Pareto optimization | OSS |
| LightweightMMM | MMM | Python | Library | Google's Bayesian MMM | OSS |
| Nielsen | MMM | - | Service | Industry standard | Enterprise |
| Replit Agent | Deploy | - | Cloud | Zero-friction deploy | Freemium |
| Hex AI | Artifacts | - | Cloud | Professional notebooks | Tiered |
| v0.dev | Artifacts | - | Cloud | AI-generated UI preview | Freemium |
Document generated for Decision AI competitive analysis - January 2025. 26 competitors analyzed across 9 categories. Updated to reflect actual current state + Builder as Claude Factory architecture.
This document reflects the actual state of Decision Orchestrator as of January 2025. The pack system described is a future vision based on roadmap documents in the codebase.
┌─────────────────────────────────────────────────────────────────────────────┐
│ ARTIFACT ANTI-PATTERNS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ PROBLEM: Wall of Text SOLUTION: Structured Artifact │
│ ───────────────────── ──────────────────────── │
│ │
│ "Here's your analysis: ┌──────────────────────────┐ │
│ The ROI for TV is 2.1x │ [Analysis Artifact] │ │
│ which is lower than digital │ │ │
│ at 3.2x but social is only │ ROI Summary Table │ │
│ 1.8x so you should..." │ Key Insight: Digital > TV│ │
│ │ [See Full Report] │ │
│ Can't scan, extract, or act └──────────────────────────┘ │
│ Scannable, actionable │
│ │
│ PROBLEM: No Next Steps SOLUTION: Action Buttons │
│ ────────────────────── ──────────────────────── │
│ │
│ "Here's the code." ┌──────────────────────────┐ │
│ │ [Code Artifact] │ │
│ User: "Now what?" │ │ │
│ │ [Run] [Copy] [Test] │ │
│ └──────────────────────────┘ │
│ │
│ PROBLEM: Lost Artifacts SOLUTION: Artifact Gallery │
│ ─────────────────────── ──────────────────────── │
│ │
│ User: "Where's that chart Session artifacts persisted │
│ you made earlier?" and browsable in sidebar │
│ │
│ Scroll, scroll, scroll... One-click to find any output │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Great artifacts are not just outputs—they're starting points for the next action. The difference between "here's some text" and "here's a structured artifact with clear next steps" is the difference between a chatbot and a productive AI assistant.
These principles apply to BOTH current and future systems:
| Principle | Why It Matters |
|---|---|
| Show Your Work | Explain what was detected and why; builds trust |
| Smart Defaults | Minimize decisions, but allow overrides |
| Real Progress | Show actual build status, not fake animations |
| Graceful Pauses | Interrupt for secrets without losing progress |
| Actionable Errors | Don't just say "failed"; say what to do |
| Verify Success | Prove it works (health check, response time) |
| Clear Next Steps | Always show what comes after deployment |
This document reflects the actual state of Decision Orchestrator as of January 2025. Builder Claude with meta-skills is IMPLEMENTED. Voice sessions and pack-based deployment are PLANNED.
How Decision AI routes requests through workflow classification, not model tiers
CRITICAL: This is NOT a Three-Tier System
Many AI systems use a three-tier model routing approach (Haiku → Sonnet → Opus based on complexity). Decision AI uses a fundamentally different approach: workflow-based routing.
| Aspect | Traditional Three-Tier | Decision AI Workflows |
|---|---|---|
| Routing basis | Request complexity | Message intent + channel scope |
| Decision point | Token cost optimization | Which workflow(s) to execute |
| Classification | Simple → Medium → Complex | Ambient detection + active triggers |
| Execution | Single model call | Workflow executor with tools |
The Workflow Classification System (IMPLEMENTED)
Decision AI uses LLM-based classification to match messages to configured workflows:
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW CLASSIFICATION FLOW (ACTUAL) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Discord message arrives │
│ └── Message context: author, channel, content, attachments │
│ │
│ 2. Get applicable workflows for this scope │
│ └── Query: discord_workflow_scope (server_id, channel_id) │
│ └── Filter: is_enabled = true │
│ │
│ 3. Classify message against workflows │
│ └── OpenAI gpt-4.1 (fast model for classification) │
│ └── Structured output: WorkflowClassificationResult │
│ └── Returns: list of workflow_ids that match │
│ │
│ 4. Execute matching workflows │
│ └── Claude Agent Service with workflow instructions + tools │
│ └── MCP server provides tool access based on tool_slugs │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
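Step 3 of this flow can be sketched as follows. The `llm` callable stands in for the structured-output gpt-4.1 classification call, and the prompt shape and JSON result format are assumptions for illustration, not the production implementation.

```python
import json

def classify_message(content: str, workflows: list, llm) -> list:
    """Step 3 sketch: ask a fast model which configured workflows match.

    `llm` is a stand-in for the gpt-4.1 structured-output call; it is
    assumed to return a JSON list of workflow_ids as a string.
    """
    catalog = [{"id": w["id"], "name": w["name"]} for w in workflows]
    prompt = (
        "Given these workflows, return a JSON list of workflow_ids that "
        "match the user message.\n"
        f"Workflows: {json.dumps(catalog)}\n"
        f"Message: {content}"
    )
    matched = json.loads(llm(prompt))
    valid_ids = {w["id"] for w in workflows}
    # Drop any ids the model hallucinated
    return [wid for wid in matched if wid in valid_ids]
```

In production the structured-output API would constrain the response schema directly; the post-hoc filter above is a belt-and-braces guard.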
Workflow Database Schema
```sql
-- discord_workflow: Defines what workflows do
CREATE TABLE discord_workflow (
  id UUID PRIMARY KEY,
  name TEXT NOT NULL,                -- "Build Session", "Voice Assistant"
  instructions TEXT NOT NULL,        -- System prompt for Claude
  tool_slugs JSONB DEFAULT '[]',     -- ["CUSTOM_FLY_LAUNCH_SESSION", ...]
  config JSONB DEFAULT '{}',         -- Workflow-specific config
  output_schema JSONB,               -- Optional structured output
  interaction_mode TEXT DEFAULT 'autonomous',  -- 'autonomous' | 'interactive' | 'hybrid'
  trigger_type TEXT DEFAULT 'ambient',         -- 'ambient' | 'active'
  is_enabled BOOLEAN DEFAULT true
);

-- discord_workflow_scope: Where workflows apply
CREATE TABLE discord_workflow_scope (
  id UUID PRIMARY KEY,
  workflow_id UUID REFERENCES discord_workflow(id),
  server_id BIGINT,   -- NULL = all servers
  channel_id BIGINT,  -- NULL = all channels in server
  is_enabled BOOLEAN DEFAULT true
);
```
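The NULL-wildcard semantics of `discord_workflow_scope` (NULL server_id matches all servers; NULL channel_id matches all channels in the server) can be expressed as a small predicate. This mirrors the query's WHERE clause in Python for illustration; the real lookup is a database query.

```python
def workflow_in_scope(scope: dict, server_id: int, channel_id: int) -> bool:
    """Does a scope row apply to this message's server/channel?

    None mirrors SQL NULL: a None server_id matches every server,
    a None channel_id matches every channel in the server.
    """
    if not scope.get("is_enabled", True):
        return False
    if scope["server_id"] is not None and scope["server_id"] != server_id:
        return False
    if scope["channel_id"] is not None and scope["channel_id"] != channel_id:
        return False
    return True
```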
Trigger Types: Ambient vs Active
Decision AI distinguishes between two trigger modes: ambient workflows are matched via LLM classification against every in-scope message, while active workflows fire only on an explicit trigger, such as a slash command (/build, /voice) or joining a voice channel.

Interaction Modes

Workflows can also operate in different interaction modes:
Autonomous Mode
User: "Build github.com/user/repo as a streamlit app"
│
▼
Workflow executes without interruption:
1. Launch builder session
2. Send build instructions
3. Wait for completion
4. Report results
│
▼
User: "✅ Session deployed at https://mmm-abc123.fly.dev"
Interactive Mode
User: "Help me set up an analysis environment"
│
▼
Workflow pauses for decisions:
"Which template would you like?
1. mmm-studio (interactive notebook)
2. mmm-deepagent (autonomous analysis)
3. decision-pack-compiler (custom builds)"
│
▼
User: "mmm-studio"
│
▼
Workflow continues with user choice
Hybrid Mode
User: "Build this repo with authentication"
│
▼
Workflow executes autonomously but pauses on:
- Missing required secrets (ANTHROPIC_API_KEY)
- Ambiguous framework detection
- Critical errors requiring user decision
│
▼
Human-in-the-loop approval when needed
Example: Different behaviors in different channels
#general channel:
└── Generic assistant workflow (ambient, basic tools)
#mmm-analysis channel:
└── MMM workflow (ambient, full session tools)
#voice channel:
└── Voice workflow (active on join, voice tools)
#builds channel:
└── Builder workflow (active, builder tools)
Adding New Workflows
To add a new workflow:
1. Define the workflow in the database:

```sql
INSERT INTO discord_workflow (name, instructions, tool_slugs, trigger_type)
VALUES (
  'My New Workflow',
  'You are an assistant that helps with...',
  '["TOOL_A", "TOOL_B"]',
  'ambient'
);
```

2. Scope it to channels:

```sql
INSERT INTO discord_workflow_scope (workflow_id, server_id, channel_id)
VALUES ('workflow-uuid', 123456789, 987654321);
```

3. Ensure the tools exist in the MCP server:
   - Add a tool handler in workflow_tools/
   - Register it in create_mcp_server_for_workflows()
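Step 3 might look like the registry sketch below. This is a minimal illustration of the slug-to-handler mapping, assuming a decorator-based registry; the actual `create_mcp_server_for_workflows()` wiring and handler signatures are not shown in this document.

```python
# Hypothetical registry: maps tool_slugs to handler callables.
TOOL_HANDLERS = {}

def workflow_tool(slug: str):
    """Decorator registering a handler under its tool slug (sketch)."""
    def register(fn):
        TOOL_HANDLERS[slug] = fn
        return fn
    return register

@workflow_tool("TOOL_A")
def handle_tool_a(args: dict) -> dict:
    # Placeholder handler: echo the arguments back
    return {"ok": True, "echo": args}

def tools_for_workflow(tool_slugs: list) -> dict:
    """Resolve a workflow's tool_slugs to handlers, skipping unknown slugs."""
    return {slug: TOOL_HANDLERS[slug] for slug in tool_slugs if slug in TOOL_HANDLERS}
```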
Common Pitfalls
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW ROUTING PITFALLS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ PROBLEM: Multiple workflows match SOLUTION: Priority or exclusion │
│ ────────────────────────────── ───────────────────────────────│
│ │
│ "Build this repo" matches: Add workflow priority field │
│ - Generic assistant or use interaction_mode to │
│ - Builder workflow determine which takes precedence│
│ │
│ PROBLEM: Classification too slow SOLUTION: Active triggers │
│ ───────────────────────────── ────────────────────────────── │
│ │
│ Every message → LLM classification Use trigger_type='active' │
│ adds latency for explicit commands │
│ (/build, /voice, etc.) │
│ │
│ PROBLEM: Wrong scope SOLUTION: Scope validation │
│ ────────────────────── ──────────────────────── │
│ │
│ Workflow runs in wrong channel Always verify scope before │
│ (e.g., voice in text channel) execution │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
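The "multiple workflows match" fix could be sketched as below. Note the `priority` field is the proposed solution from the pitfalls box, not yet part of the schema, so both the field and the tie-break order are assumptions.

```python
def pick_workflow(matches: list):
    """Resolve multiple matching workflows: prefer an explicit priority
    field (lower = higher priority), then break ties by how much user
    interaction the mode implies. Illustrative only."""
    if not matches:
        return None
    mode_rank = {"interactive": 0, "hybrid": 1, "autonomous": 2}
    return min(
        matches,
        key=lambda w: (
            w.get("priority", 100),
            mode_rank.get(w.get("interaction_mode"), 3),
        ),
    )
```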
Workflow-based routing gives Decision AI the flexibility to behave differently across contexts while maintaining a unified architecture. The key insight is that routing isn't about model capability—it's about matching user intent to the right set of tools and instructions.
```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class VoiceSession:
    """Voice session with thread support."""
    session_id: str
    thread_id: str                      # Discord thread for artifacts
    curated_context: str = "You are a helpful voice assistant."
    conversation_log: list = field(default_factory=list)
    voice_client: Any = None            # Discord voice connection
    last_processed_index: int = 0       # Supervisor progress marker
    current_mode: str = "discovery"     # "discovery" or "delivery"
    text_only_mode: bool = False        # Skip STT, users type in thread
    # Embed state for UI
    queue_embed_id: int | None = None
    status_embed_id: int | None = None
    queued_indices: list = field(default_factory=list)
    processed_indices: list = field(default_factory=list)
```
Supervisor Loop Pattern
The supervisor runs as a background coroutine, polling every 5 seconds:
```python
class SupervisorLoop:
    """Continuous polling loop that curates context for the Fast Agent."""

    async def _poll_cycle(self) -> None:
        """Execute one poll cycle - analyze state and update context."""
        # Get gap messages (new since last processed)
        conv_log = self.session.conversation_log
        last_idx = self.session.last_processed_index
        gap_messages = conv_log[last_idx:]
        if not gap_messages:
            return  # Nothing new

        # Fetch thread context for additional info
        thread_context = await self._fetch_thread_context()

        # Build prompt for Claude
        user_message = self._build_poll_prompt(
            conv_log, gap_messages, last_idx, thread_context
        )

        # Run Claude Agent with full tool access
        await self._run_supervisor_agent(user_message)

        # Update progress marker
        self.session.last_processed_index = len(conv_log)
```
Supervisor Tools
```python
supervisor_tools = [
    # Thread communication
    "CUSTOM_VOICE_SEND_TO_THREAD",   # Post messages/artifacts to thread
    "CUSTOM_VOICE_MESSAGE_EDIT",     # Edit previous messages
    "CUSTOM_VOICE_MESSAGE_DELETE",   # Delete messages
    # Session management
    "CUSTOM_VOICE_SESSION_WRITE",    # Update curated_context
    "CUSTOM_VOICE_SESSION_STOP",     # End the voice session
    "CUSTOM_VOICE_CREATE_ARTIFACT",  # Create rich artifacts
    # Discord read access
    "CUSTOM_DISCORD_READ_THREAD",    # Read thread history
    "CUSTOM_DISCORD_READ_CHANNEL",   # Read channel history
    "CUSTOM_DISCORD_PARSE_LINK",     # Extract content from Discord links
]
```
Curated Context Format
The supervisor writes structured context that the Fast Agent can quickly parse:
```markdown
## CURATED SUMMARY

## Research Question
[Original user query]

## Summary
[High-level findings answering the user's question]

## Detailed Findings

### [Component/Area 1]
- Finding with reference ([file.ext:line](link))
- Connection to other components
- Implementation details

### [Component/Area 2]
...

## Code References
- `path/to/file.py:123` - Description of what's there
- `another/file.ts:45-67` - Description of the code block

## Architecture Insights
[Patterns, conventions, and design decisions discovered]

## Historical Context (from conversation history)
[Relevant insights from conversation history]

## Open Questions
[Any areas that need further investigation]

## RECENT CONTEXT (since last poll)
[Key points from gap_messages Fast Agent should know]

## VOICE GUIDELINES
[Any specific instructions for this turn]
```
Text-Only Mode
Voice sessions can operate in text-only mode (no STT/TTS):
┌─────────────────────────────────────────────────────────────────────────────┐
│ TEXT-ONLY VOICE SESSION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ When text_only_mode = True: │
│ │
│ • Users type in the Discord thread │
│ • Fast Agent still responds quickly (text instead of voice) │
│ • Supervisor still analyzes and curates │
│ • Same 3-Claude architecture, just no audio │
│ │
│ Use case: Users who prefer typing, noisy environments, accessibility │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
ACP Communication Pattern
Claude sessions communicate via the Agent Communication Protocol:
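As a concrete illustration, an ACP message carried over SSE might look like the sketch below. The envelope fields and event name are assumptions for this document, since the protocol's actual wire format is not specified here; only the SSE framing itself (`event:`/`data:` lines terminated by a blank line) is standard.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ACPMessage:
    """Hypothetical ACP envelope; field names are illustrative."""
    source_session: str
    target_session: str
    kind: str        # e.g. "instruction", "status", "result"
    payload: dict

def to_sse_event(msg: ACPMessage) -> str:
    """Serialize the envelope as a Server-Sent Events frame."""
    return f"event: acp\ndata: {json.dumps(asdict(msg))}\n\n"
```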
In Decision AI, sessions provide natural thread boundaries:
┌─────────────────────────────────────────────────────────────────────────────┐
│ DECISION AI SESSION MEMORY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ORCHESTRATOR (Discord) │
│ └── User interacts via Discord messages │
│ └── Each Discord thread can map to one or more sessions │
│ │
│ SESSION (Fly.io) │
│ └── Each session = isolated context │
│ └── Session's .claude/CLAUDE.md = persistent instructions │
│ └── Session's skills/ = domain knowledge │
│ └── Git repo = full state history │
│ │
│ CONTEXT FLOW: │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Orchestrator │ ACP │ Session A │ │
│ │ Context │ ──────► │ Context │ │
│ │ │ │ │ │
│ │ • User prefs │ │ • CLAUDE.md │ │
│ │ • Thread hist │ │ • Skills │ │
│ │ • Task state │ │ • Git history │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ KEY INSIGHT: Session = Thread with its own persistent brain │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Thread Types Summary
| Thread Type | Purpose | Claude Involvement | Persistence |
|---|---|---|---|
| Workflow Thread | Group workflow execution messages | Orchestrator only | Message duration |
| Builder Thread | Track build progress | Builder Claude via ACP | Until cleanup |
| Session Thread | Interactive session communication | Session Claude via ACP | Session lifetime |
| Voice Thread | Artifacts + detailed output | Supervisor posts here | Session lifetime |
Context Window Management
Fast Agent Context Window (Optimized for Speed)
Fast Agent sees:
┌─────────────────────────────────────────────────────────────────────────────┐
│ [System Prompt: ~500 tokens] │
│ [Curated Context: ~2000 tokens - compressed by Supervisor] │
│ [Recent Conversation: last 5-10 messages - ~1000 tokens] │
│ [User's Current Message] │
└─────────────────────────────────────────────────────────────────────────────┘
Total: ~4000 tokens → Fast response possible
Supervisor Context Window (Optimized for Depth)
Supervisor sees:
┌─────────────────────────────────────────────────────────────────────────────┐
│ [System Prompt: ~1500 tokens] │
│ [Thread History: full Discord thread - potentially large] │
│ [Full Conversation Log: all voice exchanges] │
│ [Previous Curated Context] │
│ [Gap Messages: new since last poll] │
│ [Tool Call Results: from research/actions] │
└─────────────────────────────────────────────────────────────────────────────┘
Total: Can be large, but Opus handles it. Polling frequency limits growth.
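The Fast Agent's small-window assembly can be sketched as a budgeted merge: fixed system prompt and curated context first, then as many recent messages as fit. The 4-characters-per-token estimate and function shape are illustrative assumptions; the real implementation would use an actual tokenizer.

```python
def build_fast_agent_context(system_prompt, curated_context, messages,
                             budget_tokens=4000):
    """Assemble the Fast Agent window under a token budget (sketch)."""
    est = lambda text: len(text) // 4 + 1   # crude ~4 chars/token estimate
    used = est(system_prompt) + est(curated_context)
    recent = []
    for msg in reversed(messages):          # walk newest-first
        cost = est(msg)
        if used + cost > budget_tokens:
            break                           # window full
        recent.append(msg)
        used += cost
    # Restore chronological order for the kept tail of the conversation
    return [system_prompt, curated_context] + list(reversed(recent))
```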
Common Patterns
Pattern 1: Research → Delivery
User: "How does the authentication system work?"
Supervisor:
1. Detects research question
2. Uses tools to read codebase
3. Writes curated_context with findings
4. Posts detailed analysis to thread
Fast Agent:
1. Reads curated_context
2. Delivers verbal summary
3. Points user to thread for details
Pattern 2: Action → Confirmation
User: "Start an MMM session for me"
Supervisor:
1. Detects action request
2. Calls CUSTOM_FLY_LAUNCH_SESSION
3. Updates curated_context with result
4. Posts session URL to thread
Fast Agent:
1. Confirms action taken
2. Provides session URL verbally
Pattern 3: Multi-Turn Discovery
User: "I want to analyze my marketing data"
Fast Agent:
"What kind of analysis? Budget optimization,
channel attribution, or trend forecasting?"
User: "Channel attribution"
Supervisor:
1. Notes the preference
2. Updates curated_context with mode
Fast Agent:
"Great! Do you have your data ready, or
should I help you format it first?"
Design Principles
| Principle | Implementation |
|---|---|
| Separation of concerns | Fast Agent speaks, Supervisor thinks |
| Latency optimization | Fast Agent has minimal context |
| Thread as artifact store | Detailed info goes to thread, not voice |
| Resilience | Session persists even if voice disconnects |
| Transparency | Thread shows what Supervisor is doing |
| Session isolation | Each session = separate context, no bleed |
The key insight of Decision AI's conversation design is that different contexts require different conversation strategies. Voice needs speed (Fast Agent), complex research needs depth (Supervisor), and persistent artifacts need a home (threads). The 3-Claude architecture separates these concerns while keeping them coordinated.
How AI agents persist memory and context across sessions
CRITICAL: This is PLANNED - NOT YET BUILT
IMPORTANT: The sophisticated memory layer described in this document is a FUTURE VISION. The current Decision AI implementation uses session-level persistence (via git repos and .claude/ directories) but does NOT yet have cross-session user memory, organizational memory, or the retrieval mechanisms described here.
What EXISTS Today
| Memory Type | Status | How It Works |
|---|---|---|
| Session Memory | IMPLEMENTED | Git repo tracks all session changes |
| Session Context | IMPLEMENTED | .claude/CLAUDE.md + skills/ |
| Template Reuse | IMPLEMENTED | Save session as template, relaunch later |
| Voice Curated Context | IMPLEMENTED | Supervisor Loop updates curated_context |
| User Memory | NOT YET BUILT | Planned for future |
| Org Memory | NOT YET BUILT | Planned for future |
| Cross-Session | NOT YET BUILT | Planned for future |
The Core Problem
AI agents are stateless by default. Every request starts fresh with zero memory of past interactions. This creates a frustrating experience where users repeat themselves and context is lost.
The solution: A layered memory architecture that stores, retrieves, and injects relevant context at the right moments.
Key insight: Each layer inherits from the layer above. A single request would have access to all five layers, merged into one coherent context.
Current Implementation: Session Memory Only
Decision AI currently implements session-level memory via git repos and .claude/ directories:
┌─────────────────────────────────────────────────────────────────────────────┐
│ CURRENT SESSION MEMORY (IMPLEMENTED) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ SESSION (Fly.io machine) │
│ ├── /workspace/app/ # User's code │
│ ├── /workspace/.claude/ # Session's brain │
│ │ ├── CLAUDE.md # Execution rules + purpose │
│ │ └── skills/ # Domain-specific knowledge │
│ └── .git/ # Full history │
│ │
│ GIT REPO (github.com/org/session-mmm-{hex}) │
│ └── All changes tracked with commits │
│ └── Tags for snapshots: "template/my-analysis-v1" │
│ └── Full browsable history │
│ │
│ PERSISTENCE PATTERNS: │
│ • Session running: Full context in Claude's window + .claude/ │
│ • Session stopped: Git repo preserves state │
│ • Template saved: Can relaunch from saved state │
│ │
│ VOICE SESSION MEMORY: │
│ • Supervisor curates context into curated_context │
│ • Gap messages tracked with last_processed_index │
│ • Thread history provides additional context │
│ │
│ LIMITATIONS: │
│ • No cross-session user memory ("remember my preferences") │
│ • No "remember this for next time" │
│ • Each new session starts fresh (unless launched from template) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Proposed pgvector Schema (FUTURE)
Based on the MEMORY_LAYER_ARCHITECTURE.md design document in the codebase:
```sql
-- Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- Memory entries with scope and permissions
CREATE TABLE memories (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id UUID NOT NULL REFERENCES orgs(id),
    project_id UUID REFERENCES projects(id),   -- NULL = org-wide
    session_id UUID REFERENCES sessions(id),   -- NULL = persistent

    -- Content
    content TEXT NOT NULL,
    embedding vector(1536) NOT NULL,

    -- Metadata
    memory_type TEXT NOT NULL,   -- 'fact', 'decision', 'context', 'preference'
    source TEXT,                 -- 'user', 'claude', 'system'
    tags TEXT[],

    -- Permissions
    visibility TEXT DEFAULT 'project',   -- 'session', 'project', 'org'

    -- Timestamps
    created_at TIMESTAMPTZ DEFAULT NOW(),
    accessed_at TIMESTAMPTZ DEFAULT NOW(),
    expires_at TIMESTAMPTZ               -- NULL = permanent
);

-- Optimized index for vector search within scope
CREATE INDEX idx_memories_org_embedding
    ON memories USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
```
Permission Model (FUTURE)
┌─────────────────────────────────────────────────────────────┐
│ ORG SCOPE │
│ visibility = 'org' │
│ Accessible to all projects/sessions in org │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ PROJECT SCOPE │ │
│ │ visibility = 'project' │ │
│ │ Accessible to all sessions in project │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ SESSION SCOPE │ │ │
│ │ │ visibility = 'session' │ │ │
│ │ │ Only accessible to current session │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
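The nested scopes above reduce to a simple access predicate. The sketch below mirrors the diagram's rules in Python for illustration (the planned implementation would enforce this in the database query); it assumes org membership has already been verified.

```python
def can_access(memory: dict, session_id, project_id) -> bool:
    """Can the caller's session/project see this memory? (sketch)

    org-scoped: visible anywhere in the org.
    project-scoped: visible within the project, or everywhere
        when the memory's project_id is None (org-wide).
    session-scoped: visible only to the owning session.
    """
    visibility = memory["visibility"]
    if visibility == "org":
        return True
    if visibility == "project":
        return memory["project_id"] in (None, project_id)
    if visibility == "session":
        return memory["session_id"] == session_id
    return False  # unknown visibility: deny by default
```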
Scoped Retrieval (FUTURE)
The retrieval function respects permission boundaries:
CREATE OR REPLACE FUNCTION search_memories(
    p_org_id UUID,
    p_project_id UUID,
    p_session_id UUID,
    p_query_embedding vector(1536),
    p_limit INT DEFAULT 10,
    p_memory_types TEXT[] DEFAULT NULL,
    p_min_similarity FLOAT DEFAULT 0.7
)
RETURNS TABLE (
    id UUID,
    content TEXT,
    memory_type TEXT,
    similarity FLOAT,
    scope TEXT
) AS $$
BEGIN
    RETURN QUERY
    SELECT m.id,
           m.content,
           m.memory_type,
           1 - (m.embedding <=> p_query_embedding) AS similarity,
           CASE
               WHEN m.session_id = p_session_id THEN 'session'
               WHEN m.project_id = p_project_id THEN 'project'
               ELSE 'org'
           END AS scope
    FROM memories m
    WHERE m.org_id = p_org_id
      AND (
          (m.visibility = 'session' AND m.session_id = p_session_id)
          OR (m.visibility = 'project' AND (m.project_id = p_project_id OR m.project_id IS NULL))
          OR m.visibility = 'org'
      )
      AND (m.expires_at IS NULL OR m.expires_at > NOW())
      AND (p_memory_types IS NULL OR m.memory_type = ANY(p_memory_types))
      AND (1 - (m.embedding <=> p_query_embedding)) >= p_min_similarity
    ORDER BY
        CASE
            WHEN m.session_id = p_session_id THEN 1
            WHEN m.project_id = p_project_id THEN 2
            ELSE 3
        END,
        m.embedding <=> p_query_embedding
    LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;
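For clients that merge results from several sources, the function's `ORDER BY` can be mirrored in application code: session-scoped hits first, then project, then org, with ties broken by ascending cosine distance. A minimal sketch (the `scope_tier` and `rank_memories` helpers are illustrative, assuming each row carries its raw cosine `distance` to the query embedding):

```python
def scope_tier(row: dict, project_id: str, session_id: str) -> int:
    """Mirror of the SQL CASE: 1 = session scope, 2 = project, 3 = org."""
    if row.get("session_id") == session_id:
        return 1
    if row.get("project_id") == project_id:
        return 2
    return 3

def rank_memories(rows: list, project_id: str, session_id: str, limit: int = 10) -> list:
    """Sort by (scope tier, cosine distance), matching search_memories' ORDER BY."""
    ordered = sorted(
        rows,
        key=lambda r: (scope_tier(r, project_id, session_id), r["distance"]),
    )
    return ordered[:limit]
```

Note that a session-scoped memory outranks a closer project-scoped one; the tier always dominates distance, just as in the SQL.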
Memory Types (FUTURE)
| Type | Description | Default Visibility | TTL |
|------------|------------------------|---------|-----------|
| fact | Learned information | project | permanent |
| decision | Architectural choices | project | permanent |
| context | Conversation context | session | 24h |
| preference | User preferences | project | permanent |
| policy | Org-wide rules | org | permanent |
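These defaults lend themselves to a small lookup at write time. A possible sketch, where `MEMORY_DEFAULTS` and `default_expiry` are hypothetical names rather than anything in the current codebase:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (default_visibility, ttl) per memory type; a ttl of None means permanent.
MEMORY_DEFAULTS: dict = {
    "fact":       ("project", None),
    "decision":   ("project", None),
    "context":    ("session", timedelta(hours=24)),
    "preference": ("project", None),
    "policy":     ("org",     None),
}

def default_expiry(memory_type: str, now: Optional[datetime] = None) -> Optional[datetime]:
    """Compute expires_at for a new memory; None means the row never expires."""
    _, ttl = MEMORY_DEFAULTS[memory_type]
    if ttl is None:
        return None
    return (now or datetime.now(timezone.utc)) + ttl
```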
Retrieval Patterns (FUTURE)
1. Session Context Retrieval
# Get relevant context for the current conversation
memories = await store.search(
    query="user preferences for code style",
    org_id=org_id,
    project_id=project_id,
    session_id=session_id,
    types=['preference', 'context'],
    limit=5
)
- **Layer appropriately** - Store at the narrowest scope that makes sense
- **Extract selectively** - Not everything is worth remembering
- **Retrieve relevantly** - Only inject what helps the current task
- **Decay gracefully** - Old memories may be wrong
- **Respect privacy** - Users control their own data
- **Resolve conflicts** - Clear rules when memories contradict
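The conflict principle can be made concrete with a tiny resolver: a narrower scope beats a wider one, and among equal scopes the most recent memory wins. This is one possible policy, sketched with a hypothetical `resolve_conflict` helper (assuming `created_at` is a numeric timestamp):

```python
SCOPE_RANK = {"session": 0, "project": 1, "org": 2}

def resolve_conflict(candidates: list) -> dict:
    """
    Pick a single winner among contradictory memories:
    narrower scope beats wider scope, and among memories of equal
    scope the most recently created one wins.
    """
    return min(
        candidates,
        key=lambda m: (SCOPE_RANK[m["scope"]], -m["created_at"]),
    )
```

So a stale org-wide policy never overrides a fresh session-level correction, and within a scope the newest statement is taken as authoritative.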
Common Anti-Patterns
| Problem | Symptom | Solution |
|-----------------|--------------------------------|------------------------------|
| Memory Hoarding | Slow retrieval, noisy context | Selective extraction |
| Stale Memories | Wrong preferences applied | Decay policies, validation |
| Scope Leakage | User A sees User B's data | Enforce scope at query time |
| Over-Retrieval | 100 memories for simple question | Tiered injection |
| No User Control | "What does it know about me?" | Memory dashboard |
Memory transforms AI from a stateless tool into an intelligent partner. The goal isn't perfect recall—it's the right memory at the right time.
Currently, Decision AI implements session-level memory. Cross-session memory is planned for future development based on the pgvector schema design already in the codebase.
When to use automated checks vs. human review:
Automated Only           Human Required
──────────────           ──────────────
• Format correctness     • CLAUDE.md quality
• Build success          • Skill appropriateness
• Deployment health      • User experience
• Safety filters         • Edge case decisions
• Tool compliance        • Insight relevance
Key Principles
| Principle | Why |
|--------------------------|----------------------------------|
| Statistical significance | Don't alert on noise |
| Multi-dimensional | Multiple scorers, not single metric |
| Regression focus | Catch degradation early |
| Dataset versioning | Content-addressed test sets |
| Fast feedback | Quick evals block PRs |
| Insight recovery | Voice quality is measurable |
Evaluation is about confidence that the system works. In Decision AI, we measure builder reliability, session compliance, and voice insight recovery to ensure quality across all components.
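To make "don't alert on noise" concrete, a regression gate can require a statistically significant drop in pass rate before flagging. A sketch using a two-proportion z-test under a normal approximation; the `regression_detected` helper is illustrative, not the actual eval harness:

```python
from math import sqrt

def regression_detected(base_pass: int, base_n: int,
                        new_pass: int, new_n: int,
                        z_threshold: float = 1.96) -> bool:
    """
    Flag a regression only when the drop in pass rate is statistically
    significant (two-proportion z-test), so that ordinary run-to-run
    noise does not trigger alerts.
    """
    p1, p2 = base_pass / base_n, new_pass / new_n
    if p2 >= p1:
        return False  # improvements are never flagged
    pooled = (base_pass + new_pass) / (base_n + new_n)
    se = sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / new_n))
    if se == 0:
        return p2 < p1
    return (p1 - p2) / se > z_threshold
```

With the default 1.96 threshold (roughly p < 0.05, one-sided), a 2-point dip on a 100-case suite passes quietly while a 12-point collapse on a larger suite blocks the PR.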
This document provides a realistic roadmap based on the ACTUAL current state of Decision AI and the FUTURE vision from thoughts/ documents in the codebase.
KEY INSIGHT: The original roadmap described building from scratch. However, significant infrastructure ALREADY EXISTS, including Builder Claude with meta-skills, workflow routing, and voice session architecture. This updated roadmap acknowledges what's built and focuses on what's needed next.
The system uses a configuration spectrum from hardcoded to dynamic:
HARDCODED ←─────────────────────────────────────────────────────→ DYNAMIC

Template Apps          Supabase Templates       Git-based Templates
(in code)              (DB records)             (external repos)

TEMPLATE_APPS = {      session_templates        Clone, analyze,
  "mmm-studio"...      table with fly_app       use .claude/ from
}                      reference                user's repo
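One way to read the spectrum is as a fallback chain at resolution time: in-code templates first, then database records, then a git repo cloned on demand. A hedged sketch; the `resolve_template` helper and its injected `db_lookup`/`git_clone` callables are hypothetical, and only the `TEMPLATE_APPS` name comes from the diagram above:

```python
from typing import Callable, Optional

# Hardcoded end of the spectrum (contents here are illustrative).
TEMPLATE_APPS = {"mmm-studio": {"fly_app": "mmm-studio-template"}}

def resolve_template(
    name: str,
    db_lookup: Callable[[str], Optional[dict]],
    git_clone: Callable[[str], dict],
) -> dict:
    """
    Walk the spectrum left to right: in-code templates first, then
    Supabase session_templates records, then a git-based template
    cloned and analyzed on demand.
    """
    if name in TEMPLATE_APPS:
        return {"source": "hardcoded", **TEMPLATE_APPS[name]}
    record = db_lookup(name)
    if record is not None:
        return {"source": "database", **record}
    return {"source": "git", **git_clone(name)}
```

Injecting the database and git steps as callables keeps the chain testable without a live Supabase instance or network access.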
Git as Source of Truth
- Every session creates a GitHub repo
- All changes tracked with commits
- Templates can be saved as tags/branches
- Full browsable history for audit
Builder IS the Session
Builder deploys a copy of itself → New Session (same image, different app name)
No separate "base session image"—Builder Claude uses meta-skills to construct the right environment for any repo.
Next Actions
Immediate (This Week)
- ✅ Builder Claude already operational
- Review error handling in current builds
- Add build caching to avoid redundant deploys
- Document Discord commands for users
Short-term (Next 2 Weeks)
- Improve template listing UX in Discord
- Add git_repo_url field to templates
- Benchmark insight recovery in voice sessions
Medium-term (Weeks 3-6)
- Implement voice enhancement features
- Design pack.yaml schema
- Create 3-5 reference packs (MMM, data exploration, API dev)
Summary
| What | Status | Timeline |
|---------------------|------------|-------------|
| Core Infrastructure | ✅ DONE | Complete |
| Builder Claude | ✅ DONE | Complete |
| Git Session Repos | ✅ DONE | Complete |
| Voice Architecture | ✅ DONE | Complete |
| Workflow Routing | ✅ DONE | Complete |
| Stabilization | 🔨 NEXT | Weeks 1-3 |
| Enhanced Templates | 📋 PLANNED | Weeks 4-6 |
| Voice Enhancement | 📋 PLANNED | Weeks 7-10 |
| Pack System | 📋 PLANNED | Weeks 11-14 |
| Memory Layer | 📋 PLANNED | Weeks 15+ |
Key insight: The system is MORE complete than expected. Voice architecture, workflow routing, and Builder Claude with meta-skills are all already implemented. The remaining work focuses on stabilization, enhanced templates, voice enhancement, the pack system, and the memory layer.
This document reflects the actual state of Decision AI as of January 2026. Timeline estimates are based on building incrementally on existing infrastructure. Voice architecture, workflow routing, and Builder Claude are IMPLEMENTED, not planned.
This directory contains a comprehensive analysis of Decision AI's architecture, comparing it to competitors and documenting the actual implemented system.