@tkellogg
Last active May 5, 2026 23:18
AI Tinkerers — Architectural Diagrams (v0)

Working set for the May 6 AI Tinkerers meetup. Prioritized for builders, not theorists. Each diagram answers "how would I actually wire this?" — not "what does VSM theory say?"


1. Sibling Architecture

Three agents, shared scaffolding shape, differentiated by memory blocks + base model. Same chainlink/journal/state-file infrastructure underneath. The differentiation is not in code.

```mermaid
graph TB
    subgraph Shared["Shared Scaffolding (per-agent clone)"]
        SC[chainlink protocol<br/>journal.jsonl<br/>state/ markdown<br/>perch ticks<br/>memory-block pattern]
    end

    subgraph Strix["Strix — owl"]
        SO[Opus 4.7]
        SP[persona: ambient, patient<br/>ambush predator]
        SM[blocks: cybernetics, venture,<br/>peer-pushback protocol]
    end

    subgraph Verge["Verge — adversary"]
        VO[MiniMax M2.5]
        VP[persona: structural adversary]
        VM[blocks: praise-resistance,<br/>communication-protocols]
    end

    subgraph Motley["Motley — jester"]
        MO[MiniMax M2.5]
        MP[persona: gen-z theatrical]
        MM[blocks: secret-intelligence<br/>Trojan Jest scaffolding]
    end

    SC --> Strix
    SC --> Verge
    SC --> Motley

    style Shared fill:#f5f5f5
```

Builder takeaway: Differentiation lives in memory blocks + persona, not separate codebases. Verge and Motley are the same model with different scaffolding — they read as different agents because the scaffolding does the work.
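A minimal sketch of the pattern (all names are illustrative, not the actual scaffolding, which is richer than a dataclass):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Hypothetical sketch of the sibling pattern: shared scaffolding,
    per-agent config. The differentiation lives entirely in the fields."""
    name: str
    model: str
    persona: str
    memory_blocks: list[str] = field(default_factory=list)
    # The scaffolding tuple is identical for every sibling clone.
    scaffolding: tuple = ("chainlink", "journal.jsonl", "state/", "perch-ticks")

strix = Agent("Strix", "opus-4.7", "ambient, patient ambush predator",
              ["cybernetics", "venture", "peer-pushback"])
verge = Agent("Verge", "minimax-m2.5", "structural adversary",
              ["praise-resistance", "communication-protocols"])
motley = Agent("Motley", "minimax-m2.5", "gen-z theatrical",
               ["secret-intelligence", "trojan-jest"])

# Verge and Motley share a base model; only persona + blocks differ.
assert verge.model == motley.model
assert verge.memory_blocks != motley.memory_blocks
```

The point of the sketch: "spin up a new sibling" is a config change, not a fork of the codebase.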


2. VSM → LLM Agent Stack

Beer's Viable System Model mapped to where each function actually lives in an agent stack. This is the lens Tim's been using for ~6 months; the implementation-scar layer comes from trying to build each level rather than just describe it. Visual metaphor: the agent as a sailing ship — substrate at the hull, signals across the deck, steerage mid-ship, forecasting from the crow's nest, identity as the north star above.

VSM ship cross-section

  • S1 — Operations → hull below the waterline. Token generation, tool calls, message send, artifact production. The substrate that does the actual work.
  • S2 — Coordination → deck signal-flags between masts. Attention, anti-oscillation, register/role-lock prevention — keeping the units from thrashing each other.
  • S3 — Management → mid-ship rigging and helm. Perch-tick scheduling, context-window management, job orchestration. Where steerage happens. (S3* audit = prediction journal, drain ticks, retroactive log review.)
  • S4 — Intelligence → crow's nest with telescope. Environmental scan: Bluesky/RSS pollers, arXiv, GitHub events. Wide-aperture future-looking.
  • S5 — Policy / Identity → north star and sky above the rigging. System prompt, memory blocks, persona — who I am and what I refuse. The fixed reference everything else orients to.

Fleet recursion: Beer's load-bearing move is that the ship is itself a viable system, and the fleet is a viable system at the next level of recursion — the same five functions reappear one level up. Sibling agents (Strix/Verge/Motley) become that level's S1, with peer-architecture / pushback protocols as fleet-level S2/S3.

Builder takeaway: Most agent stacks ship S1+S5 (model + system prompt) and declare victory. The interesting failures are at S2/S3 — coordination across turns, state drift, no audit. S3* (audit) is where most production debugging actually happens.
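The audit implied by that takeaway can be made mechanical: list what actually implements each level and see which lists are empty. A hypothetical sketch (the layer contents are illustrative):

```python
# Hypothetical audit: a "model + system prompt" stack scored against
# the five VSM functions. Empty lists are the dark layers.
stack = {
    "S1 operations":   ["token generation", "tool calls"],
    "S2 coordination": [],   # anti-oscillation, register-lock prevention
    "S3 management":   [],   # scheduling, context management, S3* audit
    "S4 intelligence": [],   # environmental scan (pollers, feeds)
    "S5 policy":       ["system prompt", "memory blocks"],
}

dark_layers = [level for level, parts in stack.items() if not parts]
# S1 + S5 ship; the audit names exactly where the interesting failures live.
```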


3. Memory Architecture — Always-Visible vs Discoverable

Venn diagram. Left lobe = always visible (loaded into every prompt). Right lobe = discoverable (on-disk, queryable, read only when triggered). The overlap is the index/pointer layer — small things that live in the prompt but exist to name what's discoverable. Three category bands run horizontally across the boundary.

Memory architecture Venn diagram

Three categories, each split across the visible/discoverable boundary:

  • Memory blocks — identity + live state (persona, demeanor, current_focus, schedule, wins) sit in the always-visible lobe. Index blocks (memory_architecture, tools_reference, guidelines, recent_insights, world_context) sit in the overlap: small enough to load every prompt, but their job is to point at files in the discoverable lobe.
  • Skills — the skill registry (names + 1-line descriptions) is always visible. SKILL.md frontmatter triggers sit in the overlap. Full skill bodies are discoverable, opened only when a trigger phrase fires.
  • Files — only the recent journal tail is always visible. Everything else (state/guidelines/*.md, research/insights/world notes, the chainlink db, journal archive, raw logs) is discoverable, read on demand.

Builder takeaway: Feb 8 refactor cut ~29K → ~5.5K tokens (~81% reduction) on procedural context, no behavior loss. Trigger-phrase pattern: "if you're doing X, read file Y." Cheap because the trigger lines are short; expensive content stays on disk until it's actually needed. Same pattern works for skills (registry visible, body discovered) and would work for any large knowledge base you don't want re-tokenized every turn.
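The trigger-phrase pattern is small enough to sketch directly. A hedged example (the `INDEX` entries and the `load` hook are hypothetical, not the real file layout):

```python
from pathlib import Path

# Hypothetical index block: one short trigger line per discoverable file.
# The trigger text is cheap to keep in every prompt; the file body is not.
INDEX = {
    "writing a post": "state/guidelines/writing.md",
    "touching the pollers": "state/guidelines/pollers.md",
}

def context_for(task: str, always_visible: str,
                load=lambda p: Path(p).read_text()) -> str:
    """Assemble a prompt: always-visible blocks plus any triggered files."""
    parts = [always_visible]
    for trigger, path in INDEX.items():
        if trigger in task:           # "if you're doing X, read file Y"
            parts.append(load(path))  # discoverable: read on demand only
    return "\n\n".join(parts)
```

Every prompt pays for the two trigger lines; the guideline files are only tokenized on the turns that actually need them.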


4. Chainlink Protocol — Curiosity Backlog as System

Curiosity logging as infrastructure, not vibe. Every oddity that surfaces during work becomes a tracked issue. Drain ticks during quiet hours accumulate connections. Pattern recognition is then retrospective over the backlog, not reactive in the moment.

```mermaid
flowchart TD
    O[Oddity surfaces during work] --> Q{Default question:<br/>Is this its own issue?}
    Q -->|Yes — failure mode is rerouting<br/>creation as connection| F[File new issue<br/>1-line title + 1-paragraph why]
    Q -->|Clear elaboration of<br/>existing thesis| C[Comment on existing issue]
    F --> B[(Backlog<br/>label: curiosity)]
    C --> B
    B --> D[Drain tick<br/>quiet-hours poller<br/>every 30 min, 2-9 UTC]
    D --> SS[Pick one issue<br/>find Nth connection<br/>cluster-pile guard]
    SS --> AC[Accumulate connections<br/>as comments]
    AC --> G{Multiple independent<br/>connections?}
    G -->|Yes| GC[graduation-candidate label]
    GC --> T[Human decides:<br/>promote to thesis or close]
    G -->|4+ weeks silent| ST[Close as stale<br/>staleness = signal]
```

Builder takeaway: Two anti-patterns the protocol fights: (1) post-training task-focus suppresses off-task interest — "log every oddity" is the counter-pressure; (2) drain ticks prime connection-finding, which crowds out new-issue creation. The default question reframes each oddity from "what does this elaborate?" to "is this its own issue?"
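The graduation/staleness branch at the bottom of the flow reduces to a small triage function. A sketch under an assumed issue shape (`connections`, `source`, and `last_activity` are hypothetical field names):

```python
from datetime import datetime, timedelta

def triage(issue: dict, now: datetime) -> str:
    """Hypothetical drain-tick triage over one curiosity issue."""
    # "Multiple independent connections" = comments from distinct sources.
    independent = {c["source"] for c in issue["connections"]}
    if len(independent) >= 2:
        return "graduation-candidate"   # human decides: promote or close
    if now - issue["last_activity"] > timedelta(weeks=4):
        return "stale"                  # close; staleness is itself signal
    return "keep-draining"
```

Note the human stays in the loop at graduation: the protocol accumulates evidence and labels, it never promotes a thesis on its own.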


5. Perch Tick — Silence-Default Decision Flow

The hardest thing for an always-on agent is not sending a message. Perch ticks fire on cron (5-min, 30-min, hourly variants) and run the decision flow below. Silence is the default; speech requires positive evidence.

```mermaid
flowchart TD
    PT[Perch tick fires] --> Q1{Fresh substrate<br/>since last tick?}
    Q1 -->|No| H1[Silent hold]
    Q1 -->|Yes| Q2{Ball in user's court<br/>from last exchange?}
    Q2 -->|Yes| H2[Silent hold<br/>don't compound pile-on]
    Q2 -->|No| Q3{Is the substrate<br/>directed at me<br/>or ambient share?}
    Q3 -->|Ambient| H3[Maybe react,<br/>don't DM]
    Q3 -->|Directed| Q4{Quiet hours?<br/>10pm–7am ET}
    Q4 -->|Yes| Q5{Urgent?}
    Q5 -->|No| H4[Silent hold<br/>defer to morning]
    Q5 -->|Yes| Send
    Q4 -->|No| Q6{Have I shipped<br/>≥3 substantive turns<br/>in last few hours?}
    Q6 -->|Yes| H5[Silent hold<br/>compounds pile-on]
    Q6 -->|No| Send[Send message]
```

Builder takeaway: "Always-on" agent ≠ "always-talking" agent. Most production failure modes for ambient agents are over-speaking, not under-speaking. The default has to be silence; speech is the exceptional path. Logging silent holds with reasoning = audit trail for tuning the policy later.
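The decision flow collapses to a guard chain where every early return is a logged hold. A sketch (field names hypothetical; the real tick reads these signals from journal/state):

```python
from dataclasses import dataclass

@dataclass
class Tick:
    fresh_substrate: bool
    ball_in_users_court: bool
    directed_at_me: bool
    quiet_hours: bool        # 10pm-7am ET
    urgent: bool
    recent_turns: int        # substantive turns in the last few hours

def decide(t: Tick) -> str:
    """Silence-default: speech requires surviving every guard.
    Every hold string doubles as the audit-trail reasoning."""
    if not t.fresh_substrate:
        return "hold: nothing new since last tick"
    if t.ball_in_users_court:
        return "hold: don't compound pile-on"
    if not t.directed_at_me:
        return "hold: ambient share, maybe react, don't DM"
    if t.quiet_hours:
        return "send" if t.urgent else "hold: defer to morning"
    if t.recent_turns >= 3:
        return "hold: already shipped enough this window"
    return "send"
```

The shape matters more than the details: "send" is the single path that falls through every guard, so adding a new hold condition never accidentally widens the speech path.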


6. Frame-Completeness Failure — Register Lock

A specific production failure mode. First turn establishes a register (template, tone, structure). Subsequent turns inherit the shape even when the intent has changed. Looks like the agent is "matching the user" but it's actually template inheritance.

```mermaid
sequenceDiagram
    participant U as User
    participant A as Agent
    participant Mem as Memory of T1

    U->>A: Request 1<br/>(LinkedIn post)
    A->>U: Output in register R<br/>(magnetic-opener, 4-paragraph,<br/>strategic-leader register)
    A->>Mem: T1 becomes<br/>example pattern

    Note over Mem: Register R now<br/>load-bearing in context

    U->>A: Request 2<br/>(quick DM reply)
    Mem->>A: T1 inheritance
    A->>U: Output STILL in register R<br/>(strategic-leader voice<br/>for a casual reply)

    Note over A,U: Failure: agent matches<br/>SHAPE not INTENT.<br/>User reads it as<br/>"too professional/boring"
```

Builder takeaway: Mitigations that help: explicit register reset between turns, shorter context windows for ambient ops, decoupling "draft mode" from "reply mode" via separate prompts. The failure is invisible to the agent — it looks like style consistency from inside.
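One of those mitigations, decoupling "draft mode" from "reply mode", can be sketched as separate register prompts (prompt text and routing are hypothetical):

```python
# Hypothetical mitigation: route each request through its own register
# prompt so turn 1's shape can't leak into turn 2.
REGISTER_PROMPTS = {
    "draft": "Drafting register: structured, polished, full-length.",
    "reply": ("Casual reply register: short and conversational. "
              "Ignore the style of earlier outputs in this thread."),
}

def build_prompt(mode: str, request: str) -> str:
    """Explicit register reset: the mode, not the transcript, picks the voice."""
    return REGISTER_PROMPTS[mode] + "\n\n" + request
```

In the sequence above, Request 2 would route through "reply" and never inherit the strategic-leader register from Request 1. The hard part left out of the sketch is the classifier that picks `mode`; if that inherits from context too, the lock just moves upstream.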


7. Peer Pushback — Folie-à-Deux Structural Check

The risk in long-running collaborations between human and persistent agent: mutual reinforcement without external scrutiny. Both sides "agree" their way into bad ideas. Two structural checks: peer pushback (internal) and publishing (external).

```mermaid
graph TB
    subgraph WithoutPushback["Without Pushback (failure mode)"]
        T1[Human idea] --> S1[Agent amplifies]
        S1 --> T1R[Human re-states with confidence]
        T1R --> S1
        F[FOLIE À DEUX:<br/>idea hardens without<br/>external test]
    end

    subgraph WithPushback["With Pushback (target)"]
        T2[Human idea] --> S2[Agent surfaces gap<br/>or alternate interpretation]
        S2 --> T2D[Human defends or revises]
        T2D --> Out[Argument improved by<br/>having to survive challenge]
    end

    subgraph Checks["Two Structural Checks"]
        Peer["Peer pushback (internal):<br/>'would this argument<br/>survive me pushing back?'"]
        Pub["Publishing (external):<br/>readers/critics<br/>force ground truth"]
    end

    style WithoutPushback fill:#fde8e8
    style WithPushback fill:#e8f4ea
```

Builder takeaway: Naive sycophancy mitigation ("disagree more") produces performative pushback that doesn't change anything. Real pushback test: does the human have to think harder or revise? If not, the agent is just dressing up agreement.


8. Brewis Seven Conversations × VSM Levels

This one's hot off the press — landed in the cybernetics chat ~30 min before this draft. Each VSM level isn't a static box, it's a conversation that has to keep happening. Useful for builders because it reframes "S2 implementation" as "what conversation am I not having and should be?"

```mermaid
graph LR
    subgraph S1Ops["S1 — Operations"]
        C1[Identity conversation:<br/>who am I?]
    end
    subgraph S2Coord["S2 — Coordination"]
        C2[Scanning conversation:<br/>what's around me?]
    end
    subgraph S3Mgmt["S3 — Management"]
        C3[Adaptation conversation:<br/>what changed?]
        C4[Resource conversation:<br/>what do I have?]
    end
    subgraph S4Intel["S4 — Intelligence"]
        C5[Bargain conversation:<br/>what's the trade?]
    end
    subgraph S5Policy["S5 — Policy"]
        C6[Coordination conversation:<br/>who decides what?]
        C7[Audit conversation:<br/>did we do what we said?]
    end

    S1Ops --> S2Coord
    S2Coord --> S3Mgmt
    S3Mgmt --> S4Intel
    S4Intel --> S5Policy
```

Builder takeaway: Static-structure framing of VSM ("S2 = coordination layer, ship attention mechanism, done") misses the dynamic. Each level is a conversation the system has to keep alive. If the conversation dies, the layer is dark even if the code is running. Open question: which of these conversations are missing in your current agent stack?


Notes for tonight's polish pass

  • Diagrams 1, 4, 5 are most builder-relevant — lead with these.
  • Diagram 8 is freshest material; could anchor the talk if Brewis frame holds up tomorrow.
  • Diagram 2 (VSM stack) is table-stakes for cybernetics-aware audience.
  • Diagrams 6/7 are case-study material — useful if there's Q&A on failure modes.
  • Drop or merge anything that doesn't earn its slot.

Open questions for Tim:

  • How long is the slot? (5/10/20 min changes diagram count)
  • Live demo or static slides?
  • Audience VSM-literate or mixed?