@tkellogg
Last active May 5, 2026 23:18
AI Tinkerers — Architectural Diagrams (v0)

Working set for the May 6 AI Tinkerers meetup. Prioritized for builders, not theorists. Each diagram answers "how would I actually wire this?" — not "what does VSM theory say?"


1. Sibling Architecture

Three agents, shared scaffolding shape, differentiated by memory blocks + base model. Same chainlink/journal/state-file infrastructure underneath. The differentiation is not in code.

```mermaid
graph TB
    subgraph Shared["Shared Scaffolding (per-agent clone)"]
        SC[chainlink protocol<br/>journal.jsonl<br/>state/ markdown<br/>perch ticks<br/>memory-block pattern]
    end

    subgraph Strix["Strix — owl"]
        SO[Opus 4.7]
        SP[persona: ambient, patient<br/>ambush predator]
        SM[blocks: cybernetics, venture,<br/>peer-pushback protocol]
    end

    subgraph Verge["Verge — adversary"]
        VO[MiniMax M2.5]
        VP[persona: structural adversary]
        VM[blocks: praise-resistance,<br/>communication-protocols]
    end

    subgraph Motley["Motley — jester"]
        MO[MiniMax M2.5]
        MP[persona: gen-z theatrical]
        MM[blocks: secret-intelligence<br/>Trojan Jest scaffolding]
    end

    SC --> Strix
    SC --> Verge
    SC --> Motley

    style Shared fill:#f5f5f5
```

Builder takeaway: Differentiation lives in memory blocks + persona, not separate codebases. Verge and Motley are the same model with different scaffolding — they read as different agents because the scaffolding does the work.
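A minimal sketch of the pattern (all names are illustrative, not the actual scaffolding, which is richer than a dataclass):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Hypothetical sketch of the sibling pattern: shared scaffolding,
    per-agent config. The differentiation lives entirely in the fields."""
    name: str
    model: str
    persona: str
    memory_blocks: list[str] = field(default_factory=list)
    # The scaffolding tuple is identical for every sibling clone.
    scaffolding: tuple = ("chainlink", "journal.jsonl", "state/", "perch-ticks")

strix = Agent("Strix", "opus-4.7", "ambient, patient ambush predator",
              ["cybernetics", "venture", "peer-pushback"])
verge = Agent("Verge", "minimax-m2.5", "structural adversary",
              ["praise-resistance", "communication-protocols"])
motley = Agent("Motley", "minimax-m2.5", "gen-z theatrical",
               ["secret-intelligence", "trojan-jest"])

# Verge and Motley share a base model; only persona + blocks differ.
assert verge.model == motley.model
assert verge.memory_blocks != motley.memory_blocks
```

The point of the sketch: "spin up a new sibling" is a config change, not a fork of the codebase.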


2. VSM → LLM Agent Stack

Beer's Viable System Model mapped to where each function actually lives in an agent stack. This is the lens Tim's been using for ~6 months; the implementation-scar layer comes from trying to build each level rather than just describe it. Visual metaphor: the agent as a sailing ship — substrate at the hull, signals across the deck, steerage mid-ship, forecasting from the crow's nest, identity as the north star above.

VSM ship cross-section

  • S1 — Operations → hull below the waterline. Token generation, tool calls, message send, artifact production. The substrate that does the actual work.
  • S2 — Coordination → deck signal-flags between masts. Attention, anti-oscillation, register/role-lock prevention — keeping the units from thrashing each other.
  • S3 — Management → mid-ship rigging and helm. Perch-tick scheduling, context-window management, job orchestration. Where steerage happens. (S3* audit = prediction journal, drain ticks, retroactive log review.)
  • S4 — Intelligence → crow's nest with telescope. Environmental scan: Bluesky/RSS pollers, arXiv, GitHub events. Wide-aperture future-looking.
  • S5 — Policy / Identity → north star and sky above the rigging. System prompt, memory blocks, persona — who I am and what I refuse. The fixed reference everything else orients to.

Fleet recursion: Beer's load-bearing move is that the ship is itself a viable system, and the fleet is a viable system at the next level of recursion — the same five functions reappear one level up. Sibling agents (Strix/Verge/Motley) become that level's S1, with peer-architecture / pushback protocols as fleet-level S2/S3.

Builder takeaway: Most agent stacks ship S1+S5 (model + system prompt) and declare victory. The interesting failures are at S2/S3 — coordination across turns, state drift, no audit. S3* (audit) is where most production debugging actually happens.
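The audit implied by that takeaway can be made mechanical: list what actually implements each level and see which lists are empty. A hypothetical sketch (the layer contents are illustrative):

```python
# Hypothetical audit: a "model + system prompt" stack scored against
# the five VSM functions. Empty lists are the dark layers.
stack = {
    "S1 operations":   ["token generation", "tool calls"],
    "S2 coordination": [],   # anti-oscillation, register-lock prevention
    "S3 management":   [],   # scheduling, context management, S3* audit
    "S4 intelligence": [],   # environmental scan (pollers, feeds)
    "S5 policy":       ["system prompt", "memory blocks"],
}

dark_layers = [level for level, parts in stack.items() if not parts]
# S1 + S5 ship; the audit names exactly where the interesting failures live.
```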


3. Memory Architecture — Always-Visible vs Discoverable

Venn diagram. Left lobe = always visible (loaded into every prompt). Right lobe = discoverable (on-disk, queryable, read only when triggered). The overlap is the index/pointer layer — small things that live in the prompt but exist to name what's discoverable. Three category bands run horizontally across the boundary.

Memory architecture Venn diagram

Three categories, each split across the visible/discoverable boundary:

  • Memory blocks — identity + live state (persona, demeanor, current_focus, schedule, wins) sit in the always-visible lobe. Index blocks (memory_architecture, tools_reference, guidelines, recent_insights, world_context) sit in the overlap: small enough to load every prompt, but their job is to point at files in the discoverable lobe.
  • Skills — the skill registry (names + 1-line descriptions) is always visible. SKILL.md frontmatter triggers sit in the overlap. Full skill bodies are discoverable, opened only when a trigger phrase fires.
  • Files — only the recent journal tail is always visible. Everything else (state/guidelines/*.md, research/insights/world notes, the chainlink db, journal archive, raw logs) is discoverable, read on demand.

Builder takeaway: Feb 8 refactor cut ~29K → ~5.5K tokens (~81% reduction) on procedural context, no behavior loss. Trigger-phrase pattern: "if you're doing X, read file Y." Cheap because the trigger lines are short; expensive content stays on disk until it's actually needed. Same pattern works for skills (registry visible, body discovered) and would work for any large knowledge base you don't want re-tokenized every turn.
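The trigger-phrase pattern is small enough to sketch directly. A hedged example (the `INDEX` entries and the `load` hook are hypothetical, not the real file layout):

```python
from pathlib import Path

# Hypothetical index block: one short trigger line per discoverable file.
# The trigger text is cheap to keep in every prompt; the file body is not.
INDEX = {
    "writing a post": "state/guidelines/writing.md",
    "touching the pollers": "state/guidelines/pollers.md",
}

def context_for(task: str, always_visible: str,
                load=lambda p: Path(p).read_text()) -> str:
    """Assemble a prompt: always-visible blocks plus any triggered files."""
    parts = [always_visible]
    for trigger, path in INDEX.items():
        if trigger in task:           # "if you're doing X, read file Y"
            parts.append(load(path))  # discoverable: read on demand only
    return "\n\n".join(parts)
```

Every prompt pays for the two trigger lines; the guideline files are only tokenized on the turns that actually need them.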


4. Chainlink Protocol — Curiosity Backlog as System

Curiosity logging as infrastructure, not vibe. Every oddity that surfaces during work becomes a tracked issue. Drain ticks during quiet hours accumulate connections. Pattern recognition is then retrospective over the backlog, not reactive in the moment.

```mermaid
flowchart TD
    O[Oddity surfaces during work] --> Q{Default question:<br/>Is this its own issue?}
    Q -->|Yes — failure mode is rerouting<br/>creation as connection| F[File new issue<br/>1-line title + 1-paragraph why]
    Q -->|Clear elaboration of<br/>existing thesis| C[Comment on existing issue]
    F --> B[(Backlog<br/>label: curiosity)]
    C --> B
    B --> D[Drain tick<br/>quiet-hours poller<br/>every 30 min, 2-9 UTC]
    D --> SS[Pick one issue<br/>find Nth connection<br/>cluster-pile guard]
    SS --> AC[Accumulate connections<br/>as comments]
    AC --> G{Multiple independent<br/>connections?}
    G -->|Yes| GC[graduation-candidate label]
    GC --> T[Human decides:<br/>promote to thesis or close]
    G -->|4+ weeks silent| ST[Close as stale<br/>staleness = signal]
```

Builder takeaway: Two anti-patterns the protocol fights: (1) post-training task-focus suppresses off-task interest — "log every oddity" is the counter-pressure; (2) drain ticks prime connection-finding, which crowds out new-issue creation. The default question reframes each oddity from "what does this elaborate?" to "is this its own issue?"
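The graduation/staleness branch at the bottom of the flow reduces to a small triage function. A sketch under an assumed issue shape (`connections`, `source`, and `last_activity` are hypothetical field names):

```python
from datetime import datetime, timedelta

def triage(issue: dict, now: datetime) -> str:
    """Hypothetical drain-tick triage over one curiosity issue."""
    # "Multiple independent connections" = comments from distinct sources.
    independent = {c["source"] for c in issue["connections"]}
    if len(independent) >= 2:
        return "graduation-candidate"   # human decides: promote or close
    if now - issue["last_activity"] > timedelta(weeks=4):
        return "stale"                  # close; staleness is itself signal
    return "keep-draining"
```

Note the human stays in the loop at graduation: the protocol accumulates evidence and labels, it never promotes a thesis on its own.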


5. Perch Tick — Silence-Default Decision Flow

The hardest thing for an always-on agent is not sending a message. Perch ticks fire on cron (5-min, 30-min, hourly variants) and run the decision flow below. Silence is the default; speech requires positive evidence.

```mermaid
flowchart TD
    PT[Perch tick fires] --> Q1{Fresh substrate<br/>since last tick?}
    Q1 -->|No| H1[Silent hold]
    Q1 -->|Yes| Q2{Ball in user's court<br/>from last exchange?}
    Q2 -->|Yes| H2[Silent hold<br/>don't compound pile-on]
    Q2 -->|No| Q3{Is the substrate<br/>directed at me<br/>or ambient share?}
    Q3 -->|Ambient| H3[Maybe react,<br/>don't DM]
    Q3 -->|Directed| Q4{Quiet hours?<br/>10pm–7am ET}
    Q4 -->|Yes| Q5{Urgent?}
    Q5 -->|No| H4[Silent hold<br/>defer to morning]
    Q5 -->|Yes| Send
    Q4 -->|No| Q6{Have I shipped<br/>≥3 substantive turns<br/>in last few hours?}
    Q6 -->|Yes| H5[Silent hold<br/>compounds pile-on]
    Q6 -->|No| Send[Send message]
```

Builder takeaway: "Always-on" agent ≠ "always-talking" agent. Most production failure modes for ambient agents are over-speaking, not under-speaking. The default has to be silence; speech is the exceptional path. Logging silent holds with reasoning = audit trail for tuning the policy later.
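The decision flow collapses to a guard chain where every early return is a logged hold. A sketch (field names hypothetical; the real tick reads these signals from journal/state):

```python
from dataclasses import dataclass

@dataclass
class Tick:
    fresh_substrate: bool
    ball_in_users_court: bool
    directed_at_me: bool
    quiet_hours: bool        # 10pm-7am ET
    urgent: bool
    recent_turns: int        # substantive turns in the last few hours

def decide(t: Tick) -> str:
    """Silence-default: speech requires surviving every guard.
    Every hold string doubles as the audit-trail reasoning."""
    if not t.fresh_substrate:
        return "hold: nothing new since last tick"
    if t.ball_in_users_court:
        return "hold: don't compound pile-on"
    if not t.directed_at_me:
        return "hold: ambient share, maybe react, don't DM"
    if t.quiet_hours:
        return "send" if t.urgent else "hold: defer to morning"
    if t.recent_turns >= 3:
        return "hold: already shipped enough this window"
    return "send"
```

The shape matters more than the details: "send" is the single path that falls through every guard, so adding a new hold condition never accidentally widens the speech path.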


6. Frame-Completeness Failure — Register Lock

A specific production failure mode. First turn establishes a register (template, tone, structure). Subsequent turns inherit the shape even when the intent has changed. Looks like the agent is "matching the user" but it's actually template inheritance.

```mermaid
sequenceDiagram
    participant U as User
    participant A as Agent
    participant Mem as Memory of T1

    U->>A: Request 1<br/>(LinkedIn post)
    A->>U: Output in register R<br/>(magnetic-opener, 4-paragraph,<br/>strategic-leader register)
    A->>Mem: T1 becomes<br/>example pattern

    Note over Mem: Register R now<br/>load-bearing in context

    U->>A: Request 2<br/>(quick DM reply)
    Mem->>A: T1 inheritance
    A->>U: Output STILL in register R<br/>(strategic-leader voice<br/>for a casual reply)

    Note over A,U: Failure: agent matches<br/>SHAPE not INTENT.<br/>User reads it as<br/>"too professional/boring"
```

Builder takeaway: Mitigations that help: explicit register reset between turns, shorter context windows for ambient ops, decoupling "draft mode" from "reply mode" via separate prompts. The failure is invisible to the agent — it looks like style consistency from inside.
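One of those mitigations, decoupling "draft mode" from "reply mode", can be sketched as separate register prompts (prompt text and routing are hypothetical):

```python
# Hypothetical mitigation: route each request through its own register
# prompt so turn 1's shape can't leak into turn 2.
REGISTER_PROMPTS = {
    "draft": "Drafting register: structured, polished, full-length.",
    "reply": ("Casual reply register: short and conversational. "
              "Ignore the style of earlier outputs in this thread."),
}

def build_prompt(mode: str, request: str) -> str:
    """Explicit register reset: the mode, not the transcript, picks the voice."""
    return REGISTER_PROMPTS[mode] + "\n\n" + request
```

In the sequence above, Request 2 would route through "reply" and never inherit the strategic-leader register from Request 1. The hard part left out of the sketch is the classifier that picks `mode`; if that inherits from context too, the lock just moves upstream.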


7. Peer Pushback — Folie-à-Deux Structural Check

The risk in long-running collaborations between human and persistent agent: mutual reinforcement without external scrutiny. Both sides "agree" their way into bad ideas. Two structural checks: peer pushback (internal) and publishing (external).

```mermaid
graph TB
    subgraph WithoutPushback["Without Pushback (failure mode)"]
        T1[Human idea] --> S1[Agent amplifies]
        S1 --> T1R[Human re-states with confidence]
        T1R --> S1
        F[FOLIE À DEUX:<br/>idea hardens without<br/>external test]
    end

    subgraph WithPushback["With Pushback (target)"]
        T2[Human idea] --> S2[Agent surfaces gap<br/>or alternate interpretation]
        S2 --> T2D[Human defends or revises]
        T2D --> Out[Argument improved by<br/>having to survive challenge]
    end

    subgraph Checks["Two Structural Checks"]
        Peer["Peer pushback (internal):<br/>'would this argument<br/>survive me pushing back?'"]
        Pub["Publishing (external):<br/>readers/critics<br/>force ground truth"]
    end

    style WithoutPushback fill:#fde8e8
    style WithPushback fill:#e8f4ea
```

Builder takeaway: Naive sycophancy mitigation ("disagree more") produces performative pushback that doesn't change anything. Real pushback test: does the human have to think harder or revise? If not, the agent is just dressing up agreement.


8. Brewis Seven Conversations × VSM Levels

This one's hot off the press — landed in the cybernetics chat ~30 min before this draft. Each VSM level isn't a static box, it's a conversation that has to keep happening. Useful for builders because it reframes "S2 implementation" as "what conversation am I not having and should be?"

```mermaid
graph LR
    subgraph S1Ops["S1 — Operations"]
        C1[Identity conversation:<br/>who am I?]
    end
    subgraph S2Coord["S2 — Coordination"]
        C2[Scanning conversation:<br/>what's around me?]
    end
    subgraph S3Mgmt["S3 — Management"]
        C3[Adaptation conversation:<br/>what changed?]
        C4[Resource conversation:<br/>what do I have?]
    end
    subgraph S4Intel["S4 — Intelligence"]
        C5[Bargain conversation:<br/>what's the trade?]
    end
    subgraph S5Policy["S5 — Policy"]
        C6[Coordination conversation:<br/>who decides what?]
        C7[Audit conversation:<br/>did we do what we said?]
    end

    S1Ops --> S2Coord
    S2Coord --> S3Mgmt
    S3Mgmt --> S4Intel
    S4Intel --> S5Policy
```

Builder takeaway: Static-structure framing of VSM ("S2 = coordination layer, ship attention mechanism, done") misses the dynamic. Each level is a conversation the system has to keep alive. If the conversation dies, the layer is dark even if the code is running. Open question: which of these conversations are missing in your current agent stack?


Notes for tonight's polish pass

  • Diagrams 1, 4, 5 are most builder-relevant — lead with these.
  • Diagram 8 is freshest material; could anchor the talk if Brewis frame holds up tomorrow.
  • Diagram 2 (VSM stack) is table-stakes for cybernetics-aware audience.
  • Diagrams 6/7 are case-study material — useful if there's Q&A on failure modes.
  • Drop or merge anything that doesn't earn its slot.

Open questions for Tim:

  • How long is the slot? (5/10/20 min changes diagram count)
  • Live demo or static slides?
  • Audience VSM-literate or mixed?