@swapp1990
Created February 16, 2026 15:50

Molty Memory Observer Plan — post-completion memory extraction for runs and discussions

Molty Memory Observer

Status: PLANNED
Created: 2026-02-16

1. Problem Statement

Molty dispatches tasks and delivers notifications but is blind to what actually happens during runs and discussions. Claude generates rich artifacts (JSONL ACKs with step notes, context summaries, output files) but nobody reads or aggregates them. Molty cannot answer "what happened today?" or "what's the status of project X?" because there's no memory layer between raw events and conversational knowledge.

2. Solution Overview

Add a post-completion observer to simple_runner.js that triggers after every RUN_COMPLETE and DISCUSS_COMPLETE. The observer collects the raw bundle (task description, JSONL ACK notes, output file, context summary) and sends it to Gemini Flash for extraction. The model distills what happened, what decisions were made, what failed, and what to remember, then appends the result to Molty's memory, both as a daily markdown digest and as an entry indexed into clawdbot's semantic memory so that memory_search can retrieve it. A new /checkin command in reply_handler.js lets Swap ask Molty for a conversational summary of recent activity.

3. Comparison Table

| Aspect | Current State | After Implementation |
|---|---|---|
| What Molty knows about runs | Just "RUN_COMPLETE" status | Full summary: what changed, decisions, blockers, outcomes |
| What Molty knows about discussions | Nothing after DISCUSS_COMPLETE | Key conclusions, decisions, action items |
| "What happened today?" | Manual: Swap reads Telegram history | /checkin gives conversational daily summary |
| "Status of project X?" | Check GitHub/dashboard manually | Molty recalls recent activity per project |
| Cross-run context | Each run starts fresh | Molty can reference prior work in conversations |
| Memory storage | 4 hand-written diary entries | Auto-growing daily digests + searchable memory |

4. User Flow

  1. Claude completes a /run task — writes RUN_COMPLETE ACK to JSONL
  2. simple_runner.js detects RUN_COMPLETE, collects the raw bundle (task.json, JSONL notes, output.md, context.md)
  3. simple_runner.js calls Gemini Flash with the bundle + extraction prompt
  4. AI returns structured memory: project, outcome, decisions, blockers, what to remember
  5. Memory gets appended to memory/events/YYYY-MM-DD.jsonl (one event per line) and to memory/diary/YYYY-MM-DD.md (one section per event)
  6. Memory also gets indexed into clawdbot's semantic memory store for future conversations
  7. Swap types /checkin in any Telegram thread
  8. Molty reads today's diary + recent events, synthesizes a conversational status update
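
Steps 1 and 2 of the flow hinge on spotting a terminal ACK in the JSONL stream. The sketch below shows one way that detection might look. The ACK field names (`runId`, `status`, `note`) and the helper names are assumptions for illustration; the plan does not specify the ACK schema.

```javascript
// Hypothetical ACK shape, one JSON object per line, e.g.
// {"runId":"r1","status":"RUNNING","note":"step 1 done"}
// The field names are assumptions, not the actual ACK schema.
const TERMINAL_STATUSES = new Set(["RUN_COMPLETE", "DISCUSS_COMPLETE"]);

// Parse JSONL text into objects, skipping blank or malformed lines.
function parseJsonl(text) {
  const records = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      records.push(JSON.parse(trimmed));
    } catch {
      // A half-written line should not stop the runner; ignore it.
    }
  }
  return records;
}

// Return the first terminal ACK, or null if the run is still in progress.
function findCompletionAck(text) {
  const acks = parseJsonl(text);
  return acks.find((ack) => ack && TERMINAL_STATUSES.has(ack.status)) ?? null;
}
```

A non-null result is the point at which the observer would start collecting the bundle.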

5. Scope

P0 — Must Have

Event Capture (simple_runner.js):

  • After RUN_COMPLETE: collect task description, all JSONL step notes, output file content, context summary
  • After DISCUSS_COMPLETE: collect discussion topic, all turn files, final output summary
  • Bundle all collected data into a single prompt payload
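
As a rough sketch of the RUN_COMPLETE capture described above, the code below reads the four artifacts from a run directory and flattens them into one payload string. The directory layout (`task.json`, `acks.jsonl`, `output.md`, `context.md` side by side), the `description` and `note` field names, and the function names are all assumptions for illustration.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Read a file as UTF-8, returning "" when it is missing,
// since not every run necessarily produces every artifact.
function readIfExists(filePath) {
  try {
    return fs.readFileSync(filePath, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return "";
    throw err;
  }
}

// Pull the step notes out of JSONL ACK text, skipping malformed lines.
function readStepNotes(jsonlText) {
  const notes = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    try {
      const ack = JSON.parse(line);
      if (ack && typeof ack.note === "string" && ack.note) notes.push(ack.note);
    } catch {
      // Skip malformed lines rather than fail the whole bundle.
    }
  }
  return notes;
}

// Hypothetical layout: <runDir>/task.json, acks.jsonl, output.md, context.md
function collectRunBundle(runDir) {
  const taskRaw = readIfExists(path.join(runDir, "task.json"));
  const task = taskRaw ? JSON.parse(taskRaw) : {};
  return {
    kind: "run",
    taskDescription: task.description ?? "",
    stepNotes: readStepNotes(readIfExists(path.join(runDir, "acks.jsonl"))),
    output: readIfExists(path.join(runDir, "output.md")),
    contextSummary: readIfExists(path.join(runDir, "context.md")),
  };
}

// Flatten the bundle into a single prompt payload.
function bundleToPayload(bundle) {
  return [
    `## Task\n${bundle.taskDescription}`,
    `## Step notes\n${bundle.stepNotes.map((n) => `- ${n}`).join("\n")}`,
    `## Output\n${bundle.output}`,
    `## Context summary\n${bundle.contextSummary}`,
  ].join("\n\n");
}
```

A DISCUSS_COMPLETE collector would presumably follow the same pattern, reading the topic and turn files in place of the task and step notes.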

AI Extraction (Gemini Flash):

  • Send bundle to Gemini Flash with extraction prompt
  • Prompt asks for: project name, one-line outcome, key decisions made, blockers hit, files/features changed, what Molty should remember
  • Parse structured response (JSON format)
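
The prompt and the response parsing could be sketched as below. The key names (`project`, `outcome`, `decisions`, `blockers`, `changes`, `remember`) are assumptions that mirror the list above, not a confirmed schema, and the actual Gemini Flash call is left out because the plan does not say which client is used.

```javascript
// Assumed schema: key name -> expected type. Mirrors the prompt fields above.
const REQUIRED_FIELDS = {
  project: "string",
  outcome: "string",
  decisions: "array",
  blockers: "array",
  changes: "array",
  remember: "array",
};

// Build the extraction prompt around an already-flattened bundle payload.
function buildExtractionPrompt(payload) {
  return [
    "Summarize the completed work below as a single JSON object with exactly these keys:",
    ...Object.entries(REQUIRED_FIELDS).map(([key, type]) => `- ${key} (${type})`),
    "Respond with JSON only.",
    "",
    payload,
  ].join("\n");
}

// Parse and validate the model response. Returns { ok, memory } or { ok, error }.
function parseExtraction(responseText) {
  // Models sometimes wrap JSON in prose or a fence, so take the outermost braces.
  const start = responseText.indexOf("{");
  const end = responseText.lastIndexOf("}");
  if (start === -1 || end < start) {
    return { ok: false, error: "response contains no JSON object" };
  }
  let data;
  try {
    data = JSON.parse(responseText.slice(start, end + 1));
  } catch {
    return { ok: false, error: "response is not valid JSON" };
  }
  for (const [key, type] of Object.entries(REQUIRED_FIELDS)) {
    const value = data[key];
    const valid = type === "array" ? Array.isArray(value) : typeof value === type;
    if (!valid) return { ok: false, error: `field "${key}" must be of type ${type}` };
  }
  return { ok: true, memory: data };
}
```

Returning an error object instead of throwing keeps a bad extraction from propagating into the runner; the caller can log `error` and move on.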

Memory Storage:

  • Append extracted event to memory/events/YYYY-MM-DD.jsonl (one event per line)
  • Append human-readable entry to memory/diary/YYYY-MM-DD.md (daily diary, one section per event)
  • Index into clawdbot semantic memory using memory_search-compatible format
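
The two file writes might look like the sketch below. The event fields follow the list in the success criteria (timestamp, project, runId, outcome, decisions, blockers); the diary section layout, the function names, and the use of UTC for the date stamp are assumptions. Indexing into clawdbot's semantic memory is omitted because its storage format is not described here.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Format a Date as YYYY-MM-DD in UTC (the time zone is an assumption).
function dayStamp(date) {
  return date.toISOString().slice(0, 10);
}

// Render one diary section for an event. The layout is illustrative only.
function renderDiaryEntry(event) {
  const list = (items) => (items.length ? items.map((i) => `- ${i}`).join("\n") : "- none");
  return [
    `## ${event.timestamp} ${event.project} (${event.runId})`,
    "",
    event.outcome,
    "",
    "Decisions:",
    list(event.decisions),
    "",
    "Blockers:",
    list(event.blockers),
    "",
    "",
  ].join("\n");
}

// Append the event to the day's JSONL file and the day's diary file.
function storeMemory(memoryRoot, event) {
  const day = dayStamp(new Date(event.timestamp));
  const eventsFile = path.join(memoryRoot, "events", `${day}.jsonl`);
  const diaryFile = path.join(memoryRoot, "diary", `${day}.md`);
  fs.mkdirSync(path.dirname(eventsFile), { recursive: true });
  fs.mkdirSync(path.dirname(diaryFile), { recursive: true });
  fs.appendFileSync(eventsFile, JSON.stringify(event) + "\n");
  fs.appendFileSync(diaryFile, renderDiaryEntry(event));
  return { eventsFile, diaryFile };
}
```

Because both writes are appends, several completions on the same day accumulate as separate JSONL lines and separate diary sections.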

Checkin Command (reply_handler.js):

  • /checkin — reads today's diary, summarizes via Gemini Flash, sends conversational response to thread
  • /checkin <project> — filters events by project name, gives project-specific status
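
A minimal sketch of how the two command forms could be parsed and how events might be filtered by project. The function names and the case-insensitive match are assumptions; the actual reply_handler.js command dispatch is not shown in this plan.

```javascript
// Parse "/checkin" or "/checkin <project>". Returns null for any other message.
function parseCheckin(text) {
  const match = /^\/checkin(?:\s+(\S+))?\s*$/.exec(text.trim());
  if (!match) return null;
  return { project: match[1] ?? null };
}

// Keep only events for the requested project; a null project means "all projects".
function filterEvents(events, project) {
  if (project === null) return events;
  const wanted = project.toLowerCase();
  return events.filter((event) => String(event.project).toLowerCase() === wanted);
}
```

For example, `parseCheckin("/checkin moltbot-web")` yields `{ project: "moltbot-web" }`, which can be passed straight to `filterEvents` along with the parsed lines of today's events file.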

P1 — Should Have

  • Weekly digest aggregation (summarize the week every Sunday)
  • Memory pruning — archive events older than 30 days to monthly summaries
  • Dashboard widget showing recent memory entries
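
For the pruning item, one possible first step is to select the daily event files that are past the 30-day window and group them by month, ready to be summarized. The function names are hypothetical, and the file-name pattern is taken from the memory/events/YYYY-MM-DD.jsonl convention used in this plan.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Return the daily event files whose date is more than maxAgeDays before `now`.
function selectArchivable(fileNames, now, maxAgeDays = 30) {
  const cutoff = now.getTime() - maxAgeDays * DAY_MS;
  return fileNames.filter((name) => {
    const match = /^(\d{4}-\d{2}-\d{2})\.jsonl$/.exec(name);
    if (!match) return false;
    const fileTime = Date.parse(`${match[1]}T00:00:00Z`);
    return !Number.isNaN(fileTime) && fileTime < cutoff;
  });
}

// Group file names by their YYYY-MM prefix, one group per monthly summary.
function groupByMonth(fileNames) {
  const groups = {};
  for (const name of fileNames) {
    const month = name.slice(0, 7);
    if (!groups[month]) groups[month] = [];
    groups[month].push(name);
  }
  return groups;
}
```

Summarizing each group and deleting the originals would be separate steps and are not sketched here.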

Out of Scope (v2+)

  • Mid-run observation (watching RUNNING ACKs in real-time)
  • Tmux scrollback capture and analysis
  • Cross-project dependency tracking
  • Proactive notifications ("you haven't touched project X in 5 days")

6. Risks & Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Gemini Flash API failures block completion notifications | High (delays Telegram delivery) | Fire-and-forget: memory extraction runs async, doesn't block ACK delivery |
| Extraction quality inconsistent | Med (garbage memories) | Structured prompt with strict JSON schema; validate output before storing |
| Memory files grow unbounded | Low (disk space) | Daily files stay small (5-15 events/day); weekly digest + monthly archival |
| Clawdbot semantic memory index gets stale | Med (Molty forgets) | Re-index diary files on gateway restart; memory entries include timestamps |
| Bundle too large for Gemini Flash context | Low (extraction fails) | Cap bundle at 8K tokens; truncate JSONL to last 20 step notes |
| simple_runner.js restart during extraction | Low (lost memory) | Write raw bundle to pending/ dir first; process on next startup if missed |
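
The bundle-size mitigation could be sketched as follows. Token counts are estimated with a rough 4-characters-per-token heuristic, which is an assumption and not a real tokenizer; the bundle field names are likewise hypothetical.

```javascript
// Rough heuristic, not a real tokenizer: assume about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep the last `maxNotes` step notes, then trim the output file content so the
// estimated total fits within `maxTokens`.
function capBundle(bundle, { maxNotes = 20, maxTokens = 8000 } = {}) {
  const stepNotes = bundle.stepNotes.slice(-maxNotes);
  const fixedText = [bundle.taskDescription, bundle.contextSummary, ...stepNotes].join("\n");
  const remainingChars = Math.max(0, (maxTokens - estimateTokens(fixedText)) * CHARS_PER_TOKEN);
  // Keeping the head of the output is an arbitrary choice; the tail may matter more.
  const output = bundle.output.slice(0, remainingChars);
  return { ...bundle, stepNotes, output };
}
```

This only trims the output file. If the task description, context summary, and notes alone exceed the budget, the result is still over the cap, so a fuller version would need to trim those as well.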

7. Success Criteria Checklist

Event Capture

  • RUN_COMPLETE triggers memory extraction with correct bundle
  • DISCUSS_COMPLETE triggers memory extraction with correct bundle
  • Bundle includes: task description, step notes, output file, context summary
  • Extraction is fire-and-forget (does not block ACK delivery to Telegram)
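
One way to get fire-and-forget behavior together with the pending/ safety net from the risks section is sketched below. `extractAndStore` stands in for the (unspecified) async function that calls Gemini Flash and writes the memory files; it and the other names here are hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Persist the bundle, then start extraction without awaiting it.
// extractAndStore is a hypothetical async function: (bundle) => Promise<void>.
function observeCompletion(pendingDir, runId, bundle, extractAndStore, log = console.error) {
  fs.mkdirSync(pendingDir, { recursive: true });
  const pendingFile = path.join(pendingDir, `${runId}.json`);
  // Write first so a restart mid-extraction can pick the bundle up again.
  fs.writeFileSync(pendingFile, JSON.stringify(bundle));

  // Deliberately not awaited: the caller proceeds to ACK delivery immediately.
  Promise.resolve()
    .then(() => extractAndStore(bundle))
    .then(() => fs.unlinkSync(pendingFile))
    .catch((err) => log(`memory extraction failed for ${runId}: ${err.message}`));
}

// On startup, list bundles left behind by an interrupted or failed extraction.
function listPending(pendingDir) {
  if (!fs.existsSync(pendingDir)) return [];
  return fs
    .readdirSync(pendingDir)
    .filter((name) => name.endsWith(".json"))
    .map((name) => ({
      runId: path.basename(name, ".json"),
      bundle: JSON.parse(fs.readFileSync(path.join(pendingDir, name), "utf8")),
    }));
}
```

The pending file is removed only after a successful extraction, so a failed or interrupted one stays on disk for the next startup. A bundle that fails every time would be retried indefinitely under this sketch, so a retry limit may be worth adding.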

AI Extraction

  • Gemini Flash call returns structured JSON with required fields
  • Extraction prompt produces consistent, useful summaries
  • Failed extractions are logged but don't crash simple_runner
  • Bundle size is capped to fit within Gemini Flash context window

Memory Storage

  • Events appended to memory/events/YYYY-MM-DD.jsonl
  • Human-readable diary entry appended to memory/diary/YYYY-MM-DD.md
  • Memory indexed into clawdbot semantic memory store
  • Memory entries include: timestamp, project, runId, outcome, decisions, blockers

Checkin Command

  • /checkin returns today's activity summary in the requesting thread
  • /checkin <project> filters to project-specific activity
  • Response is conversational and under 3000 chars (Telegram-friendly)
  • Empty days return "Nothing completed today" gracefully
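
The last two criteria can be handled after the summary text comes back from the model. A minimal sketch, assuming the summary is already a plain string; the function names and the exact wording of the empty-day message are illustrative.

```javascript
const MAX_REPLY_CHARS = 3000;
const EMPTY_DAY_MESSAGE = "Nothing completed today.";

// Shorten text to strictly fewer than maxChars characters, cutting at a word
// boundary where possible and marking the cut with an ellipsis.
function clip(text, maxChars = MAX_REPLY_CHARS) {
  if (text.length < maxChars) return text;
  const cut = text.slice(0, maxChars - 2);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "…";
}

// Choose the reply: a fixed message for empty days, otherwise the clipped summary.
function finalizeReply(events, summaryText) {
  if (events.length === 0) return EMPTY_DAY_MESSAGE;
  return clip(summaryText.trim());
}
```

Checking for an empty event list before calling the model also avoids a Gemini Flash request on days with no activity.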

8. End-to-End Test List

  • E2E-1: Run a /run echo hello task → verify memory event appears in today's JSONL + diary
  • E2E-2: Complete a /discuss session → verify discuss memory event appears in today's JSONL + diary
  • E2E-3: Run /checkin after a completed run → Molty responds with accurate summary of the run
  • E2E-4: Run /checkin moltbot-web after a run on that project → response is filtered to that project only
  • E2E-5: Run /checkin with no activity today → Molty responds gracefully ("nothing completed today")
  • E2E-6: Trigger RUN_COMPLETE with Gemini Flash API down → ACK delivery still works, error logged
  • E2E-7: Complete 3 runs in one day → diary has 3 sections, /checkin summarizes all 3
  • E2E-8: Ask Molty in conversation "what did we do on project X recently?" → semantic memory search returns relevant events

9. Manual Testing Checklist

Smoke Test (2 min)

  • Send /run echo test and wait for completion — memory file created
  • Check memory/events/ has today's JSONL file with at least one entry
  • Check memory/diary/ has today's markdown file with readable content
  • Send /checkin — get a response (not an error)

Feature Test (5 min)

  • RUN_COMPLETE memory entry has correct project name and outcome
  • DISCUSS_COMPLETE memory entry captures discussion conclusions
  • /checkin response mentions the correct tasks completed today
  • /checkin <project> only mentions events for that project
  • Memory entry includes decisions and blockers from the run

Regression Test (2 min)

  • simple_runner heartbeat still running after memory extraction
  • ACK delivery timing unchanged (no noticeable delay from async extraction)
  • reply_handler still handles /run and /discuss commands normally
  • Gateway logs show no errors from memory indexing
  • Existing Molty conversations work — no regression from new memory tools

10. Implementation Order

Each step is a separate /run task:

  1. Add memory extraction to simple_runner.js — After RUN_COMPLETE/DISCUSS_COMPLETE, collect bundle, call Gemini Flash, write to events JSONL + diary markdown
  2. Add /checkin command to reply_handler.js — Parse command, read diary, call Gemini Flash for summary, send response to thread
  3. Index memories into clawdbot semantic store — After writing diary entry, also write to clawdbot-compatible memory location for semantic search
  4. Test with real runs — Execute test tasks, verify full pipeline end-to-end