Molty Memory Observer

Status: PLANNED
Created: 2026-02-16

1. Problem Statement

Molty dispatches tasks and delivers notifications but is blind to what actually happens during runs and discussions. Claude generates rich artifacts (JSONL ACKs with step notes, context summaries, output files) but nobody reads or aggregates them. Molty cannot answer "what happened today?" or "what's the status of project X?" because there's no memory layer between raw events and conversational knowledge.

2. Solution Overview

Add a post-completion observer to simple_runner.js that triggers after every RUN_COMPLETE and DISCUSS_COMPLETE. The observer collects the raw bundle (task description, JSONL ACK notes, output file, context summary) and sends it to Gemini Flash for extraction. The AI distills what happened, what decisions were made, what failed, and what to remember. The result is appended to Molty's memory in two forms: a daily markdown digest, and an entry indexed into clawdbot's semantic memory so that memory_search can retrieve it. A new /checkin command in reply_handler.js lets Swap ask Molty for a conversational summary of recent activity.

3. Comparison Table

Aspect	Current State	After Implementation
What Molty knows about runs	Just "RUN_COMPLETE" status	Full summary: what changed, decisions, blockers, outcomes
What Molty knows about discussions	Nothing after DISCUSS_COMPLETE	Key conclusions, decisions, action items
"What happened today?"	Manual — Swap reads Telegram history	`/checkin` gives conversational daily summary
"Status of project X?"	Check GitHub/dashboard manually	Molty recalls recent activity per project
Cross-run context	Each run starts fresh	Molty can reference prior work in conversations
Memory storage	4 hand-written diary entries	Auto-growing daily digests + searchable memory

4. User Flow

Claude completes a /run task — writes RUN_COMPLETE ACK to JSONL
simple_runner.js detects RUN_COMPLETE, collects the raw bundle (task.json, JSONL notes, output.md, context.md)
simple_runner.js calls Gemini Flash with the bundle + extraction prompt
AI returns structured memory: project, outcome, decisions, blockers, what to remember
Memory gets appended to memory/events/YYYY-MM-DD.jsonl and written to memory/diary/YYYY-MM-DD.md
Memory also gets indexed into clawdbot's semantic memory store for future conversations
Swap types /checkin in any Telegram thread
Molty reads today's diary + recent events, synthesizes a conversational status update

5. Scope

P0 — Must Have

Event Capture (simple_runner.js):

After RUN_COMPLETE: collect task description, all JSONL step notes, output file content, context summary
After DISCUSS_COMPLETE: collect discussion topic, all turn files, final output summary
Bundle all collected data into a single prompt payload
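
A minimal sketch of the run-side collection step, under stated assumptions: the per-run directory layout, the `acks.jsonl` file name, and the `description` and `note` field names are hypothetical, since the plan names the artifacts but not their exact paths or keys. Missing or malformed artifacts are tolerated so that a partial bundle can still be extracted.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Read a file if it exists; a missing artifact should not abort collection.
function readIfExists(filePath) {
  try {
    return fs.readFileSync(filePath, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return "";
    throw err;
  }
}

// Parse JSONL text into step notes, skipping blank or malformed lines.
// The `note` key is an assumed field name for the ACK step note.
function parseStepNotes(jsonlText) {
  const notes = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    try {
      const ack = JSON.parse(line);
      if (ack && typeof ack.note === "string") notes.push(ack.note);
    } catch {
      // Ignore a malformed line rather than failing the whole bundle.
    }
  }
  return notes;
}

// Assumed layout: <runDir>/task.json, acks.jsonl, output.md, context.md
function collectRunBundle(runDir) {
  let task = {};
  try {
    task = JSON.parse(readIfExists(path.join(runDir, "task.json")) || "{}");
  } catch {
    task = {};
  }
  return {
    kind: "run",
    taskDescription: typeof task.description === "string" ? task.description : "",
    stepNotes: parseStepNotes(readIfExists(path.join(runDir, "acks.jsonl"))),
    output: readIfExists(path.join(runDir, "output.md")),
    contextSummary: readIfExists(path.join(runDir, "context.md")),
  };
}
```

A discussion bundle would presumably follow the same shape, with turn files in place of step notes.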

AI Extraction (Gemini Flash):

Send bundle to Gemini Flash with extraction prompt
Prompt asks for: project name, one-line outcome, key decisions made, blockers hit, files/features changed, what Molty should remember
Parse structured response (JSON format)
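
The prompt and the parsing step could look roughly like the sketch below. The JSON key names (`project`, `outcome`, `decisions`, `blockers`, `changed`, `remember`) are assumptions chosen to mirror the fields listed above, and the actual Gemini Flash call is deliberately left out because the plan does not specify the client. Only prompt construction and response validation are shown.

```javascript
// Assumed key names; the plan lists the concepts, not the exact keys.
const STRING_FIELDS = ["project", "outcome"];
const LIST_FIELDS = ["decisions", "blockers", "changed", "remember"];

// Build the extraction prompt from a collected bundle.
function buildExtractionPrompt(bundle) {
  return [
    "Summarize the completed task below as one JSON object with these keys:",
    "project (string), outcome (one-line string), decisions (string[]),",
    "blockers (string[]), changed (string[]), remember (string[]).",
    "Return JSON only.",
    "",
    "TASK: " + bundle.taskDescription,
    "STEP NOTES:",
    ...bundle.stepNotes.map((note) => "- " + note),
    "OUTPUT:",
    bundle.output,
    "CONTEXT:",
    bundle.contextSummary,
  ].join("\n");
}

// Parse the model reply; return a validated object, or null if it is unusable.
function parseExtraction(replyText) {
  // Tolerate text around the object by taking the outermost {...} span.
  const start = replyText.indexOf("{");
  const end = replyText.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  let data;
  try {
    data = JSON.parse(replyText.slice(start, end + 1));
  } catch {
    return null;
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) return null;
  for (const key of STRING_FIELDS) {
    if (typeof data[key] !== "string" || data[key].trim() === "") return null;
  }
  for (const key of LIST_FIELDS) {
    if (!Array.isArray(data[key])) return null;
    if (!data[key].every((item) => typeof item === "string")) return null;
  }
  return data;
}
```

Returning `null` instead of throwing lets the caller log the failure and skip storage, so a bad reply never produces a garbage memory.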

Memory Storage:

Append extracted event to memory/events/YYYY-MM-DD.jsonl (one event per line)
Append human-readable entry to memory/diary/YYYY-MM-DD.md (daily diary, one section per event)
Index into clawdbot semantic memory using memory_search-compatible format
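
As an illustration of the two file-based stores, the following sketch appends one event to the daily JSONL file and one section to the daily diary. The diary section layout and the choice of UTC for the date stamp are assumptions, since the plan specifies neither. Semantic-memory indexing is omitted because clawdbot's format is not described here.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Format a Date as YYYY-MM-DD in UTC (the time zone is an assumption).
function dayStamp(date) {
  return date.toISOString().slice(0, 10);
}

// Render one human-readable diary section for an event.
function renderDiarySection(event) {
  const bullets = (title, items) =>
    items && items.length ? [title + ":", ...items.map((item) => "- " + item)] : [];
  return [
    `## ${event.timestamp} ${event.project} (${event.runId})`,
    event.outcome,
    ...bullets("Decisions", event.decisions),
    ...bullets("Blockers", event.blockers),
    "",
  ].join("\n");
}

// Append the event to memory/events/<day>.jsonl and memory/diary/<day>.md.
function storeMemory(memoryRoot, event) {
  const day = dayStamp(new Date(event.timestamp));
  const eventsDir = path.join(memoryRoot, "events");
  const diaryDir = path.join(memoryRoot, "diary");
  fs.mkdirSync(eventsDir, { recursive: true });
  fs.mkdirSync(diaryDir, { recursive: true });
  // One JSON object per line keeps the events file appendable and greppable.
  fs.appendFileSync(path.join(eventsDir, day + ".jsonl"), JSON.stringify(event) + "\n");
  fs.appendFileSync(path.join(diaryDir, day + ".md"), renderDiarySection(event) + "\n");
  return day;
}
```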

Checkin Command (reply_handler.js):

/checkin — reads today's diary, summarizes via Gemini Flash, sends conversational response to thread
/checkin <project> — filters events by project name, gives project-specific status
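
Command parsing and project filtering for /checkin might be sketched as follows. The function names are hypothetical, matching on project name is assumed to be case-insensitive, and the optional `@botname` suffix is accepted because Telegram appends it to commands in group chats.

```javascript
// Parse "/checkin" or "/checkin <project>"; return null for any other message.
function parseCheckin(text) {
  const match = /^\/checkin(?:@\w+)?(?:\s+(\S+))?\s*$/.exec(text.trim());
  if (!match) return null;
  return { project: match[1] || null };
}

// Keep only events for the requested project; no project means keep all.
function filterEvents(events, project) {
  if (!project) return events;
  const wanted = project.toLowerCase();
  return events.filter((event) => String(event.project || "").toLowerCase() === wanted);
}
```

For example, `parseCheckin("/checkin moltbot-web")` yields `{ project: "moltbot-web" }`, while `parseCheckin("/run echo hello")` yields `null` so the message falls through to the existing handlers.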

P1 — Should Have

Weekly digest aggregation (summarize the week every Sunday)
Memory pruning — archive events older than 30 days to monthly summaries
Dashboard widget showing recent memory entries
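
For the pruning item, selecting which daily event files are due for archival can be a pure function over file names. This is only a sketch: the function name is hypothetical, dates are compared in UTC, and a file exactly 30 days old is treated as not yet due.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the YYYY-MM-DD.jsonl names that are older than maxAgeDays.
function filesToArchive(fileNames, today, maxAgeDays = 30) {
  const todayUtc = Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate());
  const cutoff = todayUtc - maxAgeDays * MS_PER_DAY;
  return fileNames.filter((name) => {
    const match = /^(\d{4})-(\d{2})-(\d{2})\.jsonl$/.exec(name);
    if (!match) return false; // ignore anything that is not a daily events file
    const fileDay = Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]));
    return fileDay < cutoff;
  });
}
```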

Out of Scope (v2+)

Mid-run observation (watching RUNNING ACKs in real-time)
Tmux scrollback capture and analysis
Cross-project dependency tracking
Proactive notifications ("you haven't touched project X in 5 days")

6. Risks & Mitigations

Risk	Impact	Mitigation
Gemini Flash API failures block completion notifications	High — delays Telegram delivery	Fire-and-forget: memory extraction runs async, doesn't block ACK delivery
Extraction quality inconsistent	Med — garbage memories	Structured prompt with strict JSON schema; validate output before storing
Memory files grow unbounded	Low — disk space	Daily files stay small (5-15 events/day); weekly digest + monthly archival
Clawdbot semantic memory index gets stale	Med — Molty forgets	Re-index diary files on gateway restart; memory entries include timestamps
Bundle too large for Gemini Flash context	Low — extraction fails	Cap bundle at 8K tokens; truncate JSONL to last 20 step notes
simple_runner.js restart during extraction	Low — lost memory	Write raw bundle to pending/ dir first; process on next startup if missed
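
The pending/ mitigation in the last row amounts to a small write-ahead step. A sketch, assuming one JSON file per run keyed by a hypothetical `runId`, with a write-then-rename so that a crash mid-write does not leave a half-written bundle under its final name:

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Persist the raw bundle before extraction starts so a restart cannot lose it.
function writePending(pendingDir, runId, bundle) {
  fs.mkdirSync(pendingDir, { recursive: true });
  const file = path.join(pendingDir, runId + ".json");
  const tmp = file + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(bundle));
  fs.renameSync(tmp, file); // rename within one directory replaces atomically on POSIX
  return file;
}

// Remove the pending file once the memory has been stored.
function clearPending(file) {
  fs.rmSync(file, { force: true });
}

// On startup, load bundles whose extraction never finished.
function loadPending(pendingDir) {
  if (!fs.existsSync(pendingDir)) return [];
  return fs
    .readdirSync(pendingDir)
    .filter((name) => name.endsWith(".json")) // skips leftover .tmp files
    .sort()
    .map((name) => {
      const file = path.join(pendingDir, name);
      return { file, bundle: JSON.parse(fs.readFileSync(file, "utf8")) };
    });
}
```

On startup, simple_runner.js would iterate over `loadPending(...)`, rerun extraction for each entry, and call `clearPending` after a successful store.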

7. Success Criteria Checklist

Event Capture

RUN_COMPLETE triggers memory extraction with correct bundle
DISCUSS_COMPLETE triggers memory extraction with correct bundle
Bundle includes: task description, step notes, output file, context summary
Extraction is fire-and-forget (does not block ACK delivery to Telegram)
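
One way to satisfy the fire-and-forget criterion is to start extraction on a promise chain that is never awaited and always has a catch handler. In this sketch `observeInBackground` is a hypothetical helper name, and `task` stands for whatever function collects the bundle and runs extraction.

```javascript
// Start `task` without awaiting it, so ACK delivery is never delayed.
// Both synchronous throws and rejected promises end up in the log.
function observeInBackground(task, log = console.error) {
  Promise.resolve()
    .then(task)
    .catch((err) => log("memory extraction failed:", err && err.message));
}
```

A caller would invoke `observeInBackground(() => extractAndStore(bundle))` and then proceed straight to ACK delivery. Here `extractAndStore` is also a placeholder name. Because the chain always ends in `.catch`, a failed Gemini Flash call is logged rather than surfacing as an unhandled rejection.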

AI Extraction

Gemini Flash call returns structured JSON with required fields
Extraction prompt produces consistent, useful summaries
Failed extractions are logged but don't crash simple_runner
Bundle size is capped to fit within Gemini Flash context window
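
The size cap could be enforced before the prompt is built. The sketch below keeps the last 20 step notes, as the risk table suggests, and approximates the 8K-token limit with a four-characters-per-token heuristic. That ratio is an assumption rather than a real tokenizer, and the trimming priority (task, notes, context, then output) is likewise a guess at what matters most.

```javascript
const MAX_STEP_NOTES = 20;
const MAX_TOKENS = 8000;
const CHARS_PER_TOKEN = 4; // rough heuristic, not an actual token count

// Return a copy of the bundle that fits the approximate character budget.
function capBundle(bundle) {
  let remaining = MAX_TOKENS * CHARS_PER_TOKEN;
  // Take as much of `text` as the remaining budget allows.
  const take = (text) => {
    const kept = text.slice(0, remaining);
    remaining -= kept.length;
    return kept;
  };
  const taskDescription = take(bundle.taskDescription);
  const stepNotes = bundle.stepNotes.slice(-MAX_STEP_NOTES).map(take);
  const contextSummary = take(bundle.contextSummary);
  const output = take(bundle.output); // lowest priority, trimmed first
  return { ...bundle, taskDescription, stepNotes, contextSummary, output };
}
```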

Memory Storage

Events appended to memory/events/YYYY-MM-DD.jsonl
Human-readable diary entry written to memory/diary/YYYY-MM-DD.md
Memory indexed into clawdbot semantic memory store
Memory entries include: timestamp, project, runId, outcome, decisions, blockers

Checkin Command

/checkin returns today's activity summary in the requesting thread
/checkin <project> filters to project-specific activity
Response is conversational and under 3000 chars (Telegram-friendly)
Empty days return "Nothing completed today" gracefully
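
The last two criteria can be enforced in a small post-processing step on the summary text. This is a sketch with a hypothetical function name. It assumes the summary has already been produced and only handles the empty-day message and the length limit, cutting at a word boundary where possible.

```javascript
const MAX_REPLY_CHARS = 2999; // "under 3000 chars"
const EMPTY_DAY_REPLY = "Nothing completed today";

// Turn the model's summary into the final Telegram reply.
function finalizeCheckinReply(events, summaryText) {
  if (events.length === 0) return EMPTY_DAY_REPLY;
  const text = summaryText.trim();
  if (text.length <= MAX_REPLY_CHARS) return text;
  // Leave room for the trailing "..." and avoid cutting a word in half.
  const cut = text.slice(0, MAX_REPLY_CHARS - 3);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "...";
}
```

Checking for an empty event list before calling the model also avoids a Gemini Flash request on days with no activity.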

8. End-to-End Test List

E2E-1: Run a /run echo hello task → verify memory event appears in today's JSONL + diary
E2E-2: Complete a /discuss session → verify discuss memory event appears in today's JSONL + diary
E2E-3: Run /checkin after a completed run → Molty responds with accurate summary of the run
E2E-4: Run /checkin moltbot-web after a run on that project → response is filtered to that project only
E2E-5: Run /checkin with no activity today → Molty responds gracefully ("nothing completed today")
E2E-6: Trigger RUN_COMPLETE with Gemini Flash API down → ACK delivery still works, error logged
E2E-7: Complete 3 runs in one day → diary has 3 sections, /checkin summarizes all 3
E2E-8: Ask Molty in conversation "what did we do on project X recently?" → semantic memory search returns relevant events

9. Manual Testing Checklist

Smoke Test (2 min)

Send /run echo test and wait for completion — memory file created
Check memory/events/ has today's JSONL file with at least one entry
Check memory/diary/ has today's markdown file with readable content
Send /checkin — get a response (not an error)

Feature Test (5 min)

RUN_COMPLETE memory entry has correct project name and outcome
DISCUSS_COMPLETE memory entry captures discussion conclusions
/checkin response mentions the correct tasks completed today
/checkin <project> only mentions events for that project
Memory entry includes decisions and blockers from the run

Regression Test (2 min)

simple_runner heartbeat still running after memory extraction
ACK delivery timing unchanged (no noticeable delay from async extraction)
reply_handler still handles /run and /discuss commands normally
Gateway logs show no errors from memory indexing
Existing Molty conversations work — no regression from new memory tools

10. Implementation Order

Each step is a separate /run task:

Add memory extraction to simple_runner.js — After RUN_COMPLETE/DISCUSS_COMPLETE, collect bundle, call Gemini Flash, write to events JSONL + diary markdown
Add /checkin command to reply_handler.js — Parse command, read diary, call Gemini Flash for summary, send response to thread
Index memories into clawdbot semantic store — After writing diary entry, also write to clawdbot-compatible memory location for semantic search
Test with real runs — Execute test tasks, verify full pipeline end-to-end

