Think of this as an explanation of how a really good assistant keeps notes across hundreds of conversations - so they never forget what you told them last week.
Last Updated: 2026-02-04
Every time you start a new conversation with an AI, it's like meeting someone with total amnesia. They have no idea what you discussed yesterday, what you're working on, or what mistakes they made before.
This system solves that. It gives the AI a persistent memory that:
- Remembers past conversations and learnings
- Finds relevant past experiences when you ask new questions
- Gets smarter over time by learning from mistakes
Think of it like how your brain works:
```mermaid
flowchart TD
    subgraph HOT["Fast Memory (like working memory)"]
        H1["Recent stuff - last few hours"]
        H2["Instantly accessible"]
        H3["Small but fast"]
    end
    subgraph WARM["Medium Memory (like short-term memory)"]
        W1["Last few weeks of work"]
        W2["Searchable by meaning"]
        W3["344,000+ memories stored"]
    end
    subgraph COLD["Deep Memory (like long-term memory)"]
        C1["Everything ever learned"]
        C2["Takes longer to search"]
        C3["Most accurate when needed"]
    end
    QUERY["Your question"] --> HOT
    HOT -->|"not found"| WARM
    WARM -->|"need more accuracy"| COLD
    COLD --> ANSWER["Best answer"]
```
| When you ask... | The system checks... | Speed |
|---|---|---|
| "What did we just do?" | Fast Memory | Instant |
| "How did we fix that bug last week?" | Medium Memory | 1-2 seconds |
| "Find the best approach based on all past work" | Deep Memory + Reranking | 3-5 seconds |
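The table above boils down to "try the fast tier first, fall back to slower but larger tiers." A minimal sketch of that idea - the tier interfaces, the score threshold, and all names here are illustrative assumptions, not the system's real API:

```python
# Hypothetical sketch of the tiered lookup; each tier is just a function
# that returns (score, memory) pairs for a query.

def tiered_search(query, tiers, min_score=0.8):
    """Try each memory tier in order; return the first good-enough hit."""
    for search_tier in tiers:
        hits = [hit for hit in search_tier(query) if hit[0] >= min_score]
        if hits:
            return max(hits)  # best (score, memory) pair from this tier
    return None  # nothing relevant anywhere

# Toy tiers: fast memory misses, medium memory has the answer, so the
# (slower) deep tier is never consulted.
hot = lambda q: []
warm = lambda q: [(0.9, "fixed the 0x8007001F bug by clearing the cache")]
cold = lambda q: [(0.95, "full history of cache bugs")]

result = tiered_search("how did we fix that bug", [hot, warm, cold])
```

Note that the deep tier's higher-scoring memory is never seen: the search stops at the first tier that answers well enough, which is exactly the speed/accuracy tradeoff in the table.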
Imagine you're searching through thousands of notes:
Method 1: Exact Word Match (BM25)
- Like Ctrl+F on steroids
- Great for: Error codes, function names, specific terms
- Example: Searching "error 0x8007001F" finds exactly that
Method 2: Meaning-Based Search (Dense/Semantic)
- Understands what you MEAN, not just what you typed
- Great for: "How do I log in?" finds docs about "authentication"
- Example: "spawn agents" also finds "create workers" and "launch processes"
This System Uses BOTH - runs them simultaneously, then combines results.
```mermaid
flowchart LR
    Q["Your search query"]
    Q --> BM25["Exact Match Search"]
    Q --> DENSE["Meaning Search"]
    BM25 --> COMBINE["Combine Results<br/>(RRF algorithm)"]
    DENSE --> COMBINE
    COMBINE --> RERANK["AI Re-scorer<br/>picks the best"]
    RERANK --> RESULTS["Top 10 Results"]
```
RRF (Reciprocal Rank Fusion) - Fancy name for: "Take the top results from both searches and smartly merge them so you get the best of both worlds."
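RRF is simple enough to sketch in a few lines. Each search list "votes" for a result with weight 1/(k + rank), and the votes are summed; k=60 is the commonly used constant, not necessarily what this system uses:

```python
# Minimal sketch of Reciprocal Rank Fusion over any number of ranked lists.

def rrf_merge(rankings, k=60):
    """Merge ranked lists: each list contributes 1/(k + rank) per result."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A result ranked well by BOTH searches beats one that tops only one list:
bm25_results = ["err-doc", "auth-doc", "jwt-doc"]    # exact-word ranking
dense_results = ["auth-doc", "jwt-doc", "misc-doc"]  # meaning-based ranking
merged = rrf_merge([bm25_results, dense_results])
# "auth-doc" wins: it is near the top of both lists.
```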
What it is: A small database file on your computer
Analogy: Like a personal journal that's always with you
Stores:
- Every task completed (with success/failure notes)
- Learnings from mistakes
- Patterns noticed over time
Why it matters: Instant access, no internet needed, your data stays private
What it is: A cloud service that stores memories as mathematical representations
Analogy: Like a librarian who understands meaning, not just keywords
The Technical Bit (Simplified):
- Every piece of text gets converted to a list of 768 numbers
- These numbers capture the "meaning" of the text
- Similar meanings = similar numbers = found together in searches
Example:
"How to authenticate users"
becomes → [0.23, -0.45, 0.12, ... 768 numbers ...]
"Login system implementation"
becomes → [0.21, -0.43, 0.14, ... similar numbers! ...]
Because the numbers are similar, searching for one finds the other.
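"Similar numbers" is measured with cosine similarity (defined in the glossary below). A toy sketch with 3-number vectors instead of 768 - the vector values here are made up for illustration:

```python
import math

def cosine_similarity(a, b):
    """How aligned two vectors are: near 1.0 = same meaning, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" (real ones have 768 numbers each):
auth = [0.23, -0.45, 0.12]     # "How to authenticate users"
login = [0.21, -0.43, 0.14]    # "Login system implementation"
weather = [-0.50, 0.48, 0.02]  # something unrelated

# auth vs. login scores near 1.0, so searching for one finds the other;
# auth vs. weather does not.
```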
What it is: A full note-taking system (like Notion or Obsidian)
Analogy: Like a personal wiki with everything organized
Stores:
- Documentation and how-to guides
- Important decisions and why they were made
- Long-form learnings and reflections
The system saves memories automatically at key moments:
```mermaid
flowchart TD
    START["Session Starts"] -->|"Load memories"| WORK["You work with the AI"]
    WORK -->|"Every 5 min or 25 actions"| CHECKPOINT["Quick checkpoint saved"]
    CHECKPOINT --> WORK
    WORK --> STOP["Session Ends"]
    STOP -->|"Full save"| SAVE["Save everything:<br/>- What was done<br/>- What was learned<br/>- Connections to past work"]
```
What it is: A system that runs maintenance tasks automatically
Analogy: Like a house that cleans itself when you're not looking
How it works:
- When you start a session, it checks: "Has daily cleanup run? Has weekly deduplication run?"
- If something is overdue, it runs in the background
- You never have to think about it
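The "is anything overdue?" check at session start is just a comparison of timestamps against intervals. A sketch under assumed task names and intervals - the real schedule may differ:

```python
from datetime import datetime, timedelta

# Hypothetical schedule; the task names and intervals are illustrative.
SCHEDULE = {
    "daily_cleanup": timedelta(days=1),
    "weekly_dedup": timedelta(weeks=1),
}

def overdue_tasks(last_run, now=None):
    """Return the maintenance tasks whose interval has elapsed since they last ran."""
    now = now or datetime.now()
    return [name for name, interval in SCHEDULE.items()
            if now - last_run.get(name, datetime.min) >= interval]

# Example: cleanup ran 2 hours ago (fine), dedup ran 8 days ago (overdue).
now = datetime(2026, 2, 4, 12, 0)
last_run = {
    "daily_cleanup": now - timedelta(hours=2),
    "weekly_dedup": now - timedelta(days=8),
}
to_run = overdue_tasks(last_run, now)
```

Anything returned by `overdue_tasks` would then be kicked off in the background, which is why you never have to think about it.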
Beyond just storing memories, the system connects them:
```mermaid
flowchart TD
    E1["Episode: Set up auth"] -->|"led to"| E2["Episode: Add JWT tokens"]
    E1 -->|"mentions"| C1["Concept: authentication"]
    E2 -->|"mentions"| C2["Concept: JWT"]
    C1 <-->|"related"| C2
    E3["Episode: First auth attempt<br/>(3 months ago)"] -->|"evolved into"| E1
```
What this enables:
- "What led to this decision?" - Follow the chain backwards
- "What's related to authentication?" - Find all connected topics
- "How has our approach evolved?" - See the history
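"Follow the chain backwards" is a graph walk over labeled edges. A toy sketch - the node names come from the diagram above, but storing edges as plain tuples and the helper name are inventions for illustration:

```python
# Hypothetical edge store: (source, relation, destination) tuples.
edges = [
    ("first_auth_attempt", "evolved into", "setup_auth"),
    ("setup_auth", "led to", "add_jwt"),
    ("setup_auth", "mentions", "authentication"),
    ("add_jwt", "mentions", "jwt"),
]

def what_led_to(node):
    """Follow incoming 'led to' / 'evolved into' edges back to the start."""
    chain = []
    while True:
        parents = [src for src, rel, dst in edges
                   if dst == node and rel in ("led to", "evolved into")]
        if not parents:
            return chain
        node = parents[0]
        chain.append(node)

# Walking back from "add_jwt" recovers the history:
history = what_led_to("add_jwt")  # setup_auth, then first_auth_attempt
```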
What are embeddings?
Think of it like this: You can't search a library by "feel" or "meaning" - books are just physical objects. But if you could convert every book into a special code that captures its meaning, then books about similar topics would have similar codes.
That's exactly what embedding models do with text.
The Models Used:
| Purpose | Model | Why This One |
|---|---|---|
| Fast local search | MiniLM (small) | Speed over quality for recent items |
| Main search | Gemini (medium) | Best accuracy vs. speed tradeoff |
| Images + Text | Qwen3-VL (multimodal) | Can understand screenshots and diagrams |
Multimodal = Can handle multiple types of content (text, images, maybe video)
| Term | Plain English |
|---|---|
| Vector | A list of numbers representing meaning |
| Embedding | The vector produced from a piece of text (also the process of producing it) |
| Semantic Search | Finding by meaning, not exact words |
| BM25 | A formula for scoring exact word matches |
| HNSW | A clever way to organize vectors so search is fast |
| Cosine Similarity | Measuring how similar two vectors are (1.0 = identical meaning, 0 = unrelated, negative = opposing) |
| Reranking | Having a smarter AI re-score results for accuracy |
| Context Mesh | A web of connections between memories |
The Problem: With 344,000+ memories, checking each one would take forever.
The Solution: HNSW (Hierarchical Navigable Small World)
Analogy: Imagine finding a friend in a city of millions:
- Without HNSW: Knock on every door (slow!)
- With HNSW:
- Start at a "hub person" who knows many people
- Ask "Do you know anyone like [description]?"
- They point you to someone closer to your target
- Repeat until you find them
The "hierarchy" means there are different levels - city-wide connectors, neighborhood connectors, block connectors - so you can zoom in quickly.
Result: Instead of checking 344,000 items, you check maybe 100-200 and find equally good results.
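The core move in that analogy - "ask your current contact who's closer, hop there, repeat" - is a greedy graph walk. A deliberately simplified single-layer sketch (real HNSW stacks several such layers and keeps a candidate list; this only shows the hopping idea):

```python
def greedy_search(graph, dist, start, query):
    """Hop to whichever neighbor is closer to the query; stop when no
    neighbor improves. One layer of the multi-layer HNSW idea."""
    current = start
    while True:
        best = min(graph[current], key=lambda n: dist(n, query), default=current)
        if dist(best, query) >= dist(current, query):
            return current  # local best - no neighbor is closer
        current = best

# Toy example: nodes are numbers on a line, neighbors are adjacent hubs.
graph = {0: [10], 10: [0, 20], 20: [10, 30], 30: [20]}
dist = lambda a, b: abs(a - b)
nearest = greedy_search(graph, dist, start=0, query=27)
# Only 4 nodes were ever looked at, yet we land on 30, the closest to 27.
```

This is why the real system touches maybe 100-200 of its 344,000 items: each hop rules out everything in the wrong direction.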
What it is: A second, smarter AI that re-evaluates search results
Why bother? The initial search is fast but not always perfectly ordered. The reranker:
- Looks at each result more carefully
- Considers the full context of your question
- Reorders them so the best answer is truly #1
Analogy:
- First search = Quickly grabbing 20 potentially relevant books
- Reranking = Actually reading the intros to put them in order of usefulness
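Structurally, the rerank step is just "re-score a small candidate set with a slower, smarter scorer, then reorder." In this sketch, `careful_score` stands in for the real reranking model; the toy word-overlap scorer is purely for illustration:

```python
def rerank(query, candidates, careful_score, top_k=10):
    """Re-score each fast-search candidate carefully, then reorder."""
    return sorted(candidates, key=lambda doc: careful_score(query, doc),
                  reverse=True)[:top_k]

def word_overlap(query, doc):
    """Toy stand-in for the AI re-scorer: count shared words."""
    return len(set(query.split()) & set(doc.split()))

candidates = ["weather report", "fix auth bug", "auth bug in login flow"]
best_first = rerank("auth bug", candidates, word_overlap)
# The two auth-related results move ahead of the irrelevant one.
```

The expensive scorer is only ever run on the 20-or-so candidates the fast search returned, which is why the whole pipeline stays in the 3-5 second range.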
Not all old information is outdated. The system is smart about this:
| Type of Memory | Decay |
|---|---|
| Universal truths ("2+2=4") | Never fades |
| Recent context ("we decided X yesterday") | Stays strong |
| Old context ("3 months ago we were exploring Y") | Gradually fades unless reinforced |
How it works: Each memory has a "recency score" that slightly lowers its ranking over time - unless it keeps being accessed or is flagged as timeless.
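A common way to implement such a recency score is exponential decay with an escape hatch for timeless facts. A sketch - the 30-day half-life is an assumed value, not necessarily what this system uses:

```python
def recency_score(age_days, half_life_days=30.0, timeless=False):
    """Score halves every `half_life_days`, unless flagged timeless.
    Re-accessing a memory would reset age_days to 0, 'reinforcing' it."""
    if timeless:
        return 1.0  # universal truths never fade
    return 0.5 ** (age_days / half_life_days)

# Yesterday's decision stays strong; a 3-month-old exploration has faded:
fresh = recency_score(1)            # ~0.98
stale = recency_score(90)           # 0.125
truth = recency_score(90, timeless=True)  # 1.0
```

This score would then be blended into the search ranking, so the half-life controls how aggressively old context sinks.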
The system doesn't just store memories - it learns from them:
```mermaid
flowchart TD
    A["Something happens"] --> B["Record what happened"]
    B --> C["Look for patterns<br/>(Same mistake 3+ times?)"]
    C -->|"Pattern found"| D["Research best practices"]
    D --> E["Suggest a fix"]
    E --> F["Implement the fix"]
    F --> G["Track if it worked"]
    G -->|"Didn't work"| C
    G -->|"Worked!"| H["Store as lesson learned"]
```
| Without This System | With This System |
|---|---|
| Every conversation starts fresh | Builds on all past work |
| Same mistakes repeated | Learns from failures |
| Can't find past solutions | Instant recall of relevant history |
| "What did we discuss?" = lost | Full searchable memory |
| Generic responses | Personalized to your context |
- 344,000+ memories stored and searchable
- 985 episodes of past work tracked
- 1,654 knowledge base documents
- <1 second for most searches
- Local-first - works offline, your data stays yours
This is the same system described in the technical documentation, just explained for humans who want to understand without becoming database engineers.