@adambkovacs · Created February 4, 2026 22:55

How Claude Remembers Things: A Memory System Explained

Think of this like explaining how a really good assistant keeps notes across hundreds of conversations - so they never forget what you told them last week.

Last Updated: 2026-02-04


The Problem: AI Amnesia

Every time you start a new conversation with an AI, it's like meeting someone with total amnesia. They have no idea what you discussed yesterday, what you're working on, or what mistakes they made before.

This system solves that. It gives the AI a persistent memory that:

  • Remembers past conversations and learnings
  • Finds relevant past experiences when you ask new questions
  • Gets smarter over time by learning from mistakes

The Big Picture: Three Types of Memory

Think of it like how your brain works:

```mermaid
flowchart TD
    subgraph HOT["Fast Memory (like working memory)"]
        H1["Recent stuff - last few hours"]
        H2["Instantly accessible"]
        H3["Small but fast"]
    end

    subgraph WARM["Medium Memory (like short-term memory)"]
        W1["Last few weeks of work"]
        W2["Searchable by meaning"]
        W3["344,000+ memories stored"]
    end

    subgraph COLD["Deep Memory (like long-term memory)"]
        C1["Everything ever learned"]
        C2["Takes longer to search"]
        C3["Most accurate when needed"]
    end

    QUERY["Your question"] --> HOT
    HOT -->|"not found"| WARM
    WARM -->|"need more accuracy"| COLD
    COLD --> ANSWER["Best answer"]
```

What This Means in Practice

| When you ask... | The system checks... | Speed |
| --- | --- | --- |
| "What did we just do?" | Fast Memory | Instant |
| "How did we fix that bug last week?" | Medium Memory | 1-2 seconds |
| "Find the best approach based on all past work" | Deep Memory + Reranking | 3-5 seconds |
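The fall-through order in the table can be sketched as a tiny tiered lookup. The tier names and the hot-to-cold order come from the article; the dictionaries and their contents here are purely illustrative stand-ins for the real stores.

```python
# Illustrative three-tier lookup: check the fastest store first,
# fall through to slower but larger stores on a miss.
hot = {"last action": "refactored the login handler"}          # recent, instant
warm = {"bug fix last week": "pinned urllib3 to 1.26"}         # weeks of work
cold = {"auth approach 2024": "session cookies, later JWT"}    # everything

def lookup(query: str) -> str:
    # 1. Fast memory: very recent items
    if query in hot:
        return hot[query]
    # 2. Medium memory: consulted only when the hot tier misses
    if query in warm:
        return warm[query]
    # 3. Deep memory: slowest, consulted last
    return cold.get(query, "not found")

print(lookup("last action"))        # served from the hot tier
print(lookup("bug fix last week"))  # falls through to the warm tier
```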

How Searching Works: Finding a Needle in a Haystack

The Two Ways to Search

Imagine you're searching through thousands of notes:

Method 1: Exact Word Match (BM25)

  • Like Ctrl+F on steroids
  • Great for: Error codes, function names, specific terms
  • Example: Searching "error 0x8007001F" finds exactly that

Method 2: Meaning-Based Search (Dense/Semantic)

  • Understands what you MEAN, not just what you typed
  • Great for: "How do I log in?" finds docs about "authentication"
  • Example: "spawn agents" also finds "create workers" and "launch processes"

This System Uses BOTH - runs them simultaneously, then combines results.

The Magic of Hybrid Search

```mermaid
flowchart LR
    Q["Your search query"]
    Q --> BM25["Exact Match Search"]
    Q --> DENSE["Meaning Search"]
    BM25 --> COMBINE["Combine Results<br/>(RRF algorithm)"]
    DENSE --> COMBINE
    COMBINE --> RERANK["AI Re-scorer<br/>picks the best"]
    RERANK --> RESULTS["Top 10 Results"]
```

RRF (Reciprocal Rank Fusion) - Fancy name for: "Take the top results from both searches and smartly merge them so you get the best of both worlds."
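RRF is simple enough to show in a few lines: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is the value commonly used in the RRF literature; the document IDs below are made up for the example.

```python
# Reciprocal Rank Fusion: merge two ranked lists by summing
# 1 / (k + rank) for each document across the lists.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_results  = ["err-0x8007001F", "auth-guide", "jwt-notes"]   # exact match
dense_results = ["auth-guide", "login-howto", "jwt-notes"]      # meaning match

print(rrf([bm25_results, dense_results]))
```

Note how "auth-guide" wins: it is not #1 in either list, but appearing near the top of both beats topping just one.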


The Memory Layers Explained

Layer 1: Local Database (AgentDB)

What it is: A small database file on your computer

Analogy: Like a personal journal that's always with you

Stores:

  • Every task completed (with success/failure notes)
  • Learnings from mistakes
  • Patterns noticed over time

Why it matters: Instant access, no internet needed, your data stays private

Layer 2: Vector Database (Qdrant)

What it is: A cloud service that stores memories as mathematical representations

Analogy: Like a librarian who understands meaning, not just keywords

The Technical Bit (Simplified):

  • Every piece of text gets converted to a list of 768 numbers
  • These numbers capture the "meaning" of the text
  • Similar meanings = similar numbers = found together in searches

Example:

"How to authenticate users"
   becomes → [0.23, -0.45, 0.12, ... 768 numbers ...]

"Login system implementation"
   becomes → [0.21, -0.43, 0.14, ... similar numbers! ...]

Because the numbers are similar, searching for one finds the other.
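"Similar numbers" is measured with cosine similarity: compare the direction of two vectors. A minimal sketch, using 4-number vectors as stand-ins for the real 768-number embeddings (the values are invented for illustration):

```python
import math

# Cosine similarity: 1.0 means same direction (same meaning),
# values near or below 0 mean unrelated.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

auth_docs = [0.23, -0.45, 0.12, 0.80]   # "How to authenticate users"
login_sys = [0.21, -0.43, 0.14, 0.78]   # "Login system implementation"
cooking   = [-0.70, 0.50, 0.40, -0.10]  # an unrelated topic

print(round(cosine(auth_docs, login_sys), 3))  # close to 1.0
print(round(cosine(auth_docs, cooking), 3))    # far lower
```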

Layer 3: Knowledge Base (Cortex)

What it is: A full note-taking system (like Notion or Obsidian)

Analogy: Like a personal wiki with everything organized

Stores:

  • Documentation and how-to guides
  • Important decisions and why they were made
  • Long-form learnings and reflections

When Things Get Saved (Automatically)

The system saves memories automatically at key moments:

```mermaid
flowchart TD
    START["Session Starts"] -->|"Load memories"| WORK["You work with the AI"]
    WORK -->|"Every 5 min or 25 actions"| CHECKPOINT["Quick checkpoint saved"]
    CHECKPOINT --> WORK
    WORK --> STOP["Session Ends"]
    STOP -->|"Full save"| SAVE["Save everything:<br/>- What was done<br/>- What was learned<br/>- Connections to past work"]
```

The Lazy Scheduler

What it is: A system that runs maintenance tasks automatically

Analogy: Like a house that cleans itself when you're not looking

How it works:

  1. When you start a session, it checks: "Has daily cleanup run? Has weekly deduplication run?"
  2. If something is overdue, it runs in the background
  3. You never have to think about it
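The "check on session start, run if overdue" pattern can be sketched in a few lines. The task names match the article; the intervals, the clock values, and the function shape are illustrative assumptions, not the real implementation.

```python
DAY = 24 * 3600  # seconds

# Lazy scheduling: on session start, run whichever maintenance
# tasks have gone longer than their interval since last run.
def run_overdue(last_run: dict[str, float], now: float) -> list[str]:
    intervals = {"daily cleanup": DAY, "weekly deduplication": 7 * DAY}
    ran = []
    for task, interval in intervals.items():
        if now - last_run.get(task, 0.0) >= interval:
            ran.append(task)        # in the real system: kicked off in the background
            last_run[task] = now    # record the new run time
    return ran

now = 1_000_000.0  # a fake "current time" for the example
print(run_overdue({"daily cleanup": now - 2 * DAY,          # overdue
                   "weekly deduplication": now - 3 * DAY},  # not yet due
                  now))
```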

Making Connections: The Context Mesh

Beyond just storing memories, the system connects them:

```mermaid
flowchart TD
    E1["Episode: Set up auth"] -->|"led to"| E2["Episode: Add JWT tokens"]
    E1 -->|"mentions"| C1["Concept: authentication"]
    E2 -->|"mentions"| C2["Concept: JWT"]
    C1 <-->|"related"| C2
    E3["Episode: First auth attempt<br/>(3 months ago)"] -->|"evolved into"| E1
```

What this enables:

  • "What led to this decision?" - Follow the chain backwards
  • "What's related to authentication?" - Find all connected topics
  • "How has our approach evolved?" - See the history

The Embedding Models (The Brains Behind Search)

What are embeddings?

Think of it like this: You can't search a library by "feel" or "meaning" - books are just physical objects. But if you could convert every book into a special code that captures its meaning, then books about similar topics would have similar codes.

That's exactly what embedding models do with text.

The Models Used:

| Purpose | Model | Why This One |
| --- | --- | --- |
| Fast local search | MiniLM (small) | Speed over quality for recent items |
| Main search | Gemini (medium) | Best accuracy vs. speed tradeoff |
| Images + text | Qwen3-VL (multimodal) | Can understand screenshots and diagrams |

Multimodal = Can handle multiple types of content (text, images, maybe video)


Technical Terms Decoded

| Term | Plain English |
| --- | --- |
| Vector | A list of numbers representing meaning |
| Embedding | The process of converting text to vectors |
| Semantic Search | Finding by meaning, not exact words |
| BM25 | A formula for scoring exact word matches |
| HNSW | A clever way to organize vectors so search is fast |
| Cosine Similarity | Measuring how similar two vectors are (1.0 = identical, 0 = unrelated) |
| Reranking | Having a smarter AI re-score results for accuracy |
| Context Mesh | A web of connections between memories |

HNSW: Why Search is Fast

The Problem: With 344,000+ memories, checking each one would take forever.

The Solution: HNSW (Hierarchical Navigable Small World)

Analogy: Imagine finding a friend in a city of millions:

  1. Without HNSW: Knock on every door (slow!)
  2. With HNSW:
    • Start at a "hub person" who knows many people
    • Ask "Do you know anyone like [description]?"
    • They point you to someone closer to your target
    • Repeat until you find them

The "hierarchy" means there are different levels - city-wide connectors, neighborhood connectors, block connectors - so you can zoom in quickly.

Result: Instead of checking 344,000 items, you check maybe 100-200 and find equally good results.
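The "ask someone closer" walk can be shown on a toy graph. Real HNSW uses multiple layers and high-dimensional vectors; here each item is a single number, the neighbour lists are hand-picked, and only one layer is shown - a sketch of the idea, not the algorithm in full.

```python
# Toy single-layer greedy search: hop to whichever neighbour is
# closer to the query, stop when no neighbour improves.
points = {0: 5.0, 1: 20.0, 2: 42.0, 3: 60.0, 4: 88.0}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def greedy_search(query: float, entry: int = 0) -> int:
    current = entry
    while True:
        best = min(neighbours[current] + [current],
                   key=lambda i: abs(points[i] - query))
        if best == current:      # no neighbour is closer: done
            return current
        current = best           # hop toward the query

print(greedy_search(58.0))  # walks 0 -> 1 -> 2 -> 3 and stops near 60
```

Only the nodes along the path are ever examined, which is why a few hundred checks can stand in for hundreds of thousands.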


Reranking: The Quality Check

What it is: A second, smarter AI that re-evaluates search results

Why bother? The initial search is fast but not always perfectly ordered. The reranker:

  1. Looks at each result more carefully
  2. Considers the full context of your question
  3. Reorders them so the best answer is truly #1

Analogy:

  • First search = Quickly grabbing 20 potentially relevant books
  • Reranking = Actually reading the intros to put them in order of usefulness
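The two-stage pattern looks like this in code. The "careful" scorer below is just word overlap with the query - a deliberately crude stand-in for the real reranking model, used only to show the retrieve-then-reorder shape.

```python
# Stage 2 of search: re-score a small candidate set more carefully
# and return the best few in a better order.
def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    q_words = set(query.lower().split())
    def careful_score(doc: str) -> int:
        # the "expensive" step: examine each candidate in full
        return len(q_words & set(doc.lower().split()))
    return sorted(candidates, key=careful_score, reverse=True)[:top_k]

candidates = [                      # the fast first pass grabbed these
    "notes on docker networking",
    "how we fix the login bug",
    "login bug fix for the mobile app",
]
print(rerank("fix login bug", candidates))
```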

Time Decay: Old Memories Fade (Intelligently)

Not all old information is outdated. The system is smart about this:

| Type of Memory | Decay |
| --- | --- |
| Universal truths ("2+2=4") | Never fades |
| Recent context ("we decided X yesterday") | Stays strong |
| Old context ("3 months ago we were exploring Y") | Gradually fades unless reinforced |

How it works: Each memory has a "recency score" that slightly lowers its ranking over time - unless it keeps being accessed or is flagged as timeless.
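One common way to implement such a score is exponential decay with a half-life, skipped for memories flagged as timeless. The 30-day half-life and the exact formula are illustrative assumptions, not the system's real parameters.

```python
HALF_LIFE_DAYS = 30.0  # illustrative choice: score halves every 30 days

def recency_score(age_days: float, timeless: bool = False) -> float:
    if timeless:
        return 1.0                       # universal truths never fade
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

print(recency_score(0))                  # brand new: full strength
print(recency_score(30))                 # one half-life later: half strength
print(recency_score(90, timeless=True))  # flagged timeless: full strength
```

Re-accessing a memory would reset (or boost) its age, which is how "reinforced" memories stay strong.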


The Reflection Loop: Getting Smarter Over Time

The system doesn't just store memories - it learns from them:

```mermaid
flowchart TD
    A["Something happens"] --> B["Record what happened"]
    B --> C["Look for patterns<br/>(Same mistake 3+ times?)"]
    C -->|"Pattern found"| D["Research best practices"]
    D --> E["Suggest a fix"]
    E --> F["Implement the fix"]
    F --> G["Track if it worked"]
    G -->|"Didn't work"| C
    G -->|"Worked!"| H["Store as lesson learned"]
```
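The "same mistake 3+ times?" check at the heart of the loop can be sketched as a simple counter over the event log. The threshold of 3 follows the diagram; the log format and the `FAIL:` prefix are illustrative assumptions.

```python
from collections import Counter

# Pattern detection: which failures have recurred often enough
# to trigger the research-and-fix part of the loop?
def recurring_mistakes(events: list[str], threshold: int = 3) -> list[str]:
    counts = Counter(e for e in events if e.startswith("FAIL:"))
    return [mistake for mistake, n in counts.items() if n >= threshold]

log = [
    "OK: tests passed",
    "FAIL: forgot to run migrations",
    "FAIL: forgot to run migrations",
    "OK: deploy succeeded",
    "FAIL: forgot to run migrations",
]
print(recurring_mistakes(log))  # this pattern would now trigger a fix
```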

Summary: Why This Matters

| Without This System | With This System |
| --- | --- |
| Every conversation starts fresh | Builds on all past work |
| Same mistakes repeated | Learns from failures |
| Can't find past solutions | Instant recall of relevant history |
| "What did we discuss?" = lost | Full searchable memory |
| Generic responses | Personalized to your context |

Quick Numbers

  • 344,000+ memories stored and searchable
  • 985 episodes of past work tracked
  • 1,654 knowledge base documents
  • <1 second for most searches
  • Local-first - works offline, your data stays yours

This is the same system described in the technical documentation, just explained for humans who want to understand without becoming database engineers.
