Mem0 Architecture Documentation

Table of Contents

  1. Overview
  2. High-Level Architecture
  3. Core Components
  4. Data Flow
  5. Storage Architecture
  6. Memory Levels & Scoping
  7. Configuration System
  8. API Layers
  9. Deployment Options
  10. Performance Considerations

Overview

Mem0 is an intelligent memory layer for AI applications that provides personalized, persistent memory capabilities. It's designed as a modular, extensible system that can integrate with various LLMs, embedding models, and storage backends.

Key Features

  • Multi-level Memory Scoping: User, Agent, and Session level memories
  • Semantic Search: Vector-based memory retrieval
  • Change Tracking: Complete audit trail of memory operations
  • Graph Relationships: Entity relationship mapping (optional)
  • Multi-provider Support: 20+ LLM providers, 10+ vector stores, 14+ embedding models

High-Level Architecture

graph TB
    User["User Application"]
    
    subgraph CoreAPI["Mem0 Core API"]
        Memory["Memory Class<br/>(Sync API)"]
        AsyncMemory["AsyncMemory<br/>(Async API)"]
        Client["MemoryClient<br/>(Remote API)"]
    end
    
    Config["MemoryConfig<br/>Central Configuration"]
    
    subgraph Processing["AI Processing"]
        LLM["LLM Layer<br/>(20+ providers)"]
        Embeddings["Embedding Layer<br/>(14+ providers)"]
    end
    
    subgraph Storage["Storage Systems"]
        VectorDB["Vector Store<br/>(10+ providers)"]
        HistoryDB["History Database<br/>(SQLite)"]
        GraphDB["Graph Store<br/>(Neo4j/Memgraph)"]
    end
    
    Platform["Mem0 Platform<br/>(Hosted Service)"]
    
    User --> Memory
    User --> AsyncMemory
    User --> Client
    
    Memory --> Config
    AsyncMemory --> Config
    Client --> Platform
    
    Memory --> LLM
    Memory --> Embeddings
    AsyncMemory --> LLM
    AsyncMemory --> Embeddings
    
    Memory --> VectorDB
    Memory --> HistoryDB
    Memory --> GraphDB
    AsyncMemory --> VectorDB
    AsyncMemory --> HistoryDB
    AsyncMemory --> GraphDB
    
    Config --> LLM
    Config --> Embeddings
    Config --> VectorDB
    Config --> HistoryDB
    Config --> GraphDB

Core Components

1. Memory Engine

The central component that orchestrates all memory operations.

Key Classes:

  • Memory: Synchronous memory operations
  • AsyncMemory: Asynchronous memory operations
  • MemoryClient: Remote API client

Core Operations:

  • add(): Store new memories from conversations
  • get(): Retrieve specific memory by ID
  • get_all(): List memories with filtering
  • search(): Semantic search across memories
  • update(): Modify existing memories
  • delete(): Remove memories
  • history(): Get change history for a memory
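
A minimal sketch of these operations against a local Memory instance (the shape of the returned entries, including the "id" field, is assumed here for illustration):

from mem0 import Memory

memory = Memory()

# Store new memories extracted from a conversation
memory.add("I prefer morning meetings", user_id="alice")

# List, search, and inspect what was stored
all_memories = memory.get_all(user_id="alice")
hits = memory.search("when does alice like to meet?", user_id="alice")

# Assuming each returned entry exposes an "id" field
memory_id = all_memories["results"][0]["id"]
memory.update(memory_id, "I prefer early-morning meetings")
print(memory.history(memory_id))   # audit trail of ADD/UPDATE events
memory.delete(memory_id)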

2. LLM Integration Layer

Supports 20+ LLM providers for fact extraction and memory processing.

Supported Providers:

  • OpenAI (GPT-3.5, GPT-4, GPT-4o)
  • Anthropic (Claude 3.5)
  • Google (Gemini Pro)
  • AWS Bedrock
  • Azure OpenAI
  • Groq, Together, Ollama, and more

Key Functions:

  • Fact extraction from conversations
  • Memory update decision making
  • Structured output generation
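
To make the role of this layer concrete, here is a rough, hypothetical sketch of fact extraction through one provider (OpenAI); the prompt and helper name are illustrative and are not Mem0's internal implementation:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_facts(conversation: str) -> list[str]:
    """Ask the LLM to distill short, standalone candidate facts from a conversation."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,
        messages=[
            {"role": "system",
             "content": "Extract short, standalone facts about the user. "
                        "Return one fact per line."},
            {"role": "user", "content": conversation},
        ],
    )
    text = response.choices[0].message.content or ""
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]

facts = extract_facts("User: I moved to Berlin last month and I'm vegetarian.")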

3. Embedding Layer

Converts text to vector representations for semantic search.

Supported Providers:

  • OpenAI (text-embedding-ada-002, text-embedding-3-small/large)
  • HuggingFace transformers
  • Google Vertex AI
  • Together AI
  • AWS Bedrock
  • And more

4. Vector Storage Layer

Stores memory embeddings and metadata for fast semantic retrieval.

Supported Vector Stores:

  • Local: Qdrant, Chroma, FAISS
  • Cloud: Pinecone, Weaviate, MongoDB Atlas
  • Enterprise: Azure AI Search, Elasticsearch

5. History Database

SQLite database that tracks all memory operations for audit trails.

Schema:

CREATE TABLE history (
    id           TEXT PRIMARY KEY,
    memory_id    TEXT,
    old_memory   TEXT,
    new_memory   TEXT,
    event        TEXT,          -- 'ADD', 'UPDATE', 'DELETE'
    created_at   DATETIME,
    updated_at   DATETIME,
    is_deleted   INTEGER,
    actor_id     TEXT,
    role         TEXT
);
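
Because this is a plain SQLite file (by default at ~/.mem0/history.db), the audit trail can also be inspected directly; a minimal sketch, assuming the default path and the schema above (in application code you would normally call memory.history(memory_id) instead):

import sqlite3
from pathlib import Path

db_path = Path.home() / ".mem0" / "history.db"   # default location
conn = sqlite3.connect(db_path)

rows = conn.execute(
    """
    SELECT memory_id, event, old_memory, new_memory, created_at
    FROM history
    WHERE is_deleted = 0
    ORDER BY created_at DESC
    LIMIT 10
    """
).fetchall()

for memory_id, event, old, new, created_at in rows:
    print(f"{created_at}  {event:<6}  {memory_id}  {old!r} -> {new!r}")

conn.close()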

6. Graph Memory (Optional)

Manages entity relationships and connections between memories.

Supported Graph Stores:

  • Neo4j
  • Memgraph
  • AWS Neptune

Data Flow

Memory Addition Flow

sequenceDiagram
    participant User
    participant Memory
    participant LLM
    participant Embeddings
    participant VectorStore
    participant HistoryDB
    participant GraphStore
    
    User->>Memory: add(messages, user_id)
    Memory->>LLM: extract_facts(messages)
    LLM-->>Memory: extracted_facts[]
    
    loop For each fact
        Memory->>Embeddings: embed(fact)
        Embeddings-->>Memory: vector
        Memory->>VectorStore: search_similar(vector)
        VectorStore-->>Memory: existing_memories[]
    end
    
    Memory->>LLM: update_memory_decisions(facts, existing)
    LLM-->>Memory: actions[ADD/UPDATE/DELETE]
    
    loop For each action
        alt ADD
            Memory->>VectorStore: insert(vector, metadata)
            Memory->>HistoryDB: add_history(memory_id, null, new_memory, "ADD")
        else UPDATE
            Memory->>VectorStore: update(memory_id, vector, metadata)
            Memory->>HistoryDB: add_history(memory_id, old_memory, new_memory, "UPDATE")
        else DELETE
            Memory->>VectorStore: delete(memory_id)
            Memory->>HistoryDB: add_history(memory_id, old_memory, null, "DELETE")
        end
        
        opt Graph enabled
            Memory->>GraphStore: update_relationships(entities)
        end
    end
    
    Memory-->>User: memory_results[]
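
In Python-flavored pseudocode, the same flow looks roughly like this; the helper objects and method names mirror the diagram and are illustrative, not Mem0's internal API:

# Illustrative pseudocode only; dependencies are injected so the steps stay explicit.
def add_memory(messages, user_id, llm, embedder, vector_store, history_db, graph_store=None):
    facts = llm.extract_facts(messages)                    # 1. distill candidate facts

    actions = llm.update_memory_decisions(                 # 2. compare against existing memories
        facts,
        existing=[vector_store.search_similar(embedder.embed(f)) for f in facts],
    )

    results = []
    for action in actions:                                 # 3. apply ADD / UPDATE / DELETE
        if action.event == "ADD":
            vector_store.insert(action.vector, action.metadata)
            history_db.add_history(action.memory_id, None, action.text, "ADD")
        elif action.event == "UPDATE":
            vector_store.update(action.memory_id, action.vector, action.metadata)
            history_db.add_history(action.memory_id, action.old_text, action.text, "UPDATE")
        elif action.event == "DELETE":
            vector_store.delete(action.memory_id)
            history_db.add_history(action.memory_id, action.old_text, None, "DELETE")

        if graph_store is not None:                        # 4. optionally update relationships
            graph_store.update_relationships(action.entities)

        results.append(action)
    return results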

Memory Search Flow

sequenceDiagram
    participant User
    participant Memory
    participant Embeddings
    participant VectorStore
    participant GraphStore
    
    User->>Memory: search(query, user_id)
    Memory->>Embeddings: embed(query)
    Embeddings-->>Memory: query_vector
    
    Memory->>VectorStore: search(query_vector, filters)
    VectorStore-->>Memory: similar_memories[]
    
    opt Graph enabled
        Memory->>GraphStore: search_related_entities(query)
        GraphStore-->>Memory: entity_relationships[]
        Memory->>Memory: combine_results(memories, relationships)
    end
    
    Memory-->>User: search_results[]

Storage Architecture

Physical Storage Layout

~/.mem0/                          # Default storage directory
├── history.db                   # SQLite history database
├── qdrant/                       # Qdrant vector store (default)
│   ├── collection/              # Vector embeddings + metadata
│   │   ├── segments/           # Vector data segments
│   │   └── meta.json           # Collection metadata
│   └── wal/                    # Write-ahead log
├── migrations_qdrant/          # Migration-related vectors
└── graph_store/                # Graph database files (if enabled)

Memory Storage by Level

All memory levels are stored in the same physical storage but logically separated using metadata:

graph TD
    subgraph Physical["Physical Storage"]
        VDB[Vector Database]
        HDB[History Database]
        GDB[Graph Database]
    end
    
    subgraph Logical["Logical Separation"]
        U["User Level<br/>user_id: alice"]
        S["Session Level<br/>user_id: alice<br/>run_id: session_123"]
        A["Agent Level<br/>user_id: alice<br/>agent_id: assistant_v1"]
    end
    
    U --> VDB
    S --> VDB
    A --> VDB
    
    U --> HDB
    S --> HDB
    A --> HDB
    
    U --> GDB
    S --> GDB
    A --> GDB

Data Separation Strategy

| Level   | Scope                                  | Filter Example                             |
|---------|----------------------------------------|--------------------------------------------|
| User    | Personal preferences, long-term facts  | user_id: "alice"                           |
| Session | Conversation context, temporary state  | user_id: "alice", run_id: "chat_123"       |
| Agent   | AI assistant behaviors, workflows      | user_id: "alice", agent_id: "assistant_v1" |
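
In practice, the level is determined simply by which identifiers you pass when adding a memory; for example:

from mem0 import Memory

memory = Memory()

# User level: long-lived personal facts
memory.add("Alice prefers Italian food", user_id="alice")

# Session level: short-lived conversation context
memory.add("Currently debugging a React issue", user_id="alice", run_id="chat_123")

# Agent level: assistant-specific behavior
memory.add("Use a formal tone with this user", user_id="alice", agent_id="assistant_v1")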

Memory Levels & Scoping

Memory Hierarchy

graph LR
    subgraph Scoping["Memory Scoping"]
        U["User Level<br/>Personal Preferences<br/>Long-term Facts"]
        A["Agent Level<br/>AI Behaviors<br/>Workflows"]
        S["Session Level<br/>Conversation Context<br/>Temporary State"]
    end
    
    U -.-> A
    U -.-> S
    A -.-> S
    
    subgraph Examples["Examples"]
        U1["Likes Italian food<br/>Lives in San Francisco<br/>Prefers morning meetings"]
        A1["Use formal tone<br/>Focus on technical details<br/>Remember coding preferences"]
        S1["Current project: Web app<br/>Debugging React issue<br/>Working on API integration"]
    end
    
    U --- U1
    A --- A1
    S --- S1

Retrieval Strategy

When searching memories, the system can:

  1. Scope by single level: Only user memories
  2. Combine multiple levels: User + Agent memories
  3. Hierarchical search: Session → Agent → User (fallback)
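
A sketch of these retrieval patterns; levels are combined by passing additional identifiers, while the fallback order is something your application implements on top of the results (the "results" key follows the return shape described in the API section):

from mem0 import Memory

memory = Memory()

# 1. Single level: only user-scoped memories
user_hits = memory.search("food preferences", user_id="alice")

# 2. Multiple levels: user + agent memories for this assistant
agent_hits = memory.search("tone preferences", user_id="alice", agent_id="assistant_v1")

# 3. Hierarchical fallback: try the narrowest scope first, then widen
hits = memory.search("current project", user_id="alice", run_id="session_123")
if not hits.get("results"):
    hits = memory.search("current project", user_id="alice")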

Configuration System

Configuration Hierarchy

graph TB
    subgraph Layers["Configuration Layers"]
        DC["Default Config<br/>Built-in defaults"]
        EC["Environment Config<br/>Environment variables"]
        FC["File Config<br/>YAML/JSON files"]
        PC["Programmatic Config<br/>Python objects"]
    end
    
    DC --> EC
    EC --> FC
    FC --> PC
    
    subgraph Components["Config Components"]
        LLM["LLM Config<br/>Provider, model, API keys"]
        EMB["Embedder Config<br/>Provider, dimensions"]
        VEC["Vector Store Config<br/>Provider, connection details"]
        GRA["Graph Store Config<br/>Provider, database URL"]
        HIS["History DB Config<br/>SQLite file path"]
    end
    
    PC --> LLM
    PC --> EMB
    PC --> VEC
    PC --> GRA
    PC --> HIS

Example Configuration

from mem0 import Memory
from mem0.configs.base import MemoryConfig

config = MemoryConfig(
    llm={
        "provider": "openai",
        "config": {
            "model": "gpt-4o",
            "temperature": 0.1
        }
    },
    embedder={
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large"
        }
    },
    vector_store={
        "provider": "qdrant",
        "config": {
            "collection_name": "my_memories",
            "path": "./my_vector_db"
        }
    },
    graph_store={
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password"
        }
    }
)

memory = Memory(config)

API Layers

Three API Interfaces

graph TB
    subgraph Interfaces["API Interfaces"]
        SYNC["Synchronous API<br/>Memory class<br/>Direct method calls"]
        ASYNC["Asynchronous API<br/>AsyncMemory class<br/>Async/await support"]
        CLIENT["Client API<br/>MemoryClient class<br/>HTTP REST calls"]
    end
    
    subgraph UseCases["Use Cases"]
        SYNC_UC["Simple scripts<br/>Jupyter notebooks<br/>Blocking operations"]
        ASYNC_UC["High-performance apps<br/>Web servers<br/>Concurrent processing"]
        CLIENT_UC["Remote deployments<br/>Microservices<br/>Platform integration"]
    end
    
    SYNC --- SYNC_UC
    ASYNC --- ASYNC_UC
    CLIENT --- CLIENT_UC
    
    subgraph Backend["Backend Services"]
        LOCAL["Local Storage<br/>File system"]
        PLATFORM["Mem0 Platform<br/>Hosted service"]
    end
    
    SYNC --> LOCAL
    ASYNC --> LOCAL
    CLIENT --> PLATFORM

API Usage Examples

Synchronous API:

from mem0 import Memory

memory = Memory()
result = memory.add("I love pizza", user_id="alice")
memories = memory.get_all(user_id="alice")

Asynchronous API:

from mem0 import AsyncMemory
import asyncio

async def main():
    memory = AsyncMemory()
    result = await memory.add("I love pizza", user_id="alice")
    memories = await memory.get_all(user_id="alice")

asyncio.run(main())

Client API:

from mem0 import MemoryClient

client = MemoryClient(api_key="your-api-key")
result = client.add("I love pizza", user_id="alice")
memories = client.get_all(user_id="alice")

Deployment Options

Local Development

graph LR
    subgraph Local["Local Machine"]
        APP["Python Application"]
        MEM0["Mem0 Library"]
        VDB["Vector DB<br/>Qdrant/Chroma"]
        HDB["SQLite History"]
    end
    
    APP --> MEM0
    MEM0 --> VDB
    MEM0 --> HDB
    
    subgraph External["External APIs"]
        OPENAI["OpenAI API"]
        HF["HuggingFace"]
    end
    
    MEM0 -.-> OPENAI
    MEM0 -.-> HF

Production Self-Hosted

graph TB
    subgraph AppTier["Application Tier"]
        APP1["App Instance 1"]
        APP2["App Instance 2"]
        LB["Load Balancer"]
    end
    
    subgraph StorageTier["Storage Tier"]
        VDB["Vector Database<br/>Pinecone/Weaviate"]
        HDB["History Database<br/>PostgreSQL"]
        GDB["Graph Database<br/>Neo4j"]
    end
    
    subgraph ExtServices["External Services"]
        LLM["LLM APIs<br/>OpenAI/Anthropic"]
        EMB["Embedding APIs<br/>OpenAI/Cohere"]
    end
    
    LB --> APP1
    LB --> APP2
    
    APP1 --> VDB
    APP1 --> HDB
    APP1 --> GDB
    APP2 --> VDB
    APP2 --> HDB
    APP2 --> GDB
    
    APP1 -.-> LLM
    APP1 -.-> EMB
    APP2 -.-> LLM
    APP2 -.-> EMB

Mem0 Platform (Hosted)

graph TB
    subgraph YourApp["Your Application"]
        APP["Your App"]
        CLIENT["MemoryClient"]
    end
    
    subgraph Platform["Mem0 Platform"]
        API["REST API"]
        ENGINE["Memory Engine"]
        VDB["Managed Vector DB"]
        HDB["Managed History DB"]
        GDB["Managed Graph DB"]
    end
    
    APP --> CLIENT
    CLIENT --> API
    API --> ENGINE
    ENGINE --> VDB
    ENGINE --> HDB
    ENGINE --> GDB

Performance Considerations

Scaling Factors

| Component        | Scaling Strategy                         | Considerations                     |
|------------------|------------------------------------------|------------------------------------|
| Vector Search    | Horizontal sharding, index optimization  | Query latency, memory usage        |
| History Database | Read replicas, partitioning              | Storage growth, query performance  |
| LLM Calls        | Caching, batching, rate limiting         | API costs, response time           |
| Embeddings       | Local models, batch processing           | Inference time, model size         |

Performance Optimizations

  1. Embedding Caching: Cache embeddings to avoid recomputation
  2. Batch Operations: Process multiple memories simultaneously
  3. Async Processing: Use AsyncMemory for concurrent operations
  4. Vector Index Tuning: Optimize HNSW parameters for your use case
  5. Memory Pruning: Implement retention policies for old memories
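
A minimal sketch of the first optimization, caching embeddings so repeated text is only embedded once; the embed_fn here stands in for whatever embedding provider you have configured:

import hashlib
from typing import Callable, Dict, List

class CachedEmbedder:
    """Wraps an embedding function with an in-memory cache keyed by a hash of the text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}

    def embed(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._embed_fn(text)   # only call the provider on a cache miss
        return self._cache[key]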

Memory Usage Patterns

graph LR
    subgraph Growth["Memory Growth Over Time"]
        T1["Week 1<br/>100 memories<br/>~1MB storage"]
        T2["Month 1<br/>1K memories<br/>~10MB storage"]
        T3["Year 1<br/>50K memories<br/>~500MB storage"]
    end
    
    T1 --> T2 --> T3
    
    subgraph Optimization["Optimization Strategies"]
        COMPRESS["Vector Compression<br/>Reduce dimensions"]
        PRUNE["Memory Pruning<br/>Remove old/irrelevant"]
        ARCHIVE["Cold Storage<br/>Archive old data"]
    end
    
    T3 -.-> COMPRESS
    T3 -.-> PRUNE
    T3 -.-> ARCHIVE

Conclusion

Mem0's architecture provides a robust, scalable foundation for building AI applications with persistent memory capabilities. The modular design allows for easy customization and scaling, while the multi-level memory system enables sophisticated context management for different use cases.

Key architectural benefits:

  • Modularity: Swap providers without code changes
  • Scalability: From local development to enterprise deployment
  • Flexibility: Support for multiple memory paradigms
  • Auditability: Complete change tracking
  • Performance: Optimized for real-time AI applications

Core Memory Operations

1. add() - Create New Memories

def add(
    self,
    messages,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    infer: bool = True,
    memory_type: Optional[str] = None,
    prompt: Optional[str] = None,
)
  • Purpose: Create new memories from messages/conversations
  • Key Features: Supports fact extraction, raw storage, procedural memories
  • Returns: Dictionary with results and optional relations
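
For example, using only parameters from the signature above (the chat-style message format and the metadata keys are illustrative):

from mem0 import Memory

memory = Memory()

# Default behavior: the LLM extracts facts from the conversation
memory.add(
    [{"role": "user", "content": "I just adopted a puppy named Milo"}],
    user_id="alice",
    metadata={"source": "onboarding_chat"},
)

# infer=False skips fact extraction and stores the content as-is
memory.add("Raw note: follow up about Milo's vet appointment",
           user_id="alice", infer=False)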

2. get() - Retrieve Single Memory

def get(self, memory_id)
  • Purpose: Retrieve a specific memory by its ID
  • Returns: Memory object or None if not found

3. get_all() - List All Memories

def get_all(
    self,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    limit: int = 100,
)
  • Purpose: List all memories with optional filtering
  • Returns: Dictionary with results list and optional relations

4. search() - Semantic Search

def search(
    self,
    query: str,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    limit: int = 100,
    filters: Optional[Dict[str, Any]] = None,
    threshold: Optional[float] = None,
)
  • Purpose: Semantic search across memories using vector similarity
  • Returns: Dictionary with ranked results and optional relations
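
For example, restricting results to one session and dropping low-similarity matches (the per-hit field names "memory" and "score" are assumed here for illustration):

from mem0 import Memory

memory = Memory()

results = memory.search(
    "what is the user working on right now?",
    user_id="alice",
    run_id="session_123",
    limit=5,
    threshold=0.3,            # drop matches below this similarity score
)

for hit in results.get("results", []):
    print(hit.get("memory"), hit.get("score"))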

5. update() - Modify Memory

def update(self, memory_id, data)
  • Purpose: Update the content of an existing memory
  • Returns: Success message

6. delete() - Remove Single Memory

def delete(self, memory_id)
  • Purpose: Delete a specific memory by ID
  • Returns: Success message

7. delete_all() - Bulk Delete

def delete_all(
    self, 
    user_id: Optional[str] = None, 
    agent_id: Optional[str] = None, 
    run_id: Optional[str] = None
)
  • Purpose: Delete all memories matching the specified filters
  • Returns: Success message with count

History & Audit APIs

8. history() - Get Change History

def history(self, memory_id)
  • Purpose: Get complete change history for a specific memory
  • Returns: List of all ADD/UPDATE/DELETE operations for that memory

Management APIs

9. reset() - Complete Reset

def reset(self)
  • Purpose: Delete all memories and reset the entire memory store
  • Warning: Destructive operation - removes everything

10. chat() - Chat Interface (Not Implemented)

def chat(self, query)
  • Status: Raises NotImplementedError
  • Purpose: Reserved for future chat functionality

Configuration APIs

11. from_config() - Create from Config Dict

@classmethod
def from_config(cls, config_dict: Dict[str, Any])
  • Purpose: Create Memory instance from configuration dictionary
  • Returns: Memory instance
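
For example, using a plain dictionary that mirrors the configuration shown earlier:

from mem0 import Memory

config_dict = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o", "temperature": 0.1},
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"collection_name": "my_memories", "path": "./my_vector_db"},
    },
}

memory = Memory.from_config(config_dict)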

12. __init__() - Constructor

def __init__(self, config: MemoryConfig = MemoryConfig())
  • Purpose: Initialize Memory instance with configuration
  • Default: Uses default MemoryConfig if none provided

AsyncMemory Class - Identical API Set

The AsyncMemory class provides exactly the same APIs but with async/await support:

  • async def add()
  • async def get()
  • async def get_all()
  • async def search()
  • async def update()
  • async def delete()
  • async def delete_all()
  • async def history()
  • async def reset()
  • async def chat() (not implemented)
  • @classmethod async def from_config()
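
Because every method is awaitable, independent operations can run concurrently, for example with asyncio.gather:

import asyncio
from mem0 import AsyncMemory

async def main():
    memory = AsyncMemory()

    # Store two memories concurrently instead of one after another
    await asyncio.gather(
        memory.add("I love pizza", user_id="alice"),
        memory.add("I prefer window seats", user_id="bob"),
    )

    results = await memory.search("food preferences", user_id="alice")
    print(results)

asyncio.run(main())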

Summary

Total APIs: 12 public methods (11 implemented + 1 placeholder)

  • 8 Core Operations: add, get, get_all, search, update, delete, delete_all, history
  • 2 Management: reset, chat (placeholder)
  • 2 Configuration: init, from_config

Key Features:

  • Multi-level scoping (user_id, agent_id, run_id)
  • Semantic search with vector embeddings
  • Complete audit trail via history
  • Bulk operations (get_all, delete_all)
  • Custom metadata support
  • Graph relationships (when enabled)
  • Async support via AsyncMemory
  • Fact extraction or raw storage modes

All APIs support both synchronous and asynchronous execution patterns depending on which class you use.
