Mem0 Architecture Documentation

Table of Contents

  1. Overview
  2. High-Level Architecture
  3. Core Components
  4. Data Flow
  5. Storage Architecture
  6. Memory Levels & Scoping
  7. Configuration System
  8. API Layers
  9. Deployment Options
  10. Performance Considerations

Overview

Mem0 is an intelligent memory layer for AI applications that provides personalized, persistent memory capabilities. It's designed as a modular, extensible system that can integrate with various LLMs, embedding models, and storage backends.

Key Features

  • Multi-level Memory Scoping: User, Agent, and Session level memories
  • Semantic Search: Vector-based memory retrieval
  • Change Tracking: Complete audit trail of memory operations
  • Graph Relationships: Entity relationship mapping (optional)
  • Multi-provider Support: 20+ LLM providers, 10+ vector stores, 14+ embedding models

High-Level Architecture

graph TB
    User["User Application"]
    
    subgraph CoreAPI["Mem0 Core API"]
        Memory["Memory Class<br/>(Sync API)"]
        AsyncMemory["AsyncMemory<br/>(Async API)"]
        Client["MemoryClient<br/>(Remote API)"]
    end
    
    Config["MemoryConfig<br/>Central Configuration"]
    
    subgraph Processing["AI Processing"]
        LLM["LLM Layer<br/>(20+ providers)"]
        Embeddings["Embedding Layer<br/>(14+ providers)"]
    end
    
    subgraph Storage["Storage Systems"]
        VectorDB["Vector Store<br/>(10+ providers)"]
        HistoryDB["History Database<br/>(SQLite)"]
        GraphDB["Graph Store<br/>(Neo4j/Memgraph)"]
    end
    
    Platform["Mem0 Platform<br/>(Hosted Service)"]
    
    User --> Memory
    User --> AsyncMemory
    User --> Client
    
    Memory --> Config
    AsyncMemory --> Config
    Client --> Platform
    
    Memory --> LLM
    Memory --> Embeddings
    AsyncMemory --> LLM
    AsyncMemory --> Embeddings
    
    Memory --> VectorDB
    Memory --> HistoryDB
    Memory --> GraphDB
    AsyncMemory --> VectorDB
    AsyncMemory --> HistoryDB
    AsyncMemory --> GraphDB
    
    Config --> LLM
    Config --> Embeddings
    Config --> VectorDB
    Config --> HistoryDB
    Config --> GraphDB

Core Components

1. Memory Engine

The central component that orchestrates all memory operations.

Key Classes:

  • Memory: Synchronous memory operations
  • AsyncMemory: Asynchronous memory operations
  • MemoryClient: Remote API client

Core Operations:

  • add(): Store new memories from conversations
  • get(): Retrieve specific memory by ID
  • get_all(): List memories with filtering
  • search(): Semantic search across memories
  • update(): Modify existing memories
  • delete(): Remove memories
  • history(): Get change history for a memory
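
A minimal sketch of these operations against a local Memory instance (the shape of the returned entries, including the "id" field, is assumed here for illustration):

from mem0 import Memory

memory = Memory()

# Store new memories extracted from a conversation
memory.add("I prefer morning meetings", user_id="alice")

# List, search, and inspect what was stored
all_memories = memory.get_all(user_id="alice")
hits = memory.search("when does alice like to meet?", user_id="alice")

# Assuming each returned entry exposes an "id" field
memory_id = all_memories["results"][0]["id"]
memory.update(memory_id, "I prefer early-morning meetings")
print(memory.history(memory_id))   # audit trail of ADD/UPDATE events
memory.delete(memory_id)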

2. LLM Integration Layer

Supports 20+ LLM providers for fact extraction and memory processing.

Supported Providers:

  • OpenAI (GPT-3.5, GPT-4, GPT-4o)
  • Anthropic (Claude 3.5)
  • Google (Gemini Pro)
  • AWS Bedrock
  • Azure OpenAI
  • Groq, Together, Ollama, and more

Key Functions:

  • Fact extraction from conversations
  • Memory update decision making
  • Structured output generation
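
To make the role of this layer concrete, here is a rough, hypothetical sketch of fact extraction through one provider (OpenAI); the prompt and helper name are illustrative and are not Mem0's internal implementation:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_facts(conversation: str) -> list[str]:
    """Ask the LLM to distill short, standalone candidate facts from a conversation."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,
        messages=[
            {"role": "system",
             "content": "Extract short, standalone facts about the user. "
                        "Return one fact per line."},
            {"role": "user", "content": conversation},
        ],
    )
    text = response.choices[0].message.content or ""
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]

facts = extract_facts("User: I moved to Berlin last month and I'm vegetarian.")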

3. Embedding Layer

Converts text to vector representations for semantic search.

Supported Providers:

  • OpenAI (text-embedding-ada-002, text-embedding-3-small/large)
  • HuggingFace transformers
  • Google Vertex AI
  • Together AI
  • AWS Bedrock
  • And more

4. Vector Storage Layer

Stores memory embeddings and metadata for fast semantic retrieval.

Supported Vector Stores:

  • Local: Qdrant, Chroma, FAISS
  • Cloud: Pinecone, Weaviate, MongoDB Atlas
  • Enterprise: Azure AI Search, Elasticsearch

5. History Database

SQLite database that tracks all memory operations for audit trails.

Schema:

CREATE TABLE history (
    id           TEXT PRIMARY KEY,
    memory_id    TEXT,
    old_memory   TEXT,
    new_memory   TEXT,
    event        TEXT,          -- 'ADD', 'UPDATE', 'DELETE'
    created_at   DATETIME,
    updated_at   DATETIME,
    is_deleted   INTEGER,
    actor_id     TEXT,
    role         TEXT
);
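
Because this is a plain SQLite file (by default at ~/.mem0/history.db), the audit trail can also be inspected directly; a minimal sketch, assuming the default path and the schema above (in application code you would normally call memory.history(memory_id) instead):

import sqlite3
from pathlib import Path

db_path = Path.home() / ".mem0" / "history.db"   # default location
conn = sqlite3.connect(db_path)

rows = conn.execute(
    """
    SELECT memory_id, event, old_memory, new_memory, created_at
    FROM history
    WHERE is_deleted = 0
    ORDER BY created_at DESC
    LIMIT 10
    """
).fetchall()

for memory_id, event, old, new, created_at in rows:
    print(f"{created_at}  {event:<6}  {memory_id}  {old!r} -> {new!r}")

conn.close()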

6. Graph Memory (Optional)

Manages entity relationships and connections between memories.

Supported Graph Stores:

  • Neo4j
  • Memgraph
  • AWS Neptune

Data Flow

Memory Addition Flow

sequenceDiagram
    participant User
    participant Memory
    participant LLM
    participant Embeddings
    participant VectorStore
    participant HistoryDB
    participant GraphStore
    
    User->>Memory: add(messages, user_id)
    Memory->>LLM: extract_facts(messages)
    LLM-->>Memory: extracted_facts[]
    
    loop For each fact
        Memory->>Embeddings: embed(fact)
        Embeddings-->>Memory: vector
        Memory->>VectorStore: search_similar(vector)
        VectorStore-->>Memory: existing_memories[]
    end
    
    Memory->>LLM: update_memory_decisions(facts, existing)
    LLM-->>Memory: actions[ADD/UPDATE/DELETE]
    
    loop For each action
        alt ADD
            Memory->>VectorStore: insert(vector, metadata)
            Memory->>HistoryDB: add_history(memory_id, null, new_memory, "ADD")
        else UPDATE
            Memory->>VectorStore: update(memory_id, vector, metadata)
            Memory->>HistoryDB: add_history(memory_id, old_memory, new_memory, "UPDATE")
        else DELETE
            Memory->>VectorStore: delete(memory_id)
            Memory->>HistoryDB: add_history(memory_id, old_memory, null, "DELETE")
        end
        
        opt Graph enabled
            Memory->>GraphStore: update_relationships(entities)
        end
    end
    
    Memory-->>User: memory_results[]
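
In Python-flavored pseudocode, the same flow looks roughly like this; the helper objects and method names mirror the diagram and are illustrative, not Mem0's internal API:

# Illustrative pseudocode only; dependencies are injected so the steps stay explicit.
def add_memory(messages, user_id, llm, embedder, vector_store, history_db, graph_store=None):
    facts = llm.extract_facts(messages)                    # 1. distill candidate facts

    actions = llm.update_memory_decisions(                 # 2. compare against existing memories
        facts,
        existing=[vector_store.search_similar(embedder.embed(f)) for f in facts],
    )

    results = []
    for action in actions:                                 # 3. apply ADD / UPDATE / DELETE
        if action.event == "ADD":
            vector_store.insert(action.vector, action.metadata)
            history_db.add_history(action.memory_id, None, action.text, "ADD")
        elif action.event == "UPDATE":
            vector_store.update(action.memory_id, action.vector, action.metadata)
            history_db.add_history(action.memory_id, action.old_text, action.text, "UPDATE")
        elif action.event == "DELETE":
            vector_store.delete(action.memory_id)
            history_db.add_history(action.memory_id, action.old_text, None, "DELETE")

        if graph_store is not None:                        # 4. optionally update relationships
            graph_store.update_relationships(action.entities)

        results.append(action)
    return results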

Memory Search Flow

sequenceDiagram
    participant User
    participant Memory
    participant Embeddings
    participant VectorStore
    participant GraphStore
    
    User->>Memory: search(query, user_id)
    Memory->>Embeddings: embed(query)
    Embeddings-->>Memory: query_vector
    
    Memory->>VectorStore: search(query_vector, filters)
    VectorStore-->>Memory: similar_memories[]
    
    opt Graph enabled
        Memory->>GraphStore: search_related_entities(query)
        GraphStore-->>Memory: entity_relationships[]
        Memory->>Memory: combine_results(memories, relationships)
    end
    
    Memory-->>User: search_results[]

Storage Architecture

Physical Storage Layout

~/.mem0/                          # Default storage directory
├── history.db                   # SQLite history database
├── qdrant/                       # Qdrant vector store (default)
│   ├── collection/              # Vector embeddings + metadata
│   │   ├── segments/           # Vector data segments
│   │   └── meta.json           # Collection metadata
│   └── wal/                    # Write-ahead log
├── migrations_qdrant/          # Migration-related vectors
└── graph_store/                # Graph database files (if enabled)

Memory Storage by Level

All memory levels are stored in the same physical storage but logically separated using metadata:

graph TD
    subgraph Physical["Physical Storage"]
        VDB[Vector Database]
        HDB[History Database]
        GDB[Graph Database]
    end
    
    subgraph Logical["Logical Separation"]
        U["User Level<br/>user_id: alice"]
        S["Session Level<br/>user_id: alice<br/>run_id: session_123"]
        A["Agent Level<br/>user_id: alice<br/>agent_id: assistant_v1"]
    end
    
    U --> VDB
    S --> VDB
    A --> VDB
    
    U --> HDB
    S --> HDB
    A --> HDB
    
    U --> GDB
    S --> GDB
    A --> GDB

Data Separation Strategy

| Level   | Scope                                  | Filter Example                             |
|---------|----------------------------------------|--------------------------------------------|
| User    | Personal preferences, long-term facts  | user_id: "alice"                           |
| Session | Conversation context, temporary state  | user_id: "alice", run_id: "chat_123"       |
| Agent   | AI assistant behaviors, workflows      | user_id: "alice", agent_id: "assistant_v1" |
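
In practice, the level is determined simply by which identifiers you pass when adding a memory; for example:

from mem0 import Memory

memory = Memory()

# User level: long-lived personal facts
memory.add("Alice prefers Italian food", user_id="alice")

# Session level: short-lived conversation context
memory.add("Currently debugging a React issue", user_id="alice", run_id="chat_123")

# Agent level: assistant-specific behavior
memory.add("Use a formal tone with this user", user_id="alice", agent_id="assistant_v1")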

Memory Levels & Scoping

Memory Hierarchy

graph LR
    subgraph Scoping["Memory Scoping"]
        U["User Level<br/>Personal Preferences<br/>Long-term Facts"]
        A["Agent Level<br/>AI Behaviors<br/>Workflows"]
        S["Session Level<br/>Conversation Context<br/>Temporary State"]
    end
    
    U -.-> A
    U -.-> S
    A -.-> S
    
    subgraph Examples["Examples"]
        U1["Likes Italian food<br/>Lives in San Francisco<br/>Prefers morning meetings"]
        A1["Use formal tone<br/>Focus on technical details<br/>Remember coding preferences"]
        S1["Current project: Web app<br/>Debugging React issue<br/>Working on API integration"]
    end
    
    U --- U1
    A --- A1
    S --- S1

Retrieval Strategy

When searching memories, the system can:

  1. Scope by single level: Only user memories
  2. Combine multiple levels: User + Agent memories
  3. Hierarchical search: Session → Agent → User (fallback)
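
A sketch of these retrieval patterns; levels are combined by passing additional identifiers, while the fallback order is something your application implements on top of the results (the "results" key follows the return shape described in the API section):

from mem0 import Memory

memory = Memory()

# 1. Single level: only user-scoped memories
user_hits = memory.search("food preferences", user_id="alice")

# 2. Multiple levels: user + agent memories for this assistant
agent_hits = memory.search("tone preferences", user_id="alice", agent_id="assistant_v1")

# 3. Hierarchical fallback: try the narrowest scope first, then widen
hits = memory.search("current project", user_id="alice", run_id="session_123")
if not hits.get("results"):
    hits = memory.search("current project", user_id="alice")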

Configuration System

Configuration Hierarchy

graph TB
    subgraph Layers["Configuration Layers"]
        DC["Default Config<br/>Built-in defaults"]
        EC["Environment Config<br/>Environment variables"]
        FC["File Config<br/>YAML/JSON files"]
        PC["Programmatic Config<br/>Python objects"]
    end
    
    DC --> EC
    EC --> FC
    FC --> PC
    
    subgraph Components["Config Components"]
        LLM["LLM Config<br/>Provider, model, API keys"]
        EMB["Embedder Config<br/>Provider, dimensions"]
        VEC["Vector Store Config<br/>Provider, connection details"]
        GRA["Graph Store Config<br/>Provider, database URL"]
        HIS["History DB Config<br/>SQLite file path"]
    end
    
    PC --> LLM
    PC --> EMB
    PC --> VEC
    PC --> GRA
    PC --> HIS

Example Configuration

from mem0 import Memory
from mem0.configs.base import MemoryConfig

config = MemoryConfig(
    llm={
        "provider": "openai",
        "config": {
            "model": "gpt-4o",
            "temperature": 0.1
        }
    },
    embedder={
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large"
        }
    },
    vector_store={
        "provider": "qdrant",
        "config": {
            "collection_name": "my_memories",
            "path": "./my_vector_db"
        }
    },
    graph_store={
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password"
        }
    }
)

memory = Memory(config)

API Layers

Three API Interfaces

graph TB
    subgraph Interfaces["API Interfaces"]
        SYNC["Synchronous API<br/>Memory class<br/>Direct method calls"]
        ASYNC["Asynchronous API<br/>AsyncMemory class<br/>Async/await support"]
        CLIENT["Client API<br/>MemoryClient class<br/>HTTP REST calls"]
    end
    
    subgraph UseCases["Use Cases"]
        SYNC_UC["Simple scripts<br/>Jupyter notebooks<br/>Blocking operations"]
        ASYNC_UC["High-performance apps<br/>Web servers<br/>Concurrent processing"]
        CLIENT_UC["Remote deployments<br/>Microservices<br/>Platform integration"]
    end
    
    SYNC --- SYNC_UC
    ASYNC --- ASYNC_UC
    CLIENT --- CLIENT_UC
    
    subgraph Backend["Backend Services"]
        LOCAL["Local Storage<br/>File system"]
        PLATFORM["Mem0 Platform<br/>Hosted service"]
    end
    
    SYNC --> LOCAL
    ASYNC --> LOCAL
    CLIENT --> PLATFORM

API Usage Examples

Synchronous API:

from mem0 import Memory

memory = Memory()
result = memory.add("I love pizza", user_id="alice")
memories = memory.get_all(user_id="alice")

Asynchronous API:

from mem0 import AsyncMemory
import asyncio

async def main():
    memory = AsyncMemory()
    result = await memory.add("I love pizza", user_id="alice")
    memories = await memory.get_all(user_id="alice")

asyncio.run(main())

Client API:

from mem0 import MemoryClient

client = MemoryClient(api_key="your-api-key")
result = client.add("I love pizza", user_id="alice")
memories = client.get_all(user_id="alice")

Deployment Options

Local Development

graph LR
    subgraph Local["Local Machine"]
        APP["Python Application"]
        MEM0["Mem0 Library"]
        VDB["Vector DB<br/>Qdrant/Chroma"]
        HDB["SQLite History"]
    end
    
    APP --> MEM0
    MEM0 --> VDB
    MEM0 --> HDB
    
    subgraph External["External APIs"]
        OPENAI["OpenAI API"]
        HF["HuggingFace"]
    end
    
    MEM0 -.-> OPENAI
    MEM0 -.-> HF

Production Self-Hosted

graph TB
    subgraph AppTier["Application Tier"]
        APP1["App Instance 1"]
        APP2["App Instance 2"]
        LB["Load Balancer"]
    end
    
    subgraph StorageTier["Storage Tier"]
        VDB["Vector Database<br/>Pinecone/Weaviate"]
        HDB["History Database<br/>PostgreSQL"]
        GDB["Graph Database<br/>Neo4j"]
    end
    
    subgraph ExtServices["External Services"]
        LLM["LLM APIs<br/>OpenAI/Anthropic"]
        EMB["Embedding APIs<br/>OpenAI/Cohere"]
    end
    
    LB --> APP1
    LB --> APP2
    
    APP1 --> VDB
    APP1 --> HDB
    APP1 --> GDB
    APP2 --> VDB
    APP2 --> HDB
    APP2 --> GDB
    
    APP1 -.-> LLM
    APP1 -.-> EMB
    APP2 -.-> LLM
    APP2 -.-> EMB

Mem0 Platform (Hosted)

graph TB
    subgraph YourApp["Your Application"]
        APP["Your App"]
        CLIENT["MemoryClient"]
    end
    
    subgraph Platform["Mem0 Platform"]
        API["REST API"]
        ENGINE["Memory Engine"]
        VDB["Managed Vector DB"]
        HDB["Managed History DB"]
        GDB["Managed Graph DB"]
    end
    
    APP --> CLIENT
    CLIENT --> API
    API --> ENGINE
    ENGINE --> VDB
    ENGINE --> HDB
    ENGINE --> GDB

Performance Considerations

Scaling Factors

| Component        | Scaling Strategy                         | Considerations                     |
|------------------|------------------------------------------|------------------------------------|
| Vector Search    | Horizontal sharding, index optimization  | Query latency, memory usage        |
| History Database | Read replicas, partitioning              | Storage growth, query performance  |
| LLM Calls        | Caching, batching, rate limiting         | API costs, response time           |
| Embeddings       | Local models, batch processing           | Inference time, model size         |

Performance Optimizations

  1. Embedding Caching: Cache embeddings to avoid recomputation
  2. Batch Operations: Process multiple memories simultaneously
  3. Async Processing: Use AsyncMemory for concurrent operations
  4. Vector Index Tuning: Optimize HNSW parameters for your use case
  5. Memory Pruning: Implement retention policies for old memories
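
A minimal sketch of the first optimization, caching embeddings so repeated text is only embedded once; the embed_fn here stands in for whatever embedding provider you have configured:

import hashlib
from typing import Callable, Dict, List

class CachedEmbedder:
    """Wraps an embedding function with an in-memory cache keyed by a hash of the text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}

    def embed(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._embed_fn(text)   # only call the provider on a cache miss
        return self._cache[key]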

Memory Usage Patterns

graph LR
    subgraph Growth["Memory Growth Over Time"]
        T1["Week 1<br/>100 memories<br/>~1MB storage"]
        T2["Month 1<br/>1K memories<br/>~10MB storage"]
        T3["Year 1<br/>50K memories<br/>~500MB storage"]
    end
    
    T1 --> T2 --> T3
    
    subgraph Optimization["Optimization Strategies"]
        COMPRESS["Vector Compression<br/>Reduce dimensions"]
        PRUNE["Memory Pruning<br/>Remove old/irrelevant"]
        ARCHIVE["Cold Storage<br/>Archive old data"]
    end
    
    T3 -.-> COMPRESS
    T3 -.-> PRUNE
    T3 -.-> ARCHIVE

Conclusion

Mem0's architecture provides a robust, scalable foundation for building AI applications with persistent memory capabilities. The modular design allows for easy customization and scaling, while the multi-level memory system enables sophisticated context management for different use cases.

Key architectural benefits:

  • Modularity: Swap providers without code changes
  • Scalability: From local development to enterprise deployment
  • Flexibility: Support for multiple memory paradigms
  • Auditability: Complete change tracking
  • Performance: Optimized for real-time AI applications

Core Memory Operations

1. add() - Create New Memories

def add(
    self,
    messages,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    infer: bool = True,
    memory_type: Optional[str] = None,
    prompt: Optional[str] = None,
)
  • Purpose: Create new memories from messages/conversations
  • Key Features: Supports fact extraction, raw storage, procedural memories
  • Returns: Dictionary with results and optional relations
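
For example, using only parameters from the signature above (the chat-style message format and the metadata keys are illustrative):

from mem0 import Memory

memory = Memory()

# Default behavior: the LLM extracts facts from the conversation
memory.add(
    [{"role": "user", "content": "I just adopted a puppy named Milo"}],
    user_id="alice",
    metadata={"source": "onboarding_chat"},
)

# infer=False skips fact extraction and stores the content as-is
memory.add("Raw note: follow up about Milo's vet appointment",
           user_id="alice", infer=False)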

2. get() - Retrieve Single Memory

def get(self, memory_id)
  • Purpose: Retrieve a specific memory by its ID
  • Returns: Memory object or None if not found

3. get_all() - List All Memories

def get_all(
    self,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    limit: int = 100,
)
  • Purpose: List all memories with optional filtering
  • Returns: Dictionary with results list and optional relations

4. search() - Semantic Search

def search(
    self,
    query: str,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    limit: int = 100,
    filters: Optional[Dict[str, Any]] = None,
    threshold: Optional[float] = None,
)
  • Purpose: Semantic search across memories using vector similarity
  • Returns: Dictionary with ranked results and optional relations
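
For example, restricting results to one session and dropping low-similarity matches (the per-hit field names "memory" and "score" are assumed here for illustration):

from mem0 import Memory

memory = Memory()

results = memory.search(
    "what is the user working on right now?",
    user_id="alice",
    run_id="session_123",
    limit=5,
    threshold=0.3,            # drop matches below this similarity score
)

for hit in results.get("results", []):
    print(hit.get("memory"), hit.get("score"))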

5. update() - Modify Memory

def update(self, memory_id, data)
  • Purpose: Update the content of an existing memory
  • Returns: Success message

6. delete() - Remove Single Memory

def delete(self, memory_id)
  • Purpose: Delete a specific memory by ID
  • Returns: Success message

7. delete_all() - Bulk Delete

def delete_all(
    self, 
    user_id: Optional[str] = None, 
    agent_id: Optional[str] = None, 
    run_id: Optional[str] = None
)
  • Purpose: Delete all memories matching the specified filters
  • Returns: Success message with count

History & Audit APIs

8. history() - Get Change History

def history(self, memory_id)
  • Purpose: Get complete change history for a specific memory
  • Returns: List of all ADD/UPDATE/DELETE operations for that memory

Management APIs

9. reset() - Complete Reset

def reset(self)
  • Purpose: Delete all memories and reset the entire memory store
  • Warning: Destructive operation - removes everything

10. chat() - Chat Interface (Not Implemented)

def chat(self, query)
  • Status: Raises NotImplementedError
  • Purpose: Reserved for future chat functionality

Configuration APIs

11. from_config() - Create from Config Dict

@classmethod
def from_config(cls, config_dict: Dict[str, Any])
  • Purpose: Create Memory instance from configuration dictionary
  • Returns: Memory instance
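
For example, using a plain dictionary that mirrors the configuration shown earlier:

from mem0 import Memory

config_dict = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o", "temperature": 0.1},
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"collection_name": "my_memories", "path": "./my_vector_db"},
    },
}

memory = Memory.from_config(config_dict)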

12. __init__() - Constructor

def __init__(self, config: MemoryConfig = MemoryConfig())
  • Purpose: Initialize Memory instance with configuration
  • Default: Uses default MemoryConfig if none provided

AsyncMemory Class - Identical API Set

The AsyncMemory class provides exactly the same APIs but with async/await support:

  • async def add()
  • async def get()
  • async def get_all()
  • async def search()
  • async def update()
  • async def delete()
  • async def delete_all()
  • async def history()
  • async def reset()
  • async def chat() (not implemented)
  • @classmethod async def from_config()
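
Because every method is awaitable, independent operations can run concurrently, for example with asyncio.gather:

import asyncio
from mem0 import AsyncMemory

async def main():
    memory = AsyncMemory()

    # Store two memories concurrently instead of one after another
    await asyncio.gather(
        memory.add("I love pizza", user_id="alice"),
        memory.add("I prefer window seats", user_id="bob"),
    )

    results = await memory.search("food preferences", user_id="alice")
    print(results)

asyncio.run(main())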

Summary

Total APIs: 12 public methods (11 implemented + 1 placeholder)

  • 8 Core Operations: add, get, get_all, search, update, delete, delete_all, history
  • 2 Management: reset, chat (placeholder)
  • 2 Configuration: init, from_config

Key Features:

  • Multi-level scoping (user_id, agent_id, run_id)
  • Semantic search with vector embeddings
  • Complete audit trail via history
  • Bulk operations (get_all, delete_all)
  • Custom metadata support
  • Graph relationships (when enabled)
  • Async support via AsyncMemory
  • Fact extraction or raw storage modes

All APIs support both synchronous and asynchronous execution patterns depending on which class you use.
