- Overview
- High-Level Architecture
- Core Components
- Data Flow
- Storage Architecture
- Memory Levels & Scoping
- Configuration System
- API Layers
- Deployment Options
- Performance Considerations
## Overview

Mem0 is an intelligent memory layer for AI applications that provides personalized, persistent memory capabilities. It's designed as a modular, extensible system that can integrate with various LLMs, embedding models, and storage backends.

Key features:
- Multi-level Memory Scoping: User, Agent, and Session level memories
- Semantic Search: Vector-based memory retrieval
- Change Tracking: Complete audit trail of memory operations
- Graph Relationships: Entity relationship mapping (optional)
- Multi-provider Support: 20+ LLM providers, 10+ vector stores, 14+ embedding providers
## High-Level Architecture

```mermaid
graph TB
User["User Application"]
subgraph CoreAPI["Mem0 Core API"]
Memory["Memory Class<br/>(Sync API)"]
AsyncMemory["AsyncMemory<br/>(Async API)"]
Client["MemoryClient<br/>(Remote API)"]
end
Config["MemoryConfig<br/>Central Configuration"]
subgraph Processing["AI Processing"]
LLM["LLM Layer<br/>(20+ providers)"]
Embeddings["Embedding Layer<br/>(14+ providers)"]
end
subgraph Storage["Storage Systems"]
VectorDB["Vector Store<br/>(10+ providers)"]
HistoryDB["History Database<br/>(SQLite)"]
GraphDB["Graph Store<br/>(Neo4j/Memgraph)"]
end
Platform["Mem0 Platform<br/>(Hosted Service)"]
User --> Memory
User --> AsyncMemory
User --> Client
Memory --> Config
AsyncMemory --> Config
Client --> Platform
Memory --> LLM
Memory --> Embeddings
AsyncMemory --> LLM
AsyncMemory --> Embeddings
Memory --> VectorDB
Memory --> HistoryDB
Memory --> GraphDB
AsyncMemory --> VectorDB
AsyncMemory --> HistoryDB
AsyncMemory --> GraphDB
Config --> LLM
Config --> Embeddings
Config --> VectorDB
Config --> HistoryDB
Config --> GraphDB
```
## Core Components

The memory layer is the central component that orchestrates all memory operations.
Key Classes:
- Memory: Synchronous memory operations
- AsyncMemory: Asynchronous memory operations
- MemoryClient: Remote API client
Core Operations:
- add(): Store new memories from conversations
- get(): Retrieve a specific memory by ID
- get_all(): List memories with filtering
- search(): Semantic search across memories
- update(): Modify existing memories
- delete(): Remove memories
- history(): Get change history for a memory
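A minimal sketch of these operations against a local instance (assumes the default OpenAI-backed providers with an API key in the environment; the result shape is an assumption based on the OSS API and may vary by version):

```python
from mem0 import Memory

memory = Memory()  # default local configuration; assumes OPENAI_API_KEY is set

# add() extracts facts from the conversation and stores them
result = memory.add(
    [{"role": "user", "content": "I'm vegetarian and I love Italian food."}],
    user_id="alice",
)

# search() performs semantic retrieval over stored memories
hits = memory.search("What should I cook for Alice?", user_id="alice")

# update(), history(), and delete() operate on individual memory IDs
# (assumes add() returns created memories under a "results" key)
memory_id = result["results"][0]["id"]
memory.update(memory_id, data="Alice is vegan")
print(memory.history(memory_id))  # full ADD/UPDATE trail for this memory
memory.delete(memory_id)
```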
The LLM layer supports 20+ providers for fact extraction and memory processing.
Supported Providers:
- OpenAI (GPT-3.5, GPT-4, GPT-4o)
- Anthropic (Claude 3.5)
- Google (Gemini Pro)
- AWS Bedrock
- Azure OpenAI
- Groq, Together, Ollama, and more
Key Functions:
- Fact extraction from conversations
- Memory update decision making
- Structured output generation
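Because providers are configuration-driven, swapping the LLM is a config change rather than a code change. A hedged sketch (the model name and config keys are illustrative; verify against your installed version):

```python
from mem0 import Memory

# Use Anthropic for fact extraction instead of the default OpenAI
memory = Memory.from_config({
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-3-5-sonnet-20240620",  # illustrative model name
            "temperature": 0.1,
        },
    }
})
```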
The embedding layer converts text to vector representations for semantic search.
Supported Providers:
- OpenAI (text-embedding-ada-002, text-embedding-3-small/large)
- HuggingFace transformers
- Google Vertex AI
- Together AI
- AWS Bedrock
- And more
The vector store holds memory embeddings and metadata for fast semantic retrieval.
Supported Vector Stores:
- Local: Qdrant, Chroma, FAISS
- Cloud: Pinecone, Weaviate, MongoDB Atlas
- Enterprise: Azure AI Search, Elasticsearch
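Switching vector stores follows the same configuration pattern. A hedged sketch using Chroma locally (the collection name and path are illustrative, and exact config keys vary per provider):

```python
from mem0 import Memory

memory = Memory.from_config({
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "mem0",  # illustrative
            "path": "./chroma_db",      # local persistence directory
        },
    }
})
```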
A SQLite database tracks all memory operations to provide an audit trail.
Schema:
```sql
CREATE TABLE history (
    id TEXT PRIMARY KEY,
    memory_id TEXT,
    old_memory TEXT,
    new_memory TEXT,
    event TEXT,          -- 'ADD', 'UPDATE', 'DELETE'
    created_at DATETIME,
    updated_at DATETIME,
    is_deleted INTEGER,
    actor_id TEXT,
    role TEXT
);
```
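Reading the trail back out, a hedged sketch (history() is part of the core API; the row fields are assumed to mirror the schema above):

```python
# Each row records one ADD/UPDATE/DELETE transition for the given memory
for entry in memory.history(memory_id):
    print(entry["event"], entry["old_memory"], "->", entry["new_memory"])
```

The graph store manages entity relationships and connections between memories.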
Supported Graph Stores:
- Neo4j
- Memgraph
- AWS Neptune
## Data Flow

Adding memories:

```mermaid
sequenceDiagram
participant User
participant Memory
participant LLM
participant Embeddings
participant VectorStore
participant HistoryDB
participant GraphStore
User->>Memory: add(messages, user_id)
Memory->>LLM: extract_facts(messages)
LLM-->>Memory: extracted_facts[]
loop For each fact
Memory->>Embeddings: embed(fact)
Embeddings-->>Memory: vector
Memory->>VectorStore: search_similar(vector)
VectorStore-->>Memory: existing_memories[]
end
Memory->>LLM: update_memory_decisions(facts, existing)
LLM-->>Memory: actions[ADD/UPDATE/DELETE]
loop For each action
alt ADD
Memory->>VectorStore: insert(vector, metadata)
Memory->>HistoryDB: add_history(memory_id, null, new_memory, "ADD")
else UPDATE
Memory->>VectorStore: update(memory_id, vector, metadata)
Memory->>HistoryDB: add_history(memory_id, old_memory, new_memory, "UPDATE")
else DELETE
Memory->>VectorStore: delete(memory_id)
Memory->>HistoryDB: add_history(memory_id, old_memory, null, "DELETE")
end
opt Graph enabled
Memory->>GraphStore: update_relationships(entities)
end
end
Memory-->>User: memory_results[]
```
Searching memories:

```mermaid
sequenceDiagram
participant User
participant Memory
participant Embeddings
participant VectorStore
participant GraphStore
User->>Memory: search(query, user_id)
Memory->>Embeddings: embed(query)
Embeddings-->>Memory: query_vector
Memory->>VectorStore: search(query_vector, filters)
VectorStore-->>Memory: similar_memories[]
opt Graph enabled
Memory->>GraphStore: search_related_entities(query)
GraphStore-->>Memory: entity_relationships[]
Memory->>Memory: combine_results(memories, relationships)
end
Memory-->>User: search_results[]
```
## Storage Architecture

```
~/.mem0/                      # Default storage directory
├── history.db                # SQLite history database
├── qdrant/                   # Qdrant vector store (default)
│   ├── collection/           # Vector embeddings + metadata
│   │   ├── segments/         # Vector data segments
│   │   └── meta.json         # Collection metadata
│   └── wal/                  # Write-ahead log
├── migrations_qdrant/        # Migration-related vectors
└── graph_store/              # Graph database files (if enabled)
```
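These default locations can be redirected through configuration. A sketch, assuming the OSS config keys history_db_path and vector_store.config.path (verify both against your installed version):

```python
from mem0 import Memory

memory = Memory.from_config({
    "history_db_path": "/data/mem0/history.db",  # assumed config key
    "vector_store": {
        "provider": "qdrant",
        "config": {"path": "/data/mem0/qdrant"},  # assumed config key
    },
})
```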
All memory levels are stored in the same physical storage but logically separated using metadata:
```mermaid
graph TD
subgraph Physical["Physical Storage"]
VDB[Vector Database]
HDB[History Database]
GDB[Graph Database]
end
subgraph Logical["Logical Separation"]
U["User Level<br/>user_id: alice"]
S["Session Level<br/>user_id: alice<br/>run_id: session_123"]
A["Agent Level<br/>user_id: alice<br/>agent_id: assistant_v1"]
end
U --> VDB
S --> VDB
A --> VDB
U --> HDB
S --> HDB
A --> HDB
U --> GDB
S --> GDB
A --> GDB
```
## Memory Levels & Scoping

| Level | Typical Contents | Filter Example |
|---|---|---|
| User | Personal preferences, long-term facts | `user_id: "alice"` |
| Session | Conversation context, temporary state | `user_id: "alice", run_id: "chat_123"` |
| Agent | AI assistant behaviors, workflows | `user_id: "alice", agent_id: "assistant_v1"` |
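A hedged sketch of writing to each level (the user_id/agent_id/run_id keyword arguments match the filters above; assumes default providers are configured):

```python
from mem0 import Memory

memory = Memory()

# User level: long-term personal facts
memory.add("I prefer morning meetings", user_id="alice")

# Session level: context scoped to one conversation
memory.add("We're debugging a React issue", user_id="alice", run_id="chat_123")

# Agent level: behaviors for one specific assistant
memory.add("Use a formal tone with this user", user_id="alice", agent_id="assistant_v1")
```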
```mermaid
graph LR
subgraph Scoping["Memory Scoping"]
U["User Level<br/>Personal Preferences<br/>Long-term Facts"]
A["Agent Level<br/>AI Behaviors<br/>Workflows"]
S["Session Level<br/>Conversation Context<br/>Temporary State"]
end
U -.-> A
U -.-> S
A -.-> S
subgraph Examples["Examples"]
U1["Likes Italian food<br/>Lives in San Francisco<br/>Prefers morning meetings"]
A1["Use formal tone<br/>Focus on technical details<br/>Remember coding preferences"]
S1["Current project: Web app<br/>Debugging React issue<br/>Working on API integration"]
end
U --- U1
A --- A1
S --- S1
```
When searching memories, the system can (see the sketch after this list):
- Scope by single level: Only user memories
- Combine multiple levels: User + Agent memories
- Hierarchical search: Session → Agent → User (fallback)
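A hedged sketch of the first two patterns (hierarchical fallback is not shown; it would be application-level logic layered on these calls):

```python
# `memory` is a configured Memory instance (see above)

# Single level: only Alice's user-level memories
memory.search("food preferences", user_id="alice")

# Combined levels: user + agent scope
memory.search("tone guidelines", user_id="alice", agent_id="assistant_v1")
```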
## Configuration System

```mermaid
graph TB
subgraph Layers["Configuration Layers"]
DC["Default Config<br/>Built-in defaults"]
EC["Environment Config<br/>Environment variables"]
FC["File Config<br/>YAML/JSON files"]
PC["Programmatic Config<br/>Python objects"]
end
DC --> EC
EC --> FC
FC --> PC
subgraph Components["Config Components"]
LLM["LLM Config<br/>Provider, model, API keys"]
EMB["Embedder Config<br/>Provider, dimensions"]
VEC["Vector Store Config<br/>Provider, connection details"]
GRA["Graph Store Config<br/>Provider, database URL"]
HIS["History DB Config<br/>SQLite file path"]
end
PC --> LLM
PC --> EMB
PC --> VEC
PC --> GRA
PC --> HIS
```
Example configuration:

```python
from mem0 import Memory
from mem0.configs.base import MemoryConfig
config = MemoryConfig(
    llm={
        "provider": "openai",
        "config": {
            "model": "gpt-4o",
            "temperature": 0.1
        }
    },
    embedder={
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large"
        }
    },
    vector_store={
        "provider": "qdrant",
        "config": {
            "collection_name": "my_memories",
            "path": "./my_vector_db"
        }
    },
    graph_store={
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password"
        }
    }
)
memory = Memory(config)
```

## API Layers

```mermaid
graph TB
subgraph Interfaces["API Interfaces"]
SYNC["Synchronous API<br/>Memory class<br/>Direct method calls"]
ASYNC["Asynchronous API<br/>AsyncMemory class<br/>Async/await support"]
CLIENT["Client API<br/>MemoryClient class<br/>HTTP REST calls"]
end
subgraph UseCases["Use Cases"]
SYNC_UC["Simple scripts<br/>Jupyter notebooks<br/>Blocking operations"]
ASYNC_UC["High-performance apps<br/>Web servers<br/>Concurrent processing"]
CLIENT_UC["Remote deployments<br/>Microservices<br/>Platform integration"]
end
SYNC --- SYNC_UC
ASYNC --- ASYNC_UC
CLIENT --- CLIENT_UC
subgraph Backend["Backend Services"]
LOCAL["Local Storage<br/>File system"]
PLATFORM["Mem0 Platform<br/>Hosted service"]
end
SYNC --> LOCAL
ASYNC --> LOCAL
CLIENT --> PLATFORM
```
Synchronous API:
```python
from mem0 import Memory
memory = Memory()
result = memory.add("I love pizza", user_id="alice")
memories = memory.get_all(user_id="alice")
```

Asynchronous API:

```python
from mem0 import AsyncMemory
import asyncio
async def main():
    memory = AsyncMemory()
    result = await memory.add("I love pizza", user_id="alice")
    memories = await memory.get_all(user_id="alice")
asyncio.run(main())
```

Client API:

```python
from mem0 import MemoryClient
client = MemoryClient(api_key="your-api-key")
result = client.add("I love pizza", user_id="alice")
memories = client.get_all(user_id="alice")
```

## Deployment Options

Local development:

```mermaid
graph LR
subgraph Local["Local Machine"]
APP["Python Application"]
MEM0["Mem0 Library"]
VDB["Vector DB<br/>Qdrant/Chroma"]
HDB["SQLite History"]
end
APP --> MEM0
MEM0 --> VDB
MEM0 --> HDB
subgraph External["External APIs"]
OPENAI["OpenAI API"]
HF["HuggingFace"]
end
MEM0 -.-> OPENAI
MEM0 -.-> HF
```
Production deployment:

```mermaid
graph TB
subgraph AppTier["Application Tier"]
APP1["App Instance 1"]
APP2["App Instance 2"]
LB["Load Balancer"]
end
subgraph StorageTier["Storage Tier"]
VDB["Vector Database<br/>Pinecone/Weaviate"]
HDB["History Database<br/>PostgreSQL"]
GDB["Graph Database<br/>Neo4j"]
end
subgraph ExtServices["External Services"]
LLM["LLM APIs<br/>OpenAI/Anthropic"]
EMB["Embedding APIs<br/>OpenAI/Cohere"]
end
LB --> APP1
LB --> APP2
APP1 --> VDB
APP1 --> HDB
APP1 --> GDB
APP2 --> VDB
APP2 --> HDB
APP2 --> GDB
APP1 -.-> LLM
APP1 -.-> EMB
APP2 -.-> LLM
APP2 -.-> EMB
```
Mem0 Platform (hosted):

```mermaid
graph TB
subgraph YourApp["Your Application"]
APP["Your App"]
CLIENT["MemoryClient"]
end
subgraph Platform["Mem0 Platform"]
API["REST API"]
ENGINE["Memory Engine"]
VDB["Managed Vector DB"]
HDB["Managed History DB"]
GDB["Managed Graph DB"]
end
APP --> CLIENT
CLIENT --> API
API --> ENGINE
ENGINE --> VDB
ENGINE --> HDB
ENGINE --> GDB
```
## Performance Considerations

| Component | Scaling Strategy | Considerations |
|---|---|---|
| Vector Search | Horizontal sharding, Index optimization | Query latency, Memory usage |
| History Database | Read replicas, Partitioning | Storage growth, Query performance |
| LLM Calls | Caching, Batching, Rate limiting | API costs, Response time |
| Embeddings | Local models, Batch processing | Inference time, Model size |
- Embedding Caching: Cache embeddings to avoid recomputation (see the sketch after this list)
- Batch Operations: Process multiple memories simultaneously
- Async Processing: Use AsyncMemory for concurrent operations
- Vector Index Tuning: Optimize HNSW parameters for your use case
- Memory Pruning: Implement retention policies for old memories
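As a concrete illustration of the first point, a hand-rolled embedding cache (a sketch, not a built-in Mem0 feature; it assumes any embedder object exposing an embed(text) method):

```python
import hashlib

class CachedEmbedder:
    """Wraps an embedder so repeated texts are only embedded once."""

    def __init__(self, embedder):
        self.embedder = embedder
        self._cache = {}

    def embed(self, text: str):
        # Hash the text so arbitrarily long inputs make compact cache keys
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.embedder.embed(text)  # compute only once
        return self._cache[key]
```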
```mermaid
graph LR
subgraph Growth["Memory Growth Over Time"]
T1["Week 1<br/>100 memories<br/>~1MB storage"]
T2["Month 1<br/>1K memories<br/>~10MB storage"]
T3["Year 1<br/>50K memories<br/>~500MB storage"]
end
T1 --> T2 --> T3
subgraph Optimization["Optimization Strategies"]
COMPRESS["Vector Compression<br/>Reduce dimensions"]
PRUNE["Memory Pruning<br/>Remove old/irrelevant"]
ARCHIVE["Cold Storage<br/>Archive old data"]
end
T3 -.-> COMPRESS
T3 -.-> PRUNE
T3 -.-> ARCHIVE
```
Mem0's architecture provides a robust, scalable foundation for building AI applications with persistent memory capabilities. The modular design allows for easy customization and scaling, while the multi-level memory system enables sophisticated context management for different use cases.
Key architectural benefits:
- Modularity: Swap providers without code changes
- Scalability: From local development to enterprise deployment
- Flexibility: Support for multiple memory paradigms
- Auditability: Complete change tracking
- Performance: Optimized for real-time AI applications