title | description |
---|---|
Storing Embeddings in A Vector Database | Mastra Docs |
Guide on vector storage options in Mastra, including embedded and dedicated vector databases for similarity search. |
import { Tabs } from "nextra/components";
After generating embeddings, you need to store them in a database that supports vector similarity search. Mastra provides a consistent interface for storing and querying embeddings across different vector databases.
<Tabs items={['Pg Vector', 'Pinecone', 'Qdrant', 'Chroma', 'Astra', 'LibSQL', 'Upstash', 'Cloudflare', 'MongoDB']}> <Tabs.Tab>
import { PgVector } from '@mastra/pg';
const store = new PgVector(process.env.POSTGRES_CONNECTION_STRING)
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
PostgreSQL with the pgvector extension is a good solution for teams already using PostgreSQL who want to minimize infrastructure complexity. For detailed setup instructions and best practices, see the official pgvector repository. </Tabs.Tab> <Tabs.Tab>
import { PineconeVector } from '@mastra/pinecone'
const store = new PineconeVector(process.env.PINECONE_API_KEY)
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
</Tabs.Tab> <Tabs.Tab>
import { QdrantVector } from '@mastra/qdrant'
const store = new QdrantVector({
url: process.env.QDRANT_URL,
apiKey: process.env.QDRANT_API_KEY
})
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
</Tabs.Tab> <Tabs.Tab>
import { ChromaVector } from '@mastra/chroma'
const store = new ChromaVector()
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
</Tabs.Tab> <Tabs.Tab>
import { AstraVector } from '@mastra/astra'
const store = new AstraVector({
token: process.env.ASTRA_DB_TOKEN,
endpoint: process.env.ASTRA_DB_ENDPOINT,
keyspace: process.env.ASTRA_DB_KEYSPACE
})
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
</Tabs.Tab> <Tabs.Tab>
import { LibSQLVector } from "@mastra/core/vector/libsql";
const store = new LibSQLVector({
connectionUrl: process.env.DATABASE_URL,
authToken: process.env.DATABASE_AUTH_TOKEN // Optional: for Turso cloud databases
})
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
</Tabs.Tab> <Tabs.Tab>
import { UpstashVector } from '@mastra/upstash'
const store = new UpstashVector({
url: process.env.UPSTASH_URL,
token: process.env.UPSTASH_TOKEN
})
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
</Tabs.Tab> <Tabs.Tab>
import { CloudflareVector } from '@mastra/vectorize'
const store = new CloudflareVector({
accountId: process.env.CF_ACCOUNT_ID,
apiToken: process.env.CF_API_TOKEN
})
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
</Tabs.Tab> <Tabs.Tab>
import { MongoDBVector } from '@mastra/mongodb'
const store = new MongoDBVector({
url: process.env.MONGODB_URL,
database: process.env.MONGODB_DATABASE
})
await store.createIndex({
indexName: "myCollection",
dimension: 1536,
});
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map(chunk => ({ text: chunk.text })),
});
</Tabs.Tab>
Once initialized, all vector stores share the same interface for creating indexes, upserting embeddings, and querying.
Before storing embeddings, you need to create an index with the appropriate dimension size for your embedding model:
// Create an index with dimension 1536 (for text-embedding-3-small)
await store.createIndex({
indexName: 'myCollection',
dimension: 1536,
});
The dimension size must match the output dimension of your chosen embedding model. Common dimension sizes are:
- OpenAI text-embedding-3-small: 1536 dimensions (or custom, e.g., 256)
- Cohere embed-multilingual-v3: 1024 dimensions
- Google
text-embedding-004
: 768 dimensions (or custom)
Important: Index dimensions cannot be changed after creation. To use a different model, delete and recreate the index with the new dimension size.
Each vector database enforces specific naming conventions for indexes and collections to ensure compatibility and prevent conflicts.
<Tabs items={['Pg Vector', 'Pinecone', 'Qdrant', 'Chroma', 'Astra', 'LibSQL', 'Upstash', 'Cloudflare', 'MongoDB']}>
<Tabs.Tab>
Index names must:
- Start with a letter or underscore
- Contain only letters, numbers, and underscores
- Example: my_index_123
is valid
- Example: my-index
is not valid (contains hyphen)
</Tabs.Tab>
<Tabs.Tab>
Index names must:
- Use only lowercase letters, numbers, and dashes
- Not contain dots (used for DNS routing)
- Not use non-Latin characters or emojis
- Have a combined length (with project ID) under 52 characters
- Example: my-index-123
is valid
- Example: my.index
is not valid (contains dot)
</Tabs.Tab>
<Tabs.Tab>
Collection names must:
- Be 1-255 characters long
- Not contain any of these special characters:
- < > : " / \ | ? *
- Null character (\0
)
- Unit separator (\u{1F}
)
- Example: my_collection_123
is valid
- Example: my/collection
is not valid (contains slash)
</Tabs.Tab>
<Tabs.Tab>
Collection names must:
- Be 3-63 characters long
- Start and end with a letter or number
- Contain only letters, numbers, underscores, or hyphens
- Not contain consecutive periods (..)
- Not be a valid IPv4 address
- Example: my-collection-123
is valid
- Example: my..collection
is not valid (consecutive periods)
</Tabs.Tab>
<Tabs.Tab>
Collection names must:
- Not be empty
- Be 48 characters or less
- Contain only letters, numbers, and underscores
- Example: my_collection_123
is valid
- Example: my-collection
is not valid (contains hyphen)
</Tabs.Tab>
<Tabs.Tab>
Index names must:
- Start with a letter or underscore
- Contain only letters, numbers, and underscores
- Example: my_index_123
is valid
- Example: my-index
is not valid (contains hyphen)
</Tabs.Tab>
<Tabs.Tab>
Namespace names must:
- Be 2-100 characters long
- Contain only:
- Alphanumeric characters (a-z, A-Z, 0-9)
- Underscores, hyphens, dots
- Not start or end with special characters (_, -, .)
- Can be case-sensitive
- Example: MyNamespace123
is valid
- Example: _namespace
is not valid (starts with underscore)
</Tabs.Tab>
<Tabs.Tab>
Index names must:
- Start with a letter
- Be shorter than 32 characters
- Contain only lowercase ASCII letters, numbers, and dashes
- Use dashes instead of spaces
- Example: my-index-123
is valid
- Example: My_Index
is not valid (uppercase and underscore)
</Tabs.Tab>
<Tabs.Tab>
Collection (index) names must:
- Start with a letter or underscore
- Be up to 120 bytes long
- Contain only letters, numbers, underscores, or dots
- Cannot contain $
or the null character
- Example: my_collection.123
is valid
- Example: my-index
is not valid (contains hyphen)
- Example: My$Collection
is not valid (contains $
)
</Tabs.Tab>
After creating an index, you can store embeddings along with their basic metadata:
// Store embeddings with their corresponding metadata
await store.upsert({
indexName: 'myCollection', // index name
vectors: embeddings, // array of embedding vectors
metadata: chunks.map(chunk => ({
text: chunk.text, // The original text content
id: chunk.id // Optional unique identifier
}))
});
The upsert operation:
- Takes an array of embedding vectors and their corresponding metadata
- Updates existing vectors if they share the same ID
- Creates new vectors if they don't exist
- Automatically handles batching for large datasets
For complete examples of upserting embeddings in different vector stores, see the Upsert Embeddings guide.
Vector stores support rich metadata (any JSON-serializable fields) for filtering and organization. Since metadata is stored with no fixed schema, use consistent field naming to avoid unexpected query results.
Important: Metadata is crucial for vector storage - without it, you'd only have numerical embeddings with no way to return the original text or filter results. Always store at least the source text as metadata.
// Store embeddings with rich metadata for better organization and filtering
await store.upsert({
indexName: "myCollection",
vectors: embeddings,
metadata: chunks.map((chunk) => ({
// Basic content
text: chunk.text,
id: chunk.id,
// Document organization
source: chunk.source,
category: chunk.category,
// Temporal metadata
createdAt: new Date().toISOString(),
version: "1.0",
// Custom fields
language: chunk.language,
author: chunk.author,
confidenceScore: chunk.score,
})),
});
Key metadata considerations:
- Be strict with field naming - inconsistencies like 'category' vs 'Category' will affect queries
- Only include fields you plan to filter or sort by - extra fields add overhead
- Add timestamps (e.g., 'createdAt', 'lastUpdated') to track content freshness
- Create indexes before bulk insertions
- Use batch operations for large insertions (the upsert method handles batching automatically)
- Only store metadata you'll query against
- Match embedding dimensions to your model (e.g., 1536 for
text-embedding-3-small
)