- For Small, Fast Retrieval (≤10M vectors) → Use FAISS or Annoy (in-memory)
- For prototyping with serverless persistent search → Use Chroma DB (duckdb+parquet)
- For On-Premise Search with Persistence → Use Milvus, Weaviate, or Qdrant
- For Fully Managed, Scalable Search (Cloud) → Use Pinecone or Weaviate Cloud Service (WCS)
- For Multi-Modal Hybrid Search (keywords + vectors) → Use Weaviate, Qdrant, Elasticsearch, or MongoDB
- For Large-Scale AI Pipelines → Use Milvus, Weaviate, or a hybrid FAISS + OLTP database (e.g., Postgres) setup
- Pros: Extremely fast, efficient for in-memory search, great for research and prototyping.
- Cons: No built-in persistence, limited distributed support, not production-ready for large-scale or multi-user.
- Pros: Simple, lightweight, good for read-heavy workloads, easy to use.
- Cons: Indexes are immutable once built (rebuild required to add items), slower build times, limited scalability, no advanced filtering.
There is no separate database server to install, run, or connect to. More powerful than [[#FAISS|option 1]], but simpler than [[#Milvus|option 3]]. Compared to [[#Hybrid FAISS + OLTP Database]], this is an integrated, serverless architecture: a single, self-contained system where both the vectors and the metadata payloads are managed together within Chroma.
- DuckDB acts as an embedded query engine for the metadata, and
- Parquet files are used for on-disk storage of everything.
- The system is the glue. Chroma's API handles the entire hybrid search process internally. You submit one query that specifies both the vector search and the metadata filters (`where` clauses), and Chroma figures out how to execute it. It runs entirely within your Python process, reading and writing to local files.
- Pros: Zero setup friction, Python-native experience, no network latency, persistent.
- Cons: Limited scalability.
- Pros: Scalable, supports persistence, distributed, strong community, supports multiple index types.
- Cons: Requires setup and resources, more complex to operate than in-memory libraries.
- Pros: Hybrid search (text + vector), schema support, RESTful API, cloud and on-premise, multi-modal, easy to use.
- Cons: Can be resource-intensive, some advanced features may require cloud version.
- Pros: Fast, persistent, easy to deploy, supports filtering and payloads, open-source.
- Cons: Fewer integrations than Weaviate, smaller community.
- Pros: Fully managed, scalable, easy to use, high availability, no infrastructure management.
- Cons: Cloud-only, can be costly at scale, less control over infrastructure.
- Pros: Mature, supports hybrid search, strong filtering and analytics, large ecosystem.
- Cons: Vector search is newer, less efficient for pure vector workloads, can be complex to tune.
This is a decoupled, specialized architecture.
- The OLTP database (like Postgres) is the "source of truth." It stores all your metadata (IDs, text, prices, user info) and is responsible for data consistency, transactions, and complex, filtered queries.
- FAISS, in contrast, is a highly specialized, in-memory search-index library. It stores only the vector embeddings and their corresponding IDs from the database. It does one thing, finding the nearest-neighbor IDs for a given query vector, and does it extremely fast.
- Your application code is the glue. A typical query involves first hitting FAISS to get a list of promising IDs, then taking those IDs and performing a structured `WHERE id IN (...)` query against the OLTP database to retrieve the full data and apply any final filters.
- Pros: Combines fast vector search with relational data, flexible.
- Cons: More complex to set up and maintain, not as seamless as dedicated vector DBs.
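The two-step flow above can be sketched with stand-ins: `sqlite3` in place of Postgres and a NumPy brute-force scan in place of a real FAISS index. The table, vectors, and price filter are invented for illustration; the shape of the glue code is the point.

```python
import sqlite3
import numpy as np

# OLTP side: the "source of truth" for metadata (sqlite3 stands in for Postgres)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.executemany("INSERT INTO items VALUES (?, ?, ?)",
               [(1, "mug", 9.0), (2, "lamp", 25.0), (3, "cup", 8.0)])

# Vector side: only IDs and embeddings (NumPy stands in for a FAISS index)
ids = np.array([1, 2, 3])
vecs = np.array([[0.1, 0.9], [0.8, 0.2], [0.2, 0.8]], dtype="float32")

def search(query, k=2):
    # Step 1: vector search returns only candidate IDs (FAISS's job)
    dists = np.linalg.norm(vecs - query, axis=1)
    candidate_ids = ids[np.argsort(dists)[:k]].tolist()
    # Step 2: application glue -- fetch full rows and apply final filters
    marks = ",".join("?" * len(candidate_ids))
    rows = db.execute(
        f"SELECT id, name, price FROM items "
        f"WHERE id IN ({marks}) AND price < 10",
        candidate_ids,
    ).fetchall()
    return rows

rows = search(np.array([0.1, 0.9], dtype="float32"))
print(rows)
```

The design trade-off is visible here: two systems, two round trips, and application code responsible for keeping the index and the database in sync.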