RAG
| Category | Techniques |
| --- | --- |
| Table Stakes | Better Parsers, Chunk Sizes, Hybrid Search, Metadata Filters |
| Advanced Retrieval | Reranking, Recursive Retrieval, Embedded Tables, Small-to-big Retrieval |
| Fine-tuning | Embedding fine-tuning, LLM fine-tuning |
| Agentic Behavior | Routing, Query Planning, Multi-document Agents |

Categories run from less expressive, easier to implement, and lower latency/cost (Table Stakes) to more expressive, harder to implement, and higher latency/cost (Agentic Behavior).
- BM25 + Vector Search: combines traditional keyword-based retrieval (BM25) with embedding similarity search to capture both exact keyword matches and semantic meaning.
- Minimum Similarity Threshold: instead of returning a fixed top-k, filter retrieved nodes by a minimum similarity score so low-relevance results are dropped. A sketch combining both ideas follows this list.
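A minimal sketch of both ideas, assuming the rank_bm25 and sentence-transformers packages are available; the corpus, the 0.5/0.5 fusion weights, and the 0.5 score cutoff are illustrative choices, not recommendations.

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "BM25 scores documents by keyword overlap with the query.",
    "Dense embeddings capture semantic similarity between texts.",
    "Hybrid search fuses keyword and embedding scores.",
]
query = "combine keyword and semantic search"

# Keyword stage: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(query.lower().split())

# Semantic stage: cosine similarity between query and document embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
cos_scores = util.cos_sim(query_emb, doc_emb)[0]

# Fuse the two signals with a simple weighted sum (the weights are tunable),
# normalizing BM25 scores into [0, 1] first.
max_bm25 = max(bm25_scores.max(), 1e-9)
hybrid = [
    0.5 * (b / max_bm25) + 0.5 * float(c)
    for b, c in zip(bm25_scores, cos_scores)
]

# Minimum similarity threshold: keep only results above a cutoff
# instead of always returning a fixed top-k.
MIN_SCORE = 0.5  # illustrative cutoff
ranked = sorted(enumerate(hybrid), key=lambda x: -x[1])
results = [(corpus[i], s) for i, s in ranked if s >= MIN_SCORE]
print(results)
```

A weighted sum is the simplest fusion; reciprocal rank fusion is a common alternative that sidesteps score normalization entirely.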
Reranking is a specialized component in information retrieval systems that performs a second-stage evaluation of search results after an initial set of potentially relevant items has been retrieved. It reassesses and reorders those results to push the most relevant items to the top of the list.

At its core, reranking implements a two-stage retrieval process (sketched below):
- Initial retrieval: a fast, scalable method (such as embedding-based similarity search or BM25) retrieves an initial set of candidate documents.
- Reranking: a slower but more accurate model, typically a cross-encoder that scores each query-document pair jointly, rescores the candidates and reorders them.
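A minimal sketch of the second stage, assuming sentence-transformers; the candidate list stands in for the output of a first-stage retriever.

```python
from sentence_transformers import CrossEncoder

query = "what is two-stage retrieval?"
# Candidates produced by a fast first-stage retriever (BM25 or vector search).
candidates = [
    "Two-stage retrieval pairs a fast retriever with a slower reranker.",
    "Cats are popular household pets.",
    "A cross-encoder scores the query and document together for accuracy.",
]

# The cross-encoder reads query and document jointly, so it is slower
# but more accurate than the bi-encoder used for initial retrieval.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

# Reorder the candidates so the most relevant items rise to the top.
for doc, score in sorted(zip(candidates, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```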
Chunk overlap preserves contextual information at the boundaries between chunks, which is essential for maintaining semantic coherence and improving retrieval accuracy.
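A minimal sketch of overlapped chunking, using whitespace tokens as a stand-in for model tokens; the chunk_size and chunk_overlap values are illustrative.

```python
def chunk_text(text: str, chunk_size: int = 256, chunk_overlap: int = 32) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size tokens.

    Whitespace tokens stand in for model tokens here; a real pipeline
    would use the embedding model's tokenizer.
    """
    tokens = text.split()
    step = chunk_size - chunk_overlap  # the window advances by this much
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last window already reached the end of the text
    return chunks

# Adjacent chunks share chunk_overlap tokens, so sentences near a
# boundary appear in both chunks and keep their surrounding context.
chunks = chunk_text("some long document text " * 200, chunk_size=64, chunk_overlap=8)
```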
Smaller chunks (e.g., 256 tokens or less):
- Higher precision: smaller chunks contain more focused information, leading to more precise retrieval.
- Reduced noise: less irrelevant information is included alongside the relevant content.
- Better for specific queries: when users ask highly specific questions, smaller chunks can pinpoint exact answers.
- Improved semantic focus: each chunk tends to cover a single concept or topic.

Larger chunks (e.g., 512-1024 tokens or more):
- Better recall: more likely to capture all relevant information about a topic.
- Preserved context: maintains more surrounding context that may be important for understanding.
- Reduced fragmentation: related concepts are less likely to be split across multiple chunks.
- Better for complex queries: useful when questions require synthesizing multiple pieces of information.
Retrieval accuracy also improves when chunks are stored in a vector database such as ChromaDB rather than a purely in-memory index.
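A minimal sketch of persisting and querying chunks with ChromaDB; the storage path, collection name, and documents are illustrative.

```python
import chromadb

# PersistentClient stores the index on disk instead of only in memory.
client = chromadb.PersistentClient(path="./rag_store")
collection = client.get_or_create_collection("docs")

# Add chunks; Chroma embeds them with its default embedding function
# unless one is supplied explicitly.
collection.add(
    ids=["chunk-1", "chunk-2"],
    documents=[
        "Chunk overlap preserves context at chunk boundaries.",
        "Hybrid search combines BM25 with embeddings.",
    ],
    metadatas=[{"source": "notes"}, {"source": "notes"}],
)

# Query by text; n_results bounds how many chunks come back.
results = collection.query(query_texts=["why use overlap?"], n_results=2)
print(results["documents"])
```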
Advanced techniques:
- When storing chunks, also generate a hypothetical question that each chunk could answer and index that question alongside the chunk (see the sketch after this list).
- A relational or graph-based database can be used instead, depending on the use case.
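A minimal sketch of the hypothetical-question idea; generate_question is a hypothetical stand-in for an LLM call, and the parent_chunk_id metadata key is an assumed naming convention, not a ChromaDB requirement.

```python
import chromadb

client = chromadb.PersistentClient(path="./rag_store")
questions = client.get_or_create_collection("chunk_questions")

def generate_question(chunk: str) -> str:
    # Hypothetical stand-in: in practice, prompt an LLM with something like
    # "Write one question this passage answers: {chunk}".
    return f"What does this passage explain? {chunk[:40]}..."

chunks = {
    "chunk-1": "Chunk overlap preserves context at chunk boundaries.",
    "chunk-2": "Hybrid search combines BM25 with embeddings.",
}

# Index the generated question, keeping a pointer back to the parent chunk;
# a user query often matches a question more closely than the raw chunk text.
for chunk_id, text in chunks.items():
    questions.add(
        ids=[f"q-{chunk_id}"],
        documents=[generate_question(text)],
        metadatas=[{"parent_chunk_id": chunk_id}],
    )

# At query time, retrieve matching questions, then fetch their parent chunks.
hits = questions.query(query_texts=["why add overlap between chunks?"], n_results=1)
parent_ids = [m["parent_chunk_id"] for m in hits["metadatas"][0]]
print(parent_ids)
```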
Retrieval-Augmented Generation (RAG)
- Dynamic Information Retrieval: RAG retrieves relevant external data from knowledge bases or databases in real time to augment the model's responses, improving accuracy and contextual relevance.
- Reduces Hallucinations: by grounding responses in authoritative sources, RAG minimizes the chances of generating incorrect or fabricated information.
- Cost-Effective Updates: RAG avoids frequent retraining by dynamically incorporating updated external data into the response generation process.
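A minimal sketch of the retrieve-then-generate loop, reusing the ChromaDB collection from above; llm_complete is a hypothetical stand-in for an actual LLM client.

```python
import chromadb

client = chromadb.PersistentClient(path="./rag_store")
collection = client.get_or_create_collection("docs")

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    raise NotImplementedError

def rag_answer(question: str, k: int = 3) -> str:
    # 1. Retrieve relevant chunks at query time.
    hits = collection.query(query_texts=[question], n_results=k)
    context = "\n\n".join(hits["documents"][0])
    # 2. Ground the model in the retrieved context to reduce hallucinations.
    prompt = (
        "Answer using only the context below. If the answer is not in the "
        f"context, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```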
Cache-Augmented Generation (CAG)
- Preloaded Knowledge: CAG preloads all required information into the model's extended context window, avoiding real-time retrieval altogether.
- Low Latency: eliminates retrieval delays by using cached knowledge, enabling faster and more consistent responses.
- Simplified Architecture: CAG avoids complex retrieval mechanisms, making it ideal for stable datasets that fit within the model's context window.
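A minimal sketch of the contrast with the RAG loop above, assuming the corpus is small and stable enough to fit in the context window; llm_complete is again a hypothetical stand-in.

```python
# Preload the entire corpus into the context once; no retrieval at query time.
CORPUS = [
    "Chunk overlap preserves context at chunk boundaries.",
    "Hybrid search combines BM25 with embeddings.",
    # ... the full (stable) knowledge base, which must fit in the context window
]
PRELOADED_CONTEXT = "\n\n".join(CORPUS)

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call; with a provider that
    # supports prompt/KV caching, this shared prefix is computed only once.
    raise NotImplementedError

def cag_answer(question: str) -> str:
    # Every query reuses the same preloaded context, so there is no
    # retrieval latency and answers stay consistent across queries.
    prompt = (
        f"Context:\n{PRELOADED_CONTEXT}\n\n"
        f"Answer using only the context above.\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```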