Created August 16, 2024 11:43
Gist: sergeliatko/1aed65b8501160b7d117a70cdad68347
SIMANTIKS API - Outline generated from structure.json

Semantic Chunking - 3 Methods for Better RAG

Preface: Introduction to Semantic Chunkers in RAG
  Introduction to Semantic Chunkers for the Text Modality in Retrieval-Augmented Generation (RAG).
  Introduction to the Three Types of Semantic Chunkers.
  Introduction to the Semantic Chunkers Library and Running the Chunkers Intro Notebook in Python on Colab.

Prerequisites
  Installing the Prerequisites: Semantic Chunkers and Hugging Face Datasets.
  Test Data for the Chunking Methods: Impact on Latency and Quality of Results.

Data Setup
  Introduction to the Dataset and the Structure of the AI arXiv Papers.
  Limiting the Input Text Because the Chunker Is Resource-Intensive.
  Requirement of an Embedding Model for Semantic Chunking.
  Use of OpenAI's text-embedding-ada-002 Model and the API Key Requirement.

1. Statistical Semantic Chunking
  Introduction to the Statistical Chunking Method and Its Advantages.
  How the Statistical Chunker Works and How It Calculates the Similarity Threshold.
  Overview of the Initial Document Chunking Results and a Preliminary Assessment.
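
The statistical chunker's core idea is to derive the split threshold from the document's own similarity scores instead of asking the user for one. It can be sketched in a few lines of Python. This is a simplified illustration, not the Semantic Chunkers library's code. Several pieces are assumptions: the percentile rule, the function names (`statistical_chunks`, `percentile`, `cosine`) and the toy vectors. The library may compute its threshold differently.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def statistical_chunks(sentences, embeddings, p=25):
    """Split where adjacent-sentence similarity is at or below a threshold
    derived from this document's own similarity distribution (assumed rule:
    the p-th percentile), so the user does not supply a threshold."""
    if len(sentences) < 2:
        return [list(sentences)] if sentences else []
    sims = [cosine(embeddings[i], embeddings[i + 1]) for i in range(len(sentences) - 1)]
    threshold = percentile(sims, p)
    chunks, current = [], [sentences[0]]
    # sims[i] compares sentence i with sentence i + 1, so pair it with sentence i + 1.
    for sentence, sim in zip(sentences[1:], sims):
        if sim <= threshold:  # similarity dip: close the current chunk
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical 3-d vectors standing in for real embedding-model output.
    sentences = ["Cats purr.", "Cats nap in the sun.", "Stocks fell today.", "Markets slid further."]
    embeddings = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
    print(statistical_chunks(sentences, embeddings))
    # → [['Cats purr.', 'Cats nap in the sun.'], ['Stocks fell today.', 'Markets slid further.']]
```

A pure percentile rule always cuts at the weakest links, so it splits even fairly uniform text. A more careful version would add constraints such as minimum and maximum chunk sizes.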

2. Consecutive Semantic Chunking
  Where the Consecutive Chunking Method Ranks in the Recommended Order.
  Score Threshold Requirements for Different Text-Embedding Models.
  User Input and Performance Tuning of the Chunker Threshold.
  How the Consecutive Chunker Works.
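
A consecutive chunker compares each sentence only with the one before it. It splits whenever that similarity drops below a fixed, user-supplied threshold. The sketch below is a hypothetical illustration, not the library's API: the names `consecutive_chunks` and `score_threshold` are assumptions. Hand-written toy vectors stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def consecutive_chunks(sentences, embeddings, score_threshold):
    """Start a new chunk whenever two adjacent sentences are less similar
    than the user-chosen score_threshold."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < score_threshold:
            chunks.append(current)  # topic shift: close the current chunk
            current = []
        current.append(sentences[i])
    chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical 3-d vectors standing in for real embedding-model output.
    sentences = ["Cats purr.", "Cats nap in the sun.", "Stocks fell today.", "Markets slid further."]
    embeddings = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
    # The same text splits very differently depending on the threshold.
    for t in (0.0, 0.5, 0.995):
        print(t, len(consecutive_chunks(sentences, embeddings, t)))
    # prints one line each: "0.0 1", "0.5 2", "0.995 4"
```

The right threshold depends on the embedding model, because different models spread their similarity scores differently. A value that works well for one model may split everything, or nothing, with another.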

3. Cumulative Semantic Chunking
  Cumulative Chunker Method: Step-by-Step Embedding and Similarity Comparison.
  Higher Time and Cost Due to Creating More Embeddings.
  Comparison of the Chunkers' Noise Resistance and Performance.
  Performance Analysis and Threshold Adjustment of the Cumulative Chunker.
  Threshold Adjustment for Improved Performance over the Consecutive Chunker.
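
The cumulative method compares each new sentence with an embedding of the whole chunk accumulated so far, rather than with the previous sentence alone. To keep the example self-contained, a bag-of-words counter (`toy_embed`) stands in for the embedding model. That function and the other names here are hypothetical, not the library's API. The sketch also counts embedding calls. Because the growing chunk is re-embedded at every step, it makes 2 * (n - 1) calls for n sentences. Embedding each sentence once would take n calls, and that gap is where the extra time and cost come from.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cumulative_chunks(sentences, embed, score_threshold):
    """Compare each new sentence with an embedding of the whole chunk built so far.

    Returns (chunks, embedding_calls) so the extra embedding cost is visible.
    """
    if not sentences:
        return [], 0
    calls = 0
    chunks, current = [], [sentences[0]]
    for sentence in sentences[1:]:
        chunk_vec = embed(" ".join(current))  # the growing chunk is re-embedded every step
        sentence_vec = embed(sentence)
        calls += 2
        if cosine(chunk_vec, sentence_vec) < score_threshold:
            chunks.append(current)  # the sentence does not fit the chunk: start a new one
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks, calls


if __name__ == "__main__":
    sentences = [
        "Cats purr and cats nap.",
        "Cats nap in the sun.",
        "Stocks fell as markets slid.",
        "Markets and stocks rallied later.",
    ]
    chunks, calls = cumulative_chunks(sentences, toy_embed, score_threshold=0.2)
    print(chunks)  # two chunks: the two cat sentences, then the two market sentences
    print(calls)   # 6, i.e. 2 * (4 - 1)
```

The threshold here plays the same role as in the consecutive sketch. Because the comparison target is now a whole chunk rather than one sentence, a value tuned for one method is not automatically right for the other.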

Multi-modal Chunking
  Introduction to the Modalities Handled by the Different Chunkers.
  Limitation of the Statistical Chunker to the Text Modality.
  Capabilities of the Consecutive Chunker for Video, with a Future Demonstration.
  Text-Focused Nature of the Cumulative Chunker.

Conclusion and Sign-off for the Semantic Chunkers Presentation
See more examples here: SIMANTIKS API - Semantic Chunking Examples