The Similarity Reranker is a Python module that analyzes and ranks the similarity between documents. It combines BERT embeddings, BM25 scoring, and large language model (LLM) refinement into a multi-stage similarity analysis. The module is configurable and is designed to handle large document sets efficiently, which makes it suitable for use cases such as document retrieval, comparison, and clustering.
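To illustrate what the BM25 stage computes, here is a minimal, dependency-free sketch of Okapi BM25. The function name `bm25_scores`, the whitespace tokenization, and the defaults `k1=1.5` and `b=0.75` are assumptions for illustration, not the module's actual interface or configuration. The `+ 1.0` inside the logarithm keeps the IDF term non-negative (the variant Apache Lucene uses).

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document against a tokenized query.

    Hypothetical helper for illustration; k1 controls term-frequency
    saturation and b controls document-length normalization.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            # Longer-than-average documents get a larger denominator.
            denom = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat".split(),
        "dogs and cats living together".split(),
        "bm25 ranks documents by term overlap".split(),
    ]
    print(bm25_scores("cat mat".split(), docs))
```

Only the first document contains both `cat` and `mat`, so it receives a positive score while the other two score zero. Because BM25 matches exact tokens, `cats` in the second document does not count as `cat`; catching paraphrases and morphological variants like this is what the embedding and LLM stages are for.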
- Multi-Stage Similarity Analysis: Combines embedding similarity, BM25 scoring, and LLM scoring to produce refined similarity rankings.
- Dimensionality Reduction: Uses random projection to reduce the dimensionality of BERT embeddings, improving computational efficiency.
- LLM-Based Refinement: Ranks document similarity using LLMs, with configurable models and parameters.
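The first two stages above can be sketched as follows, assuming Gaussian random projection for the dimensionality reduction and a weighted sum of min-max-normalized scores to combine the dense (embedding) and sparse (BM25) signals. The names `random_projection_matrix`, `project`, `fuse_scores`, `rank`, and the `alpha` weight are hypothetical and not the module's API; the fusion rule is one common choice, not necessarily the one the module uses. The LLM refinement stage is omitted here, since it depends on the configured model.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def random_projection_matrix(in_dim: int, out_dim: int, seed: int = 0) -> List[Vector]:
    """Build an out_dim x in_dim Gaussian random projection matrix.

    Entries are drawn from N(0, 1/out_dim), so squared vector norms are
    preserved in expectation (Johnson-Lindenstrauss style).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(vec: Sequence[float], matrix: Sequence[Sequence[float]]) -> Vector:
    """Map a d-dimensional vector to k dimensions using a k x d matrix."""
    return [sum(r * v for r, v in zip(row, vec)) for row in matrix]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; constant inputs map to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(dense: Sequence[float], sparse: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted sum of normalized dense and sparse scores (alpha weights dense)."""
    if len(dense) != len(sparse):
        raise ValueError("dense and sparse score lists must have equal length")
    d, s = min_max(dense), min_max(sparse)
    return [alpha * x + (1 - alpha) * y for x, y in zip(d, s)]


def rank(fused: Sequence[float]) -> List[int]:
    """Document indices ordered from most to least similar."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 768  # e.g. BERT-base hidden size; illustrative only
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    docs = [[q + rng.gauss(0.0, noise) for q in query] for noise in (0.1, 1.0, 3.0)]
    matrix = random_projection_matrix(dim, 64, seed=7)
    q_low = project(query, matrix)
    dense = [cosine(q_low, project(d, matrix)) for d in docs]
    sparse = [2.0, 5.0, 0.5]  # placeholder BM25 scores for the same documents
    print(rank(fuse_scores(dense, sparse, alpha=0.7)))
```

Projecting 768-dimensional vectors down to 64 dimensions makes each cosine computation roughly 12 times cheaper, at the cost of a small, random distortion in pairwise similarities. A typical pipeline would then pass only the top few indices from `rank` to the more expensive LLM stage.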