@grahama1970
grahama1970 / docker-compose.yml
Last active November 18, 2024 14:36
This configuration deploys several models on an A5000 GPU using SGLang, for long-running overnight tasks where inference speed is not critical. Configurations that load successfully: Qwen 32B Int4, Qwen 14B FP8, and Meta Llama 3.1 8B. Qwen 32B Int4 with TorchAO exceeds memory limits.
services:
  # WORKS: Loads successfully on an A5000 GPU
  sglang_QWEN_32B_Int4:
    image: lmsysorg/sglang:latest
    container_name: sglang_QWEN_32B_Int4
    volumes:
      - ${HOME}/.cache/huggingface:/root/.cache/huggingface
    restart: always
    ports:
      - "30004:30000" # Adjust port as needed
grahama1970 / decorators_arango.py
Last active October 23, 2024 18:21
ArangoDB integration for LiteLLM, used instead of Redis
from types import SimpleNamespace
import litellm
from litellm.integrations.custom_logger import CustomLogger
from litellm import completion, acompletion, token_counter
import asyncio
from functools import wraps
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from litellm import RateLimitError, APIError
import os
from dotenv import load_dotenv
grahama1970 / decorators_aioredis_class.py
Created October 23, 2024 15:00
Added a different Redis implementation to LiteLLM
from types import SimpleNamespace
import litellm
from litellm.integrations.custom_logger import CustomLogger
from litellm import completion, acompletion
import asyncio
from functools import wraps
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from litellm import RateLimitError, APIError
import os
from dotenv import load_dotenv
import litellm
from litellm.integrations.custom_logger import CustomLogger
from litellm import completion, acompletion, Cache
import asyncio
from functools import wraps
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from litellm import RateLimitError, APIError, ModelResponse
import os
from dotenv import load_dotenv
from loguru import logger
grahama1970 / README.md
Last active October 12, 2024 20:36
similarity_ranker

Similarity Reranker

The Similarity Reranker is a Python module that analyzes and ranks the similarity between documents. It combines BERT embeddings, BM25 scoring, and large language model (LLM) refinement into a multi-stage similarity analysis. The module is configurable and handles large document sets efficiently, which makes it suitable for document retrieval, comparison, and clustering.

Key Features

  • Multi-stage Similarity Analysis: Combines embeddings, BM25, and LLM scoring to provide refined similarity rankings.
  • Dimensionality Reduction: Uses random projection to reduce the dimensionality of BERT embeddings, improving computational efficiency.
  • LLM-Based Refinement: Ranks document similarity using LLMs, with configurable models and parameters.
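The dimensionality-reduction feature above can be illustrated with a minimal sketch. It assumes a plain Gaussian random projection written with NumPy; the function names `random_projection` and `cosine_similarity_matrix`, the target dimension of 128, and the random stand-in embeddings are all hypothetical and are not taken from the module itself, which may implement this step differently.

```python
import numpy as np


def random_projection(embeddings: np.ndarray, target_dim: int, seed: int = 0) -> np.ndarray:
    """Project each row of `embeddings` down to `target_dim` dimensions.

    Hypothetical helper: uses a Gaussian random matrix, one common way
    to do random projection.
    """
    rng = np.random.default_rng(seed)
    source_dim = embeddings.shape[1]
    # Dividing by sqrt(target_dim) keeps squared norms unchanged in expectation.
    projection = rng.normal(0.0, 1.0, size=(source_dim, target_dim)) / np.sqrt(target_dim)
    return embeddings @ projection


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


# Stand-in for BERT embeddings: 5 documents x 768 dimensions of random data.
# Real embeddings would come from an encoder model instead.
doc_rng = np.random.default_rng(42)
fake_embeddings = doc_rng.normal(size=(5, 768))

reduced = random_projection(fake_embeddings, target_dim=128)
similarities = cosine_similarity_matrix(reduced)
```

Here `similarities[i, j]` approximates the cosine similarity between documents `i` and `j` in the original space, while each later comparison works on 128 values instead of 768. A smaller target dimension is cheaper but distorts the similarities more.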
grahama1970 / cleaning_utils.py
Created October 12, 2024 17:58
text_normalizer
import regex as re
from typing import Dict, Optional
import unicodedata
import html
from dateutil.parser import parse as date_parser
from better_profanity import profanity
from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning
import warnings
import emoji
grahama1970 / arango_client.py
Created October 5, 2024 22:03
test for 2 files
import asyncio
import datetime
import os
import sys
import json
from typing import Optional, Dict, List, Any
from concurrent.futures import ThreadPoolExecutor
from arango import ArangoClient, CollectionCreateError
from arango.exceptions import ArangoError
from loguru import logger
grahama1970 / arango_db_helper.py
Last active October 12, 2024 17:53
The ArangoDBHelper class provides comprehensive management for an ArangoDB instance, handling initialization, connection, schema retrieval, and collection management. It integrates LLM-based metadata generation, ensuring structured data for collections. The class supports asynchronous database initialization, embedding storage, AQL query execut…
import importlib
import os
import json
import asyncio
import sys
from arango import ArangoClient
from arango.exceptions import ArangoError, CollectionCreateError
import datetime
import logging
from typing import List, Dict, Optional, Any, Union
import os
import requests
import asyncio
import regex as re
from requests.exceptions import RequestException
import json
from bs4 import BeautifulSoup
from pandas import read_html
from uuid import uuid4
import pandas as pd
grahama1970 / create_pydantic_model_from_schema.py
Created August 9, 2024 23:08
create_pydantic_model_from_schema for dynamic openai structured response
from pydantic import BaseModel, create_model, ValidationError
from typing import Dict, Type, Any, List, Union
import json
from beta.llm_client.helpers.json_cleaner import clean_json_string
def infer_type(value: Any) -> Type:
    """
    Infers the type of a given value or type string.