Skip to content

Instantly share code, notes, and snippets.

@grahama1970
grahama1970 / decorators_arango.py
Last active October 23, 2024 18:21
AranogoDB Integration to LiteLLM rather than Redis
from types import SimpleNamespace
import litellm
from litellm.integrations.custom_logger import CustomLogger
from litellm import completion, acompletion, token_counter
import asyncio
from functools import wraps
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from litellm import RateLimitError, APIError
import os
from dotenv import load_dotenv
@grahama1970
grahama1970 / docker-compose.yml
Last active November 18, 2024 14:36
The configuration deploys various models on an A5000 GPU, leveraging SGLang for long-running overnight tasks with low inference speed requirements. Successful configurations include QWEN 32B Int4, QWEN 14B FP8, and Meta Llama 3.1 8B, while QWEN 32B Int4 with TorchAO exceeds memory limits.
services:
# WORKS: Loads successfully on an A5000 GPU
sglang_QWEN_32B_Int4:
image: lmsysorg/sglang:latest
container_name: sglang_QWEN_32B_Int4
volumes:
- ${HOME}/.cache/huggingface:/root/.cache/huggingface
restart: always
ports:
- "30004:30000" # Adjust port as needed
@grahama1970
grahama1970 / pipeline_ex.py
Last active November 20, 2024 02:24
The pipeline dynamically launches a RunPod container for LLM processing, waits for it to reach a "RUNNING" state, executes queries asynchronously using LiteLLM, tracks container activity, and shuts down after a specified inactivity period, optimizing resource usage.
import asyncio
import os
import runpod
from datetime import datetime, timedelta, timezone
from dotenv import load_dotenv
from loguru import logger
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type
from verifaix.llm_client.get_litellm_response import get_litellm_response
from verifaix.arangodb_helper.arango_client import (
connect_to_arango_client,
import asyncio
from loguru import logger
from verifaix.arangodb_helper.arango_client import connect_to_arango_client
async def truncate_cache_collection(arango_config, db=None):
logger.info(f"Attempting to truncate cache collection '{arango_config['cache_collection_name']}'")
if db is None:
logger.info(f"Connecting to ArangoDB at {arango_config['host']}")
db = await asyncio.to_thread(connect_to_arango_client, arango_config)
@grahama1970
grahama1970 / arango_utils.py
Created November 21, 2024 14:47
This Python script orchestrates the lifecycle of a RunPod container, executes LLM requests using Qwen2.5-1.5B, caches results in ArangoDB, and ensures clean-up in all scenarios with a robust finally block for stopping the container. It supports scalable, efficient, and reliable LLM pipelines.
import asyncio
from loguru import logger
from verifaix.arangodb_helper.arango_client import connect_to_arango_client
async def truncate_cache_collection(arango_config, db=None):
logger.info(f"Attempting to truncate cache collection '{arango_config['cache_collection_name']}'")
if db is None:
logger.info(f"Connecting to ArangoDB at {arango_config['host']}")
db = await asyncio.to_thread(connect_to_arango_client, arango_config)
@grahama1970
grahama1970 / get_project_root.py
Last active January 26, 2025 21:39
Analyzes Python project files to generate a structured report of directory trees, dependencies, and imports. Helps LLMs understand project architecture and relationships between files.
from pathlib import Path
from dotenv import load_dotenv
def get_project_root(marker_file=".git"):
"""
Find the project root directory by looking for a marker file.
Args:
marker_file (str): File/directory to look for (default: ".git")
@grahama1970
grahama1970 / full output.txt
Last active December 20, 2024 14:44
Comaparing lorax requests to openai call: curl = 11 seconds, request = 19 seconds, OpenAI: 30 seconds
2024-12-20 09:39:32.640 | INFO | __main__:run_curl_version:10 -
=== Running curl version ===
2024-12-20 09:39:32.641 | INFO | __main__:run_curl_version:43 - Initial request time: 0.00 seconds
2024-12-20 09:39:32.641 | INFO | __main__:run_curl_version:47 - Response tokens:
To determine the number of rugby players on a touch rugby team, we can refer to the relevant section of the document.
1. **Understanding Team Composition**: The document states that a team consists of a maximum of 14 players. However, this number includes reserves, meaning that only six (6) players are allowed on the field at any given time during a match.
2. **Player Limitation**: Additionally, teams are encouraged to include mixed genders (four males and four females), indicating that2024-12-20 09:39:43.862 | INFO | __main__:run_curl_version:68 -
Tokens generated: 100
@grahama1970
grahama1970 / tinyllama_custom_adaptor.py
Last active December 21, 2024 22:26
tinyllama_model_merge_wip: well I thought I could make this work....maybe leave for another time
import os
import torch
import gc
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
from huggingface_hub import snapshot_download
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
pipeline,
@grahama1970
grahama1970 / hf_only_inference_sanity_check.py.py
Last active December 27, 2024 21:07
For dynamic adaptor loading and inferencing, the Unsloth Inference works fine--using Hugging Face does not work--outputs garbled
# Doesn't Work. Outputs are garbled
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from loguru import logger
# Configuration
BASE_MODEL_NAME = "unsloth/Phi-3.5-mini-instruct"
ADAPTER_PATH = "/home/grahama/dev/vllm_lora/training_output/Phi-3.5-mini-instruct_touch-rugby-rules_adapter/final_model"
@grahama1970
grahama1970 / aql_utils.py
Last active January 11, 2025 14:59
This script implements a hybrid search system using ArangoDB that combines: 1. Vector similarity search using COSINE_SIMILARITY 2. BM25 text search with custom text analyzer 3. Fuzzy string matching using Levenshtein distance
import os
from loguru import logger
from typing import List, Dict
def load_aql_query(filename: str) -> str:
"""
Load an AQL query from a file.
"""
try:
file_path = os.path.join("app/backend/vllm/beta/utils/aql", filename)