grahama1970’s gists

grahama1970 / decorators_arango.py

Last active October 23, 2024 18:21

ArangoDB integration with LiteLLM, used in place of Redis

	from types import SimpleNamespace
	import litellm
	from litellm.integrations.custom_logger import CustomLogger
	from litellm import completion, acompletion, token_counter
	import asyncio
	from functools import wraps
	from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
	from litellm import RateLimitError, APIError
	import os
	from dotenv import load_dotenv

grahama1970 / docker-compose.yml

Last active November 18, 2024 14:36

The configuration deploys various models on an A5000 GPU, leveraging SGLang for long-running overnight tasks with low inference speed requirements. Successful configurations include QWEN 32B Int4, QWEN 14B FP8, and Meta Llama 3.1 8B, while QWEN 32B Int4 with TorchAO exceeds memory limits.

	services:
	  # WORKS: Loads successfully on an A5000 GPU
	  sglang_QWEN_32B_Int4:
	    image: lmsysorg/sglang:latest
	    container_name: sglang_QWEN_32B_Int4
	    volumes:
	      - ${HOME}/.cache/huggingface:/root/.cache/huggingface
	    restart: always
	    ports:
	      - "30004:30000" # Adjust port as needed

grahama1970 / pipeline_ex.py

Last active November 20, 2024 02:24

The pipeline dynamically launches a RunPod container for LLM processing, waits for it to reach a "RUNNING" state, executes queries asynchronously using LiteLLM, tracks container activity, and shuts down after a specified inactivity period, optimizing resource usage.

	import asyncio
	import os
	import runpod
	from datetime import datetime, timedelta, timezone
	from dotenv import load_dotenv
	from loguru import logger
	from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type
	from verifaix.llm_client.get_litellm_response import get_litellm_response
	from verifaix.arangodb_helper.arango_client import (
	    connect_to_arango_client,
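The inactivity-based shutdown described above can be sketched roughly as follows. This is a minimal illustration, not the gist's actual code: `ActivityTracker`, `shutdown_when_idle`, and `fake_stop` are hypothetical names, and the real RunPod shutdown call is replaced by a stand-in coroutine.

```python
import asyncio
from datetime import datetime, timedelta, timezone


class ActivityTracker:
    """Records the time of the most recent query (hypothetical helper)."""

    def __init__(self) -> None:
        self.last_activity = datetime.now(timezone.utc)

    def touch(self) -> None:
        # Call this whenever a query is sent to the container.
        self.last_activity = datetime.now(timezone.utc)

    def idle_for(self) -> timedelta:
        return datetime.now(timezone.utc) - self.last_activity


async def shutdown_when_idle(tracker, stop_container, idle_limit, poll_interval=0.02):
    """Poll until the tracker has been idle for idle_limit, then stop the container."""
    while tracker.idle_for() < idle_limit:
        await asyncio.sleep(poll_interval)
    await stop_container()


async def demo() -> list:
    events = []
    tracker = ActivityTracker()

    async def fake_stop() -> None:
        # Stand-in for the real container shutdown call (assumption).
        events.append("stopped")

    watchdog = asyncio.create_task(
        shutdown_when_idle(tracker, fake_stop, idle_limit=timedelta(seconds=0.3))
    )
    for i in range(3):
        await asyncio.sleep(0.02)  # stand-in for an LLM query
        tracker.touch()
        events.append(f"query {i}")
    await watchdog  # returns once the idle limit has elapsed
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running the watchdog as a separate task keeps the query path simple: each query only has to call `touch()`, and the shutdown decision lives in one place.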

grahama1970 / arango_utils.py

Created November 21, 2024 14:45

	import asyncio
	from loguru import logger
	from verifaix.arangodb_helper.arango_client import connect_to_arango_client

	async def truncate_cache_collection(arango_config, db=None):
	    logger.info(f"Attempting to truncate cache collection '{arango_config['cache_collection_name']}'")

	    if db is None:
	        logger.info(f"Connecting to ArangoDB at {arango_config['host']}")
	        db = await asyncio.to_thread(connect_to_arango_client, arango_config)

grahama1970 / arango_utils.py

Created November 21, 2024 14:47

This Python script orchestrates the lifecycle of a RunPod container, executes LLM requests using Qwen2.5-1.5B, caches results in ArangoDB, and ensures clean-up in all scenarios with a robust finally block for stopping the container. It supports scalable, efficient, and reliable LLM pipelines.

	import asyncio
	from loguru import logger
	from verifaix.arangodb_helper.arango_client import connect_to_arango_client

	async def truncate_cache_collection(arango_config, db=None):
	    logger.info(f"Attempting to truncate cache collection '{arango_config['cache_collection_name']}'")

	    if db is None:
	        logger.info(f"Connecting to ArangoDB at {arango_config['host']}")
	        db = await asyncio.to_thread(connect_to_arango_client, arango_config)
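The "finally block for stopping the container" mentioned in the description might look something like the sketch below. The names `run_with_cleanup`, `start`, `work`, and `stop` are hypothetical, and the container calls are replaced by stand-in coroutines; the point is only that the cleanup step runs whether the work succeeds or fails.

```python
import asyncio


async def run_with_cleanup(start, work, stop):
    """Start a resource, run work against it, and always stop it afterwards."""
    handle = await start()
    try:
        return await work(handle)
    finally:
        # Runs on success, on exceptions, and on cancellation.
        await stop(handle)


async def demo() -> list:
    log = []

    async def start():
        log.append("start")
        return "pod-123"  # hypothetical container id

    async def stop(handle):
        log.append(f"stop {handle}")

    async def failing_work(handle):
        log.append("work")
        raise RuntimeError("LLM request failed")

    try:
        await run_with_cleanup(start, failing_work, stop)
    except RuntimeError:
        log.append("error propagated")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The exception still reaches the caller; the `finally` clause only guarantees that the stop step happens first.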

grahama1970 / get_project_root.py

Last active January 26, 2025 21:39

Analyzes Python project files to generate a structured report of directory trees, dependencies, and imports. Helps LLMs understand project architecture and relationships between files.

	from pathlib import Path
	from dotenv import load_dotenv


	def get_project_root(marker_file=".git"):
	    """
	    Find the project root directory by looking for a marker file.

	    Args:
	        marker_file (str): File/directory to look for (default: ".git")
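Since the preview above is cut off, here is one common way a marker-file search can be written. It is a sketch under assumptions, not the gist's implementation: the function name `find_project_root` and its `start` parameter are hypothetical, and it simply walks up the parent directories until it finds the marker.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker_file: str = ".git") -> Optional[Path]:
    """Return the nearest ancestor of start (inclusive) that contains marker_file."""
    start = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / marker_file).exists():
            return candidate
    return None  # no marker found anywhere above start


if __name__ == "__main__":
    print(find_project_root(Path.cwd()))
```

Returning `None` instead of raising leaves the choice of error handling to the caller.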

grahama1970 / full output.txt

Last active December 20, 2024 14:44

Comparing LoRAX requests to an OpenAI call: curl = 11 seconds, request = 19 seconds, OpenAI = 30 seconds

	2024-12-20 09:39:32.640 | INFO | __main__:run_curl_version:10 -
	=== Running curl version ===
	2024-12-20 09:39:32.641 | INFO | __main__:run_curl_version:43 - Initial request time: 0.00 seconds
	2024-12-20 09:39:32.641 | INFO | __main__:run_curl_version:47 - Response tokens:
	To determine the number of rugby players on a touch rugby team, we can refer to the relevant section of the document.

	1. Understanding Team Composition: The document states that a team consists of a maximum of 14 players. However, this number includes reserves, meaning that only six (6) players are allowed on the field at any given time during a match.

	2. Player Limitation: Additionally, teams are encouraged to include mixed genders (four males and four females), indicating that
	2024-12-20 09:39:43.862 | INFO | __main__:run_curl_version:68 -
	Tokens generated: 100

grahama1970 / tinyllama_custom_adaptor.py

Last active December 21, 2024 22:26

tinyllama_model_merge_wip: Well, I thought I could make this work... maybe leave it for another time.

	import os
	import torch
	import gc
	from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
	from peft import PeftModel
	from huggingface_hub import snapshot_download
	from transformers import (
	    AutoModelForCausalLM,
	    AutoTokenizer,
	    pipeline,

grahama1970 / hf_only_inference_sanity_check.py.py

Last active December 27, 2024 21:07

For dynamic adapter loading and inference, Unsloth inference works fine; plain Hugging Face inference does not work and produces garbled output.

	# Doesn't Work. Outputs are garbled
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	from loguru import logger

	# Configuration
	BASE_MODEL_NAME = "unsloth/Phi-3.5-mini-instruct"
	ADAPTER_PATH = "/home/grahama/dev/vllm_lora/training_output/Phi-3.5-mini-instruct_touch-rugby-rules_adapter/final_model"

grahama1970 / aql_utils.py

Last active January 11, 2025 14:59

This script implements a hybrid search system using ArangoDB that combines:
1. Vector similarity search using COSINE_SIMILARITY
2. BM25 text search with a custom text analyzer
3. Fuzzy string matching using Levenshtein distance

	import os
	from loguru import logger
	from typing import List, Dict

	def load_aql_query(filename: str) -> str:
	    """
	    Load an AQL query from a file.
	    """
	    try:
	        file_path = os.path.join("app/backend/vllm/beta/utils/aql", filename)
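To make the three signals in the description concrete, the sketch below computes cosine similarity and Levenshtein distance in plain Python and blends them with a BM25 score. It is illustrative only: in the gist these values would come from AQL, the BM25 score here is assumed to be already normalized to the range 0 to 1, and the weighted sum in `hybrid_score` (including its weights) is an assumption, not the author's ranking formula.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character edits turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_similarity(s: str, t: str) -> float:
    """Levenshtein distance rescaled so that 1.0 means identical strings."""
    longest = max(len(s), len(t))
    return 1.0 - levenshtein(s, t) / longest if longest else 1.0


def hybrid_score(vector_sim: float, bm25_norm: float, fuzzy_sim: float,
                 weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted blend of the three signals; the weights are illustrative."""
    w_vec, w_bm25, w_fuzzy = weights
    return w_vec * vector_sim + w_bm25 * bm25_norm + w_fuzzy * fuzzy_sim


if __name__ == "__main__":
    vec = cosine_similarity([0.1, 0.9], [0.2, 0.8])
    fuzzy = fuzzy_similarity("touch rugby", "touch rugbi")
    print(hybrid_score(vec, bm25_norm=0.6, fuzzy_sim=fuzzy))
```

A weighted sum is only one way to fuse the scores; rank-based fusion is a common alternative when the raw scores are on different scales.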


Graham grahama1970