@HDCharles
HDCharles / gpucheck.sh
Last active May 14, 2026 15:20
Script for tracking which GPUs are available over time for RedHatAI/NeuralMagic (instructions below)
#!/bin/bash
# === Configuration ===
SSH_CONFIG="$HOME/.ssh/config"
STATE_FILE="$HOME/.gpucheck_state.json"
LAST_SYNC_FILE="$HOME/.gpucheck_last_sync"
INVENTORY_REPO="neuralmagic/stratus"
INVENTORY_PATH="infra-ansible/webapp-inventory/public/data/inventory.json"
POLL_INTERVAL=300 # seconds between checks (5 minutes)
# Three EMA alpha values for different time horizons
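The preview stops before the three alpha values. As a minimal sketch, assuming each EMA is meant to average over a time constant τ while sampling once every `POLL_INTERVAL` seconds, one common conversion is α = 1 - exp(-Δt/τ). The three horizons (1 hour, 1 day, 1 week), the helper names, and the sample data are hypothetical, not the script's actual values.

```python
import math

POLL_INTERVAL = 300  # seconds between checks, matching the script's configuration

# Hypothetical horizons: the preview stops before the real alpha values.
HORIZONS = {"1h": 3_600, "1d": 86_400, "1w": 604_800}


def alpha_for_horizon(tau_seconds: float, dt_seconds: float = POLL_INTERVAL) -> float:
    """EMA smoothing factor for samples taken every dt seconds with time constant tau."""
    return 1.0 - math.exp(-dt_seconds / tau_seconds)


def ema_update(previous: float, sample: float, alpha: float) -> float:
    """Blend one new sample (e.g. the fraction of free GPUs) into the running average."""
    return alpha * sample + (1.0 - alpha) * previous


if __name__ == "__main__":
    alphas = {name: alpha_for_horizon(tau) for name, tau in HORIZONS.items()}
    samples = [1.0, 1.0, 0.5, 0.0, 1.0]  # made-up free-GPU fractions, one per poll
    state = dict.fromkeys(alphas, samples[0])  # seed every average with the first sample
    for sample in samples[1:]:
        for name, alpha in alphas.items():
            state[name] = ema_update(state[name], sample, alpha)
    for name in HORIZONS:
        print(f"{name}: alpha={alphas[name]:.5f} ema={state[name]:.4f}")
```

With a 300 s poll, the 1-hour alpha is about 0.08, so that average reacts within a few polls, while the 1-week alpha (about 0.0005) smooths out short spikes in usage.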
HDCharles / Results.md
Created May 8, 2026 04:23
Obs Refactor Eval Results

NVFP4 Evals on B200

model                      scheme  technique  task             main      PR        change (relative)
----------------------------------------------------------------------------------------------------
Meta-Llama-3-8B-Instruct   NVFP4   awq_rtn    gsm8k_platinum   71.46%    69.89%    -2.20%
Qwen2.5-3B-Instruct        NVFP4   awq_rtn    gsm8k_platinum   23.33%    29.61%    +26.92%  (volatile: Qwen2.5-3B scores on gsm8k_platinum vary widely)
Qwen3-30B-A3B              NVFP4   awq_rtn    gsm8k_platinum   92.31%    91.32%    -1.07%

Meta-Llama-3-8B-Instruct   NVFP4   awq_rtn    mmlu             63.55%**  63.46%    -0.14%
Qwen2.5-3B-Instruct        NVFP4   awq_rtn    mmlu             63.12%**  63.34%**  +0.35%
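The change column in the table above is the relative change of the PR score against main, not a difference in percentage points: (69.89 - 71.46) / 71.46 ≈ -2.20%, whereas the absolute drop is 1.57 points. A small sketch of that calculation (the helper name is illustrative):

```python
def relative_change(main: float, pr: float) -> float:
    """Relative change of the PR score versus main, in percent."""
    return (pr - main) / main * 100.0


rows = [
    ("Meta-Llama-3-8B-Instruct", "gsm8k_platinum", 71.46, 69.89),
    ("Qwen2.5-3B-Instruct", "gsm8k_platinum", 23.33, 29.61),
    ("Qwen3-30B-A3B", "gsm8k_platinum", 92.31, 91.32),
]
for model, task, main, pr in rows:
    # Signed, two-decimal format matches the table's change column.
    print(f"{model:<26} {task:<16} {relative_change(main, pr):+.2f}%")
```

Because the change is relative, a low baseline inflates it: the +26.92% for Qwen2.5-3B corresponds to a gain of 6.28 points.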
HDCharles / parallel_regression_test.sh
Last active May 5, 2026 17:17
Updated parallel regression test with chg GPU detection and FP8 support
#!/bin/bash
# Parallel Regression Test Script
# 1. Quantizes remaining models (if needed)
# 2. Runs evaluations in parallel (4 at a time)
# 3. Saves individual logs for each eval job
# 4. Prints summaries as jobs complete
set -o pipefail
# ── Configuration ────────────────────────────────────────────────────────────
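Steps 2 to 4 of the header above describe a bounded worker pool with one log file per job and a summary printed as each job finishes. The script itself is Bash; as a sketch of the same pattern, assuming each eval is a standalone command, Python's `ThreadPoolExecutor` with four workers plus `as_completed` reports jobs in completion order. The job names, commands, and helper names are placeholders, not the script's own.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

MAX_PARALLEL = 4  # "4 at a time", as in the script header


def run_job(name: str, cmd: list[str], log_dir: Path) -> tuple[str, int, Path]:
    """Run one eval command, sending stdout and stderr to its own log file."""
    log_path = log_dir / f"{name}.log"
    with log_path.open("w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return name, proc.returncode, log_path


def run_all(jobs: dict[str, list[str]], log_dir: Path) -> dict[str, int]:
    """Run all jobs, at most MAX_PARALLEL at once; return exit codes by job name."""
    log_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        futures = [pool.submit(run_job, name, cmd, log_dir) for name, cmd in jobs.items()]
        # as_completed yields futures as they finish, so summaries print in completion order.
        for future in as_completed(futures):
            name, code, log_path = future.result()
            status = "OK" if code == 0 else f"FAILED (exit {code})"
            print(f"[{status}] {name} -> {log_path}")
            results[name] = code
    return results


if __name__ == "__main__":
    # Placeholder jobs; real ones would launch the eval harness for each model/task pair.
    demo_jobs = {f"job{i}": [sys.executable, "-c", f"print('eval {i}')"] for i in range(6)}
    demo_jobs["job_fail"] = [sys.executable, "-c", "import sys; sys.exit(3)"]
    with tempfile.TemporaryDirectory() as tmp:
        run_all(demo_jobs, Path(tmp))
```

Threads are sufficient here because each worker only waits on a child process; the GPU work happens in the subprocesses.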
HDCharles / MODEL_TRANSFER_README.md
Last active May 5, 2026 04:21
LLM Compressor NVFP4 Regression Testing Suite

Model Transfer Workflow

Scripts to upload quantized models to Hugging Face (RedHatAI), download them elsewhere, and clean up.

Prerequisites

pip install "huggingface_hub[cli]"  # quotes keep shells like zsh from expanding the brackets
huggingface-cli login  # Log in with a token that has write access to the RedHatAI org
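The upload, download, and cleanup steps map onto `huggingface_hub` calls. A minimal sketch, assuming the login above has stored a token with write access; the function names are hypothetical, creating the repo as private is an assumption, and `api`/`downloader` are injectable so the logic can be exercised without network access.

```python
def upload_model(local_dir: str, repo_id: str, api=None, private: bool = True) -> None:
    """Create the Hub repo if needed (private by assumption), then push local_dir."""
    if api is None:
        from huggingface_hub import HfApi
        api = HfApi()  # picks up the token stored by `huggingface-cli login`
    api.create_repo(repo_id=repo_id, repo_type="model", private=private, exist_ok=True)
    api.upload_folder(folder_path=local_dir, repo_id=repo_id, repo_type="model")


def download_model(repo_id: str, local_dir: str, downloader=None) -> str:
    """Fetch a full snapshot of the repo into local_dir and return its path."""
    if downloader is None:
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    return downloader(repo_id=repo_id, local_dir=local_dir)


def delete_model(repo_id: str, api=None) -> None:
    """Remove the repo from the Hub once the model has been copied elsewhere."""
    if api is None:
        from huggingface_hub import HfApi
        api = HfApi()
    api.delete_repo(repo_id=repo_id, repo_type="model")
```

Usage would look like `upload_model("./my-nvfp4-model", "RedHatAI/<repo-name>")` on the quantizing machine and `download_model("RedHatAI/<repo-name>", "./my-nvfp4-model")` on the target; both names here are placeholders.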
HDCharles / gpucheck.sh
Created April 30, 2026 15:55
Always know who is using the GPU allocations
#!/bin/bash
# === Configuration ===
SSH_CONFIG="$HOME/.ssh/config"
STATE_FILE="$HOME/.gpucheck_state.json"
LAST_SYNC_FILE="$HOME/.gpucheck_last_sync"
HOSTS_URL="https://vigilant-happiness-o78jgy9.pages.github.io/hosts"
SSH_USER="HDCharles"
POLL_INTERVAL=300 # seconds between checks (5 minutes)
# Three EMA alpha values for different time horizons
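Both versions of the script define `SSH_CONFIG`, but the preview stops before showing how it is used. Assuming the script needs the host aliases defined there (for example, to match them against the fetched hosts list), a minimal sketch could collect the concrete `Host` aliases and skip wildcard patterns, which are not connectable hosts. The function name is hypothetical.

```python
from pathlib import Path


def ssh_host_aliases(config_text: str) -> list[str]:
    """Return concrete Host aliases from ssh_config text, skipping wildcard patterns."""
    aliases = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # blank lines and comment lines
            continue
        # Keywords are case-insensitive, and "Host=alias" is also valid syntax.
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) < 2 or parts[0].lower() != "host":
            continue
        for pattern in parts[1].split():
            if not any(ch in pattern for ch in "*?!"):
                aliases.append(pattern)
    return aliases


if __name__ == "__main__":
    config_path = Path.home() / ".ssh" / "config"
    if config_path.exists():
        print(ssh_host_aliases(config_path.read_text()))
```

`Match` blocks and `Include` directives are ignored in this sketch; a config that relies on them would need more handling.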
HDCharles / quantize.py
Created April 30, 2026 14:58
NVFP4 regression test suite: quantize.py and run_all_tests.sh
import argparse
import os
import time
import torch
from compressed_tensors.offload import dispatch_model
from compressed_tensors.quantization import preset_name_to_scheme
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
HDCharles / quantize.py
Created April 28, 2026 20:02
LLM Compressor regression testing scripts
import argparse
import time
import torch
from compressed_tensors.offload import dispatch_model
from compressed_tensors.quantization import preset_name_to_scheme
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
HDCharles / extract_log_summary.py
Last active April 25, 2026 02:59
LLM Compressor Testing Setup - GPTQ actorder regression tests
#!/usr/bin/env python3
"""Extract summary data from AWQ DDP regression test log files.
Parses the log output from run_all_tests.sh and produces a comparison table
showing pre-DDP vs post-DDP results across models, schemes, and benchmarks.
Usage:
python extract_log_summary.py regression_results.log
"""
HDCharles / bench_logprobs.py
Created April 23, 2026 20:49
vLLM full-vocab logprob timing benchmark (Llama-3-8B)
import os
import time

from vllm import LLM, SamplingParams

model_path = "/mnt/data/engine/HDCharles/hf_hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/8afb486c1db24fe5011ec46dfbe5b5dccdb575c2"
print("Loading model...")
t0 = time.time()
llm = LLM(model=model_path, gpu_memory_utilization=0.5, max_logprobs=-1)
tokenizer = llm.get_tokenizer()
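Timing a single call in the benchmark above is noisy, since the first call can also pay one-off warm-up costs. A library-agnostic sketch, assuming the vLLM generation call is wrapped in a zero-argument function, discards warm-up runs and reports the median; `time_call` and its parameters are hypothetical names, not part of vLLM.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], warmup: int = 1, repeats: int = 5) -> float:
    """Return the median wall-clock seconds of fn() over `repeats` timed runs."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as lazy initialization and caching
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; in the benchmark this would wrap something like
    # llm.generate(prompts, sampling_params).
    print(f"median: {time_call(lambda: sum(range(100_000))):.6f}s")
```

The median is less sensitive than the mean to an occasional slow run, which matters when comparing full-vocab logprob requests against ordinary generation.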
HDCharles / extract_log_summary.py
Created April 22, 2026 20:28
GPTQ actorder regression test suite for llm-compressor (FP8 block, W4A16, W8A16)
#!/usr/bin/env python3
"""Extract summary data from AWQ DDP regression test log files.
Parses the log output from run_all_tests.sh and produces a comparison table
showing pre-DDP vs post-DDP results across models, schemes, and benchmarks.
Usage:
python extract_log_summary.py regression_results.log
"""