Doug Smith dougbtv

dougbtv / gemma4-unified-howto.md
Created June 8, 2026 12:48
Gemma 4 Unified (12B): Build, Run, and Smoke Test Guide (nm-vllm-ent)

Build Info

| Field | Value |
|---|---|
| Model | google/gemma-4-12B-it (encoder-free multimodal dense 12B: text, image, audio) |
| Early Access Image | quay.io/vllm/rhaiis-early-access:gemma4-unified-qat |
| Build Run | 27027630533 |
| nm-vllm-ent branch | doug/kup-0day-nightly |
dougbtv / nemotron-3-ultra-usage.md
Last active June 4, 2026 18:13
Nemotron 3 Ultra 550B: Build, Run, and Smoke Test Guide (nm-vllm-ent v0.22.1)

Overview

Deploy and test nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4, a 550B-parameter MoE frontier model (55B active parameters) quantized to NVFP4, using nm-vllm-ent (based on upstream vLLM v0.22.1) on NVIDIA H200 GPUs.

Build Information

Field Value
dougbtv / gemma4-qat-howto.md
Last active June 5, 2026 18:01
GemmaQAT + Gemma 4 Unified: Build, Run, and Smoke Test Guide (nm-vllm-ent)

Build Info

| Field | Value |
|---|---|
| Models | gemma-4-E4B-it-qat-mobile-ct, gemma-4-E2B-it-qat-mobile-ct (QAT), google/gemma-4-12B-it (Unified) |
| Early Access Image | quay.io/vllm/rhaiis-early-access:gemma4-unified-qat |
| Build Run (Nightly V2) | 27027630533 |
| nm-vllm-ent branch | doug/kup-0day-nightly |
dougbtv / deepseek-v4-pro-guide.md
Created April 28, 2026 22:01
DeepSeek V4 Pro: Build, Run, and Smoke Test Guide (nm-vllm-ent v0.20.1rc0, 8x B200)

DeepSeek V4 Pro: Build, Run, and Smoke Test Guide

Overview

This document covers how to build, deploy, and test the deepseek-ai/DeepSeek-V4-Pro model (1.6T total params, 49B active, FP4+FP8 mixed precision) using nm-vllm-ent (based on upstream vLLM v0.20.1rc0) on NVIDIA B200 GPUs.

Build Information

Field Value
dougbtv / laguna-howto.md
Last active April 28, 2026 12:30
Poolside Laguna: build, run, and smoke test howto

Poolside Laguna Howto

Build Info

| Field | Value |
|---|---|
| Model | Poolside Laguna (model card TBD), poolside.ai |
| Image | quay.io/vllm/rhaiis-early-access:poolside-laguna |
dougbtv / ipv6-upgrade-path.md
Created April 22, 2026 11:45
Home network IPv6 upgrade path — GMAVT/AS12282, ASUS ZenWifi AX, Fedora desktop

Home Network IPv6 Upgrade Path

Current Setup (as of 2026-04-22)

  • Desktop: Fedora 42 (Linux 6.17.11), eno1 on 192.168.50.198/24
  • Router: ASUS ZenWifi AX (192.168.50.1)
  • ISP: GMAVT / Green Mountain Access (AS12282), PPPoE connection
  • Public IP: 69.54.3.214 (pppoe-3.214.gmavt.net)
  • Location: Starksboro, VT
dougbtv / gemma4-31b-usage.md
Created April 17, 2026 23:25
Gemma 4 31B: Build, Run, and Smoke Test Guide (nm-vllm-ent v0.19.1)

Gemma 4 31B: Build, Run, and Smoke Test Guide

Overview

This document covers how to build, deploy, and test the google/gemma-4-31B-it model using nm-vllm-ent (based on upstream vLLM v0.19.1) on NVIDIA A100 GPUs.

Build Information

Field Value
dougbtv / MISTRAL_4_SMALL_HOWTO.md
Created April 13, 2026 23:52
Mistral-Small-4-119B: build, run, and smoke test howto
dougbtv / OMNI_WITH_LTX2.md
Last active March 18, 2026 18:32
Running LTX-2 Video Generation with vLLM-Omni and ComfyUI - Conference Guide

Running LTX-2 Video Generation with vLLM-Omni and ComfyUI

This guide shows you how to run LTX-2 video generation (text-to-video and image-to-video) using vLLM-Omni as the inference backend and ComfyUI as the frontend.

Background

LTX-2 is a video generation model from Lightricks that supports both text-to-video (T2V) and image-to-video (I2V) generation with audio synthesis.

Resources:

dougbtv / README.md
Last active February 5, 2026 16:31
voxtral realtime in vLLM cheat sheet

RHAII Preview: Voxtral Realtime

This guide covers running and trying out the Red Hat AI Inference Server, powered by vLLM, to serve the Mistral Voxtral-Mini-4B-Realtime-2602 model.

You can find the Voxtral Mini model card at https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602

From the model card:

Voxtral Mini 4B Realtime 2602 is a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms. It supports 13 languages and outperforms existing open-source baselines across a range of tasks, making it ideal for applications like voice assistants and live subtitling.