| Field | Value |
|---|---|
| Model | google/gemma-4-12B-it — encoder-free multimodal dense 12B (text, image, audio) |
| Early Access Image | quay.io/vllm/rhaiis-early-access:gemma4-unified-qat |
| Build Run | 27027630533 |
| nm-vllm-ent branch | doug/kup-0day-nightly |
Deploy and test nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 — a 550B MoE frontier model (55B active parameters) in NVFP4 quantization — using nm-vllm-ent based on upstream vLLM v0.22.1 on NVIDIA H200 GPUs.
| Field | Value |
|---|
| Field | Value |
|---|---|
| Models | gemma-4-E4B-it-qat-mobile-ct, gemma-4-E2B-it-qat-mobile-ct (QAT), google/gemma-4-12B-it (Unified) |
| Early Access Image | quay.io/vllm/rhaiis-early-access:gemma4-unified-qat |
| Build Run (Nightly V2) | 27027630533 |
| nm-vllm-ent branch | doug/kup-0day-nightly |
This document covers how to build, deploy, and test the deepseek-ai/DeepSeek-V4-Pro model (1.6T total params, 49B active, FP4+FP8 mixed precision) using nm-vllm-ent (based on upstream vLLM v0.20.1rc0) on NVIDIA B200 GPUs.
| Field | Value |
|---|
Poolside Laguna: build, run, and smoke test howto
| Field | Value |
|---|---|
| Model | Poolside Laguna (model card TBD) — poolside.ai |
| Image | quay.io/vllm/rhaiis-early-access:poolside-laguna |
| Field | Value |
|---|---|
| Model | mistralai/Mistral-Small-4-119B-2603 |
| Image | quay.io/vllm/rhaiis-early-access:mistral-4-small |
| Build Run | 24369571413 |
| nm-cicd branch | doug/mistral-4-small |
This guide shows you how to run LTX-2 video generation (text-to-video and image-to-video) using vLLM-Omni as the inference backend and ComfyUI as the frontend.
LTX-2 is a powerful video generation model from Lightricks that supports both text-to-video (T2V) and image-to-video (I2V) generation with audio synthesis.
Resources:
- LTX-2 GitHub: https://github.com/Lightricks/LTX-2 - Python stack for inference and LoRA training, model links
This guide covers running and trying out the Red Hat AI Inference Server to serve Mistral Voxtral-Mini-4B-Realtime-2602 model, powered by vLLM.
You can find the Voxtral Mini model card @ https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
From the model card:
Voxtral Mini 4B Realtime 2602 is a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms. It supports 13 languages and outperforms existing open-source baselines across a range of tasks, making it ideal for applications like voice assistants and live subtitling.