This guide shows you how to run LTX-2 video generation (text-to-video and image-to-video) using vLLM-Omni as the inference backend and ComfyUI as the frontend.
LTX-2 is a powerful video generation model from Lightricks that supports both text-to-video (T2V) and image-to-video (I2V) generation with audio synthesis.
Resources:
- LTX-2 GitHub: https://github.com/Lightricks/LTX-2 - Python stack for inference and LoRA training, model links
- LTX-2.3 on Hugging Face: https://huggingface.co/Lightricks/LTX-2.3 - Latest model checkpoint
- Blog Post: vLLM-Omni with LTX-2 (for reference!)
Prerequisites:
- Docker or Podman
- NVIDIA GPU with sufficient VRAM (recommended: 24GB+)
- Hugging Face account and token (for downloading models)
- Basic familiarity with terminal/command line
```bash
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

The vLLM-Omni custom node enables ComfyUI to connect to vLLM-Omni inference servers.
```bash
# From ComfyUI directory
cd custom_nodes

# Clone vLLM-Omni repository
git clone https://github.com/vllm-project/vllm-omni.git

# Copy the ComfyUI custom node
cp -r vllm-omni/apps/ComfyUI-vLLM-Omni ./

# Clean up (optional)
rm -rf vllm-omni

# Verify installation
ls -la ComfyUI-vLLM-Omni
```

The custom node requires no additional dependencies beyond what ComfyUI already has installed.
```bash
# From ComfyUI root directory
cd ..
python main.py --cpu
```

ComfyUI will start on http://127.0.0.1:8188.

Note: We use the `--cpu` flag because vLLM will handle GPU inference. This keeps ComfyUI lightweight and prevents GPU memory conflicts.
vLLM-Omni supports both Text-to-Video (T2V) and Image-to-Video (I2V) modes for LTX-2; you choose the mode when you start the server.
```bash
# Set your Hugging Face token for model downloads
export HF_TOKEN="your_hf_token_here"

# Create directories for model cache and output
mkdir -p ~/model_cache
mkdir -p ~/video_output
```

Use this for generating videos from text prompts only.
```bash
podman run -d --name ltx2-t2v \
  --device nvidia.com/gpu=0 \
  --security-opt=label=disable \
  --userns=keep-id \
  --security-opt label=level:s0 \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e HF_TOKEN="${HF_TOKEN}" \
  -e HF_HOME=/hf/hub \
  -e HUGGINGFACE_HUB_CACHE=/hf/hub \
  -e TRANSFORMERS_CACHE=/hf/hub \
  --mount type=tmpfs,target=/workspace/vllm-omni/.triton \
  --mount type=tmpfs,target=/workspace/vllm-omni/.cache \
  -v ~/model_cache:/hf/hub \
  -v ~/video_output:/output \
  -p 8000:8000 \
  -w /workspace/vllm-omni \
  public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:4036cd547c7552fd9329a87f38b9d6c484f3f14b \
  vllm serve \
    Lightricks/LTX-2 \
    --omni \
    --port 8000
```

Use this for animating existing images (it should also support T2V as a fallback).
```bash
podman run -d --name ltx2-i2v \
  --device nvidia.com/gpu=0 \
  --security-opt=label=disable \
  --userns=keep-id \
  --security-opt label=level:s0 \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e HF_TOKEN="${HF_TOKEN}" \
  -e HF_HOME=/hf/hub \
  -e HUGGINGFACE_HUB_CACHE=/hf/hub \
  -e TRANSFORMERS_CACHE=/hf/hub \
  --mount type=tmpfs,target=/workspace/vllm-omni/.triton \
  --mount type=tmpfs,target=/workspace/vllm-omni/.cache \
  -v ~/model_cache:/hf/hub \
  -v ~/video_output:/output \
  -p 8000:8000 \
  -w /workspace/vllm-omni \
  public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:4036cd547c7552fd9329a87f38b9d6c484f3f14b \
  vllm serve \
    Lightricks/LTX-2 \
    --omni \
    --model-class-name LTX2ImageToVideoPipeline \
    --port 8000
```

Key difference: the I2V mode adds `--model-class-name LTX2ImageToVideoPipeline`, which enables image input support.
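The prerequisites allow Docker as well as Podman. A Docker sketch of the T2V command (my translation, not from the vLLM-Omni docs; it assumes the NVIDIA Container Toolkit is installed so `--gpus` works; for I2V, add `--model-class-name LTX2ImageToVideoPipeline` before `--port`):

```bash
docker run -d --name ltx2-t2v \
  --gpus '"device=0"' \
  -e HF_TOKEN="${HF_TOKEN}" \
  -e HF_HOME=/hf/hub \
  -e HUGGINGFACE_HUB_CACHE=/hf/hub \
  -e TRANSFORMERS_CACHE=/hf/hub \
  --tmpfs /workspace/vllm-omni/.triton \
  --tmpfs /workspace/vllm-omni/.cache \
  -v ~/model_cache:/hf/hub \
  -v ~/video_output:/output \
  -p 8000:8000 \
  -w /workspace/vllm-omni \
  public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:4036cd547c7552fd9329a87f38b9d6c484f3f14b \
  vllm serve Lightricks/LTX-2 --omni --port 8000
```

The SELinux options (`--security-opt`, `--userns=keep-id`) from the Podman command are Podman/SELinux specifics and are usually unnecessary under Docker.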
```bash
# View logs
podman logs -f ltx2-t2v  # or ltx2-i2v

# Check API health
curl http://localhost:8000/health

# List available models
curl http://localhost:8000/v1/models
```

The first run will take time as the model downloads (~50GB). The model will be cached in ~/model_cache for subsequent runs.
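If you script the startup, the health endpoint above can be polled until the server is ready. A minimal sketch (`wait_for_server` is a hypothetical helper, not part of vLLM-Omni; the probe command is a parameter so the loop itself is easy to exercise):

```bash
# Poll a probe command until it succeeds or we run out of attempts.
# Usage: wait_for_server '<probe command>' [tries] [delay_seconds]
wait_for_server() {
  probe="$1"
  tries="${2:-120}"
  delay="${3:-5}"
  i=0
  until eval "$probe" > /dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      echo "not ready"
      return 1
    fi
    sleep "$delay"
  done
  echo "ready"
}

# Real use, matching the curl check above (up to ~10 min covers a first-run download):
# wait_for_server 'curl -sf http://localhost:8000/health' 120 5
```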
1. Open ComfyUI at http://127.0.0.1:8188
2. In the Node Library (sidebar), find the vLLM-Omni category
3. Add these nodes:
   - Generate Video - main generation node
   - Diffusion Sampling Params (optional) - control quality/speed
   - Save Video - output the result
4. Configure the Generate Video node:
   - `url`: http://localhost:8000/v1
   - `model`: Lightricks/LTX-2
   - `prompt`: your text description (e.g., "A cat playing with yarn")
   - `width`: 768
   - `height`: 512
   - `fps`: 24
   - `num_frames`: 121 (5 seconds @ 24fps)
5. Connect nodes: Generate Video output → Save Video input
6. Click Queue Prompt to generate
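The `num_frames` setting maps clip duration to a frame count. A quick sketch of the relationship assumed here (frames = seconds × fps + 1, which matches the guide's 121 frames for 5 s at 24 fps; `frames_for` is just an illustrative helper, not part of any tool above):

```bash
# frames_for SECONDS [FPS] -> a num_frames value for the Generate Video node,
# assuming frames = seconds * fps + 1 (so 5 s @ 24 fps -> 121).
frames_for() {
  seconds="$1"
  fps="${2:-24}"
  echo $(( seconds * fps + 1 ))
}

frames_for 5       # 121, the default used above
frames_for 3 24    # a shorter clip
```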
Requirements: the vLLM server must be running in I2V mode (LTX2ImageToVideoPipeline).

1. Add these nodes:
   - Load Image - input your image
   - Generate Video - main generation node
   - Save Video - output the result
2. Connect nodes:
   - Load Image output → Generate Video `image` input
   - Generate Video output → Save Video input
3. Configure Generate Video:
   - Same settings as T2V, but now with an image input
   - The prompt describes the motion/animation (e.g., "gentle movement, cinematic")
4. Queue and generate
For fine-tuning generation quality:

1. Add a Diffusion Sampling Params node
2. Configure parameters:
   - `num_inference_steps`: 40-50 (higher = better quality, slower)
   - `guidance_scale`: 3.0-5.0 (how strongly to follow the prompt)
   - `seed`: set for reproducible results
3. Connect Diffusion Sampling Params → Generate Video `sampling_params` input
| Parameter | T2V | I2V |
|---|---|---|
| Width | 768 | 768 |
| Height | 512 | 512 |
| FPS | 24 | 24 |
| Num Frames | 121 (5s) | 81-121 (3.4-5s) |
| Guidance Scale | 4.0 | 2.0-3.0 |
| Inference Steps | 40-50 | 40-50 |
Problem: `PermissionError: [Errno 13] Permission denied: '/workspace/vllm-omni/.triton'`
Solution: The tmpfs mounts fix this; make sure they are included in your podman run command.

Problem: GPU out of memory
Solution:
- Reduce `num_frames` (fewer frames)
- Reduce resolution (512x512 instead of 768x512)
- Enable `vae_use_slicing` and `vae_use_tiling` in sampling params

Problem: Video generation is very slow
Solution:
- The first generation is always slower (model loading)
- Reduce `num_inference_steps` to 30-40
- Check GPU utilization with `nvidia-smi`

Problem: ComfyUI can't connect to vLLM
Solution:
- Verify vLLM is running: `podman ps`
- Check the port mapping: `curl http://localhost:8000/health`
- Ensure the URL in the ComfyUI node is correct: http://localhost:8000/v1
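The connectivity checks can be rolled into one script. A sketch (`diagnose` is a hypothetical helper; the container-list and health-probe commands are parameters so the logic can be exercised without a live server):

```bash
# diagnose [container-list-cmd] [health-probe-cmd]
# Defaults mirror the manual checks: podman ps, then curl the health endpoint.
diagnose() {
  ps_cmd="$1"
  health_cmd="$2"
  [ -n "$ps_cmd" ] || ps_cmd='podman ps --filter name=ltx2 --quiet'
  [ -n "$health_cmd" ] || health_cmd='curl -sf http://localhost:8000/health'
  if ! eval "$ps_cmd" 2>/dev/null | grep -q .; then
    echo "container not running"
    return 1
  fi
  if ! eval "$health_cmd" > /dev/null 2>&1; then
    echo "container up, API not responding"
    return 1
  fi
  echo "all checks passed"
}

# Real use: diagnose
# (then confirm the ComfyUI node URL is http://localhost:8000/v1)
```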
Resource requirements:
- VRAM: ~22-28GB for LTX-2 (depends on resolution/frames)
- RAM: 32GB+ recommended
- Storage: ~50GB for model cache
- Generation Time: 20-60 seconds per video (varies by GPU)
```bash
# Stop the vLLM container
podman stop ltx2-t2v  # or ltx2-i2v

# Remove the container
podman rm ltx2-t2v  # or ltx2-i2v

# Stop ComfyUI (Ctrl+C in the terminal)

# Optional: Clear model cache to free space
rm -rf ~/model_cache/*

# Optional: Clear generated videos
rm -rf ~/video_output/*
```

- vLLM-Omni Documentation: https://docs.vllm.ai/projects/vllm-omni/
- ComfyUI Documentation: https://docs.comfy.org/
- LTX-2 Model Card: https://huggingface.co/Lightricks/LTX-2.3
- LTX-2 GitHub (Training & Scripts): https://github.com/Lightricks/LTX-2
Example prompts for T2V:
- "A cinematic shot of ocean waves at golden hour"
- "Timelapse of flowers blooming in spring"
- "Aerial view flying over a mountain range"
I2V (describe motion, not content):
- "gentle swaying, natural movement"
- "slow zoom in, cinematic"
- "subtle animation, soft lighting changes"
Conference Note: This guide uses the latest vLLM-Omni CI image. For production use, consider using official releases from https://github.com/vllm-project/vllm-omni/releases.