This guide shows you how to run LTX-2 video generation (text-to-video and image-to-video) using vLLM-Omni as the inference backend and ComfyUI as the frontend.
LTX-2 is a powerful video generation model from Lightricks that supports both text-to-video (T2V) and image-to-video (I2V) generation with audio synthesis.
Resources:
- LTX-2 GitHub: https://github.com/Lightricks/LTX-2 - Python stack for inference and LoRA training, model links
- LTX-2.3 on Hugging Face: https://huggingface.co/Lightricks/LTX-2.3 - Latest model checkpoint
- Blog Post: vLLM-Omni with LTX-2 (for reference!)
Prerequisites:
- Docker or Podman
- NVIDIA GPU with sufficient VRAM (recommended: 24GB+)
- Hugging Face account and token (for downloading models)
- Basic familiarity with terminal/command line
```bash
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

The vLLM-Omni custom node enables ComfyUI to connect to vLLM-Omni inference servers.
```bash
# From ComfyUI directory
cd custom_nodes

# Clone vLLM-Omni repository
git clone https://github.com/vllm-project/vllm-omni.git

# Copy the ComfyUI custom node
cp -r vllm-omni/apps/ComfyUI-vLLM-Omni ./

# Clean up (optional)
rm -rf vllm-omni

# Verify installation
ls -la ComfyUI-vLLM-Omni
```

The custom node requires no additional dependencies beyond what ComfyUI already has installed.
```bash
# From ComfyUI root directory
cd ..
python main.py --cpu
```

ComfyUI will start on http://127.0.0.1:8188.

Note: We use the `--cpu` flag because vLLM will handle GPU inference. This keeps ComfyUI lightweight and prevents GPU memory conflicts.
vLLM-Omni supports both Text-to-Video (T2V) and Image-to-Video (I2V) modes for LTX-2; you choose the mode when you start the server.
```bash
# Set your Hugging Face token for model downloads
export HF_TOKEN="your_hf_token_here"

# Create directories for model cache and output
mkdir -p ~/model_cache
mkdir -p ~/video_output
```

Use this for generating videos from text prompts only.
```bash
podman run -d --name ltx2-t2v \
  --device nvidia.com/gpu=0 \
  --security-opt=label=disable \
  --userns=keep-id \
  --security-opt label=level:s0 \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e HF_TOKEN="${HF_TOKEN}" \
  -e HF_HOME=/hf/hub \
  -e HUGGINGFACE_HUB_CACHE=/hf/hub \
  -e TRANSFORMERS_CACHE=/hf/hub \
  --mount type=tmpfs,target=/workspace/vllm-omni/.triton \
  --mount type=tmpfs,target=/workspace/vllm-omni/.cache \
  -v ~/model_cache:/hf/hub \
  -v ~/video_output:/output \
  -p 8000:8000 \
  -w /workspace/vllm-omni \
  public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:4036cd547c7552fd9329a87f38b9d6c484f3f14b \
  vllm serve \
    Lightricks/LTX-2 \
    --omni \
    --port 8000
```

Use this for animating existing images (it should also support T2V as a fallback).
```bash
podman run -d --name ltx2-i2v \
  --device nvidia.com/gpu=0 \
  --security-opt=label=disable \
  --userns=keep-id \
  --security-opt label=level:s0 \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e HF_TOKEN="${HF_TOKEN}" \
  -e HF_HOME=/hf/hub \
  -e HUGGINGFACE_HUB_CACHE=/hf/hub \
  -e TRANSFORMERS_CACHE=/hf/hub \
  --mount type=tmpfs,target=/workspace/vllm-omni/.triton \
  --mount type=tmpfs,target=/workspace/vllm-omni/.cache \
  -v ~/model_cache:/hf/hub \
  -v ~/video_output:/output \
  -p 8000:8000 \
  -w /workspace/vllm-omni \
  public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:4036cd547c7552fd9329a87f38b9d6c484f3f14b \
  vllm serve \
    Lightricks/LTX-2 \
    --omni \
    --model-class-name LTX2ImageToVideoPipeline \
    --port 8000
```

Key difference: the I2V mode adds `--model-class-name LTX2ImageToVideoPipeline`, which enables image input support.
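The prerequisites allow Docker as well as Podman. A Docker sketch of the T2V command (my translation, not from the vLLM-Omni docs; it assumes the NVIDIA Container Toolkit is installed so `--gpus` works; for I2V, add `--model-class-name LTX2ImageToVideoPipeline` before `--port`):

```bash
docker run -d --name ltx2-t2v \
  --gpus '"device=0"' \
  -e HF_TOKEN="${HF_TOKEN}" \
  -e HF_HOME=/hf/hub \
  -e HUGGINGFACE_HUB_CACHE=/hf/hub \
  -e TRANSFORMERS_CACHE=/hf/hub \
  --tmpfs /workspace/vllm-omni/.triton \
  --tmpfs /workspace/vllm-omni/.cache \
  -v ~/model_cache:/hf/hub \
  -v ~/video_output:/output \
  -p 8000:8000 \
  -w /workspace/vllm-omni \
  public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:4036cd547c7552fd9329a87f38b9d6c484f3f14b \
  vllm serve Lightricks/LTX-2 --omni --port 8000
```

The SELinux options (`--security-opt`, `--userns=keep-id`) from the Podman command are Podman/SELinux specifics and are usually unnecessary under Docker.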
```bash
# View logs
podman logs -f ltx2-t2v  # or ltx2-i2v

# Check API health
curl http://localhost:8000/health

# List available models
curl http://localhost:8000/v1/models
```

The first run will take time as the model downloads (~50GB). The model will be cached in ~/model_cache for subsequent runs.
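If you script the startup, the health endpoint above can be polled until the server is ready. A minimal sketch (`wait_for_server` is a hypothetical helper, not part of vLLM-Omni; the probe command is a parameter so the loop itself is easy to exercise):

```bash
# Poll a probe command until it succeeds or we run out of attempts.
# Usage: wait_for_server '<probe command>' [tries] [delay_seconds]
wait_for_server() {
  probe="$1"
  tries="${2:-120}"
  delay="${3:-5}"
  i=0
  until eval "$probe" > /dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      echo "not ready"
      return 1
    fi
    sleep "$delay"
  done
  echo "ready"
}

# Real use, matching the curl check above (up to ~10 min covers a first-run download):
# wait_for_server 'curl -sf http://localhost:8000/health' 120 5
```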
1. Open ComfyUI at http://127.0.0.1:8188
2. In the Node Library (sidebar), find the vLLM-Omni category
3. Add these nodes:
   - Generate Video - main generation node
   - Diffusion Sampling Params (optional) - control quality/speed
   - Save Video - output the result
4. Configure the Generate Video node:
   - `url`: http://localhost:8000/v1
   - `model`: Lightricks/LTX-2
   - `prompt`: your text description (e.g., "A cat playing with yarn")
   - `width`: 768
   - `height`: 512
   - `fps`: 24
   - `num_frames`: 121 (5 seconds @ 24fps)
5. Connect nodes: Generate Video output → Save Video input
6. Click Queue Prompt to generate
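The `num_frames` setting maps clip duration to a frame count. A quick sketch of the relationship assumed here (frames = seconds × fps + 1, which matches the guide's 121 frames for 5 s at 24 fps; `frames_for` is just an illustrative helper, not part of any tool above):

```bash
# frames_for SECONDS [FPS] -> a num_frames value for the Generate Video node,
# assuming frames = seconds * fps + 1 (so 5 s @ 24 fps -> 121).
frames_for() {
  seconds="$1"
  fps="${2:-24}"
  echo $(( seconds * fps + 1 ))
}

frames_for 5       # 121, the default used above
frames_for 3 24    # a shorter clip
```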
Requirements: the vLLM server must be running in I2V mode (LTX2ImageToVideoPipeline).

1. Add these nodes:
   - Load Image - input your image
   - Generate Video - main generation node
   - Save Video - output the result
2. Connect nodes:
   - Load Image output → Generate Video `image` input
   - Generate Video output → Save Video input
3. Configure Generate Video:
   - Same settings as T2V, but now with an image input
   - The prompt describes the motion/animation (e.g., "gentle movement, cinematic")
4. Queue and generate
For fine-tuning generation quality:

1. Add a Diffusion Sampling Params node
2. Configure parameters:
   - `num_inference_steps`: 40-50 (higher = better quality, slower)
   - `guidance_scale`: 3.0-5.0 (how strongly to follow the prompt)
   - `seed`: set for reproducible results
3. Connect Diffusion Sampling Params → Generate Video `sampling_params` input
| Parameter | T2V | I2V |
|---|---|---|
| Width | 768 | 768 |
| Height | 512 | 512 |
| FPS | 24 | 24 |
| Num Frames | 121 (5s) | 81-121 (3.4-5s) |
| Guidance Scale | 4.0 | 2.0-3.0 |
| Inference Steps | 40-50 | 40-50 |
Problem: `PermissionError: [Errno 13] Permission denied: '/workspace/vllm-omni/.triton'`
Solution: The tmpfs mounts fix this; make sure they are included in your podman run command.

Problem: GPU out of memory
Solution:
- Reduce `num_frames` (fewer frames)
- Reduce resolution (512x512 instead of 768x512)
- Enable `vae_use_slicing` and `vae_use_tiling` in sampling params

Problem: Video generation is very slow
Solution:
- The first generation is always slower (model loading)
- Reduce `num_inference_steps` to 30-40
- Check GPU utilization with `nvidia-smi`

Problem: ComfyUI can't connect to vLLM
Solution:
- Verify vLLM is running: `podman ps`
- Check the port mapping: `curl http://localhost:8000/health`
- Ensure the URL in the ComfyUI node is correct: http://localhost:8000/v1
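The connectivity checks can be rolled into one script. A sketch (`diagnose` is a hypothetical helper; the container-list and health-probe commands are parameters so the logic can be exercised without a live server):

```bash
# diagnose [container-list-cmd] [health-probe-cmd]
# Defaults mirror the manual checks: podman ps, then curl the health endpoint.
diagnose() {
  ps_cmd="$1"
  health_cmd="$2"
  [ -n "$ps_cmd" ] || ps_cmd='podman ps --filter name=ltx2 --quiet'
  [ -n "$health_cmd" ] || health_cmd='curl -sf http://localhost:8000/health'
  if ! eval "$ps_cmd" 2>/dev/null | grep -q .; then
    echo "container not running"
    return 1
  fi
  if ! eval "$health_cmd" > /dev/null 2>&1; then
    echo "container up, API not responding"
    return 1
  fi
  echo "all checks passed"
}

# Real use: diagnose
# (then confirm the ComfyUI node URL is http://localhost:8000/v1)
```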
Resource requirements:
- VRAM: ~22-28GB for LTX-2 (depends on resolution/frames)
- RAM: 32GB+ recommended
- Storage: ~50GB for model cache
- Generation Time: 20-60 seconds per video (varies by GPU)
```bash
# Stop the vLLM container
podman stop ltx2-t2v  # or ltx2-i2v

# Remove the container
podman rm ltx2-t2v  # or ltx2-i2v

# Stop ComfyUI (Ctrl+C in the terminal)

# Optional: Clear model cache to free space
rm -rf ~/model_cache/*

# Optional: Clear generated videos
rm -rf ~/video_output/*
```

- vLLM-Omni Documentation: https://docs.vllm.ai/projects/vllm-omni/
- ComfyUI Documentation: https://docs.comfy.org/
- LTX-2 Model Card: https://huggingface.co/Lightricks/LTX-2.3
- LTX-2 GitHub (Training & Scripts): https://github.com/Lightricks/LTX-2
Example prompts for T2V:
- "A cinematic shot of ocean waves at golden hour"
- "Timelapse of flowers blooming in spring"
- "Aerial view flying over a mountain range"
I2V (describe motion, not content):
- "gentle swaying, natural movement"
- "slow zoom in, cinematic"
- "subtle animation, soft lighting changes"
Conference Note: This guide uses the latest vLLM-Omni CI image. For production use, consider using official releases from https://github.com/vllm-project/vllm-omni/releases.