This guide distills community knowledge and proven settings to help you get the fastest, most stable experience with ComfyUI on a laptop RTX 4090 (16GB VRAM).
- VRAM: 16GB is less than desktop 4090's 24GB—optimize for memory usage.
- Power: Laptops cap GPU power (often 150–175W), so thermal throttling and power management are critical.
- ComfyUI: Use the latest standalone Windows build and update regularly.
- Python: Use the version bundled with ComfyUI (3.10 or 3.11).
- PyTorch: Prefer 2.3.1 with CUDA 12.4 for best stability and speed (PyTorch 2.4.x+ can have bugs).
- xFormers: Disable unless you run out of VRAM; Ada (40-series) GPUs do better with PyTorch SDPA.
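To confirm what the standalone build is actually running, you can query the bundled interpreter directly. This is a quick sanity check, assuming the default portable layout with a `python_embeded` folder next to `ComfyUI` (run it from that top-level folder):

```bat
:: Print the bundled PyTorch version, its CUDA build, and the GPU it sees
python_embeded\python.exe -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"
```

If the reported GPU is not the RTX 4090, or no CUDA version is printed, fix the install before tuning any flags.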
Add these flags to your `run_nvidia_gpu.bat`:

```bat
python_embeded\python.exe -s ComfyUI\main.py ^
--cuda-malloc ^
--force-channels-last ^
--use-pytorch-cross-attention ^
--dont-upcast-attention ^
--cache-classic ^
--disable-xformers ^
--fast
pause
```
- For low VRAM: add `--lowvram --use-split-cross-attention` and reduce batch size to 1 (see the sketch below).
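A minimal low-VRAM launcher, built only from flags already mentioned in this guide, could look like this:

```bat
python_embeded\python.exe -s ComfyUI\main.py ^
--cuda-malloc ^
--lowvram ^
--use-split-cross-attention
pause
```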
Edit `comfy.settings.json`:

```json
{
  "use_fp16": true,
  "collect_gpu": true,
  "max_graph_op_batch_size": 64
}
```
- Power Plan: Set Windows to "Best Performance."
- NVIDIA Control Panel: Set "Preferred graphics processor" to "High-performance NVIDIA processor."
- CUDA - Sysmem Fallback Policy: Set to "Prefer No Sysmem Fallback" to prevent slow system RAM offloading.
- Monitor: If possible, connect your monitor to the iGPU to free up VRAM for ComfyUI.
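The power-plan step can also be applied from an elevated command prompt. This assumes the classic power-plan interface; on newer Windows 11 builds the equivalent control is the Best Performance power mode in Settings:

```bat
:: Switch to the High performance plan (built-in alias SCHEME_MIN) and confirm it
powercfg /setactive SCHEME_MIN
powercfg /getactivescheme
```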
| Model | Resolution | Batch | Steps | Expected Speed (it/s) |
|---|---|---|---|---|
| SD 1.5/Turbo | 512×512 | 8–12 | 20 | 18–25 |
| SDXL Base | 1024×1024 | 1–2 | 20 | 3–6 |
| FLUX Dev FP8 | 1024×1024 | 1 | 20 | 1.9–2.1 |
- Tip: If you run into memory errors, drop batch size first, then lower resolution.
- Torch.compile: Use the TorchCompileModel node for up to 40% speedup.
- Sage Attention 2: Launch with `--use-sage-attention` for video workflows (PyTorch 2.5+); see the install sketch after this list.
- FP8 Models: Use the `--fast` flag with FP8 checkpoints for up to 40% faster generation (may affect image quality).
- TeaCache/Nunchaku: Custom nodes for advanced users; can increase speed further.
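Sage Attention is not bundled with the standalone build. As an assumption, it is usually installed into the embedded Python as the `sageattention` package; check the SageAttention project README for the wheel matching your PyTorch and CUDA versions before relying on this:

```bat
:: Install SageAttention into the embedded Python (package name assumed; verify against the SageAttention README)
python_embeded\python.exe -m pip install sageattention
:: Launch with Sage Attention enabled
python_embeded\python.exe -s ComfyUI\main.py --use-sage-attention
```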
- Cap GPU to 75–80% TDP (MSI Afterburner/NVIDIA Control Panel) to prevent thermal throttling.
- Keep laptop fans on max during long runs.
- Ensure adequate AC power (use the full-wattage brick, not USB-C PD).
- Reboot if performance drops suddenly to clear potential driver or scheduling bugs.
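As an alternative to Afterburner, you can inspect the power limit from the command line; on many laptop GPUs the limit is locked by the vendor, so treat the cap command as a sketch:

```bat
:: Show current, default, and maximum power limits
nvidia-smi -q -d POWER
:: Optionally cap the limit (watts; example value, requires admin, often unsupported on laptop GPUs)
nvidia-smi -pl 120
```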
- Check power mode: Must be "Best Performance."
- Check GPU usage: Ensure ComfyUI is using the RTX 4090, not the iGPU.
- Check VRAM usage: Stay below 15GB for stability.
- Update drivers: Use latest Studio drivers (555.xx or newer).
- Monitor temps: Keep GPU below 90°C to avoid throttling.
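Most of the checklist above can be read off a single `nvidia-smi` query while a workflow is running:

```bat
:: Driver version, GPU name, temperature, VRAM in use, and power draw in one line
nvidia-smi --query-gpu=driver_version,name,temperature.gpu,memory.used,memory.total,power.draw --format=csv
```

If reported memory use is pushing past ~15GB or the temperature sits above 90°C, that matches the failure modes listed above.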
- SDXL 1024×1024, batch 1: 3–6 it/s.
- SD 1.5 512×512, batch 8–12: 18–25 it/s.
- FLUX Dev FP8 1024×1024, batch 1: ~2 it/s.
If iterations are taking a minute or more, check for power, VRAM, or driver issues.
- Community troubleshooting and benchmarks from Reddit, GitHub, and user guides.
- Real-world user reports confirm these settings and speeds for laptop RTX 4090s.