Practical Guide: Maximizing ComfyUI Performance on Laptop RTX 4090 GPUs

This guide distills community knowledge and proven settings to help you get the fastest, most stable experience with ComfyUI on a laptop RTX 4090 (16GB VRAM).

1. Understand Laptop Limitations

  • VRAM: 16GB is less than the desktop 4090's 24GB, so optimize for memory usage.
  • Power: Laptops cap GPU power (often 150–175W), so thermal throttling and power management are critical.

2. Essential Software & Dependencies

  • ComfyUI: Use the latest standalone Windows build and update it regularly.
  • Python: Use the version bundled with ComfyUI (3.10 or 3.11).
  • PyTorch: Prefer 2.3.1 with CUDA 12.1 for best stability and speed (official 2.3.1 wheels target cu118/cu121; PyTorch 2.4.x+ can have bugs).
  • xFormers: Disable unless you run out of VRAM; Ada (40-series) GPUs do better with PyTorch SDPA.
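To pin PyTorch inside the standalone build's embedded Python, a command along these lines should work (run from the ComfyUI portable folder; the cu121 index URL is PyTorch's official wheel index for CUDA 12.1, and torchvision 0.18.1 is the release paired with torch 2.3.1):

```
python_embeded\python.exe -m pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121
```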

3. Launch Flags for Best Performance

Add these flags to your run_nvidia_gpu.bat launch script:

python_embeded\python.exe -s ComfyUI\main.py ^
  --cuda-malloc ^
  --force-channels-last ^
  --use-pytorch-cross-attention ^
  --dont-upcast-attention ^
  --cache-classic ^
  --disable-xformers ^
  --fast
pause
  • For low VRAM: add --lowvram --use-split-cross-attention and reduce batch size to 1.

4. Critical ComfyUI Settings

Edit comfy.settings.json:

{
  "use_fp16": true,
  "collect_gpu": true,
  "max_graph_op_batch_size": 64
}

5. System & NVIDIA Control Panel Tweaks

  • Power Plan: Set Windows to "Best Performance."
  • NVIDIA Control Panel: Set "Preferred graphics processor" to "High-performance NVIDIA processor."
  • CUDA - Sysmem Fallback Policy: Set to "Prefer No Sysmem Fallback" to prevent slow system RAM offloading.
  • Monitor: If possible, connect your monitor to the iGPU to free up VRAM for ComfyUI.

6. Batch Size & Resolution Guidelines

| Model | Resolution | Batch | Steps | Expected speed (it/s) |
|---|---|---|---|---|
| SD 1.5/Turbo | 512×512 | 8–12 | 20 | 18–25 |
| SDXL Base | 1024×1024 | 1–2 | 20 | 3–6 |
| FLUX Dev FP8 | 1024×1024 | 1 | 20 | 1.9–2.1 |
  • Tip: If you run into memory errors, drop batch size first, then lower resolution.
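The it/s figures above convert directly to wall-clock time per batch (one iteration is one sampler step), which is a quick way to sanity-check your own runs against the table:

```python
def batch_seconds(steps: int, it_per_s: float) -> float:
    """Wall-clock seconds for one batch: one iteration = one sampler step."""
    return steps / it_per_s

# Using figures from the table above:
print(batch_seconds(20, 5.0))   # SDXL at ~5 it/s -> 4.0 s per batch
print(batch_seconds(20, 2.0))   # FLUX Dev FP8 -> 10.0 s per batch
print(batch_seconds(20, 20.0))  # SD 1.5, batch 8-12 -> 1.0 s per batch
```

If your measured time per batch is several times these numbers, work through the troubleshooting checklist in section 9.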

7. Advanced Accelerators (Optional)

  • Torch.compile: Use the CompileModel node for up to 40% speedup.
  • Sage Attention 2: Use with --use-sage-attention for video workflows (PyTorch 2.5+).
  • FP8 Models: Use --fast flag with FP8 checkpoints for up to 40% faster generation (may affect image quality).
  • TeaCache/Nunchaku: Custom nodes for advanced users; can increase speed further.

8. Thermal & Power Management

  • Cap GPU to 75–80% TDP (MSI Afterburner/NVIDIA Control Panel) to prevent thermal throttling.
  • Keep laptop fans on max during long runs.
  • Ensure adequate AC power (use the full-wattage brick, not USB-C PD).
  • Reboot if performance drops suddenly to clear potential driver or scheduling bugs.
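The 75–80% figure translates to concrete wattages like so (175 W is taken as the full TGP, per the laptop limits in section 1):

```python
def cap_watts(tgp: float, fraction: float) -> float:
    """Power limit in watts for a given total graphics power and cap fraction."""
    return tgp * fraction

print(cap_watts(175, 0.75))  # -> roughly 131 W
print(cap_watts(175, 0.80))  # -> roughly 140 W
```

Set the resulting wattage (or the equivalent percentage slider) in MSI Afterburner.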

9. Troubleshooting Checklist

  • Check power mode: Must be "Best Performance."
  • Check GPU usage: Ensure ComfyUI is using the RTX, not iGPU.
  • Check VRAM usage: Stay below 15GB for stability.
  • Update drivers: Use latest Studio drivers (555.xx or newer).
  • Monitor temps: Keep GPU below 90°C to avoid throttling.
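Several of these checks can be made at once with nvidia-smi, which ships with the NVIDIA driver; a query along these lines reports the active GPU, VRAM use, temperature, and power draw:

```
nvidia-smi --query-gpu=name,memory.used,temperature.gpu,power.draw --format=csv
```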

10. Expected Benchmarks

  • SDXL 1024×1024, batch 1: 3–6 it/s.
  • SD 1.5 512×512, batch 8–12: 18–25 it/s.
  • FLUX Dev FP8 1024×1024, batch 1: ~2 it/s.

If you are seeing times of 1+ minute per iteration, check for power, VRAM, or driver issues.

References

  • Community troubleshooting and benchmarks from Reddit, GitHub, and user guides.
  • Real-world user reports confirm these settings and speeds for laptop RTX 4090s.