Practical Guide: Maximizing ComfyUI Performance on Laptop RTX 4090 GPUs

This guide distills community knowledge and proven settings to help you get the fastest, most stable experience with ComfyUI on a laptop RTX 4090 (16GB VRAM).

1. Understand Laptop Limitations

  • VRAM: 16GB is less than the desktop 4090's 24GB, so optimize for memory usage.
  • Power: Laptops cap GPU power (often 150–175W), so thermal throttling and power management are critical.

2. Essential Software & Dependencies

  • ComfyUI: Use the latest standalone Windows build and update it regularly.
  • Python: Use the version bundled with ComfyUI (3.10 or 3.11).
  • PyTorch: Prefer 2.3.1 with CUDA 12.1 for best stability and speed (official 2.3.1 wheels target cu118/cu121; PyTorch 2.4.x+ can have bugs).
  • xFormers: Disable unless you run out of VRAM; Ada (40-series) GPUs do better with PyTorch SDPA.
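To pin PyTorch inside the standalone build's embedded Python, a command along these lines should work (run from the ComfyUI portable folder; the cu121 index URL is PyTorch's official wheel index for CUDA 12.1, and torchvision 0.18.1 is the release paired with torch 2.3.1):

```
python_embeded\python.exe -m pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121
```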

3. Launch Flags for Best Performance

Add these flags to your run_nvidia_gpu.bat launch script:

python_embeded\python.exe -s ComfyUI\main.py ^
  --cuda-malloc ^
  --force-channels-last ^
  --use-pytorch-cross-attention ^
  --dont-upcast-attention ^
  --cache-classic ^
  --disable-xformers ^
  --fast
pause
  • For low VRAM: add --lowvram --use-split-cross-attention and reduce batch size to 1.

4. Critical ComfyUI Settings

Edit comfy.settings.json:

{
  "use_fp16": true,
  "collect_gpu": true,
  "max_graph_op_batch_size": 64
}

5. System & NVIDIA Control Panel Tweaks

  • Power Plan: Set Windows to "Best Performance."
  • NVIDIA Control Panel: Set "Preferred graphics processor" to "High-performance NVIDIA processor."
  • CUDA - Sysmem Fallback Policy: Set to "Prefer No Sysmem Fallback" to prevent slow system RAM offloading.
  • Monitor: If possible, connect your monitor to the iGPU to free up VRAM for ComfyUI.

6. Batch Size & Resolution Guidelines

| Model | Resolution | Batch | Steps | Expected speed (it/s) |
|---|---|---|---|---|
| SD 1.5/Turbo | 512×512 | 8–12 | 20 | 18–25 |
| SDXL Base | 1024×1024 | 1–2 | 20 | 3–6 |
| FLUX Dev FP8 | 1024×1024 | 1 | 20 | 1.9–2.1 |
  • Tip: If you run into memory errors, drop batch size first, then lower resolution.
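The it/s figures above convert directly to wall-clock time per batch (one iteration is one sampler step), which is a quick way to sanity-check your own runs against the table:

```python
def batch_seconds(steps: int, it_per_s: float) -> float:
    """Wall-clock seconds for one batch: one iteration = one sampler step."""
    return steps / it_per_s

# Using figures from the table above:
print(batch_seconds(20, 5.0))   # SDXL at ~5 it/s -> 4.0 s per batch
print(batch_seconds(20, 2.0))   # FLUX Dev FP8 -> 10.0 s per batch
print(batch_seconds(20, 20.0))  # SD 1.5, batch 8-12 -> 1.0 s per batch
```

If your measured time per batch is several times these numbers, work through the troubleshooting checklist in section 9.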

7. Advanced Accelerators (Optional)

  • Torch.compile: Use the CompileModel node for up to 40% speedup.
  • Sage Attention 2: Use with --use-sage-attention for video workflows (PyTorch 2.5+).
  • FP8 Models: Use --fast flag with FP8 checkpoints for up to 40% faster generation (may affect image quality).
  • TeaCache/Nunchaku: Custom nodes for advanced users; can increase speed further.

8. Thermal & Power Management

  • Cap GPU to 75–80% TDP (MSI Afterburner/NVIDIA Control Panel) to prevent thermal throttling.
  • Keep laptop fans on max during long runs.
  • Ensure adequate AC power (use the full-wattage brick, not USB-C PD).
  • Reboot if performance drops suddenly to clear potential driver or scheduling bugs.
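The 75–80% figure translates to concrete wattages like so (175 W is taken as the full TGP, per the laptop limits in section 1):

```python
def cap_watts(tgp: float, fraction: float) -> float:
    """Power limit in watts for a given total graphics power and cap fraction."""
    return tgp * fraction

print(cap_watts(175, 0.75))  # -> roughly 131 W
print(cap_watts(175, 0.80))  # -> roughly 140 W
```

Set the resulting wattage (or the equivalent percentage slider) in MSI Afterburner.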

9. Troubleshooting Checklist

  • Check power mode: Must be "Best Performance."
  • Check GPU usage: Ensure ComfyUI is using the RTX, not iGPU.
  • Check VRAM usage: Stay below 15GB for stability.
  • Update drivers: Use latest Studio drivers (555.xx or newer).
  • Monitor temps: Keep GPU below 90°C to avoid throttling.
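Several of these checks can be made at once with nvidia-smi, which ships with the NVIDIA driver; a query along these lines reports the active GPU, VRAM use, temperature, and power draw:

```
nvidia-smi --query-gpu=name,memory.used,temperature.gpu,power.draw --format=csv
```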

10. Expected Benchmarks

  • SDXL 1024×1024, batch 1: 3–6 it/s.
  • SD 1.5 512×512, batch 8–12: 18–25 it/s.
  • FLUX Dev FP8 1024×1024, batch 1: ~2 it/s.

If you are seeing times of 1+ minute per iteration, check for power, VRAM, or driver issues.

References

  • Community troubleshooting and benchmarks from Reddit, GitHub, and user guides.
  • Real-world user reports confirm these settings and speeds for laptop RTX 4090s.