Flux: https://blackforestlabs.ai/announcing-black-forest-labs/
- Run Flux with quantization by AmericanPresidentJimmyCarter.
- Run Flux on a 24GB 4090 by decoupling the different stages of the pipeline
- Running with torchao
- Running with NF4
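The "decoupling the different stages" trick in the second resource boils down to holding only one stage's weights in memory at a time. Here is a toy pure-Python sketch of that pattern (the `Stage` class and stage names are placeholders, not the actual diffusers API; in a real pipeline each step loads a model to the GPU and frees it with `del` plus `torch.cuda.empty_cache()`):

```python
import gc

class Stage:
    """Placeholder for one pipeline stage (text encoder, transformer, VAE)."""
    def __init__(self, name):
        self.name = name

    def __call__(self, x):
        # Stand-in for the stage's forward pass.
        return f"{x}->{self.name}"

def run_decoupled(prompt, stage_names):
    """Load each stage, run it, then free it before loading the next,
    so only one stage's weights occupy memory at a time."""
    x = prompt
    for name in stage_names:
        stage = Stage(name)   # in practice: load this stage's weights to GPU
        x = stage(x)
        del stage             # drop the stage before loading the next one
        gc.collect()          # in practice: also torch.cuda.empty_cache()
    return x

result = run_decoupled("prompt", ["text_encoder", "transformer", "vae"])
print(result)  # prompt->text_encoder->transformer->vae
```

The peak memory then tracks the largest single stage rather than the sum of all of them, which is what makes a 24GB 4090 (or less) workable.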
The first resource even lets you run the pipeline in under 16 GB of GPU VRAM.
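For intuition on what NF4 buys you: each weight is stored as a 4-bit index into 16 fixed "NormalFloat" levels (the quantiles of a standard normal, normalized to [-1, 1]) plus a per-block absmax scale. This is a toy pure-Python sketch of that idea, not how bitsandbytes actually implements it (the real kernels quantize per block and run on GPU); the level values are the published NF4 constants:

```python
# The 16 NF4 levels from the QLoRA paper (as used by bitsandbytes).
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_nf4(weights):
    """Map each weight to the index of its nearest NF4 level after
    absmax scaling. Returns (4-bit indices, scale)."""
    scale = max(abs(w) for w in weights) or 1.0
    idx = [
        min(range(16), key=lambda i: abs(w / scale - NF4_LEVELS[i]))
        for w in weights
    ]
    return idx, scale

def dequantize_nf4(idx, scale):
    """Recover approximate weights from indices and the stored scale."""
    return [NF4_LEVELS[i] * scale for i in idx]

weights = [0.31, -0.8, 0.02, 1.5, -0.05, 0.0]
idx, scale = quantize_nf4(weights)
restored = dequantize_nf4(idx, scale)
```

Each weight costs 4 bits instead of 16/32, which is roughly where the 4x memory saving on the transformer weights comes from.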
Additionally, you can use the various memory optimization tricks that are discussed in the following resources:
- 🧨 Diffusers welcomes Stable Diffusion 3
- Memory-efficient Diffusion Transformers with Quanto and Diffusers
Enjoy 🤗
Any tips on how to speed up inference time?