Flux: https://blackforestlabs.ai/announcing-black-forest-labs/
- Run Flux with quantization by AmericanPresidentJimmyCarter
- Run Flux on a 24GB 4090 by decoupling the different stages of the pipeline
- Running with torchao
- Running with NF4
The first resource even lets you run the pipeline in under 16 GB of GPU VRAM.
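For instance, the NF4 route can be sketched with diffusers and bitsandbytes roughly like this (a minimal sketch, assuming a diffusers version that ships `BitsAndBytesConfig`; the prompt and generation settings are illustrative):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# Quantize the transformer (the largest component) to 4-bit NF4.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
# Keep components on CPU and move each to the GPU only while it runs.
pipe.enable_model_cpu_offload()

image = pipe(
    "a photo of a forest at dawn",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_nf4.png")
```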
Additionally, you can use the various memory optimization tricks that are discussed in the following resources:
- 🧨 Diffusers welcomes Stable Diffusion 3
- Memory-efficient Diffusion Transformers with Quanto and Diffusers
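A quick sketch of a couple of those tricks applied to Flux (all standard diffusers calls, though how much each helps depends on your hardware):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

# Offload whole submodules to CPU; each is moved to the GPU only when used.
pipe.enable_model_cpu_offload()
# Decode latents in slices/tiles to lower the VAE's peak memory.
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
```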
Enjoy 🤗
Testing with 16 GB of graphics memory, image generation spends a significant amount of time reloading the model on every run. Could you specify how much GPU memory would be needed for the models (flux-dev-fp8 and t5xxl-fp8) to stay resident in GPU memory permanently?
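For context, the usage pattern I'm after is roughly the following (just a sketch; the bf16 diffusers checkpoint here stands in for my fp8 files):

```python
import torch
from diffusers import FluxPipeline

# Load once at startup and keep every component on the GPU,
# so subsequent calls skip the reload entirely.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Reuse the resident pipeline for every prompt.
for prompt in ["a lighthouse in fog", "a red bicycle"]:
    image = pipe(prompt, num_inference_steps=28).images[0]
```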