ckpt_id | gemm_backend | batch_size | fuse | compile | compile_vae | quantization | sparsify | model_memory (GB) | inference_memory (GB) | time (s) |
---|---|---|---|---|---|---|---|---|---|---|
black-forest-labs/FLUX.1-dev | N/A | 1 | False | False | False | fp8dqrow | False | 20.367 | 31.817 | 9.578 |
black-forest-labs/FLUX.1-dev | triton,aten | 1 | False | True | False | fp8dqrow | False | 20.367 | 31.817 | 4.165 |
black-forest-labs/FLUX.1-dev | cutlass,aten | 16 | False | True | False | fp8dqrow | False | 20.367 | 50.471 | 60.734 |
black-forest-labs/FLUX.1-dev | cutlass_no_fast_accum,aten | 16 | False | True | False | fp8dq | False | 20.356 | 50.46 | 65.387 |
black-forest-labs/FLUX.1-dev | triton_no_fast_accum,aten | 16 | False | True | False | fp8dq | False | 20.356 | 50.46 | 65.346 |
black-forest-labs/FLUX.1-dev | cutlass_fast_accum_2221 | 16 | False | True | True | fp8dq | False | 20.356 | 52.524 | 63.084 |
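
The rows above are easier to compare once they are reduced to a speedup and a per-image figure. The snippet below is a minimal sketch of that arithmetic, not part of the benchmark itself. It assumes that `time` is reported in seconds and that it covers the whole batch, so dividing by `batch_size` gives a per-image time; the variable names are illustrative only.

```python
# Values copied from the FLUX.1-dev table above.
# Assumption: "time" is in seconds and measures the whole batch.
eager_s = 9.578      # batch_size=1, compile=False, fp8dqrow
compiled_s = 4.165   # batch_size=1, compile=True, gemm backend "triton,aten", fp8dqrow

# Speedup from compilation at batch size 1.
speedup = eager_s / compiled_s

# batch_size=16 row with gemm backend "cutlass,aten" and fp8dqrow.
batch_time_s = 60.734
batch_size = 16
per_image_s = batch_time_s / batch_size

print(f"compile speedup at batch size 1: {speedup:.2f}x")
print(f"per-image time at batch size 16: {per_image_s:.3f} s")
```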

model_type | compile | fuse_qkv | quantize_vae | quantization | model_memory (GB) | inference_memory (GB) | time (s) |
---|---|---|---|---|---|---|---|
5B | True | False | False | fp8dqrow | 10.178 | 24.987 | 91.316 |

model_type | compile | fuse_qkv | quantize_vae | quantization | model_memory (GB) | inference_memory (GB) | time (s) |
---|---|---|---|---|---|---|---|
5B | True | False | False | fp8dqrow | 10.178 | 24.987 | 91.7 |