Here's a simple script intended to aid in tuning hyperparameters for quantization. Given two or more safetensors files, the script compares the quantization applied to each layer and attempts to extrapolate how sensitive each layer is to quantization. This, naturally, assumes that at least some of the input quants were created by someone with deep knowledge of the model. A small amount of logic exists to down-weight models that bluntly quantize everything to a single dtype and to amplify the influence of models that employ a broader range of dtypes. It's theoretically possible for a carefully tuned fp8 model that preserves selected layers at fp32 to outperform an fp16 model that uniformly downcasts every layer. The hope is that this logic captures that design pattern.
Example usage:
strategize_quants.py z_image_turbo_bf16.safetensors z-image-turbo_fp8_scaled_e4m3fn_KJ.safetensors z_image_turbo_nvfp4.safetensors
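
The weighting idea described above can be illustrated with a rough sketch. This is not the script's actual logic, only one way such a comparison might work. The names BITS, parse_header, read_dtypes, model_weight and layer_sensitivity are hypothetical, and so is the weighting rule (number of distinct bit widths a model uses). The only thing taken as given is the safetensors file layout: an 8-byte little-endian header length followed by a JSON header that maps each tensor name to its dtype.

```python
import json
import struct
from collections import defaultdict

# Assumed bit widths for safetensors dtype tags; tags not listed here are skipped.
BITS = {"F64": 64, "F32": 32, "F16": 16, "BF16": 16,
        "F8_E4M3": 8, "F8_E5M2": 8, "I8": 8, "U8": 8}


def parse_header(blob):
    """Map tensor name -> dtype tag, given the leading bytes of a safetensors file."""
    (size,) = struct.unpack("<Q", blob[:8])      # header length, little-endian u64
    header = json.loads(blob[8:8 + size])        # JSON header follows the length
    return {name: info["dtype"] for name, info in header.items()
            if name != "__metadata__"}


def read_dtypes(path):
    """Read only the header of a safetensors file, never the tensor data."""
    with open(path, "rb") as f:
        size_bytes = f.read(8)
        (size,) = struct.unpack("<Q", size_bytes)
        return parse_header(size_bytes + f.read(size))


def model_weight(dtypes):
    """Hypothetical trust weight: how many distinct bit widths the model uses.

    A model that quantizes everything uniformly scores 1; a mixed-precision
    model scores higher and therefore has more say in the result.
    """
    return len({BITS[d] for d in dtypes.values() if d in BITS})


def layer_sensitivity(models):
    """Weighted mean bit width per layer; a higher value suggests a more sensitive layer."""
    total = defaultdict(float)
    weight_sum = defaultdict(float)
    for dtypes in models:
        w = model_weight(dtypes)
        for name, dtype in dtypes.items():
            if dtype in BITS:
                total[name] += w * BITS[dtype]
                weight_sum[name] += w
    return {name: total[name] / weight_sum[name] for name in total}
```

With files on disk, this could be driven by something like layer_sensitivity([read_dtypes(p) for p in paths]). Note that the sketch takes the stored dtype tag at face value: packed formats (for example 4-bit weights stored in 8-bit containers) and separate scale tensors would need extra handling that is not shown here.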
=========================================================================

