model scheme technique task main PR change
--------------------------------------------------------------------------------------------------------------------
Meta-Llama-3-8B-Instruct NVFP4 awq_rtn gsm8k_platinum 71.46% 69.89% -2.20%
Qwen2.5-3B-Instruct NVFP4 awq_rtn gsm8k_platinum 23.33% 29.61% +26.92% volatile (Qwen2.5-3B seems to be all over the place for gsm8k_plat)
Qwen3-30B-A3B NVFP4 awq_rtn gsm8k_platinum 92.31% 91.32% -1.07%
Meta-Llama-3-8B-Instruct NVFP4 awq_rtn mmlu 63.55%** 63.46% -0.14%
Qwen2.5-3B-Instruct NVFP4 awq_rtn mmlu 63.12%** 63.34%** +0.35%
Qwen3-30B-A3B NVFP4 awq_rtn mmlu 78.46% 78.39% -0.09%
Meta-Llama-3-8B-Instruct NVFP4 awq_rtn wikitext 20.26 18.86 +6.91%
Qwen2.5-3B-Instruct NVFP4 awq_rtn wikitext 12.84** 12.83** +0.08%
Qwen3-30B-A3B NVFP4 awq_rtn wikitext 12.05** 12.03** +0.17%
# awq_rtn: slightly positive overall
Meta-Llama-3-8B-Instruct NVFP4 gptq gsm8k_platinum 72.37%** 72.54%** +0.23%
Qwen2.5-3B-Instruct NVFP4 gptq gsm8k_platinum 54.67%** 49.88% -8.76% volatile (Qwen2.5-3B seems to be all over the place for gsm8k_plat)
Qwen3-30B-A3B NVFP4 gptq gsm8k_platinum 92.31% 92.22% -0.10%
Meta-Llama-3-8B-Instruct NVFP4 gptq mmlu 62.97% 62.88% -0.14% fine
Qwen2.5-3B-Instruct NVFP4 gptq mmlu 62.93% 63.15% +0.35%
Qwen3-30B-A3B NVFP4 gptq mmlu 78.61%** 78.37% -0.31%
Meta-Llama-3-8B-Instruct NVFP4 gptq wikitext 1021.43 1442.89 -41.26% unexplained: I reran this multiple times with and without the chat template; other evals look fine for this model
Qwen2.5-3B-Instruct NVFP4 gptq wikitext 12.97 13.01 -0.31%
Qwen3-30B-A3B NVFP4 gptq wikitext 12.10 12.14 -0.33%
# gptq: slightly negative overall, seems ok
Meta-Llama-3-8B-Instruct NVFP4 imatrix gsm8k_platinum 0.00% 71.38% N/A (bug on main, fixed in PR)
Qwen2.5-3B-Instruct NVFP4 imatrix gsm8k_platinum 0.00% 52.44%** N/A (bug on main, fixed in PR)
Qwen3-30B-A3B NVFP4 imatrix gsm8k_platinum 89.33% 92.97%** +4.07% (bug on main, fixed in PR)
Meta-Llama-3-8B-Instruct NVFP4 imatrix mmlu 25.37% 63.67%** +150.97% (bug on main, fixed in PR)
Qwen2.5-3B-Instruct NVFP4 imatrix mmlu 24.25% 62.26% +156.74% (bug on main, fixed in PR)
Qwen3-30B-A3B NVFP4 imatrix mmlu 77.44% 78.36% +1.19% (bug on main, fixed in PR)
Meta-Llama-3-8B-Instruct NVFP4 imatrix wikitext 86574.63 18.47 +99.98% (bug on main, fixed in PR)
Qwen2.5-3B-Instruct NVFP4 imatrix wikitext 610.98 13.28 +97.83% (bug on main, fixed in PR)
Qwen3-30B-A3B NVFP4 imatrix wikitext 13.20 12.03** +8.86% (bug on main, fixed in PR)
# bugfix; I don't understand why the Qwen3-30B-A3B results look reasonable on main, but either way they are fine on the PR
Meta-Llama-3-8B-Instruct NVFP4 rtn_mse gsm8k_platinum 71.55% 72.04% +0.68%
Qwen2.5-3B-Instruct NVFP4 rtn_mse gsm8k_platinum 29.45% 48.30% +64.01% volatile (Qwen2.5-3B seems to be all over the place for gsm8k_plat)
Qwen3-30B-A3B NVFP4 rtn_mse gsm8k_platinum 93.71%** 92.22% -1.59%
Meta-Llama-3-8B-Instruct NVFP4 rtn_mse mmlu 63.20% 63.38% +0.28%
Qwen2.5-3B-Instruct NVFP4 rtn_mse mmlu 62.42% 62.59% +0.27%
Qwen3-30B-A3B NVFP4 rtn_mse mmlu 78.36% 78.61%** +0.32%
Meta-Llama-3-8B-Instruct NVFP4 rtn_mse wikitext 20.52 18.09 +11.84%
Qwen2.5-3B-Instruct NVFP4 rtn_mse wikitext 13.16 13.25 -0.68%
Qwen3-30B-A3B NVFP4 rtn_mse wikitext 12.17 12.23 -0.49%
# looks good; this suggests that MSE ignoring the global scale works well, and that ignoring quant for the static activation global_scale is fine
Meta-Llama-3-8B-Instruct NVFP4 rtn gsm8k_platinum 70.64% 70.80% +0.23%
Qwen2.5-3B-Instruct NVFP4 rtn gsm8k_platinum 52.85% 50.12% -5.17% volatile (Qwen2.5-3B seems to be all over the place for gsm8k_plat)
Qwen3-30B-A3B NVFP4 rtn gsm8k_platinum 92.22% 92.64% +0.46%
Meta-Llama-3-8B-Instruct NVFP4 rtn mmlu 63.25% 63.38% +0.21%
Qwen2.5-3B-Instruct NVFP4 rtn mmlu 62.78% 62.59% -0.30%
Qwen3-30B-A3B NVFP4 rtn mmlu 78.26% 78.46% +0.26%
Meta-Llama-3-8B-Instruct NVFP4 rtn wikitext 19.53** 17.82** +8.76%
Qwen2.5-3B-Instruct NVFP4 rtn wikitext 13.27 13.27 +0.00%
Qwen3-30B-A3B NVFP4 rtn wikitext 12.12 12.16 -0.33%
# ignoring quant for the static activation global_scale seems fine, potentially helpful
# note: some of the FP8 runs for Qwen2.5-3B-Instruct failed because someone else started using the GPUs I was on
# also, gsm8k_platinum for Qwen2.5-3B seems to be very volatile
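The rtn_mse note refers to an MSE-based scale search that ignores the global scale. NVFP4 stores FP4 (E2M1) values in 16-element blocks, each with an FP8 (E4M3) scale, plus a per-tensor global scale. As a rough illustration only, here is a hypothetical sketch (not the PR's implementation; the function names and the shrink-factor grid are assumptions) of picking a per-block scale by minimizing reconstruction MSE while keeping the block scale in full precision, i.e. without modeling FP8 rounding of the scale or the global scale at all:

```python
import numpy as np

# Non-negative magnitudes representable by FP4 E2M1, the NVFP4 element format.
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_MAX = 6.0


def fake_quant_fp4(block: np.ndarray, scale: float) -> np.ndarray:
    """Round-to-nearest a 1-D block onto the FP4 E2M1 grid with one scale, then dequantize."""
    scaled = np.abs(block) / scale
    # Nearest grid point per element; magnitudes above 6.0 end up at 6.0.
    idx = np.abs(scaled[:, None] - FP4_E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(block) * FP4_E2M1_GRID[idx] * scale


def mse_block_scale(block: np.ndarray, shrink_factors=np.linspace(0.5, 1.0, 51)) -> float:
    """Hypothetical MSE search: try shrunken absmax scales, keep the one with lowest MSE.

    The scale stays a full-precision float: no FP8 rounding and no global scale.
    """
    absmax = float(np.abs(block).max())
    if absmax == 0.0:
        return 1.0  # any scale reproduces an all-zero block exactly
    best_scale, best_err = absmax / FP4_MAX, np.inf
    for f in shrink_factors:
        scale = f * absmax / FP4_MAX
        err = float(np.mean((block - fake_quant_fp4(block, scale)) ** 2))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.standard_normal(16)  # one 16-element block
    absmax_scale = float(np.abs(block).max()) / FP4_MAX
    for name, s in [("absmax", absmax_scale), ("mse", mse_block_scale(block))]:
        err = np.mean((block - fake_quant_fp4(block, s)) ** 2)
        print(f"{name}: scale={s:.4f} mse={err:.6f}")
```

Because the candidate grid includes the plain absmax scale (shrink factor 1.0), the MSE search can never do worse than absmax RTN on a given block; it can only trade some clipping of the largest value for finer resolution of the rest.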
model scheme technique task main PR change
--------------------------------------------------------------------------------------------------------------------
Meta-Llama-3-8B-Instruct FP8 awq_rtn gsm8k_platinum 76.59% 78.00% +1.84%
Qwen3-30B-A3B FP8 awq_rtn gsm8k_platinum 93.22% 93.13% -0.10%
Meta-Llama-3-8B-Instruct FP8 awq_rtn mmlu 65.62% 65.80% +0.27%
Qwen3-30B-A3B FP8 awq_rtn mmlu 79.60% 79.63% +0.04%
Meta-Llama-3-8B-Instruct FP8 awq_rtn wikitext 19.42 18.72 +3.60%
Qwen3-30B-A3B FP8 awq_rtn wikitext 11.70 11.64 +0.51%
# small improvement
Meta-Llama-3-8B-Instruct W4A16 awq_rtn gsm8k_platinum 72.62% 71.96% -0.91% biggest real drop
Qwen2.5-3B-Instruct W4A16 awq_rtn gsm8k_platinum 21.84% 20.68% -5.31% volatile
Qwen3-30B-A3B W4A16 awq_rtn gsm8k_platinum 91.73% 91.89% +0.17%
Meta-Llama-3-8B-Instruct W4A16 awq_rtn mmlu 64.07% 64.28% +0.33%
Qwen2.5-3B-Instruct W4A16 awq_rtn mmlu 64.49% 64.35% -0.22%
Qwen3-30B-A3B W4A16 awq_rtn mmlu 78.79% 78.65% -0.18%
Meta-Llama-3-8B-Instruct W4A16 awq_rtn wikitext 11.50 11.50 +0.00%
Qwen2.5-3B-Instruct W4A16 awq_rtn wikitext 12.78 12.78 +0.00%
Qwen3-30B-A3B W4A16 awq_rtn wikitext 11.98 11.99 -0.08%
# small drop
Meta-Llama-3-8B-Instruct FP8 gptq gsm8k_platinum 77.25% 77.92% +0.87%
Qwen3-30B-A3B FP8 gptq gsm8k_platinum 93.30% 92.64% -0.71%
Meta-Llama-3-8B-Instruct FP8 gptq mmlu 66.12% 65.82% -0.45%
Qwen3-30B-A3B FP8 gptq mmlu 79.42% 79.36% -0.08%
Meta-Llama-3-8B-Instruct FP8 gptq wikitext 18.82 18.91 -0.48%
Qwen3-30B-A3B FP8 gptq wikitext 11.64 11.61 +0.26%
# steady
Meta-Llama-3-8B-Instruct W4A16 gptq gsm8k_platinum 74.03% 74.11% +0.11%
Qwen2.5-3B-Instruct W4A16 gptq gsm8k_platinum 35.81% 47.56% +32.81% volatile
Qwen3-30B-A3B W4A16 gptq gsm8k_platinum 90.90% 92.14% +1.36%
Meta-Llama-3-8B-Instruct W4A16 gptq mmlu 64.75% 64.55% -0.31%
Qwen2.5-3B-Instruct W4A16 gptq mmlu 64.06% 64.28% +0.34%
Qwen3-30B-A3B W4A16 gptq mmlu 78.88% 79.06% +0.23%
Meta-Llama-3-8B-Instruct W4A16 gptq wikitext 11.39 11.49 -0.88%
Qwen2.5-3B-Instruct W4A16 gptq wikitext 12.55 12.54 +0.08%
Qwen3-30B-A3B W4A16 gptq wikitext 11.87 11.92 -0.42%
# steady, aside from the volatile Qwen2.5-3B gsm8k_platinum result
Meta-Llama-3-8B-Instruct FP8 imatrix gsm8k_platinum 76.26% 75.77% -0.64%
Qwen2.5-3B-Instruct FP8 imatrix gsm8k_platinum 19.69% 19.93% +1.22% volatile
Qwen3-30B-A3B FP8 imatrix gsm8k_platinum 93.47% 93.47% +0.00%
Meta-Llama-3-8B-Instruct FP8 imatrix mmlu 65.81% 65.74% -0.11%
Qwen2.5-3B-Instruct FP8 imatrix mmlu 65.97% 65.90% -0.11%
Qwen3-30B-A3B FP8 imatrix mmlu 79.45% 79.51% +0.08%
Meta-Llama-3-8B-Instruct FP8 imatrix wikitext 19.69 18.99 +3.56%
Qwen2.5-3B-Instruct FP8 imatrix wikitext 11.80 11.80 +0.00%
Qwen3-30B-A3B FP8 imatrix wikitext 11.63 11.59 +0.34%
# small improvement
Meta-Llama-3-8B-Instruct W4A16 imatrix gsm8k_platinum 72.54% 72.29% -0.34%
Qwen2.5-3B-Instruct W4A16 imatrix gsm8k_platinum 43.09% 42.85% -0.56% volatile
Qwen3-30B-A3B W4A16 imatrix gsm8k_platinum 90.65% 91.48% +0.92%
Meta-Llama-3-8B-Instruct W4A16 imatrix mmlu 64.66% 64.67% +0.02%
Qwen2.5-3B-Instruct W4A16 imatrix mmlu 64.00% 64.01% +0.02%
Qwen3-30B-A3B W4A16 imatrix mmlu 78.47% 78.48% +0.01%
Meta-Llama-3-8B-Instruct W4A16 imatrix wikitext 11.45 11.45 +0.00%
Qwen2.5-3B-Instruct W4A16 imatrix wikitext 12.66 12.66 +0.00%
Qwen3-30B-A3B W4A16 imatrix wikitext 12.08 12.08 +0.00%
# steady
Meta-Llama-3-8B-Instruct FP8 rtn_mse gsm8k_platinum 78.08% 78.66% +0.74%
Qwen3-30B-A3B FP8 rtn_mse gsm8k_platinum 93.63% 93.55% -0.09%
Meta-Llama-3-8B-Instruct FP8 rtn_mse mmlu 65.59% 65.80% +0.32%
Qwen3-30B-A3B FP8 rtn_mse mmlu 79.52% 79.55% +0.04%
Meta-Llama-3-8B-Instruct FP8 rtn_mse wikitext 19.17 18.72 +2.35%
Qwen2.5-3B-Instruct FP8 rtn_mse wikitext 11.78 11.77 +0.08%
Qwen3-30B-A3B FP8 rtn_mse wikitext 11.60 11.59 +0.09%
# small improvement
Meta-Llama-3-8B-Instruct W4A16 rtn_mse gsm8k_platinum 68.40% 67.99% -0.60%
Qwen2.5-3B-Instruct W4A16 rtn_mse gsm8k_platinum 55.00% 52.69% -4.20% volatile
Qwen3-30B-A3B W4A16 rtn_mse gsm8k_platinum 90.90% 90.32% -0.64%
Meta-Llama-3-8B-Instruct W4A16 rtn_mse mmlu 63.20% 63.22% +0.03%
Qwen2.5-3B-Instruct W4A16 rtn_mse mmlu 62.53% 62.56% +0.05%
Qwen3-30B-A3B W4A16 rtn_mse mmlu 78.14% 78.10% -0.05%
Meta-Llama-3-8B-Instruct W4A16 rtn_mse wikitext 11.63 11.63 +0.00%
Qwen2.5-3B-Instruct W4A16 rtn_mse wikitext 14.13 14.13 +0.00%
Qwen3-30B-A3B W4A16 rtn_mse wikitext 12.24 12.23 +0.08%
# steady
Meta-Llama-3-8B-Instruct FP8 rtn gsm8k_platinum 77.42% 77.50% +0.10%
Qwen3-30B-A3B FP8 rtn gsm8k_platinum 92.64% 92.22% -0.45%
Meta-Llama-3-8B-Instruct FP8 rtn mmlu 65.80% 65.93% +0.20%
Qwen3-30B-A3B FP8 rtn mmlu 79.48% 79.35% -0.16%
Meta-Llama-3-8B-Instruct FP8 rtn wikitext 18.99 18.38 +3.21%
Qwen3-30B-A3B FP8 rtn wikitext 11.58 11.59 -0.09%
# small improvement
Meta-Llama-3-8B-Instruct W4A16 rtn gsm8k_platinum 69.89% 70.89% +1.43%
Qwen2.5-3B-Instruct W4A16 rtn gsm8k_platinum 46.98% 46.48% -1.06% volatile
Qwen3-30B-A3B W4A16 rtn gsm8k_platinum 90.57% 91.07% +0.55%
Meta-Llama-3-8B-Instruct W4A16 rtn mmlu 63.45% 63.55% +0.16%
Qwen2.5-3B-Instruct W4A16 rtn mmlu 58.08% 58.08% +0.00%
Qwen3-30B-A3B W4A16 rtn mmlu 78.43% 78.35% -0.10%
Meta-Llama-3-8B-Instruct W4A16 rtn wikitext 11.67 11.67 +0.00%
Qwen2.5-3B-Instruct W4A16 rtn wikitext 16.25 16.26 -0.06%
Qwen3-30B-A3B W4A16 rtn wikitext 11.99 11.99 +0.00%
# steady
model scheme technique task main PR change
--------------------------------------------------------------------------------------------------------------------
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 awq_rtn gsm8k_platinum 73.53%** 71.55% -2.69%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 awq_rtn mmlu 63.46% 63.21% -0.39%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 awq_rtn wikitext 20.10 18.92 +5.87%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 gptq gsm8k_platinum 71.88% 70.22% -2.31%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 gptq mmlu 63.60%** 63.99%** +0.61%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 gptq wikitext 18.36** 18.20 +0.87%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 imatrix gsm8k_platinum 0.00% 71.96%** N/A
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 imatrix mmlu 23.79% 63.50% +166.92%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 imatrix wikitext 74398.66 18.40 +99.98%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 rtn gsm8k_platinum 71.38% 70.97% -0.57%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 rtn mmlu 62.87% 63.09% +0.35%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 rtn wikitext 18.96 17.85** +5.85%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 rtn_mse gsm8k_platinum 71.05% 70.89% -0.23%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 rtn_mse mmlu 63.32% 63.12% -0.32%
Meta-Llama-3-8B-Instruct-DDP2 NVFP4 rtn_mse wikitext 21.24 18.10 +14.78%
# seems ok; a little volatile, but this indicates everything is working more or less as expected
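The change column is not defined in the tables, but the reported values are consistent with the relative change from main to PR, with the sign flipped for wikitext (perplexity, lower is better) so that a positive number always means the PR is better. A minimal sketch of that inferred formula (the function name is hypothetical):

```python
import math


def relative_change(main: float, pr: float, lower_is_better: bool = False) -> float:
    """Relative change in % from main to PR, signed so that positive means PR is better.

    Accuracy tasks (gsm8k_platinum, mmlu): (PR - main) / main.
    Perplexity (wikitext, lower is better): (main - PR) / main.
    """
    if main == 0.0:
        return math.nan  # undefined; the tables report these rows as N/A
    diff = (main - pr) if lower_is_better else (pr - main)
    return 100.0 * diff / main


if __name__ == "__main__":
    # Meta-Llama-3-8B-Instruct NVFP4 awq_rtn gsm8k_platinum: prints -2.20%
    print(f"{relative_change(71.46, 69.89):+.2f}%")
    # Meta-Llama-3-8B-Instruct NVFP4 awq_rtn wikitext: prints +6.91%
    print(f"{relative_change(20.26, 18.86, lower_is_better=True):+.2f}%")
```

This convention also explains why the imatrix wikitext fixes cap out near +99.98%: a relative reduction in perplexity can never exceed 100%, whereas accuracy gains such as +150.97% on mmlu are unbounded.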