"All things leave behind them the Obscurity... and go forward to embrace the Brightness..." — Dao De Jing #42
- Q: Who provides the best GGUFs now?
- A: They're all pretty good.
Skip down if you just want graphs and numbers comparing various Qwen3-30B-A3B GGUF quants.
It's been well over a year since TheBloke uploaded his last quant to huggingface. The LLM landscape has changed markedly since then, with many new models released monthly, new inference engines targeting specific hardware optimizations, and ongoing evolution of quantization algorithms. Our community continues to grow and diversify at an amazing rate.
Fortunately, many folks and organizations have kindly stepped up to keep the quants cooking so we can all find an LLM sized just right to fit on our home rigs. Among them, bartowski and unsloth (Daniel and Michael's start-up company) have become the new "household names" for providing a variety of GGUF quantizations for popular model releases and even all those wild creative fine-tunes! (There are many more, including team mradermacher, and too many to list everyone, sorry!)
Until recently, most GGUF-style quant recipes were "static", meaning that
all the tensors and layers were quantized the same way (e.g. Q8_0) or with
consistent patterns defined in llama.cpp's code. So all quants of a given size were
mostly the same regardless of who cooked them and uploaded them to huggingface.
Things began to change over a year ago with major advancements
like importance matrix quantizations by ikawrakow in llama.cpp PR#4861
as well as new quant types (like the perennial favorite IQ4_XS)
which have become the mainstay for users of llama.cpp, ollama, koboldcpp,
lmstudio, etc. The entire GGUF ecosystem owes a big thanks not just to
ggerganov but also to ikawrakow (as well as the many other contributors).
Very recently, unsloth introduced a few changes to their quantization methodology: they combine different imatrix calibration texts and context lengths, and quantize some tensors/layers to different sizes than the regular llama.cpp code does (they had a public fork with their branch, but have to update and re-push it due to upstream changes). They have named this change to the standard methodology "Unsloth Dynamic 2.0 GGUFs" as part of their start-up company's marketing strategy.
Around the same time bartowski has been experimenting with different
imatrix calibration texts and opened a PR to llama.cpp modifying the
default tensor/layer quantization recipes. I myself began experimenting
with custom "dynamic" quantization recipes using ikawrakow's latest SOTA
quants like iq4_k, which to date only work on his ik_llama.cpp
fork.
While this is great news for all GGUF enjoyers, the friendly competition and additional options have led to some confusion and I dare say some "tribalism". (If part of your identity as a person depends on downloading quants from only one source, I suggest you google: "Nan Yar?").
So how can you, dear reader, decide which is the best quant of a given model for you to download? unsloth already did a great blog post discussing their own benchmarks and metrics. Open a tab to check out u/AaronFeng47's many other benchmarks. And finally, this post contains even more metrics and benchmarks. The best answer I have is "Nullius in verba" (Latin for "take nobody's word for it"), even my word!
Unfortunately, this means there is no one-size-fits-all rule: "X" is not always better than "Y", and if you want to min-max-optimize your LLM for your specific use case on your specific hardware, you probably will have to experiment and think critically. If you don't care too much, then pick any of the biggest quants that fit on your rig for the desired context length and you'll be fine because: they're all pretty good.
And with that, let's dive into the Qwen3-30B-A3B benchmarks below!
Shout out to Wendell and the Level1Techs crew, the L1T Forums, and the L1T YouTube Channel! BIG thanks for providing BIG hardware expertise and access to run these experiments and make great quants available to the community!!!
👈 Qwen3-30B-A3B Benchmark Suite Graphs
Note <think> mode was disabled for these tests to speed up benchmarking.
👈 Qwen3-30B-A3B Perplexity and KLD Graphs
Using the BF16 as the baseline for KLD stats. Also note that the perplexity was lowest ("best") for models other than the BF16, which is not typically the case unless there was possibly some QAT going on. As such, the chart is relative to the lowest perplexity score: PPL/min(PPL)-1 plus a small eps for scaling.
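As a rough sketch of the normalization described above, the snippet below computes PPL/min(PPL)-1 plus eps for a few wiki.test.raw perplexities copied from the raw data table in this post. The `rel_ppl` helper name and the eps value of 0.001 are assumptions for illustration only; the eps actually used for the charts is not stated.

```shell
# Hypothetical helper: reads "<name> <ppl>" lines on stdin and prints
# "<name> <PPL/min(PPL) - 1 + eps>". eps defaults to 0.001 (an assumed value).
rel_ppl() {
  awk -v eps="${1:-0.001}" '
    {
      name[NR] = $1
      ppl[NR] = $2 + 0                      # force a numeric value
      if (NR == 1 || ppl[NR] < min) min = ppl[NR]
    }
    END {
      for (i = 1; i <= NR; i++) printf "%s %.4f\n", name[i], ppl[i] / min - 1 + eps
    }
  '
}

# wiki.test.raw PPL values taken from the raw data table in this post.
rel_ppl <<'EOF'
Qwen/Qwen3-30B-A3B-BF16 9.0703
ubergarm/Qwen3-30B-A3B-Q8_0 9.0740
ubergarm/Qwen3-30B-A3B-IQ4_KS 8.9862
EOF
```

The quant with the lowest perplexity lands at exactly eps, so every bar stays visible on a log-scaled axis.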
[Graphs not preserved: wiki.test.raw, ubergarm-kld-test-corpus.txt, and two untitled charts; lower is "better" in each.]
👈 Qwen3-235B-A22B Perplexity and KLD Graphs
Not as many data points here but just for comparison. Keep in mind the Q8_0 was the baseline for KLD stats given I couldn't easily run the full BF16.
[Graphs not preserved: wiki.test.raw, ubergarm-kld-test-corpus.txt, and two untitled charts; lower is "better" in each.]
👈 Qwen3-30B-A3B Speed llama-sweep-bench Graphs
[Graphs not preserved: one llama-sweep-bench speed chart for llama.cpp and one for ik_llama.cpp.]
NOTE: Keep in mind ik's fork is faster than mainline llama.cpp for many architectures and configurations, especially CPU-only, hybrid CPU+GPU, and DeepSeek MLA cases.
👈 Perplexity, KLD, and imatrix Methodology
PPL and KLD testing done with ik_llama.cpp@9ba36270.
I adjust ngl and threads for larger 235B models.
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-perplexity \
    -m "$model" \
    --ctx-size 512 \
    --ubatch-size 512 \
    -f wiki.test.raw \
    -fa \
    -ngl 99 \
    --seed 1337 \
    --threads 1
As with the perplexity runs, I adjust ngl and threads for the larger 235B models.
For 235B I had to use the Q8_0 as the baseline, given this rig can't easily run the full 400+GiB BF16.
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-perplexity \
    -m "$model" \
    --kl-divergence-base /mnt/raid/models/ubergarm/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-BF16-ubergarm-kld-test-corpus-base.dat \
    --kl-divergence \
    -f ubergarm-kld-test-corpus.txt \
    -fa \
    -ngl 99 \
    --seed 1337 \
    --threads 1
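For readers unfamiliar with the metric, here is a minimal sketch of the quantity behind the KLD statistics: the Kullback-Leibler divergence between the baseline model's next-token distribution P and the quantized model's distribution Q at a single token position. The `kld` helper and the toy three-token distributions are purely illustrative and are not taken from llama-perplexity itself, which works over the full vocabulary at every position and then reports summary statistics such as the mean, median, and percentiles shown in the raw data table.

```shell
# Hypothetical helper: stdin has two lines, P (baseline) then Q (quantized),
# each a whitespace-separated probability distribution over the same tokens.
# Prints KL(P || Q) = sum_i p_i * ln(p_i / q_i), in nats.
kld() {
  awk '
    NR == 1 { n = NF; for (i = 1; i <= n; i++) p[i] = $i }
    NR == 2 { for (i = 1; i <= n; i++) if (p[i] > 0) sum += p[i] * log(p[i] / $i) }
    END     { printf "%.6f\n", sum }
  '
}

# Toy example: the quantized model shifts a little probability between tokens.
printf '0.70 0.20 0.10\n0.65 0.25 0.10\n' | kld
```

A value of zero means the two distributions are identical, which is why lower is "better" for these stats.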
This is how I make my imatrix using ik_llama.cpp, which additionally prints out cosine similarity data to inform possible custom quant strategies. I haven't seen how exactly unsloth makes their new recipe.
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-imatrix \
    --verbosity 1 \
    --layer-similarity \
    -m /mnt/raid/models/Qwen/Qwen3-30B-A3B/Qwen3-30B-A3B-BF16-00001-of-00002.gguf \
    -f calibration_data_v5_rc.txt \
    -o /mnt/raid/models/ubergarm/Qwen3-30B-A3B-GGUF/imatrix-Qwen3-30B-A3B.dat \
    --ctx-size 512 \
    -ngl 36 \
    --threads 16
======================== sorted layer importances
  0: Layer   0, <cos_sim> = 0.32154
  1: Layer  47, <cos_sim> = 0.38473
  2: Layer   1, <cos_sim> = 0.736987
  3: Layer  28, <cos_sim> = 0.845492
  4: Layer   2, <cos_sim> = 0.847391
  5: Layer  29, <cos_sim> = 0.859291
  6: Layer   7, <cos_sim> = 0.861405
  7: Layer   3, <cos_sim> = 0.878313
  8: Layer   8, <cos_sim> = 0.893971
  9: Layer   6, <cos_sim> = 0.900308
 10: Layer  42, <cos_sim> = 0.911525
 11: Layer   5, <cos_sim> = 0.912156
 12: Layer  17, <cos_sim> = 0.913169
 13: Layer   4, <cos_sim> = 0.914095
 14: Layer  13, <cos_sim> = 0.92175
 15: Layer  46, <cos_sim> = 0.925283
 16: Layer  19, <cos_sim> = 0.926845
 17: Layer  18, <cos_sim> = 0.927019
 18: Layer  45, <cos_sim> = 0.928896
 19: Layer  40, <cos_sim> = 0.934481
 20: Layer  31, <cos_sim> = 0.934585
 21: Layer  14, <cos_sim> = 0.936932
 22: Layer  16, <cos_sim> = 0.940338
 23: Layer  25, <cos_sim> = 0.940477
 24: Layer  10, <cos_sim> = 0.942312
 25: Layer  38, <cos_sim> = 0.943166
 26: Layer   9, <cos_sim> = 0.943843
 27: Layer  11, <cos_sim> = 0.944233
 28: Layer  37, <cos_sim> = 0.944325
 29: Layer  20, <cos_sim> = 0.94612
 30: Layer  22, <cos_sim> = 0.946449
 31: Layer  41, <cos_sim> = 0.946775
 32: Layer  39, <cos_sim> = 0.947228
 33: Layer  44, <cos_sim> = 0.947687
 34: Layer  30, <cos_sim> = 0.947942
 35: Layer  23, <cos_sim> = 0.949102
 36: Layer  12, <cos_sim> = 0.951618
 37: Layer  21, <cos_sim> = 0.951701
 38: Layer  24, <cos_sim> = 0.952261
 39: Layer  43, <cos_sim> = 0.953357
 40: Layer  27, <cos_sim> = 0.953528
 41: Layer  26, <cos_sim> = 0.95575
 42: Layer  32, <cos_sim> = 0.956024
 43: Layer  15, <cos_sim> = 0.956915
 44: Layer  35, <cos_sim> = 0.959861
 45: Layer  36, <cos_sim> = 0.960591
 46: Layer  34, <cos_sim> = 0.961539
 47: Layer  33, <cos_sim> = 0.968161
======================== sorted attention importances
  0: Layer   0, <cos_sim> = 0.353019
  1: Layer  45, <cos_sim> = 0.638476
  2: Layer   1, <cos_sim> = 0.674894
  3: Layer  29, <cos_sim> = 0.686547
  4: Layer  17, <cos_sim> = 0.708034
  5: Layer   3, <cos_sim> = 0.718456
  6: Layer  21, <cos_sim> = 0.72082
  7: Layer  44, <cos_sim> = 0.732611
  8: Layer  22, <cos_sim> = 0.738435
  9: Layer  18, <cos_sim> = 0.742531
 10: Layer  42, <cos_sim> = 0.745018
 11: Layer   8, <cos_sim> = 0.746792
 12: Layer  24, <cos_sim> = 0.750162
 13: Layer  23, <cos_sim> = 0.750384
 14: Layer   9, <cos_sim> = 0.754324
 15: Layer  46, <cos_sim> = 0.758528
 16: Layer  33, <cos_sim> = 0.76019
 17: Layer  47, <cos_sim> = 0.760449
 18: Layer  27, <cos_sim> = 0.760966
 19: Layer   4, <cos_sim> = 0.761774
 20: Layer   2, <cos_sim> = 0.762337
 21: Layer   6, <cos_sim> = 0.763453
 22: Layer  34, <cos_sim> = 0.765167
 23: Layer  30, <cos_sim> = 0.768629
 24: Layer  25, <cos_sim> = 0.768819
 25: Layer  26, <cos_sim> = 0.769841
 26: Layer  20, <cos_sim> = 0.77039
 27: Layer  10, <cos_sim> = 0.772251
 28: Layer  41, <cos_sim> = 0.773975
 29: Layer  35, <cos_sim> = 0.774599
 30: Layer  43, <cos_sim> = 0.775401
 31: Layer  11, <cos_sim> = 0.776914
 32: Layer  28, <cos_sim> = 0.778543
 33: Layer  19, <cos_sim> = 0.781975
 34: Layer  36, <cos_sim> = 0.78645
 35: Layer  32, <cos_sim> = 0.790626
 36: Layer  15, <cos_sim> = 0.795375
 37: Layer  12, <cos_sim> = 0.797279
 38: Layer  16, <cos_sim> = 0.797483
 39: Layer  14, <cos_sim> = 0.797921
 40: Layer   7, <cos_sim> = 0.80098
 41: Layer   5, <cos_sim> = 0.802361
 42: Layer  37, <cos_sim> = 0.805299
 43: Layer  13, <cos_sim> = 0.806054
 44: Layer  31, <cos_sim> = 0.807454
 45: Layer  38, <cos_sim> = 0.808983
 46: Layer  40, <cos_sim> = 0.813216
 47: Layer  39, <cos_sim> = 0.816557
======================== sorted ffn importances
  0: Layer  47, <cos_sim> = 0.613059
  1: Layer  44, <cos_sim> = 0.630819
  2: Layer   0, <cos_sim> = 0.653987
  3: Layer  28, <cos_sim> = 0.686159
  4: Layer  16, <cos_sim> = 0.693473
  5: Layer   7, <cos_sim> = 0.694612
  6: Layer  43, <cos_sim> = 0.710648
  7: Layer  20, <cos_sim> = 0.71511
  8: Layer  21, <cos_sim> = 0.715567
  9: Layer  46, <cos_sim> = 0.71785
 10: Layer  45, <cos_sim> = 0.718143
 11: Layer   1, <cos_sim> = 0.726385
 12: Layer   3, <cos_sim> = 0.735632
 13: Layer   8, <cos_sim> = 0.736597
 14: Layer   2, <cos_sim> = 0.737616
 15: Layer  22, <cos_sim> = 0.739272
 16: Layer  33, <cos_sim> = 0.739951
 17: Layer  19, <cos_sim> = 0.740003
 18: Layer   9, <cos_sim> = 0.742748
 19: Layer  32, <cos_sim> = 0.747542
 20: Layer  23, <cos_sim> = 0.749229
 21: Layer  24, <cos_sim> = 0.755807
 22: Layer  41, <cos_sim> = 0.75653
 23: Layer  10, <cos_sim> = 0.757337
 24: Layer  34, <cos_sim> = 0.758472
 25: Layer  31, <cos_sim> = 0.759585
 26: Layer  40, <cos_sim> = 0.763913
 27: Layer  17, <cos_sim> = 0.768032
 28: Layer  26, <cos_sim> = 0.768999
 29: Layer  18, <cos_sim> = 0.771782
 30: Layer   6, <cos_sim> = 0.776553
 31: Layer   4, <cos_sim> = 0.777394
 32: Layer  27, <cos_sim> = 0.777827
 33: Layer  35, <cos_sim> = 0.778635
 34: Layer  42, <cos_sim> = 0.779552
 35: Layer  36, <cos_sim> = 0.779963
 36: Layer  25, <cos_sim> = 0.785371
 37: Layer  12, <cos_sim> = 0.785794
 38: Layer  29, <cos_sim> = 0.787757
 39: Layer   5, <cos_sim> = 0.79259
 40: Layer  11, <cos_sim> = 0.793774
 41: Layer  15, <cos_sim> = 0.796992
 42: Layer  30, <cos_sim> = 0.797935
 43: Layer  14, <cos_sim> = 0.7999
 44: Layer  39, <cos_sim> = 0.806665
 45: Layer  38, <cos_sim> = 0.813561
 46: Layer  13, <cos_sim> = 0.820982
 47: Layer  37, <cos_sim> = 0.830343
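One possible way to use this output when sketching a custom recipe is to pull the first few entries of a section out of the log, on the assumption (suggested by the "sorted ... importances" headings, which list the lowest cos_sim first) that those are the layers worth keeping at higher precision. The `top_layers` helper and the `imatrix.log` file name in the usage note are hypothetical; the sample lines are copied from the log above.

```shell
# Hypothetical helper: top_layers <section> <count> reads a llama-imatrix
# --layer-similarity log on stdin and prints the layer numbers of the first
# <count> entries listed under "sorted <section> importances".
top_layers() {
  awk -v section="$1" -v count="$2" '
    /^=+ sorted / { active = ($3 == section); next }   # section header line
    active && $2 == "Layer" && printed < count {
      layer = $3
      sub(/,$/, "", layer)                             # "47," -> "47"
      print layer
      printed++
    }
  '
}

top_layers ffn 3 <<'EOF'
======================== sorted layer importances
  0: Layer   0, <cos_sim> = 0.32154
  1: Layer  47, <cos_sim> = 0.38473
======================== sorted ffn importances
  0: Layer  47, <cos_sim> = 0.613059
  1: Layer  44, <cos_sim> = 0.630819
  2: Layer   0, <cos_sim> = 0.653987
  3: Layer  28, <cos_sim> = 0.686159
EOF
```

If the llama-imatrix output was saved to a file, something like `top_layers attention 8 < imatrix.log` would list candidate attention layers the same way.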
👈 Benchmarking Methodology
The benchmark client used is bartowski's patched evalchemy fork containing fixes for easier use across a variety of LLM server API endpoints.
Benchmark suite testing was done with llama.cpp@36667c8e on a subset of models.
For llama.cpp server:
cd llama.cpp
git checkout 36667c8e
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)
model=/mnt/raid/models/bartowski/Qwen_Qwen3-30B-A3B-GGUF/Qwen_Qwen3-30B-A3B-IQ2_M.gguf
name=bartowski/Qwen3-30B-A3B-IQ2_M
CUDA_VISIBLE_DEVICES="1" \
./build/bin/llama-server \
  --model "$model" \
  --alias "$name" \
  --api-key super-secret-change-me \
  -fa \
  -ctk f16 -ctv f16 \
  -c 262144 \
  --parallel 8 \
  --slots \
  -ngl 99 \
  --threads 1 \
  --host 127.0.0.1 \
  --port 8088
For ik_llama.cpp server:
cd ik_llama.cpp
git checkout e3fec173
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)
model=/mnt/raid/models/ubergarm/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-mix-IQ4_K.gguf
name=ubergarm/Qwen3-30B-A3B-mix-IQ4_K
CUDA_VISIBLE_DEVICES="1" \
./build/bin/llama-server \
  --model "$model" \
  --alias "$name" \
  --api-key super-secret-change-me \
  -fmoe \
  -fa \
  -ctk f16 -ctv f16 \
  -c 262144 \
  --parallel 8 \
  -ngl 99 \
  --threads 1 \
  --host 127.0.0.1 \
  --port 8088
For vllm server:
CUDA_VISIBLE_DEVICES="1" \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
VLLM_USE_MODELSCOPE=True \
vllm \
  serve swift/Qwen3-30B-A3B-AWQ \
  --served-model-name Qwen3-30B-A3B-AWQ \
  --gpu-memory-utilization 0.9 \
  --max-model-len 32768 \
  --max-num-seqs 64 \
  --api-key super-secret-change-me \
  --host 127.0.0.1 \
  --port 8080
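A quick sanity check on the context settings above, assuming llama-server divides the total -c context evenly across its --parallel slots (worth verifying against the build you actually run): each of the 8 slots would get 32768 tokens, which lines up with the --max-model-len 32768 passed to vllm.

```shell
# Values copied from the llama-server commands above.
total_ctx=262144
parallel=8

# Assumption: the total context is split evenly across the parallel slots.
ctx_per_slot=$((total_ctx / parallel))
echo "context per slot: ${ctx_per_slot}"
```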
👈 Speed Benchmark Methodology
Note there is probably no warmup (I saw a PR on ik's fork about it), so the first data point trends low.
cd llama.cpp
git checkout ug/port-sweep-bench
# llama.cpp@814f795e + ug/port-sweep-bench
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)
#model=/mnt/astrodata/llm/models/bartowski/Qwen_Qwen3-30B-A3B-GGUF/Qwen_Qwen3-30B-A3B-Q4_K_M.gguf
#model=/mnt/astrodata/llm/models/bartowski/Qwen_Qwen3-30B-A3B-GGUF/Qwen_Qwen3-30B-A3B-Q2_K_L.gguf
#model=/mnt/astrodata/llm/models/bartowski/Qwen_Qwen3-30B-A3B-GGUF/Qwen_Qwen3-30B-A3B-IQ2_M.gguf
#model=/mnt/astrodata/llm/models/unsloth/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-UD-Q2_K_XL.gguf
#model=/mnt/astrodata/llm/models/unsloth/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-UD-IQ2_M.gguf
model=/mnt/astrodata/llm/models/unsloth/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-UD-Q4_K_XL.gguf
CUDA_VISIBLE_DEVICES=0 \
./build/bin/llama-sweep-bench \
    --model "$model" \
    -fa \
    -ctk f16 -ctv f16 \
    -c 32768 \
    -ngl 99 \
    --threads 1
👈 Perplexity, KLD, and Δp Raw Data Table
Parsed this data from a bunch of logs generated above. It is not in the most beautiful order so feel free to copy paste into google docs or however you'd like to make your own graphs.
| Model | Size (GiB) | 0.1% Δp | 1.0% KLD | 1.0% Δp | 10.0% KLD | 10.0% Δp | 25.0% Δp | 5.0% KLD | 5.0% Δp | 75.0% Δp | 90.0% Δp | 95.0% Δp | 99.0% KLD | 99.0% Δp | 99.9% KLD | 99.9% Δp | Maximum KLD | Maximum Δp | Mean KLD | Mean KLD uncertainty | Mean Δp | Mean Δp uncertainty | Mean PPL(Q) ubergarm-kld-test-corpus.txt | Mean PPL(Q) uncertainty ubergarm-kld-test-corpus.txt | Median KLD | Median Δp | Minimum KLD | Minimum Δp | PPL uncertainty wiki.test.raw | PPL wiki.test.raw | RMS Δp | RMS Δp uncertainty | Same top p | Same top p uncertainty | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen3-235B-A22B-BF16 | 438 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| ubergarm/Qwen3-235B-A22B-Q8_0 | 233 | | | | | | | | | | | | | | | | | | | | | | 11.7194 | 0.07212 | | | | | 0.03321 | 5.3141 | | | | |
| ubergarm/Qwen3-235B-A22B-mix-IQ3_K | 107 | -18.276% | 0.000036 | -8.542% | 0.000940 | -2.631% | -0.686% | 0.000310 | -4.272% | 0.587% | 2.504% | 4.175% | 0.098368 | 8.257% | 0.296680 | 17.122% | 2.906263 | 63.764% | 0.014594 | 0.000064 | -0.049 | 0.006 | 11.788282 | 0.072648 | 0.008979 | -0.001% | -0.000039 | -72.329% | 0.03421 | 5.4403 | 2.846 | 0.017 | 93.459 | 0.056 | 
| lmstudio-community/Qwen3-235B-A22B-Q3_K_L | 104 | -27.956% | 0.000083 | -14.266% | 0.002466 | -4.579% | -1.294% | 0.000766 | -7.290% | 0.786% | 3.742% | 6.267% | 0.219563 | 12.470% | 0.628216 | 24.126% | 8.358958 | 77.349% | 0.036266 | 0.000140 | -0.284 | 0.010 | 11.904309 | 0.073302 | 0.023930 | -0.010% | -0.000003 | -99.970% | 0.03584 | 5.6582 | 4.496 | 0.025 | 89.756 | 0.069 | 
| unsloth/Qwen3-235B-A22B-UD-Q3_K_XL | 97 | -25.243% | 0.000060 | -12.180% | 0.001945 | -3.752% | -0.962% | 0.000612 | -6.159% | 0.874% | 3.649% | 5.976% | 0.180988 | 11.713% | 0.543533 | 22.421% | 5.471307 | 64.130% | 0.029122 | 0.000123 | -0.059 | 0.009 | 11.855173 | 0.073300 | 0.018888 | -0.000% | -0.000004 | -98.693% | 0.03524 | 5.5695 | 4.018 | 0.023 | 90.694 | 0.066 | 
| Qwen/Qwen3-30B-A3B-BF16 | 56.9 | | | | | | | | | | | | | | | | | | | | | | 15.1443 | 0.10239 | | | | | 0.07223 | 9.0703 | | | | |
| ubergarm/Qwen3-30B-A3B-Q8_0 | 30.3 | -7.050% | 0.000001 | -3.834% | 0.000154 | -1.241% | -0.282% | 0.000038 | -2.035% | 0.231% | 1.176% | 1.964% | 0.013699 | 3.763% | 0.039718 | 7.128% | 0.359152 | 28.466% | 0.002337 | 0.000009 | -0.020 | 0.003 | 15.152095 | 0.102398 | 0.001587 | -0.000% | -0.000047 | -34.379% | 0.07228 | 9.0740 | 1.279 | 0.008 | 96.972 | 0.039 | 
| ubergarm/Qwen3-30B-A3B-mix-IQ4_K | 17.7 | -11.731% | 0.000004 | -5.522% | 0.000298 | -1.645% | -0.376% | 0.000080 | -2.742% | 0.326% | 1.592% | 2.682% | 0.032109 | 5.373% | 0.104454 | 10.626% | 2.514502 | 39.508% | 0.004821 | 0.000024 | -0.025 | 0.004 | 15.218819 | 0.103071 | 0.002970 | -0.000% | -0.000048 | -44.213% | 0.07278 | 9.1184 | 1.818 | 0.011 | 95.945 | 0.045 | 
| bartowski/Qwen3-30B-A3B-Q4_K_M | 17.4 | -16.135% | 0.000008 | -8.303% | 0.000652 | -2.643% | -0.645% | 0.000171 | -4.286% | 0.398% | 2.084% | 3.570% | 0.063238 | 7.356% | 0.195169 | 14.392% | 5.985787 | 61.522% | 0.010136 | 0.000053 | -0.158 | 0.006 | 15.194468 | 0.102605 | 0.006434 | -0.001% | -0.000032 | -88.357% | 0.07381 | 9.2092 | 2.619 | 0.018 | 94.329 | 0.053 | 
| bartowski/Qwen3-30B-A3B-Q4_K_S | 16.8 | -18.122% | 0.000013 | -9.230% | 0.000862 | -3.006% | -0.780% | 0.000235 | -4.787% | 0.402% | 2.215% | 3.866% | 0.077885 | 7.972% | 0.233980 | 15.420% | 5.971601 | 66.795% | 0.012915 | 0.000065 | -0.227 | 0.007 | 15.202408 | 0.102513 | 0.008261 | -0.002% | -0.000038 | -87.019% | 0.07371 | 9.2232 | 2.885 | 0.019 | 93.804 | 0.055 | 
| unsloth/Qwen3-30B-A3B-UD-Q4_K_XL | 16.5 | -21.984% | 0.000015 | -11.111% | 0.001152 | -3.508% | -0.938% | 0.000315 | -5.582% | 0.421% | 2.460% | 4.261% | 0.102021 | 8.910% | 0.305740 | 17.384% | 5.570370 | 67.990% | 0.016495 | 0.000071 | -0.320 | 0.008 | 15.281833 | 0.103140 | 0.010432 | -0.005% | -0.000016 | -85.356% | 0.07290 | 9.1688 | 3.333 | 0.020 | 93.169 | 0.058 | 
| ubergarm/Qwen3-30B-A3B-IQ4_KS | 15.5 | -20.721% | 0.000018 | -10.000% | 0.001003 | -3.073% | -0.796% | 0.000292 | -5.017% | 0.442% | 2.398% | 4.167% | 0.094074 | 8.691% | 0.282245 | 16.987% | 6.828948 | 89.561% | 0.014617 | 0.000068 | -0.209 | 0.007 | 15.182811 | 0.102278 | 0.008934 | -0.003% | -0.000031 | -75.475% | 0.07061 | 8.9862 | 3.106 | 0.019 | 93.625 | 0.056 | 
| ikawrakow/Qwen3-30B-A3B-IQ4_KS-Bartowski | 15.3 | -20.846% | 0.000021 | -10.497% | 0.001098 | -3.434% | -0.905% | 0.000316 | -5.433% | 0.421% | 2.427% | 4.216% | 0.099815 | 8.719% | 0.290617 | 17.546% | 6.971420 | 81.571% | 0.015818 | 0.000074 | -0.288 | 0.007 | 15.150462 | 0.101931 | 0.009988 | -0.004% | -0.000029 | -86.592% | 0.07078 | 9.0016 | 3.244 | 0.020 | 93.317 | 0.057 | 
| ikawrakow/Qwen3-30B-A3B-IQ4_KS-IK | 15.3 | -21.414% | 0.000026 | -10.689% | 0.001192 | -3.461% | -0.959% | 0.000352 | -5.489% | 0.405% | 2.383% | 4.163% | 0.102473 | 8.750% | 0.301946 | 17.416% | 7.146766 | 58.365% | 0.016277 | 0.000074 | -0.323 | 0.007 | 15.161535 | 0.101972 | 0.010269 | -0.006% | -0.000007 | -90.822% | 0.07094 | 9.0177 | 3.265 | 0.019 | 93.216 | 0.057 | 
| ikawrakow/Qwen3-30B-A3B-IQ4_KS-Unslolth | 15.3 | -21.919% | 0.000023 | -11.082% | 0.001218 | -3.610% | -1.015% | 0.000351 | -5.698% | 0.396% | 2.355% | 4.173% | 0.104796 | 8.799% | 0.314624 | 18.042% | 7.383745 | 78.742% | 0.016845 | 0.000077 | -0.366 | 0.008 | 15.109454 | 0.101327 | 0.010667 | -0.006% | -0.000012 | -86.065% | 0.06945 | 8.9171 | 3.331 | 0.020 | 93.217 | 0.057 | 
| unsloth/Qwen3-30B-A3B-UD-IQ2_M | 10.1 | -47.141% | 0.000072 | -22.803% | 0.004283 | -6.698% | -1.739% | 0.001229 | -11.071% | 0.843% | 4.934% | 8.514% | 0.457244 | 17.671% | 1.370219 | 34.262% | 8.153114 | 88.509% | 0.066646 | 0.000267 | -0.607 | 0.015 | 15.889509 | 0.107834 | 0.039668 | -0.011% | -0.000011 | -99.283% | 0.08541 | 10.3726 | 6.627 | 0.033 | 87.029 | 0.077 | 
| bartowski/Qwen3-30B-A3B-IQ2_M | 9.7 | -48.093% | 0.000068 | -24.583% | 0.005231 | -8.541% | -2.590% | 0.001459 | -13.210% | 0.538% | 4.031% | 7.477% | 0.432021 | 16.466% | 1.262156 | 31.659% | 8.695639 | 80.027% | 0.069100 | 0.000258 | -1.300 | 0.016 | 15.436905 | 0.102661 | 0.044448 | -0.039% | -0.000004 | -96.452% | 0.08036 | 9.9788 | 6.979 | 0.033 | 86.303 | 0.079 | 
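If you want to take the table above into a spreadsheet or plotting tool, one option is to convert the pipe-delimited rows to CSV first. This is only a minimal sketch: the `md_to_csv` name is made up here, the sample rows are a three-column excerpt of the table, and it assumes no cell contains a comma or a literal pipe.

```shell
# Hypothetical helper: converts a markdown pipe table on stdin to CSV.
md_to_csv() {
  awk '
    /^\|[-|: ]+\|? *$/ { next }              # skip the |---|---| separator row
    /^\|/ {
      line = $0
      sub(/^\| */, "", line)                 # drop the leading pipe
      sub(/ *\| *$/, "", line)               # drop the trailing pipe
      gsub(/ *\| */, ",", line)              # remaining pipes become commas
      print line
    }
  '
}

md_to_csv <<'EOF'
| Model | Size | Mean KLD |
|---|---|---|
| ubergarm/Qwen3-30B-A3B-Q8_0 | 30.3 | 0.002337 |
| bartowski/Qwen3-30B-A3B-Q4_K_M | 17.4 | 0.010136 |
EOF
```

Empty cells come through as empty CSV fields, so the baseline rows with only perplexity values keep their column alignment.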
👈 Benchmark Suite Raw Data Table
TODO copy/paste it all somewhere if there is enough interest.
👈 llama-sweep-bench Speed Data
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | 
|---|---|---|---|---|---|---|
| 512 | 128 | 0 | 0.186 | 2746.40 | 0.912 | 140.37 | 
| 512 | 128 | 512 | 0.189 | 2709.05 | 0.941 | 135.99 | 
| 512 | 128 | 1024 | 0.190 | 2689.73 | 0.940 | 136.22 | 
| 512 | 128 | 1536 | 0.195 | 2631.96 | 0.943 | 135.78 | 
| 512 | 128 | 2048 | 0.197 | 2601.24 | 0.957 | 133.69 | 
| 512 | 128 | 2560 | 0.201 | 2553.51 | 0.959 | 133.43 | 
| 512 | 128 | 3072 | 0.203 | 2526.21 | 0.966 | 132.56 | 
| 512 | 128 | 3584 | 0.207 | 2472.32 | 0.976 | 131.16 | 
| 512 | 128 | 4096 | 0.210 | 2432.41 | 0.986 | 129.80 | 
| 512 | 128 | 4608 | 0.213 | 2406.39 | 0.996 | 128.50 | 
| 512 | 128 | 5120 | 0.215 | 2385.53 | 1.008 | 126.99 | 
| 512 | 128 | 5632 | 0.218 | 2347.09 | 1.018 | 125.72 | 
| 512 | 128 | 6144 | 0.221 | 2321.62 | 1.029 | 124.44 | 
| 512 | 128 | 6656 | 0.224 | 2287.95 | 1.041 | 123.02 | 
| 512 | 128 | 7168 | 0.227 | 2252.04 | 1.053 | 121.57 | 
| 512 | 128 | 7680 | 0.231 | 2218.25 | 1.065 | 120.17 | 
| 512 | 128 | 8192 | 0.233 | 2194.17 | 1.075 | 119.04 | 
| 512 | 128 | 8704 | 0.235 | 2175.86 | 1.086 | 117.92 | 
| 512 | 128 | 9216 | 0.240 | 2133.00 | 1.099 | 116.47 | 
| 512 | 128 | 9728 | 0.241 | 2126.89 | 1.109 | 115.46 | 
| 512 | 128 | 10240 | 0.245 | 2089.25 | 1.120 | 114.25 | 
| 512 | 128 | 10752 | 0.249 | 2055.28 | 1.164 | 109.96 | 
| 512 | 128 | 11264 | 0.252 | 2032.46 | 1.181 | 108.43 | 
| 512 | 128 | 11776 | 0.254 | 2011.96 | 1.171 | 109.29 | 
| 512 | 128 | 12288 | 0.257 | 1993.13 | 1.175 | 108.95 | 
| 512 | 128 | 12800 | 0.260 | 1970.94 | 1.184 | 108.08 | 
| 512 | 128 | 13312 | 0.264 | 1939.95 | 1.186 | 107.95 | 
| 512 | 128 | 13824 | 0.265 | 1930.30 | 1.194 | 107.24 | 
| 512 | 128 | 14336 | 0.270 | 1897.48 | 1.197 | 106.89 | 
| 512 | 128 | 14848 | 0.272 | 1880.96 | 1.204 | 106.32 | 
| 512 | 128 | 15360 | 0.276 | 1856.05 | 1.214 | 105.45 | 
| 512 | 128 | 15872 | 0.279 | 1832.42 | 1.221 | 104.82 | 
| 512 | 128 | 16384 | 0.283 | 1809.73 | 1.229 | 104.13 | 
| 512 | 128 | 16896 | 0.285 | 1796.89 | 1.234 | 103.69 | 
| 512 | 128 | 17408 | 0.288 | 1778.96 | 1.242 | 103.08 | 
| 512 | 128 | 17920 | 0.293 | 1746.74 | 1.249 | 102.52 | 
| 512 | 128 | 18432 | 0.296 | 1729.58 | 1.256 | 101.89 | 
| 512 | 128 | 18944 | 0.298 | 1715.59 | 1.264 | 101.23 | 
| 512 | 128 | 19456 | 0.302 | 1697.53 | 1.269 | 100.87 | 
| 512 | 128 | 19968 | 0.304 | 1684.14 | 1.278 | 100.13 | 
| 512 | 128 | 20480 | 0.307 | 1665.46 | 1.284 | 99.71 | 
| 512 | 128 | 20992 | 0.311 | 1644.88 | 1.291 | 99.12 | 
| 512 | 128 | 21504 | 0.314 | 1631.38 | 1.334 | 95.97 | 
| 512 | 128 | 22016 | 0.317 | 1613.83 | 1.347 | 95.01 | 
| 512 | 128 | 22528 | 0.321 | 1596.46 | 1.339 | 95.57 | 
| 512 | 128 | 23040 | 0.322 | 1589.42 | 1.345 | 95.16 | 
| 512 | 128 | 23552 | 0.325 | 1573.55 | 1.352 | 94.64 | 
| 512 | 128 | 24064 | 0.329 | 1556.41 | 1.358 | 94.25 | 
| 512 | 128 | 24576 | 0.333 | 1537.96 | 1.363 | 93.93 | 
| 512 | 128 | 25088 | 0.335 | 1529.21 | 1.369 | 93.52 | 
| 512 | 128 | 25600 | 0.340 | 1506.80 | 1.378 | 92.91 | 
| 512 | 128 | 26112 | 0.343 | 1494.38 | 1.383 | 92.54 | 
| 512 | 128 | 26624 | 0.347 | 1476.69 | 1.392 | 91.98 | 
| 512 | 128 | 27136 | 0.350 | 1464.63 | 1.398 | 91.53 | 
| 512 | 128 | 27648 | 0.353 | 1451.77 | 1.405 | 91.13 | 
| 512 | 128 | 28160 | 0.355 | 1442.42 | 1.411 | 90.69 | 
| 512 | 128 | 28672 | 0.359 | 1427.94 | 1.418 | 90.26 | 
| 512 | 128 | 29184 | 0.362 | 1415.01 | 1.426 | 89.77 | 
| 512 | 128 | 29696 | 0.364 | 1406.75 | 1.433 | 89.33 | 
| 512 | 128 | 30208 | 0.367 | 1393.57 | 1.441 | 88.84 | 
| 512 | 128 | 30720 | 0.371 | 1379.72 | 1.450 | 88.27 | 
| 512 | 128 | 31232 | 0.374 | 1367.29 | 1.456 | 87.93 | 
| 512 | 128 | 31744 | 0.378 | 1355.16 | 1.464 | 87.43 | 
| 512 | 128 | 32256 | 0.381 | 1343.89 | 1.507 | 84.94 | 
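To put a number on how much throughput drops as the KV cache fills, the sketch below compares the last row of a llama-sweep-bench table with its first row. The `sweep_summary` helper is hypothetical; the two data rows in the sample are the first and last rows of the table above.

```shell
# Hypothetical helper: reads a llama-sweep-bench markdown table on stdin and
# prints the final N_KV plus PP and TG speed there as a percentage of row one.
sweep_summary() {
  awk -F'|' '
    $2 ~ /^ *[0-9]+ *$/ {                    # data rows only (skips header/separator)
      if (!seen) { pp0 = $6; tg0 = $8; seen = 1 }
      n_kv = $4 + 0; pp = $6; tg = $8
    }
    END {
      if (seen) printf "N_KV=%d PP=%.1f%% TG=%.1f%%\n", n_kv, 100 * pp / pp0, 100 * tg / tg0
    }
  '
}

sweep_summary <<'EOF'
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---|---|---|---|---|---|---|
| 512 | 128 | 0 | 0.186 | 2746.40 | 0.912 | 140.37 |
| 512 | 128 | 32256 | 0.381 | 1343.89 | 1.507 | 84.94 |
EOF
```

Since the first data point may trend low without warmup, treat the percentages as approximate.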
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | 
|---|---|---|---|---|---|---|
| 512 | 128 | 0 | 0.219 | 2342.04 | 0.940 | 136.14 | 
| 512 | 128 | 512 | 0.221 | 2320.24 | 0.968 | 132.17 | 
| 512 | 128 | 1024 | 0.222 | 2302.08 | 0.968 | 132.25 | 
| 512 | 128 | 1536 | 0.228 | 2245.09 | 0.976 | 131.11 | 
| 512 | 128 | 2048 | 0.230 | 2230.09 | 0.990 | 129.34 | 
| 512 | 128 | 2560 | 0.233 | 2201.35 | 0.998 | 128.21 | 
| 512 | 128 | 3072 | 0.236 | 2168.36 | 1.005 | 127.38 | 
| 512 | 128 | 3584 | 0.240 | 2128.94 | 1.014 | 126.18 | 
| 512 | 128 | 4096 | 0.243 | 2102.88 | 1.025 | 124.82 | 
| 512 | 128 | 4608 | 0.245 | 2093.47 | 1.035 | 123.68 | 
| 512 | 128 | 5120 | 0.248 | 2062.11 | 1.045 | 122.44 | 
| 512 | 128 | 5632 | 0.251 | 2042.84 | 1.057 | 121.12 | 
| 512 | 128 | 6144 | 0.254 | 2016.60 | 1.069 | 119.78 | 
| 512 | 128 | 6656 | 0.256 | 1996.33 | 1.081 | 118.46 | 
| 512 | 128 | 7168 | 0.260 | 1965.62 | 1.090 | 117.42 | 
| 512 | 128 | 7680 | 0.264 | 1939.11 | 1.103 | 116.03 | 
| 512 | 128 | 8192 | 0.267 | 1917.69 | 1.114 | 114.86 | 
| 512 | 128 | 8704 | 0.269 | 1902.68 | 1.123 | 113.97 | 
| 512 | 128 | 9216 | 0.275 | 1864.88 | 1.139 | 112.41 | 
| 512 | 128 | 9728 | 0.275 | 1864.80 | 1.149 | 111.43 | 
| 512 | 128 | 10240 | 0.280 | 1831.10 | 1.173 | 109.12 | 
| 512 | 128 | 10752 | 0.282 | 1813.40 | 1.209 | 105.90 | 
| 512 | 128 | 11264 | 0.286 | 1792.80 | 1.224 | 104.61 | 
| 512 | 128 | 11776 | 0.289 | 1769.64 | 1.217 | 105.19 | 
| 512 | 128 | 12288 | 0.291 | 1756.56 | 1.219 | 104.97 | 
| 512 | 128 | 12800 | 0.296 | 1730.89 | 1.230 | 104.08 | 
| 512 | 128 | 13312 | 0.298 | 1717.56 | 1.231 | 103.94 | 
| 512 | 128 | 13824 | 0.299 | 1709.78 | 1.237 | 103.48 | 
| 512 | 128 | 14336 | 0.304 | 1684.98 | 1.241 | 103.15 | 
| 512 | 128 | 14848 | 0.306 | 1672.32 | 1.247 | 102.63 | 
| 512 | 128 | 15360 | 0.309 | 1657.69 | 1.251 | 102.28 | 
| 512 | 128 | 15872 | 0.312 | 1642.84 | 1.258 | 101.72 | 
| 512 | 128 | 16384 | 0.316 | 1620.66 | 1.265 | 101.16 | 
| 512 | 128 | 16896 | 0.319 | 1603.11 | 1.271 | 100.68 | 
| 512 | 128 | 17408 | 0.322 | 1592.25 | 1.280 | 100.04 | 
| 512 | 128 | 17920 | 0.325 | 1573.98 | 1.286 | 99.52 | 
| 512 | 128 | 18432 | 0.328 | 1560.54 | 1.295 | 98.82 | 
| 512 | 128 | 18944 | 0.331 | 1547.27 | 1.303 | 98.27 | 
| 512 | 128 | 19456 | 0.336 | 1525.32 | 1.308 | 97.87 | 
| 512 | 128 | 19968 | 0.336 | 1523.96 | 1.317 | 97.16 | 
| 512 | 128 | 20480 | 0.339 | 1509.92 | 1.323 | 96.72 | 
| 512 | 128 | 20992 | 0.342 | 1498.56 | 1.328 | 96.36 | 
| 512 | 128 | 21504 | 0.344 | 1487.29 | 1.368 | 93.54 | 
| 512 | 128 | 22016 | 0.348 | 1469.52 | 1.386 | 92.32 | 
| 512 | 128 | 22528 | 0.351 | 1458.22 | 1.377 | 92.95 | 
| 512 | 128 | 23040 | 0.354 | 1447.65 | 1.383 | 92.56 | 
| 512 | 128 | 23552 | 0.357 | 1434.13 | 1.392 | 91.95 | 
| 512 | 128 | 24064 | 0.361 | 1417.81 | 1.397 | 91.60 | 
| 512 | 128 | 24576 | 0.365 | 1401.75 | 1.400 | 91.40 | 
| 512 | 128 | 25088 | 0.367 | 1395.82 | 1.408 | 90.89 | 
| 512 | 128 | 25600 | 0.369 | 1387.75 | 1.412 | 90.67 | 
| 512 | 128 | 26112 | 0.374 | 1368.77 | 1.418 | 90.29 | 
| 512 | 128 | 26624 | 0.377 | 1359.02 | 1.427 | 89.71 | 
| 512 | 128 | 27136 | 0.380 | 1347.28 | 1.434 | 89.25 | 
| 512 | 128 | 27648 | 0.383 | 1336.61 | 1.439 | 88.92 | 
| 512 | 128 | 28160 | 0.387 | 1322.05 | 1.446 | 88.50 | 
| 512 | 128 | 28672 | 0.389 | 1315.73 | 1.454 | 88.02 | 
| 512 | 128 | 29184 | 0.392 | 1307.57 | 1.461 | 87.58 | 
| 512 | 128 | 29696 | 0.395 | 1295.59 | 1.468 | 87.16 | 
| 512 | 128 | 30208 | 0.400 | 1281.33 | 1.475 | 86.77 | 
| 512 | 128 | 30720 | 0.403 | 1269.72 | 1.485 | 86.17 | 
| 512 | 128 | 31232 | 0.406 | 1260.77 | 1.493 | 85.75 | 
| 512 | 128 | 31744 | 0.411 | 1245.97 | 1.499 | 85.37 | 
| 512 | 128 | 32256 | 0.411 | 1244.60 | 1.538 | 83.20 | 
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | 
|---|---|---|---|---|---|---|
| 512 | 128 | 0 | 0.199 | 2571.39 | 0.929 | 137.72 | 
| 512 | 128 | 512 | 0.200 | 2558.87 | 0.958 | 133.66 | 
| 512 | 128 | 1024 | 0.205 | 2502.88 | 0.958 | 133.60 | 
| 512 | 128 | 1536 | 0.209 | 2449.39 | 0.966 | 132.45 | 
| 512 | 128 | 2048 | 0.211 | 2424.91 | 0.979 | 130.70 | 
| 512 | 128 | 2560 | 0.214 | 2387.42 | 0.981 | 130.51 | 
| 512 | 128 | 3072 | 0.217 | 2359.21 | 0.990 | 129.36 | 
| 512 | 128 | 3584 | 0.220 | 2322.95 | 1.001 | 127.93 | 
| 512 | 128 | 4096 | 0.224 | 2281.51 | 1.011 | 126.63 | 
| 512 | 128 | 4608 | 0.226 | 2264.66 | 1.020 | 125.44 | 
| 512 | 128 | 5120 | 0.228 | 2246.85 | 1.031 | 124.21 | 
| 512 | 128 | 5632 | 0.231 | 2218.24 | 1.040 | 123.07 | 
| 512 | 128 | 6144 | 0.235 | 2177.99 | 1.054 | 121.47 | 
| 512 | 128 | 6656 | 0.237 | 2158.85 | 1.065 | 120.14 | 
| 512 | 128 | 7168 | 0.241 | 2124.91 | 1.078 | 118.72 | 
| 512 | 128 | 7680 | 0.245 | 2088.47 | 1.094 | 116.98 | 
| 512 | 128 | 8192 | 0.248 | 2066.12 | 1.106 | 115.68 | 
| 512 | 128 | 8704 | 0.250 | 2044.39 | 1.117 | 114.54 | 
| 512 | 128 | 9216 | 0.253 | 2023.04 | 1.130 | 113.27 | 
| 512 | 128 | 9728 | 0.256 | 2002.81 | 1.141 | 112.18 | 
| 512 | 128 | 10240 | 0.259 | 1980.01 | 1.154 | 110.94 | 
| 512 | 128 | 10752 | 0.263 | 1945.18 | 1.198 | 106.84 | 
| 512 | 128 | 11264 | 0.265 | 1928.54 | 1.211 | 105.70 | 
| 512 | 128 | 11776 | 0.268 | 1908.01 | 1.204 | 106.28 | 
| 512 | 128 | 12288 | 0.271 | 1891.82 | 1.207 | 106.08 | 
| 512 | 128 | 12800 | 0.275 | 1861.92 | 1.216 | 105.27 | 
| 512 | 128 | 13312 | 0.277 | 1846.15 | 1.219 | 104.99 | 
| 512 | 128 | 13824 | 0.280 | 1829.45 | 1.226 | 104.43 | 
| 512 | 128 | 14336 | 0.283 | 1807.34 | 1.229 | 104.17 | 
| 512 | 128 | 14848 | 0.286 | 1789.55 | 1.233 | 103.77 | 
| 512 | 128 | 15360 | 0.289 | 1774.14 | 1.241 | 103.12 | 
| 512 | 128 | 15872 | 0.293 | 1750.23 | 1.248 | 102.55 | 
| 512 | 128 | 16384 | 0.296 | 1730.68 | 1.256 | 101.88 | 
| 512 | 128 | 16896 | 0.299 | 1713.86 | 1.261 | 101.49 | 
| 512 | 128 | 17408 | 0.301 | 1700.49 | 1.271 | 100.72 | 
| 512 | 128 | 17920 | 0.306 | 1671.47 | 1.281 | 99.93 | 
| 512 | 128 | 18432 | 0.310 | 1652.08 | 1.291 | 99.17 | 
| 512 | 128 | 18944 | 0.313 | 1637.83 | 1.299 | 98.53 | 
| 512 | 128 | 19456 | 0.316 | 1618.98 | 1.302 | 98.32 | 
| 512 | 128 | 19968 | 0.317 | 1612.79 | 1.314 | 97.42 | 
| 512 | 128 | 20480 | 0.321 | 1595.76 | 1.319 | 97.04 | 
| 512 | 128 | 20992 | 0.326 | 1572.01 | 1.327 | 96.43 | 
| 512 | 128 | 21504 | 0.328 | 1561.24 | 1.369 | 93.51 | 
| 512 | 128 | 22016 | 0.332 | 1543.74 | 1.383 | 92.57 | 
| 512 | 128 | 22528 | 0.335 | 1529.05 | 1.373 | 93.23 | 
| 512 | 128 | 23040 | 0.336 | 1524.73 | 1.374 | 93.17 | 
| 512 | 128 | 23552 | 0.337 | 1517.70 | 1.386 | 92.33 | 
| 512 | 128 | 24064 | 0.343 | 1493.95 | 1.387 | 92.27 | 
| 512 | 128 | 24576 | 0.346 | 1481.52 | 1.393 | 91.88 | 
| 512 | 128 | 25088 | 0.349 | 1466.47 | 1.401 | 91.37 | 
| 512 | 128 | 25600 | 0.350 | 1462.59 | 1.406 | 91.06 | 
| 512 | 128 | 26112 | 0.356 | 1438.68 | 1.413 | 90.61 | 
| 512 | 128 | 26624 | 0.359 | 1425.06 | 1.418 | 90.29 | 
| 512 | 128 | 27136 | 0.361 | 1417.08 | 1.426 | 89.75 | 
| 512 | 128 | 27648 | 0.365 | 1403.93 | 1.433 | 89.33 | 
| 512 | 128 | 28160 | 0.368 | 1389.95 | 1.442 | 88.74 | 
| 512 | 128 | 28672 | 0.371 | 1380.36 | 1.454 | 88.02 | 
| 512 | 128 | 29184 | 0.374 | 1369.27 | 1.458 | 87.79 | 
| 512 | 128 | 29696 | 0.378 | 1355.92 | 1.465 | 87.36 | 
| 512 | 128 | 30208 | 0.381 | 1345.24 | 1.471 | 87.01 | 
| 512 | 128 | 30720 | 0.383 | 1336.71 | 1.482 | 86.39 | 
| 512 | 128 | 31232 | 0.387 | 1324.60 | 1.486 | 86.11 | 
| 512 | 128 | 31744 | 0.390 | 1311.28 | 1.494 | 85.65 | 
| 512 | 128 | 32256 | 0.393 | 1302.29 | 1.535 | 83.40 | 
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | 
|---|---|---|---|---|---|---|
| 512 | 128 | 0 | 0.185 | 2771.13 | 0.895 | 143.07 | 
| 512 | 128 | 512 | 0.187 | 2735.63 | 0.923 | 138.71 | 
| 512 | 128 | 1024 | 0.190 | 2699.01 | 0.921 | 138.95 | 
| 512 | 128 | 1536 | 0.195 | 2627.30 | 0.930 | 137.64 | 
| 512 | 128 | 2048 | 0.196 | 2614.49 | 0.943 | 135.73 | 
| 512 | 128 | 2560 | 0.200 | 2560.59 | 0.947 | 135.10 | 
| 512 | 128 | 3072 | 0.202 | 2528.42 | 0.954 | 134.19 | 
| 512 | 128 | 3584 | 0.206 | 2481.69 | 0.964 | 132.77 | 
| 512 | 128 | 4096 | 0.210 | 2443.23 | 0.974 | 131.47 | 
| 512 | 128 | 4608 | 0.212 | 2413.67 | 0.985 | 129.96 | 
| 512 | 128 | 5120 | 0.214 | 2394.67 | 0.995 | 128.61 | 
| 512 | 128 | 5632 | 0.219 | 2340.45 | 1.015 | 126.14 | 
| 512 | 128 | 6144 | 0.222 | 2306.96 | 1.024 | 125.01 | 
| 512 | 128 | 6656 | 0.225 | 2273.36 | 1.035 | 123.64 | 
| 512 | 128 | 7168 | 0.228 | 2242.54 | 1.050 | 121.92 | 
| 512 | 128 | 7680 | 0.231 | 2212.63 | 1.060 | 120.71 | 
| 512 | 128 | 8192 | 0.235 | 2182.09 | 1.068 | 119.82 | 
| 512 | 128 | 8704 | 0.237 | 2157.82 | 1.082 | 118.25 | 
| 512 | 128 | 9216 | 0.241 | 2123.14 | 1.097 | 116.72 | 
| 512 | 128 | 9728 | 0.243 | 2109.32 | 1.104 | 115.90 | 
| 512 | 128 | 10240 | 0.246 | 2077.16 | 1.119 | 114.35 | 
| 512 | 128 | 10752 | 0.250 | 2049.47 | 1.168 | 109.62 | 
| 512 | 128 | 11264 | 0.254 | 2017.75 | 1.183 | 108.21 | 
| 512 | 128 | 11776 | 0.255 | 2009.66 | 1.173 | 109.13 | 
| 512 | 128 | 12288 | 0.259 | 1976.27 | 1.176 | 108.86 | 
| 512 | 128 | 12800 | 0.261 | 1957.95 | 1.186 | 107.97 | 
| 512 | 128 | 13312 | 0.266 | 1926.83 | 1.187 | 107.84 | 
| 512 | 128 | 13824 | 0.267 | 1914.87 | 1.191 | 107.45 | 
| 512 | 128 | 14336 | 0.271 | 1888.06 | 1.196 | 107.00 | 
| 512 | 128 | 14848 | 0.274 | 1869.73 | 1.202 | 106.49 | 
| 512 | 128 | 15360 | 0.277 | 1849.09 | 1.209 | 105.84 | 
| 512 | 128 | 15872 | 0.280 | 1828.40 | 1.215 | 105.35 | 
| 512 | 128 | 16384 | 0.284 | 1801.44 | 1.224 | 104.57 | 
| 512 | 128 | 16896 | 0.287 | 1781.87 | 1.229 | 104.13 | 
| 512 | 128 | 17408 | 0.290 | 1767.18 | 1.239 | 103.35 | 
| 512 | 128 | 17920 | 0.293 | 1747.06 | 1.245 | 102.83 | 
| 512 | 128 | 18432 | 0.296 | 1731.39 | 1.252 | 102.25 | 
| 512 | 128 | 18944 | 0.299 | 1712.43 | 1.259 | 101.64 | 
| 512 | 128 | 19456 | 0.303 | 1690.65 | 1.265 | 101.17 | 
| 512 | 128 | 19968 | 0.304 | 1682.41 | 1.276 | 100.31 | 
| 512 | 128 | 20480 | 0.308 | 1660.25 | 1.280 | 99.99 | 
| 512 | 128 | 20992 | 0.312 | 1641.94 | 1.285 | 99.57 | 
| 512 | 128 | 21504 | 0.314 | 1628.35 | 1.331 | 96.17 | 
| 512 | 128 | 22016 | 0.318 | 1611.79 | 1.346 | 95.11 | 
| 512 | 128 | 22528 | 0.321 | 1596.28 | 1.337 | 95.72 | 
| 512 | 128 | 23040 | 0.324 | 1580.92 | 1.340 | 95.54 | 
| 512 | 128 | 23552 | 0.325 | 1573.30 | 1.351 | 94.74 | 
| 512 | 128 | 24064 | 0.330 | 1552.94 | 1.350 | 94.81 | 
| 512 | 128 | 24576 | 0.334 | 1534.84 | 1.355 | 94.48 | 
| 512 | 128 | 25088 | 0.335 | 1526.93 | 1.361 | 94.06 | 
| 512 | 128 | 25600 | 0.339 | 1511.89 | 1.366 | 93.70 | 
| 512 | 128 | 26112 | 0.343 | 1492.70 | 1.383 | 92.55 | 
| 512 | 128 | 26624 | 0.347 | 1476.86 | 1.387 | 92.27 | 
| 512 | 128 | 27136 | 0.350 | 1462.35 | 1.397 | 91.63 | 
| 512 | 128 | 27648 | 0.354 | 1446.91 | 1.404 | 91.16 | 
| 512 | 128 | 28160 | 0.356 | 1438.02 | 1.412 | 90.66 | 
| 512 | 128 | 28672 | 0.361 | 1419.66 | 1.418 | 90.26 | 
| 512 | 128 | 29184 | 0.362 | 1413.92 | 1.426 | 89.77 | 
| 512 | 128 | 29696 | 0.365 | 1401.20 | 1.433 | 89.32 | 
| 512 | 128 | 30208 | 0.368 | 1391.23 | 1.439 | 88.97 | 
| 512 | 128 | 30720 | 0.372 | 1377.54 | 1.450 | 88.29 | 
| 512 | 128 | 31232 | 0.374 | 1369.93 | 1.453 | 88.09 | 
| 512 | 128 | 31744 | 0.378 | 1356.09 | 1.462 | 87.56 | 
| 512 | 128 | 32256 | 0.380 | 1347.04 | 1.503 | 85.14 | 
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | 
|---|---|---|---|---|---|---|
| 512 | 128 | 0 | 0.211 | 2423.89 | 0.943 | 135.74 | 
| 512 | 128 | 512 | 0.213 | 2399.31 | 0.971 | 131.85 | 
| 512 | 128 | 1024 | 0.216 | 2374.33 | 0.969 | 132.11 | 
| 512 | 128 | 1536 | 0.219 | 2340.30 | 0.979 | 130.80 | 
| 512 | 128 | 2048 | 0.220 | 2325.55 | 0.991 | 129.11 | 
| 512 | 128 | 2560 | 0.225 | 2276.44 | 0.994 | 128.82 | 
| 512 | 128 | 3072 | 0.228 | 2247.84 | 1.001 | 127.89 | 
| 512 | 128 | 3584 | 0.232 | 2207.44 | 1.011 | 126.60 | 
| 512 | 128 | 4096 | 0.236 | 2170.20 | 1.023 | 125.06 | 
| 512 | 128 | 4608 | 0.236 | 2166.89 | 1.032 | 124.00 | 
| 512 | 128 | 5120 | 0.240 | 2131.06 | 1.044 | 122.58 | 
| 512 | 128 | 5632 | 0.244 | 2102.12 | 1.054 | 121.41 | 
| 512 | 128 | 6144 | 0.247 | 2076.33 | 1.063 | 120.43 | 
| 512 | 128 | 6656 | 0.249 | 2055.14 | 1.077 | 118.82 | 
| 512 | 128 | 7168 | 0.253 | 2024.47 | 1.088 | 117.68 | 
| 512 | 128 | 7680 | 0.256 | 1996.90 | 1.099 | 116.45 | 
| 512 | 128 | 8192 | 0.260 | 1967.17 | 1.114 | 114.93 | 
| 512 | 128 | 8704 | 0.260 | 1967.20 | 1.122 | 114.06 | 
| 512 | 128 | 9216 | 0.266 | 1922.64 | 1.135 | 112.81 | 
| 512 | 128 | 9728 | 0.268 | 1911.09 | 1.147 | 111.63 | 
| 512 | 128 | 10240 | 0.272 | 1885.44 | 1.157 | 110.64 | 
| 512 | 128 | 10752 | 0.274 | 1865.36 | 1.202 | 106.45 | 
| 512 | 128 | 11264 | 0.278 | 1844.60 | 1.217 | 105.18 | 
| 512 | 128 | 11776 | 0.279 | 1836.43 | 1.208 | 105.93 | 
| 512 | 128 | 12288 | 0.283 | 1810.13 | 1.213 | 105.57 | 
| 512 | 128 | 12800 | 0.288 | 1780.11 | 1.229 | 104.16 | 
| 512 | 128 | 13312 | 0.291 | 1758.14 | 1.229 | 104.12 | 
| 512 | 128 | 13824 | 0.292 | 1753.98 | 1.238 | 103.39 | 
| 512 | 128 | 14336 | 0.298 | 1718.12 | 1.241 | 103.10 | 
| 512 | 128 | 14848 | 0.300 | 1706.26 | 1.247 | 102.61 | 
| 512 | 128 | 15360 | 0.302 | 1693.28 | 1.254 | 102.07 | 
| 512 | 128 | 15872 | 0.306 | 1673.01 | 1.262 | 101.46 | 
| 512 | 128 | 16384 | 0.310 | 1650.90 | 1.268 | 100.96 | 
| 512 | 128 | 16896 | 0.313 | 1638.03 | 1.275 | 100.41 | 
| 512 | 128 | 17408 | 0.315 | 1625.29 | 1.281 | 99.90 | 
| 512 | 128 | 17920 | 0.318 | 1609.23 | 1.289 | 99.31 | 
| 512 | 128 | 18432 | 0.322 | 1589.10 | 1.297 | 98.68 | 
| 512 | 128 | 18944 | 0.325 | 1575.42 | 1.302 | 98.29 | 
| 512 | 128 | 19456 | 0.330 | 1553.28 | 1.310 | 97.73 | 
| 512 | 128 | 19968 | 0.330 | 1552.98 | 1.319 | 97.05 | 
| 512 | 128 | 20480 | 0.334 | 1531.58 | 1.324 | 96.67 | 
| 512 | 128 | 20992 | 0.337 | 1518.07 | 1.332 | 96.12 | 
| 512 | 128 | 21504 | 0.340 | 1507.15 | 1.373 | 93.25 | 
| 512 | 128 | 22016 | 0.344 | 1488.06 | 1.385 | 92.41 | 
| 512 | 128 | 22528 | 0.347 | 1477.13 | 1.378 | 92.88 | 
| 512 | 128 | 23040 | 0.349 | 1467.54 | 1.384 | 92.47 | 
| 512 | 128 | 23552 | 0.351 | 1459.50 | 1.394 | 91.80 | 
| 512 | 128 | 24064 | 0.356 | 1440.13 | 1.397 | 91.61 | 
| 512 | 128 | 24576 | 0.359 | 1426.95 | 1.401 | 91.36 | 
| 512 | 128 | 25088 | 0.360 | 1423.59 | 1.409 | 90.82 | 
| 512 | 128 | 25600 | 0.364 | 1405.52 | 1.413 | 90.62 | 
| 512 | 128 | 26112 | 0.369 | 1388.93 | 1.419 | 90.18 | 
| 512 | 128 | 26624 | 0.371 | 1379.47 | 1.426 | 89.79 | 
| 512 | 128 | 27136 | 0.374 | 1369.38 | 1.434 | 89.28 | 
| 512 | 128 | 27648 | 0.377 | 1357.58 | 1.441 | 88.85 | 
| 512 | 128 | 28160 | 0.382 | 1342.07 | 1.447 | 88.44 | 
| 512 | 128 | 28672 | 0.384 | 1333.90 | 1.455 | 87.99 | 
| 512 | 128 | 29184 | 0.386 | 1326.66 | 1.461 | 87.62 | 
| 512 | 128 | 29696 | 0.390 | 1313.92 | 1.468 | 87.22 | 
| 512 | 128 | 30208 | 0.394 | 1298.28 | 1.483 | 86.34 | 
| 512 | 128 | 30720 | 0.398 | 1286.81 | 1.488 | 86.02 | 
| 512 | 128 | 31232 | 0.400 | 1280.36 | 1.494 | 85.70 | 
| 512 | 128 | 31744 | 0.405 | 1263.20 | 1.502 | 85.21 | 
| 512 | 128 | 32256 | 0.407 | 1257.02 | 1.545 | 82.83 | 
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | 
|---|---|---|---|---|---|---|
| 512 | 128 | 0 | 0.198 | 2588.92 | 0.982 | 130.29 | 
| 512 | 128 | 512 | 0.199 | 2574.42 | 1.008 | 127.04 | 
| 512 | 128 | 1024 | 0.203 | 2527.70 | 1.007 | 127.07 | 
| 512 | 128 | 1536 | 0.206 | 2488.20 | 1.017 | 125.90 | 
| 512 | 128 | 2048 | 0.207 | 2468.48 | 1.031 | 124.15 | 
| 512 | 128 | 2560 | 0.211 | 2427.66 | 1.037 | 123.42 | 
| 512 | 128 | 3072 | 0.215 | 2376.22 | 1.045 | 122.45 | 
| 512 | 128 | 3584 | 0.218 | 2344.71 | 1.055 | 121.32 | 
| 512 | 128 | 4096 | 0.220 | 2323.83 | 1.066 | 120.13 | 
| 512 | 128 | 4608 | 0.224 | 2286.56 | 1.075 | 119.08 | 
| 512 | 128 | 5120 | 0.226 | 2263.56 | 1.086 | 117.87 | 
| 512 | 128 | 5632 | 0.229 | 2233.20 | 1.097 | 116.64 | 
| 512 | 128 | 6144 | 0.231 | 2216.06 | 1.108 | 115.56 | 
| 512 | 128 | 6656 | 0.235 | 2174.18 | 1.125 | 113.82 | 
| 512 | 128 | 7168 | 0.239 | 2141.53 | 1.137 | 112.61 | 
| 512 | 128 | 7680 | 0.244 | 2099.48 | 1.148 | 111.48 | 
| 512 | 128 | 8192 | 0.245 | 2087.77 | 1.160 | 110.38 | 
| 512 | 128 | 8704 | 0.247 | 2076.19 | 1.170 | 109.38 | 
| 512 | 128 | 9216 | 0.251 | 2040.21 | 1.183 | 108.22 | 
| 512 | 128 | 9728 | 0.252 | 2028.41 | 1.192 | 107.39 | 
| 512 | 128 | 10240 | 0.255 | 2006.18 | 1.204 | 106.35 | 
| 512 | 128 | 10752 | 0.257 | 1988.99 | 1.247 | 102.62 | 
| 512 | 128 | 11264 | 0.261 | 1963.06 | 1.264 | 101.28 | 
| 512 | 128 | 11776 | 0.262 | 1951.61 | 1.257 | 101.84 | 
| 512 | 128 | 12288 | 0.265 | 1932.03 | 1.260 | 101.62 | 
| 512 | 128 | 12800 | 0.269 | 1901.02 | 1.269 | 100.83 | 
| 512 | 128 | 13312 | 0.272 | 1882.44 | 1.271 | 100.72 | 
| 512 | 128 | 13824 | 0.273 | 1873.24 | 1.274 | 100.48 | 
| 512 | 128 | 14336 | 0.277 | 1845.12 | 1.281 | 99.91 | 
| 512 | 128 | 14848 | 0.280 | 1830.87 | 1.290 | 99.24 | 
| 512 | 128 | 15360 | 0.282 | 1812.46 | 1.296 | 98.79 | 
| 512 | 128 | 15872 | 0.286 | 1793.02 | 1.302 | 98.31 | 
| 512 | 128 | 16384 | 0.288 | 1778.72 | 1.309 | 97.80 | 
| 512 | 128 | 16896 | 0.293 | 1745.22 | 1.316 | 97.26 | 
| 512 | 128 | 17408 | 0.295 | 1732.67 | 1.323 | 96.76 | 
| 512 | 128 | 17920 | 0.299 | 1714.14 | 1.331 | 96.19 | 
| 512 | 128 | 18432 | 0.301 | 1698.58 | 1.337 | 95.74 | 
| 512 | 128 | 18944 | 0.306 | 1675.72 | 1.350 | 94.84 | 
| 512 | 128 | 19456 | 0.307 | 1668.01 | 1.349 | 94.88 | 
| 512 | 128 | 19968 | 0.313 | 1636.65 | 1.360 | 94.11 | 
| 512 | 128 | 20480 | 0.314 | 1632.97 | 1.366 | 93.72 | 
| 512 | 128 | 20992 | 0.316 | 1620.05 | 1.374 | 93.17 | 
| 512 | 128 | 21504 | 0.319 | 1606.86 | 1.411 | 90.70 | 
| 512 | 128 | 22016 | 0.322 | 1590.15 | 1.426 | 89.75 | 
| 512 | 128 | 22528 | 0.327 | 1567.20 | 1.422 | 90.01 | 
| 512 | 128 | 23040 | 0.330 | 1553.12 | 1.425 | 89.83 | 
| 512 | 128 | 23552 | 0.333 | 1536.30 | 1.434 | 89.28 | 
| 512 | 128 | 24064 | 0.337 | 1520.89 | 1.434 | 89.24 | 
| 512 | 128 | 24576 | 0.339 | 1508.19 | 1.440 | 88.87 | 
| 512 | 128 | 25088 | 0.343 | 1492.82 | 1.446 | 88.52 | 
| 512 | 128 | 25600 | 0.344 | 1487.87 | 1.451 | 88.21 | 
| 512 | 128 | 26112 | 0.350 | 1461.28 | 1.459 | 87.74 | 
| 512 | 128 | 26624 | 0.350 | 1463.85 | 1.466 | 87.32 | 
| 512 | 128 | 27136 | 0.354 | 1445.83 | 1.474 | 86.86 | 
| 512 | 128 | 27648 | 0.357 | 1432.50 | 1.485 | 86.20 | 
| 512 | 128 | 28160 | 0.363 | 1410.51 | 1.487 | 86.10 | 
| 512 | 128 | 28672 | 0.365 | 1402.82 | 1.493 | 85.72 | 
| 512 | 128 | 29184 | 0.368 | 1389.55 | 1.502 | 85.22 | 
| 512 | 128 | 29696 | 0.371 | 1379.92 | 1.508 | 84.87 | 
| 512 | 128 | 30208 | 0.374 | 1367.99 | 1.514 | 84.55 | 
| 512 | 128 | 30720 | 0.377 | 1359.40 | 1.524 | 84.00 | 
| 512 | 128 | 31232 | 0.378 | 1353.18 | 1.529 | 83.72 | 
| 512 | 128 | 31744 | 0.382 | 1338.84 | 1.538 | 83.22 | 
| 512 | 128 | 32256 | 0.386 | 1327.16 | 1.578 | 81.10 | 
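For anyone who wants to summarize sweeps like the ones above, here is a minimal sketch that parses a table in this format and reports how much token-generation speed is retained at the deepest context. The helper name `parse_sweep_table` and the dictionary keys are hypothetical (they are not part of any benchmark tool), and the two data rows are simply copied from the first and last rows of the table directly above.

```python
def parse_sweep_table(markdown):
    """Parse rows of a PP/TG/N_KV sweep table into a list of dicts.

    Header and separator lines are skipped because their cells are not numeric.
    """
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        try:
            values = [float(c) for c in cells]
        except ValueError:
            continue  # header row or |---| separator
        if len(values) != 7:
            continue
        rows.append({
            "n_kv": int(values[2]),   # N_KV column
            "s_pp": values[4],        # S_PP t/s column
            "s_tg": values[6],        # S_TG t/s column
        })
    return rows


# First and last rows of the table directly above, copied verbatim.
sample = """
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---|---|---|---|---|---|---|
| 512 | 128 | 0 | 0.198 | 2588.92 | 0.982 | 130.29 |
| 512 | 128 | 32256 | 0.386 | 1327.16 | 1.578 | 81.10 |
"""

rows = parse_sweep_table(sample)
tg_retained = rows[-1]["s_tg"] / rows[0]["s_tg"]
pp_retained = rows[-1]["s_pp"] / rows[0]["s_pp"]
print(f"TG speed retained at N_KV={rows[-1]['n_kv']}: {tg_retained:.1%}")
print(f"PP speed retained at N_KV={rows[-1]['n_kv']}: {pp_retained:.1%}")
```

Pasting a full table into `sample` works the same way, since every non-numeric line is ignored.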
👈 PPL, KLD, Δp Statistics
In general, these metrics attempt to systematically measure the difference between an unquantized model and a given quantized version. Lower is generally better, as it signals that the quantized version behaves more like the original. Quantization is the process of compressing an original model's weights so that it can run on limited hardware. Ideally the process minimizes errors and preserves the original uncompressed model's performance.
Perplexity (PPL) is a metric used to evaluate how well a language model predicts text. It essentially measures how "surprised" the model is by a given text—if the model is good at predicting the next word, the perplexity is low. For example, a model that generates coherent, contextually accurate text will have lower perplexity than one that produces random or nonsensical output.
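As a rough illustration of the idea, perplexity can be computed as the exponential of the average negative log-probability that the model assigned to each token that actually occurred. The sketch below uses made-up probabilities rather than real model output, and the function name `perplexity` is just for illustration.

```python
import math


def perplexity(token_probs):
    """exp(mean negative log-likelihood) of the tokens that actually occurred."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


# Hypothetical probabilities each model gave to the correct next token.
confident = [0.90, 0.80, 0.95, 0.85]   # model is rarely "surprised"
uncertain = [0.20, 0.10, 0.25, 0.15]   # model is often "surprised"

ppl_confident = perplexity(confident)
ppl_uncertain = perplexity(uncertain)
print(f"confident model PPL: {ppl_confident:.2f}")
print(f"uncertain model PPL: {ppl_uncertain:.2f}")
```

A model that always spreads its guess evenly over four candidates gets a perplexity of 4, which is why PPL is sometimes read as an "effective number of choices" per token.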
In the context of LLM quantization (e.g., reducing model precision to save resources), perplexity is used to check if the compressed model retains its language understanding. Generally the PPL of the unquantized model is expected to be lower than the PPL of a quantized version.
However, in quantization-aware training (QAT), the model is trained to handle lower-precision weights (e.g., from bf16 to int4) during training, simulating the effects of quantization. This helps the model adapt to the reduced precision, potentially maintaining performance even after quantization.
The PPL of the unquantized bf16 model might not always be lower, because a model trained with QAT may retain performance close to the original bf16 model after quantization. If QAT is effective, the quantized model’s PPL could be similar to or even lower than the original’s, meaning the unquantized model’s PPL isn’t necessarily lower.
KL-Divergence (KLD) is a statistical measure that quantifies how different two probability distributions are. In the context of Large Language Models (LLMs) and quantization, it’s used to compare how a compressed (quantized) model differs from the original (unquantized) model in terms of their output probabilities.
If two models produce nearly identical predictions (e.g., the same probabilities for words in a sentence), their KLD is low. If their predictions diverge significantly (e.g., the quantized model chooses different words more often), the KLD is high.
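To make this concrete, here is a minimal sketch of the KL-divergence between two next-token distributions over a toy three-word vocabulary. The distributions are invented for illustration, and the `eps` guard against zero probabilities is an assumption of this sketch, not a description of how any particular tool handles them.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats, with p as the reference (unquantized) distribution."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i == 0 contribute nothing
    )


# Hypothetical next-token distributions over the same 3-token vocabulary.
base = [0.70, 0.20, 0.10]    # unquantized model
close = [0.68, 0.21, 0.11]   # quant that barely changes the prediction
far = [0.30, 0.30, 0.40]     # quant that shifts the prediction a lot

kld_close = kl_divergence(base, close)
kld_far = kl_divergence(base, far)
print(f"KLD(base || close) = {kld_close:.5f}")
print(f"KLD(base || far)   = {kld_far:.5f}")
```

In practice the value is averaged over every token position in a test corpus, so a single number summarizes how far the quant drifts from the baseline.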
Typically a very large KLD baseline data file is generated on the original (or least quantized) version of the model. This baseline is then compared against quantized versions to measure KLD as well as Δp.
Δp (Token Probability Distribution Difference) refers to the difference in token probability distributions between an unquantized (full-precision) model and a quantized model. It measures how much the probabilities assigned to individual tokens (e.g., words or subwords) change after quantization.
For example, for each token in a given input sequence, the unquantized model computes a probability distribution over the vocabulary (e.g., "the" has 10% chance, "cat" has 5%, etc.). The quantized model (e.g., IQ4_K or Q2_K_L) computes a similar distribution, but due to precision loss, the probabilities may shift. Δp is the absolute or relative difference between these two distributions for each token.
As a specific example, if the unquantized model assigns a probability of 0.2 to "cat" and the quantized model assigns 0.15, the absolute Δp for "cat" is 0.05.
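The "cat" example can be extended into a small sketch that computes Δp per token and then summarizes it over a sequence. The sign convention used here (quantized minus unquantized, so a negative value means the quant lost confidence) is an assumption of this sketch, and all probabilities apart from the 0.2 and 0.15 pair are made up.

```python
def delta_p(p_base, p_quant):
    """Signed change in one token's probability after quantization."""
    return p_quant - p_base


# Hypothetical probabilities of the same token at three positions.
base_probs = [0.20, 0.50, 0.90]    # unquantized model
quant_probs = [0.15, 0.55, 0.88]   # quantized model

deltas = [delta_p(b, q) for b, q in zip(base_probs, quant_probs)]
mean_dp = sum(deltas) / len(deltas)
rms_dp = (sum(d * d for d in deltas) / len(deltas)) ** 0.5

print("per-token Δp:", [round(d, 3) for d in deltas])
print(f"mean Δp: {mean_dp:+.4f}")
print(f"RMS Δp:  {rms_dp:.4f}")
```

The mean shows whether the quant is biased in one direction, while the RMS captures the typical size of the shift regardless of sign.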
👈 Benchmark Suites
GPQA Diamond Set: A subset of 198 high-objectivity, challenging multiple-choice questions designed for advanced testing. Difficulty aligns with college-level or higher expertise in biology, physics, and chemistry. Intended for evaluating AI systems' ability to handle complex, domain-specific tasks requiring deep knowledge and critical thinking.
MBPP (Mostly Basic Programming Problems) is a benchmark dataset designed to evaluate large language models (LLMs) on programming tasks focusing on Python code. The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, a code solution, and 3 automated test cases.
MMLU-Pro is an enhanced benchmark designed to evaluate language understanding models across broader and more challenging tasks. Building on the Massive Multitask Language Understanding (MMLU) dataset, MMLU-Pro integrates more challenging, reasoning-focused questions and increases the answer choices per question from four to ten, significantly raising the difficulty and reducing the chance of success through random guessing. MMLU-Pro comprises over 12,000 rigorously curated questions from academic exams and textbooks, spanning 14 diverse domains including Biology, Business, Chemistry, Computer Science, Economics, Engineering, Health, History, Law, Math, Philosophy, Physics, Psychology, and Others.
MT-Bench is a benchmark designed to evaluate the multi-turn conversational abilities and instruction-following skills of large language models (LLMs). Unlike traditional benchmarks that focus on closed-ended tasks (e.g., multiple-choice questions), MT-Bench emphasizes open-ended, real-world interactions to measure how well models handle complex, dynamic dialogues. Based on a detailed analysis of real multi-turn dialogue data, its authors constructed a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks.
MixEval is a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking (i.e., 0.96 correlation with Chatbot Arena) while running locally and quickly (6% the time and cost of running MMLU), with its queries being stably and effortlessly updated every month to avoid contamination.
- ik_llama.cpp Qwen3 Quants Discussion
- calibration_data_v5_rc.txt - ubergarm uses tristandruyen's imatrix calibration dataset
- wiki.test.raw.gz
- ubergarm-kld-test-corpus.txt - Private gist available upon request, provided you don't use it for training, fine-tuning, imatrix calibration, etc.
- visualization of Qwen3-30B-A3B imatrix statistics












Thank you for the benchmark. Do you also happen to test when the think mode was used? I am particularly interested to know Qwen 3 30B MoE IQ2_XS or IQ2_M for AIME '24 and '25, and also for GPQA Diamond.