@HDCharles
Created May 8, 2026 04:23
Obs Refactor Eval Results

NVFP4 Evals on B200

model                          scheme     technique    task                         main             PR       change
--------------------------------------------------------------------------------------------------------------------
Meta-Llama-3-8B-Instruct       NVFP4      awq_rtn      gsm8k_platinum               71.46%           69.89%       -2.20%
Qwen2.5-3B-Instruct            NVFP4      awq_rtn      gsm8k_platinum               23.33%           29.61%      +26.92% volatile (Qwen2.5-3B seems to be all over the place for gsm8k_plat)
Qwen3-30B-A3B                  NVFP4      awq_rtn      gsm8k_platinum               92.31%           91.32%       -1.07%

Meta-Llama-3-8B-Instruct       NVFP4      awq_rtn      mmlu                       63.55%**           63.46%       -0.14%
Qwen2.5-3B-Instruct            NVFP4      awq_rtn      mmlu                       63.12%**         63.34%**       +0.35%
Qwen3-30B-A3B                  NVFP4      awq_rtn      mmlu                         78.46%           78.39%       -0.09%

Meta-Llama-3-8B-Instruct       NVFP4      awq_rtn      wikitext                      20.26            18.86       +6.91% 
Qwen2.5-3B-Instruct            NVFP4      awq_rtn      wikitext                    12.84**          12.83**       +0.08%
Qwen3-30B-A3B                  NVFP4      awq_rtn      wikitext                    12.05**          12.03**       +0.17% 

# awq_rtn: overall slightly positive

Meta-Llama-3-8B-Instruct       NVFP4      gptq         gsm8k_platinum             72.37%**         72.54%**       +0.23%
Qwen2.5-3B-Instruct            NVFP4      gptq         gsm8k_platinum             54.67%**           49.88%       -8.76% volatile (Qwen2.5-3B seems to be all over the place for gsm8k_plat)
Qwen3-30B-A3B                  NVFP4      gptq         gsm8k_platinum               92.31%           92.22%       -0.10%

Meta-Llama-3-8B-Instruct       NVFP4      gptq         mmlu                         62.97%           62.88%       -0.14% fine
Qwen2.5-3B-Instruct            NVFP4      gptq         mmlu                         62.93%           63.15%       +0.35%
Qwen3-30B-A3B                  NVFP4      gptq         mmlu                       78.61%**           78.37%       -0.31%

Meta-Llama-3-8B-Instruct       NVFP4      gptq         wikitext                    1021.43          1442.89      -41.26% unexplained; main is already far off here (other NVFP4 Llama-3-8B wikitext rows are roughly 18 to 21). I reran this multiple times with and without the chat template; other evals look fine for this model
Qwen2.5-3B-Instruct            NVFP4      gptq         wikitext                      12.97            13.01       -0.31%
Qwen3-30B-A3B                  NVFP4      gptq         wikitext                      12.10            12.14       -0.33%

# gptq: overall slightly negative, seems OK

Meta-Llama-3-8B-Instruct       NVFP4      imatrix      gsm8k_platinum                0.00%           71.38%          N/A (bug main - fixed)
Qwen2.5-3B-Instruct            NVFP4      imatrix      gsm8k_platinum                0.00%         52.44%**          N/A (bug main - fixed)
Qwen3-30B-A3B                  NVFP4      imatrix      gsm8k_platinum               89.33%         92.97%**       +4.07% (bug main - fixed)
Meta-Llama-3-8B-Instruct       NVFP4      imatrix      mmlu                         25.37%         63.67%**     +150.97% (bug main - fixed)
Qwen2.5-3B-Instruct            NVFP4      imatrix      mmlu                         24.25%           62.26%     +156.74% (bug main - fixed)
Qwen3-30B-A3B                  NVFP4      imatrix      mmlu                         77.44%           78.36%       +1.19% (bug main - fixed)
Meta-Llama-3-8B-Instruct       NVFP4      imatrix      wikitext                   86574.63            18.47      +99.98% (bug main - fixed)
Qwen2.5-3B-Instruct            NVFP4      imatrix      wikitext                     610.98            13.28      +97.83% (bug main - fixed)
Qwen3-30B-A3B                  NVFP4      imatrix      wikitext                      13.20          12.03**       +8.86% (bug main - fixed)

# imatrix: bugfix. I don't really understand why the Qwen3-30B-A3B results look reasonable on main, but either way this is fine on the PR

Meta-Llama-3-8B-Instruct       NVFP4      rtn_mse      gsm8k_platinum               71.55%           72.04%       +0.68%
Qwen2.5-3B-Instruct            NVFP4      rtn_mse      gsm8k_platinum               29.45%           48.30%      +64.01% volatile (Qwen2.5-3B seems to be all over the place for gsm8k_plat)
Qwen3-30B-A3B                  NVFP4      rtn_mse      gsm8k_platinum             93.71%**           92.22%       -1.59%
Meta-Llama-3-8B-Instruct       NVFP4      rtn_mse      mmlu                         63.20%           63.38%       +0.28%
Qwen2.5-3B-Instruct            NVFP4      rtn_mse      mmlu                         62.42%           62.59%       +0.27%
Qwen3-30B-A3B                  NVFP4      rtn_mse      mmlu                         78.36%         78.61%**       +0.32%
Meta-Llama-3-8B-Instruct       NVFP4      rtn_mse      wikitext                      20.52            18.09      +11.84%
Qwen2.5-3B-Instruct            NVFP4      rtn_mse      wikitext                      13.16            13.25       -0.68%
Qwen3-30B-A3B                  NVFP4      rtn_mse      wikitext                      12.17            12.23       -0.49%

# rtn_mse: looks good. Note this implies that MSE ignoring global scale works well, and also that ignoring quant for static activation global_scale looks fine

Meta-Llama-3-8B-Instruct       NVFP4      rtn          gsm8k_platinum               70.64%           70.80%       +0.23%
Qwen2.5-3B-Instruct            NVFP4      rtn          gsm8k_platinum               52.85%           50.12%       -5.17% volatile (Qwen2.5-3B seems to be all over the place for gsm8k_plat)
Qwen3-30B-A3B                  NVFP4      rtn          gsm8k_platinum               92.22%           92.64%       +0.46%
Meta-Llama-3-8B-Instruct       NVFP4      rtn          mmlu                         63.25%           63.38%       +0.21%
Qwen2.5-3B-Instruct            NVFP4      rtn          mmlu                         62.78%           62.59%       -0.30%
Qwen3-30B-A3B                  NVFP4      rtn          mmlu                         78.26%           78.46%       +0.26%
Meta-Llama-3-8B-Instruct       NVFP4      rtn          wikitext                    19.53**          17.82**       +8.76%
Qwen2.5-3B-Instruct            NVFP4      rtn          wikitext                      13.27            13.27       +0.00%
Qwen3-30B-A3B                  NVFP4      rtn          wikitext                      12.12            12.16       -0.33%

# ignoring quant for static activation global_scale seems fine, potentially helpful
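
The change column in these tables is consistent with a relative change against main, signed so that a positive value means the PR is better: (PR - main) / main for the accuracy tasks and (main - PR) / main for wikitext perplexity, where lower is better. A minimal sketch of that arithmetic follows; the function name and the `lower_is_better` flag are illustrative assumptions, not part of the eval tooling.

```python
from typing import Optional


def relative_change(main: float, pr: float, lower_is_better: bool = False) -> Optional[float]:
    """Percent change from main to PR, signed so that positive means the PR is better.

    Returns None when main is 0, which the tables report as N/A.
    """
    if main == 0:
        return None
    # For perplexity (lower is better) the sign is flipped so a drop counts as a gain.
    delta = (main - pr) if lower_is_better else (pr - main)
    return 100.0 * delta / main


def fmt(change: Optional[float]) -> str:
    """Format a change the way the tables do."""
    return "N/A" if change is None else f"{change:+.2f}%"


# Accuracy row: Meta-Llama-3-8B-Instruct, NVFP4, awq_rtn, gsm8k_platinum
print(fmt(relative_change(71.46, 69.89)))                        # -2.20%
# Perplexity row: Meta-Llama-3-8B-Instruct, NVFP4, awq_rtn, wikitext
print(fmt(relative_change(20.26, 18.86, lower_is_better=True)))  # +6.91%
# imatrix on main scored 0.00%, so the change is undefined
print(fmt(relative_change(0.0, 71.38)))                          # N/A
```

Because the change is relative to main, a near-zero main score (as in the imatrix rows) produces very large percentages that say more about the broken baseline than about the PR.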

FP8/W4A16 Evals on H200


# note: some of the FP8 runs for Qwen2.5-3B-Instruct failed because someone else started using the GPUs I was running on, so those rows are missing
# also, gsm8k_platinum for Qwen2.5-3B seems to be very volatile
model                          scheme     technique    task                         main             PR       change
--------------------------------------------------------------------------------------------------------------------
Meta-Llama-3-8B-Instruct       FP8        awq_rtn      gsm8k_platinum             76.59%         78.00%       +1.84%
Qwen3-30B-A3B                  FP8        awq_rtn      gsm8k_platinum             93.22%         93.13%       -0.10%
Meta-Llama-3-8B-Instruct       FP8        awq_rtn      mmlu                       65.62%         65.80%       +0.27%
Qwen3-30B-A3B                  FP8        awq_rtn      mmlu                       79.60%         79.63%       +0.04%
Meta-Llama-3-8B-Instruct       FP8        awq_rtn      wikitext                    19.42          18.72       +3.60%
Qwen3-30B-A3B                  FP8        awq_rtn      wikitext                    11.70          11.64       +0.51%

# small improvement

Meta-Llama-3-8B-Instruct       W4A16      awq_rtn      gsm8k_platinum             72.62%         71.96%       -0.91% biggest non-volatile drop in this table
Qwen2.5-3B-Instruct            W4A16      awq_rtn      gsm8k_platinum             21.84%         20.68%       -5.31% volatile
Qwen3-30B-A3B                  W4A16      awq_rtn      gsm8k_platinum             91.73%         91.89%       +0.17%
Meta-Llama-3-8B-Instruct       W4A16      awq_rtn      mmlu                       64.07%         64.28%       +0.33%
Qwen2.5-3B-Instruct            W4A16      awq_rtn      mmlu                       64.49%         64.35%       -0.22%
Qwen3-30B-A3B                  W4A16      awq_rtn      mmlu                       78.79%         78.65%       -0.18%
Meta-Llama-3-8B-Instruct       W4A16      awq_rtn      wikitext                    11.50          11.50       +0.00%
Qwen2.5-3B-Instruct            W4A16      awq_rtn      wikitext                    12.78          12.78       +0.00%
Qwen3-30B-A3B                  W4A16      awq_rtn      wikitext                    11.98          11.99       -0.08%

# small drop

Meta-Llama-3-8B-Instruct       FP8        gptq         gsm8k_platinum             77.25%         77.92%       +0.87%
Qwen3-30B-A3B                  FP8        gptq         gsm8k_platinum             93.30%         92.64%       -0.71%
Meta-Llama-3-8B-Instruct       FP8        gptq         mmlu                       66.12%         65.82%       -0.45%
Qwen3-30B-A3B                  FP8        gptq         mmlu                       79.42%         79.36%       -0.08%
Meta-Llama-3-8B-Instruct       FP8        gptq         wikitext                    18.82          18.91       -0.48%
Qwen3-30B-A3B                  FP8        gptq         wikitext                    11.64          11.61       +0.26%

# steady

Meta-Llama-3-8B-Instruct       W4A16      gptq         gsm8k_platinum             74.03%         74.11%       +0.11%
Qwen2.5-3B-Instruct            W4A16      gptq         gsm8k_platinum             35.81%         47.56%      +32.81% volatile
Qwen3-30B-A3B                  W4A16      gptq         gsm8k_platinum             90.90%         92.14%       +1.36%
Meta-Llama-3-8B-Instruct       W4A16      gptq         mmlu                       64.75%         64.55%       -0.31%
Qwen2.5-3B-Instruct            W4A16      gptq         mmlu                       64.06%         64.28%       +0.34%
Qwen3-30B-A3B                  W4A16      gptq         mmlu                       78.88%         79.06%       +0.23%
Meta-Llama-3-8B-Instruct       W4A16      gptq         wikitext                    11.39          11.49       -0.88%
Qwen2.5-3B-Instruct            W4A16      gptq         wikitext                    12.55          12.54       +0.08%
Qwen3-30B-A3B                  W4A16      gptq         wikitext                    11.87          11.92       -0.42%

# steady aside from the volatile one

Meta-Llama-3-8B-Instruct       FP8        imatrix      gsm8k_platinum             76.26%         75.77%       -0.64%
Qwen2.5-3B-Instruct            FP8        imatrix      gsm8k_platinum             19.69%         19.93%       +1.22% volatile
Qwen3-30B-A3B                  FP8        imatrix      gsm8k_platinum             93.47%         93.47%       +0.00%
Meta-Llama-3-8B-Instruct       FP8        imatrix      mmlu                       65.81%         65.74%       -0.11%
Qwen2.5-3B-Instruct            FP8        imatrix      mmlu                       65.97%         65.90%       -0.11%
Qwen3-30B-A3B                  FP8        imatrix      mmlu                       79.45%         79.51%       +0.08%
Meta-Llama-3-8B-Instruct       FP8        imatrix      wikitext                    19.69          18.99       +3.56%
Qwen2.5-3B-Instruct            FP8        imatrix      wikitext                    11.80          11.80       +0.00%
Qwen3-30B-A3B                  FP8        imatrix      wikitext                    11.63          11.59       +0.34%

# small improvement

Meta-Llama-3-8B-Instruct       W4A16      imatrix      gsm8k_platinum             72.54%         72.29%       -0.34%
Qwen2.5-3B-Instruct            W4A16      imatrix      gsm8k_platinum             43.09%         42.85%       -0.56% volatile
Qwen3-30B-A3B                  W4A16      imatrix      gsm8k_platinum             90.65%         91.48%       +0.92%
Meta-Llama-3-8B-Instruct       W4A16      imatrix      mmlu                       64.66%         64.67%       +0.02%
Qwen2.5-3B-Instruct            W4A16      imatrix      mmlu                       64.00%         64.01%       +0.02%
Qwen3-30B-A3B                  W4A16      imatrix      mmlu                       78.47%         78.48%       +0.01%
Meta-Llama-3-8B-Instruct       W4A16      imatrix      wikitext                    11.45          11.45       +0.00%
Qwen2.5-3B-Instruct            W4A16      imatrix      wikitext                    12.66          12.66       +0.00%
Qwen3-30B-A3B                  W4A16      imatrix      wikitext                    12.08          12.08       +0.00%

# steady

Meta-Llama-3-8B-Instruct       FP8        rtn_mse      gsm8k_platinum             78.08%         78.66%       +0.74%
Qwen3-30B-A3B                  FP8        rtn_mse      gsm8k_platinum             93.63%         93.55%       -0.09%
Meta-Llama-3-8B-Instruct       FP8        rtn_mse      mmlu                       65.59%         65.80%       +0.32%
Qwen3-30B-A3B                  FP8        rtn_mse      mmlu                       79.52%         79.55%       +0.04%
Meta-Llama-3-8B-Instruct       FP8        rtn_mse      wikitext                    19.17          18.72       +2.35%
Qwen2.5-3B-Instruct            FP8        rtn_mse      wikitext                    11.78          11.77       +0.08%
Qwen3-30B-A3B                  FP8        rtn_mse      wikitext                    11.60          11.59       +0.09%

# small improvement

Meta-Llama-3-8B-Instruct       W4A16      rtn_mse      gsm8k_platinum             68.40%         67.99%       -0.60%
Qwen2.5-3B-Instruct            W4A16      rtn_mse      gsm8k_platinum             55.00%         52.69%       -4.20% volatile
Qwen3-30B-A3B                  W4A16      rtn_mse      gsm8k_platinum             90.90%         90.32%       -0.64%
Meta-Llama-3-8B-Instruct       W4A16      rtn_mse      wikitext                    11.63          11.63       +0.00%
Qwen2.5-3B-Instruct            W4A16      rtn_mse      wikitext                    14.13          14.13       +0.00%
Qwen3-30B-A3B                  W4A16      rtn_mse      wikitext                    12.24          12.23       +0.08%
Meta-Llama-3-8B-Instruct       W4A16      rtn_mse      mmlu                       63.20%         63.22%       +0.03%
Qwen2.5-3B-Instruct            W4A16      rtn_mse      mmlu                       62.53%         62.56%       +0.05%
Qwen3-30B-A3B                  W4A16      rtn_mse      mmlu                       78.14%         78.10%       -0.05%

# steady

Meta-Llama-3-8B-Instruct       FP8        rtn          gsm8k_platinum             77.42%         77.50%       +0.10%
Qwen3-30B-A3B                  FP8        rtn          gsm8k_platinum             92.64%         92.22%       -0.45%
Meta-Llama-3-8B-Instruct       FP8        rtn          mmlu                       65.80%         65.93%       +0.20%
Qwen3-30B-A3B                  FP8        rtn          mmlu                       79.48%         79.35%       -0.16%
Meta-Llama-3-8B-Instruct       FP8        rtn          wikitext                    18.99          18.38       +3.21%
Qwen3-30B-A3B                  FP8        rtn          wikitext                    11.58          11.59       -0.09%

# small improvement

Meta-Llama-3-8B-Instruct       W4A16      rtn          gsm8k_platinum             69.89%         70.89%       +1.43%
Qwen2.5-3B-Instruct            W4A16      rtn          gsm8k_platinum             46.98%         46.48%       -1.06% volatile
Qwen3-30B-A3B                  W4A16      rtn          gsm8k_platinum             90.57%         91.07%       +0.55%
Meta-Llama-3-8B-Instruct       W4A16      rtn          mmlu                       63.45%         63.55%       +0.16%
Qwen2.5-3B-Instruct            W4A16      rtn          mmlu                       58.08%         58.08%       +0.00%
Qwen3-30B-A3B                  W4A16      rtn          mmlu                       78.43%         78.35%       -0.10%
Meta-Llama-3-8B-Instruct       W4A16      rtn          wikitext                    11.67          11.67       +0.00%
Qwen2.5-3B-Instruct            W4A16      rtn          wikitext                    16.25          16.26       -0.06%
Qwen3-30B-A3B                  W4A16      rtn          wikitext                    11.99          11.99       +0.00%

# steady
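
One rough way to sanity-check the per-group verdicts ("small improvement", "steady", "small drop") is to parse the rows and average the change column per (scheme, technique), leaving out rows marked volatile or N/A. The sketch below assumes the column layout used in these tables; the regex and the `summarize` helper are hypothetical, and an unweighted mean across accuracy and perplexity tasks is a crude summary, not necessarily how the verdicts above were reached.

```python
import re
from collections import defaultdict
from statistics import mean

# Row pattern inferred from the table layout: four name columns, two scores
# (optional "%" and "**" suffixes), then a signed change or "N/A" and a free-form note.
ROW = re.compile(
    r"^(?P<model>\S+)\s+(?P<scheme>\S+)\s+(?P<technique>\S+)\s+(?P<task>\S+)\s+"
    r"(?P<main>[\d.]+)%?(?:\*\*)?\s+(?P<pr>[\d.]+)%?(?:\*\*)?\s+"
    r"(?P<change>[+-][\d.]+%|N/A)\s*(?P<note>.*)$"
)


def summarize(lines):
    """Mean signed change per (scheme, technique), skipping volatile and N/A rows."""
    groups = defaultdict(list)
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:
            continue  # header, separator, blank, or comment line
        if m["change"] == "N/A" or "volatile" in m["note"]:
            continue
        groups[(m["scheme"], m["technique"])].append(float(m["change"].rstrip("%")))
    return {key: round(mean(vals), 2) for key, vals in groups.items()}


# A few rows copied from the table above, plus a comment line that is ignored.
SAMPLE = """\
Meta-Llama-3-8B-Instruct       FP8        awq_rtn      gsm8k_platinum             76.59%         78.00%       +1.84%
Qwen3-30B-A3B                  FP8        awq_rtn      gsm8k_platinum             93.22%         93.13%       -0.10%
Meta-Llama-3-8B-Instruct       W4A16      awq_rtn      gsm8k_platinum             72.62%         71.96%       -0.91% biggest real drop
Qwen2.5-3B-Instruct            W4A16      awq_rtn      gsm8k_platinum             21.84%         20.68%       -5.31% volatile
# small drop
""".splitlines()

print(summarize(SAMPLE))
# {('FP8', 'awq_rtn'): 0.87, ('W4A16', 'awq_rtn'): -0.91}
```

Filtering on the word "volatile" in the note column mirrors how the Qwen2.5-3B gsm8k_platinum rows are discounted in the comments.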

DDP2 NVFP4 tests on B200



model                          scheme     technique    task                         main             PR       change
--------------------------------------------------------------------------------------------------------------------
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      awq_rtn      gsm8k_platinum             73.53%**           71.55%       -2.69%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      awq_rtn      mmlu                         63.46%           63.21%       -0.39%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      awq_rtn      wikitext                      20.10            18.92       +5.87%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      gptq         gsm8k_platinum               71.88%           70.22%       -2.31%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      gptq         mmlu                       63.60%**         63.99%**       +0.61%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      gptq         wikitext                    18.36**            18.20       +0.87%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      imatrix      gsm8k_platinum                0.00%         71.96%**          N/A
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      imatrix      mmlu                         23.79%           63.50%     +166.92%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      imatrix      wikitext                   74398.66            18.40      +99.98%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      rtn          gsm8k_platinum               71.38%           70.97%       -0.57%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      rtn          mmlu                         62.87%           63.09%       +0.35%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      rtn          wikitext                      18.96          17.85**       +5.85%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      rtn_mse      gsm8k_platinum               71.05%           70.89%       -0.23%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      rtn_mse      mmlu                         63.32%           63.12%       -0.32%
Meta-Llama-3-8B-Instruct-DDP2  NVFP4      rtn_mse      wikitext                      21.24            18.10      +14.78%

Seems OK: a little volatile, but it indicates everything is working more or less as expected.
