Last active
May 10, 2026 13:49
Ministral 3 (3B, 8B, 14B) Instruct - M3 Ultra 96GB and M5 Max 36GB llama.cpp benchmark
# Apple Silicon and Linux x86 llama-bench: Ministral 3 family Q4_K_M depth sweep
# Tests the Ministral 3 family (Ministral-3B, Ministral-8B, Ministral-14B) on Apple Silicon
# and the 3B and 8B models on Ubuntu x86 with a 3060 Ti.
# Instruct variants only, no reasoning.
llama-bench \
  -m ~/.lmstudio/models/lmstudio-community/Ministral-3-3B-Instruct-2512-GGUF/Ministral-3-3B-Instruct-2512-Q4_K_M.gguf \
  -m ~/.lmstudio/models/lmstudio-community/Ministral-3-8B-Instruct-2512-GGUF/Ministral-3-8B-Instruct-2512-Q4_K_M.gguf \
  -m ~/.lmstudio/models/lmstudio-community/Ministral-3-14B-Instruct-2512-GGUF/Ministral-3-14B-Instruct-2512-Q4_K_M.gguf \
  -p 512 -n 128 -fa 1 -ngl 99 \
  -d 0,4096,8192,16384 \
  -o md > local-ministral-q4km-x_y_z.md 2>/dev/null
M3 Ultra 96GB (3B, 8B, and 14B)
| model | size | params | backend | threads | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | pp512 | 2416.50 ± 2.82 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | tg128 | 155.32 ± 0.26 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | pp512 @ d4096 | 1779.07 ± 4.11 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | tg128 @ d4096 | 137.48 ± 1.98 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | pp512 @ d8192 | 1408.60 ± 2.96 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | tg128 @ d8192 | 126.19 ± 0.18 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | pp512 @ d16384 | 988.50 ± 1.77 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | tg128 @ d16384 | 106.02 ± 0.18 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | pp512 | 1056.49 ± 0.53 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | tg128 | 86.99 ± 0.22 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | pp512 @ d4096 | 877.38 ± 0.98 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | tg128 @ d4096 | 80.11 ± 0.07 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | pp512 @ d8192 | 750.03 ± 0.65 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | tg128 @ d8192 | 74.40 ± 0.13 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | pp512 @ d16384 | 579.46 ± 0.38 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | tg128 @ d16384 | 65.04 ± 0.09 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | pp512 | 653.67 ± 0.82 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | tg128 | 59.99 ± 0.06 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | pp512 @ d4096 | 570.07 ± 0.75 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | tg128 @ d4096 | 56.24 ± 0.05 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | pp512 @ d8192 | 504.48 ± 0.37 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | tg128 @ d8192 | 52.79 ± 0.03 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | pp512 @ d16384 | 409.19 ± 0.28 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | tg128 @ d16384 | 47.31 ± 0.06 |
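A depth sweep like this is easier to compare across machines as "throughput retained relative to depth 0" than as raw t/s. The snippet below is a minimal sketch of that calculation, assuming the markdown layout shown in these tables (test name in the second-to-last column, `mean ± stddev` in the last, and each model's depth-0 rows listed before its depth rows). `depth_retention` is a hypothetical helper name, not part of llama.cpp.

```shell
# Hypothetical helper (not part of llama.cpp): for every depth row in a
# llama-bench markdown table, print t/s as a percentage of the depth-0
# row for the same model and test.
depth_retention() {
  awk -F'|' '
    # Skip blank lines, the header, the separator, and any trailing
    # non-table text: real data rows carry digits in the t/s column.
    NF < 4 || $(NF-1) !~ /[0-9]/ { next }
    {
      model = $2
      test  = $(NF-2)
      tps   = $(NF-1) + 0          # numeric prefix only, drops the stddev part
      gsub(/^ +| +$/, "", model)
      gsub(/^ +| +$/, "", test)

      # "pp512 @ d4096" splits into kind "pp512" and depth "4096";
      # a plain "pp512" has no depth and is the baseline.
      n = split(test, parts, " @ d")
      key = model SUBSEP parts[1]
      if (n == 1) {
        base[key] = tps
      } else if ((key in base) && base[key] > 0) {
        printf "%s\t%s\td%s\t%.1f%%\n", model, parts[1], parts[2], 100 * tps / base[key]
      }
    }
  ' "$1"
}
```

Usage: `depth_retention local-ministral-q4km-x_y_z.md`. Run against the M3 Ultra table above, the 3B rows at d16384 work out to roughly 40.9% for pp512 and 68.3% for tg128, so prompt processing loses far more to context depth than token generation does.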
Ubuntu 3800X 32GB 3060 Ti (3B and 8B)
| model | size | params | backend | ngl | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 | 5390.67 ± 69.51 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 | 139.09 ± 2.12 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d4096 | 4318.27 ± 36.41 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d4096 | 119.71 ± 1.62 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d8192 | 3475.10 ± 307.83 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d8192 | 106.04 ± 2.56 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d16384 | 2612.65 ± 144.04 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d16384 | 87.11 ± 1.15 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 | 2439.48 ± 192.02 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 | 68.64 ± 1.31 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d4096 | 2112.40 ± 180.81 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d4096 | 61.83 ± 0.70 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d8192 | 1849.18 ± 124.90 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d8192 | 56.62 ± 0.76 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d16384 | 1494.13 ± 107.49 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d16384 | 49.11 ± 0.63 |
M5 Max 36GB (binned) (3B, 8B, and 14B)