@selfup
Last active May 10, 2026 13:49
Ministral 3 (3B, 8B, 14B) Instruct - M3 Ultra 96GB and M5 Max 36GB llama.cpp benchmark
# Apple Silicon and Linux x86 llama-bench: Ministral 3 family Q4_K_M depth sweep
# Tests the Ministral 3 family (3B, 8B, 14B) on Apple Silicon
# and the 3B and 8B models on Ubuntu x86 with a 3060 Ti
# Instruct models only, no reasoning variants
llama-bench \
-m ~/.lmstudio/models/lmstudio-community/Ministral-3-3B-Instruct-2512-GGUF/Ministral-3-3B-Instruct-2512-Q4_K_M.gguf \
-m ~/.lmstudio/models/lmstudio-community/Ministral-3-8B-Instruct-2512-GGUF/Ministral-3-8B-Instruct-2512-Q4_K_M.gguf \
-m ~/.lmstudio/models/lmstudio-community/Ministral-3-14B-Instruct-2512-GGUF/Ministral-3-14B-Instruct-2512-Q4_K_M.gguf \
-p 512 -n 128 -fa 1 -ngl 99 \
-d 0,4096,8192,16384 \
-o md > local-ministral-q4km-x_y_z.md 2>/dev/null
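
The command above writes a markdown table to a file. As a minimal sketch for pulling out just the model, test, and throughput columns, the helper below may be useful. `extract_tps` is a hypothetical name, not part of llama.cpp, and it assumes the table layout shown in the results here: `model` is the first column and `test` and `t/s` are the last two.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp): print "model<TAB>test<TAB>t/s"
# for every data row of a llama-bench markdown table.
# Assumption: "model" is the first column, "test" and "t/s" are the last two.
extract_tps() {
  awk -F'|' '
    # Keep rows that start with "|" and are neither the separator nor the header.
    /^\|/ && $2 !~ /^ *:?-+:? *$/ && $2 !~ /^ *model *$/ {
      model = $2; tname = $(NF-2); tps = $(NF-1)
      gsub(/^ +| +$/, "", model)   # trim padding around the model name
      gsub(/^ +| +$/, "", tname)   # trim padding around the test name
      split(tps, parts, " ")       # parts[1] is the mean, dropping the spread
      printf "%s\t%s\t%s\n", model, tname, parts[1]
    }' "$@"
}
```

Usage: `extract_tps local-ministral-q4km-x_y_z.md`, or pipe a table into it on stdin.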

selfup commented May 9, 2026

M5 Max 36GB (binned) (3B, 8B, and 14B)

| model | size | params | backend | threads | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 6 | 1 | pp512 | 4819.75 ± 12.18 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 6 | 1 | tg128 | 144.31 ± 0.10 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 6 | 1 | pp512 @ d4096 | 2521.25 ± 3.78 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 6 | 1 | tg128 @ d4096 | 124.80 ± 1.25 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 6 | 1 | pp512 @ d8192 | 1669.45 ± 29.63 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 6 | 1 | tg128 @ d8192 | 110.67 ± 0.06 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 6 | 1 | pp512 @ d16384 | 956.26 ± 13.74 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 6 | 1 | tg128 @ d16384 | 89.22 ± 0.22 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 6 | 1 | pp512 | 2138.77 ± 20.24 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 6 | 1 | tg128 | 71.55 ± 0.01 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 6 | 1 | pp512 @ d4096 | 1407.99 ± 3.10 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 6 | 1 | tg128 @ d4096 | 65.19 ± 0.02 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 6 | 1 | pp512 @ d8192 | 1034.77 ± 1.58 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 6 | 1 | tg128 @ d8192 | 59.84 ± 0.02 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 6 | 1 | pp512 @ d16384 | 645.73 ± 3.25 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 6 | 1 | tg128 @ d16384 | 51.34 ± 0.07 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 6 | 1 | pp512 | 1354.15 ± 2.06 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 6 | 1 | tg128 | 46.76 ± 0.13 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 6 | 1 | pp512 @ d4096 | 971.73 ± 0.51 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 6 | 1 | tg128 @ d4096 | 43.59 ± 0.01 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 6 | 1 | pp512 @ d8192 | 732.74 ± 2.31 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 6 | 1 | tg128 @ d8192 | 40.70 ± 0.02 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 6 | 1 | pp512 @ d16384 | 490.21 ± 1.30 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 6 | 1 | tg128 @ d16384 | 35.97 ± 0.02 |


selfup commented May 9, 2026

M3 Ultra 96GB (3B, 8B, and 14B)

| model | size | params | backend | threads | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | pp512 | 2416.50 ± 2.82 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | tg128 | 155.32 ± 0.26 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | pp512 @ d4096 | 1779.07 ± 4.11 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | tg128 @ d4096 | 137.48 ± 1.98 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | pp512 @ d8192 | 1408.60 ± 2.96 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | tg128 @ d8192 | 126.19 ± 0.18 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | pp512 @ d16384 | 988.50 ± 1.77 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | BLAS,MTL | 20 | 1 | tg128 @ d16384 | 106.02 ± 0.18 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | pp512 | 1056.49 ± 0.53 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | tg128 | 86.99 ± 0.22 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | pp512 @ d4096 | 877.38 ± 0.98 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | tg128 @ d4096 | 80.11 ± 0.07 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | pp512 @ d8192 | 750.03 ± 0.65 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | tg128 @ d8192 | 74.40 ± 0.13 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | pp512 @ d16384 | 579.46 ± 0.38 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | BLAS,MTL | 20 | 1 | tg128 @ d16384 | 65.04 ± 0.09 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | pp512 | 653.67 ± 0.82 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | tg128 | 59.99 ± 0.06 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | pp512 @ d4096 | 570.07 ± 0.75 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | tg128 @ d4096 | 56.24 ± 0.05 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | pp512 @ d8192 | 504.48 ± 0.37 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | tg128 @ d8192 | 52.79 ± 0.03 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | pp512 @ d16384 | 409.19 ± 0.28 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | BLAS,MTL | 20 | 1 | tg128 @ d16384 | 47.31 ± 0.06 |


selfup commented May 10, 2026

Ubuntu 3800x 32GB 3060ti (3B and 8B)

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 | 5390.67 ± 69.51 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 | 139.09 ± 2.12 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d4096 | 4318.27 ± 36.41 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d4096 | 119.71 ± 1.62 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d8192 | 3475.10 ± 307.83 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d8192 | 106.04 ± 2.56 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d16384 | 2612.65 ± 144.04 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d16384 | 87.11 ± 1.15 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 | 2439.48 ± 192.02 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 | 68.64 ± 1.31 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d4096 | 2112.40 ± 180.81 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d4096 | 61.83 ± 0.70 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d8192 | 1849.18 ± 124.90 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d8192 | 56.62 ± 0.76 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d16384 | 1494.13 ± 107.49 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d16384 | 49.11 ± 0.63 |
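
One way to read the depth sweeps is as the share of depth-0 throughput that survives at d16384. The sketch below is only an illustration of that arithmetic: `retention` is a hypothetical helper name, and the inputs are the 3B tg128 means copied from the three result tables (the ± spread is ignored).

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of baseline throughput retained at depth.
#   $1 = t/s at depth 0, $2 = t/s at the deeper context
retention() {
  awk -v base="$1" -v deep="$2" 'BEGIN { printf "%.1f\n", 100 * deep / base }'
}

# 3B tg128, depth 0 vs d16384, means taken from the tables above
retention 144.31 89.22    # M5 Max 36GB      -> 61.8
retention 155.32 106.02   # M3 Ultra 96GB    -> 68.3
retention 139.09 87.11    # 3800x + 3060ti   -> 62.6
```

The same function works for the pp512 rows or the 8B and 14B models by swapping in the corresponding means.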
