@selfup
Last active May 11, 2026 22:48
Gemma 4 (E2B and E4B) - M5 Max 36GB (binned) and Ubuntu 3800x 3060ti llama.cpp benchmark
# Apple Silicon llama-bench: Gemma 4 (E2B and E4B) Q4_K_M depth sweep
# Testing for Gemma 4 (E2B and E4B) on Apple Silicon
llama-bench \
-m ~/.lmstudio/models/lmstudio-community/gemma-4-E2B-it-GGUF/gemma-4-E2B-it-Q4_K_M.gguf \
-m ~/.lmstudio/models/lmstudio-community/gemma-4-E4B-it-GGUF/gemma-4-E4B-it-Q4_K_M.gguf \
-p 512 -n 128 -fa 1 -ngl 99 \
-d 0,4096,8192,16384 \
-o md > local-gemma-q4km-m5_max_36gb.md 2>/dev/null
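
The redirect above leaves a markdown table in `local-gemma-q4km-m5_max_36gb.md`. A minimal sketch for pulling just the test name and mean t/s out of that file, assuming the usual llama-bench markdown layout where `test` and `t/s` are the last two columns (`extract_tps` is a hypothetical helper name, not part of llama.cpp):

```shell
#!/bin/sh
# Hypothetical helper: print "test<TAB>mean t/s" for each data row of a
# llama-bench markdown table. Assumes test and t/s are the last two columns.
extract_tps() {
  awk -F'|' '
    NF >= 3 {
      test = $(NF-2); tps = $(NF-1)
      gsub(/^ +| +$/, "", test)          # trim cell padding
      gsub(/^ +| +$/, "", tps)
      if (tps ~ /^[0-9]/) {              # skips the header and separator rows
        split(tps, parts, " ")           # "6407.09 ± 13.29" -> mean is parts[1]
        print test "\t" parts[1]
      }
    }' "$1"
}

# Only runs when the benchmark output from the command above exists.
if [ -f local-gemma-q4km-m5_max_36gb.md ]; then
  extract_tps local-gemma-q4km-m5_max_36gb.md
fi
```

Reading the columns from the right means the same helper should work whether the table has a `threads` column or an `ngl` column.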
selfup commented May 9, 2026

| model | size | params | backend | threads | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | pp512 | 6407.09 ± 13.29 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | tg128 | 156.23 ± 0.22 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | pp512 @ d4096 | 4467.20 ± 18.36 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | tg128 @ d4096 | 147.44 ± 4.51 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | pp512 @ d8192 | 3540.88 ± 16.37 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | tg128 @ d8192 | 139.93 ± 1.04 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | pp512 @ d16384 | 2456.58 ± 6.16 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | tg128 @ d16384 | 130.72 ± 0.10 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | pp512 | 3486.18 ± 13.98 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | tg128 | 93.63 ± 0.03 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | pp512 @ d4096 | 2829.55 ± 13.31 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | tg128 @ d4096 | 89.22 ± 1.64 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | pp512 @ d8192 | 2404.37 ± 7.81 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | tg128 @ d8192 | 86.82 ± 0.07 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | pp512 @ d16384 | 1792.39 ± 11.49 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | tg128 @ d16384 | 81.57 ± 0.05 |
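
To make the depth sweep easier to read, here is a rough sketch that expresses each deeper run as a percentage of the depth-0 throughput. The `retained` function is a hypothetical helper, and the numbers are the mean t/s values copied from the E2B pp512 rows of the M5 Max results:

```shell
#!/bin/sh
# Hypothetical helper: percent of depth-0 throughput retained at a deeper context.
# Usage: retained BASE_TPS DEEP_TPS
retained() {
  awk -v base="$1" -v deep="$2" 'BEGIN { printf "%.1f\n", 100 * deep / base }'
}

# E2B pp512 on the M5 Max, mean t/s only (the ± spread is ignored here).
retained 6407.09 4467.20   # d4096
retained 6407.09 3540.88   # d8192
retained 6407.09 2456.58   # d16384
```

Running the same helper on the tg128 rows (for example `retained 156.23 130.72`) suggests token generation falls off much more gently with depth than prompt processing does on this machine.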

selfup commented May 11, 2026


Ubuntu, Ryzen 7 3800X, 32GB RAM, RTX 3060 Ti

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | pp512 | 6044.38 ± 382.66 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | tg128 | 159.16 ± 1.67 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | pp512 @ d4096 | 5289.56 ± 100.77 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | tg128 @ d4096 | 154.08 ± 0.24 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | pp512 @ d8192 | 4698.99 ± 75.61 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | tg128 @ d8192 | 150.27 ± 0.16 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | pp512 @ d16384 | 3849.82 ± 52.89 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | tg128 @ d16384 | 144.19 ± 0.25 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | pp512 | 3538.31 ± 150.44 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | tg128 | 94.83 ± 0.10 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | pp512 @ d4096 | 3119.65 ± 56.53 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | tg128 @ d4096 | 90.50 ± 0.52 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | pp512 @ d8192 | 2836.10 ± 35.33 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | tg128 @ d8192 | 88.40 ± 0.47 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | pp512 @ d16384 | 2460.92 ± 29.41 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | tg128 @ d16384 | 84.19 ± 0.34 |
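
For a quick side-by-side of the two machines, a small sketch that divides the 3060 Ti mean t/s by the M5 Max mean t/s for the same test. `ratio` is a hypothetical helper, and the values are copied by hand from the two result sets in this thread:

```shell
#!/bin/sh
# Hypothetical helper: ratio of two throughput figures (first / second).
# Usage: ratio NUMERATOR_TPS DENOMINATOR_TPS
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# 3060 Ti (CUDA) first, M5 Max (Metal) second; E2B rows, mean t/s only.
ratio 6044.38 6407.09   # pp512, depth 0
ratio 3849.82 2456.58   # pp512 @ d16384
ratio 144.19 130.72     # tg128 @ d16384
```

A value above 1 means the 3060 Ti run was faster for that test. The depth-0 pp512 gap is smaller than the ± 382.66 spread reported for the CUDA run, so that particular comparison is probably within noise.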
