Last active May 11, 2026 22:48
Gemma 4 (E2B and E4B) llama.cpp benchmark: M5 Max 36GB (binned) and Ubuntu 3800X / RTX 3060 Ti
```sh
# Apple Silicon llama-bench: Gemma 4 (E2B and E4B) Q4_K_M depth sweep
# Testing for Gemma 4 (E2B and E4B) on Apple Silicon
#   -p 512 / -n 128 : prompt-processing (pp512) and generation (tg128) tests
#   -fa 1           : flash attention on
#   -ngl 99         : offload all layers to the GPU
#   -d ...          : repeat each test at these context depths
#   -o md           : write results as a markdown table (stderr discarded)
llama-bench \
  -m ~/.lmstudio/models/lmstudio-community/gemma-4-E2B-it-GGUF/gemma-4-E2B-it-Q4_K_M.gguf \
  -m ~/.lmstudio/models/lmstudio-community/gemma-4-E4B-it-GGUF/gemma-4-E4B-it-Q4_K_M.gguf \
  -p 512 -n 128 -fa 1 -ngl 99 \
  -d 0,4096,8192,16384 \
  -o md > local-gemma-q4km-m5_max_36gb.md 2>/dev/null
```
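The `-o md` output is a plain markdown table, so it can be post-processed with standard tools. A minimal sketch, assuming the header row uses the column names `model`, `test` and `t/s` as in the tables in this gist; `md_to_tsv` is a hypothetical helper, not part of llama.cpp:

```sh
# Hypothetical helper (not part of llama.cpp): convert llama-bench `-o md`
# output into tab-separated "model  test  mean_t/s" rows.
# Assumes the header names the columns "model", "test" and "t/s".
md_to_tsv() {
  awk -F'|' '
    # Only table rows start with "|"; anything else is ignored.
    !/^[|]/ { next }
    # First table row: map column names to field numbers.
    !hdr {
      for (i = 2; i < NF; i++) { n = $i; gsub(/^ +| +$/, "", n); col[n] = i }
      hdr = 1; next
    }
    # Second table row is the |---|---| separator.
    !sep { sep = 1; next }
    {
      model = $(col["model"]); gsub(/^ +| +$/, "", model)
      tname = $(col["test"]);  gsub(/^ +| +$/, "", tname)
      # "6407.09 ± 13.29" split on blanks -> v[1] is the mean
      split($(col["t/s"]), v, " ")
      print model "\t" tname "\t" v[1]
    }
  ' "$@"
}
```

For example, `md_to_tsv local-gemma-q4km-m5_max_36gb.md` would turn the first row of the M5 Max table into `gemma4 E2B Q4_K - Medium`, `pp512`, `6407.09` separated by tabs, which is easier to feed into a spreadsheet or plotting tool.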
selfup (author) commented on May 9, 2026:

M5 Max 36GB (binned)
| model | size | params | backend | threads | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | pp512 | 6407.09 ± 13.29 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | tg128 | 156.23 ± 0.22 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | pp512 @ d4096 | 4467.20 ± 18.36 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | tg128 @ d4096 | 147.44 ± 4.51 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | pp512 @ d8192 | 3540.88 ± 16.37 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | tg128 @ d8192 | 139.93 ± 1.04 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | pp512 @ d16384 | 2456.58 ± 6.16 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | BLAS,MTL | 6 | 1 | tg128 @ d16384 | 130.72 ± 0.10 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | pp512 | 3486.18 ± 13.98 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | tg128 | 93.63 ± 0.03 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | pp512 @ d4096 | 2829.55 ± 13.31 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | tg128 @ d4096 | 89.22 ± 1.64 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | pp512 @ d8192 | 2404.37 ± 7.81 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | tg128 @ d8192 | 86.82 ± 0.07 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | pp512 @ d16384 | 1792.39 ± 11.49 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | BLAS,MTL | 6 | 1 | tg128 @ d16384 | 81.57 ± 0.05 |
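In these tables `pp512` is prompt processing of 512 tokens, `tg128` is generation of 128 tokens, and `@ dN` means the test ran with N tokens already in the context (the `-d` sweep). To read the sweep as a relative slowdown, one option is to divide each row by its d0 counterpart. A sketch with a hypothetical `depth_retention` helper, assuming the d0 row for each model and test appears before the deeper rows, as it does here:

```sh
# Hypothetical helper (not part of llama.cpp): for each model and test type in
# a llama-bench `-o md` table, print throughput at every depth as a
# percentage of the depth-0 result. Assumes the d0 row comes first.
depth_retention() {
  awk -F'|' '
    !/^[|]/ { next }
    # Header row: map column names to field numbers.
    !hdr {
      for (i = 2; i < NF; i++) { n = $i; gsub(/^ +| +$/, "", n); col[n] = i }
      hdr = 1; next
    }
    # Skip the |---| separator row.
    !sep { sep = 1; next }
    {
      model = $(col["model"]); gsub(/^ +| +$/, "", model)
      # "tg128 @ d4096" -> t[1] = "tg128", t[3] = "d4096"; plain "tg128" is depth 0
      nt = split($(col["test"]), t, " ")
      depth = (nt >= 3) ? substr(t[3], 2) + 0 : 0
      split($(col["t/s"]), v, " ")   # v[1] is the mean t/s
      key = model SUBSEP t[1]
      if (depth == 0) base[key] = v[1]
      if (key in base)
        printf "%s\t%s\td%d\t%.1f%%\n", model, t[1], depth, 100 * v[1] / base[key]
    }
  ' "$@"
}
```

On the M5 Max numbers above, E2B pp512 at d16384 works out to about 38% of its d0 rate, while E2B tg128 keeps about 84%, so prompt processing is the test most affected by context depth on this machine.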
selfup (author) commented:

Ubuntu, 3800X, 32GB RAM, RTX 3060 Ti
| model | size | params | backend | ngl | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | pp512 | 6044.38 ± 382.66 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | tg128 | 159.16 ± 1.67 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | pp512 @ d4096 | 5289.56 ± 100.77 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | tg128 @ d4096 | 154.08 ± 0.24 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | pp512 @ d8192 | 4698.99 ± 75.61 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | tg128 @ d8192 | 150.27 ± 0.16 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | pp512 @ d16384 | 3849.82 ± 52.89 |
| gemma4 E2B Q4_K - Medium | 3.18 GiB | 4.65 B | CUDA | 99 | 1 | tg128 @ d16384 | 144.19 ± 0.25 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | pp512 | 3538.31 ± 150.44 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | tg128 | 94.83 ± 0.10 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | pp512 @ d4096 | 3119.65 ± 56.53 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | tg128 @ d4096 | 90.50 ± 0.52 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | pp512 @ d8192 | 2836.10 ± 35.33 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | tg128 @ d8192 | 88.40 ± 0.47 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | pp512 @ d16384 | 2460.92 ± 29.41 |
| gemma4 E4B Q4_K - Medium | 4.95 GiB | 7.52 B | CUDA | 99 | 1 | tg128 @ d16384 | 84.19 ± 0.34 |
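A rough way to compare the two machines is to join the tables on model and test and take the ratio of the means. A sketch with a hypothetical `compare_runs` helper; it ignores the ± spread, and any file name for the CUDA results is a placeholder because the gist does not show the command used for that run:

```sh
# Hypothetical helper (not part of llama.cpp): join two llama-bench `-o md`
# tables on (model, test) and print second-file t/s divided by first-file t/s.
# Usage (file names are placeholders): compare_runs base.md other.md
compare_runs() {
  awk -F'|' '
    # New input file: re-read its header, since columns can differ
    # (e.g. "threads" vs "ngl").
    FNR == 1 { hdr = 0; sep = 0; file++ }
    !/^[|]/ { next }
    !hdr {
      for (i = 2; i < NF; i++) { n = $i; gsub(/^ +| +$/, "", n); col[n] = i }
      hdr = 1; next
    }
    !sep { sep = 1; next }
    {
      model = $(col["model"]); gsub(/^ +| +$/, "", model)
      tname = $(col["test"]);  gsub(/^ +| +$/, "", tname)
      split($(col["t/s"]), v, " ")   # v[1] is the mean t/s
      key = model SUBSEP tname
      if (file == 1) { a[key] = v[1]; next }
      if (key in a) printf "%s\t%s\t%.2fx\n", model, tname, v[1] / a[key]
    }
  ' "$1" "$2"
}
```

With the M5 Max file first and the 3060 Ti results second, these numbers give about 0.94x for E2B pp512 at d0 (a gap smaller than the CUDA run's ± 382.66 spread) and about 1.57x at d16384, i.e. the 3060 Ti loses less prompt-processing speed as depth grows.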