llama.cpp version: https://github.com/ggerganov/llama.cpp/commit/925e5584a058afb612f9c20bc472c130f5d0f891
LLM: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q4_K_M.gguf
```
llama-bench -m ../models/llama-2-7b-chat.Q4_K_M.gguf
```

(`pp 512` = prompt processing of a 512-token prompt; `tg 128` = text generation of 128 tokens; `t/s` = tokens per second, mean ± standard deviation.)

CPU (BLAS backend):

| model | size | params | backend | threads | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | BLAS | 4 | pp 512 | 7.58 ± 0.08 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | BLAS | 4 | tg 128 | 6.27 ± 0.01 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | BLAS | 8 | pp 512 | 27.12 ± 0.39 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | BLAS | 8 | tg 128 | 11.31 ± 0.01 |

GPU (Metal backend, `ngl 99` offloads all 32 layers of the 7B model to the GPU):

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Metal | 99 | pp 512 | 229.66 ± 7.05 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Metal | 99 | tg 128 | 28.99 ± 0.19 |
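Only the first invocation is shown above; the 8-thread and Metal rows were presumably produced by varying llama-bench's `-t` (threads) and `-ngl` (GPU layers to offload) flags. A sketch of the three runs, assuming the binary sits in the current directory and the model path from the first command:

```shell
# CPU run, 4 threads (pp 512 / tg 128 are llama-bench's default tests)
./llama-bench -m ../models/llama-2-7b-chat.Q4_K_M.gguf -t 4

# CPU run, 8 threads
./llama-bench -m ../models/llama-2-7b-chat.Q4_K_M.gguf -t 8

# Metal run: -ngl 99 offloads all layers (7B has 32) to the GPU
./llama-bench -m ../models/llama-2-7b-chat.Q4_K_M.gguf -ngl 99
```

Each invocation prints one markdown table like those above; `-t` and `-ngl` also accept comma-separated lists (e.g. `-t 4,8`) to sweep several configurations in a single run.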