Based on ggerganov/llama.cpp#4167
PP means "prompt processing" (batch size bs = 512), TG means "text generation" (bs = 1), and t/s means "tokens per second".
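
To make the units concrete, here is a minimal sketch that converts a t/s figure into wall-clock time. It uses the F16 numbers from the M1 Pro 16GB row of the table; the helper name `seconds_for_tokens` is hypothetical, and it assumes that a PP run processes one 512-token batch and that throughput stays constant over the run.

```python
def seconds_for_tokens(n_tokens: int, tokens_per_second: float) -> float:
    """Time needed to process n_tokens at a constant throughput (assumed)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


# F16 figures from the M1 Pro 16GB row of the table.
F16_PP_TPS = 262.65  # prompt processing, bs = 512
F16_TG_TPS = 12.75   # text generation, bs = 1

# Time to process one 512-token prompt batch (assumption: 512 tokens per batch).
pp_batch_s = seconds_for_tokens(512, F16_PP_TPS)

# Latency of a single generated token, in milliseconds.
tg_token_ms = 1000 * seconds_for_tokens(1, F16_TG_TPS)

print(f"512-token prompt: {pp_batch_s:.2f} s")
print(f"one generated token: {tg_token_ms:.1f} ms")
```

Under these assumptions, the 512-token prompt takes roughly 1.95 s and each generated token roughly 78 ms.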
Chip | BW [GB/s] | GPU Cores | F16 PP [t/s] | F16 TG [t/s] | Q8_0 PP [t/s] | Q8_0 TG [t/s] | Q4_0 PP [t/s] | Q4_0 TG [t/s] |
---|---|---|---|---|---|---|---|---|
✅ M1 Pro 16GB | 200 | 14 | 262.65 | 12.75 | 235.16 | 21.95 | 232.55 | 35.52 |
✅ [M3 Pro 36 |