@estsauver
Last active April 30, 2025 11:48
Local LLM Inference Speeds

All speeds were captured on an Apple M4 Max with 128 GB RAM, using LM Studio.

Qwen 3

Qwen-3-0.6b (Q8_0)

184.45 tok/sec, 8718 tokens, 0.04s to first token

Config: 10,000 token context limit, flash attention, evaluation batch size 4096

Qwen-3-8b (Q4_K_M)

63.15 tok/sec, 2585 tokens, 0.14s to first token

Config: 10,000 token context limit, flash attention, evaluation batch size 4096

https://screen.studio/share/yVIhJeDz

Qwen-3-30b-a3b

70.15 tok/sec, 3142 tokens, 0.62s to first token

Config: 10,000 token context limit, flash attention, evaluation batch size 4096

https://screen.studio/share/DUHG3k2J

Qwen-3-32b

11.72 tok/sec, 5554 tokens, 0.81s to first token

Config: 10,000 token context limit, flash attention, evaluation batch size 4096

Qwen-3-235b-a22b

8.08 tok/sec, 8731 tokens, 1.87s to first token

Config: 10,000 token context limit, flash attention, evaluation batch size 4096

Gemma-3-27b-it (Q8_0)

Run 1

14.49 tok/sec, 1594 tokens, 1.71s to first token

Config params: Evaluation Batch Size 4096, Flash attention on, no speculative decoding, context length 131072

Labeled Config 1 in Uploads

Run 2

12.99 tok/sec, 1733 tokens, 0.79s to first token

Config params: Evaluation batch size 4096, flash attention off, context size 10000

Labeled Config 2 in Uploads

Gemma 3 4b-it-qat (Q4_0)

100.54 tok/sec, 1803 tokens, 0.14s to first token

Config params: Evaluation batch size 4096, flash attention off, context size 4096

https://screen.studio/share/P85E0Scw

Qwen 2.5 7b instruct (Q8)

49.67 tok/sec, 1074 tokens, 0.26s to first token

Config: evaluation batch size 4096, flash attention off, Context 10k

https://screen.studio/share/N49C0C2h
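
The reported figures can be turned into a rough wall-clock estimate per response. The sketch below is not part of the original measurements: it assumes the tok/sec figure is a steady decode rate that excludes the time to first token, which may not match exactly how LM Studio computes it. The `Run` type and `estimated_seconds` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Run(NamedTuple):
    model: str
    tok_per_sec: float  # reported generation speed
    tokens: int         # reported number of generated tokens
    ttft_sec: float     # reported time to first token


def estimated_seconds(run: Run) -> float:
    """Assumed model: wait for the first token, then decode at a constant rate."""
    return run.ttft_sec + run.tokens / run.tok_per_sec


# A few runs copied from the figures listed in this document.
RUNS = [
    Run("Qwen-3-0.6b (Q8_0)", 184.45, 8718, 0.04),
    Run("Qwen-3-30b-a3b", 70.15, 3142, 0.62),
    Run("Qwen-3-32b", 11.72, 5554, 0.81),
    Run("Gemma-3-27b-it (Q8_0), run 1", 14.49, 1594, 1.71),
]

if __name__ == "__main__":
    for run in RUNS:
        secs = estimated_seconds(run)
        print(f"{run.model}: ~{secs:.0f} s for {run.tokens} tokens")
```

Because the runs produced different numbers of tokens, the estimated totals are not directly comparable across models; tok/sec remains the better figure for that.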
