
@awni
Last active October 14, 2025 09:43
MLX LM Benchmarks

Benchmarks for mlx-lm

The command for evaluating on MMLU Pro:

mlx_lm.evaluate --model model/repo --task mmlu_pro
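MMLU Pro is reported as plain accuracy over multiple-choice questions, so small gaps between rows in the tables below can fall within sampling noise. The sketch below computes the binomial standard error of an accuracy score. The question count `n` is a hypothetical value. Set it to however many questions your run actually evaluated.

```python
import math


def accuracy_stderr(accuracy_pct: float, n: int) -> float:
    """Binomial standard error, in percentage points, of an accuracy
    measured on n independent questions.

    accuracy_pct: accuracy in percent (e.g. 64.05).
    n: number of evaluated questions (hypothetical; depends on the run).
    """
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Hypothetical question count used only for illustration.
    n_questions = 12_000
    se = accuracy_stderr(64.05, n_questions)
    print(f"64.05% +/- {se:.2f} points (1 standard error, n={n_questions})")
```

With that hypothetical n, one standard error is roughly 0.44 points. This treats each score independently. Every row is evaluated on the same questions, so a paired comparison between two rows would give a tighter estimate.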

The command for the efficiency benchmarks (a 2048-token prompt and 128 generated tokens):

mlx_lm.benchmark --model model/repo -p 2048 -g 128
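The two throughput columns can be converted into wall-clock time for this workload. The sketch below assumes that total time is simply prompt processing plus decoding at constant throughput. It ignores model loading, warm-up, and per-request overhead. The 2048 and 128 token counts come from the `-p` and `-g` values in the command above. The example rates are the bf16 and q4 rows of the Qwen3-4B table below.

```python
def workload_seconds(prompt_tps: float, gen_tps: float,
                     prompt_tokens: int = 2048,
                     gen_tokens: int = 128) -> tuple[float, float]:
    """Estimate (prefill_seconds, decode_seconds) from throughput numbers.

    Assumes throughput stays constant over the whole run, which is a
    simplification.
    """
    prefill = prompt_tokens / prompt_tps
    decode = gen_tokens / gen_tps
    return prefill, decode


if __name__ == "__main__":
    # bf16 and q4 rows of the Qwen3-4B table.
    for name, p_tps, g_tps in [("bf16", 1780.63, 52.47), ("q4", 1622.27, 134.52)]:
        prefill, decode = workload_seconds(p_tps, g_tps)
        print(f"{name}: prefill {prefill:.2f}s, decode {decode:.2f}s, "
              f"total {prefill + decode:.2f}s")
```

With these numbers, q4 processes the prompt slightly more slowly than bf16 (about 1.26 s vs 1.15 s). Its much shorter decode phase (about 0.95 s vs 2.44 s) dominates the total.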
Qwen/Qwen3-4B-Instruct-2507
| Precision | MMLU Pro | Prompt tok/sec (2048 tokens) | Generation tok/sec (128 tokens) | Memory (GB) | Repo |
|---|---|---|---|---|---|
| bf16 | 64.05 | 1780.63 | 52.47 | 9.02 | Qwen/Qwen3-4B-Instruct-2507 |
| q8 | 63.85 | 1606.573 | 86.907 | 5.254 | mlx-community/Qwen3-4B-Instruct-2507-8bit |
| q6 | 63.53 | 1576.73 | 104.68 | 4.25 | mlx-community/Qwen3-4B-Instruct-2507-6bit |
| q5 g32 | 63.16 | 1570.80 | 110.29 | 4.00 | mlx-community/Qwen3-4B-Instruct-2507-5bit-g32 |
| q5 | 62.38 | 1584.33 | 116.39 | 3.86 | mlx-community/Qwen3-4B-Instruct-2507-5bit |
| q4 g32 | 61.46 | 1610.03 | 126.00 | 3.603 | mlx-community/Qwen3-4B-Instruct-2507-4bit-g32 |
| q4 | 60.72 | 1622.27 | 134.52 | 3.35 | mlx-community/Qwen3-4B-Instruct-2507-4bit |
  • Performance benchmarks on 64GB M4 Max
  • mlx 0.29.2.dev20251008+85a8824a8
  • mlx-lm 0.28.2
  • macOS 26.1
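The `g32` suffix presumably denotes a quantization group size of 32, smaller than the other rows use. The sketch below estimates the effective storage cost per quantized weight as `bits + 32 / group_size`. It assumes MLX's group-wise affine quantization stores one 16-bit scale and one 16-bit bias per group, and that unlabeled rows use MLX's default group size of 64. It only covers the quantized matrices and ignores layers kept at higher precision, the KV cache, and runtime buffers, so it will not reproduce the Memory (GB) column. The parameter count in the example is hypothetical.

```python
def bits_per_weight(bits: int, group_size: int = 64,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight for group-wise affine quantization.

    Each group of `group_size` weights shares one scale and one bias,
    whose storage is amortized over the group (assumed 16 bits each).
    """
    return bits + (scale_bits + bias_bits) / group_size


def weight_gb(num_params: float, bpw: float) -> float:
    """Approximate weight storage in GB (1e9 bytes)."""
    return num_params * bpw / 8 / 1e9


if __name__ == "__main__":
    # (label, bits, assumed group size) for the quantized rows above.
    configs = [("q8", 8, 64), ("q6", 6, 64), ("q5 g32", 5, 32),
               ("q5", 5, 64), ("q4 g32", 4, 32), ("q4", 4, 64)]
    num_params = 4e9  # hypothetical parameter count, for illustration only
    for name, bits, group in configs:
        bpw = bits_per_weight(bits, group)
        print(f"{name}: {bpw:.2f} bits/weight, "
              f"~{weight_gb(num_params, bpw):.2f} GB of quantized weights")
```

Under these assumptions, halving the group size costs an extra half bit per weight. That matches the direction of the measurements: the `g32` rows use more memory than their default-group counterparts and score higher on MMLU Pro.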
Qwen/Qwen3-30B-A3B-Instruct-2507
| Precision | MMLU Pro | Prompt tok/sec (2048 tokens) | Generation tok/sec (128 tokens) | Memory (GB) | Repo |
|---|---|---|---|---|---|
| bf16 | 72.62 | 💀 | 💀 | 💀 | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| q8 | 72.46 | 1719.47 | 83.16 | 33.46 | mlx-community/Qwen3-30B-A3B-Instruct-2507-8bit |
| q6 | 72.41 | 1667.45 | 94.14 | 25.82 | mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit |
| q5 | 71.97 | 1664.24 | 101.00 | 22.01 | mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit |
| q4 | 70.71 | 1753.90 | 113.33 | 18.20 | mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit |
  • Performance benchmarks on 64GB M4 Max
  • mlx 0.29.2.dev20251008+85a8824a8
  • mlx-lm 0.28.2
  • macOS 26.1
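The bf16 row of the Qwen3-30B-A3B table has no performance numbers, so q8 is the natural baseline for comparing quantization levels. The sketch below expresses each row relative to q8, using the values copied from that table.

```python
# (precision, MMLU Pro, prompt tok/sec, generation tok/sec, memory GB)
ROWS = [
    ("q8", 72.46, 1719.47, 83.16, 33.46),
    ("q6", 72.41, 1667.45, 94.14, 25.82),
    ("q5", 71.97, 1664.24, 101.00, 22.01),
    ("q4", 70.71, 1753.90, 113.33, 18.20),
]


def relative_to(baseline_name: str,
                rows=ROWS) -> dict[str, tuple[float, float, float]]:
    """Return {precision: (MMLU Pro delta in points,
    generation speedup, memory ratio)} relative to the baseline row."""
    base = next(r for r in rows if r[0] == baseline_name)
    _, base_acc, _, base_gen, base_mem = base
    result = {}
    for name, acc, _prompt, gen, mem in rows:
        result[name] = (acc - base_acc, gen / base_gen, mem / base_mem)
    return result


if __name__ == "__main__":
    for name, (d_acc, speedup, mem_ratio) in relative_to("q8").items():
        print(f"{name}: MMLU Pro {d_acc:+.2f} pts, "
              f"generation x{speedup:.2f}, memory x{mem_ratio:.2f}")
```

For this model, q4 generates about 1.36 times as fast as q8 and uses about 54% of its memory. The cost is a 1.75-point drop on MMLU Pro.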