The command for evaluating on MMLU Pro:
mlx_lm.evaluate --model model/repo --task mmlu_pro
The command for efficiency benchmarks:
mlx_lm.benchmark --model model/repo -p 2048 -g 128
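To run both commands over several repos, a small wrapper loop is one option. The sketch below is only illustrative: the `run` and `sweep` helpers and the `DRY_RUN` variable are hypothetical names, not part of mlx-lm, and the flags are copied from the two commands above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: repeat the evaluation and benchmark commands for a list of repos.
# DRY_RUN, run, and sweep are illustrative names, not part of mlx-lm.
set -eu

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Repos taken from the tables in this document; edit as needed.
REPOS="mlx-community/Qwen3-4B-Instruct-2507-8bit mlx-community/Qwen3-4B-Instruct-2507-4bit"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run the accuracy evaluation, then the efficiency benchmark, per repo.
sweep() {
  for repo in $REPOS; do
    run mlx_lm.evaluate --model "$repo" --task mmlu_pro
    run mlx_lm.benchmark --model "$repo" -p 2048 -g 128
  done
}

sweep
```

Setting `DRY_RUN=0` in the environment would execute the commands instead of printing them, which assumes mlx-lm is installed and the repos can be downloaded.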
Qwen/Qwen3-4B-Instruct-2507
| Precision | MMLU Pro | Prompt (2048) tok/sec | Generation (128) tok/sec | Memory (GB) | Repo |
| --- | --- | --- | --- | --- | --- |
| bf16 | 64.05 | 1780.63 | 52.47 | 9.02 | Qwen/Qwen3-4B-Instruct-2507 |
| q8 | 63.85 | 1606.573 | 86.907 | 5.254 | mlx-community/Qwen3-4B-Instruct-2507-8bit |
| q6 | 63.53 | 1576.73 | 104.68 | 4.25 | mlx-community/Qwen3-4B-Instruct-2507-6bit |
| q5 g32 | 63.16 | 1570.80 | 110.29 | 4.00 | mlx-community/Qwen3-4B-Instruct-2507-5bit-g32 |
| q5 | 62.38 | 1584.33 | 116.39 | 3.86 | mlx-community/Qwen3-4B-Instruct-2507-5bit |
| q4 g32 | 61.46 | 1610.03 | 126.00 | 3.603 | mlx-community/Qwen3-4B-Instruct-2507-4bit-g32 |
| q4 | 60.72 | 1622.27 | 134.52 | 3.35 | mlx-community/Qwen3-4B-Instruct-2507-4bit |
- Performance benchmarks on 64GB M4 Max
- mlx 0.29.2.dev20251008+85a8824a8
- mlx-lm 0.28.2
- macOS 26.1
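
One way to read the trade-off in the table above is to express each row relative to the bf16 baseline. The following is a rough sketch using `awk`; the `summarize` function name and the CSV layout (precision, MMLU Pro, generation tok/sec, memory GB) are assumptions made for illustration, and the numbers are copied from three rows of the Qwen3-4B table.

```shell
#!/bin/sh
# Sketch: compare quantized rows against the first (baseline) row.
# Input columns: precision, MMLU Pro, generation tok/sec, memory GB.
set -eu

summarize() {
  awk -F, '
    # The first row is treated as the baseline (bf16 here).
    NR == 1 { base_acc = $2; base_gen = $3; base_mem = $4 }
    {
      # Accuracy change in points, generation speedup, memory as % of baseline.
      printf "%s acc_delta=%+.2f gen_speedup=%.2fx mem=%.0f%%\n",
             $1, $2 - base_acc, $3 / base_gen, 100 * $4 / base_mem
    }
  '
}

summarize <<'EOF'
bf16,64.05,52.47,9.02
q8,63.85,86.907,5.254
q4,60.72,134.52,3.35
EOF
```

The same function could be fed rows from the other tables, as long as the baseline row comes first and has numeric values in every column.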
Qwen/Qwen3-30B-A3B-Instruct-2507
| Precision | MMLU Pro | Prompt (2048) tok/sec | Generation (128) tok/sec | Memory (GB) | Repo |
| --- | --- | --- | --- | --- | --- |
| bf16 | 72.62 | - | - | - | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| q8 | 72.46 | 1719.47 | 83.16 | 33.46 | mlx-community/Qwen3-30B-A3B-Instruct-2507-8bit |
| q6 | 72.41 | 1667.45 | 94.14 | 25.82 | mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit |
| q5 | 71.97 | 1664.24 | 101.00 | 22.01 | mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit |
| q4 | 70.71 | 1753.90 | 113.33 | 18.20 | mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit |
- Performance benchmarks on 64GB M4 Max
- mlx 0.29.2.dev20251008+85a8824a8
- mlx-lm 0.28.2
- macOS 26.1
Qwen/Qwen3-Next-80B-A3B-Instruct
| Precision | MMLU Pro | Prompt (2048) tok/sec | Generation (128) tok/sec | Memory (GB) | Repo |
| --- | --- | --- | --- | --- | --- |
| bf16 | 74.85 | 2131.38 | 53.94 | 162.57 | Qwen/Qwen3-Next-80B-A3B-Instruct |
| q8 | 74.80 | 2160.22 | 65.11 | 86.79 | mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit |
| q6 | 75.25 | 2112.99 | 67.28 | 66.89 | mlx-community/Qwen3-Next-80B-A3B-Instruct-6bit |
| q5 | 74.87 | 2113.03 | 69.63 | 56.93 | mlx-community/Qwen3-Next-80B-A3B-Instruct-5bit |
| q4 | 74.13 | 2203.82 | 74.61 | 46.98 | mlx-community/Qwen3-Next-80B-A3B-Instruct-4bit |
- Performance benchmarks on 512GB M3 Ultra
- mlx 0.29.2
- mlx-lm 0.28.2
- macOS 26.1