MLX LM Benchmarks

Accuracy (MMLU Pro) and efficiency benchmarks for mlx-lm.

The command for evaluating on MMLU Pro:

mlx_lm.evaluate --model model/repo --task mmlu_pro

The command for efficiency benchmarks, using a 2048-token prompt (-p) and 128 generated tokens (-g):

mlx_lm.benchmark --model model/repo -p 2048 -g 128
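To run both commands over several repos, the invocations can be assembled programmatically. The following is a minimal sketch, not part of mlx-lm: the helper names `evaluate_cmd` and `benchmark_cmd` and the `REPOS` list are hypothetical, and only the flags shown in the two commands above are used. It prints the commands instead of executing them.

```python
import shlex

# Hypothetical helpers: they only assemble the two command lines shown above.
def evaluate_cmd(repo, task="mmlu_pro"):
    """Return the argv list for the MMLU Pro evaluation command."""
    return ["mlx_lm.evaluate", "--model", repo, "--task", task]

def benchmark_cmd(repo, prompt_tokens=2048, generation_tokens=128):
    """Return the argv list for the efficiency benchmark command."""
    return [
        "mlx_lm.benchmark", "--model", repo,
        "-p", str(prompt_tokens),
        "-g", str(generation_tokens),
    ]

# Example repos taken from the tables in this document.
REPOS = [
    "mlx-community/Qwen3-4B-Instruct-2507-8bit",
    "mlx-community/Qwen3-4B-Instruct-2507-4bit",
]

# Dry run: print each command rather than running it.
for repo in REPOS:
    print(shlex.join(evaluate_cmd(repo)))
    print(shlex.join(benchmark_cmd(repo)))
```

To actually run a command, pass the returned list to `subprocess.run(cmd, check=True)` in an environment where mlx-lm is installed.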

Qwen/Qwen3-4B-Instruct-2507

Precision	MMLU Pro	Prompt (2048) tok/sec	Generation (128) tok/sec	Memory GB	Repo
bf16	64.05	1780.63	52.47	9.02	Qwen/Qwen3-4B-Instruct-2507
q8	63.85	1606.573	86.907	5.254	mlx-community/Qwen3-4B-Instruct-2507-8bit
q6	63.53	1576.73	104.68	4.25	mlx-community/Qwen3-4B-Instruct-2507-6bit
q5 g32	63.16	1570.80	110.29	4.00	mlx-community/Qwen3-4B-Instruct-2507-5bit-g32
q5	62.38	1584.33	116.39	3.86	mlx-community/Qwen3-4B-Instruct-2507-5bit
q4 g32	61.46	1610.03	126.00	3.603	mlx-community/Qwen3-4B-Instruct-2507-4bit-g32
q4	60.72	1622.27	134.52	3.35	mlx-community/Qwen3-4B-Instruct-2507-4bit
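One way to read the table above is to express each quantized row relative to the bf16 baseline. The sketch below does that for the Qwen3-4B-Instruct-2507 numbers, which are copied verbatim from the table. The function name `relative_to_baseline` and the output field names are illustrative assumptions, not part of mlx-lm.

```python
# (precision, MMLU Pro, generation tok/sec, memory GB), copied from the table above.
ROWS = [
    ("bf16", 64.05, 52.47, 9.02),
    ("q8", 63.85, 86.907, 5.254),
    ("q6", 63.53, 104.68, 4.25),
    ("q5 g32", 63.16, 110.29, 4.00),
    ("q5", 62.38, 116.39, 3.86),
    ("q4 g32", 61.46, 126.00, 3.603),
    ("q4", 60.72, 134.52, 3.35),
]

def relative_to_baseline(rows):
    """Compare every row after the first against the first (baseline) row."""
    _, base_acc, base_gen, base_mem = rows[0]
    results = []
    for name, acc, gen, mem in rows[1:]:
        results.append({
            "precision": name,
            # Change in MMLU Pro score, in points (negative means a drop).
            "mmlu_pro_delta": round(acc - base_acc, 2),
            # Generation throughput as a multiple of the baseline.
            "generation_speedup": round(gen / base_gen, 2),
            # Memory use as a fraction of the baseline.
            "memory_ratio": round(mem / base_mem, 2),
        })
    return results

for row in relative_to_baseline(ROWS):
    print(row)
```

The same function works for the other tables if their rows are substituted, provided the baseline row has numeric entries.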

Qwen/Qwen3-30B-A3B-Instruct-2507

Precision	MMLU Pro	Prompt (2048) tok/sec	Generation (128) tok/sec	Memory GB	Repo
bf16	72.62	💀	💀	💀	Qwen/Qwen3-30B-A3B-Instruct-2507
q8	72.46	1719.47	83.16	33.46	mlx-community/Qwen3-30B-A3B-Instruct-2507-8bit
q6	72.41	1667.45	94.14	25.82	mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit
q5	71.97	1664.24	101.00	22.01	mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit
q4	70.71	1753.90	113.33	18.20	mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit

Qwen/Qwen3-Next-80B-A3B-Instruct

Precision	MMLU Pro	Prompt (2048) tok/sec	Generation (128) tok/sec	Memory GB	Repo
bf16	74.85	2131.38	53.94	162.57	Qwen/Qwen3-Next-80B-A3B-Instruct
q8	74.80	2160.22	65.11	86.79	mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit
q6	75.25	2112.99	67.28	66.89	mlx-community/Qwen3-Next-80B-A3B-Instruct-6bit
q5	74.87	2113.03	69.63	56.93	mlx-community/Qwen3-Next-80B-A3B-Instruct-5bit
q4	74.13	2203.82	74.61	46.98	mlx-community/Qwen3-Next-80B-A3B-Instruct-4bit