@malfet
Last active January 15, 2025 17:06
PyTorch LLM perf
| dtype | SOTA | 2.2.2+eager | 2.3.0+eager | 2.3.0+compile | trunk + compile |
|---|---|---|---|---|---|
| bfloat16 (M1) | 111 tokens/sec | 110 tokens/sec | 80 tokens/sec | | |
| float32 (M1) | 687 tokens/sec | 165 tokens/sec | 176 tokens/sec | | |
| float16 (M1) | 1106 tokens/sec | 50 tokens/sec | 187 tokens/sec | | |
| float16 (LinX86) | | 40 tokens/sec | 43 tokens/sec | 173 tokens/sec | |
| float32 (LinX86) | | 38 tokens/sec | 40 tokens/sec | 179 tokens/sec | |
| bfloat16 (LinX86) | | 73 tokens/sec | 78 tokens/sec | 180 tokens/sec | |
| bfloat16 (M2Pro) | 137 tokens/sec | 147 tokens/sec | | 116 tokens/sec | 228 tokens/sec |
| float32 (M2Pro) | 947 tokens/sec | 176 tokens/sec | 301 tokens/sec | 121 tokens/sec | 460 tokens/sec |
| float16 (M2Pro) | 1330 tokens/sec | 56 tokens/sec | 330 tokens/sec | 116 tokens/sec | 420 tokens/sec |

Eager numbers are collected by running the following script (gpt-fast reports a slightly higher eager number because it preallocates the KV cache even when it is longer than the model's context length):

python run_llama.py --model-path stories15M.pt --random-seed 42 --dtype float16

Compile numbers are collected by running the following script:

python generate.py --checkpoint_path checkpoints/stories15M/stories15M.pt --prompt "Once upon a time" --compile --device cpu
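In both cases the reported number boils down to tokens generated divided by wall-clock time. A minimal sketch of that measurement (with a hypothetical stand-in for the decode step, since the internals of run_llama.py / generate.py are not shown here):

```python
import time

def measure_tokens_per_sec(num_tokens: int, decode_step) -> float:
    """Time num_tokens sequential calls to decode_step, return throughput."""
    start = time.perf_counter()
    for _ in range(num_tokens):
        decode_step()
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed

# Hypothetical stand-in: the real scripts run a transformer forward
# pass per token here, appending to the KV cache.
def fake_decode_step() -> None:
    time.sleep(0.001)  # pretend each token takes about 1 ms

print(f"{measure_tokens_per_sec(50, fake_decode_step):.1f} tokens/sec")
```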

P.S. Compile does not work out of the box on Mac with 2.3.0 right now; one needs to symlink libiomp into the right location.

LinX86 is an Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz.

| dtype | SOTA | 2.2.2+eager | 2.3.0+eager | 2.3.0+compile | trunk | trunk + compile |
|---|---|---|---|---|---|---|
| float16 (LinX86) | | | | | | |
| float32 (LinX86) | 3 tokens/sec | 4(5?) tok/sec | 4 tok/sec | | | |
| bfloat16 (LinX86) | 5 tokens/sec | 3(6?) tok/sec | 6 tok/sec | | | |
| float16 (M2Pro) | 8 tokens/sec | 1.5 tok/sec | .9 tok/sec | 1.5 tok/sec | | |
| float16 (M2Pro+MPS) | 13 tokens/sec | 9.7 tok/sec | | | | |
| float32 (M2Pro) | | | | | | |