env:
- Ollama 0.4.7, 4-bit quantized models
- 8x H100
- Python 3.12
test command:
python benchmark.py --verbose
test prompts:
Prompt: Why is the sky blue?
Prompt: Write a report on the financials of Apple Inc.
results:
----------------------------------------------------
qwen2.5-coder:32b-mod
Prompt eval: 37312.50 t/s
Response: 55.94 t/s
Total: 206.89 t/s
Stats:
Prompt tokens: 597
Response tokens: 220
Model load time: 0.01s
Prompt eval time: 0.02s
Response time: 3.93s
Total time: 3.97s
----------------------------------------------------
----------------------------------------------------
qwen2.5-coder:32b-mod
Prompt eval: 35411.76 t/s
Response: 55.24 t/s
Total: 88.11 t/s
Stats:
Prompt tokens: 602
Response tokens: 1009
Model load time: 0.01s
Prompt eval time: 0.02s
Response time: 18.27s
Total time: 18.31s
----------------------------------------------------
----------------------------------------------------
deepseek-r1:70b-mod
Prompt eval: 761.45 t/s
Response: 33.56 t/s
Total: 59.04 t/s
Stats:
Prompt tokens: 632
Response tokens: 768
Model load time: 25.26s
Prompt eval time: 0.83s
Response time: 22.88s
Total time: 49.51s
----------------------------------------------------
----------------------------------------------------
deepseek-r1:70b-mod
Prompt eval: 35388.89 t/s
Response: 33.74 t/s
Total: 68.34 t/s
Stats:
Prompt tokens: 637
Response tokens: 620
Model load time: 0.02s
Prompt eval time: 0.02s
Response time: 18.37s
Total time: 18.41s
----------------------------------------------------
----------------------------------------------------
deepseek-r1:671b-mod
Prompt eval: 69.26 t/s
Response: 24.84 t/s
Total: 26.68 t/s
Stats:
Prompt tokens: 73
Response tokens: 608
Model load time: 110.86s
Prompt eval time: 1.05s
Response time: 24.47s
Total time: 136.76s
----------------------------------------------------
----------------------------------------------------
deepseek-r1:671b-mod
Prompt eval: 329.11 t/s
Response: 24.56 t/s
Total: 25.97 t/s
Stats:
Prompt tokens: 78
Response tokens: 1254
Model load time: 0.01s
Prompt eval time: 0.24s
Response time: 51.06s
Total time: 51.32s
----------------------------------------------------
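
How the reported rates relate to the stats can be checked with a short sketch. It uses the figures from the first deepseek-r1:70b-mod run above (the one with a 25.26s model load). Judging from the numbers, "Total" appears to be (prompt tokens + response tokens) divided by (prompt eval time + response time), so model load time is not counted. That is an inference from the output, not something confirmed from benchmark.py. The `RunStats` class and its method names are hypothetical and exist only for this illustration. The printed times are rounded to two decimals, so recomputed rates match the reported ones only approximately, and for very short prompt eval times (0.02s) the recomputed prompt rate can be far off.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical container for one result block from the log above."""
    prompt_tokens: int
    response_tokens: int
    prompt_eval_s: float   # "Prompt eval time"
    response_s: float      # "Response time"
    total_s: float         # "Total time" (includes model load)

    def response_rate(self) -> float:
        # Generation throughput: response tokens per second of response time.
        return self.response_tokens / self.response_s

    def total_rate(self) -> float:
        # Assumed definition of "Total": all tokens over prompt eval + response
        # time, excluding model load time.
        tokens = self.prompt_tokens + self.response_tokens
        return tokens / (self.prompt_eval_s + self.response_s)

    def end_to_end_rate(self) -> float:
        # All tokens over the full wall-clock time, including model load.
        tokens = self.prompt_tokens + self.response_tokens
        return tokens / self.total_s


# Values copied from the first deepseek-r1:70b-mod block.
cold_70b = RunStats(
    prompt_tokens=632,
    response_tokens=768,
    prompt_eval_s=0.83,
    response_s=22.88,
    total_s=49.51,
)

print(f"Response:   {cold_70b.response_rate():.2f} t/s (reported 33.56)")
print(f"Total:      {cold_70b.total_rate():.2f} t/s (reported 59.04)")
print(f"End-to-end: {cold_70b.end_to_end_rate():.2f} t/s (with model load)")
```

The end-to-end figure is the one to compare when cold-start latency matters. The reported "Total" is closer to steady-state throughput once the model is already loaded.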