@trojblue
Created January 24, 2025 20:52

env:

  • ollama 0.4.7, 4-bit quantized models (the *-mod tags appear to be locally built variants; see the sketch after this list)
  • 8x H100
  • Python 3.12
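
The *-mod tags are not defined anywhere in this gist; they appear to be locally built Ollama models. Assuming a stock Ollama server on its default port, the 4-bit quantization claim can be double-checked through the /api/show endpoint (a sketch, not part of benchmark.py):

```python
# Sketch: confirm the quantization level of the local models used below.
# Assumes the Ollama server is at its default address; the *-mod tags are
# whatever was created locally and are not defined in this gist.
import requests

MODELS = ["qwen2.5-coder:32b-mod", "deepseek-r1:70b-mod", "deepseek-r1:671b-mod"]

for name in MODELS:
    resp = requests.post("http://localhost:11434/api/show", json={"model": name})
    resp.raise_for_status()
    details = resp.json().get("details", {})
    print(f"{name}: {details.get('quantization_level')} ({details.get('parameter_size')})")
```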

test command:

python benchmark.py --verbose

test prompts:

Prompt: Why is the sky blue?
Prompt: Write a report on the financials of Apple Inc.
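
benchmark.py itself is not included in this gist. Below is a minimal sketch of what an equivalent script might look like, assuming the stock Ollama REST API on localhost:11434 and the nanosecond timing fields its /api/generate response carries (prompt_eval_count, eval_duration, and so on); the model names and prompts are the ones listed above.

```python
# Sketch of a benchmark loop roughly equivalent to `python benchmark.py --verbose`.
# Assumes a local Ollama server; all *_duration fields are reported in nanoseconds.
import requests

MODELS = ["qwen2.5-coder:32b-mod", "deepseek-r1:70b-mod", "deepseek-r1:671b-mod"]
PROMPTS = [
    "Why is the sky blue?",
    "Write a report on the financials of Apple Inc.",
]
NS = 1e9

for model in MODELS:
    for prompt in PROMPTS:
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
        )
        r.raise_for_status()
        d = r.json()
        load_s = d["load_duration"] / NS
        prompt_s = max(d["prompt_eval_duration"], 1) / NS  # guard against a fully cached prompt
        resp_s = d["eval_duration"] / NS
        total_s = d["total_duration"] / NS
        prompt_tok, resp_tok = d["prompt_eval_count"], d["eval_count"]

        print(model)
        print(f"  Prompt eval: {prompt_tok / prompt_s:.2f} t/s")
        print(f"  Response:    {resp_tok / resp_s:.2f} t/s")
        print(f"  Total:       {(prompt_tok + resp_tok) / (prompt_s + resp_s):.2f} t/s")
        print(f"  load {load_s:.2f}s | prompt eval {prompt_s:.2f}s | "
              f"response {resp_s:.2f}s | total {total_s:.2f}s")
```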

results (one run per prompt per model; the first run of each deepseek model includes its cold-start load time):

----------------------------------------------------
        qwen2.5-coder:32b-mod
                Prompt eval: 37312.50 t/s
                Response: 55.94 t/s
                Total: 206.89 t/s

        Stats:
                Prompt tokens: 597
                Response tokens: 220
                Model load time: 0.01s
                Prompt eval time: 0.02s
                Response time: 3.93s
                Total time: 3.97s
----------------------------------------------------

----------------------------------------------------
        qwen2.5-coder:32b-mod
                Prompt eval: 35411.76 t/s
                Response: 55.24 t/s
                Total: 88.11 t/s

        Stats:
                Prompt tokens: 602
                Response tokens: 1009
                Model load time: 0.01s
                Prompt eval time: 0.02s
                Response time: 18.27s
                Total time: 18.31s
----------------------------------------------------

----------------------------------------------------
        deepseek-r1:70b-mod
                Prompt eval: 761.45 t/s
                Response: 33.56 t/s
                Total: 59.04 t/s

        Stats:
                Prompt tokens: 632
                Response tokens: 768
                Model load time: 25.26s
                Prompt eval time: 0.83s
                Response time: 22.88s
                Total time: 49.51s
----------------------------------------------------

----------------------------------------------------
        deepseek-r1:70b-mod
                Prompt eval: 35388.89 t/s
                Response: 33.74 t/s
                Total: 68.34 t/s

        Stats:
                Prompt tokens: 637
                Response tokens: 620
                Model load time: 0.02s
                Prompt eval time: 0.02s
                Response time: 18.37s
                Total time: 18.41s
----------------------------------------------------

----------------------------------------------------
        deepseek-r1:671b-mod
                Prompt eval: 69.26 t/s
                Response: 24.84 t/s
                Total: 26.68 t/s

        Stats:
                Prompt tokens: 73
                Response tokens: 608
                Model load time: 110.86s
                Prompt eval time: 1.05s
                Response time: 24.47s
                Total time: 136.76s
----------------------------------------------------

----------------------------------------------------
        deepseek-r1:671b-mod
                Prompt eval: 329.11 t/s
                Response: 24.56 t/s
                Total: 25.97 t/s

        Stats:
                Prompt tokens: 78
                Response tokens: 1254
                Model load time: 0.01s
                Prompt eval time: 0.24s
                Response time: 51.06s
                Total time: 51.32s
----------------------------------------------------
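
For reading the tables: each rate appears to be a simple ratio of token count to the (rounded) time shown, and the Total rate seems to exclude model load time. A quick check against the first deepseek-r1:671b run above:

```python
# Reproduce the first deepseek-r1:671b run's rates from its printed stats.
prompt_tok, resp_tok = 73, 608
prompt_s, resp_s = 1.05, 24.47

print(prompt_tok / prompt_s)                          # ~69.5 t/s  (reported 69.26)
print(resp_tok / resp_s)                              # ~24.8 t/s  (reported 24.84)
print((prompt_tok + resp_tok) / (prompt_s + resp_s))  # ~26.7 t/s  (reported 26.68)
```

Small differences come from the rounding of the printed times.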