$ go build
$ ./benchmark-go --port 8000 --model meta-llama/Llama-3.2-1B --cuda-device 0
[main] 2025/04/23 04:20:18 Using port: 8000
[main] 2025/04/23 04:20:18 Removing /home/ubuntu/vllm/benchmark-go/benchmark-compare
[main] 2025/04/23 04:20:18 Removing /home/ubuntu/vllm/benchmark-go/venv-vllm
[main] 2025/04/23 04:20:19 Removing /home/ubuntu/vllm/benchmark-go/venv-vllm-src
[main] 2025/04/23 04:20:19 Removing /home/ubuntu/vllm/benchmark-go/venv-sgl
[main] 2025/04/23 04:20:19 ▶ git clone https://github.com/neuralmagic/benchmark-compare.git /home/ubuntu/vllm/benchmark-go/benchmark-compare
Cloning into '/home/ubuntu/vllm/benchmark-go/benchmark-compare'...
remote: Enumerating objects: 64, done.
remote: Counting objects: 100% (64/64), done.
remote: Compressing objects: 100% (49/49), done.
remote: Total 64 (delta 30), reused 48 (delta 15), pack-reused 0 (from 0)
Receiving objects: 100% (64/64), 9.51 KiB | 9.51 MiB/s, done.
Resolving deltas: 100% (30/30), done.
[main] 2025/04/23 04:20:19 ▶ git clone https://github.com/vllm-project/vllm.git /home/ubuntu/vllm/benchmark-go/benchmark-compare/vllm
Cloning into '/home/ubuntu/vllm/benchmark-go/benchmark-compare/vllm'...
remote: Enumerating objects: 68514, done.
remote: Counting objects: 100% (142/142), done.
remote: Compressing objects: 100% (99/99), done.
remote: Total 68514 (delta 74), reused 46 (delta 43), pack-reused 68372 (from 3)
Receiving objects: 100% (68514/68514), 46.10 MiB | 36.43 MiB/s, done.
Resolving deltas: 100% (53333/53333), done.
[main] 2025/04/23 04:20:22 ▶ git -C /home/ubuntu/vllm/benchmark-go/benchmark-compare/vllm checkout benchmark-output
branch 'benchmark-output' set up to track 'origin/benchmark-output'.
Switched to a new branch 'benchmark-output'
[main] 2025/04/23 04:20:22 ▶ Running vllm
[vllm] 2025/04/23 04:20:22 === vllm benchmark start ===
[vllm] 2025/04/23 04:20:22 ▶ cmd: uv venv venv-vllm --python 3.12
[vllm] 2025/04/23 04:20:22 ▶ cmd: bash -c source venv-vllm/bin/activate && uv pip install vllm==0.8.3
[vllm] 2025/04/23 04:20:23 ▶ source venv-vllm/bin/activate && CUDA_VISIBLE_DEVICES=0 vllm serve "meta-llama/Llama-3.2-1B" --disable-log-requests --port 8000
[vllm] 2025/04/23 04:20:23 Waiting for vllm to load...
[vllm] 2025/04/23 04:20:25 vllm inference server ready; starting benchmark tests
[vllm] 2025/04/23 04:20:25 ▶ cmd: uv venv venv-vllm-src --python 3.12
[vllm] 2025/04/23 04:20:25 ▶ source venv-vllm-src/bin/activate && export VLLM_USE_PRECOMPILED=1 && uv pip install -e . && uv pip install numpy pandas datasets
[vllm] 2025/04/23 04:20:25 ▶ cmd: bash -c source venv-vllm-src/bin/activate && export VLLM_USE_PRECOMPILED=1 && uv pip install -e . && uv pip install numpy pandas datasets
[vllm] 2025/04/23 04:20:53 >>> Starting vllm benchmark script; output logged to bench-vllm.log
[vllm] 2025/04/23 04:36:44 Stopping vllm server (pgid 79254)
[vllm] 2025/04/23 04:36:44 === vllm benchmark done ===
[main] 2025/04/23 04:36:44 ✓ vllm completed
[main] 2025/04/23 04:36:44 Killing vllm serve process group
[main] 2025/04/23 04:36:44 ▶ Running sglang
[sglang] 2025/04/23 04:36:44 === sglang benchmark start ===
[sglang] 2025/04/23 04:36:44 ▶ cmd: uv venv venv-sgl --python 3.12
[sglang] 2025/04/23 04:36:44 ▶ source venv-sgl/bin/activate && uv pip install "sglang[all]==0.4.4.post1" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
[sglang] 2025/04/23 04:36:44 ▶ cmd: bash -c source venv-sgl/bin/activate && uv pip install "sglang[all]==0.4.4.post1" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
[sglang] 2025/04/23 04:36:45 ▶ source venv-sgl/bin/activate && CUDA_VISIBLE_DEVICES=0 python3 -m sglang.launch_server --model-path "meta-llama/Llama-3.2-1B" --host 0.0.0.0 --port 8000
[sglang] 2025/04/23 04:36:45 Waiting for sglang to load...
[sglang] 2025/04/23 04:37:13 >>> Starting sglang benchmark script; output logged to bench-sglang.log
[sglang] 2025/04/23 04:50:31 Stopping sglang server (pgid 79719)
[sglang] 2025/04/23 04:50:31 === sglang benchmark done ===
[main] 2025/04/23 04:50:31 ✓ sglang completed
[main] 2025/04/23 04:50:31 Benchmark results are in benchmark-compare/results.json
$ cat benchmark-compare/results.json
{"date": "20250423-042251", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 120, "framework": "vllm", "request_rate": 1.0, "burstiness": 1.0, "max_concurrency": null, "duration": 99.2044142789673, "completed": 120, "total_input_tokens": 120000, "total_output_tokens": 12000, "request_throughput": 1.2096235925808156, "request_goodput:": null, "output_throughput": 120.96235925808156, "total_token_throughput": 1330.5859518388972, "mean_ttft_ms": 55.7500638072573, "median_ttft_ms": 55.160129006253555, "std_ttft_ms": 7.570275505945471, "p99_ttft_ms": 78.04537966090723, "mean_tpot_ms": 7.623310503280393, "median_tpot_ms": 7.564994191981364, "std_tpot_ms": 0.5117150141516688, "p99_tpot_ms": 8.917615979191414, "mean_itl_ms": 7.623312786907062, "median_itl_ms": 7.325472979573533, "std_itl_ms": 3.689846002210612, "p99_itl_ms": 8.462078731390648}
{"date": "20250423-042504", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 1200, "framework": "vllm", "request_rate": 10.0, "burstiness": 1.0, "max_concurrency": null, "duration": 124.90469405101612, "completed": 1200, "total_input_tokens": 1200000, "total_output_tokens": 120000, "request_throughput": 9.607325081872997, "request_goodput:": null, "output_throughput": 960.7325081872996, "total_token_throughput": 10568.057590060296, "mean_ttft_ms": 98.92581659262456, "median_ttft_ms": 72.09359901025891, "std_ttft_ms": 50.64941233268753, "p99_ttft_ms": 271.18523672630545, "mean_tpot_ms": 15.954002968560584, "median_tpot_ms": 14.972195929419625, "std_tpot_ms": 4.017800930946484, "p99_tpot_ms": 28.37257138245875, "mean_itl_ms": 15.954004723327538, "median_itl_ms": 9.19444250757806, "std_itl_ms": 18.0209661379036, "p99_itl_ms": 88.13503364042845}
{"date": "20250423-042825", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 2400, "framework": "vllm", "request_rate": 20.0, "burstiness": 1.0, "max_concurrency": null, "duration": 192.27450342400698, "completed": 2400, "total_input_tokens": 2400000, "total_output_tokens": 240000, "request_throughput": 12.48215419757179, "request_goodput:": null, "output_throughput": 1248.2154197571788, "total_token_throughput": 13730.369617328968, "mean_ttft_ms": 32356.839252874222, "median_ttft_ms": 32542.39215448615, "std_ttft_ms": 20106.0256603418, "p99_ttft_ms": 66375.52823908161, "mean_tpot_ms": 138.62980695176515, "median_tpot_ms": 144.618439621285, "std_tpot_ms": 20.27816946195638, "p99_tpot_ms": 145.28733603868545, "mean_itl_ms": 138.62980861840262, "median_itl_ms": 144.38556198729202, "std_itl_ms": 25.382080379583527, "p99_itl_ms": 150.8483140799217}
{"date": "20250423-043149", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 3600, "framework": "vllm", "request_rate": 30.0, "burstiness": 1.0, "max_concurrency": null, "duration": 194.51393784600077, "completed": 2430, "total_input_tokens": 2430000, "total_output_tokens": 243000, "request_throughput": 12.492678041014535, "request_goodput:": null, "output_throughput": 1249.2678041014535, "total_token_throughput": 13741.945845115988, "mean_ttft_ms": 44553.92690839327, "median_ttft_ms": 52011.231115000555, "std_ttft_ms": 23427.653849664173, "p99_ttft_ms": 67348.50665939099, "mean_tpot_ms": 139.44016934566355, "median_tpot_ms": 144.54402875259868, "std_tpot_ms": 19.20028932757692, "p99_tpot_ms": 145.87512836310626, "mean_itl_ms": 139.44017100640383, "median_itl_ms": 144.5071385242045, "std_itl_ms": 24.750375621837115, "p99_itl_ms": 150.41050648316738}
{"date": "20250423-043514", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 4200, "framework": "vllm", "request_rate": 35.0, "burstiness": 1.0, "max_concurrency": null, "duration": 195.19769972999347, "completed": 2435, "total_input_tokens": 2435000, "total_output_tokens": 243500, "request_throughput": 12.474532247911759, "request_goodput:": null, "output_throughput": 1247.453224791176, "total_token_throughput": 13721.985472702934, "mean_ttft_ms": 47772.57688448887, "median_ttft_ms": 59878.72023397358, "std_ttft_ms": 22986.45837238394, "p99_ttft_ms": 67433.07620086358, "mean_tpot_ms": 139.79731820493421, "median_tpot_ms": 144.63429248508422, "std_tpot_ms": 19.235144995754926, "p99_tpot_ms": 149.28623128063845, "mean_itl_ms": 139.79732014905582, "median_itl_ms": 144.63578001596034, "std_itl_ms": 25.980231220073858, "p99_itl_ms": 151.83430360397318}
{"date": "20250423-043644", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 2000, "framework": "vllm", "request_rate": "inf", "burstiness": 1.0, "max_concurrency": null, "duration": 81.72117351496127, "completed": 1011, "total_input_tokens": 1011000, "total_output_tokens": 101100, "request_throughput": 12.371334826890477, "request_goodput:": null, "output_throughput": 1237.1334826890477, "total_token_throughput": 13608.468309579524, "mean_ttft_ms": 39203.69183895567, "median_ttft_ms": 38973.53941597976, "std_ttft_ms": 23184.15684173001, "p99_ttft_ms": 78754.73518895451, "mean_tpot_ms": 131.91002381993903, "median_tpot_ms": 144.17453387850952, "std_tpot_ms": 28.01955240976829, "p99_tpot_ms": 145.85183130623068, "mean_itl_ms": 131.9100255169393, "median_itl_ms": 144.17769201099873, "std_itl_ms": 35.13979588557214, "p99_itl_ms": 148.24155820766464}
{"date": "20250423-043859", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 120, "framework": "sgl", "request_rate": 1.0, "burstiness": 1.0, "max_concurrency": null, "duration": 99.12011549295858, "completed": 120, "total_input_tokens": 120000, "total_output_tokens": 12000, "request_throughput": 1.2106523423948665, "request_goodput:": null, "output_throughput": 121.06523423948666, "total_token_throughput": 1331.7175766343532, "mean_ttft_ms": 63.53720204885273, "median_ttft_ms": 62.30927899014205, "std_ttft_ms": 7.155379712480937, "p99_ttft_ms": 97.08451915474144, "mean_tpot_ms": 6.885034002675103, "median_tpot_ms": 6.912966540454877, "std_tpot_ms": 0.3761391426358146, "p99_tpot_ms": 7.993933887349093, "mean_itl_ms": 6.8850365482411435, "median_itl_ms": 6.513823493150994, "std_itl_ms": 3.601026436445237, "p99_itl_ms": 17.795703102601667}
{"date": "20250423-044112", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 1200, "framework": "sgl", "request_rate": 10.0, "burstiness": 1.0, "max_concurrency": null, "duration": 124.97874841099838, "completed": 1200, "total_input_tokens": 1200000, "total_output_tokens": 120000, "request_throughput": 9.601632399564002, "request_goodput:": null, "output_throughput": 960.1632399564002, "total_token_throughput": 10561.795639520402, "mean_ttft_ms": 102.0289229384313, "median_ttft_ms": 84.29808050277643, "std_ttft_ms": 40.999998064648764, "p99_ttft_ms": 231.96715161262546, "mean_tpot_ms": 18.958682977433337, "median_tpot_ms": 18.57806561600577, "std_tpot_ms": 4.3424856567591865, "p99_tpot_ms": 31.19588345709031, "mean_itl_ms": 18.958685202938575, "median_itl_ms": 10.372313525294885, "std_itl_ms": 33.549273839333864, "p99_itl_ms": 149.8914022010285}
{"date": "20250423-044336", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 2400, "framework": "sgl", "request_rate": 20.0, "burstiness": 1.0, "max_concurrency": null, "duration": 134.61237677297322, "completed": 2400, "total_input_tokens": 2400000, "total_output_tokens": 240000, "request_throughput": 17.828969798576942, "request_goodput:": null, "output_throughput": 1782.8969798576943, "total_token_throughput": 19611.866778434636, "mean_ttft_ms": 3546.806735109858, "median_ttft_ms": 3812.141373491613, "std_ttft_ms": 2335.8498665051056, "p99_ttft_ms": 7763.364861166919, "mean_tpot_ms": 239.80666820851468, "median_tpot_ms": 263.72618789901026, "std_tpot_ms": 56.99374619838386, "p99_tpot_ms": 276.3772107574634, "mean_itl_ms": 239.8066706249087, "median_itl_ms": 66.30254551419057, "std_itl_ms": 503.14948518744075, "p99_itl_ms": 2585.100403301767}
{"date": "20250423-044630", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 3600, "framework": "sgl", "request_rate": 30.0, "burstiness": 1.0, "max_concurrency": null, "duration": 164.3791377780144, "completed": 2938, "total_input_tokens": 2938000, "total_output_tokens": 293800, "request_throughput": 17.873314337295152, "request_goodput:": null, "output_throughput": 1787.331433729515, "total_token_throughput": 19660.645771024665, "mean_ttft_ms": 21018.583864494456, "median_ttft_ms": 28604.42430851981, "std_ttft_ms": 11401.72622895565, "p99_ttft_ms": 38175.13519490545, "mean_tpot_ms": 178.25143089392841, "median_tpot_ms": 176.75246194450682, "std_tpot_ms": 65.95767952156764, "p99_tpot_ms": 275.83803508921665, "mean_itl_ms": 178.24959505271696, "median_itl_ms": 64.23197098774835, "std_itl_ms": 1183.6693364807015, "p99_itl_ms": 583.9903466822356}
{"date": "20250423-044924", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 4200, "framework": "sgl", "request_rate": 35.0, "burstiness": 1.0, "max_concurrency": null, "duration": 164.43625359999714, "completed": 2945, "total_input_tokens": 2945000, "total_output_tokens": 294500, "request_throughput": 17.909675850216836, "request_goodput:": null, "output_throughput": 1790.9675850216836, "total_token_throughput": 19700.64343523852, "mean_ttft_ms": 24002.31294064794, "median_ttft_ms": 29364.30122097954, "std_ttft_ms": 11051.137108536845, "p99_ttft_ms": 40425.044521072414, "mean_tpot_ms": 176.6838953116777, "median_tpot_ms": 176.18773460642183, "std_tpot_ms": 66.34095410859352, "p99_tpot_ms": 275.28673660755624, "mean_itl_ms": 176.6620842472802, "median_itl_ms": 64.24372002948076, "std_itl_ms": 1184.4805117218716, "p99_itl_ms": 540.4944282199425}
{"date": "20250423-045031", "backend": "vllm", "model_id": "meta-llama/Llama-3.2-1B", "tokenizer_id": "meta-llama/Llama-3.2-1B", "num_prompts": 2000, "framework": "sgl", "request_rate": "inf", "burstiness": 1.0, "max_concurrency": null, "duration": 58.49496903299587, "completed": 1011, "total_input_tokens": 1011000, "total_output_tokens": 101100, "request_throughput": 17.283537656541277, "request_goodput:": null, "output_throughput": 1728.3537656541278, "total_token_throughput": 19011.891422195404, "mean_ttft_ms": 26112.632382734453, "median_ttft_ms": 28899.253002018668, "std_ttft_ms": 15143.323206687941, "p99_ttft_ms": 55783.15656859777, "mean_tpot_ms": 168.6593358142264, "median_tpot_ms": 167.9178248283058, "std_tpot_ms": 74.29110724440532, "p99_tpot_ms": 470.18509583325425, "mean_itl_ms": 168.62901133417776, "median_itl_ms": 64.24233899451792, "std_itl_ms": 1191.1343069053055, "p99_itl_ms": 668.6870165437873}
Created April 23, 2025 04:57