$ python3 -m sglang.launch_server --host 0.0.0.0 --port "12345" \
    --model-path zai-org/GLM-4.5-Air-FP8 --tp-size 4 \
    --tool-call-parser glm45 --reasoning-parser glm45 \
    --speculative-algorithm EAGLE --speculative-num-steps 3 \
    --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
    --cuda-graph-bs 1 2 4 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 256 512 640 \
    --cuda-graph-max-bs 640 --mem-fraction-static 0.8 --max-running-requests 256
BEFORE (without tuned config)
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: not set
Successful requests: 10240
Benchmark duration (s): 352.15
Total input tokens: 3110292
Total generated tokens: 2013443
Total generated tokens (retokenized): 2012696
Request throughput (req/s): 29.08
Input token throughput (tok/s): 8832.24
Output token throughput (tok/s): 5717.54
Total token throughput (tok/s): 14549.77
Concurrency: 5087.98
Accept length: 2.56
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 174975.07
Median E2E Latency (ms): 175330.96
---------------Time to First Token----------------
Mean TTFT (ms): 166851.21
Median TTFT (ms): 166935.17
P99 TTFT (ms): 336071.89
---------------Inter-Token Latency----------------
Mean ITL (ms): 41.53
Median ITL (ms): 36.15
P95 ITL (ms): 110.09
P99 ITL (ms): 122.53
Max ITL (ms): 974.68
==================================================
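The headline rates in this report can be recomputed from the raw totals: each throughput is a total divided by the benchmark duration, and the reported concurrency is consistent with Little's law (requests x mean E2E latency / duration). A minimal Python sketch under that assumption, using the numbers from the run above; `derived_metrics` is a hypothetical helper, not part of SGLang, and results match the report only to within the rounding of the printed duration.

```python
def derived_metrics(requests, duration_s, input_tokens, output_tokens, mean_e2e_ms):
    """Recompute derived serving-benchmark rates from raw totals (hypothetical helper)."""
    return {
        "request_throughput": requests / duration_s,
        "input_tok_per_s": input_tokens / duration_s,
        "output_tok_per_s": output_tokens / duration_s,
        "total_tok_per_s": (input_tokens + output_tokens) / duration_s,
        # Little's law (assumed): average requests in flight =
        # completed requests * mean time in system / wall-clock duration
        "concurrency": requests * (mean_e2e_ms / 1000.0) / duration_s,
    }


# Raw totals copied from the benchmark report above
run = derived_metrics(10240, 352.15, 3110292, 2013443, 174975.07)
for name, value in run.items():
    print(f"{name}: {value:.2f}")
```

With every request submitted at once (request rate `inf`, no concurrency cap), the average concurrency of about 5,088 is far above `--max-running-requests 256`, which fits the very large TTFT: most requests spend their time queued rather than decoding.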
AFTER (with tuned config)
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: not set
Successful requests: 10240
Benchmark duration (s): 362.95
Total input tokens: 3110292
Total generated tokens: 2013443
Total generated tokens (retokenized): 2012703
Request throughput (req/s): 28.21
Input token throughput (tok/s): 8569.39
Output token throughput (tok/s): 5547.38
Total token throughput (tok/s): 14116.77
Concurrency: 5130.75
Accept length: 2.55
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 181858.06
Median E2E Latency (ms): 182695.72
---------------Time to First Token----------------
Mean TTFT (ms): 173482.95
Median TTFT (ms): 173571.97
P99 TTFT (ms): 346657.98
---------------Inter-Token Latency----------------
Mean ITL (ms): 42.82
Median ITL (ms): 36.42
P95 ITL (ms): 111.32
P99 ITL (ms): 126.34
Max ITL (ms): 1774.48
==================================================
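To compare the two runs at a glance, a small Python sketch can tabulate the relative change for a few metrics copied from the BEFORE and AFTER reports. The choice of metrics is arbitrary, and `pct_change` is a hypothetical helper used only for illustration.

```python
# Selected metrics copied from the two benchmark reports
before = {
    "duration_s": 352.15,
    "output_tok_per_s": 5717.54,
    "total_tok_per_s": 14549.77,
    "mean_ttft_ms": 166851.21,
    "mean_itl_ms": 41.53,
    "p99_itl_ms": 122.53,
    "accept_length": 2.56,
}
after = {
    "duration_s": 362.95,
    "output_tok_per_s": 5547.38,
    "total_tok_per_s": 14116.77,
    "mean_ttft_ms": 173482.95,
    "mean_itl_ms": 42.82,
    "p99_itl_ms": 126.34,
    "accept_length": 2.55,
}


def pct_change(old, new):
    """Relative change from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


for key in before:
    delta = pct_change(before[key], after[key])
    print(f"{key:>18}: {before[key]:>12.2f} -> {after[key]:>12.2f} ({delta:+.2f}%)")
```

On these numbers the tuned run is roughly 3% slower on this workload: throughput drops by about 3%, while duration, mean TTFT and ITL rise by about 3 to 4%. Accept length is essentially unchanged.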