model: InternLM2-Chat-7B (internlm/internlm2-chat-7b)
python3 benchmark_serving.py --backend openai --host 127.0.0.1 --port 30000 --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model internlm/internlm2-chat-7b --tokenizer internlm/internlm2-chat-7b --num-prompts 3000 --trust-remote-code
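The benchmark script sends requests to an OpenAI-compatible endpoint at 127.0.0.1:30000, so each engine must already be serving the model there. As a rough sketch (the launch commands and flags below are assumptions based on each project's documentation and may differ across SGLang/LMDeploy versions), the two servers can be started with:

python3 -m sglang.launch_server --model-path internlm/internlm2-chat-7b --port 30000 --trust-remote-code
lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 30000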
Metric                             SGLang (1)   LMDeploy (1)   SGLang (2)   LMDeploy (2)
Successful requests                3000         3000           3000         3000
Benchmark duration (s)             67.75        79.95          107.11       78.54
Total input tokens                 676842       676842         676842       676842
Total generated tokens             613123       611814         613028       611717
Request throughput (req/s)         44.28        37.52          28.01        38.20
Input token throughput (tok/s)     9990.58      8466.16        6319.36      8617.65
Output token throughput (tok/s)    9050.05      7652.77        5723.55      7788.47

(1) and (2) are the two result sets recorded for each backend, listed in the order they were reported.
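The derived metrics follow directly from the raw counts: requests, input tokens, and generated tokens are each divided by the benchmark duration. A minimal Python sketch using the numbers from the first SGLang result (small differences from the reported values come from the duration being rounded to two decimals):

# Reproduce the derived throughput metrics from the raw counts
# of the first SGLang result above.
successful_requests = 3000
duration_s = 67.75                 # benchmark duration, as reported (rounded)
total_input_tokens = 676842
total_generated_tokens = 613123

request_throughput = successful_requests / duration_s          # ~44.28 req/s
input_token_throughput = total_input_tokens / duration_s       # ~9990 tok/s
output_token_throughput = total_generated_tokens / duration_s  # ~9050 tok/s

print(f"{request_throughput:.2f} req/s, "
      f"{input_token_throughput:.2f} input tok/s, "
      f"{output_token_throughput:.2f} output tok/s")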