model: InternLM2-Chat-7B (internlm/internlm2-chat-7b)
python3 benchmark_serving.py --backend openai --host 127.0.0.1 --port 30000 --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model internlm/internlm2-chat-7b --tokenizer internlm/internlm2-chat-7b --num-prompts 3000 --trust-remote-code
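The benchmark script sends requests to an OpenAI-compatible endpoint at 127.0.0.1:30000, so each engine must already be serving the model there. As a rough sketch (the launch commands and flags below are assumptions based on each project's documentation and may differ across SGLang/LMDeploy versions), the two servers can be started with:

python3 -m sglang.launch_server --model-path internlm/internlm2-chat-7b --port 30000 --trust-remote-code
lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 30000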
Metric                             SGLang (1)   LMDeploy (1)   SGLang (2)   LMDeploy (2)
Successful requests                3000         3000           3000         3000
Benchmark duration (s)             67.75        79.95          107.11       78.54
Total input tokens                 676842       676842         676842       676842
Total generated tokens             613123       611814         613028       611717
Request throughput (req/s)         44.28        37.52          28.01        38.20
Input token throughput (tok/s)     9990.58      8466.16        6319.36      8617.65
Output token throughput (tok/s)    9050.05      7652.77        5723.55      7788.47

(1) and (2) are the two result sets recorded for each backend, listed in the order they were reported.
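The derived metrics follow directly from the raw counts: requests, input tokens, and generated tokens are each divided by the benchmark duration. A minimal Python sketch using the numbers from the first SGLang result (small differences from the reported values come from the duration being rounded to two decimals):

# Reproduce the derived throughput metrics from the raw counts
# of the first SGLang result above.
successful_requests = 3000
duration_s = 67.75                 # benchmark duration, as reported (rounded)
total_input_tokens = 676842
total_generated_tokens = 613123

request_throughput = successful_requests / duration_s          # ~44.28 req/s
input_token_throughput = total_input_tokens / duration_s       # ~9990 tok/s
output_token_throughput = total_generated_tokens / duration_s  # ~9050 tok/s

print(f"{request_throughput:.2f} req/s, "
      f"{input_token_throughput:.2f} input tok/s, "
      f"{output_token_throughput:.2f} output tok/s")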