yangsheng6810 / llm_benchmark.py
Created April 26, 2026 10:57
benchmark local llm
#!/usr/bin/env python3
"""
Benchmark tool for a local LLM served over an OpenAI-compatible API (e.g., Kimi-K2.6).
Measures:
- Time to First Token (TTFT)
- End-to-end latency
- Completion tokens per second (generation speed)
- Throughput (requests per second)
- Success rate