A practical speed benchmark comparing 6 frontier coding LLMs on the same task, routed through claudish — an open-source proxy that lets Claude Code use any AI model.
Each model is tested via two routes: OpenRouter (proxy) and the provider's native Direct API.
Task: generate a TypeScript function `parseQueryParams` that parses URL query parameters into `Record<string, string>`, handles edge cases (missing values, duplicate keys, encoded characters), and includes JSDoc.
This is a representative real-world coding task: small and well-defined, yet it requires an understanding of URL parsing, TypeScript types, and documentation conventions.
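For reference, here is one possible solution to the benchmark task. It is a sketch written for this writeup, not output from any of the tested models. It assumes "last value wins" for duplicate keys, because a `Record<string, string>` holds one value per key and the task wording doesn't specify which to keep. Decoding is delegated to the standard `URLSearchParams` API, available in browsers and Node.js.

```typescript
/**
 * Parses a URL query string into a flat key/value record.
 *
 * - A single leading "?" is optional and ignored.
 * - Keys without "=" (e.g. "flag") map to an empty string.
 * - Duplicate keys: the last occurrence wins (an assumed policy).
 * - Percent-encoded sequences are decoded and "+" becomes a space.
 *
 * @param query - Query string such as "?a=1&b=two%20words".
 * @returns A record mapping each decoded key to its decoded value.
 */
function parseQueryParams(query: string): Record<string, string> {
  const result: Record<string, string> = {};
  // URLSearchParams strips one leading "?" and handles decoding for us.
  new URLSearchParams(query).forEach((value, key) => {
    result[key] = value; // later duplicates overwrite earlier ones
  });
  return result;
}

console.log(parseQueryParams("?q=hello+world&tag=a&tag=b&flag&name=J%C3%B6rg"));
// { q: 'hello world', tag: 'b', flag: '', name: 'Jörg' }
```

Unlike a hand-rolled `decodeURIComponent` loop, `URLSearchParams` does not throw on malformed percent escapes, which is one less edge case for a model to get wrong.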
Models tested:

| Model | Provider | OpenRouter ID | Context | In $/M | Out $/M |
|---|---|---|---|---|---|
| MiniMax M2.5 | MiniMax | minimax/minimax-m2.5 | 197K | $0.29 | $1.20 |
| Kimi K2.5 | Moonshot AI | moonshotai/kimi-k2.5 | 262K | $0.45 | $2.20 |
| GLM-5 | Zhipu AI | z-ai/glm-5 | 203K | $0.80 | $2.56 |
| Gemini 3 Flash Preview | Google | google/gemini-3-flash-preview | 1049K | $0.50 | $3.00 |
| GPT-5.1 Codex Mini | OpenAI | openai/gpt-5.1-codex-mini | 400K | $0.25 | $2.00 |
| Qwen3.5 Plus | Alibaba | qwen/qwen3.5-plus-02-15 | 1000K | $0.26 | $1.56 |

Prices from OpenRouter as of March 5, 2026.
Methodology:

- 5 rounds of the identical prompt, with all 12 model-routes (6 models × 2 routes) launched in parallel in each round
- Timing: wall-clock ms from `claudish` invocation to completion (includes proxy overhead)
- Two routes per model: OpenRouter (OR) and Direct API (native provider endpoint)
- Single-shot mode with `--json` output, no system prompts, no conversation history; cold start each time
- Direct API routes use claudish provider shortcuts: `g@` (Google), `oai@` (OpenAI), `kimi@` (Moonshot), `mm@` (MiniMax), `glm@` (Zhipu)
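The round structure above (every route launched at once, each timed from invocation to exit) can be sketched in TypeScript for Node.js. This is not the benchmark's `speed-test.sh`; it is an illustration that assumes each route is a command line you supply. The `Route` shape and example label are hypothetical, and the actual claudish arguments are deliberately left out rather than guessed.

```typescript
import { spawn } from "node:child_process";
import { performance } from "node:perf_hooks";

// Hypothetical description of one model-route to time.
interface Route {
  label: string;  // e.g. "Gemini 3 Flash (Direct)"
  cmd: string;    // executable to launch
  args: string[]; // its arguments (fill in from your own claudish setup)
}

interface RouteResult {
  label: string;
  ms: number;  // wall-clock milliseconds from spawn to exit
  ok: boolean; // true when the process exited with code 0
}

// Launch one command and resolve with how long it took to finish.
function timeRoute(route: Route): Promise<RouteResult> {
  return new Promise((resolve) => {
    const start = performance.now();
    const child = spawn(route.cmd, route.args, { stdio: "ignore" });
    // A promise settles only once, so a later second call is a no-op.
    const finish = (ok: boolean) =>
      resolve({ label: route.label, ms: performance.now() - start, ok });
    child.on("error", () => finish(false)); // e.g. executable not found
    child.on("close", (code) => finish(code === 0));
  });
}

// One round: start every route at the same time and wait for all of them.
function runRound(routes: Route[]): Promise<RouteResult[]> {
  return Promise.all(routes.map(timeRoute));
}
```

Calling `runRound` once per round and grouping `ms` by `label` yields per-route timing lists like the raw timings reported below. Because all routes share the machine and network during a round, this setup measures latency under concurrent load, not in isolation.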
Test environment:

| Component | Details |
|---|---|
| Machine | MacBook Pro M1 Max, 64GB RAM |
| OS | macOS 26.3 (Tahoe) |
| Network | ~1.8s baseline latency to OpenRouter API |
| Claude Code | v2.1.69 |
| Claudish | v5.5.2 |
| Date | March 5, 2026, ~12:50 PM UTC+2 |
Results, sorted by mean latency (OK = successful runs out of 5):

| # | Model | Route | Mean | Min | Max | StdDev | OK | In $/M | Out $/M |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3 Flash | OR | 32.6s | 28.6s | 41.7s | 4.7s | 5 | $0.50 | $3.00 |
| 2 | GPT-5.1 Codex Mini | OR | 32.7s | 29.9s | 41.5s | 4.4s | 5 | $0.25 | $2.00 |
| 3 | GPT-5.1 Codex Mini | Direct | 32.9s | 28.2s | 40.2s | 4.1s | 5 | $0.25 | $2.00 |
| 4 | Gemini 3 Flash | Direct | 33.4s | 29.1s | 41.7s | 4.7s | 5 | $0.50 | $3.00 |
| 5 | MiniMax M2.5 | OR | 40.4s | 31.5s | 50.2s | 6.2s | 5 | $0.29 | $1.20 |
| 6 | Qwen3.5 Plus | Direct | 41.7s | 37.5s | 50.2s | 4.6s | 5 | $0.26 | $1.56 |
| 7 | Qwen3.5 Plus | OR | 42.7s | 35.9s | 57.9s | 7.8s | 5 | $0.26 | $1.56 |
| 8 | Kimi K2.5 | Direct | 43.8s | 35.5s | 53.5s | 5.9s | 5 | $0.45 | $2.20 |
| 9 | GLM-5 | OR | 47.4s | 35.7s | 61.3s | 9.1s | 5 | $0.80 | $2.56 |
| 10 | Kimi K2.5 | OR | 48.7s | 39.0s | 65.6s | 9.4s | 5 | $0.45 | $2.20 |
| 11 | MiniMax M2.5 | Direct | FAIL | - | - | - | 0 | $0.29 | $1.20 |
| 12 | GLM-5 | Direct | FAIL | - | - | - | 0 | $0.80 | $2.56 |
OpenRouter vs Direct API, by mean latency:

| Model | OR Mean | Direct Mean | Diff | Faster |
|---|---|---|---|---|
| Gemini 3 Flash | 32.6s | 33.4s | 0.8s | OR (2%) |
| GPT-5.1 Codex Mini | 32.7s | 32.9s | 0.2s | OR (1%) |
| Kimi K2.5 | 48.7s | 43.8s | 4.9s | Direct (10%) |
| MiniMax M2.5 | 40.4s | FAILED | - | OR |
| Qwen3.5 Plus | 42.7s | 41.7s | 1.0s | Direct (2%) |
| GLM-5 | 47.4s | FAILED | - | OR |
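The Diff and Faster columns can be recomputed from the two means. The percentages in the table are reproduced when the difference is divided by the slower route's mean; that formula is inferred from the numbers, since the script's own calculation isn't shown here.

```typescript
interface RouteComparison {
  diff: number;             // absolute gap between the two means, in seconds
  faster: "OR" | "Direct";  // ties are reported as "OR"
  percent: number;          // gap as a share of the slower route's mean
}

// Compare the mean latency of the OpenRouter and Direct routes for one model.
function compareRoutes(orMean: number, directMean: number): RouteComparison {
  const slower = Math.max(orMean, directMean);
  const diff = Math.abs(orMean - directMean);
  return {
    diff,
    faster: orMean <= directMean ? "OR" : "Direct",
    percent: (diff / slower) * 100,
  };
}

const kimi = compareRoutes(48.7, 43.8);
console.log(kimi.faster, kimi.diff.toFixed(1), Math.round(kimi.percent)); // Direct 4.9 10
```

Dividing by the faster route's mean instead would report Kimi K2.5's gap as 11% rather than 10%.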
Raw wall-clock timings per round, in milliseconds:

| Model (Route) | R1 | R2 | R3 | R4 | R5 |
|---|---|---|---|---|---|
| Gemini 3 Flash (OR) | 32362 | 28618 | 30177 | 30304 | 41705 |
| GPT-5.1 Codex Mini (OR) | 30522 | 31153 | 30536 | 29894 | 41513 |
| GPT-5.1 Codex Mini (Direct) | 34030 | 28189 | 31368 | 30930 | 40184 |
| Gemini 3 Flash (Direct) | 35521 | 29132 | 30856 | 29927 | 41668 |
| MiniMax M2.5 (OR) | 42435 | 40693 | 37116 | 31497 | 50234 |
| Qwen3.5 Plus (Direct) | 41992 | 38196 | 40572 | 37493 | 50227 |
| Qwen3.5 Plus (OR) | 38661 | 35925 | 40021 | 41107 | 57929 |
| Kimi K2.5 (Direct) | 41590 | 35539 | 42686 | 53506 | 45873 |
| GLM-5 (OR) | 39490 | 49774 | 50911 | 35738 | 61322 |
| Kimi K2.5 (OR) | 40626 | 38983 | 65555 | 49001 | 49518 |
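The summary table can be reproduced from these raw timings. A quick check shows the StdDev column matches the population standard deviation (dividing by n = 5); the sample formula (dividing by n − 1) gives larger values, e.g. 5.2s rather than 4.7s for Gemini 3 Flash via OpenRouter. The helper below is a sketch for that check, not code from the benchmark script.

```typescript
interface Summary {
  mean: number;
  min: number;
  max: number;
  stdDev: number; // population standard deviation (divides by n)
}

// Summarize one route's per-round timings (milliseconds).
function summarize(runsMs: number[]): Summary {
  const n = runsMs.length;
  const mean = runsMs.reduce((sum, x) => sum + x, 0) / n;
  const variance = runsMs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  return {
    mean,
    min: Math.min(...runsMs),
    max: Math.max(...runsMs),
    stdDev: Math.sqrt(variance),
  };
}

// Gemini 3 Flash via OpenRouter, R1-R5 from the table above.
const s = summarize([32362, 28618, 30177, 30304, 41705]);
const sec = (ms: number) => (ms / 1000).toFixed(1) + "s";
console.log(sec(s.mean), sec(s.min), sec(s.max), sec(s.stdDev)); // 32.6s 28.6s 41.7s 4.7s
```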
Gemini 3 Flash and GPT-5.1 Codex Mini both landed consistently at a ~32-33s mean. The race was so close that the ranking flipped between routes: Gemini won via OpenRouter, GPT via Direct. For practical purposes, they're equal.
At $0.25/M input and $2.00/M output, GPT-5.1 Codex Mini is cheaper than Gemini 3 Flash on both input and output while matching it on speed. If you're cost-sensitive but need the fastest tier, it's the clear pick.
For Gemini and GPT, the Direct API was actually 0.2-0.8s slower than OpenRouter, well within run-to-run noise (std devs of 4.1-4.7s). OpenRouter's routing overhead is negligible for these models because it is tiny compared to inference time.
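Kimi K2.5 was 10% faster via the Direct API (43.8s vs 48.7s mean). This suggests OpenRouter's routing may add measurable latency for some Chinese providers, though the evidence is thin: Qwen3.5 Plus gained only 2% from going direct, and much of Kimi's gap comes from a single 65.6s OpenRouter run in Round 3. Excluding that run, Kimi's OpenRouter mean drops to about 44.5s.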
MiniMax M2.5, at $0.29/M input and $1.20/M output (the cheapest output pricing in the test), is competitive at a 40.4s mean via OpenRouter (its Direct route failed). If you don't need sub-35s response times, it's the most cost-effective option.
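GLM-5's only working route (OpenRouter) averaged 47.4s, the slowest result of any model on its best route, and was among the least consistent (9.1s std dev, second only to Kimi K2.5 via OpenRouter at 9.4s). It also has the highest input price ($0.80/M) and the second-highest output price ($2.56/M). Hard to recommend over any competitor.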
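Which model is cheapest per run depends on the mix of input and output tokens, which this benchmark didn't report. The sketch below prices a single call from the per-million-token rates in the models table; the token counts in the example are hypothetical placeholders, not measurements.

```typescript
// Per-million-token prices (USD) from the models table (OpenRouter, March 5, 2026).
const PRICES: Record<string, { input: number; output: number }> = {
  "MiniMax M2.5": { input: 0.29, output: 1.2 },
  "Kimi K2.5": { input: 0.45, output: 2.2 },
  "GLM-5": { input: 0.8, output: 2.56 },
  "Gemini 3 Flash": { input: 0.5, output: 3.0 },
  "GPT-5.1 Codex Mini": { input: 0.25, output: 2.0 },
  "Qwen3.5 Plus": { input: 0.26, output: 1.56 },
};

// Cost in USD of one call with the given token counts.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Hypothetical token counts: an output-heavy call and an input-heavy call.
for (const [inTok, outTok] of [[2_000, 1_500], [10_000, 400]]) {
  const ranked = Object.keys(PRICES).sort(
    (a, b) => estimateCost(a, inTok, outTok) - estimateCost(b, inTok, outTok),
  );
  console.log(`${inTok} in / ${outTok} out:`, ranked.join(" < "));
}
```

With these placeholder counts, MiniMax M2.5 is cheapest for the output-heavy call and Qwen3.5 Plus for the input-heavy one, while GLM-5 moves from second-most to most expensive. In general, GLM-5 costs more than Gemini 3 Flash whenever a call uses more than about 1.5 input tokens per output token.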
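Every route was slower in Round 5 than its own Rounds 1-4 average, but by very different amounts: eight of the ten working routes took 27-49% longer, while Kimi K2.5 was only about 6% slower via Direct and 2% via OpenRouter. The slowdown likely reflects increased provider load (a time-of-day effect) or rate limiting from running 12 parallel requests per round.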
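The Round 5 comparison can be checked against the raw timings by comparing each route's R5 time with the mean of its first four rounds. A minimal sketch, using three rows from the raw timings table:

```typescript
// Percent by which the final round exceeds the mean of all earlier rounds.
function lastRoundSlowdown(runsMs: number[]): number {
  const earlier = runsMs.slice(0, -1);
  const last = runsMs[runsMs.length - 1];
  const baseline = earlier.reduce((sum, x) => sum + x, 0) / earlier.length;
  return ((last - baseline) / baseline) * 100;
}

const rows: [string, number[]][] = [
  ["Gemini 3 Flash (OR)", [32362, 28618, 30177, 30304, 41705]],
  ["Qwen3.5 Plus (OR)", [38661, 35925, 40021, 41107, 57929]],
  ["Kimi K2.5 (OR)", [40626, 38983, 65555, 49001, 49518]],
];

for (const [route, runs] of rows) {
  console.log(`${route}: R5 is ${lastRoundSlowdown(runs).toFixed(0)}% slower`);
}
// Gemini 3 Flash (OR): R5 is 37% slower
// Qwen3.5 Plus (OR): R5 is 49% slower
// Kimi K2.5 (OR): R5 is 2% slower
```

Kimi's OpenRouter baseline is itself inflated by the 65.6s Round 3 run, which is part of why its Round 5 looks unremarkable.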
To reproduce:

- Install Claude Code
- Install claudish:

  ```bash
  npm install -g claudish
  ```

- Set your API keys:

  ```bash
  export OPENROUTER_API_KEY='...'  # Required for OpenRouter routes
  export GEMINI_API_KEY='...'      # For g@ direct
  export OPENAI_API_KEY='...'      # For oai@ direct
  export MOONSHOT_API_KEY='...'    # For kimi@ direct
  export MINIMAX_API_KEY='...'     # For mm@ direct
  export ZHIPU_API_KEY='...'       # For glm@ direct
  ```
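Before running, it can help to check which routes your environment can reach. The sketch below maps each route to the environment variable named in the `export` lines above. The mapping mirrors those comments and is not read from claudish itself, so if claudish expects a different variable name (the GLM Direct failure in this run was attributed to env var naming) this check will not catch it.

```typescript
// Environment variable per route, as listed in the export lines above.
const REQUIRED_KEYS: Record<string, string> = {
  "OpenRouter (all OR routes)": "OPENROUTER_API_KEY",
  "g@ (Google)": "GEMINI_API_KEY",
  "oai@ (OpenAI)": "OPENAI_API_KEY",
  "kimi@ (Moonshot)": "MOONSHOT_API_KEY",
  "mm@ (MiniMax)": "MINIMAX_API_KEY",
  "glm@ (Zhipu)": "ZHIPU_API_KEY",
};

// Return the routes whose API key is unset or empty in the given environment.
function missingRoutes(env: Record<string, string | undefined>): string[] {
  return Object.keys(REQUIRED_KEYS).filter((route) => !env[REQUIRED_KEYS[route]]);
}

const missing = missingRoutes(process.env);
console.log(missing.length === 0 ? "All keys set" : `Missing keys for: ${missing.join(", ")}`);
```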
```bash
# Download the test script
curl -O https://gist.githubusercontent.com/.../speed-test.sh
chmod +x speed-test.sh

# Run with default 5 rounds
./speed-test.sh

# Run with custom rounds
./speed-test.sh 10
```

To benchmark other models, edit the `OR_MODELS` and `DIRECT_MODELS` arrays in the script. Find model IDs with:

```bash
claudish --models <search-term>
```

Caveats:

- End-to-end latency, not pure inference. Includes claudish proxy startup, API routing, queue time, inference, and response streaming.
- Single task type — results may differ for longer prompts, multi-turn, or different languages.
- Five rounds show trends but aren't statistically rigorous. For publication-grade results, run 20+ rounds.
- Time-of-day effects: load patterns vary, and our Round 5 slowdown is consistent with this.
- Direct API failures for MiniMax (auth format mismatch) and GLM (env var naming) are claudish-specific, not model issues.
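To see why five rounds are not conclusive, a 95% confidence interval for a route's mean can be computed with Student's t distribution. The sketch hard-codes the two-sided t critical value 2.776 for n = 5 (4 degrees of freedom); for other round counts, substitute the matching value from a t table.

```typescript
// Two-sided 95% t critical value for 4 degrees of freedom (n = 5 runs).
const T_95_DF4 = 2.776;

// Mean and 95% confidence-interval half-width for exactly five runs.
function ci95(runsMs: number[]): { mean: number; halfWidth: number } {
  const n = runsMs.length;
  if (n !== 5) throw new Error("T_95_DF4 is only valid for exactly 5 runs");
  const mean = runsMs.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance (divides by n - 1), as the t interval requires.
  const sampleVar = runsMs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  return { mean, halfWidth: T_95_DF4 * Math.sqrt(sampleVar / n) };
}

const fmt = (r: { mean: number; halfWidth: number }) =>
  `${(r.mean / 1000).toFixed(1)}s ± ${(r.halfWidth / 1000).toFixed(1)}s`;
console.log("Kimi K2.5 (OR):    ", fmt(ci95([40626, 38983, 65555, 49001, 49518])));
console.log("Kimi K2.5 (Direct):", fmt(ci95([41590, 35539, 42686, 53506, 45873])));
// Kimi K2.5 (OR):     48.7s ± 13.1s
// Kimi K2.5 (Direct): 43.8s ± 8.2s
```

The two intervals overlap heavily, so the 10% Direct advantage for Kimi K2.5 is suggestive rather than established. The half-width shrinks roughly in proportion to 1/√n, which is why more rounds help.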
License: MIT. Use freely; attribution appreciated.
Tested with claudish v5.5.2 on March 5, 2026.