@kurtextrem
Created April 7, 2026 11:14
Translation Evals
| Model | Reasoning | German | Dutch | Spanish | Russian | French | Chinese | Japanese | Arabic | Avg | Duration | Tokens in/out | Cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT 5.2 | low | 95 | 92 | 94 | 95 | 95 | 96 | 95 | 94 | 95 | 18.42s | 999/1.1k | 0.0172 |
| GPT 5.2 | medium | 95 | 92 | 94 | 95 | 95 | 96 | 94 | 95 | 95 | 16.96s | 999/1.1k | 0.0172 |
| GPT-5.3 Codex | high | 95 | 93 | 94 | 95 | 95 | 95 | 95 | 94 | 95 | 47.27s | 999/3.3k | 0.0472 |
| GPT 5.4 | low | 95 | 93 | 93 | 95 | 95 | 95 | 96 | 94 | 95 | 20.92s | 999/1.1k | 0.0195 |
| Opus 4.6 | none | 93 | 91 | 93 | 95 | 93 | 94 | 95 | 94 | 94 | 23.45s | 1.1k/1.7k | 0.0471 |
| Opus 4.6 | interleaved | 94 | 93 | 91 | 94 | 93 | 95 | 95 | 94 | 94 | 29.20s | 1.1k/1.7k | 0.0484 |
| Opus 4.6 | low | 93 | 92 | 92 | 95 | 92 | 95 | 95 | 94 | 94 | 44.14s | 1.1k/1.7k | 0.0482 |
| Opus 4.6 | medium | 93 | 91 | 93 | 94 | 94 | 94 | 95 | 94 | 94 | 31.59s | 1.1k/1.7k | 0.0485 |
| Opus 4.6 | high | 94 | 93 | 92 | 95 | 93 | 95 | 95 | 95 | 94 | 24.94s | 1.1k/1.7k | 0.0484 |
| GPT 5.2 | none | 95 | 92 | 94 | 94 | 94 | 96 | 94 | 95 | 94 | 19.45s | 999/1.1k | 0.0172 |
| GPT 5.2 | high | 95 | 90 | 94 | 94 | 95 | 96 | 94 | 94 | 94 | 15.98s | 999/1.1k | 0.0173 |
| GPT-5.3 Codex | low | 95 | 92 | 94 | 94 | 94 | 95 | 95 | 95 | 94 | 18.27s | 999/1.1k | 0.0176 |
| GPT-5.3 Codex | medium | 95 | 92 | 94 | 95 | 94 | 95 | 95 | 95 | 94 | 21.93s | 999/1.2k | 0.0186 |
| Gemini 3 Flash | medium | 95 | 93 | 94 | 93 | 95 | 94 | 93 | 93 | 94 | 47.62s | 1.0k/3.7k | 0.0117 |
| Gemini 3 Flash | high | 94 | 93 | 95 | 94 | 95 | 94 | 96 | 93 | 94 | 26.35s | 1.0k/3.8k | 0.0120 |
| Gemini 3.1 Pro | high | 95 | 94 | 94 | 95 | 94 | 88 | 96 | 95 | 94 | 57.48s | 1.0k/7.6k | 0.0936 |
| GPT 5.4 | none | 94 | 91 | 94 | 95 | 94 | 95 | 96 | 95 | 94 | 18.61s | 999/1.1k | 0.0197 |
| GPT 5.4 Mini | medium | 95 | 93 | 94 | 94 | 90 | 95 | 96 | 94 | 94 | 12.34s | 999/1.7k | 0.0083 |
| GPT 5.4 Mini | high | 96 | 93 | 93 | 94 | 92 | 95 | 96 | 94 | 94 | 40.34s | 999/6.2k | 0.0286 |
| Gemini 3 Flash | low | 94 | 90 | 91 | 94 | 94 | 94 | 96 | 91 | 93 | 12.57s | 1.0k/1.2k | 0.0041 |
| Gemini 3.1 Flash Lite | none | 92 | 91 | 93 | 94 | 94 | 94 | 93 | 92 | 93 | 8.59s | 1.0k/1.2k | 0.0021 |
| Gemini 3.1 Flash Lite | low | 93 | 89 | 92 | 94 | 95 | 94 | 95 | 94 | 93 | 16.12s | 1.0k/1.3k | 0.0023 |
| Gemini 3.1 Flash Lite | medium | 93 | 90 | 93 | 94 | 95 | 92 | 93 | 93 | 93 | 19.99s | 1.0k/3.1k | 0.0049 |
| GPT 5.4 Mini | none | 93 | 90 | 89 | 95 | 94 | 94 | 95 | 95 | 93 | 8.11s | 999/1.1k | 0.0057 |
| GPT 5.4 Mini | minimal | 93 | 92 | 91 | 94 | 94 | 94 | 95 | 94 | 93 | 12.59s | 999/1.1k | 0.0058 |
| GPT 5.4 Mini | low | 91 | 88 | 94 | 94 | 95 | 95 | 96 | 94 | 93 | 8.21s | 999/1.1k | 0.0059 |
| Sonnet 4.6 | low | 91 | 80 | 92 | 94 | 93 | 95 | 95 | 93 | 92 | 48.36s | 1.1k/1.7k | 0.0294 |
| Opus 4.5 | none | 92 | 88 | 88 | 95 | 92 | 93 | 95 | 92 | 92 | 39.74s | 1.1k/3.7k | 0.0981 |
| Opus 4.5 | medium | 92 | 90 | 88 | 95 | 92 | 93 | 95 | 90 | 92 | 36.90s | 1.1k/3.5k | 0.0941 |
| Opus 4.5 | high | 93 | 89 | 90 | 94 | 92 | 92 | 95 | 91 | 92 | 43.48s | 1.1k/4.5k | 0.1169 |
| Gemini 3 Flash | none | 92 | 92 | 92 | 91 | 92 | 93 | 94 | 89 | 92 | 10.21s | 1.0k/1.2k | 0.0041 |
| Gemini 3 Flash | minimal | 94 | 93 | 93 | 92 | 93 | 91 | 95 | 88 | 92 | 9.92s | 1.0k/1.2k | 0.0040 |
| Gemini 3.1 Flash Lite | minimal | 92 | 89 | 88 | 94 | 94 | 94 | 95 | 90 | 92 | 6.91s | 1.0k/1.2k | 0.0021 |
| Gemini 3.1 Flash Lite | high | 91 | 89 | 91 | 94 | 94 | 94 | 96 | 89 | 92 | 27.13s | 1.0k/5.2k | 0.0081 |
| GPT 5.4 Nano | low | 89 | 87 | 94 | 92 | 91 | 94 | 94 | 91 | 92 | 6.79s | 999/1.1k | 0.0016 |
| Sonnet 4.6 | high | 93 | 82 | 87 | 95 | 93 | 94 | 95 | 86 | 91 | 27.60s | 1.1k/1.7k | 0.0292 |
| Opus 4.5 | low | 92 | 87 | 88 | 94 | 91 | 92 | 94 | 91 | 91 | 39.03s | 1.1k/3.5k | 0.0927 |
| Grok 4.20 Beta | none | 95 | 87 | 91 | 86 | 93 | 95 | 92 | 92 | 91 | 7.57s | 1.1k/1.3k | 0.0097 |
| GPT 5.4 Nano | none | 86 | 86 | 92 | 93 | 91 | 93 | 93 | 91 | 91 | 11.48s | 999/1.1k | 0.0016 |
| GPT 5.4 Nano | minimal | 88 | 88 | 92 | 93 | 91 | 94 | 93 | 92 | 91 | 7.62s | 999/1.1k | 0.0016 |
| GPT 5.4 Nano | medium | 84 | 89 | 91 | 92 | 91 | 94 | 92 | 92 | 91 | 6.51s | 999/1.1k | 0.0016 |
| GPT 5.4 Nano | high | 88 | 90 | 92 | 90 | 90 | 94 | 95 | 91 | 91 | 12.01s | 999/1.1k | 0.0016 |
| GPT 5.4 Nano | xhigh | 93 | 94 | 67 | 94 | 96 | 96 | 96 | 92 | 91 | 75.85s | 999/11.1k | 0.0141 |
| Sonnet 4.5 | medium | 90 | 86 | 88 | 90 | 91 | 90 | 93 | 91 | 90 | 35.34s | 1.1k/3.6k | 0.0576 |
| Sonnet 4.5 | high | 91 | 86 | 86 | 88 | 93 | 92 | 92 | 90 | 90 | 35.05s | 1.1k/3.6k | 0.0570 |
| Sonnet 4.6 | none | 92 | 81 | 88 | 92 | 92 | 94 | 94 | 90 | 90 | 52.87s | 1.1k/1.6k | 0.0281 |
| Sonnet 4.6 | interleaved | 89 | 78 | 90 | 93 | 93 | 93 | 93 | 92 | 90 | 34.86s | 1.1k/1.7k | 0.0291 |
| Sonnet 4.6 | medium | 90 | 77 | 86 | 96 | 93 | 94 | 96 | 90 | 90 | 27.94s | 1.1k/1.7k | 0.0291 |
| Kimi K2.5 | none | 90 | 85 | 93 | 93 | 91 | 94 | 91 | 85 | 90 | 135.61s | 670/5.6k | 0.0169 |
| GPT OSS 120B | high | 87 | 87 | 91 | 89 | 93 | 94 | 92 | 85 | 90 | 18.78s | 1.1k/8.3k | 0.0051 |
| Gemini 3.1 Pro | minimal | 92 | 90 | 90 | 94 | 91 | 81 | 94 | 91 | 90 | 26.84s | 1.0k/3.3k | 0.0412 |
| Gemini 3.1 Pro | low | 92 | 91 | 81 | 93 | 93 | 87 | 95 | 90 | 90 | 27.90s | 1.0k/2.5k | 0.0321 |
| Gemini 3.1 Pro | medium | 94 | 92 | 87 | 95 | 92 | 77 | 90 | 92 | 90 | 35.47s | 1.0k/4.6k | 0.0572 |
| GLM 5 | medium | 89 | 85 | 85 | 94 | 93 | 94 | 92 | 88 | 90 | 29.20s | 1.0k/1.3k | 0.0049 |
| Grok 4.20 Beta | medium | 91 | 89 | 92 | 85 | 90 | 92 | 93 | 87 | 90 | 29.78s | 1.1k/8.1k | 0.0508 |
| Grok 4.20 Beta | high | 93 | 87 | 92 | 81 | 88 | 94 | 92 | 89 | 90 | 23.25s | 1.1k/6.3k | 0.0401 |
| Sonnet 4.5 | low | 92 | 84 | 89 | 84 | 87 | 91 | 93 | 90 | 89 | 33.71s | 1.1k/3.2k | 0.0521 |
| GPT OSS 120B | low | 82 | 88 | 87 | 88 | 92 | 93 | 93 | 87 | 89 | 6.08s | 1.1k/1.2k | 0.0009 |
| GPT OSS 120B | medium | 84 | 84 | 90 | 89 | 90 | 94 | 93 | 89 | 89 | 18.28s | 1.1k/4.2k | 0.0027 |
| GLM 5 | high | 91 | 85 | 82 | 93 | 90 | 92 | 94 | 82 | 89 | 27.82s | 1.0k/1.3k | 0.0046 |
| Grok 4.20 Beta | minimal | 87 | 84 | 90 | 91 | 91 | 94 | 90 | 83 | 89 | 27.48s | 1.1k/5.9k | 0.0371 |
| Grok 4.20 Beta | low | 89 | 85 | 93 | 89 | 92 | 93 | 83 | 86 | 89 | 18.97s | 1.1k/4.9k | 0.0316 |
| Sonnet 4.5 | none | 89 | 86 | 87 | 84 | 91 | 90 | 91 | 89 | 88 | 20.60s | 1.1k/1.7k | 0.0284 |
| Qwen3.5 397B A17B | low | 87 | 80 | 82 | 84 | 89 | 87 | 91 | 85 | 86 | 91.02s | 679/5.2k | 0.0158 |
| Haiku 4.5 | medium | 84 | 80 | 85 | 85 | 86 | 89 | 89 | 85 | 85 | 45.07s | 1.1k/5.8k | 0.0300 |
| Qwen3.5 397B A17B | minimal | 87 | 80 | 87 | 84 | 76 | 90 | 87 | 86 | 85 | 90.84s | 679/5.1k | 0.0156 |
| Haiku 4.5 | high | 86 | 78 | 86 | 88 | 86 | 89 | 81 | 79 | 84 | 131.82s | 1.1k/16.1k | 0.0816 |
| Qwen3.5 397B A17B | medium | 84 | 85 | 83 | 79 | 89 | 87 | 85 | 78 | 84 | 146.03s | 1.0k/7.7k | 0.0244 |
| DeepSeek V3.2 | high | 84 | 70 | 80 | 91 | 84 | 88 | 85 | 78 | 83 | 112.71s | 1.0k/2.1k | 0.0011 |
| MiMo V2 Pro | minimal | 80 | 80 | 89 | 78 | 86 | 88 | 87 | 65 | 82 | 95.42s | 1.0k/5.9k | 0.0184 |
| Haiku 4.5 | low | 82 | 73 | 86 | 78 | 84 | 88 | 87 | 70 | 81 | 29.31s | 1.1k/3.6k | 0.0190 |
| MiMo V2 Pro | none | 83 | 86 | 87 | 45 | 90 | 91 | 83 | 81 | 81 | 20.59s | 1.0k/1.3k | 0.0050 |
| MiMo V2 Pro | high | 85 | 79 | 85 | 62 | 90 | 89 | 83 | 75 | 81 | 62.05s | 1.0k/5.1k | 0.0163 |
| DeepSeek V3.2 | none | 74 | 73 | 83 | 82 | 70 | 94 | 93 | 80 | 81 | 25.31s | 1.0k/1.3k | 0.0014 |
| Haiku 4.5 | none | 85 | 71 | 82 | 79 | 82 | 87 | 82 | 73 | 80 | 15.08s | 1.1k/1.6k | 0.0093 |
| MiMo V2 Pro | medium | 82 | 81 | 90 | 55 | 88 | 86 | 85 | 75 | 80 | 66.54s | 1.0k/5.2k | 0.0164 |
| Qwen3.5 397B A17B | none | 74 | 87 | 81 | 66 | 70 | 90 | 82 | 88 | 80 | 25.20s | 1.0k/1.1k | 0.0041 |
| GLM 5 | minimal | 86 | 81 | 86 | 92 | 92 | 66 | 40 | 89 | 79 | 39.51s | 1.0k/1.4k | 0.0048 |
| MiniMax M2.7 | high | 76 | 83 | 82 | 88 | 88 | 92 | 46 | 79 | 79 | 63.20s | 1.0k/2.3k | 0.0059 |
| MiMo V2 Pro | low | 82 | 86 | 86 | 57 | 85 | 84 | 84 | 69 | 79 | 85.59s | 1.0k/6.8k | 0.0212 |
| GLM 5 | none | 90 | 84 | 85 | 94 | 59 | 60 | 61 | 89 | 78 | 39.09s | 1.0k/1.3k | 0.0045 |
| GLM 5 | low | 56 | 64 | 84 | 89 | 90 | 60 | 94 | 85 | 78 | 31.51s | 1.1k/913 | 0.0038 |
| Qwen3.5 397B A17B | high | 75 | 65 | 73 | 79 | 87 | 90 | 89 | 66 | 78 | 150.13s | 1.0k/8.8k | 0.0283 |
| DeepSeek V3.2 | minimal | 81 | 36 | 89 | 86 | 68 | 92 | 80 | 75 | 76 | 86.64s | 687/2.0k | 0.0011 |
| MiniMax M2.7 | minimal | 70 | 69 | 80 | 82 | 86 | 90 | 41 | 81 | 75 | 39.96s | 1.0k/1.5k | 0.0034 |
| MiniMax M2.7 | medium | 71 | 59 | 74 | 90 | 87 | 91 | 48 | 78 | 75 | 59.12s | 1.0k/1.9k | 0.0050 |
| DeepSeek V3.2 | low | 62 | 48 | 77 | 74 | 91 | 89 | 82 | 79 | 75 | 46.24s | 1.0k/2.6k | 0.0032 |
| DeepSeek V3.2 | medium | 36 | 63 | 81 | 91 | 82 | 89 | 78 | 65 | 73 | 98.15s | 1.0k/4.7k | 0.0049 |
| GPT 5.4 | high | 95 | 60 | 34 | 68 | 44 | 74 | 97 | 95 | 71 | 48.63s | 999/3.3k | 0.0521 |
| Nemotron 3 Super | minimal | 72 | 66 | 79 | 66 | 72 | 82 | 91 | 28 | 70 | 36.55s | 1.1k/8.2k | 0.0050 |
| MiniMax M2.7 | low | 26 | 66 | 74 | 85 | 88 | 91 | 42 | 82 | 69 | 15.58s | 334/721 | 0.0019 |
| Nemotron 3 Super | medium | 64 | 72 | 83 | 64 | 69 | 86 | 84 | 28 | 69 | 13.69s | 1.1k/4.2k | 0.0022 |
| Nemotron 3 Super | low | 61 | 71 | 79 | 44 | 79 | 82 | 75 | 30 | 65 | 12.76s | 1.1k/3.7k | 0.0019 |
| Nemotron 3 Super | high | 66 | 63 | 59 | 71 | 75 | 72 | 65 | 26 | 62 | 14.64s | 1.1k/2.9k | 0.0015 |
| GPT 5.4 | medium | 95 | 67 | 65 | 95 | 32 | 67 | 37 | 32 | 61 | 20.51s | 999/1.3k | 0.0225 |
| Nemotron 3 Super | none | 72 | 59 | 52 | 59 | 78 | 61 | 50 | 40 | 59 | 9.31s | 1.1k/1.2k | 0.0009 |
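The gist does not say how the Avg column is computed. The values appear consistent with the arithmetic mean of the eight per-language scores, rounded half up. For example, GPT 5.2 (reasoning: low) sums to 756, so its mean is 94.5, and the reported Avg is 95. This rounding rule is an inference from the numbers, not a documented method. The sketch below recomputes Avg for a few rows copied from the table. `table_avg` is a hypothetical helper. It uses `decimal.ROUND_HALF_UP` because Python's built-in `round()` rounds halves to even and would give 94 for that row.

```python
from decimal import Decimal, ROUND_HALF_UP


def table_avg(scores: list[int]) -> int:
    """Mean of the per-language scores, rounded half up.

    Assumption: this is how the Avg column was derived; the gist does not say.
    """
    mean = Decimal(sum(scores)) / Decimal(len(scores))
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Scores copied from the table, in column order:
# German, Dutch, Spanish, Russian, French, Chinese, Japanese, Arabic.
# Each value is (scores, reported Avg).
rows = {
    ("GPT 5.2", "low"): ([95, 92, 94, 95, 95, 96, 95, 94], 95),
    ("Grok 4.20 Beta", "high"): ([93, 87, 92, 81, 88, 94, 92, 89], 90),
    ("Nemotron 3 Super", "minimal"): ([72, 66, 79, 66, 72, 82, 91, 28], 70),
    ("GPT 5.4", "medium"): ([95, 67, 65, 95, 32, 67, 37, 32], 61),
}

for (model, effort), (scores, reported) in rows.items():
    half_up = table_avg(scores)
    # Built-in round() uses round-half-to-even, shown here for comparison.
    half_even = round(sum(scores) / len(scores))
    print(f"{model} ({effort}): reported {reported}, half-up {half_up}, round() {half_even}")
```

Only the GPT 5.2 row separates the two rules. The Grok and Nemotron means (89.5 and 69.5) round to the same value under both rules because 90 and 70 are even.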