@kurtextrem
Last active April 2, 2026 15:08
Translation Evals

See https://www.framer.com/blog/how-we-pick-translation-models-for-framer/.

| Model | Reasoning | German | Dutch | Spanish | Russian | French | Chinese | Japanese | Arabic | Avg | Duration | Tokens in/out | Cost |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Opus 4.6 | interleaved | 93 | 92 | 93 | 95 | 93 | 93 | 96 | 93 | 94 | 35.36s | 1.1k/1.7k | 0.0476 |
| GPT 5.2 | none | 94 | 92 | 94 | 94 | 95 | 96 | 95 | 94 | 94 | 75.71s | 989/1.1k | 0.0171 |
| GPT 5.2 | low | 94 | 92 | 94 | 95 | 94 | 97 | 94 | 95 | 94 | 40.82s | 989/1.1k | 0.0171 |
| GPT 5.2 | medium | 95 | 92 | 94 | 92 | 95 | 96 | 95 | 95 | 94 | 38.70s | 989/1.1k | 0.0171 |
| GPT 5.2 | high | 94 | 93 | 93 | 94 | 94 | 96 | 93 | 95 | 94 | 26.05s | 989/1.1k | 0.0170 |
| GPT-5.3 Codex | low | 94 | 91 | 92 | 94 | 94 | 95 | 95 | 94 | 94 | 48.06s | 989/1.1k | 0.0176 |
| GPT-5.3 Codex | medium | 94 | 92 | 93 | 95 | 94 | 96 | 94 | 94 | 94 | 69.78s | 989/1.2k | 0.0179 |
| GPT-5.3 Codex | high | 93 | 92 | 93 | 94 | 94 | 95 | 95 | 96 | 94 | 31.91s | 989/1.2k | 0.0182 |
| GPT 5.4 | none | 93 | 89 | 93 | 96 | 94 | 96 | 96 | 95 | 94 | 44.37s | 989/1.1k | 0.0196 |
| GPT 5.4 | low | 94 | 92 | 94 | 94 | 94 | 96 | 96 | 94 | 94 | 45.82s | 989/1.1k | 0.0195 |
| GPT 5.4 | medium | 93 | 91 | 94 | 95 | 94 | 96 | 97 | 95 | 94 | 20.66s | 989/1.1k | 0.0195 |
| GPT 5.4 | high | 93 | 92 | 93 | 94 | 94 | 95 | 97 | 94 | 94 | 37.93s | 989/1.2k | 0.0197 |
| GPT 5.4 Mini | none | 94 | 92 | 94 | 94 | 93 | 95 | 95 | 93 | 94 | 10.99s | 989/1.1k | 0.0057 |
| Opus 4.6 | none | 94 | 91 | 92 | 95 | 93 | 94 | 95 | 93 | 93 | 42.12s | 1.1k/1.7k | 0.0471 |
| Opus 4.6 | medium | 93 | 92 | 93 | 94 | 93 | 93 | 96 | 93 | 93 | 28.83s | 1.1k/1.7k | 0.0472 |
| Opus 4.6 | high | 93 | 92 | 92 | 95 | 92 | 94 | 95 | 92 | 93 | 24.53s | 1.1k/1.7k | 0.0469 |
| Gemini 3 Flash | minimal | 93 | 92 | 91 | 94 | 94 | 93 | 96 | 88 | 93 | 9.52s | 1.0k/1.2k | 0.0040 |
| Gemini 3 Flash | low | 93 | 91 | 94 | 94 | 93 | 93 | 95 | 92 | 93 | 8.16s | 1.0k/1.2k | 0.0041 |
| Gemini 3 Flash | medium | 92 | 91 | 94 | 94 | 93 | 94 | 95 | 90 | 93 | 11.67s | 1.0k/1.2k | 0.0041 |
| Gemini 3.1 Pro | low | 92 | 92 | 91 | 92 | 92 | 92 | 96 | 94 | 93 | 25.88s | 1.0k/2.9k | 0.0374 |
| Gemini 3.1 Pro | medium | 94 | 89 | 92 | 94 | 92 | 94 | 94 | 92 | 93 | 23.80s | 1.0k/2.9k | 0.0370 |
| Gemini 3.1 Pro | high | 94 | 91 | 92 | 94 | 90 | 94 | 94 | 93 | 93 | 24.45s | 1.0k/3.0k | 0.0386 |
| Gemini 3.1 Flash Lite | none | 90 | 90 | 94 | 94 | 94 | 93 | 95 | 93 | 93 | 9.09s | 1.0k/1.2k | 0.0021 |
| Gemini 3.1 Flash Lite | minimal | 91 | 89 | 92 | 94 | 94 | 94 | 95 | 91 | 93 | 7.27s | 1.0k/1.2k | 0.0021 |
| Gemini 3.1 Flash Lite | low | 91 | 90 | 91 | 94 | 94 | 94 | 96 | 93 | 93 | 7.46s | 1.0k/1.2k | 0.0021 |
| Gemini 3.1 Flash Lite | medium | 91 | 90 | 92 | 94 | 94 | 93 | 95 | 93 | 93 | 6.69s | 1.0k/1.2k | 0.0021 |
| GPT 5.4 Mini | minimal | 93 | 91 | 93 | 93 | 92 | 95 | 95 | 93 | 93 | 10.51s | 989/1.1k | 0.0057 |
| GPT 5.4 Mini | low | 93 | 91 | 93 | 93 | 94 | 95 | 95 | 93 | 93 | 25.93s | 989/1.1k | 0.0057 |
| GPT 5.4 Mini | high | 92 | 92 | 90 | 93 | 92 | 94 | 96 | 93 | 93 | 6.98s | 989/1.1k | 0.0057 |
| GPT 5.4 Mini | xhigh | 93 | 91 | 93 | 93 | 90 | 94 | 95 | 94 | 93 | 7.56s | 989/1.1k | 0.0057 |
| Opus 4.5 | none | 93 | 89 | 90 | 94 | 92 | 92 | 95 | 90 | 92 | 99.99s | 1.1k/2.6k | 0.0715 |
| Opus 4.5 | medium | 91 | 91 | 89 | 94 | 93 | 93 | 95 | 90 | 92 | 69.03s | 1.1k/4.5k | 0.1182 |
| Opus 4.5 | high | 92 | 90 | 89 | 94 | 92 | 93 | 94 | 91 | 92 | 47.23s | 1.1k/2.3k | 0.0643 |
| Opus 4.6 | low | 94 | 91 | 84 | 94 | 92 | 94 | 95 | 92 | 92 | 34.04s | 1.1k/1.7k | 0.0470 |
| Gemini 3 Flash | none | 92 | 92 | 91 | 93 | 93 | 92 | 95 | 90 | 92 | 9.34s | 1.0k/1.2k | 0.0041 |
| Gemini 3 Flash | high | 92 | 89 | 94 | 94 | 92 | 93 | 93 | 91 | 92 | 8.03s | 1.0k/1.2k | 0.0041 |
| Gemini 3.1 Flash Lite | high | 91 | 90 | 92 | 94 | 94 | 94 | 92 | 91 | 92 | 6.90s | 1.0k/1.2k | 0.0021 |
| GPT 5.4 Mini | medium | 85 | 89 | 93 | 93 | 92 | 95 | 95 | 93 | 92 | 13.56s | 989/1.1k | 0.0057 |
| GPT 5.4 Nano | none | 92 | 90 | 92 | 92 | 92 | 94 | 94 | 91 | 92 | 10.28s | 989/1.1k | 0.0016 |
| GPT 5.4 Nano | medium | 88 | 89 | 92 | 92 | 93 | 94 | 94 | 91 | 92 | 17.46s | 989/1.1k | 0.0016 |
| GPT 5.4 Nano | high | 89 | 90 | 91 | 90 | 93 | 94 | 95 | 91 | 92 | 10.23s | 989/1.1k | 0.0016 |
| GPT 5.4 Nano | xhigh | 90 | 89 | 93 | 92 | 91 | 94 | 94 | 91 | 92 | 14.10s | 989/1.1k | 0.0016 |
| Opus 4.5 | low | 91 | 89 | 89 | 94 | 90 | 91 | 95 | 91 | 91 | 81.45s | 1.1k/3.8k | 0.0997 |
| Kimi K2.5 | none | 94 | 81 | 89 | 93 | 92 | 94 | 94 | 92 | 91 | 175.65s | 995/8.4k | 0.0256 |
| GPT OSS 120B | medium | 92 | 90 | 91 | 92 | 92 | 93 | 94 | 85 | 91 | 116.60s | 1.1k/4.2k | 0.0026 |
| GLM 5 | none | 89 | 88 | 85 | 93 | 92 | 93 | 94 | 90 | 91 | 68.14s | 990/1.9k | 0.0052 |
| GPT 5.4 Nano | minimal | 85 | 87 | 93 | 92 | 93 | 95 | 93 | 91 | 91 | 46.80s | 989/1.1k | 0.0016 |
| GPT 5.4 Nano | low | 86 | 87 | 92 | 91 | 90 | 94 | 94 | 90 | 91 | 33.02s | 989/1.1k | 0.0016 |
| Sonnet 4.6 | none | 90 | 78 | 88 | 94 | 92 | 92 | 93 | 90 | 90 | 62.71s | 1.1k/1.7k | 0.0284 |
| Sonnet 4.6 | interleaved | 88 | 83 | 88 | 95 | 92 | 92 | 95 | 88 | 90 | 42.06s | 1.1k/1.7k | 0.0286 |
| Sonnet 4.6 | low | 91 | 79 | 86 | 94 | 92 | 92 | 95 | 88 | 90 | 29.07s | 1.1k/1.7k | 0.0281 |
| Sonnet 4.6 | medium | 91 | 81 | 85 | 92 | 93 | 92 | 94 | 89 | 90 | 24.35s | 1.1k/1.7k | 0.0282 |
| Sonnet 4.6 | high | 89 | 81 | 86 | 94 | 92 | 92 | 95 | 89 | 90 | 24.40s | 1.1k/1.7k | 0.0287 |
| GPT OSS 120B | low | 86 | 89 | 90 | 90 | 90 | 94 | 95 | 86 | 90 | 80.62s | 1.1k/3.3k | 0.0021 |
| GPT OSS 120B | high | 89 | 86 | 90 | 93 | 92 | 93 | 94 | 80 | 90 | 29.85s | 1.1k/3.8k | 0.0024 |
| Gemini 3.1 Pro | minimal | 92 | 82 | 77 | 94 | 91 | 94 | 94 | 93 | 90 | 27.86s | 1.0k/3.4k | 0.0434 |
| GLM 5 | minimal | 87 | 85 | 86 | 93 | 92 | 94 | 94 | 90 | 90 | 33.31s | 990/1.3k | 0.0045 |
| GLM 5 | medium | 89 | 87 | 84 | 94 | 92 | 91 | 95 | 89 | 90 | 84.50s | 990/2.5k | 0.0084 |
| Grok 4.20 Beta | none | 90 | 88 | 92 | 91 | 91 | 92 | 87 | 90 | 90 | 28.01s | 1.1k/5.5k | 0.0331 |
| Grok 4.20 Beta | medium | 92 | 87 | 90 | 92 | 88 | 90 | 90 | 91 | 90 | 19.34s | 1.1k/5.1k | 0.0309 |
| Sonnet 4.5 | none | 87 | 87 | 88 | 85 | 93 | 92 | 92 | 90 | 89 | 26.44s | 1.1k/1.7k | 0.0288 |
| Sonnet 4.5 | low | 89 | 86 | 89 | 89 | 91 | 90 | 93 | 85 | 89 | 25.20s | 1.1k/1.7k | 0.0288 |
| Sonnet 4.5 | medium | 88 | 87 | 88 | 87 | 91 | 91 | 92 | 88 | 89 | 20.38s | 1.1k/1.7k | 0.0286 |
| Sonnet 4.5 | high | 89 | 88 | 87 | 88 | 91 | 92 | 91 | 88 | 89 | 21.06s | 1.1k/1.7k | 0.0284 |
| Grok 4.20 Beta | minimal | 88 | 85 | 88 | 90 | 92 | 90 | 89 | 88 | 89 | 44.23s | 1.1k/7.9k | 0.0476 |
| Grok 4.20 Beta | high | 89 | 87 | 90 | 89 | 89 | 90 | 88 | 89 | 89 | 19.82s | 1.1k/5.5k | 0.0330 |
| GLM 5 | high | 88 | 86 | 85 | 93 | 92 | 93 | 74 | 90 | 88 | 36.73s | 990/2.8k | 0.0075 |
| Grok 4.20 Beta | low | 88 | 88 | 86 | 92 | 87 | 91 | 90 | 85 | 88 | 26.98s | 1.1k/5.6k | 0.0336 |
| GLM 5 | low | 84 | 88 | 84 | 93 | 92 | 92 | 73 | 91 | 87 | 26.98s | 990/1.4k | 0.0053 |
| MiMo V2 Pro | none | 84 | 85 | 86 | 78 | 89 | 87 | 85 | 80 | 84 | 73.22s | 998/4.1k | 0.0125 |
| MiMo V2 Pro | medium | 85 | 82 | 88 | 81 | 86 | 91 | 81 | 66 | 83 | 91.11s | 998/5.1k | 0.0157 |
| Qwen3.5 397B A17B | high | 86 | 62 | 80 | 87 | 81 | 88 | 91 | 90 | 83 | 121.70s | 1.0k/7.1k | 0.0229 |
| Haiku 4.5 | none | 79 | 70 | 86 | 83 | 85 | 86 | 86 | 84 | 82 | 32.43s | 1.1k/1.7k | 0.0094 |
| Haiku 4.5 | low | 79 | 78 | 86 | 78 | 84 | 87 | 86 | 76 | 82 | 30.62s | 1.1k/1.7k | 0.0096 |
| MiMo V2 Pro | low | 82 | 78 | 89 | 71 | 88 | 88 | 87 | 73 | 82 | 109.16s | 998/6.4k | 0.0198 |
| DeepSeek V3.2 | none | 66 | 84 | 83 | 72 | 92 | 90 | 84 | 84 | 82 | 135.75s | 1.1k/2.2k | 0.0011 |
| Haiku 4.5 | medium | 79 | 72 | 86 | 76 | 84 | 88 | 86 | 74 | 81 | 19.43s | 1.1k/1.6k | 0.0093 |
| Haiku 4.5 | high | 84 | 70 | 84 | 82 | 84 | 86 | 82 | 76 | 81 | 20.99s | 1.1k/1.6k | 0.0092 |
| DeepSeek V3.2 | medium | 87 | 85 | 85 | 83 | 56 | 90 | 92 | 72 | 81 | 119.19s | 1.0k/2.7k | 0.0012 |
| MiMo V2 Pro | minimal | 84 | 81 | 84 | 60 | 88 | 90 | 84 | 65 | 80 | 107.72s | 998/6.1k | 0.0185 |
| Qwen3.5 397B A17B | minimal | 86 | 74 | 62 | 86 | 73 | 82 | 92 | 85 | 80 | 147.30s | 1.0k/8.1k | 0.0261 |
| DeepSeek V3.2 | high | 84 | 85 | 84 | 73 | 70 | 70 | 85 | 86 | 80 | 146.29s | 1.0k/3.2k | 0.0024 |
| Qwen3.5 397B A17B | medium | 88 | 48 | 83 | 63 | 85 | 89 | 86 | 88 | 79 | 139.67s | 1.0k/8.5k | 0.0242 |
| DeepSeek V3.2 | minimal | 55 | 72 | 86 | 70 | 89 | 92 | 90 | 75 | 79 | 54.18s | 1.0k/3.0k | 0.0034 |
| MiniMax M2.7 | none | 74 | 74 | 84 | 89 | 90 | 92 | 48 | 71 | 78 | 21.41s | 991/1.3k | 0.0035 |
| MiniMax M2.7 | medium | 82 | 71 | 82 | 82 | 92 | 92 | 40 | 86 | 78 | 33.04s | 991/1.3k | 0.0016 |
| MiMo V2 Pro | high | 80 | 85 | 80 | 57 | 89 | 78 | 82 | 75 | 78 | 78.76s | 998/4.8k | 0.0151 |
| Qwen3.5 397B A17B | low | 70 | 78 | 84 | 63 | 81 | 91 | 92 | 61 | 78 | 115.08s | 1.0k/9.8k | 0.0325 |
| MiniMax M2.7 | high | 81 | 75 | 77 | 81 | 88 | 91 | 49 | 76 | 77 | 48.27s | 991/2.2k | 0.0028 |
| DeepSeek V3.2 | low | 84 | 51 | 86 | 70 | 56 | 88 | 89 | 89 | 77 | 106.50s | 1.0k/4.7k | 0.0023 |
| MiniMax M2.7 | low | 79 | 51 | 78 | 87 | 92 | 92 | 46 | 81 | 76 | 43.51s | 991/1.7k | 0.0022 |
| MiMo V2 Flash | minimal | 77 | 73 | 88 | 82 | 83 | 87 | 68 | 51 | 76 | 35.16s | 998/3.9k | 0.0012 |
| MiMo V2 Flash | low | 77 | 67 | 88 | 79 | 74 | 91 | 71 | 60 | 76 | 37.00s | 998/3.3k | 0.0010 |
| MiMo V2 Flash | medium | 74 | 74 | 85 | 80 | 82 | 89 | 63 | 63 | 76 | 33.36s | 998/3.8k | 0.0012 |
| MiMo V2 Flash | high | 71 | 70 | 88 | 82 | 81 | 82 | 65 | 62 | 75 | 34.66s | 998/4.7k | 0.0014 |
| Qwen3.5 397B A17B | none | 64 | 83 | 82 | 69 | 91 | 83 | 72 | 55 | 75 | 105.12s | 1.0k/8.7k | 0.0317 |
| MiMo V2 Flash | none | 71 | 69 | 88 | 76 | 80 | 85 | 62 | 61 | 74 | 32.01s | 998/4.1k | 0.0013 |
| MiniMax M2.7 | minimal | 82 | 54 | 86 | 88 | 65 | 90 | 31 | 83 | 72 | 30.39s | 991/2.1k | 0.0053 |