Raw data: glm-4.6-results.tar.gz

| Benchmark | Kimi-K2-Instruct | GLM-4.6-FP8 |
|---|---|---|
| Overall Accuracy | 45.62% | 60.13% |
| Latency Mean | 3.32 s | 6.66 s |
| Latency Std Dev | 10.03 s | 7.50 s |
| Latency 95th Percentile | 7.83 s | 16.14 s |

| Benchmark | Kimi-K2-Instruct | GLM-4.6-FP8 |
|---|---|---|
| Non-Live AST Accuracy | 89.42% | 87.48% |
| Non-Live Simple AST | 79.67% | 73.92% |
| Non-Live Multiple AST | 92.50% | 90.00% |
| Non-Live Parallel AST | 93.50% | 90.00% |
| Non-Live Parallel Multiple AST | 92.00% | 89.00% |

| Benchmark | Kimi-K2-Instruct | GLM-4.6-FP8 |
|---|---|---|
| Live Accuracy | 78.61% | 80.74% |
| Live Simple AST | 87.98% | 90.00% |
| Live Multiple AST | 76.07% | 78.13% |
| Live Parallel AST | 100.00% | 92.50% |
| Live Parallel Multiple AST | 75.00% | 70.83% |

| Benchmark | Kimi-K2-Instruct | GLM-4.6-FP8 |
|---|---|---|
| Multi-Turn Accuracy | 43.62% | 52.00% |
| Multi-Turn Base | 54.50% | 57.00% |
| Multi-Turn Miss Func | 46.50% | 44.00% |
| Multi-Turn Miss Param | 37.00% | 39.50% |
| Multi-Turn Long Context | 36.50% | 42.00% |

| Benchmark | Kimi-K2-Instruct | GLM-4.6-FP8 |
|---|---|---|
| Web Search Accuracy | 2.00% | 5.00% |
| Web Search Base | 1.00% | 4.00% |
| Web Search No Snippet | 3.00% | 2.00% |
| Memory Accuracy | 39.78% | 56.13% |
| Memory KV | 26.45% | 52.26% |
| Memory Vector | 29.68% | 57.42% |
| Memory Recursive Summarization | 63.23% | 58.71% |

| Benchmark | Kimi-K2-Instruct | GLM-4.6-FP8 |
|---|---|---|
| Relevance Detection | 81.25% | 75.00% |
| Irrelevance Detection | 73.75% | 83.88% |