Created February 10, 2026 08:30
```jsonc
// In agents.models:
"nvidia/moonshotai/kimi-k2.5": {
  "alias": "kimi"
}

// In the root
"models": {
  "mode": "merge",
  "providers": {
    "nvidia": {
      "baseUrl": "https://integrate.api.nvidia.com/v1",
      "apiKey": "<API_KEY>",
      "api": "openai-completions",
      "models": [
        {
          "id": "moonshotai/kimi-k2.5",
          "name": "Kimi K2.5",
          "reasoning": true,
          "input": [
            "text",
            "image"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 256000,
          "maxTokens": 8192
        }
      ]
    }
  }
}
```
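If it helps, here is a minimal offline sanity check of the fragment above. This is just a sketch: it re-assembles the `models` fragment into one standalone JSON document (with a placeholder API key) and checks the fields discussed in this thread. The exact schema OpenClaw expects is an assumption on my part, so treat this as illustration, not a reference implementation.

```python
import json

# The "models" fragment from the config above, assembled into one complete
# JSON document so it can be parsed standalone (apiKey is a placeholder).
config = json.loads("""
{
  "models": {
    "mode": "merge",
    "providers": {
      "nvidia": {
        "baseUrl": "https://integrate.api.nvidia.com/v1",
        "apiKey": "<API_KEY>",
        "api": "openai-completions",
        "models": [
          {
            "id": "moonshotai/kimi-k2.5",
            "name": "Kimi K2.5",
            "reasoning": true,
            "input": ["text", "image"],
            "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
            "contextWindow": 256000,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
""")

provider = config["models"]["providers"]["nvidia"]
model = provider["models"][0]

# Basic sanity checks on the values from this thread.
assert provider["baseUrl"].startswith("https://")
assert model["id"] == "moonshotai/kimi-k2.5"
assert model["contextWindow"] >= model["maxTokens"]
print("config fragment OK:", model["name"])
```

Running it should confirm the fragment is valid JSON once assembled, which catches the most common copy-paste mistakes (trailing commas, unbalanced braces) before OpenClaw ever sees the file.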
Yes! Sorry, I also messed up the output: it should be 262144.
It's free with an Ollama free account. I'm not sure about the exact limit because I didn't track tokens, but I hit the limit at around the same point on both NVIDIA and Ollama. Ollama is much faster, though, at least for me.
Here are the essential facts regarding Kimi-K2.5:cloud quotas on the Ollama Cloud Free Tier:
- Token Context: Supports up to 256,000 (256K) tokens for input.
- Generation Limit: Output is capped at approximately 16,384 tokens per response.
- Usage Model: Operates on a percentage-based system (usage credits) rather than a fixed token count.
- Hourly Quota: Roughly 250,000 input tokens per hour (subject to system load).
- Request Rate: Approximately 135 requests every 5 hours for typical chat usage.
- Resource Intensity: As a 1-trillion parameter model, Kimi-K2.5 consumes your usage percentage faster than smaller models (like Llama 3).
- Monitoring: Live usage tracking is available in your Ollama Dashboard under account settings.
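To put those numbers in perspective, here is a quick back-of-the-envelope sketch. The quota figures are the approximate ones listed above, not official limits, so the results are rough estimates only:

```python
# Rough pacing math for the approximate free-tier quotas listed above.
HOURLY_INPUT_TOKENS = 250_000   # ~250K input tokens per hour (approximate)
REQUESTS_PER_5H = 135           # ~135 requests every 5 hours (approximate)

requests_per_hour = REQUESTS_PER_5H / 5                      # 27.0
avg_input_budget = HOURLY_INPUT_TOKENS / requests_per_hour   # tokens/request

print(f"~{requests_per_hour:.0f} requests/hour")
print(f"~{avg_input_budget:,.0f} input tokens per request on average")

# Example: an agent session resending ~40K tokens of context per turn
# (a hypothetical workload, chosen only for illustration).
turn_tokens = 40_000
turns_per_hour = HOURLY_INPUT_TOKENS // turn_tokens
print(f"a {turn_tokens:,}-token context allows ~{turns_per_hour} turns/hour")
```

In other words, with a large agent context the hourly token quota, not the request quota, is what you hit first, which matches the experience reported above of hitting the limit without tracking individual tokens.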
My config: https://github.com/rainman74/OpenClaw-Configs/blob/main/.openclaw/Windows/openclaw.json
Kimi K2.5 Cloud is currently free in Ollama, but you need an Ollama online account 😉