```jsonc
// In agents.models:
"nvidia/moonshotai/kimi-k2.5": {
  "alias": "kimi"
}

// In the root
"models": {
  "mode": "merge",
  "providers": {
    "nvidia": {
      "baseUrl": "https://integrate.api.nvidia.com/v1",
      "apiKey": "<API_KEY>",
      "api": "openai-completions",
      "models": [
        {
          "id": "moonshotai/kimi-k2.5",
          "name": "Kimi K2.5",
          "reasoning": true,
          "input": ["text", "image"],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 256000,
          "maxTokens": 8192
        }
      ]
    }
  }
}
```
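As a rough illustration of what `"mode": "merge"` implies, here is a deep-merge sketch in Python. Note the merge function and the `existing` defaults are assumptions for illustration, not openclaw's actual implementation; the sketch only shows why a merged custom provider coexists with the built-in ones instead of replacing them.

```python
# Hypothetical sketch of "merge"-mode config handling: the custom "nvidia"
# provider is deep-merged into whatever providers already exist.
# This is illustrative; openclaw's real merge logic may differ.

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base, returning a new dict."""
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

# Hypothetical pre-existing provider config:
existing = {"providers": {"anthropic": {"api": "anthropic-messages"}}}

# The custom block from the snippet above (abbreviated):
custom = {
    "mode": "merge",
    "providers": {
        "nvidia": {
            "baseUrl": "https://integrate.api.nvidia.com/v1",
            "api": "openai-completions",
        }
    },
}

merged = deep_merge(existing, custom)
# Both providers survive the merge:
assert set(merged["providers"]) == {"anthropic", "nvidia"}
```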
DarthVad3rx commented Feb 21, 2026 via email
Where do you put the API key — do you replace "Ollama"? Also, is it free? The NVIDIA one is free.
Kimi K2.5 Cloud is currently free in Ollama, but you need an Ollama online account 😉
```json
"ollama": {
  "baseUrl": "http://localhost:11434/v1",
  "apiKey": "ollama-local",
  "api": "openai-completions",
  "models": [
    {
      "id": "kimi-k2.5:cloud",
      "name": "Kimi K2.5 Cloud",
      "cost": {
        "input": 0.002,
        "output": 0.006,
        "cacheRead": 0,
        "cacheWrite": 0
      },
      "contextWindow": 262144,
      "maxTokens": 262144
    }
  ]
}
```
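For a back-of-envelope cost check against the `cost` fields in the block above: the config doesn't state the pricing unit, so the per-1,000-token assumption below is exactly that, an assumption (adjust `UNIT_TOKENS` if the rates are actually per million).

```python
# Rough cost estimate from the ollama provider's "cost" fields.
# ASSUMPTION: rates are USD per 1,000 tokens; the config does not
# document the unit, so change UNIT_TOKENS if it is per million.

UNIT_TOKENS = 1_000
INPUT_RATE = 0.002   # "cost.input" from the config above
OUTPUT_RATE = 0.006  # "cost.output" from the config above

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request under the assumed unit."""
    return (input_tokens / UNIT_TOKENS) * INPUT_RATE + \
           (output_tokens / UNIT_TOKENS) * OUTPUT_RATE

# e.g. a 50K-token prompt with an 8K-token reply:
print(round(estimate_cost(50_000, 8_000), 4))  # 0.148
```

Since the model is currently free on the Ollama free tier, these fields mostly matter for usage accounting rather than actual billing.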
Yes! Sorry, I also messed up the output: it should be 262144.
It's free with an Ollama free account. I'm not sure about the exact limit because I didn't track tokens, but I hit it at around the same point on both NVIDIA and Ollama. Ollama is much faster, though — at least for me.
Here are the essential facts regarding Kimi-K2.5:cloud quotas on the Ollama Cloud Free Tier:
- Token Context: Supports up to 256,000 (256K) tokens for input.
- Generation Limit: Output is capped at approximately 16,384 tokens per response.
- Usage Model: Operates on a percentage-based system (usage credits) rather than a fixed token count.
- Hourly Quota: Roughly 250,000 input tokens per hour (subject to system load).
- Request Rate: Approximately 135 requests every 5 hours for typical chat usage.
- Resource Intensity: As a 1-trillion-parameter model, Kimi-K2.5 consumes your usage percentage faster than smaller models (like Llama 3).
- Monitoring: Live usage tracking is available in your Ollama Dashboard under account settings.
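Sanity-checking those quota figures with quick arithmetic (the numbers themselves are the approximations listed above, not official limits):

```python
# Quick arithmetic on the free-tier quota figures listed above.
# The inputs are the commenter's approximate numbers, not official limits.

hourly_input_tokens = 250_000   # ~input tokens per hour
requests_per_window = 135       # ~requests per 5-hour window
window_hours = 5

requests_per_hour = requests_per_window / window_hours
avg_tokens_per_request = hourly_input_tokens / requests_per_hour

print(requests_per_hour)       # 27.0
print(avg_tokens_per_request)  # ~9259 input tokens per request
```

So under these figures you get roughly 27 requests per hour, averaging a bit over 9K input tokens each before one of the two quotas bites first.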
My config: https://github.com/rainman74/OpenClaw-Configs/blob/main/.openclaw/Windows/openclaw.json