
@haltakov
Created February 10, 2026 08:30
// In agents.models:
"nvidia/moonshotai/kimi-k2.5": {
  "alias": "kimi"
}

// In the root:
"models": {
  "mode": "merge",
  "providers": {
    "nvidia": {
      "baseUrl": "https://integrate.api.nvidia.com/v1",
      "apiKey": "<API_KEY>",
      "api": "openai-completions",
      "models": [
        {
          "id": "moonshotai/kimi-k2.5",
          "name": "Kimi K2.5",
          "reasoning": true,
          "input": [
            "text",
            "image"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 256000,
          "maxTokens": 8192
        }
      ]
    }
  }
}
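As a rough illustration of what `"mode": "merge"` presumably means here (provider entries from this config get merged into the base model list rather than replacing it wholesale), a minimal sketch in Python. This is an assumption about the merge semantics, not the tool's actual implementation:

```python
# Sketch of a "merge" strategy for a models config: provider entries from the
# override are added to (or replace same-named entries in) the base config.
# Assumed behavior only; the real config loader may differ.

def merge_models(base: dict, override: dict) -> dict:
    if override.get("mode") != "merge":
        return override  # anything else: assume full replacement
    merged = {"providers": dict(base.get("providers", {}))}
    merged["providers"].update(override.get("providers", {}))
    return merged

base = {"providers": {"anthropic": {"baseUrl": "https://api.anthropic.com"}}}
override = {
    "mode": "merge",
    "providers": {"nvidia": {"baseUrl": "https://integrate.api.nvidia.com/v1"}},
}

merged = merge_models(base, override)
# Both the built-in provider and the new one survive the merge.
print(sorted(merged["providers"]))  # ['anthropic', 'nvidia']
```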
@rainman74
rainman74 commented Feb 23, 2026

Where do you put the API key? Do you replace “Ollama”? Also, is it free? The NVIDIA one is free.

Kimi K2.5 Cloud is currently free in Ollama, but you need an Ollama online account 😉

      "ollama": {
        "baseUrl": "http://localhost:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "kimi-k2.5:cloud",
            "name": "Kimi K2.5 Cloud",
            "cost": {
              "input": 0.002,
              "output": 0.006,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 262144,
            "maxTokens": 262144
          }
        ]
      }

@Driaq
Driaq commented Feb 23, 2026

Yes! Sorry, I also messed up the output: 262144.
It's free with an Ollama free account. I'm not sure about the limit because I didn't track token usage, but I hit the limit at around the same point on both NVIDIA and Ollama. Ollama is much faster, though, at least for me.

@rainman74

> Yes! Sorry, I also messed up the output: 262144. It's free with an Ollama free account. I'm not sure about the limit because I didn't track token usage, but I hit the limit at around the same point on both NVIDIA and Ollama. Ollama is much faster, at least for me.

Here are the essential facts regarding Kimi-K2.5:cloud quotas on the Ollama Cloud Free Tier:

  • Token Context: Supports up to 256,000 (256K) tokens for input.
  • Generation Limit: Output is capped at approximately 16,384 tokens per response.
  • Usage Model: Operates on a percentage-based system (usage credits) rather than a fixed token count.
  • Hourly Quota: Roughly 250,000 input tokens per hour (subject to system load).
  • Request Rate: Approximately 135 requests every 5 hours for typical chat usage.
  • Resource Intensity: As a 1-trillion parameter model, Kimi-K2.5 consumes your usage percentage faster than smaller models (like Llama 3).
  • Monitoring: Live usage tracking is available in your Ollama Dashboard under account settings.
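Taking those figures at face value (they are the commenter's estimates, not official Ollama documentation), a quick back-of-envelope check of what fits in the free tier:

```python
# Sanity check against the quota figures quoted above.
# All limits are the commenter's estimates, not official Ollama numbers.
HOURLY_INPUT_TOKENS = 250_000
REQUESTS_PER_5H = 135

avg_prompt_tokens = 8_000  # hypothetical average prompt size

# How many such prompts fit in one hour's input-token budget?
by_tokens = HOURLY_INPUT_TOKENS // avg_prompt_tokens
# How many requests per hour does the rate limit allow on average?
by_rate = REQUESTS_PER_5H / 5

print(by_tokens)  # 31
print(by_rate)    # 27.0
# At this prompt size the request-rate limit, not the token budget,
# would be the binding constraint.
```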

My config: https://github.com/rainman74/OpenClaw-Configs/blob/main/.openclaw/Windows/openclaw.json
