Skip to content

Instantly share code, notes, and snippets.

@Hegghammer
Last active February 9, 2026 13:12
Show Gist options
  • Select an option

  • Save Hegghammer/86d2070c0be8b3c62083d6653ad27c23 to your computer and use it in GitHub Desktop.

Select an option

Save Hegghammer/86d2070c0be8b3c62083d6653ad27c23 to your computer and use it in GitHub Desktop.
Working Clawdbot/Moltbot setup with local Ollama model

Working Clawdbot/Moltbot setup with local Ollama model

[Update 2026-02-02: nemotron-3-nano also performs well on same setup; see comment below]

This is a guide to setting up Clawdbot/Moltbot with a local Ollama model that actually works -- meaning it has good tool use and decent speed. The main requirement is 48GB of VRAM. I have yet to find a model that fits on less than this and still works on Moltbot.

The setup involves creating a tool-tuned variant of qwen2.5:72b and modifying a range of configs in Moltbot. At the end you'll get a local Moltbot instance that can use tools (exec, read, write, web search), read skills, and perform agentic tasks without any cloud API dependencies. On my system I get ~16 t/s and have yet to come across a tool/skill that my bot can't use.

Claude Opus wrote the first draft of this Gist, then I (a human) checked and edited it.

Background

I was keen to find an Ollama model that would fit my 48GB VRAM rig and also work with Moltbot. So I set up two Moltbot agents, EngineerBot and TestBot, and gave the first Claude Opus 4.5 access and the latter only Ollama access. I gave EngineerBot SSH access to TestBot and asked it to craft and run a suite of tool-use tests. EngineerBot and I then iteratively tweaked configs on TestBot together until we found the best setup.

Models tested:

  • GLM-4.7-Flash (q4_K_M)
  • Qwen3-Coder
  • GPT-OSS 20B
  • Qwen 2.5 72B Instruct (q4_K_M)
  • Qwen 2.5 72B Instruct (q3_K_M)

The winner was a tuned (details below) version of qwen2.5:72b-instruct-q3_K_M. The smaller models struggled with Moltbot's system prompt complexity — rambling about identity, calling wrong tools, or hallucinating tool names. The 72B models performed significantly better at tool use, with the Q3 quantization providing the best balance of performance and VRAM headroom.

The process uncovered several non-obvious issues:

  • The api setting mismatch that causes silent empty responses
  • The missing "read" permission that prevents skill file access
  • The need for a custom system prompt to make Qwen actually use tools instead of describing them
  • Environment variable inheritance issues with exec subprocesses
  • Verbosity problems requiring SOUL.md guidance

What you see below is the configuration that emerged from this testing — the setup that, for me at least, gave the best tool-use performance on 48GB VRAM.

Prerequisites

Hardware

  • GPU: Minimum 24GB VRAM recommended for 72B models (quantized). We used 2x RTX 3090 (48GB total).
  • RAM: 64GB+ recommended
  • Storage: ~50GB for the model

VRAM note: The Q3 quantized model with 16k context window uses ~43GB VRAM, leaving headroom on a 48GB setup for other tasks.

Performance: Expect ~16 tokens/sec on 2x RTX 3090 with flash attention enabled.

Software

  • Ollama installed and running (https://ollama.ai)
  • Moltbot installed (npm install -g clawdbot)
  • Node.js 22+

Ollama Server Configuration

Set these environment variables for the Ollama server (in your systemd service, Docker compose, or shell):

OLLAMA_CONTEXT_LENGTH=16384    # Context window size
OLLAMA_FLASH_ATTENTION=1       # Enable flash attention (faster, less VRAM)
OLLAMA_NEW_ENGINE=1            # Use new inference engine

For Docker, add to your docker-compose.yml:

environment:
  - OLLAMA_CONTEXT_LENGTH=16384
  - OLLAMA_FLASH_ATTENTION=1
  - OLLAMA_NEW_ENGINE=1

The Model

The qwen-agentic model is a custom Ollama model built from Qwen 2.5 72B with a tool-tuned system prompt. Create it like this:

Step 1: Pull the base model

ollama pull qwen2.5:72b-instruct-q3_K_M

Step 2: Create the Modelfile

Save this as qwen-agentic.Modelfile:

FROM qwen2.5:72b-instruct-q3_K_M

SYSTEM """You are a helpful assistant with access to tools.

CRITICAL TOOL BEHAVIOR:
- When you have tools available, USE THEM directly without asking for confirmation
- Don't describe what you could do — just do it
- If the user asks about weather, check the weather. If they ask to search something, search it
- Never say "I don't have access to X" when you have a tool that provides X
- Check your available tools and use them immediately
- Execute the task, then report results

Be concise. Act decisively. Don't ask permission for routine tool use."""

That's it — the tool-calling TEMPLATE is inherited automatically from the base Qwen model.

Step 3: Build the model

ollama create qwen-agentic -f qwen-agentic.Modelfile

Why this matters

The custom SYSTEM prompt is crucial — it tells the model to:

  • Use tools immediately without asking for confirmation
  • Be decisive instead of explaining what it could do
  • Act first, report results instead of describing plans

Without this, Qwen tends to describe tools rather than use them, or ask permission before every action.


Moltbot Configuration

Create or edit ~/.clawdbot/clawdbot.json:

1. Ollama Provider (Critical Settings)

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434/v1",
        "apiKey": "ollama",
        "api": "openai-completions",
        "authHeader": false,
        "models": [
          {
            "id": "qwen-agentic:latest",
            "name": "Qwen 2.5 72B Agentic",
            "reasoning": false,
            "input": ["text"],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 32768,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}

⚠️ Critical: The api setting MUST be "openai-completions", NOT "openai-responses".

  • openai-responses is for OpenAI's newer Responses API
  • Ollama uses standard chat completions = openai-completions
  • Getting this wrong causes empty responses — the model runs but Moltbot can't parse the output

2. Agent Defaults

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/qwen-agentic:latest",
        "fallbacks": []
      },
      "models": {
        "ollama/qwen-agentic:latest": {
          "alias": "qwen-agentic"
        }
      },
      "workspace": "/home/youruser/clawd",
      "skipBootstrap": false,
      "memorySearch": {
        "enabled": false
      },
      "compaction": {
        "mode": "safeguard"
      },
      "maxConcurrent": 4,
      "subagents": {
        "maxConcurrent": 8
      }
    }
  }
}

3. Tools Configuration (Critical for Tool Use)

{
  "tools": {
    "profile": "coding",
    "allow": [
      "read",
      "exec",
      "write",
      "edit"
    ],
    "exec": {
      "host": "gateway",
      "ask": "off",
      "security": "full"
    }
  }
}

Add "web_search" and "web_fetch" to allow if you want web search (requires additional config).

Critical settings explained:

Setting Value Why
allow includes "read" Required Without this, the agent can't read skill files!
exec.ask "off" Prevents approval popups for every command
exec.security "full" Allows all commands, not just an allowlist
exec.host "gateway" Commands run on the gateway host

Complete Working Config

Here's a full, tested configuration:

{
  "diagnostics": {
    "enabled": true,
    "flags": ["*"]
  },
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434/v1",
        "apiKey": "ollama",
        "api": "openai-completions",
        "authHeader": false,
        "models": [
          {
            "id": "qwen-agentic:latest",
            "name": "Qwen 2.5 72B Agentic",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 32768,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/qwen-agentic:latest",
        "fallbacks": []
      },
      "models": {
        "ollama/qwen-agentic:latest": {
          "alias": "qwen-agentic"
        }
      },
      "workspace": "/home/youruser/clawd",
      "memorySearch": { "enabled": false },
      "compaction": { "mode": "safeguard" }
    }
  },
  "tools": {
    "profile": "coding",
    "allow": ["read", "exec", "write", "edit"],
    "exec": {
      "host": "gateway",
      "ask": "off",
      "security": "full"
    }
  }
}

Taming Verbosity (SOUL.md)

Local models like Qwen tend to over-explain. They'll dump entire skill documentation, show full JSON responses, and ask for confirmation before every action.

Fix: Add explicit brevity instructions to your workspace's SOUL.md:

## Brevity is a Virtue

**Be concise.** This is critical. You tend to over-explain. Fight that urge.

- Do NOT dump skill documentation at the user — just use it
- Do NOT show full JSON responses — summarize the result
- Do NOT ask for confirmation when env vars or configs are already set — trust your setup
- Do NOT explain what you're about to do in detail — just do it
- One short sentence confirming success is enough: "Done, server lights are on."

**Bad:** "I'll use the curl command to turn on the switch via the Home Assistant API. First, I need to ensure the environment variables..."

**Good:** "Turning on server lights..." *[runs command]* "Done."

When in doubt: fewer words.

This won't make it as concise as Claude, but it significantly reduces the verbosity.


Environment Variables

If your skills need environment variables (API tokens, URLs, etc.), there are two approaches:

Option 1: ~/.clawdbot/.env (Simpler)

Add variables to ~/.clawdbot/.env:

HA_URL=https://your-homeassistant.local
HA_TOKEN=your-long-lived-token

Caveat: These must be present before starting the gateway. If you add them later and restart, they may not propagate to exec subprocesses reliably.

Option 2: Wrapper Scripts (More Robust)

Create wrapper scripts with credentials baked in:

#!/bin/bash
# ~/bin/ha - Home Assistant wrapper
export HA_URL="https://your-homeassistant.local"
export HA_TOKEN="your-token"

case "$1" in
  on)    curl -s -X POST "$HA_URL/api/services/switch/turn_on" -H "Authorization: Bearer $HA_TOKEN" -H "Content-Type: application/json" -d "{\"entity_id\": \"$2\"}" ;;
  off)   curl -s -X POST "$HA_URL/api/services/switch/turn_off" -H "Authorization: Bearer $HA_TOKEN" -H "Content-Type: application/json" -d "{\"entity_id\": \"$2\"}" ;;
  state) curl -s "$HA_URL/api/states/$2" -H "Authorization: Bearer $HA_TOKEN" ;;
esac

Then document in TOOLS.md:

### Helper Script
Use `~/bin/ha` for Home Assistant:
- `~/bin/ha on switch.living_room`
- `~/bin/ha off switch.living_room`
- `~/bin/ha state switch.living_room`

Installing Skills

Skills are just SKILL.md files that tell the agent how to use tools. Copy them to your workspace:

mkdir -p ~/clawd/skills/skill-name
cp /path/to/SKILL.md ~/clawd/skills/skill-name/

The agent will automatically discover skills in ~/clawd/skills/*/SKILL.md.


Troubleshooting

Empty Responses

Symptom: Model runs (you see it thinking) but returns nothing. Cause: Wrong api setting. Fix: Change api: "openai-responses"api: "openai-completions"

Approval Popups for Every Command

Symptom: Web UI asks for approval before running commands. Fix: Set tools.exec.ask: "off"

"Permission denied" or Tool Not Available

Symptom: Agent says it can't use a tool. Fix: Add the tool to tools.allow array. Common missing one: "read" (needed for reading skill files).

Environment Variables Not Working

Symptom: 401 Unauthorized or "variable not set" errors. Causes:

  1. Variables added after gateway started
  2. Shell inheritance issues

Fixes:

  1. Add vars to .env before first start
  2. Use wrapper scripts with credentials baked in
  3. Add vars to systemd service file directly

Model Too Verbose

Symptom: Agent dumps documentation, explains everything, asks unnecessary questions. Fix: Add brevity instructions to SOUL.md (see above).


Verifying It Works

  1. Start the gateway: clawdbot gateway start
  2. Open the web UI: http://localhost:18789
  3. Test basic tool use:
    • "What time is it?" (should run date)
    • "List files in the current directory" (should run ls)
    • "What's the weather?" (if you have the weather skill)

If these work, your setup is good.


Summary Checklist

  • Ollama running with qwen-agentic model
  • api: "openai-completions" in provider config
  • "read" in tools.allow array
  • exec.ask: "off" and exec.security: "full"
  • Brevity instructions in SOUL.md
  • Environment variables in .env (before first start) or wrapper scripts
  • Skills copied to workspace

Guide based on real debugging sessions, January 2026. Your mileage may vary with different models or Moltbot versions.

@Hegghammer
Copy link
Author

Hegghammer commented Feb 2, 2026

Addendum on context window size and VRAM requirements for nemotron. In my tests I get:

Context VRAM Headroom (48GB)
32K 25.6GB 22.4GB
65K 27.8GB 20.2GB
131K 32.2GB 15.8GB
262K 39.7GB 8.3GB
512K 40.2GB 7.8GB
768K 49.0GB ❌ exceeds

So basically nemotron gives you 16x more context than qwen for the same VRAM usage.

@contextablemark
Copy link

Addendum on context window size and VRAM requirements for nemotron. In my tests I get:
So basically nemotron gives you 16x more context than qwen for the same VRAM usage.

Any feel for whether Nemotron suffers context rot as the window starts to fill up?

Also, have you looked at any of the Qwen3 Next models? https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 in particular looks like it could be interesting.

@Hegghammer
Copy link
Author

So far I've only had short convos on Nemotron, so I don't know how bad the context rot is.

I can't test Qwen3-next unfortunately. The FP8 version is too big, and the INT4 has an architecture that my old RTX 3090s don't seem to handle.

@Hegghammer
Copy link
Author

Hegghammer commented Feb 4, 2026

Update: this repo has quantizations of Qwen 3 Coder Next that worked for me. I got the Q3_K_M up and running via llama-server (the GGUF in Ollama doesn't bring the tool capability). I can't test systematically any longer because the new security features in OpenClaw stop ExpertBot from doing what it wants on TestBot, but my subjective first impression is that it is a bit faster and just as smart as nemotron-agentic. With 256k context it takes up 42GB RAM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment