@simbo1905
Created May 27, 2026 07:46
mistral vibe tool calling
┃ I want to educate a friend who is an experienced engineer on how the tool call for bash works in an agentic harness. This is the Mistral Vibe
┃ agentic tool. Please find me the logic around invoking bash; a test of it would be ideal.
Thought: The user wants to understand how the bash tool call works in their Mistral Vibe agentic harness. They want me to find the logic
around invoking bash, ideally with tests. Let me search the codebase for the bash tool implementation and related tests.
│ Explore Task — Find bash tool logic and tests
└ 18 toolcalls · 48.4s
Thought: The user wants to educate a friend on how the bash tool call works in the Mistral Vibe agentic
harness. Let me read the key files to [...] read the bash tool implementation
and the tests.
Let me read the key files to give you [...]
→ Read vibe/core/tools/builtins/bash.p
→ Read tests/tools/test_bash.py
Thought: Now let me also look at the base tool class and the collect_result test helper to give a complete picture.
Here's the full walkthrough of the bash tool call from the Mistral Vibe agentic harness. The key files are:
- Implementation: vibe/core/tools/builtins/bash.py
- Tests: tests/tools/test_bash.py
- Base class: vibe/core/tools/base.py
- Discovery/registration: vibe/core/tools/manager.py
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Architecture overview
Every tool is a class extending BaseTool[Args, Result, Config, State] — a generic abstract base. The harness discovers tool classes by
filesystem scanning: ToolManager globs every .py in vibe/core/tools/builtins/, imports each, and picks any non-abstract BaseTool
subclass. The class name is snake-cased to produce the tool name (Bash → "bash").
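The discovery rule can be sketched as follows. This is a minimal illustration, not Vibe's actual code: BaseTool here is a non-generic stand-in, to_snake_case and discover are hypothetical names, and the sketch filters a list of candidate classes instead of globbing and importing files so that it runs standalone.

```python
import inspect
import re
from abc import ABC, abstractmethod


class BaseTool(ABC):
    """Stand-in for the harness's generic base class (assumption)."""

    @abstractmethod
    def run(self, args): ...


class Bash(BaseTool):
    def run(self, args):
        return f"would run: {args}"


class ReadFile(BaseTool):
    def run(self, args):
        return f"would read: {args}"


class HalfFinished(BaseTool):
    """Does not implement run(), so it is still abstract and gets skipped."""


def to_snake_case(name: str) -> str:
    # Insert "_" before every capital letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def discover(candidates) -> dict:
    """Keep concrete BaseTool subclasses, keyed by snake-cased class name."""
    return {
        to_snake_case(cls.__name__): cls
        for cls in candidates
        if inspect.isclass(cls)
        and issubclass(cls, BaseTool)
        and not inspect.isabstract(cls)
    }


registry = discover([BaseTool, Bash, ReadFile, HalfFinished, int])
print(sorted(registry))
```

Only Bash and ReadFile survive the filter; the abstract classes and the unrelated int are dropped, which is the property that lets a tool be added by dropping a file into the builtins directory.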
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
The Bash tool flow (end to end)
1. Args model — what the LLM provides
class BashArgs(BaseModel):
    command: str
    timeout: int | None = None  # overrides the 300s default
2. Permission resolution (resolve_permission)
Called before execution. Returns a PermissionContext that gates whether the tool runs, asks the user, or is denied.
The logic decomposes a compound command (e.g. pwd && whoami) into individual commands (_extract_commands) by parsing it into a tree-sitter bash AST.
Then for each sub-command:
1. Denylist check (_find_denylist_match) — hard NEVER if matched (e.g. vim, rm -rf /)
2. Standalone denylist (_is_standalone_denylisted) — python or bash without args are denied
3. find -exec guard (_has_find_execution_predicate) — find . -exec always forces ASK regardless of config
4. Outside-dir analysis (_collect_outside_dirs) — inspects path-manipulating commands (cp, rm, chmod, etc.) for arguments that reference
directories outside the workdir; those become OUTSIDE_DIRECTORY required permissions
5. Allowlist check — if all sub-commands are on the allowlist (common safe commands like echo, ls, git status, cat, pwd) and no
sensitive/outside-dir access, returns ALWAYS
6. Sensitive patterns (sudo by default) always trigger ASK
7. Everything else — produces ASK with a RequiredPermission per unique non-allowlisted command
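The cascade above can be sketched in a few dozen lines. Treat this as an illustration under stated assumptions rather than the tool's real logic: it swaps the tree-sitter parse for a naive regex split plus shlex, the lists are hypothetical samples, the outside-directory analysis is left out, and it returns a bare enum instead of a PermissionContext.

```python
import re
import shlex
from enum import Enum


class Permission(Enum):
    ALWAYS = "always"
    ASK = "ask"
    NEVER = "never"


# Hypothetical sample lists (assumption); the real defaults live in the tool's config.
DENYLIST = ["vim", "vi", "bash -i"]
STANDALONE_DENYLIST = {"python", "bash"}
ALLOWLIST = ["echo", "ls", "git status", "cat", "pwd", "find"]
SENSITIVE = ["sudo"]
FIND_EXEC_PREDICATES = {"-exec", "-execdir", "-ok", "-okdir"}


def split_commands(command: str) -> list:
    # Naive stand-in for the tree-sitter pass; it mis-splits quoted operators.
    parts = re.split(r"&&|\|\||;|\|", command)
    return [part.strip() for part in parts if part.strip()]


def has_prefix(tokens: list, entry: str) -> bool:
    # Token-wise prefix match: "vi" matches ["vi", "x"] but not ["vibe"].
    words = entry.split()
    return tokens[: len(words)] == words


def resolve_permission(command: str) -> Permission:
    commands = split_commands(command)
    if not commands:
        return Permission.ASK  # assumption: an empty command is never auto-approved
    verdict = Permission.ALWAYS
    for cmd in commands:
        try:
            tokens = shlex.split(cmd)
        except ValueError:
            verdict = Permission.ASK  # unparseable input is never auto-approved
            continue
        if any(has_prefix(tokens, entry) for entry in DENYLIST):
            return Permission.NEVER
        if len(tokens) == 1 and tokens[0] in STANDALONE_DENYLIST:
            return Permission.NEVER
        runs_via_find = tokens[:1] == ["find"] and bool(FIND_EXEC_PREDICATES & set(tokens))
        sensitive = any(has_prefix(tokens, entry) for entry in SENSITIVE)
        allowed = any(has_prefix(tokens, entry) for entry in ALLOWLIST)
        if runs_via_find or sensitive or not allowed:
            verdict = Permission.ASK  # keep scanning: a later command may be denied
    return verdict
```

With these sample lists, resolve_permission("pwd && whoami") returns Permission.ASK because whoami is not allowlisted, while resolve_permission("whoami; bash -i") returns Permission.NEVER because one denied sub-command poisons the whole compound.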
3. Execution (async run)
async def run(self, args: BashArgs, ctx) -> AsyncGenerator[ToolStreamEvent | BashResult, None]:
- Spawns a subprocess via asyncio.create_subprocess_shell with:
  - A sanitized environment (CI=true, TERM=dumb, GIT_PAGER=cat, no TTY)
  - $SHELL as the executable (Unix)
  - start_new_session=True (process group isolation on Unix), so killing the process group takes the shell's children down with it
  - stdin=DEVNULL, stdout/stderr=PIPE
- Timeout: wraps proc.communicate() in asyncio.wait_for (default 300s). On timeout, kills the process group and raises ToolError.
- Output: decoded with errors="replace" and truncated to max_output_bytes (default 16KB). On non-zero return code, raises ToolError with stderr/stdout in the message.
- On success, yields a BashResult(command, stdout, stderr, returncode).
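The spawn, timeout, and truncation steps can be sketched with the standard library alone. This is a Unix-only illustration, not the tool's run(): run_shell and this ToolError class are stand-ins, it uses the default shell rather than $SHELL, inherits the parent environment, and returns a plain tuple instead of yielding stream events. The exact byte value behind "16KB" is also an assumption.

```python
import asyncio
import os
import signal


class ToolError(Exception):
    """Stand-in for the harness's tool failure exception (assumption)."""


async def run_shell(command: str, timeout: float = 300.0,
                    max_output_bytes: int = 16_000) -> tuple:
    """Run `command` in a shell and return (stdout, stderr, returncode)."""
    proc = await asyncio.create_subprocess_shell(
        command,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        start_new_session=True,  # Unix: the shell leads its own process group
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # The shell's pid doubles as its process-group id, so this reaches children.
        os.killpg(proc.pid, signal.SIGKILL)
        await proc.wait()
        raise ToolError(f"command timed out after {timeout}s: {command}")

    def decode(data: bytes) -> str:
        # Truncate first, then decode; a split multi-byte character becomes U+FFFD.
        return data[:max_output_bytes].decode("utf-8", errors="replace")

    return decode(out), decode(err), proc.returncode


if __name__ == "__main__":
    print(asyncio.run(run_shell("echo hello")))
```

Killing the group rather than just proc matters for commands like sleep 100 & wait, where the shell's children would otherwise outlive the timeout.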
4. Result handling
_build_result distinguishes success (returncode 0) from failure — non-zero exit codes raise ToolError rather than yielding a result, which the harness turns into a ToolResultEvent with success=False.
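A sketch of that split between "yield a result" and "raise", assuming the four BashResult fields listed above. The to_event wrapper is hypothetical and stands in for whatever the harness does when it catches the exception; the dict it returns is not the real ToolResultEvent shape.

```python
from dataclasses import dataclass


class ToolError(Exception):
    """Stand-in for the harness's tool failure exception (assumption)."""


@dataclass
class BashResult:
    command: str
    stdout: str
    stderr: str
    returncode: int


def build_result(command: str, stdout: str, stderr: str, returncode: int) -> BashResult:
    if returncode != 0:
        # Prefer stderr for context, fall back to stdout if stderr is empty.
        detail = stderr.strip() or stdout.strip() or "(no output)"
        raise ToolError(f"command failed (exit {returncode}): {command}\n{detail}")
    return BashResult(command, stdout, stderr, returncode)


def to_event(command: str, stdout: str, stderr: str, returncode: int) -> dict:
    """Hypothetical harness-side wrapper: a raised ToolError becomes success=False."""
    try:
        return {"success": True, "result": build_result(command, stdout, stderr, returncode)}
    except ToolError as exc:
        return {"success": False, "error": str(exc)}
```

Raising on failure means the model sees one error string with the exit code and stderr, rather than having to inspect a returncode field on an apparently successful result.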
Test structure highlights
The test file (tests/tools/test_bash.py, 311 lines) covers:
┌─────────────────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│Test │What it verifies │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│test_runs_echo_successfully │Basic execution, stdout capture │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│test_fails_cat_command_with_missing_file │Non-zero exit → ToolError with stderr context │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│test_uses_effective_workdir │pwd matches the tmp dir (workdir isolation) │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│test_handles_timeout │sleep 2 with timeout=1 → timeout error │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│test_truncates_output_to_max_bytes │Hard cap on output size │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│test_decodes_non_utf8_bytes │Encoding resilience (errors="replace") │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│test_find_execution_predicates_force_ask │find -exec always triggers ASK even when permission=ALWAYS │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│test_resolve_permission │allowlisted/denylisted/mixed/empty cases │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│TestResolvePermissionWindowsSyntax │Windows-style commands (dir, type, findstr, cmd /k, etc.) — 8+ tests │
├─────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│TestDenylistWordBoundary │Whole-word matching: vi blocks vi but not vibe, bash -i blocks exactly that but not bash -init │
└─────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
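To give a feel for the last row, here is a self-contained sketch in the spirit of TestDenylistWordBoundary. It does not reproduce the real test file: find_denylist_match below is a hypothetical regex-based stand-in for _find_denylist_match, and the denylist holds only the two sample entries from the table.

```python
import re
from typing import Optional

DENYLIST = ["vi", "bash -i"]  # sample entries only (assumption)


def find_denylist_match(command: str) -> Optional[str]:
    """Return the first denylist entry that starts the command as whole words."""
    for entry in DENYLIST:
        # The entry must be followed by whitespace or the end of the command.
        if re.match(re.escape(entry) + r"(?=\s|$)", command.strip()):
            return entry
    return None


def test_vi_is_blocked_but_vibe_is_not():
    assert find_denylist_match("vi notes.txt") == "vi"
    assert find_denylist_match("vibe --help") is None


def test_bash_dash_i_is_blocked_but_bash_init_is_not():
    assert find_denylist_match("bash -i") == "bash -i"
    assert find_denylist_match("bash -init") is None


if __name__ == "__main__":
    test_vi_is_blocked_but_vibe_is_not()
    test_bash_dash_i_is_blocked_but_bash_init_is_not()
    print("ok")
```

A plain startswith check would fail both tests, since "vibe" starts with "vi" and "bash -init" starts with "bash -i"; the lookahead is what enforces the word boundary.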
Key design decisions
- Permission is first-class, not baked into the tool. The tool declares its permission requirements; the harness asks the user (or auto-grants based on session permissions) before run() is ever called.
- No raw Popen/os.system. Everything is asyncio.create_subprocess_shell with process-group isolation and timeout.
- Tree-sitter AST for command extraction. Instead of naive &&/| splitting, it uses a real bash grammar to parse the command string into AST command nodes — far more robust for compounds like find . -exec foo \; | xargs bar && (cd x; make).
- Zero-trust environment. The subprocess gets CI=true, NONINTERACTIVE=1, DEBIAN_FRONTEND=noninteractive, pagers forced to cat, TERM=dumb — no TTY, no interactive traps.
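The environment override from the last point can be sketched as a small helper. build_env is a hypothetical name, and PAGER=cat is an assumption about what "pagers forced to cat" covers beyond the GIT_PAGER=cat setting named earlier; the other variables come straight from the list above.

```python
import os
import subprocess


def build_env(base=None) -> dict:
    """Copy the parent environment and force non-interactive settings on top."""
    env = dict(os.environ if base is None else base)
    env.update({
        "CI": "true",
        "NONINTERACTIVE": "1",
        "DEBIAN_FRONTEND": "noninteractive",
        "TERM": "dumb",
        "GIT_PAGER": "cat",
        "PAGER": "cat",  # assumption: generic pager is overridden as well
    })
    return env


if __name__ == "__main__":
    # The child sees the overrides regardless of what the parent terminal had set.
    done = subprocess.run("echo $TERM:$CI", shell=True, env=build_env(),
                          capture_output=True, text=True)
    print(done.stdout.strip())
```

Combined with stdin=DEVNULL, these settings make tools such as git and apt fail fast or print plainly instead of waiting on a prompt the agent cannot answer.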