@thomasdavis
Last active December 17, 2025 01:59
tool_disocvery.md

Dynamic Tool Selection for AI Agents

A guide to implementing per-step tool filtering using ensemble scoring. This approach solves the problem of LLMs having too many tools to choose from effectively.

The Problem

When you give an LLM agent 40+ tools, several issues emerge:

  1. Context bloat: Tool definitions consume tokens, leaving less room for actual conversation
  2. Selection confusion: LLMs struggle to pick the right tool when faced with many similar options
  3. Hallucination risk: More tools = more chances to pick the wrong one or hallucinate parameters

The solution: dynamically select 8-12 relevant tools per step based on conversation context.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                      AI SDK Agent                           │
│                                                             │
│  ┌─────────────┐    prepareStep()    ┌─────────────────┐   │
│  │   Messages  │ ──────────────────► │  ToolSelector   │   │
│  │   + Steps   │                     │                 │   │
│  └─────────────┘                     │  ┌───────────┐  │   │
│                                      │  │ToolScorer │  │   │
│                                      │  │           │  │   │
│  ┌─────────────┐    activeTools[]    │  │ BM25F     │  │   │
│  │   LLM Call  │ ◄────────────────── │  │ Focus     │  │   │
│  │  (filtered) │                     │  │ Transition│  │   │
│  └─────────────┘                     │  │ Anchors   │  │   │
│                                      │  └───────────┘  │   │
│                                      └─────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Integration with AI SDK

The key is the prepareStep callback in the AI SDK's ToolLoopAgent:

import { ToolLoopAgent } from 'ai';
import { ToolSelector } from './search';

// model, instructions, and tools are assumed to be defined elsewhere
const selector = new ToolSelector();
const toolKeys = Object.keys(tools);

const agent = new ToolLoopAgent({
  model,
  instructions,
  tools,
  prepareStep: ({ messages, steps, stepNumber }) => {
    // Select tools based on current context;
    // activeTools is an array of keys from the tools object
    const activeTools = selector.selectActiveTools(messages, steps);

    return { activeTools };
  }
});
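
Before it can score anything, selectActiveTools has to turn the conversation into a query string. A minimal sketch of that step, assuming messages follow the AI SDK shape where content is either a string or an array of parts with text parts shaped like { type: 'text', text }. The names buildStepQuery and maxMessages are hypothetical and not taken from the selector itself:

```javascript
// Hypothetical helper: flatten the last few user/assistant messages
// into a single query string for the scorer.
function buildStepQuery(messages, maxMessages = 3) {
  const texts = [];

  for (const message of messages.slice(-maxMessages)) {
    // Skip system and tool messages; they rarely express user intent.
    if (message.role !== 'user' && message.role !== 'assistant') continue;

    if (typeof message.content === 'string') {
      texts.push(message.content);
    } else if (Array.isArray(message.content)) {
      // Keep only text parts; tool calls and results are ignored here.
      for (const part of message.content) {
        if (part.type === 'text') texts.push(part.text);
      }
    }
  }

  return texts.join(' ').trim();
}
```

Limiting the window to the last few messages keeps the query focused on the current step rather than the whole conversation.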

Ensemble Scoring

Pure BM25 text matching isn't enough. "Add a market role" might score roleUpdate higher than roleAdd because "role" appears more often in roleUpdate's description.

The solution is ensemble scoring - combining multiple signals:

score = (0.40 × BM25F)
      + (0.15 × focus)
      + (0.15 × transition)
      + (0.10 × sticky)
      + microAnchor          // Additive, not weighted
      - (0.20 × avoidPenalty)
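
A minimal sketch of the formula as code, assuming every signal except microAnchor has already been normalized to the 0-1 range (raw BM25 scores are unbounded, so that normalization is an assumption here). The names WEIGHTS and combineSignals are illustrative:

```javascript
// Weights copied from the formula above.
const WEIGHTS = { bm25f: 0.40, focus: 0.15, transition: 0.15, sticky: 0.10, avoid: 0.20 };

// Combine per-tool signals into one score. Missing signals default to 0.
function combineSignals(signals) {
  const {
    bm25f = 0,
    focus = 0,
    transition = 0,
    sticky = 0,
    microAnchor = 0,
    avoidPenalty = 0,
  } = signals;

  return (
    WEIGHTS.bm25f * bm25f +
    WEIGHTS.focus * focus +
    WEIGHTS.transition * transition +
    WEIGHTS.sticky * sticky +
    microAnchor - // additive, not weighted
    WEIGHTS.avoid * avoidPenalty
  );
}

// Made-up signal values for "add a market role":
// roleUpdate wins on BM25F, but the anchor on roleAdd dominates.
const roleAddScore = combineSignals({ bm25f: 0.6, focus: 0.2, microAnchor: 2.0 });
const roleUpdateScore = combineSignals({ bm25f: 0.9, focus: 0.2 });
console.log(roleAddScore.toFixed(2), roleUpdateScore.toFixed(2)); // 2.27 0.39
```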

Signal 1: BM25F (Lexical Relevance)

BM25F is BM25 with per-field weights, because some parts of the tool metadata matter more than others:

const FIELD_WEIGHTS = {
  name: 3.0,        // Tool name is highly relevant
  title: 2.5,       // Human-readable title
  keywords: 3.0,    // Explicit keywords we define
  examples: 2.0,    // Example phrases
  description: 1.0, // General description
  category: 0.5,    // Less important
  avoidWhen: 0.3,   // Low weight (mainly for penalty)
};

Key implementation details:

  • Split camelCase tool names: roleAdd → role add
  • Compute per-field IDF (inverse document frequency)
  • Tokenize and score each field separately, then combine with weights
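
The three details above can be sketched as follows. This is an illustration rather than the actual bm25f.js: it scores each field with plain BM25 and sums the weighted results, as the list describes (textbook BM25F instead pools weighted term frequencies before saturation). K1, B, buildIndex and scoreTool are assumed names and defaults:

```javascript
// Subset of the field weights above, to keep the example short.
const FIELD_WEIGHTS = { name: 3.0, keywords: 3.0, description: 1.0 };
const K1 = 1.2; // term-frequency saturation (conventional default, assumed)
const B = 0.75; // length normalization (conventional default, assumed)

// Split camelCase, lowercase, and break on non-alphanumerics.
function tokenize(text) {
  return String(text)
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

// Tokenize every field of every tool and collect per-field statistics.
function buildIndex(toolsConfig) {
  const fields = Object.keys(FIELD_WEIGHTS);
  const index = { count: 0, tokens: new Map(), df: {}, avgLen: {} };
  const totalLen = {};
  for (const field of fields) {
    index.df[field] = new Map(); // term -> number of tools containing it
    totalLen[field] = 0;
  }

  for (const [toolName, config] of Object.entries(toolsConfig)) {
    index.count += 1;
    const byField = {};
    for (const field of fields) {
      // The name field comes from the tool key itself.
      const raw = field === 'name' ? toolName : config[field];
      const text = Array.isArray(raw) ? raw.join(' ') : raw ?? '';
      byField[field] = tokenize(text);
      totalLen[field] += byField[field].length;
      for (const term of new Set(byField[field])) {
        index.df[field].set(term, (index.df[field].get(term) || 0) + 1);
      }
    }
    index.tokens.set(toolName, byField);
  }

  for (const field of fields) {
    index.avgLen[field] = totalLen[field] / Math.max(index.count, 1);
  }
  return index;
}

// Score one tool: BM25 per field with per-field IDF, then a weighted sum.
function scoreTool(index, toolName, query) {
  const queryTerms = new Set(tokenize(query));
  const byField = index.tokens.get(toolName);
  let score = 0;

  for (const [field, weight] of Object.entries(FIELD_WEIGHTS)) {
    const tokens = byField[field];
    const avgLen = index.avgLen[field] || 1;
    for (const term of queryTerms) {
      const tf = tokens.filter((token) => token === term).length;
      if (tf === 0) continue;
      const df = index.df[field].get(term) || 0;
      const idf = Math.log(1 + (index.count - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - B + B * (tokens.length / avgLen);
      score += weight * idf * ((tf * (K1 + 1)) / (tf + K1 * lengthNorm));
    }
  }
  return score;
}
```

Build the index once at startup with buildIndex(toolsConfig), then call scoreTool for each tool on every step.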

Signal 2: Focus Tracking (Entity Context)

Track what entity type the agent is currently working with:

const TOOL_ENTITY_TYPE = {
  groupAdd: 'group',
  groupUpdate: 'group',
  roleAdd: 'role',
  roleUpdate: 'role',
  activityAdd: 'activity',
  queryData: null,  // Doesn't change focus
  navigateTo: null,
};

After each tool call, update the focus. Then boost tools matching current focus:

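getFocusBoost(toolName) {
  const toolEntityType = TOOL_ENTITY_TYPE[toolName];

  // Without an active focus, or for focus-neutral tools such as queryData
  // (entity type null), return the baseline so that null === null
  // is not treated as an exact match
  if (!this.currentFocus || !toolEntityType) {
    return 0.2;
  }

  if (toolEntityType === this.currentFocus) {
    return 1.0;  // Exact match
  }

  // Related types get partial boost
  // (group → role is common flow)
  const RELATED_BOOSTS = {
    group: { role: 0.6, activity: 0.3 },
    role: { group: 0.5, activity: 0.7 },
    activity: { role: 0.6, group: 0.3 },
  };

  return RELATED_BOOSTS[this.currentFocus]?.[toolEntityType] ?? 0.2;
}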
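
The update step itself is not shown above. A minimal sketch, assuming the tracker keeps a single currentFocus and that tools mapped to null leave it unchanged. The class shape and the recordToolCall name are hypothetical:

```javascript
// Trimmed copy of the mapping above.
const TOOL_ENTITY_TYPE = {
  groupAdd: 'group',
  roleAdd: 'role',
  queryData: null, // Doesn't change focus
};

class FocusTracker {
  currentFocus = null;

  // Call once after every tool call the agent makes.
  recordToolCall(toolName) {
    const entityType = TOOL_ENTITY_TYPE[toolName];
    // Only entity-bound tools move the focus; null and unknown tools keep it.
    if (typeof entityType === 'string') {
      this.currentFocus = entityType;
    }
  }
}
```

Because queryData leaves the focus untouched, a lookup in the middle of a role workflow does not reset the boost for role tools.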

Signal 3: Transition Matrix (Tool Sequences)

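Hand-seed likelihood scores that approximate P(nextTool | lastTool). Each value is an independent score between 0 and 1, not part of a normalized distribution, so a row does not need to sum to 1: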

const TRANSITIONS = {
  groupAdd: {
    roleAdd: 0.8,        // Very likely after creating a group
    groupUpdate: 0.5,
    groupAssignLead: 0.6,
    activityAdd: 0.3,
  },
  roleAdd: {
    activityAdd: 0.8,    // Very likely after creating a role
    roleAdd: 0.6,        // Might add more roles
    roleUpdate: 0.4,
  },
  // ... etc
};

This captures natural workflows without being rigid.
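
Looking up the boost for a candidate tool can be sketched as below. Returning 0 for unlisted pairs and for the first step (no last tool) is an assumption, as is the getTransitionBoost name:

```javascript
// Trimmed copy of the matrix above.
const TRANSITIONS = {
  groupAdd: { roleAdd: 0.8, groupUpdate: 0.5 },
  roleAdd: { activityAdd: 0.8, roleAdd: 0.6 },
};

// Boost for candidateTool given the tool that ran last.
function getTransitionBoost(lastTool, candidateTool) {
  if (!lastTool) return 0; // first step: nothing to transition from
  return TRANSITIONS[lastTool]?.[candidateTool] ?? 0;
}
```

Unlisted transitions score 0 rather than a negative value, so an unexpected next tool can still win on the other signals.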

Signal 4: Sticky Boost (Recently Used)

Boost tools that were recently used, with decay:

#getStickyBoost(toolName) {
  const index = this.recentTools.indexOf(toolName);
  if (index === -1) return 0;

  // Decay by recency: most recent = 1.0, then 0.7, 0.4, 0.2, 0.1
  const decays = [1.0, 0.7, 0.4, 0.2, 0.1];
  return decays[index] || 0;
}
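
The decay table only works if recentTools is ordered most recent first and capped at five entries. A minimal sketch of maintaining that list, with recordRecentTool and MAX_RECENT as hypothetical names:

```javascript
const MAX_RECENT = 5; // matches the length of the decays array

// Return a new list with toolName at the front, without duplicates.
function recordRecentTool(recentTools, toolName) {
  const withoutDuplicate = recentTools.filter((name) => name !== toolName);
  return [toolName, ...withoutDuplicate].slice(0, MAX_RECENT);
}
```

Removing the earlier occurrence before prepending means a tool that is reused moves back to index 0 and gets the full 1.0 boost again.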

Signal 5: Micro-Anchors (Pattern Hard Triggers)

This is the most important signal for preventing catastrophic misses. When the query clearly indicates intent, apply a hard additive boost:

const ANCHOR_PATTERNS = [
  {
    pattern: /\b(add|create|new|make)\b.*\b(role|position|job)\b/i,
    tools: ['roleAdd'],
    boost: 2.0,
  },
  {
    pattern: /\b(role|position|job)\b.*\b(add|create|new)\b/i,
    tools: ['roleAdd'],
    boost: 2.0,  // Handles reversed word order
  },
  {
    pattern: /\b(move|transfer|relocate)\b.*\b(role|position)\b/i,
    tools: ['roleMove'],
    boost: 1.8,
  },
  // ... more patterns
];

Micro-anchors are additive rather than weighted, because they represent high-confidence intent detection that should override the other signals.
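
A sketch of applying the patterns to a query. It returns a Map so the result can be passed to Object.fromEntries for logging. Keeping the largest boost when several patterns hit the same tool, instead of summing them, is an assumption, and getAnchorBoosts is an illustrative name:

```javascript
// Trimmed copy of the patterns above.
const ANCHOR_PATTERNS = [
  {
    pattern: /\b(add|create|new|make)\b.*\b(role|position|job)\b/i,
    tools: ['roleAdd'],
    boost: 2.0,
  },
  {
    pattern: /\b(move|transfer|relocate)\b.*\b(role|position)\b/i,
    tools: ['roleMove'],
    boost: 1.8,
  },
];

// Map of toolName -> additive boost for every pattern the query matches.
function getAnchorBoosts(query) {
  const boosts = new Map();
  for (const { pattern, tools, boost } of ANCHOR_PATTERNS) {
    // Patterns have no g flag, so test() carries no state between calls.
    if (!pattern.test(query)) continue;
    for (const toolName of tools) {
      boosts.set(toolName, Math.max(boosts.get(toolName) ?? 0, boost));
    }
  }
  return boosts;
}
```

Tools that are absent from the Map get no anchor boost, so the scorer can read boosts.get(toolName) ?? 0.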

Signal 6: Avoid Penalty

Tools can define when they shouldn't be used:

// In tool config:
{
  name: 'roleAdd',
  avoidWhen: 'not for updating existing roles, use roleUpdate',
}

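If the query contains a distinguishing term from avoidWhen (such as "update"), apply a penalty. Matching needs some care: ignore stop words and terms the tool shares with its own name, otherwise "roles" in the text above would penalize roleAdd on every role query, and normalize word forms so that "update" matches "updating".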
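
One way to compute the penalty, sketched under several assumptions: a small hand-written stop-word list, a crude suffix-stripping stemmer, and a binary 0 or 1 result. None of this comes from the original scorer, and getAvoidPenalty is a hypothetical name:

```javascript
const STOP_WORDS = new Set(['not', 'for', 'use', 'when', 'the', 'a', 'an', 'to', 'of', 'or', 'and']);

// Split camelCase, lowercase, and break on non-alphanumerics.
function tokenize(text) {
  return String(text)
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

// Crude stemmer so "updating", "updates", and "update" collide.
function stem(token) {
  return token.replace(/^(.{3,}?)(ing|es|ed|e|s)$/, '$1');
}

// Tokens with stop words removed, reduced to stems.
function contentStems(text) {
  return tokenize(text)
    .filter((token) => !STOP_WORDS.has(token))
    .map(stem);
}

// 1 if the query hits a distinguishing avoidWhen term, else 0.
function getAvoidPenalty(toolName, avoidWhen, query) {
  if (!avoidWhen) return 0;
  // Terms shared with the tool's own name (e.g. "role") are not distinguishing.
  const ownStems = new Set(contentStems(toolName));
  const avoidStems = new Set(
    contentStems(avoidWhen).filter((term) => !ownStems.has(term))
  );
  return contentStems(query).some((term) => avoidStems.has(term)) ? 1 : 0;
}
```

With the roleAdd config above, "update the market role" is penalized and "add a market role" is not.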

Tool Configuration

Each tool needs rich metadata for the scoring to work:

export const toolsConfig = {
  roleAdd: {
    title: 'Add Roles',
    description: 'Create new roles with associated activities',
    category: 'Roles',
    examples: [
      'create a new position',
      'add a manager role',
      'create engineering team structure',
    ],
    keywords: ['create', 'new', 'role', 'position', 'job', 'hire', 'add'],
    avoidWhen: 'not for updating existing roles, use roleUpdate',
    tool: roleAddTool,  // The actual AI SDK tool
  },
  // ... more tools
};

Metadata Fields

Field        Purpose                         Weight
name         Tool key (camelCase split)      3.0
title        Human-readable name             2.5
keywords     Explicit trigger words          3.0
examples     Example user phrases            2.0
description  What the tool does              1.0
category     Grouping (Roles, Groups, etc.)  0.5
avoidWhen    When NOT to use this tool       0.3

Exploration Slot

Reserve one slot for exploration - randomly sample a tool from ranks 9-20:

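selectTools(stepQuery, limit = 9, withExploration = true) {
  // ... score and sort tools

  const topCount = withExploration ? limit - 1 : limit;
  const topTools = sorted.slice(0, topCount);

  if (withExploration) {
    // Ranks topCount + 1 through 20; empty when fewer tools are registered
    const explorationPool = sorted.slice(topCount, 20);
    const explorationTool = this.#sampleFromPool(explorationPool);
    if (explorationTool !== undefined) {
      topTools.push(explorationTool);
    }
  }

  // Core tools come first; the Set drops any that also ranked in the top slots
  return [...new Set([...CORE_TOOLS, ...topTools])];
}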

This prevents the system from getting stuck in local optima.
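
#sampleFromPool is not shown above. A minimal sketch as a standalone function, assuming uniform sampling (weighting by score would be an equally plausible choice) and an injectable random source so the behavior can be tested:

```javascript
// Pick one entry uniformly at random; undefined when the pool is empty.
// `random` must return a number in [0, 1), like Math.random.
function sampleFromPool(pool, random = Math.random) {
  if (pool.length === 0) return undefined;
  return pool[Math.floor(random() * pool.length)];
}
```

Returning undefined for an empty pool lets the caller skip the exploration slot when fewer tools are registered than the top slots can hold.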

Core Tools

Some tools should always be available:

const CORE_TOOLS = ['queryData'];  // Always need to query data

These are prepended to the selection regardless of scoring.

Debugging

Add comprehensive logging:

getDebugInfo(stepQuery) {
  // `sorted` is the ranked tool list from the same scoring pass that
  // selectTools runs for stepQuery (scoring code omitted here)
  return {
    focus: this.focusTracker.getSummary(),
    lastTool: this.transitionMatrix.getLastTool(),
    recentTools: this.recentTools,
    topScores: sorted.slice(0, 15),
    anchorBoosts: Object.fromEntries(this.microAnchors.getBoosts(stepQuery)),
  };
}

Log on every step:

console.log('[ToolSelector] stepQuery:', stepQuery.substring(0, 150));
console.log('[ToolSelector] activeTools:', activeTools);
console.log('[ToolSelector] top 5 scores:', debug.topScores.slice(0, 5));

Common Issues

1. Wrong Tools Selected

Check your micro-anchors first. If "add a role" isn't selecting roleAdd, add a pattern:

{
  pattern: /\b(add|create)\b.*\brole\b/i,
  tools: ['roleAdd'],
  boost: 2.0,
}

2. Tools Not Attaching

Verify tool key naming matches between:

  • Your tools object keys
  • The activeTools array returned
  • Your tool config keys

Use the AI SDK debugger to see what's actually being attached.
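
Because key mismatches fail silently, a startup check can surface them early. A minimal sketch, with findToolKeyMismatches as a hypothetical helper that compares the three places listed above:

```javascript
// Compare tool keys across the tools object, the selected activeTools,
// and the tool config. Empty arrays mean everything lines up.
function findToolKeyMismatches(tools, activeTools, toolsConfig) {
  const toolKeys = new Set(Object.keys(tools));
  const configKeys = new Set(Object.keys(toolsConfig));

  return {
    // Selected names the SDK cannot attach because no such tool exists
    unknownActive: activeTools.filter((key) => !toolKeys.has(key)),
    // Tools with no metadata, so the scorer can never rank them
    missingConfig: [...toolKeys].filter((key) => !configKeys.has(key)),
    // Config entries that point at no real tool
    orphanConfig: [...configKeys].filter((key) => !toolKeys.has(key)),
  };
}
```

Running this once at startup against all config keys, and again per step against the returned activeTools, catches issues like roleAdd versus role_add before they show up as missing tools.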

3. BM25 Ranking Weird Results

BM25 alone will fail on cases like "add a market role" because:

  • "role" appears more in roleUpdate's description
  • "market" might match something unrelated

That's why ensemble scoring exists. Don't try to fix BM25 alone - rely on micro-anchors for clear intent.

File Structure

search/
├── index.js            # Exports
├── toolScorer.js       # Main ensemble orchestration
├── bm25f.js            # BM25 with field weights
├── focusTracker.js     # Entity focus tracking
├── transitionMatrix.js # Tool-to-tool probabilities
├── microAnchors.js     # Pattern-based hard boosts
└── ToolSelector.js     # Integration layer (builds query from messages)

Summary

  1. Don't rely on BM25 alone - it will fail on ambiguous queries
  2. Micro-anchors are essential - they catch obvious intent that BM25 misses
  3. Validate tool keys - mismatches are silent failures
  4. Log everything - you need visibility into what's being selected
  5. Use exploration - prevents getting stuck
  6. Focus tracking helps continuity - boosts related tools in a workflow
  7. Transition matrix captures workflows - "after groupAdd, roleAdd is likely"