Dynamic Tool Selection for AI Agents

A guide to implementing per-step tool filtering using ensemble scoring. This approach addresses the problem that LLMs choose poorly when they are offered too many tools at once.

The Problem

When you give an LLM agent 40+ tools, several issues emerge:

Context bloat: Tool definitions consume tokens, leaving less room for actual conversation
Selection confusion: LLMs struggle to pick the right tool when faced with many similar options
Hallucination risk: More tools mean more chances to pick the wrong one or hallucinate parameters

The solution: dynamically select 8-12 relevant tools per step based on conversation context.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                      AI SDK Agent                           │
│                                                             │
│  ┌─────────────┐    prepareStep()    ┌─────────────────┐   │
│  │   Messages  │ ──────────────────► │  ToolSelector   │   │
│  │   + Steps   │                     │                 │   │
│  └─────────────┘                     │  ┌───────────┐  │   │
│                                      │  │ToolScorer │  │   │
│                                      │  │           │  │   │
│  ┌─────────────┐    activeTools[]    │  │ BM25F     │  │   │
│  │   LLM Call  │ ◄────────────────── │  │ Focus     │  │   │
│  │  (filtered) │                     │  │ Transition│  │   │
│  └─────────────┘                     │  │ Anchors   │  │   │
│                                      │  └───────────┘  │   │
│                                      └─────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Integration with AI SDK

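The key is the prepareStep callback in the AI SDK's ToolLoopAgent. It runs before each step, and returning activeTools from it limits which tools the model can call in that step: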

import { ToolLoopAgent } from 'ai';
import { ToolSelector } from './search';

const selector = new ToolSelector();
const toolKeys = Object.keys(tools);

const agent = new ToolLoopAgent({
  model,
  instructions,
  tools,
  prepareStep: ({ messages, steps, stepNumber }) => {
    // Select tools based on current context
    const activeTools = selector.selectActiveTools(messages, steps);

    return { activeTools };
  }
});
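
Before anything can be scored, the selector has to turn messages and steps into a text query. A minimal sketch of that step, assuming each message has the shape { role, content } where content is either a string or an array of parts such as { type: 'text', text }, and that each step exposes a toolCalls array whose entries carry a toolName. The helper names buildStepQuery and textOf, and the 500-character cap, are hypothetical and not taken from the original code:

```javascript
// Hypothetical helper: extracts plain text from a message's content.
function textOf(content) {
  if (typeof content === 'string') return content;
  if (!Array.isArray(content)) return '';
  return content
    .filter((part) => part && part.type === 'text')
    .map((part) => part.text)
    .join(' ');
}

// Hypothetical helper: builds the text that the scorer ranks tools against.
function buildStepQuery(messages, steps = [], maxChars = 500) {
  // The latest user message carries the intent for this step.
  const lastUser = [...messages].reverse().find((m) => m.role === 'user');
  const userText = lastUser ? textOf(lastUser.content) : '';

  // Tools called in the previous step feed the transition and sticky signals.
  const lastStep = steps[steps.length - 1];
  const lastTools = (lastStep?.toolCalls ?? []).map((call) => call.toolName);

  return { query: userText.slice(0, maxChars).trim(), lastTools };
}
```

Using only the latest user message keeps the query short; a real selector might also blend in the previous assistant turn when the user reply is terse ("yes, do that").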

Ensemble Scoring

Pure BM25 text matching isn't enough. "Add a market role" might rank roleUpdate above roleAdd because "role" appears more often in roleUpdate's description.

The solution is ensemble scoring, which combines multiple signals:

score = (0.40 × BM25F)
      + (0.15 × focus)
      + (0.15 × transition)
      + (0.10 × sticky)
      + microAnchor          // Additive, not weighted
      - (0.20 × avoidPenalty)
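
The formula translates directly into a small function. This is a sketch under one assumption the guide does not state: every weighted signal is normalized to the range 0 to 1 before it is combined. The input values in the example are made up for illustration.

```javascript
// Weights copied from the formula above.
const WEIGHTS = { bm25f: 0.4, focus: 0.15, transition: 0.15, sticky: 0.1, avoidPenalty: 0.2 };

// Assumes each weighted signal is already normalized to [0, 1].
function ensembleScore({
  bm25f = 0,
  focus = 0,
  transition = 0,
  sticky = 0,
  microAnchor = 0,
  avoidPenalty = 0,
}) {
  return (
    WEIGHTS.bm25f * bm25f +
    WEIGHTS.focus * focus +
    WEIGHTS.transition * transition +
    WEIGHTS.sticky * sticky +
    microAnchor - // additive: not scaled by a weight
    WEIGHTS.avoidPenalty * avoidPenalty
  );
}

// Illustrative numbers for "add a market role": roleUpdate has the better
// text match, but the anchor on roleAdd outweighs it.
const roleUpdateScore = ensembleScore({ bm25f: 0.9, focus: 1.0 });
const roleAddScore = ensembleScore({ bm25f: 0.6, focus: 1.0, microAnchor: 2.0 });
console.log(roleAddScore > roleUpdateScore); // true
```

Under that normalization the weighted terms can contribute at most 0.80 in total, which is why an anchor boost of 2.0 always dominates them.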

Signal 1: BM25F (Lexical Relevance)

BM25F is BM25 with per-field weights, so that a match in some parts of the tool metadata counts for more than a match in others:

const FIELD_WEIGHTS = {
  name: 3.0,        // Tool name is highly relevant
  title: 2.5,       // Human-readable title
  keywords: 3.0,    // Explicit keywords we define
  examples: 2.0,    // Example phrases
  description: 1.0, // General description
  category: 0.5,    // Less important
  avoidWhen: 0.3,   // Low weight (mainly for penalty)
};

Key implementation details:

Split camelCase tool names: roleAdd → role add
Compute per-field IDF (inverse document frequency)
Tokenize and score each field separately, then combine with weights
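
The three points above can be sketched as follows. Note that this scores each field with its own BM25 and sums the weighted results, as the list describes; textbook BM25F instead merges weighted term frequencies across fields before applying saturation. The constants K1 and B use common defaults and are assumptions, as are the function names and the { id, fields } document shape.

```javascript
// Split camelCase, lowercase, and break on non-alphanumerics: "roleAdd" → ["role", "add"].
function tokenize(text) {
  return String(text)
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

const K1 = 1.2; // term-frequency saturation (common default, assumed)
const B = 0.75; // length normalization (common default, assumed)

// docs: [{ id, fields: { name: 'roleAdd', keywords: ['add', ...], ... } }]
function buildIndex(docs, fieldWeights) {
  const fields = Object.keys(fieldWeights);
  // Field values may be strings or arrays of strings; missing fields become empty.
  const tokens = docs.map((doc) =>
    Object.fromEntries(
      fields.map((f) => [f, tokenize([].concat(doc.fields[f] ?? []).join(' '))])
    )
  );
  // Per-field statistics: document frequency of each term and average field length.
  const stats = {};
  for (const f of fields) {
    const df = new Map();
    let totalLen = 0;
    for (const t of tokens) {
      totalLen += t[f].length;
      for (const term of new Set(t[f])) df.set(term, (df.get(term) ?? 0) + 1);
    }
    stats[f] = { df, avgLen: totalLen / docs.length || 1 };
  }
  return { docs, tokens, stats, fieldWeights };
}

function scoreTools(index, query) {
  const { docs, tokens, stats, fieldWeights } = index;
  const terms = tokenize(query);
  const n = docs.length;
  return docs
    .map((doc, i) => {
      let total = 0;
      for (const [f, weight] of Object.entries(fieldWeights)) {
        const fieldTokens = tokens[i][f];
        const { df, avgLen } = stats[f];
        for (const term of terms) {
          const tf = fieldTokens.filter((t) => t === term).length;
          if (tf === 0) continue;
          const d = df.get(term); // at least 1 because tf > 0
          const idf = Math.log(1 + (n - d + 0.5) / (d + 0.5));
          const norm = tf + K1 * (1 - B + B * (fieldTokens.length / avgLen));
          total += weight * idf * ((tf * (K1 + 1)) / norm);
        }
      }
      return { id: doc.id, score: total };
    })
    .sort((a, b) => b.score - a.score);
}
```

Calling buildIndex once at startup with FIELD_WEIGHTS and then scoreTools(index, stepQuery) on every step keeps the per-step cost low. The raw scores are unbounded, so they would need normalizing (for example, dividing by the top score) before entering the ensemble.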

Signal 2: Focus Tracking (Entity Context)

Track what entity type the agent is currently working with:

const TOOL_ENTITY_TYPE = {
  groupAdd: 'group',
  groupUpdate: 'group',
  roleAdd: 'role',
  roleUpdate: 'role',
  activityAdd: 'activity',
  queryData: null,  // Doesn't change focus
  navigateTo: null,
};

After each tool call, update the focus. Then boost tools matching current focus:

getFocusBoost(toolName) {
  const toolEntityType = TOOL_ENTITY_TYPE[toolName];

  // Focus-neutral tools (null) and an unset focus get only the baseline.
  // Without this guard, null === null would count as an exact match.
  if (!toolEntityType || !this.currentFocus) {
    return 0.2;
  }

  if (toolEntityType === this.currentFocus) {
    return 1.0;  // Exact match
  }

  // Related types get partial boost
  // (group → role is common flow)
  const RELATED_BOOSTS = {
    group: { role: 0.6, activity: 0.3 },
    role: { group: 0.5, activity: 0.7 },
    activity: { role: 0.6, group: 0.3 },
  };

  return RELATED_BOOSTS[this.currentFocus]?.[toolEntityType] ?? 0.2;
}
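
The "update the focus" half is not shown above. One way it might look, as a sketch: the class and method names are hypothetical, and the mapping is a shortened copy of TOOL_ENTITY_TYPE.

```javascript
// Shortened copy of the mapping above, repeated so the sketch runs on its own.
const TOOL_ENTITY_TYPE = {
  groupAdd: 'group',
  roleAdd: 'role',
  activityAdd: 'activity',
  queryData: null, // Doesn't change focus
};

// Hypothetical tracker: remembers the entity type of the last focus-changing tool.
class FocusTracker {
  currentFocus = null;

  // Call after every tool call. Tools mapped to null (or not mapped at all)
  // leave the current focus untouched.
  recordToolCall(toolName) {
    const entityType = TOOL_ENTITY_TYPE[toolName];
    if (entityType) this.currentFocus = entityType;
    return this.currentFocus;
  }
}
```

Keeping the focus unchanged for queryData means a lookup in the middle of a role workflow does not reset the boost for role tools.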

Signal 3: Transition Matrix (Tool Sequences)

Hand-seed scores that approximate P(nextTool | lastTool). Each value is an independent likelihood between 0 and 1, so a row does not need to sum to 1:

const TRANSITIONS = {
  groupAdd: {
    roleAdd: 0.8,        // Very likely after creating a group
    groupUpdate: 0.5,
    groupAssignLead: 0.6,
    activityAdd: 0.3,
  },
  roleAdd: {
    activityAdd: 0.8,    // Very likely after creating a role
    roleAdd: 0.6,        // Might add more roles
    roleUpdate: 0.4,
  },
  // ... etc
};

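This captures natural workflows without being rigid: the transition score is one weighted term among several, so an unlikely next tool can still be selected when the other signals favor it.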
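
Reading the matrix is a two-level lookup with a fallback for pairs that were never seeded. A minimal sketch, where the function name and the fallback of 0 are assumptions:

```javascript
// Rows copied from the matrix above.
const TRANSITIONS = {
  groupAdd: { roleAdd: 0.8, groupAssignLead: 0.6, groupUpdate: 0.5, activityAdd: 0.3 },
  roleAdd: { activityAdd: 0.8, roleAdd: 0.6, roleUpdate: 0.4 },
};

// Returns the seeded score for lastTool → candidate, or the fallback
// when there is no previous tool or the pair was never seeded.
function getTransitionScore(lastTool, candidate, fallback = 0) {
  if (!lastTool) return fallback;
  return TRANSITIONS[lastTool]?.[candidate] ?? fallback;
}

console.log(getTransitionScore('groupAdd', 'roleAdd')); // 0.8
```

With the ensemble weight of 0.15, that 0.8 adds 0.12 to roleAdd's final score right after a group is created.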

Signal 4: Sticky Boost (Recently Used)

Boost tools that were recently used, with decay:

#getStickyBoost(toolName) {
  const index = this.recentTools.indexOf(toolName);
  if (index === -1) return 0;

  // Decay by recency: most recent = 1.0, then 0.7, 0.4, 0.2, 0.1
  const decays = [1.0, 0.7, 0.4, 0.2, 0.1];
  return decays[index] || 0;
}
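
The decay table only works if recentTools is ordered most-recent-first and holds each tool once. A sketch of the bookkeeping under that assumption (the function name and the cap of five entries are hypothetical, the cap chosen to match the five decay values):

```javascript
const MAX_RECENT = 5; // one entry per decay value

// Returns a new list with toolName at the front and any older
// occurrence of the same tool removed.
function recordRecentTool(recentTools, toolName) {
  return [toolName, ...recentTools.filter((name) => name !== toolName)].slice(0, MAX_RECENT);
}

let recent = [];
recent = recordRecentTool(recent, 'groupAdd');
recent = recordRecentTool(recent, 'roleAdd');
recent = recordRecentTool(recent, 'groupAdd');
console.log(recent); // [ 'groupAdd', 'roleAdd' ]
```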

Signal 5: Micro-Anchors (Pattern Hard Triggers)

This is the most important signal for preventing catastrophic misses. When the query clearly indicates intent, apply a hard additive boost:

const ANCHOR_PATTERNS = [
  {
    pattern: /\b(add|create|new|make)\b.*\b(role|position|job)\b/i,
    tools: ['roleAdd'],
    boost: 2.0,
  },
  {
    pattern: /\b(role|position|job)\b.*\b(add|create|new)\b/i,
    tools: ['roleAdd'],
    boost: 2.0,  // Handles reversed word order
  },
  {
    pattern: /\b(move|transfer|relocate)\b.*\b(role|position)\b/i,
    tools: ['roleMove'],
    boost: 1.8,
  },
  // ... more patterns
];

Micro-anchors are additive, not weighted, because they represent high-confidence intent detection that should override other signals.
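
Applying the patterns amounts to testing each one against the query and collecting boosts per tool. A sketch, assuming a tool matched by several patterns receives the largest boost rather than the sum (the function name is hypothetical):

```javascript
// Patterns copied from the list above.
const ANCHOR_PATTERNS = [
  { pattern: /\b(add|create|new|make)\b.*\b(role|position|job)\b/i, tools: ['roleAdd'], boost: 2.0 },
  { pattern: /\b(role|position|job)\b.*\b(add|create|new)\b/i, tools: ['roleAdd'], boost: 2.0 },
  { pattern: /\b(move|transfer|relocate)\b.*\b(role|position)\b/i, tools: ['roleMove'], boost: 1.8 },
];

// Returns a Map of tool name → boost for every pattern the query matches.
function getAnchorBoosts(query) {
  const boosts = new Map();
  for (const { pattern, tools, boost } of ANCHOR_PATTERNS) {
    if (!pattern.test(query)) continue;
    for (const tool of tools) {
      // Keep the largest boost so overlapping patterns do not stack.
      boosts.set(tool, Math.max(boosts.get(tool) ?? 0, boost));
    }
  }
  return boosts;
}

console.log(getAnchorBoosts('Add a market role')); // Map(1) { 'roleAdd' => 2 }
```

Taking the maximum keeps "create a new role" from scoring 4.0 just because both word orders happen to match. The patterns deliberately omit the g flag: with it, test() would carry lastIndex between calls and give inconsistent results.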

Signal 6: Avoid Penalty

Tools can define when they shouldn't be used:

// In tool config:
{
  name: 'roleAdd',
  avoidWhen: 'not for updating existing roles, use roleUpdate',
}

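If the query shares terms with avoidWhen, apply a penalty. Matching a query word like "update" against "updating" in the text above requires stemming or prefix matching, since the two tokens are not identical.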
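
One possible shape for this check, as a sketch rather than the original implementation: the stop-word list, the crude suffix-stripping stemmer, and the rule of ignoring terms that also appear in the tool's own keywords are all assumptions introduced here. Without that last rule, "role" in the avoidWhen text would penalize roleAdd for every role query.

```javascript
const STOP_WORDS = new Set(['not', 'for', 'use', 'the', 'a', 'an', 'to', 'of', 'when', 'or', 'and']);

// Lowercase, split on non-alphanumerics, drop stop words.
function contentTerms(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t && !STOP_WORDS.has(t));
}

// Crude stemmer: strips one common suffix so "update" and "updating" both become "updat".
function stem(term) {
  return term.replace(/(ing|ed|es|e|s)$/, '');
}

// Returns 1 when the query hits a term from avoidWhen, otherwise 0.
function avoidPenalty(query, toolConfig) {
  if (!toolConfig.avoidWhen) return 0;
  const ownTerms = new Set((toolConfig.keywords ?? []).map((k) => stem(k.toLowerCase())));
  const avoidTerms = new Set(
    contentTerms(toolConfig.avoidWhen).map(stem).filter((t) => !ownTerms.has(t))
  );
  return contentTerms(query).map(stem).some((t) => avoidTerms.has(t)) ? 1 : 0;
}

const roleAddConfig = {
  keywords: ['create', 'new', 'role', 'position', 'job', 'hire', 'add'],
  avoidWhen: 'not for updating existing roles, use roleUpdate',
};
console.log(avoidPenalty('Update the manager role', roleAddConfig)); // 1
console.log(avoidPenalty('add a manager role', roleAddConfig)); // 0
```

A binary penalty is the simplest choice. Words such as "existing" remain in the avoid set, so avoidWhen strings are best kept short and specific.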

Tool Configuration

Each tool needs rich metadata for the scoring to work:

export const toolsConfig = {
  roleAdd: {
    title: 'Add Roles',
    description: 'Create new roles with associated activities',
    category: 'Roles',
    examples: [
      'create a new position',
      'add a manager role',
      'create engineering team structure',
    ],
    keywords: ['create', 'new', 'role', 'position', 'job', 'hire', 'add'],
    avoidWhen: 'not for updating existing roles, use roleUpdate',
    tool: roleAddTool,  // The actual AI SDK tool
  },
  // ... more tools
};
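
Both the tools object passed to the agent and the documents fed to the scorer can be derived from this one config, which keeps their keys identical by construction. A sketch with a stand-in config (the placeholder tool objects and the { id, fields } document shape are assumptions):

```javascript
// Stand-in for the real config; `tool` would be the actual AI SDK tool.
const toolsConfig = {
  roleAdd: {
    title: 'Add Roles',
    description: 'Create new roles with associated activities',
    keywords: ['create', 'new', 'role'],
    tool: { placeholder: 'roleAddTool' },
  },
  roleUpdate: {
    title: 'Update Roles',
    description: 'Change fields on existing roles',
    keywords: ['update', 'edit', 'role'],
    tool: { placeholder: 'roleUpdateTool' },
  },
};

// Object handed to the agent: same keys as toolsConfig, values are the tools.
const tools = Object.fromEntries(
  Object.entries(toolsConfig).map(([name, config]) => [name, config.tool])
);

// Documents for the scorer: all metadata except the tool implementation,
// plus the key itself as the `name` field.
const searchDocs = Object.entries(toolsConfig).map(([name, { tool, ...fields }]) => ({
  id: name,
  fields: { name, ...fields },
}));
```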

Metadata Fields

Field	Purpose	Weight
`name`	Tool key (camelCase split)	3.0
`title`	Human-readable name	2.5
`keywords`	Explicit trigger words	3.0
`examples`	Example user phrases	2.0
`description`	What the tool does	1.0
`category`	Grouping (Roles, Groups, etc.)	0.5
`avoidWhen`	When NOT to use this tool	0.3

Exploration Slot

Reserve one slot for exploration by randomly sampling a tool from ranks 9-20:

selectTools(stepQuery, limit = 9, withExploration = true) {
  // ... score and sort tools

  const topCount = withExploration ? limit - 1 : limit;
  const topTools = sorted.slice(0, topCount);

  if (withExploration) {
    const explorationPool = sorted.slice(topCount, 20);
    // The pool is empty when no tools rank below the top slots
    if (explorationPool.length > 0) {
      topTools.push(this.#sampleFromPool(explorationPool));
    }
  }

  // Set drops duplicates when a core tool also ranks in the top slots
  // (assumes `sorted` holds tool names)
  return [...new Set([...CORE_TOOLS, ...topTools])];
}

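This prevents the system from getting stuck in local optima: a tool that the scorer consistently ranks just outside the top slots still reaches the model occasionally.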
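
The guide does not show #sampleFromPool. The simplest version is a uniform pick, sketched here as a standalone function with an injectable random source so it can be tested deterministically (both the uniform choice and the rng parameter are assumptions):

```javascript
// Uniformly samples one element; returns undefined for an empty pool.
// `rng` must return a number in [0, 1), like Math.random.
function sampleFromPool(pool, rng = Math.random) {
  if (pool.length === 0) return undefined;
  return pool[Math.floor(rng() * pool.length)];
}
```

Sampling in proportion to score instead would favor rank 9 over rank 20 while still leaving every pool member a chance.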

Core Tools

Some tools should always be available:

const CORE_TOOLS = ['queryData'];  // Always need to query data

These are prepended to the selection regardless of scoring.

Debugging

Add comprehensive logging:

getDebugInfo(stepQuery) {
  return {
    focus: this.focusTracker.getSummary(),
    lastTool: this.transitionMatrix.getLastTool(),
    recentTools: this.recentTools,
    topScores: sorted.slice(0, 15),
    anchorBoosts: Object.fromEntries(this.microAnchors.getBoosts(stepQuery)),
  };
}

Log on every step:

console.log('[ToolSelector] stepQuery:', stepQuery.substring(0, 150));
console.log('[ToolSelector] activeTools:', activeTools);
console.log('[ToolSelector] top 5 scores:', debug.topScores.slice(0, 5));

Common Issues

1. Wrong Tools Selected

Check your micro-anchors first. If "add a role" isn't selecting roleAdd, add a pattern:

{
  pattern: /\b(add|create)\b.*\brole\b/i,
  tools: ['roleAdd'],
  boost: 2.0,
}

2. Tools Not Attaching

Verify tool key naming matches between:

Your tools object keys
The activeTools array returned
Your tool config keys

Use the AI SDK debugger to see what's actually being attached.
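
A small guard run before returning from prepareStep makes such mismatches visible instead of silent. A sketch, with a hypothetical function name:

```javascript
// Splits the selected names into those that exist in `tools` and those that do not,
// and warns about the latter.
function validateActiveTools(activeTools, tools) {
  const known = new Set(Object.keys(tools));
  const valid = activeTools.filter((name) => known.has(name));
  const unknown = activeTools.filter((name) => !known.has(name));
  if (unknown.length > 0) {
    console.warn('[ToolSelector] unknown tool keys dropped:', unknown);
  }
  return { valid, unknown };
}
```

Returning only the valid names as activeTools keeps a typo in the tool config from reaching the agent call.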

3. BM25 Ranking Weird Results

BM25 alone will fail on cases like "add a market role" because:

"role" appears more in roleUpdate's description
"market" might match something unrelated

That's why ensemble scoring exists. Don't try to fix BM25 alone; rely on micro-anchors for clear intent.

File Structure

search/
├── index.js           # Exports
├── toolScorer.js      # Main ensemble orchestration
├── bm25f.js          # BM25 with field weights
├── focusTracker.js   # Entity focus tracking
├── transitionMatrix.js # Tool-to-tool probabilities
├── microAnchors.js   # Pattern-based hard boosts
└── ToolSelector.js   # Integration layer (builds query from messages)

Summary

Don't rely on BM25 alone - it will fail on ambiguous queries
Micro-anchors are essential - they catch obvious intent that BM25 misses
Validate tool keys - mismatches are silent failures
Log everything - you need visibility into what's being selected
Use exploration - prevents getting stuck
Focus tracking helps continuity - boosts related tools in a workflow
Transition matrix captures workflows - "after groupAdd, roleAdd is likely"

thomasdavis/tool_disocvery.md
