@mkbctrl
Created May 2, 2025 10:40
Intent Recognition and Auto‑Routing in Multi-Agent Systems


Modern conversational AI systems often split functionality into multiple tools or sub-agents, each specialized for a task (e.g. search, booking, math, etc.). When a user sends a query, the system must interpret intent and dispatch it to the right tool/agent. There are two broad approaches: letting a general-purpose LLM handle intent detection itself, or using a dedicated router component. In practice, many practitioners use a hybrid: an initial “router” classifies the intent and then a specialized agent or tool handles the task. Below we survey best practices and examples of each approach, referencing frameworks like LangChain and Semantic Router.

LLM-Based Intent Recognition (General-Agent Approach)

A common approach is to have the LLM itself decide which tool or chain to invoke. For example, one can prompt the model to output a JSON field indicating the desired “tool” or “function” (using OpenAI’s function-calling or ChatGPT Plugin interface). LangChain’s older LLMRouterChain worked this way: it prompts the model to return a JSON {"destination": "..."} indicating which sub-chain to use (Migrating from LLMRouterChain | LangChain). Similarly, with OpenAI’s function-calling API, you define possible “functions” (tools) and the LLM returns a function name and arguments for the best match.

  • Pros: This requires no separate classifier; the LLM naturally parses user text. It’s easy to implement via few-shot prompts or function schemas. (Migrating from LLMRouterChain | LangChain)
  • Cons: It can be slow and expensive, since every decision is an LLM call (token cost and latency). With many tools/intents, prompts become complex and the LLM may misclassify or hallucinate a wrong function. (For example, instructing a model to choose among dozens of tools can confuse it or lead to inconsistent JSON outputs.)

LangChain’s documentation also illustrates using structured output / function calling for routing. In one example, a ChatOpenAI model is wrapped to return a RouteQuery JSON schema selecting a “datasource” (e.g. python_docs or js_docs) (Routing | LangChain). This shows that with careful prompting, an LLM can classify a query, but note the warning: this requires beta features and fixed schemas.
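The structured-output idea can be illustrated without any framework: ask the model for JSON naming a destination, then validate that answer against a closed set before dispatching. A minimal sketch, assuming an LLM that has been prompted to emit {"destination": "..."} (the parse_route helper and the default fallback are illustrative, not LangChain's API):

```python
import json

# Closed set of destinations the router is allowed to pick from.
ALLOWED_ROUTES = {"python_docs", "js_docs"}

def parse_route(llm_output: str, default: str = "python_docs") -> str:
    """Validate an LLM's JSON routing decision against the allowed set."""
    try:
        destination = json.loads(llm_output).get("destination")
    except json.JSONDecodeError:
        return default  # malformed JSON: fall back rather than crash
    # A hallucinated route name also falls back to the default.
    return destination if destination in ALLOWED_ROUTES else default

print(parse_route('{"destination": "js_docs"}'))    # js_docs
print(parse_route('{"destination": "rust_docs"}'))  # python_docs
print(parse_route("not json"))                      # python_docs
```

Guarding the model's output this way addresses the "inconsistent JSON" failure mode noted above: the dispatch step only ever sees a valid route name.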

In practice, purely LLM-based routing is often acceptable for small numbers of tools or very clear-cut tasks. For instance, a simple chatbot might use an LLM to decide “does this user want weather or news?” and then call the weather API or return a news snippet. But if the system has many tools or complex logic, relying solely on the LLM can be brittle. The model might invent an answer rather than call a tool, or the JSON schema might break.

Dedicated Router Agents (Coordinator/Orchestration Layer)

For larger systems, it’s often better to separate routing from task execution. A router agent or coordinator takes the user query, identifies intent, and dispatches to a specialized agent or tool. This router can be a lightweight model, rules engine, or even a classical classifier. The advantage is modularity: each sub-agent focuses on its task without worrying about classifying unrelated queries.

A helpful analogy is the “agent supervisor” in multi-agent workflows. In LangChain’s LangGraph framework, for example, one design is a supervisor agent whose “tools” are other agents. After each step, the supervisor checks which (if any) tool was invoked and routes accordingly (LangGraph: Multi-Agent Workflows). “The supervisor can also be thought of as an agent whose tools are other agents” (LangGraph: Multi-Agent Workflows). In effect, the supervisor has a simple routing logic (often rule-based or model-based) to hand off work.

A concrete example is the TravelPlannerAgent from Arun Shankar’s agentic workflow patterns (Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch | by Arun Shankar | Google Cloud - Community | Medium). In this semantic routing pattern, the user asks a travel-related question. The TravelPlannerAgent (coordinator) first uses an LLM to determine intent (is the user asking about flights, hotels, or car rentals?). It then routes the query to one of several sub-agents (FlightAgent, HotelAgent, CarRentalAgent) specialized in that domain (Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch | by Arun Shankar | Google Cloud - Community | Medium). Once a sub-agent returns a result, the TravelPlannerAgent consolidates the responses into a final answer. This architecture (Figure 1 below) cleanly separates intent detection from specialized processing:
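The coordinator-delegate flow can be sketched in a few lines of plain Python. Here the keyword classifier stands in for the article's LLM-based intent step, and the sub-agents are stub callables; all of the logic besides the agent roles is illustrative:

```python
def classify_intent(query: str) -> str:
    """Stub intent step; in the pattern above this is an LLM call."""
    q = query.lower()
    if "flight" in q or "fly" in q:
        return "flight"
    if "hotel" in q or "stay" in q:
        return "hotel"
    return "car_rental"

# Each sub-agent is a callable specialized for its own domain.
SUB_AGENTS = {
    "flight": lambda q: f"FlightAgent handling: {q}",
    "hotel": lambda q: f"HotelAgent handling: {q}",
    "car_rental": lambda q: f"CarRentalAgent handling: {q}",
}

def travel_planner(query: str) -> str:
    """Coordinator: classify intent, delegate, return the consolidated reply."""
    intent = classify_intent(query)
    return SUB_AGENTS[intent](query)

print(travel_planner("Find me a flight to Lisbon"))
```

The point of the structure is that adding a HotelUpgradeAgent later touches only the classifier and the dispatch table, never the other sub-agents.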

Figure 1: Semantic routing (coordinator–delegate) pattern, from the Agentic-Workflow-Patterns repository (GitHub - arunpshankar/Agentic-Workflow-Patterns). The TravelPlannerAgent first determines user intent (e.g. “Flight search”) and then delegates to a specialized sub-agent (FlightAgent, HotelAgent or CarRentalAgent) (Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch | by Arun Shankar | Google Cloud - Community | Medium). The coordinator merges sub-agent outputs into a final response.

Such a coordinator-delegate structure has several benefits: each agent can have its own prompt and tools optimized for its task, and the routing logic can be simpler. As one LangGraph blog notes, “an agent is more likely to succeed on a focused task than if it has to select from dozens of tools” (LangGraph: Multi-Agent Workflows). In other words, letting a small router agent decide among a few channels is easier than expecting one giant agent to manage everything.

In practice, a dedicated router can be implemented in various ways:

  • A small LLM with a constrained prompt (e.g. “Classify this query as [WEATHER, NEWS, CHITCHAT, ...] and output the category.”). This still uses an LLM, but with a simpler, deterministic prompt.
  • A rules engine or keyword matcher for simple intents (e.g. if the question contains “weather” then call the weather agent).
  • A classical ML classifier (e.g. fine-tuned BERT) trained on intent-labelled data (this requires a dataset, but can be highly accurate).
  • An embedding-based router (see next section).

Many frameworks support router chains. LangChain used to have an LLMRouterChain that prompted the model to choose a destination from a list (Migrating from LLMRouterChain | LangChain). Newer designs use structured outputs or chains (e.g. MultiPromptChain) to branch into sub-chains. The key idea is consistent: use a brief initial step to route and then invoke the right specialist.

Semantic Routing (Embedding‑Based Routing)

A recent approach is Semantic Routing using vector embeddings. Rather than asking an LLM to classify the query at runtime, one can pre-encode a set of example utterances for each intent (route) and then route by nearest-neighbor in embedding space. This is the idea behind the open-source Semantic Router library (GitHub - aurelio-labs/semantic-router: Superfast AI decision making and intelligent processing of multi-modal data.).

Semantic Router defines “routes” with sample utterances (seed examples) for each category. At runtime, it encodes the user query and finds which route’s examples are closest in vector space. In effect, it’s doing semantic similarity classification. As the creators explain, “instead of waiting for slow LLM generations to make tool-use decisions, we use the magic of semantic vector space… routing our requests using semantic meaning” (GitHub - aurelio-labs/semantic-router: Superfast AI decision making and intelligent processing of multi-modal data.).

When is Semantic Router useful? It shines when you have a fixed set of known intents or categories and want very fast routing. Once the route vectors are precomputed, classifying a new query is just a quick embedding lookup (much cheaper than an LLM call). This makes it attractive for high-throughput or low-latency needs. In practice, teams have used Semantic Router to classify chat queries in customer support bots or menu-driven assistants. For example, one case study in a car-sales chatbot showed a semantic-router classifier (with routes like “car-search-by-vin”, “cars-search”, “office-info”, etc.) achieving 92–96% precision after tuning (A new approach for text classification in chatbots (AI/ML) | by Ivan Chetverikov | Innova company blog | Medium). The authors note that building this took only a week or two and required no model training – just defining routes and embedding them. Compared to a full ML model, Semantic Router was faster to deploy and much cheaper per query (sub-penny cost vs ~$0.65 per 10k queries for an OpenAI LLM classifier (A new approach for text classification in chatbots (AI/ML) | by Ivan Chetverikov | Innova company blog | Medium)).

A rough workflow with Semantic Router is: define routes and example prompts, load them into a RouteLayer, and then at runtime call RouteLayer.route(query) to get the best-matching route name. You can then map that route name to the appropriate tool or agent call. For instance, the blog post by Kasaju shows routes like “diet_plan”, “exercise_recommendation”, etc., each with sample questions (Enhancing LLM Conversations through Semantic Router | by Dikshya Kasaju | Medium). The router quickly decides which category the user’s question belongs to, then you can call the corresponding function or prompt. Kasaju emphasizes that Semantic Router not only speeds up decisions but “serves as a vigilant guard… ensuring prompt responses and potentially avoiding misleading outputs” (Enhancing LLM Conversations through Semantic Router | by Dikshya Kasaju | Medium).
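The underlying mechanism can be illustrated without the library itself: pre-encode sample utterances per route once, then classify a new query by nearest neighbor. A toy sketch that uses bag-of-words cosine similarity as a stand-in for a real embedding model (route names follow the Kasaju example; everything else is illustrative, not the semantic-router API):

```python
import math
from collections import Counter

ROUTES = {
    "diet_plan": ["what should I eat", "suggest a meal plan", "healthy diet tips"],
    "exercise_recommendation": ["best workout for abs", "suggest an exercise routine"],
}

def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Route "embeddings" are computed once, at startup -- this is why the
# runtime lookup is so much cheaper than an LLM call.
ROUTE_VECTORS = {name: [embed(u) for u in utts] for name, utts in ROUTES.items()}

def route(query: str) -> str:
    """Return the route whose sample utterances are closest to the query."""
    q = embed(query)
    return max(ROUTE_VECTORS,
               key=lambda name: max(cosine(q, v) for v in ROUTE_VECTORS[name]))

print(route("can you suggest a meal plan for me"))  # diet_plan
```

A production router would swap embed() for a real sentence-embedding model and add a similarity threshold, but the classify-by-nearest-example shape is the same.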

Trade-offs: Semantic routing requires preparing representative utterances for each intent. Its accuracy depends on how well those examples capture query variability. If a user’s phrasing is far from all samples, the router may mis-route. Also, semantic routes are static categories; handling dynamic or nested intents might be harder. In contrast, an LLM router could theoretically handle new intents on the fly (with new prompts), but at higher cost. In practice, many teams use semantic routing for the stable, well-defined parts of the domain, and fall back to an LLM or default agent for anything “other” or uncategorized.

In summary, Semantic Router is a powerful tool for multi-agent orchestration when you have clear intent categories. It has been praised for its speed and ease of deployment compared to training an ML classifier (A new approach for text classification in chatbots (AI/ML) | by Ivan Chetverikov | Innova company blog | Medium). Our impression is that it works best as a first-pass filter or coordinator: run the semantic router to pick the route, then engage the specialized agent for that route. This hybrid approach combines speed (semantic lookup) with flexibility (agents do the heavy work).

Frameworks and Patterns

Several frameworks and design patterns facilitate routing:

  • LangChain: Offers RouterChains and function-calling. Its old LLMRouterChain (now deprecated) showed how to use LLMs to pick a chain (Migrating from LLMRouterChain | LangChain). More recently, LangChain’s Expression Language allows RunnableBranch or conditional steps to route based on model output (as in their query-index routing example (Routing | LangChain)). LangChain also has MultiPromptChain to select among prompts. However, all of these rely on LLM calls at routing time. You can also combine LangChain with an embedding store: e.g. use an embedding-based similarity check to choose a vector-store or prompt.
  • LangGraph / LangChain’s Blog: The LangGraph multi-agent post illustrates “agent supervisor” and “hierarchical teams” patterns (LangGraph: Multi-Agent Workflows). These emphasize that a supervisor agent’s “tools” can be other agents. Essentially, the router agent itself can be implemented as a LangChain agent that simply chooses which agent to call (or which API to use).
  • AutoGen and others: Other emerging agent frameworks (e.g. Microsoft AutoGen) use similar ideas. Many implement a “manager agent” or “orchestrator” that plans and delegates. The specific mechanisms differ (some use DB logs, others use function calls), but the concept of a routing layer is common.
  • Semantic Router (aurelio-labs): Can integrate with LangChain or any Python agent framework. In practice, you might call the semantic router before invoking a LangChain AgentExecutor. For example, one could use semantic_router.RouteLayer to pick a “tool name”, then pass that tool call into a LangChain agent loop.

Best Practices and Recommendations

  1. Start Simple: For a small number of tools/intents, try letting the LLM do the routing via function calling or prompt labels. This minimizes engineering effort. Use clear instructions (few-shot examples) so the LLM learns the categories.

  2. Monitor and Fallback: Regardless of method, log routing decisions. If the agent frequently fails (tools mischosen or no tool used), consider a more rigid router. Provide a fallback (“I’m not sure, forwarding to human”).

  3. Dedicated Classifier for Scale: If you have many tools or performance matters, build a router component. This could be as simple as a zero-shot classifier (GPT-based) or an embedding router. The Semantic Router library is worth trying: it often gives good accuracy with little training (A new approach for text classification in chatbots (AI/ML) | by Ivan Chetverikov | Innova company blog | Medium).

  4. Modular Design: Architect the system so that adding a new tool means only adding a new route and example utterances (for embedding router) or a new branch/prompt (for LLM router). Keep the router logic separate from the tool code.

  5. Asynchronous Execution: Once intent is identified, tasks can run concurrently if they’re independent. For instance, if a user query touches multiple topics, you might route it to several agents in parallel (the “Parallel Delegation” pattern). But ensure the router can handle multiple intents (semantic-router can return multiple best-matches, or LLM can output a list).

  6. Iterate with Real Data: Especially for embedding routing, refine the example utterances over time. In the car-dealer example, adding more examples raised precision to 92–96% (A new approach for text classification in chatbots (AI/ML) | by Ivan Chetverikov | Innova company blog | Medium). Use chat logs to identify misrouted queries.

  7. Combine Methods if Needed: You can cascade approaches. E.g. use a fast keyword/rule filter for obvious cases, semantic router next, and finally an LLM for “catch-all.” Or use an LLM router but verify its choice with embeddings.
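The “Parallel Delegation” idea in point 5 can be sketched with asyncio: once the router returns multiple intents, fan the query out to the matching agents concurrently and gather the results. The agent coroutines and the multi-intent router stub below are illustrative assumptions:

```python
import asyncio

async def weather_agent(q: str) -> str:
    await asyncio.sleep(0.01)  # simulate a slow API call
    return f"weather answer for: {q}"

async def news_agent(q: str) -> str:
    await asyncio.sleep(0.01)
    return f"news answer for: {q}"

AGENTS = {"weather": weather_agent, "news": news_agent}

def detect_intents(q: str) -> list[str]:
    """Stub multi-intent router; a semantic router can return several matches."""
    return [name for name in AGENTS if name in q.lower()]

async def handle(query: str) -> list[str]:
    intents = detect_intents(query)
    # Independent sub-tasks run concurrently instead of one after another.
    return await asyncio.gather(*(AGENTS[i](query) for i in intents))

results = asyncio.run(handle("weather and news for Berlin"))
print(results)
```

Because the sub-tasks are independent, total latency is roughly that of the slowest agent rather than the sum of all of them.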

Example – Customer Support Bot: Suppose a support bot has intents like billing, technical, sales, and other. A practical design is: the user query is first run through an embedding router (with routes “billing”, “technical”, etc.). If confidence is high, route to that sub-agent. If the query is out-of-scope or ambiguous, call a fallback LLM agent to either ask clarifying questions or loop in a human. Over time, use misrouted cases to expand the semantic router’s examples.
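The support-bot design above reduces to a threshold check: accept the embedding router's pick only when its similarity score is high enough, otherwise hand off to the fallback agent. A sketch with a stubbed scorer (the 0.75 cutoff, the scorer, and the fallback are illustrative assumptions, not fixed recommendations):

```python
def fallback_agent(query: str) -> str:
    """Stand-in for an LLM agent that asks clarifying questions or escalates."""
    return f"Could you clarify what you need help with? (query: {query!r})"

SUB_AGENTS = {
    "billing": lambda q: "billing team reply",
    "technical": lambda q: "technical team reply",
    "sales": lambda q: "sales team reply",
}

def score_routes(query: str) -> tuple[str, float]:
    """Stub for an embedding router returning (best_route, similarity)."""
    q = query.lower()
    for route in SUB_AGENTS:
        if route in q:
            return route, 0.9   # strong match
    return "billing", 0.2       # weak, ambiguous match

def handle(query: str, threshold: float = 0.75) -> str:
    route, score = score_routes(query)
    if score >= threshold:
        return SUB_AGENTS[route](query)
    return fallback_agent(query)  # low confidence: clarify or escalate

print(handle("I have a billing question"))  # billing team reply
print(handle("my thing is broken??"))       # clarifying fallback
```

Logging the (route, score) pairs from this loop is exactly the data you need later to expand the router's example utterances for misrouted queries.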

In summary, a general LLM agent can do intent recognition, especially for small-scale systems, but a dedicated router (whether ML-based, rule-based, or embedding-based) often yields more robust and efficient routing for larger, production systems. Libraries like Semantic Router exemplify the embedding-based approach: they provide a lightweight decision layer “to categorize text into topics” without retraining models (A new approach for text classification in chatbots (AI/ML) | by Ivan Chetverikov | Innova company blog | Medium). In our experience, combining a semantic router (fast, cheap) with a final LLM agent (flexible, fall-back) gives a good balance of speed and accuracy. The specific choice depends on your domain: for static, clearly-defined intents, semantic routing is excellent; for very fluid conversation, an LLM classifier or interactive clarification may be needed.

References: The above insights draw on practical guides and research on agent orchestration (Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch | by Arun Shankar | Google Cloud - Community | Medium) (A new approach for text classification in chatbots (AI/ML) | by Ivan Chetverikov | Innova company blog | Medium) (LangGraph: Multi-Agent Workflows) (GitHub - aurelio-labs/semantic-router: Superfast AI decision making and intelligent processing of multi-modal data.), including LangChain and agent pattern documentation as well as case studies of routing in chatbots (A new approach for text classification in chatbots (AI/ML) | by Ivan Chetverikov | Innova company blog | Medium) (Enhancing LLM Conversations through Semantic Router | by Dikshya Kasaju | Medium). These sources illustrate the trade-offs of different intent-recognition strategies and the emerging best practices in multi-agent AI design.
