A New Paradigm for Multi-Agent Routing: Directing Large Models

With the rise of the Model Context Protocol (MCP) ecosystem, an Assistant may be supported by hundreds of tools/sub-Agents.

Stuffing all tool descriptions into the Prompt? → Starting at 4,600+ tokens, which is painfully expensive.
Selecting an Agent before choosing tools? → Coarse-grained descriptions often bury “hidden treasure tools”.
Using only a single tool? → A set of tools needed for multi-step tasks gets completely disassembled.

The author highlights the pain points with a diagram: A New Paradigm for Multi-Agent Routing: Directing Large Models

Figure 1: Traditional “Agent-only” retrieval (left) vs. Tool-to-Agent unified retrieval (right)

2. Core Idea: Bringing “Tools” and “Agents” into the Same Vector Space

Tool-to-Agent Retrieval (T2A) = Unified vector indexing + Metadata jumping

Build a bipartite graph: Agent ↔ Owned tools
Use the same set of encoders to embed both Agent descriptions & tool descriptions
During retrieval, first get Top-N (tools + Agents), then map back to the unique Agent set using <span>owner(·)</span>.
Finally, return Top-K Agents to make the decision of “select tool or select Agent” in a single step

Algorithm pseudocode overview:

Algorithm 1: Combined Tool–Agent Top-K Retrieval

Dataset: LiveMCPBench

70 MCP Servers, 527 tools, 95 multi-turn user Queries
Each Query manually annotated with 2.68 steps, 2.82 tools, and 1.40 Agents

Comparison Baselines:

Evaluation Metrics: Recall@K / mAP@K / nDCG@K, K∈{1,5,10}

A New Paradigm for Multi-Agent Routing: Directing Large Models Table 1: Main Metrics of LiveMCPBench

Next, let’s look at the stability of 8 types of embeddings:

Table 2: Model-by-Model Comparison (Recall@5)

Amazon Titan v2 shows the most significant improvement: 0.66 → 0.85 (+28%)
Even the lightweight All-MiniLM-L6 also improved by +13%, indicating that the improvement comes from the framework rather than the large model

In the Top-5 returns, **39%** directly hit Agent descriptions, while 34%** are recalled through tool → Agent mapping → proving that “tool details” indeed supplement the semantic omissions of Agent summaries
Step-wise Querying (decomposing first and then retrieving step by step) averages an additional +4–6 points in Recall compared to Direct Querying → complex task decomposition retrieval remains effective

https://arxiv.org/pdf/2511.01854
Tool-to-Agent Retrieval: Bridging Tools and Agents for Scalable LLM Multi-Agent Systems