
Structured expansion sets a new SOTA in agent tool retrieval, finding APIs accurately

Quantum Bit (量子位), 2026-03-18 21:06
Large models can't find the right tool because the tool documentation is poorly written.

In the era of large models, Tool-Use has become a core component of an agent's capabilities.

From code generation to data analysis, from web queries to complex API calls, LLMs are learning to "use tools." However, a practical problem is becoming increasingly evident:

Tools are really hard to find.

A team led by Shen Xiaoyu at the Ningbo Institute of Technology, Zhejiang University / Ningbo Institute of Digital Twin (Eastern Institute of Technology) has published a paper at ICLR 2026:

"Tools Are Under-Documented: Simple Document Expansion Boosts Tool Retrieval"

The paper puts forward a direct but important judgment:

The bottleneck in current tool retrieval often lies not in the model's capabilities but in the tool documentation.

The paper has been accepted to ICLR 2026.

Background: Invisible Obstacles in Tool Retrieval

As the number of APIs grows to thousands or even tens of thousands, tool retrieval has become a crucial preliminary step in the Tool-Use system: the model must first find the appropriate tool in a large tool set before it can call and execute it.

In recent years, a series of benchmarks (such as ToolBench and ToolRet) have driven progress in retrieval models. In practice, however, a fundamental but long-neglected problem persists: the quality of tool documentation itself varies widely. Many tool descriptions are inconsistently structured and incomplete, and the granularity of function descriptions differs greatly across APIs. Meanwhile, user queries usually express concrete task requirements in natural language, whereas tool documentation mostly consists of brief technical descriptions or function notes, leaving an obvious semantic gap between the two.

Therefore, the problem does not lie entirely in whether the model can understand the tools, but in the fact that current tool documentation lacks a structured, retrievable expression that is semantically aligned with user queries. Under these conditions, even a powerful retrieval model can hardly match the correct tool reliably.

Core Idea: Optimize the Documentation First, Then Train the Model

This work proposes a seemingly simple but systematic solution:

Perform structured expansion (document expansion) on tool documentation, and then conduct training and evaluation based on the expanded documentation.

Specifically, structured expansion (document expansion) supplements the originally scattered, terse API descriptions with more complete and retrievable semantic information. The training data is then reconstructed from the expanded documentation, and the model is trained on it.

Compared with directly improving the model structure, this approach starts from the quality of data and documentation, systematically narrowing the semantic gap between user queries and tool descriptions.

The paper constructs three key components:

1. TOOL-REX: An Expanded Tool Retrieval Benchmark

Based on the original ToolRet benchmark, the paper introduces a structured tool_profile field to systematically expand the tool documentation. The newly added information includes: function (the core function of the tool), tags (keywords describing the tool's capabilities), when_to_use (applicable scenarios and task types), limitation (usage restrictions or boundary conditions).
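As a concrete illustration, a tool_profile entry with these four fields might look like the Python sketch below. The field names (function, tags, when_to_use, limitation) come from the paper; the currency-conversion tool and all of its values are invented, and the flattening step is only one plausible way to serialize the fields for indexing, not necessarily the paper's exact format.

```python
# Hypothetical tool_profile entry. Field names follow the paper;
# the tool itself and every value are invented for illustration.
tool_profile = {
    "function": "Convert an amount of money between two currencies "
                "using current exchange rates.",
    "tags": ["currency", "exchange rate", "finance", "conversion"],
    "when_to_use": "When a query asks to convert a price or amount "
                   "from one currency to another.",
    "limitation": "Rates may be delayed; cryptocurrency pairs are "
                  "not supported.",
}

# One plausible serialization: flatten the structured fields into a
# single retrievable text that can be indexed alongside the original
# API description.
expanded_doc = " ".join([
    tool_profile["function"],
    " ".join(tool_profile["tags"]),
    tool_profile["when_to_use"],
    tool_profile["limitation"],
])
print(expanded_doc)
```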

These fields are constructed through a low-cost automated document-expansion pipeline. Specifically, Qwen3-32B first performs structured expansion on the original tool documentation, organizing the function descriptions, usage conditions, and limitations originally scattered across the document into a unified tool_profile structure. The expansion is strictly grounded in the original document: all generated content must have semantic support in the original text.

The system then uses LLaMA-3.1-70B to verify the semantic consistency of the generated results, checking whether the expanded fields are faithful to the original document, and uses rule checks to ensure the output structure is well-formed and non-empty. The small number of samples that fail verification are regenerated and corrected with a stronger model (such as GPT-4o). Finally, the faithfulness and consistency of the expanded documentation are verified by sampled manual review, making the whole expansion process both automated and reliable.

Through this "LLM expansion → LLM verification → regeneration and correction → manual sampling" process, the original tool documentation is systematically expanded into structured tool descriptions, making the document semantics more complete while remaining faithful to the original tool information.
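The expand → verify → regenerate loop described above can be sketched as plain control flow. In the sketch below, the expander, verifier, and fallback functions are stubs standing in for the actual models (Qwen3-32B, LLaMA-3.1-70B, and GPT-4o respectively); only the overall flow reflects the paper, and the toy stubs at the bottom exist solely to make the sketch runnable.

```python
from typing import Callable

REQUIRED_FIELDS = ("function", "tags", "when_to_use", "limitation")

def rule_check(profile: dict) -> bool:
    """Rule check from the pipeline: all fields present and non-empty."""
    return all(profile.get(f) for f in REQUIRED_FIELDS)

def expand_document(doc: str,
                    expander: Callable[[str], dict],
                    verifier: Callable[[str, dict], bool],
                    fallback: Callable[[str], dict]) -> dict:
    """Run one document through expand -> verify -> regenerate.

    expander : stands in for Qwen3-32B structured expansion
    verifier : stands in for LLaMA-3.1-70B faithfulness check
    fallback : stands in for GPT-4o regeneration on failed samples
    """
    profile = expander(doc)
    if rule_check(profile) and verifier(doc, profile):
        return profile
    return fallback(doc)  # stronger model retries the failures

# --- toy stubs, for demonstration only ---
def toy_expander(doc: str) -> dict:
    return {"function": doc, "tags": ["demo"],
            "when_to_use": "demo", "limitation": "demo"}

def toy_verifier(doc: str, profile: dict) -> bool:
    # Faithfulness approximated here as simple lexical containment.
    return profile["function"] in doc or doc in profile["function"]

profile = expand_document("Searches the web.",
                          toy_expander, toy_verifier, toy_expander)
print(profile["function"])
```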

2. A Large-Scale Training Corpus

Based on a low - cost automated data construction pipeline, the paper further generates large - scale tool retrieval training data, including:

50k embedding training samples

200k reranker training samples

All of these data are constructed from the structurally expanded documentation, forming one of the largest structured tool-retrieval training corpora to date and providing a richer, semantically aligned data foundation for subsequent model training.

3. Two Dedicated Models

Based on the above data, the paper trains two models dedicated to the tool retrieval scenario, addressing the lack of specialized models in this area:

Tool-Embed: An embedding model for dense retrieval, used for efficient recall over a large-scale tool library

Tool-Rank: An LLM-based reranker, used for fine-grained ranking within the candidate tool set

Through the combination of "structured documentation + large - scale data + dedicated models", this work constructs a complete tool retrieval solution.
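The Tool-Embed + Tool-Rank pairing follows the standard retrieve-then-rerank pattern. The sketch below illustrates the two stages only: a bag-of-words cosine stands in for the learned embedding model, and a toy word-overlap scorer stands in for the LLM reranker. Neither stand-in reflects the actual trained models, and the example tools are invented.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Stand-in for Tool-Embed: a bag-of-words 'embedding'."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Stage 1: efficient recall over the whole tool library."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)),
                  reverse=True)[:k]

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stage 2: stand-in for Tool-Rank's fine-grained ordering."""
    q = set(query.lower().split())
    return sorted(candidates,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)

tools = [
    "translate text between languages",
    "convert currency amounts using exchange rates",
    "fetch current weather for a city",
]
query = "convert 100 dollars to euros using exchange rates"
top = rerank(query, retrieve(query, tools))
print(top[0])  # the currency tool ranks first
```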

Results: Simple Expansion, Significant Improvement

Experiments on ToolRet and the newly constructed TOOL-REX benchmark show that simply performing structured expansion on tool documentation brings stable and significant performance improvements.

First, document expansion by itself significantly improves retrieval. With the same model architecture, simply replacing the tool documentation with the expanded version yields a clear gain in retrieval performance, showing that the quality of documentation has a direct impact on tool retrieval.
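This effect can be mimicked with a toy lexical similarity: the same query overlaps far more with an expanded description than with a terse original. The API text and its expansion below are invented, and word-set Jaccard similarity is only a crude proxy for what a learned embedding model measures.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (toy proxy for a learned embedding)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Invented terse API doc vs. a hypothetical structured expansion of it.
original = "get_fx: returns rate for pair"
expanded = ("get_fx: returns rate for pair. Function: convert money "
            "between currencies using exchange rates. Tags: currency "
            "exchange conversion. When to use: a user asks to convert "
            "an amount from one currency to another.")

query = "convert 50 dollars to euros at the current exchange rate"

# The expanded doc shares far more query vocabulary than the original.
print(jaccard(query, original), jaccard(query, expanded))
```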

Building on this, the two dedicated models trained in the paper, Tool-Embed and Tool-Rank, reach new state-of-the-art (SOTA) levels on multiple evaluation tasks. Beyond the overall metric gains, case analyses show more intuitive improvements: correct tools that previously fell outside the top 10 of the candidate list are retrieved again and promoted to more prominent positions.

These improvements come neither from a more complex reasoning process nor from a larger model, but from more complete, structured semantic expression.

Deeper Discoveries

The paper further analyzes the contributions of different structured fields to retrieval performance and finds that different information plays different roles in the retrieval process.

Among them, fields such as function and tags have the most significant impact on dense retrieval: they give the model more explicit functional semantics, making tool representations in the vector space clearer. Scenario descriptions such as when_to_use matter more in the reranking stage, helping the model judge whether a tool truly fits a specific task.

At the same time, the expanded documentation not only helps during training but also yields more stable retrieval performance at evaluation time, reducing semantic-matching errors caused by incomplete descriptions.

These analyses together indicate that:

The quality of documentation itself is an important part of the retrieval system.

Summary

When "model enhancement" becomes the default direction, this research provides a more straightforward and effective answer:

In tool retrieval, improving the quality of the documentation often improves retrieval more directly than increasing model complexity.

Better documentation → Better retrieval.

Paper Title: Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval

First Authors: Lu Xuan, Huang Haohang

Corresponding Author: Shen Xiaoyu (Ningbo Institute of Technology, Zhejiang University)

arXiv: https://arxiv.org/abs/2510.22670

GitHub: https://github.com/EIT-NLP/Tool-REX

Author Introduction: The first authors, Lu Xuan and Huang Haohang, are, respectively, a doctoral student (jointly trained by the Ningbo Institute of Technology, Zhejiang University and Shanghai Jiao Tong University) and an intern in the team led by Shen Xiaoyu at the Ningbo Institute of Technology, Zhejiang University / Ningbo Institute of Digital Twin (Eastern Institute of Technology). Their research interests include information retrieval and efficient reasoning, and they have published multiple papers at top-tier conferences such as ICLR, CVPR, and EMNLP. For more of the lab's research, see the laboratory homepage: https://idt.eitech.edu.cn/nlp/#/

This article is from the WeChat official account "Quantum Bit" (量子位), written by the EIT-NLP team, and is republished by 36Kr with permission.