New progress in "AI self-evolution": Independently build and optimize Agent Skills
Current general-purpose agents fall short of the complex requirements of professional domains. Most existing solutions rely on manually written, domain-specific Skills, an approach that is not only labor-intensive but also difficult to scale.
To address this limitation, a research team from Sentient and Virginia Tech proposed EvoSkill, a self-evolving framework that automatically discovers and optimizes Agent Skills through failure analysis.
Experimental data show that EvoSkill raised accuracy from 60.6% to 67.9% on the OfficeQA financial document question-answering task and achieved a 12.1-percentage-point improvement on the SealQA adversarial search question-answering task. The research suggests that automatic evolution at the Skills level could become a new direction for enhancing the professional capabilities of Coding Agents.
Paper link: https://arxiv.org/abs/2603.02766v1
How does EvoSkill "self-evolve"?
In current AI development, Agent Skills are mostly written by hand, which is costly and hard to scale. Existing evolution methods can optimize automatically, but they target only low-level artifacts such as prompts or code, are tightly coupled to specific tasks, and are difficult to reuse.
EvoSkill raises the target of optimization to the Skills themselves, automatically generating interpretable, transferable, structured Skills. This design means the evolved capabilities are no longer limited to a single task and gain a generality similar to human skills.
At the core of EvoSkill is an evolutionary cycle in which three agents collaborate:
- The Execution Agent (A) performs tasks with the support of the existing Skills library, producing execution trajectories and answers.
- The Proposal Agent (P) receives the failure cases the Execution Agent produces on the training set, combines them with the historical feedback record, diagnoses capability gaps, and proposes creating new Skills or modifying existing ones.
- The Skills Building Agent (S) turns these abstract proposals into concrete Skills folders, including metadata, instruction files, and any necessary script code.
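To make the division of labor concrete, here is a minimal sketch of the artifacts the three agents might pass between them. All class and field names are assumptions made for this illustration, not the paper's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """Output of the Execution Agent (A) for one task."""
    task_id: str
    answer: str
    score: float  # graded against the reference answer

@dataclass
class Proposal:
    """Output of the Proposal Agent (P) after diagnosing failures."""
    diagnosis: str      # the identified capability gap
    action: str         # "create" a new Skill or "modify" an existing one
    target_skill: str

@dataclass
class Skill:
    """Output of the Skills Building Agent (S): a concrete Skills folder."""
    name: str
    instructions: str
    scripts: dict = field(default_factory=dict)  # filename -> script code

def proposal_from_failures(failures: list, history: list) -> Proposal:
    # Hypothetical Proposer logic: summarize the gap from failure cases
    # plus the feedback history, then suggest a Skill to build.
    gap = f"{len(failures)} failing task(s); history has {len(history)} prior attempts"
    return Proposal(diagnosis=gap, action="create", target_skill="data-verification")

p = proposal_from_failures([Trajectory("t1", "wrong answer", 0.0)], [])
```

The point of the structured `Proposal` is that the Builder never sees raw trajectories, only a diagnosed gap and a requested action.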
The evolutionary process follows this workflow:
- First, run the current best program on the training set and collect the failure samples whose scores fall below a threshold;
- The Proposal Agent P analyzes these cases and, drawing on the recorded feedback history, suggests modifications;
- The Skills Builder S generates candidate programs from the suggestions, which are then evaluated on the validation set;
- If a candidate's score exceeds that of the worst member of the current frontier set, it is admitted to the frontier; otherwise, it is discarded.
The frontier set maintains a fixed number of high-performing programs, ensuring the evolutionary direction converges steadily toward better states.
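The loop above can be sketched in a few lines. This is a minimal illustration under assumed interfaces (programs as dicts, caller-supplied `propose`/`build`/`evaluate` functions, a 0.5 failure threshold), not the authors' actual implementation:

```python
THRESHOLD = 0.5  # training-set score below this counts as a failure (assumed value)

def evolve_step(frontier, train_set, val_set, propose, build, evaluate):
    """One EvoSkill-style iteration: collect failures, propose, build, admit."""
    best = max(frontier, key=lambda p: p["val_score"])
    # 1. Run the current best program on the training set; keep low scorers.
    failures = [ex for ex in train_set if evaluate(best, ex) < THRESHOLD]
    if not failures:
        return frontier
    # 2-3. The Proposer diagnoses the gap; the Builder emits a candidate program.
    candidate = build(propose(failures, best["feedback_history"]))
    # 4. Score the candidate on the held-out validation set.
    candidate["val_score"] = sum(evaluate(candidate, ex) for ex in val_set) / len(val_set)
    # Admit only if the candidate beats the worst frontier member, which keeps
    # the frontier at a fixed size; otherwise the candidate is discarded.
    worst = min(frontier, key=lambda p: p["val_score"])
    if candidate["val_score"] > worst["val_score"]:
        return [p for p in frontier if p is not worst] + [candidate]
    return frontier
```

Replacing only the worst member is what makes the frontier's quality monotonically non-decreasing across iterations.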
Figure | Overview of the EvoSkill cycle.
Three design choices are key to this mechanism:
- The underlying model remains frozen and only the Skills library is updated across iterations, so capability gains can be attributed to Skills optimization.
- The feedback history records each proposal and its outcome, helping the Proposer avoid repeating ineffective solutions and gradually enriching the context across iterations.
- Skills are stored as folders containing metadata and instructions, making them easy to reuse across tasks and models.
Figure | EvoSkill: iterative Skills induction driven by text feedback.
Experimental verification
To verify EvoSkill's practical effectiveness, the research team conducted a rigorous evaluation in two distinct domains: financial document reasoning and search-augmented question answering.
On the OfficeQA benchmark, EvoSkill processed US Treasury bulletins containing complex data. The results show that through automatic evolution, the Agent's accuracy rose from 60.6% to 67.9%. Along the way, EvoSkill automatically discovered data extraction and verification Skills and quantitative analysis Skills, addressing the Agent's errors in complex data processing.
Figure | Performance of EvoSkill in the OfficeQA benchmark test under different training splits and tolerance levels.
On SealQA, a task containing noisy and conflicting information, EvoSkill's gains were especially pronounced. The baseline model's accuracy was only 26.6%; after evolution it reached 38.7%, a 12.1-percentage-point improvement. The key was the discovered search persistence protocol Skill, which requires the Agent to perform multi-source verification and query-term expansion before drawing a conclusion, effectively avoiding premature termination of search when retrieval results are insufficient.
The experiments further tested the transferability of Skills. The research team applied the search persistence protocol evolved on SealQA directly to the BrowseComp task for zero-shot transfer testing. Without any modification, the model's accuracy rose from 43.5% to 48.8%, a 5.3-percentage-point gain. This result indicates that Skills generated by EvoSkill generalize across tasks, and their effectiveness is not limited to the original training scenario.
Implications and prospects
The research on EvoSkill provides new ideas for enhancing the capabilities of coding agents.
From a theoretical perspective, raising the target of optimization from prompts or code snippets to the Skills level helps decouple capabilities from specific tasks and models. Skills are stored in a structured form with explicit trigger conditions and execution procedures, making them transferable across scenarios. This direction may offer a new technical path for building Agent capabilities.
From a practical perspective, the automated Skills discovery mechanism reduces the burden of writing Skills by hand. Because Skills live in self-contained folders, they are easy to share and reuse among different agents, laying the foundation for open Skills libraries and for interoperable Agent capabilities in collaborative scenarios.
Going forward, the research team plans to evaluate EvoSkill in a wider range of domains, both to better understand the universality of evolved Skills and to distinguish which Skills are domain-general and which are domain-specific. They also intend to extend it to multimodal tasks so that Skills can coordinate the processing of text, images, code, and other input forms; to explore the transferability of Skills across different models and Agent frameworks; and to consider establishing a Skills-sharing community where users can discover, combine, and contribute Skills.
This article is from the WeChat official account “Academic Headlines” (ID: SciTouTiao), author: Wang Yueran. It is published by 36Kr with authorization.