
The U.S. Argonne National Laboratory has proposed CVEvolve, a zero-code framework for autonomously discovering scientific image processing algorithms. It covers the full stack: writing code, self-checking results, and optimizing its search strategy.

超神经HyperAI, 2026-05-25 21:17
Overcome three major imaging challenges

After systematically reviewing earlier AI-based automation work, a research team at Argonne National Laboratory (ANL) in the United States developed CVEvolve, a zero-code autonomous agent framework for discovering the algorithms needed to process scientific data. The framework is general-purpose: it requires no predefined problem structure or fixed pipeline template. It links code, data, evaluation metrics, search records, and visualizations in a closed loop, and it supports the development of executable algorithms in computer vision, image processing, and related areas.

Reaching an objective, rigorous scientific conclusion is like panning for gold in sand. Now that advanced scientific instruments and simulation techniques are widespread, research produces data that is large in volume and largely unstructured. Processing that data has become the critical step before its value can be unlocked and scientific findings revealed.

This is where the real dilemma lies. Domain scientists often lack the computer vision, image processing, and software engineering skills that data processing requires, while the technical experts who have those skills rarely understand the scientific context well enough to design processing pipelines that fit real research scenarios.

CVEvolve was developed to close this expertise gap. Beyond requiring no predefined problem structure or fixed pipeline template, the framework is not tied to a single modeling approach, and it provides full-stack capabilities: writing and running code, evaluating results, tracing history, self-checking results, and iteratively optimizing its strategy.

In short, CVEvolve can independently develop dedicated algorithms for processing scientific data in real scenarios. Domain scientists with no background in programming or image processing can obtain working analysis methods without writing a single line of code, and the article reports that the results are more comprehensive, reliable, and efficient than those of previous methods.

The relevant results were published on the preprint platform arXiv under the title "CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing".

Research Highlights:

*  Proposes a general agent framework for autonomously discovering scientific data processing algorithms, designed specifically for unstructured problems and requiring no predefined problem structure or fixed pipeline template 

*  Introduces a long-horizon search architecture that combines generate, tune, and evolve operations with lineage-aware state management and agent-driven holdout testing, supporting the framework's flexibility, autonomy, and maturity 

*  Validates CVEvolve on multiple tasks, including X-ray fluorescence microscopy image registration, Bragg peak detection, and high-energy diffraction microscopy image segmentation, confirming that it can discover practical algorithms and accelerate scientific discovery 

View the paper: https://hyper.ai/papers/2605.11359

Construct Dedicated Validation Datasets for Three Types of Tasks

All datasets in this study were custom-built for the experiments, each with a development portion and a separate holdout portion.

Fluorescence Microscope Image Registration Dataset

Starting from real XRF images, the team artificially applied translational offsets, Poisson noise, scanning jitter, and blurring to simulate the image differences caused by real focus drift. The images are displayed on a logarithmic scale and are only 10 to 30 pixels in size. The dataset consists of 809 test/reference image pairs: 10% were randomly selected as a holdout set, and the remaining 90% were used for iterative algorithm development.
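The degradation and split described above can be illustrated with a minimal NumPy sketch. It is not the authors' generator: the function names, the circular integer shift, the 5-point blur, and the random seed are all assumptions made for illustration, and scanning jitter is left out.

```python
import numpy as np

def simulate_pair(reference, shift, rng, blur_passes=1):
    """Return a shifted, blurred, noisy copy of `reference` (hypothetical helper)."""
    dy, dx = shift
    # Translational offset; a circular integer shift keeps the sketch simple.
    moved = np.roll(reference, (dy, dx), axis=(0, 1)).astype(float)
    # Mild blur: average each pixel with its four neighbours.
    for _ in range(blur_passes):
        moved = (moved
                 + np.roll(moved, 1, axis=0) + np.roll(moved, -1, axis=0)
                 + np.roll(moved, 1, axis=1) + np.roll(moved, -1, axis=1)) / 5.0
    # Poisson (photon-counting) noise on the blurred intensities.
    return rng.poisson(moved).astype(float)

def split_holdout(n_pairs, holdout_fraction, rng):
    """Randomly split pair indices into (development, holdout)."""
    order = rng.permutation(n_pairs)
    n_holdout = round(n_pairs * holdout_fraction)
    return order[n_holdout:], order[:n_holdout]

rng = np.random.default_rng(0)                      # seed chosen arbitrarily
reference = rng.uniform(5, 50, size=(20, 20))       # stand-in for a real XRF image
test_image = simulate_pair(reference, (2, -3), rng)
dev_idx, holdout_idx = split_holdout(809, 0.10, rng)
print(test_image.shape, len(dev_idx), len(holdout_idx))
```

With 809 pairs and a 10% holdout fraction, this split leaves 728 pairs for development and 81 for holdout testing.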

Example image of the fluorescence microscope image registration dataset

Bragg Peak Detection Dataset

The dataset was derived from the diffraction images collected at all scanning points, which were divided into two groups. The images in each group were summed pixel by pixel into a single image, giving two images in total. One was used for evaluation during algorithm development, and the other served as the holdout set. The Bragg peaks in both images were labeled manually.

High-energy Diffraction Microscope Image Segmentation Dataset

The development dataset included 5 images and their manually created labels, and the holdout set consisted of 2 samples.

Three Major Processes and Five Core Tools to Build an LLM-based Intelligent Agent Tool

Architecturally, CVEvolve is an autonomous search controller built around a large language model (LLM) agent. The agent uses tools to generate, run, and evaluate candidate solutions, and the controller chooses the next direction of exploration based on the search history. The iterative strategy is borrowed from the Pty-Chi-Evolve framework and involves three types of action: generate, tune, and evolve. An extended toolset and improved state management let CVEvolve adapt to a wider range of tasks.

To limit context length and reduce computation cost, each iteration starts with a fresh context that contains only the system prompt and the task prompt for the current action; no conversation history is accumulated. Within a single iteration, generate and tune can be run by multiple parallel workers, so the system can explore several new solutions, or tune several different existing candidates, before the records are updated.

After each iteration, the candidate algorithms submitted by the agent are grouped by evolutionary lineage, which records parent-child relationships and preserves successful design patterns. The candidate sampling scheme is borrowed from the MAP-Elites algorithm: for the tune and evolve steps, CVEvolve samples candidates randomly instead of always selecting the current best one.
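One way to picture lineage-grouped random sampling is the sketch below. The article does not specify the sampling distribution, so the two-stage uniform draw (first a lineage, then a member of it), the dictionary fields, and the example pool are all assumptions.

```python
import random
from collections import defaultdict

def sample_parent(candidates, rng):
    """Pick a lineage uniformly at random, then a random member of that lineage."""
    by_lineage = defaultdict(list)
    for cand in candidates:
        by_lineage[cand["lineage"]].append(cand)
    lineage = rng.choice(sorted(by_lineage))   # sorted for reproducibility
    return rng.choice(by_lineage[lineage])

# Hypothetical pool: lower score means lower error.
pool = [
    {"id": 1, "lineage": "A", "score": 0.90},
    {"id": 2, "lineage": "A", "score": 0.70},
    {"id": 3, "lineage": "B", "score": 1.20},
    {"id": 4, "lineage": "C", "score": 0.40},
]

rng = random.Random(0)
parent = sample_parent(pool, rng)
print(parent["id"], parent["lineage"])
```

Sampling by lineage first keeps a lineage with many members from dominating the draw, and a weaker candidate such as id 3 still has a chance to be tuned or evolved, which preserves diversity in the search.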

Three-Stage Workflow

Schematic diagram of the CVEvolve workflow

Workspace Preparation Stage: Build the operating environment and automatically turn the evaluation metrics given in the task description or user prompts into executable evaluation code.

Baseline Evaluation Stage: Run and evaluate existing benchmark algorithms to provide a baseline for subsequent comparison work.

Algorithm Iterative Development Stage: Run multiple rounds of search using the generate, tune, and evolve strategies. Generate handles broad exploration, designing new algorithms in parallel; tune handles refinement, randomly selecting candidate algorithms and optimizing their parameters; evolve handles iterative evolution, combining the strengths of multiple algorithms into new ones.

For rigor, the overall process also includes an optional repair round for fixing candidate algorithms that fail to run, an independent holdout test after each round, and an SQL search-state database that records candidates, metrics, iteration rounds, and evolutionary lineages throughout the search.
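A search-state store of this kind could look like the following sketch, written with Python's built-in `sqlite3` module. The table layout, column names, and the three rows are hypothetical placeholders, not CVEvolve's actual schema or results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE candidates (
        id              INTEGER PRIMARY KEY,
        parent_id       INTEGER REFERENCES candidates(id),  -- NULL for roots
        iteration       INTEGER NOT NULL,
        action          TEXT NOT NULL,     -- baseline / generate / tune / evolve / repair
        metric          REAL,              -- development-set metric (lower is better here)
        holdout_metric  REAL               -- filled in by the holdout test
    )
""")

# Placeholder rows for illustration only.
rows = [
    (1, None, 0, "baseline", 1.00, None),
    (2, None, 1, "generate", 0.60, None),
    (3, 2,    2, "evolve",   0.50, None),
]
conn.executemany("INSERT INTO candidates VALUES (?, ?, ?, ?, ?, ?)", rows)

def lineage(conn, cand_id):
    """Walk parent links back to the root and return ids from root to leaf."""
    chain = []
    while cand_id is not None:
        chain.append(cand_id)
        (cand_id,) = conn.execute(
            "SELECT parent_id FROM candidates WHERE id = ?", (cand_id,)
        ).fetchone()
    return chain[::-1]

best = conn.execute(
    "SELECT id, metric FROM candidates ORDER BY metric ASC LIMIT 1"
).fetchone()
print(best, lineage(conn, 3))
```

Storing the parent id with every candidate is what makes the lineage recoverable later, so a round's best result can be traced back to the generate step that started it.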

Five Core Supporting Tools

* File System Tool: Supports listing, reading, writing, editing, copying, moving, and deleting files in the workspace, allowing the agent to write candidate code, helper scripts, and evaluation tools inside the session sandbox 

* Environment Management and Code Execution Tool: Supports installing or deleting dependencies in the workspace and executing Python scripts

* Image Viewing Tool: Supports floating-point images, logarithmic display scaling for high-dynamic-range images, and conversion from TIFF to PNG, so that the agent can see subtle structures, brightness variations, and anomalies that are hard to detect under ordinary linear rendering 

* Search State Tool: Lets the agent set the primary metric, record evaluation results, inspect the search history, analyze candidate results, and submit new candidates to the SQL search records

* Web Search Tool: Provides access to arXiv, Semantic Scholar, and Tavily, so that the agent can draw on external technical references while iterating on algorithms

In addition, a multi-modal image follow-up middleware was added to work around the limitation that the large language model interface cannot transmit images directly. After a tool returns an image path, the rendered image is automatically reinjected into the conversation as a follow-up message.
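The logarithmic display scaling mentioned for the image viewing tool can be illustrated as follows. This is a generic sketch, not the tool's implementation; the function name and the choice of `log1p` followed by normalization to 8 bits are assumptions.

```python
import numpy as np

def log_scale_to_uint8(image):
    """Map a non-negative floating-point image to 8-bit using log1p scaling."""
    img = np.clip(np.asarray(image, dtype=np.float64), 0.0, None)
    scaled = np.log1p(img)                 # compresses the dynamic range
    peak = scaled.max()
    if peak == 0:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255.0 * scaled / peak).astype(np.uint8)

# Toy high-dynamic-range image spanning six orders of magnitude.
hdr = np.array([[0.0, 1.0, 10.0],
                [100.0, 1e4, 1e6]])

linear = np.round(255.0 * hdr / hdr.max()).astype(np.uint8)
logged = log_scale_to_uint8(hdr)
print(len(np.unique(linear)), len(np.unique(logged)))
```

Under linear rendering most of the toy pixels collapse to the same gray level, while the log-scaled version keeps every intensity distinguishable, which is the effect the tool relies on to expose faint structure.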

Core Underlying Execution Architecture

CVEvolve is built as a LangGraph agent application. At runtime it uses a streamlined node graph with four core steps: message reception, model inference, tool call, and image post-processing. After a tool returns an image path, the image processing node converts the image into a multi-modal observation and sends it back to the model for the next round of inference, as shown in the following figure:

Execution architecture of CVEvolve based on LangGraph

Verify the Practicality of CVEvolve in Three Types of Scientific Image Processing Scenarios

To demonstrate the practical effects and generalization ability of CVEvolve, the research team specifically set up three groups of scientific image processing experiments with practical significance for verification. All experiments were completed using Claude Opus 4.6.

Fluorescence Microscope Image Registration

The researchers first applied CVEvolve to finding a robust algorithm for translational registration of X-ray fluorescence microscope (XRF) images, which is needed to correct the image offset that appears after the microscope is refocused.

The baseline algorithms include two types: phase correlation with a Hanning window preprocessor and brute-force error minimization; the performance comparison indicator is the average Euclidean distance between the calculated and ground-truth shifts.
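To make the baseline and the metric concrete, here is a minimal NumPy sketch of integer-pixel phase correlation with a Hanning window, together with the mean Euclidean error. It illustrates the general technique only; the paper's baseline may differ in details such as sub-pixel refinement, and the function names and test image are assumptions.

```python
import numpy as np

def phase_correlation_shift(reference, moved):
    """Estimate the integer (dy, dx) shift that maps `reference` onto `moved`."""
    window = np.outer(np.hanning(reference.shape[0]),
                      np.hanning(reference.shape[1]))
    # Remove the mean so the window shape does not dominate low frequencies.
    f_ref = np.fft.fft2((reference - reference.mean()) * window)
    f_mov = np.fft.fft2((moved - moved.mean()) * window)
    cross = f_mov * np.conj(f_ref)
    cross /= np.maximum(np.abs(cross), 1e-12)      # keep phase only
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Convert wrapped peak positions to signed shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

def mean_euclidean_error(estimated, truth):
    """Average Euclidean distance between estimated and ground-truth shifts."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

rng = np.random.default_rng(0)
reference = rng.standard_normal((64, 64))          # synthetic textured image
true_shift = (3, -2)
moved = np.roll(reference, true_shift, axis=(0, 1))

estimate = phase_correlation_shift(reference, moved)
print(estimate, mean_euclidean_error([estimate], [true_shift]))
```

The sketch uses a clean 64 x 64 synthetic image; on the 10 to 30 pixel, noisy images described in the article there is far less signal per frequency bin, which is consistent with phase correlation performing poorly as a baseline there.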

The research showed the error changes and performance characteristics after 20 rounds of search. In the initial benchmark round, the average Euclidean error of brute-force error minimization was 1.25, and the error of the phase correlation method with a Hanning window preprocessor was as high as 5.8. After the generate and evolve rounds, the registration error continued to decrease, reaching 0.8 and 0.43 successively, and the performance tended to be stable after the 9th round, as shown in the following figure.

Error changes and performance characteristics shown in 20 rounds of algorithm search

The best registration algorithm found follows a coarse-to-fine approach. The first step performs integer-pixel alignment using multi-scale normalized cross-correlation. The second step combines several preprocessing methods with spline interpolation and optimization algorithms to refine the estimate to sub-pixel accuracy. The third step adaptively weights and fuses multiple groups of estimates, coordinate by coordinate, to output a stable and reliable final offset.

On the holdout set, the best registration algorithm achieved an error of 0.12. Compared with brute-force error minimization, the stronger of the two baselines, the error was reduced by nearly 8 times. The researchers also compared the candidates discovered by CVEvolve with those discovered by OpenEvolve. After 500 iterations, OpenEvolve's error stabilized at 0.23, significantly higher than that of the candidate algorithm discovered by CVEvolve, as shown in the following table:

Comparison between CVEvolve candidates and other baselines

Bragg Peak Detection

This experiment sought an algorithm for detecting Bragg peaks in X-ray diffraction images, that is, a method that identifies and locates Bragg peaks within and around the annular regions corresponding to a given lattice plane. The evaluation metrics were F1 score, precision, and recall.
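As an illustration of how these three metrics can be computed for peak detection, the sketch below matches detected peaks to labeled peaks one-to-one within a distance tolerance. The greedy matching rule, the tolerance value, and the coordinates are assumptions; the article does not describe the paper's matching criterion.

```python
import math

def score_detections(detected, labeled, tolerance):
    """Greedy one-to-one matching, then precision, recall, and F1."""
    unmatched = list(labeled)
    true_pos = 0
    for peak in detected:
        # Nearest label that has not been matched yet.
        nearest = min(unmatched, key=lambda lab: math.dist(peak, lab), default=None)
        if nearest is not None and math.dist(peak, nearest) <= tolerance:
            unmatched.remove(nearest)
            true_pos += 1
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(labeled) if labeled else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical coordinates (row, column) in pixels.
labeled = [(10, 10), (20, 20), (30, 30), (40, 40)]
detected = [(10, 11), (20.5, 20), (50, 50)]

precision, recall, f1 = score_detections(detected, labeled, tolerance=2.0)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here two of the three detections match a label, so precision is 2/3; two of the four labels are found, so recall is 1/2; and F1, the harmonic mean of the two, is 4/7.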

Because the development dataset contained only one image, the search was prone to overfitting, so generalization had to be monitored on the holdout set. As shown in the following figure, the F1 score on the development image kept rising and eventually approached the maximum of 1, while the F1 score on the holdout set peaked around the 5th round and began to decline sharply after the 9th round.

Search record of Bragg peak detection based on the hotspot-detection workflow

The researchers then selected the best candidate from the 5th round. It first masks invalid regions. After background subtraction in arc polar coordinates and local noise normalization, it generates a signal-to-noise ratio map. It then finds peaks using several rounds of complementary algorithms. Finally, it merges and verifies the peaks and refines their center points to output the final peak coordinates.

The results showed that the best candidate reduced both false detections and missed detections, identifying more of the labeled peaks. It improved on the baseline in every metric: the F1 score rose from 0.298 to 0.788, precision rose from 0.237 to 0.839 (fewer false detections), and recall rose from 0.400 to 0.743 (fewer missed detections), as shown in the following figure.

Bragg