4K Super Resolution Agent Photo Retoucher: Revive Blurry Photos with One Click!

Agents capable of analysis and reflection can enhance the quality of various pictures to 4K.

Whether it's a time-worn old photo, a low-resolution AI-generated image, or a remote sensing or medical image, it can now be intelligently restored and enlarged to ultra-clear 4K resolution.

Improving image clarity has long been a classic problem in computer vision. Faced with complex degradations such as noise, blur, and compression damage, and with images from other domains such as AI-generated content, remote sensing, and biomedicine, traditional single models often struggle.

4KAgent, an AI agent-based method jointly proposed by researchers from Texas A&M University, Stanford University, Snap Inc., the University of Colorado Boulder, the University of Texas at Austin, the California Institute of Technology, Topaz Labs, and the University of California, Merced, intelligently restores and enlarges different types of images to 4K resolution according to different needs, delivering high perceptual quality. This work has been accepted by NeurIPS 2025.

Why are current image enlargement technologies insufficient?

Traditional image enlargement models usually perform well only on specific types of images. Once they encounter complex real-world blur, artifacts in AI-generated images, or specialized images from fields such as remote sensing and medicine, they fall short.

Enlarging an image to 4K resolution places extremely high demands on both detail reconstruction and texture authenticity.

From the perspective of most users, it would be ideal to have a general and controllable framework to meet the needs of improving the resolution of various images. 4KAgent was born out of these real challenges and demands.

Built on a multi-agent design, 4KAgent plans a tailored path to 4K resolution for each image.

How does 4KAgent work? Dissecting the three major modules

1. Intelligent "reading" of images to diagnose problems

The Perception Agent analyzes the image content and the degradation information in the image to provide an execution plan for the Restoration Agent.

First, the Image Analyzer calls various image quality assessment tools to evaluate the quality of the input image and obtains multiple perceptual quality indicators QI = (Q1, Q2,...) of the input image.

Then, Degradation Reasoning uses a vision-language model (VLM) to reason over the input image and the perceptual quality indicators QI, producing the degradation information DI and a preliminary restoration task list AI′. It also configures the upscaling factor: it calculates the factor s required to enlarge the image to 4K resolution and adds the corresponding super-resolution task to AI′, yielding the final restoration task list AI.
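
The upscaling-factor step can be illustrated with a minimal Python sketch. The article does not give the exact rule, so everything here is an assumption for illustration: "4K" is taken to mean a longer side of at least 3840 pixels, the factor is chosen from a fixed menu of scales, and the function and task names are hypothetical.

```python
# Assumptions (not from the article): "4K" means the longer side reaches
# at least 3840 pixels, and the factor comes from a fixed menu of scales.
TARGET_LONG_SIDE = 3840
CANDIDATE_SCALES = (2, 4, 8, 16)


def choose_upscale_factor(width, height,
                          target=TARGET_LONG_SIDE,
                          scales=CANDIDATE_SCALES):
    """Return the smallest scale that brings the longer side to the target.

    Returns 1 if the image is already large enough, and the largest
    available scale if even that one does not reach the target.
    """
    long_side = max(width, height)
    if long_side >= target:
        return 1
    for s in sorted(scales):
        if long_side * s >= target:
            return s
    return max(scales)


def finalize_task_list(preliminary_tasks, width, height):
    """Append a super-resolution task to the preliminary task list (AI' -> AI)."""
    s = choose_upscale_factor(width, height)
    tasks = list(preliminary_tasks)
    if s > 1:
        tasks.append(f"super_resolution_x{s}")  # hypothetical task name
    return tasks
```

Under these assumptions, a 256×256 input needs a 16x factor (8x would only reach 2048 pixels), while a 960×540 input needs 4x.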

Finally, Task Planning uses a large language model (LLM) or a vision-language model (VLM) to formulate a Restoration Plan PI for the input image from the information obtained in the previous steps. The plan specifies the execution order of the restoration tasks.

2. "Execute - Reflect - Rollback", continuously optimize through trial and error

The Restoration Agent uses the "execution–reflection–rollback" mechanism when executing each task in the restoration plan PI:

During the Execution phase, 4KAgent will execute the restoration tasks in PI sequentially. 4KAgent mainly supports nine different restoration tasks and has collected state-of-the-art models for the corresponding tasks to build a toolbox. 4KAgent calls different models in the toolbox to obtain multiple candidate restored images.

During the Reflection phase, the Restoration Agent evaluates the candidate restored images using the quality score QS and selects the one with the highest score as the output. QS combines no-reference image quality metrics (NIQE, MANIQA, MUSIQ, CLIPIQA) with the human preference score HPSv2. The overall process can be viewed as a quality-driven mixture-of-experts system (Q-MoE): multiple restoration experts first generate candidates from the input image, and the reflection module then selects the best result.
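
The selection step can be sketched as follows. The article names the metrics but not how they are combined, so the equal weighting and the normalization ranges below are assumptions for illustration only, and the metric values are expected to be computed elsewhere. The one property relied on is that NIQE is a lower-is-better metric, while the other four are higher-is-better.

```python
def quality_score(metrics):
    """Combine per-image metrics into one score (higher is better).

    `metrics` is a dict with keys 'niqe', 'maniqa', 'musiq', 'clipiqa',
    'hpsv2'. The equal weights and normalization are assumptions of this
    sketch, not the formula used by 4KAgent.
    """
    # NIQE is lower-is-better: clamp to an assumed [0, 10] range and flip it.
    niqe_term = 1.0 - min(max(metrics["niqe"], 0.0), 10.0) / 10.0
    # MUSIQ scores are roughly on a 0-100 scale: rescale to about [0, 1].
    musiq_term = metrics["musiq"] / 100.0
    return (niqe_term + metrics["maniqa"] + musiq_term
            + metrics["clipiqa"] + metrics["hpsv2"])


def select_best(candidates):
    """Return the name of the candidate whose metrics give the highest score.

    `candidates` maps a (hypothetical) expert name to its metrics dict.
    """
    return max(candidates, key=lambda name: quality_score(candidates[name]))
```

Each restoration expert contributes one candidate, and only the top-scoring one is passed on to the next task in the plan.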

When the quality score of the selected image is lower than the threshold η, the Rollback mechanism will be triggered: 4KAgent will generate context information and pass it to the Perception Agent to generate a new restoration plan PIadj and assign a new restoration task to the current step.
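
A minimal sketch of the execution, reflection, and rollback loop is shown below. It is an illustration rather than the authors' implementation: the tools, the scoring function, and the `replan` callback are all supplied by the caller, and the `max_rollbacks` budget (after which the best available candidate is accepted) is an assumption of this sketch.

```python
def run_plan(image, plan, toolbox, score, eta, replan, max_rollbacks=1):
    """Run a restoration plan with execute / reflect / rollback.

    plan:    list of task names, in execution order
    toolbox: dict mapping a task name to a list of callables (image -> image)
    score:   callable (image -> float), higher is better
    eta:     quality threshold that triggers a rollback
    replan:  callable (failed_task, remaining_tasks) -> adjusted task list
    """
    queue = list(plan)
    history = []
    rollbacks = 0
    while queue:
        task = queue.pop(0)
        # Execution: every tool for this task produces one candidate.
        candidates = [tool(image) for tool in toolbox[task]]
        # Reflection: keep the highest-scoring candidate.
        best = max(candidates, key=score)
        # Rollback: discard the result and ask for an adjusted plan.
        if score(best) < eta and rollbacks < max_rollbacks:
            rollbacks += 1
            queue = replan(task, queue)
            continue
        image = best
        history.append(task)
    return image, history
```

For example, if a denoising step scores below η, `replan` might substitute a deblurring task at the current step and keep the remaining tasks unchanged.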

In addition, 4KAgent integrates a Face Restoration Pipeline that detects and crops the faces in the input image. For each face, 4KAgent applies different face restoration methods to obtain multiple candidates, selects the one with the highest quality according to the face quality score Qsf, and pastes it back into the image.
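
The crop, restore, select, and paste-back flow can be sketched in plain Python. To keep the example self-contained, images are nested lists of grayscale values, face boxes are passed in rather than detected, and the restorers and `face_score` are hypothetical callables; restorers are assumed to return a patch of the same size as their input.

```python
def crop(img, box):
    """Cut the (x, y, w, h) region out of a 2D list image."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


def paste(img, patch, box):
    """Return a copy of img with patch written into the (x, y, w, h) region."""
    x, y, w, h = box
    out = [row[:] for row in img]
    for dy in range(h):
        out[y + dy][x:x + w] = patch[dy]
    return out


def restore_faces(img, boxes, restorers, face_score):
    """For each face box, keep the best-scoring restoration and paste it back."""
    out = img
    for box in boxes:
        face = crop(out, box)
        candidates = [restore(face) for restore in restorers]
        best = max(candidates, key=face_score)
        out = paste(out, best, box)
    return out
```

The original image is left untouched; each paste works on a copy, so a failed face restoration could be skipped without side effects.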

4KAgent also sets up a Fast4K mode to control its running time. Specifically, when the image size exceeds the preset threshold St, 4KAgent will remove methods with longer inference times from the toolbox to accelerate the inference.
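
The Fast4K idea amounts to pruning the toolbox before inference. In the sketch below, "image size" is taken to be the longer side, and each tool carries an estimated runtime; both are assumptions for illustration, as are the tool names used in the example.

```python
def select_tools(toolbox, width, height, size_threshold, max_seconds):
    """Drop slow tools when the input is larger than the size threshold.

    toolbox: list of (tool_name, estimated_seconds) pairs
    Returns the names of the tools that remain available.
    """
    if max(width, height) <= size_threshold:
        return [name for name, _ in toolbox]
    return [name for name, seconds in toolbox if seconds <= max_seconds]


# Hypothetical toolbox with rough per-image runtimes.
tools = [("fast_sr", 2.0), ("diffusion_sr", 60.0)]
small = select_tools(tools, 512, 512, size_threshold=1024, max_seconds=10.0)
large = select_tools(tools, 2048, 1024, size_threshold=1024, max_seconds=10.0)
```

Here `small` keeps both tools, while `large` keeps only the fast one.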

3. Flexible configuration to adapt to various scenarios

To handle different image restoration scenarios, 4KAgent includes a Profile Module that exposes configurable usage preferences (e.g., whether to prioritize perceptual quality or fidelity, and whether to activate the face restoration module), allowing 4KAgent to adapt to each scenario without additional training.
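
A profile of this kind might be represented as a small immutable configuration object. The field names and defaults below are hypothetical and chosen only to mirror the preferences mentioned above; they are not the actual 4KAgent configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical usage preferences for one restoration scenario."""
    name: str
    prefer: str = "perception"   # "perception" or "fidelity"
    restore_faces: bool = True   # run the face restoration pipeline
    fast4k: bool = False         # prune slow tools for large inputs

    def __post_init__(self):
        # Reject unknown preferences early instead of failing mid-pipeline.
        if self.prefer not in ("perception", "fidelity"):
            raise ValueError(f"unknown preference: {self.prefer!r}")


# Two example profiles: one for portraits, one for scientific imagery.
old_photo = Profile(name="old_photo")
remote_sensing = Profile(name="remote_sensing", prefer="fidelity",
                         restore_faces=False)
```

Because the preferences live in configuration rather than in model weights, switching scenarios only means selecting a different profile.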

Overall, 4KAgent divides the work of "analysis and decision-making" and "execution and reflection" among different agents and flexibly adapts to different restoration needs through the configuration module, achieving a general 4K super-resolution ability.

Actual test results

4KAgent has been extensively tested on 26 benchmark test sets for 11 different image super-resolution tasks, including classical image super-resolution, real-world image super-resolution, multi-degradation image restoration, large-scale image super-resolution (16x), and other image super-resolution tasks in fields such as AIGC images, remote sensing images, and biomedical images.

In the classical image super-resolution task (Classical Image SR) and the real-world image super-resolution task (Real-World Image SR), the images generated by 4KAgent show richer and more accurate details, such as the fine stripes on tree bark, the structure of antlers, the texture of a down jacket, and the legibility of numbers.

In the challenging 16x enlargement task, 4KAgent generates highly detailed, realistic textures, such as those of rocks and grass, as well as the hair, eyebrows, and eye details in face pictures.

In addition, the researchers built the DIV4K-50 test set (50 high-quality images with a resolution of 4096×4096, downsampled to 256×256 with complex degradations added) to test restoration and super-resolution from 256×256 to 4096×4096, a 16x enlargement per side. In this scenario, 4KAgent consistently reconstructs finer and more natural details, such as facial details and hair textures.

A 4K super-resolution "AI photo retoucher" that can handle all scenarios

4KAgent is a controllable, general-purpose AI agent system for image restoration and 4K super-resolution that aims to enhance various types of images to 4K resolution. It improves restoration quality across multiple domains, covering natural scenes, portraits, AI-generated content, and specialized scientific modalities such as remote sensing, microscopy, and medical imaging. In comprehensive evaluations on standard benchmarks and dedicated datasets, 4KAgent shows strong restoration performance across these scenarios without domain-specific retraining, demonstrating its generalization ability and its practical value for deployment in consumer, commercial, and scientific research applications.

Project homepage: https://4kagent.github.io/

Code download: https://github.com/taco-group/4KAgent

Article link: https://arxiv.org/pdf/2507.07105

DIV4K-50 Dataset: https://huggingface.co/datasets/YSZuo/DIV4K-50

Authors and research institutions:

First author: Yushen Zuo, research intern at Texas A&M University

Corresponding author: Zhengzhong Tu, assistant professor at Texas A&M University

Research institutions: Texas A&M University, Stanford University, Snap Inc., the University of Colorado Boulder, the University of Texas at Austin, the California Institute of Technology, Topaz Labs, the University of California, Merced

This article is from the WeChat official account "QbitAI", author: 4KAgent, published by 36Kr with authorization.

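The views expressed in this article are solely those of the author. The 36Kr platform only provides information storage services.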