
Artificial intelligence cracks the Physics Olympiad: the team led by Mengdi Wang has developed an agent called Physics Supernova that outperforms the average human gold medalist.

Academic Headlines · 2025-09-16 16:17
The gold medal is not the end.

In academic competitions, physics has long been regarded as one of the hardest domains for artificial intelligence (AI) to conquer, owing to its complex problems and demanding reasoning. Compared with language-centric tasks, physics problems typically involve multiple steps such as image interpretation, unit conversion, formula derivation, and approximate calculation, and therefore test whether a system can understand and model the real world.

As AI increasingly integrates into the real world and progresses toward artificial general intelligence (AGI) and even artificial superintelligence (ASI), the ability to understand the world through physical abstraction and to solve problems grounded in it is becoming key to building high-level intelligent systems.

At the 2025 International Physics Olympiad (IPhO 2025), an AI system named Physics Supernova achieved remarkable results: on the three theory problems it scored a combined 23.5 points (out of 30), ranking 14th among all 406 contestants. It also placed in the top 10% of human contestants on each of the three problems, exceeding the average score of human gold medalists.

The system was developed by the team of Professor Mengdi Wang at Princeton University together with collaborators. The two first authors are Dr. Jiahao Qiu of Princeton University and Jingzhe Shi, a senior undergraduate in Tsinghua University's Yao Class who won a gold medal at the 2021 International Physics Olympiad, placing tenth worldwide.

Paper link: https://arxiv.org/abs/2509.01659

Unlike traditional approaches that rely on question banks, Physics Supernova combines the reasoning ability of large language models (LLMs) with tool modules such as image analysis and answer review to carry out the entire pipeline from question understanding to modeling and calculation. The result shows that an agent architecture with sensibly integrated tools can significantly improve AI's reasoning and problem-solving on complex scientific problems. Its performance approaches that of top human contestants, opening new possibilities for AI in scientific exploration.

Industry experts note that this achievement not only demonstrates a breakthrough in AI's ability to solve physics problems but also signals that the boundaries of AI's application in scientific reasoning are being redefined.

With tools, AI can solve problems like a physicist

Physics Supernova is an AI agent system designed specifically to solve complex theoretical physics problems. It is built on the smolagents framework and adopts the CodeAgent architecture.

Unlike the fixed, hand-coded workflows common in mathematical problem solving, the system emphasizes flexible, autonomous planning and can dynamically call different tools according to its current progress on a problem.
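To make the architecture concrete, here is a minimal sketch in Python using the open-source smolagents library. The tool names mirror the paper's ImageAnalyzer and AnswerReviewer, but the tool bodies, prompts, and model choice below are illustrative placeholders, not the authors' implementation.

```python
from smolagents import CodeAgent, InferenceClientModel, tool

@tool
def image_analyzer(image_path: str, question: str) -> str:
    """Read numerical values or geometry off a figure.

    Args:
        image_path: Path to a high-resolution image of the figure.
        question: What to measure or read from the image.
    """
    # In the real system this forwards the image to a specialized
    # vision-language model; stubbed here for illustration.
    return f"[VLM reading of {image_path} for: {question}]"

@tool
def answer_reviewer(problem: str, solution: str) -> str:
    """Check a candidate solution for physical and mathematical errors.

    Args:
        problem: The original problem statement.
        solution: The agent's current derivation and result.
    """
    # In the real system a second model classifies error types and
    # flags the specific expressions that look wrong; stubbed here.
    return f"[review of solution to: {problem[:40]}...]"

# The CodeAgent plans in generated Python code, deciding at each step
# which tool to call based on its current progress.
agent = CodeAgent(
    tools=[image_analyzer, answer_reviewer],
    model=InferenceClientModel(),  # any LLM backend smolagents supports
)
agent.run("Theory problem 1: estimate the terminal velocity from the figure ...")
```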

Figure | Architecture and example reasoning trajectory of Physics Supernova

The research team equipped this system with two specialized tools for physics problems: ImageAnalyzer and AnswerReviewer.

For physicists, interpreting experimental results and extracting key data from images are essential skills; in some Physics Olympiad problems this is the core of the solution. Current LLMs, however, still fall short at accurately measuring visual data in charts, photographs, and schematics. ImageAnalyzer passes high-resolution images to a specialized vision-language model to perform precise numerical reading and measurement.

When solving problems, physicists also continually check whether their theoretical results make physical sense, for example whether a result has the expected physical properties or violates basic physical principles. AnswerReviewer identifies the type of error and locates the incorrect expression in the solution process, improving the system's ability to self-correct.
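One concrete check a reviewer of this kind can run is dimensional analysis. The snippet below is a sketch of our own, not code from the paper, using sympy's unit system to confirm that a derived quantity really carries the dimensions of energy:

```python
from sympy.physics.units import kilogram, meter, second, joule, convert_to
from sympy.physics.units.systems.si import SI

m = 2 * kilogram        # mass
v = 3 * meter / second  # speed
E = m * v**2 / 2        # candidate expression for kinetic energy

# Inspect the raw dimensions of the expression.
print(SI.get_dimensional_expr(E))  # length**2 * mass / time**2

# convert_to yields a clean result in joules only if E truly has energy
# dimensions; a dimensionally wrong expression would come back
# unconverted, flagging a likely derivation error.
print(convert_to(E, joule))  # 9*joule
```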

To study how the tools affect the final score, the research team tested multiple tool combinations. The results show that on most problems, especially the non-trivial ones, removing AnswerReviewer causes a marked drop in performance, while delegating image-processing tasks to ImageAnalyzer effectively improves the overall score.

Figure | Impact of the ImageAnalyzer tool on part C of theory problem 1

In addition, the team connected Physics Supernova to a domain-knowledge Q&A tool: WolframAlpha, a computational knowledge engine that returns precise answers to scientific queries, which helps the system handle questions requiring specialized knowledge.
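Such a tool can be wired up with the official wolframalpha Python client; a minimal sketch, assuming you have a WolframAlpha App ID (the YOUR_APP_ID placeholder below):

```python
import wolframalpha

# Requires an App ID from the WolframAlpha developer portal.
client = wolframalpha.Client("YOUR_APP_ID")

# Query a physical constant and read back the primary result pod.
res = client.query("electron mass in kg")
print(next(res.results).text)  # e.g. "9.109e-31 kg" (format may vary)
```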

The gold medal is not the end: the next stop for AI physics systems

Experiments are the foundation of physics research. The team notes that this study covered only the theory problems of IPhO 2025, not the instrument-based experimental problems, partly because of limited access to the experimental apparatus.

They expect that, as robotics matures, future LLM-based AI agents will be able to tackle experimental problems as well. Compared with operating physical instruments, programmed experiments can simulate more complex and advanced experimental procedures, and program-based experimental exams could shift the focus of evaluation from instrument-handling skill to the understanding and application of physics.
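To illustrate what a program-based experiment might look like (a toy example of our own devising, not from the paper): the agent is handed a noisy simulator instead of apparatus and must recover a physical constant from it, here g from simulated pendulum timings.

```python
import numpy as np

rng = np.random.default_rng(0)
g_true = 9.81

# Simulated experiment: small-oscillation periods T = 2*pi*sqrt(L/g)
# measured at several pendulum lengths, with 1% timing noise.
lengths = np.linspace(0.2, 1.0, 9)
periods = 2 * np.pi * np.sqrt(lengths / g_true) * (1 + 0.01 * rng.standard_normal(9))

# Fit T^2 = (4*pi^2/g) * L and recover g from the slope.
slope = np.polyfit(lengths, periods**2, 1)[0]
g_est = 4 * np.pi**2 / slope
print(f"estimated g = {g_est:.2f} m/s^2")  # close to 9.81
```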

In the long run, instrument-based experimental evaluation remains indispensable: such experiments are closer to real research scenarios and better measure an AI system's robotic capabilities and its performance under extreme or unexpected conditions.

The team also used the answer-review tool to verify derivations, but that tool operates entirely in natural language. In mathematics, automated verification has made significant progress, and LLMs can generate machine-checkable proofs in the Lean format. For physics, however, there is as yet no reliable path from a natural-language problem statement to automatically verified formula derivations; this remains an area requiring in-depth research.
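For contrast, this is what a machine-checkable statement looks like in Lean 4 with Mathlib; the toy free-fall identity below is our own illustration of the format, not an example from the paper.

```lean
import Mathlib

-- If v² = 2gh (free fall from height h), the kinetic energy equals
-- the potential energy released: ½mv² = mgh.
theorem kinetic_eq_potential (m g h v : ℝ) (hv : v ^ 2 = 2 * g * h) :
    (1 / 2) * m * v ^ 2 = m * g * h := by
  rw [hv]; ring
```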

The research team suggests several directions worth exploring: methods to verify the abstract translation among formulas, physical expressions, and intuitive reasoning; a more rigorous, verifiable system for physical computation; and answer-review tools backed by broader and deeper physics knowledge.

Overall, the team recommends that future AI physics problem-solving systems continue to expand toward programmed and instrument-based experiments while strengthening their ability to produce verifiable, reliable physics answers.

Looking ahead, such systems could develop into advanced intelligent agents embedded in the real world that carry out complex physics tasks.

This article is from the WeChat official account “Academic Headlines” (ID: SciTouTiao), compiled by Xiaoyang, and published by 36Kr with authorization.