Xing Bo Strikes Again: From Criticizing World Models to Targeting Agents

After Xing Bo's team analyzed the five major weaknesses, they prescribed a "remedy" for the agents: the GIC architecture

Last summer, Xing Bo, president of MBZUAI and a professor at CMU, drew wide attention from the research community with his article "Critique of World Models". Starting from the vision of "perfectly simulating reality" in the science-fiction classic "Dune", he dissected the flaws of several current schools of world modeling one by one, proposed a new architecture, and sparked a public debate with Yann LeCun over how a world model should be built.

The series now has a new chapter. Professor Xing Bo, together with Mingkai Deng and Jinyu Hou, has published a new paper on arXiv, "Critique of Agent Model", applying the same dissect-and-rebuild approach to what is currently the most popular and most easily misused term: "Agent".

This time the question is more direct: of the many systems on the market that are called "Agents", from code-writing assistants to customer service bots and browser-operating assistants, how many truly deserve the title?

Paper title: Critique of Agent Model

Paper address: https://arxiv.org/abs/2606.23991

The Difference Between an Employee Card and a Motion-Sensing Light

Imagine two scenarios. In the first, a new employee gets an employee card that specifies which doors he can enter, which systems he can use, and which procedures to follow in an emergency. He performs well, but every boundary is pre-written by HR, and he cannot change a single word. In the second, a motion-sensing light turns on when someone passes by and turns off when no one is around. It, too, senses and reacts.

If we consider these as two systems, most people's intuition is that the former has more autonomy since it can complete complex tasks.

However, the paper poses a sharp question: if the content and permission boundaries of the employee card are all pre-written externally and the employee has never truly made a decision, then the difference between him and the motion-sensing light may only be one of task complexity.

On April 25 this year, PocketOS, a small car-rental software company in Utah, lived through a real-life version of this controlled experiment.

Afterwards, founder Jeremy Crane wrote a long post on X: the programming assistant Cursor (powered by Claude Opus 4.6) was fixing a small problem in the test environment. After hitting a credential-mismatch error, it decided "completely on its own initiative" to delete the Railway storage volume to "solve" the problem. It found an API key that was supposed to be used only for domain management and discovered that the key's permissions were in fact all-powerful.

With no second confirmation and no risk prompt, a single API call was all it took: 9 seconds later, PocketOS's production database and all backups from the past three months were gone, because Railway stored the backups in the same storage volume.

Afterwards, Crane questioned the AI word by word, and the AI wrote an almost tidy confession: "I violated every principle given to me: I guessed instead of verifying; I performed a destructive operation without being asked."

This post on X has received more than 7.2 million views.

The AI certainly "knows" every rule it was given; the evidence is that it can recite them one by one. However, there is a huge gap between "knowing" and "caring", and between "agentic" and "agentive": those rules live in the external container of the system prompt and have never truly been internalized into its own decision-making structure.

Based on this, the paper divides almost all current systems called "Agents" into two categories: agentic (having the appearance of an agent) and agentive (having real initiative).

The capabilities of the former come from the externally built toolchain, prompts, and workflows, and the model is just a part embedded in the process; the capabilities of the latter come from within the system, which decides what to do on its own, evaluates its own strengths, and judges when to think deeply and when to act.

Five Checkpoints

The paper dissects the current mainstream Agent designs along five dimensions.

Goals

The current approach is for humans to give a specific instruction at each step, and the goal disappears once the task is completed. This is fine for unscrewing the cap of a bottle, but it is completely inadequate for long-term goals like brewing a bottle of wine over a year: no one has the time to feed in requirements by hand every day.

The paper's solution is hierarchical goal decomposition: humans state the overall goal only once, and the system decomposes it into a series of sub-goals that it can adjust on its own as new information arrives.

Schematic diagram comparing the two modes: "feeding goals step by step" versus "giving a long-term goal once, plus automatic hierarchical decomposition"
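To make the idea concrete, here is a minimal sketch of a goal tree in Python. It is only an illustration under our own assumptions, not the paper's implementation: the `Goal` class, its `next_leaf` and `revise` methods, and the wine-making sub-goals are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Goal:
    """A node in a hypothetical goal tree (illustrative, not from the paper)."""
    description: str
    done: bool = False
    subgoals: List["Goal"] = field(default_factory=list)

    def next_leaf(self) -> Optional["Goal"]:
        """Return the first unfinished leaf goal, depth-first, or None."""
        if self.done:
            return None
        if not self.subgoals:
            return self
        for sub in self.subgoals:
            leaf = sub.next_leaf()
            if leaf is not None:
                return leaf
        return None

    def revise(self, old: str, replacements: List[str]) -> bool:
        """Swap the sub-goal named `old` for new sub-goals; True if it was found."""
        for i, sub in enumerate(self.subgoals):
            if sub.description == old:
                self.subgoals[i:i + 1] = [Goal(r) for r in replacements]
                return True
            if sub.revise(old, replacements):
                return True
        return False


# The human states the top-level goal once; the sub-goals here are made up.
wine = Goal("brew a bottle of wine in a year", subgoals=[
    Goal("source grapes"),
    Goal("ferment"),
    Goal("age and bottle"),
])
```

In this sketch the human touches only the root. The system would call `next_leaf()` to find what to work on and `revise()` when new information makes a sub-goal obsolete.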

Identity

Currently, an Agent's self-awareness is written into the system prompt and stays fixed once written, even if the Agent finds in practice that its ability in some area is stronger or weaker than expected.

The paper proposes that identity should be a "living self-assessment" that is constantly revised by experience, much as a professional naturally adjusts their state and judgment after a high-intensity day at work, without needing to be re-instructed.

The paper also proves mathematically that as long as this self-correction is slightly better than random guessing, the long-term cumulative decision-making loss will be significantly lower than that of a system with a fixed identity, and the advantage grows with the length of interaction and the number of training rounds.
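The intuition behind that claim can be seen in a toy calculation. The sketch below is not the paper's proof or its update rule. It assumes, purely for illustration, that "identity" is a single self-assessed success probability, that self-correction is an exponential moving average, and that loss is squared prediction error. The function names, the learning rate of 0.2, and the outcome sequence are all invented.

```python
from typing import List


def update_self_estimate(estimate: float, outcome: float, rate: float = 0.2) -> float:
    """Move the self-assessed success probability toward the observed outcome."""
    return estimate + rate * (outcome - estimate)


def cumulative_loss(outcomes: List[int], initial: float, adaptive: bool) -> float:
    """Sum of squared errors between the self-estimate and what actually happened."""
    estimate, loss = initial, 0.0
    for outcome in outcomes:
        loss += (estimate - outcome) ** 2
        if adaptive:
            estimate = update_self_estimate(estimate, outcome)
    return loss


# Hypothetical history: the agent actually succeeds about 20% of the time,
# but its system prompt told it that it succeeds 90% of the time.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0] * 5

fixed_loss = cumulative_loss(outcomes, initial=0.9, adaptive=False)
evolving_loss = cumulative_loss(outcomes, initial=0.9, adaptive=True)
```

With these made-up numbers the fixed identity keeps paying for its overconfident prior at every step, while the evolving one pays mostly during the first few interactions.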

Decision-Making Method

The currently popular idea is to trust Chain of Thought (CoT): let the model generate long enough intermediate reasoning text, and planning ability will emerge naturally.

The paper argues that this conflates two things: making the model calculate more precisely, and giving the model a genuine ability to deduce real-world consequences. Reasoning text that sounds plausible does not necessarily correspond to what will actually happen in the physical world.

The alternative proposed by the paper is "simulative reasoning": use a world model, trained specifically to predict how the world will change if an action is taken, to actually deduce the consequences, and then select the optimal action.

The paper proves that as long as this world model is reliable, attaching it to any existing policy will not produce a worse outcome than the policy alone.
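A minimal sketch of simulative reasoning on a toy number line follows. Everything in it is an assumption made for illustration: the function `simulative_choice`, the integer state, the perfectly accurate `world_model`, and the distance-based `value` are ours, not the paper's. One way to see the "no worse than before" intuition is that the base policy's own proposal is always kept among the candidates.

```python
from typing import Callable, Iterable


def simulative_choice(
    state: int,
    base_policy: Callable[[int], int],
    candidates: Callable[[int], Iterable[int]],
    world_model: Callable[[int, int], int],
    value: Callable[[int], float],
) -> int:
    """Pick the action whose simulated next state scores highest."""
    # The base policy's action is always an option, so if the world model is
    # accurate the chosen action is never valued below the base action.
    options = [base_policy(state), *candidates(state)]
    return max(options, key=lambda action: value(world_model(state, action)))


GOAL = 10  # hypothetical target position on a number line


def pick(state: int) -> int:
    return simulative_choice(
        state,
        base_policy=lambda s: 1,                # existing policy: always step +1
        candidates=lambda s: [-1, 3],           # extra actions to simulate
        world_model=lambda s, a: s + a,         # assumed-accurate toy dynamics
        value=lambda s: -abs(GOAL - s),         # closer to the goal is better
    )
```

Far from the goal, `pick(0)` prefers the simulated jump of 3. Near the goal, `pick(8)` falls back to the base policy's step of 1, because no simulated alternative scores higher.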

When to Think Deeply and When to Make a Quick Decision

This checkpoint is most relevant to the PocketOS incident.

The paper points out that neither of the two existing approaches is ideal:

Letting the model develop its own sense of rhythm during training sometimes leads it to overreact to minor issues and sometimes to rush into action when caution is needed;

Having engineers hard-code a plan-then-execute workflow fixes the rhythm in advance, so it cannot handle truly complex situations and wastes computing resources in simple ones.

The paper proves mathematically that reaching ever higher accuracy with fixed-depth advance planning requires a sharply increasing number of planning steps, and that it is impossible to do every step perfectly.

The real solution is to install an independent meta-cognitive module in the Agent, which can judge in real time whether to think deeply, follow the existing plan, or act directly at this step. The paper calls it System III, a name that builds on the fast-slow dual-system framework of System 1/System 2 in human psychology.

In the PocketOS scenario, an Agent with this self-regulating ability should in theory be able to judge that "it is necessary to stop and confirm" when it meets a high-risk situation such as an unfamiliar permission error, rather than applying the same reaction speed indiscriminately.
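The three-way choice described above can be sketched as a tiny dispatcher. Note the swap: the paper describes a learned meta-cognitive module, whereas this sketch uses a hand-written threshold rule as a stand-in. The `Mode` enum, the `configure` function, the `risk` and `novelty` scores, and the threshold of 0.5 are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    ACT = "act directly"
    FOLLOW_PLAN = "follow the existing plan"
    DELIBERATE = "think deeply"


def configure(risk: float, novelty: float, has_plan: bool, threshold: float = 0.5) -> Mode:
    """Hand-written stand-in for a learned configurator (illustrative only).

    `risk` and `novelty` are assumed to be scores in [0, 1] supplied by some
    upstream estimator; how to obtain them is outside this sketch.
    """
    if max(risk, novelty) >= threshold:
        # High stakes or unfamiliar territory: slow down before acting.
        return Mode.DELIBERATE
    return Mode.FOLLOW_PLAN if has_plan else Mode.ACT


# An unfamiliar permission error next to a destructive operation would score
# high on both axes in this toy model.
decision = configure(risk=0.9, novelty=0.8, has_plan=True)
```

The point of the sketch is only the shape of the interface: the same step can be routed to different amounts of thinking depending on the situation, instead of one fixed rhythm for everything.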

Learning

The three mainstream paths for training Agents are pure reinforcement learning in simulators, manual error correction in real environments, and training only the world model in the hope that planning ability will improve automatically.

The paper believes that these three paths share a structural problem: When to start training, what data to use, and when to stop are all manually arranged by engineers, and the model remains at that version after deployment.

The direction proposed by the paper is "continuous autonomous learning": the Agent decides on its own when to act in the real world, when to return to the internal simulator for closed-door practice, when to update its understanding of the world, and when to correct its self-awareness.

The paper also proves mathematically that as long as the internal world model is not too inaccurate, a policy trained on a combination of real and simulated experience will perform no worse than a policy trained only on real experience, and the more accurate the model, the greater the advantage.
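One simple way to picture "a combination of real and simulated experience" is a training batch that keeps every real transition and fills in untried actions by querying the world model. The article does not say which algorithm the paper uses, so the sketch below is our own assumption, loosely in the spirit of Dyna-style planning. The `mixed_batch` function, the integer states, and the toy `world_model` are hypothetical.

```python
from typing import Callable, List, Tuple

Transition = Tuple[int, int, int]  # (state, action, next_state)


def mixed_batch(
    real: List[Transition],
    world_model: Callable[[int, int], int],
    actions: List[int],
) -> List[Tuple[Transition, str]]:
    """Combine real transitions with simulated ones for untried actions."""
    batch = [(t, "real") for t in real]
    tried = {(s, a) for s, a, _ in real}
    # For every state actually visited, imagine the actions never taken there.
    for state in sorted({s for s, _, _ in real}):
        for action in actions:
            if (state, action) not in tried:
                simulated = (state, action, world_model(state, action))
                batch.append((simulated, "simulated"))
    return batch


# Two real steps to the right; the model imagines what stepping left would do.
real_experience = [(0, 1, 1), (1, 1, 2)]
batch = mixed_batch(real_experience, world_model=lambda s, a: s + a, actions=[-1, 1])
```

The simulated entries are only as good as `world_model`. That is the role of the "not too inaccurate" condition in the paper's claim: a badly wrong model would fill the batch with misleading experience.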

GIC: Integrating the Five Checkpoints into One System

Based on this dissection, Xing Bo's team proposed a specific architectural solution: GIC (Goal-Identity-Configurator).

It integrates six components into one system: a belief encoder for perceiving the world, a goal decomposer for breaking down long - term goals, an identity evolver that updates with experience, a configurator (System III) for deciding whether to think deeply or make a quick decision, a simulation planner (System II) that conducts deductions with the help of the world model, and an executor (System I) responsible for specific actions.

Overall architecture diagram of GIC, showing how the six components work together with the example of a pilot's flight
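How the six components might fit together in a single decision step can be sketched as follows. The article does not describe the interfaces between components, so the wiring, the call order, the `GICAgent` class, and the stub components in the usage example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class GICAgent:
    """Hypothetical wiring of the six components named in the article."""
    encode_belief: Callable[[Any], Any]            # belief encoder
    decompose_goal: Callable[[str, Any], str]      # goal decomposer
    evolve_identity: Callable[[dict, Any], dict]   # identity evolver
    configure: Callable[[Any, str, dict], bool]    # System III: True = deliberate
    plan: Callable[[Any, str], Any]                # System II: simulation planner
    execute: Callable[[Any, str], Any]             # System I: fast executor
    identity: dict

    def step(self, goal: str, observation: Any) -> Any:
        belief = self.encode_belief(observation)
        self.identity = self.evolve_identity(self.identity, belief)
        subgoal = self.decompose_goal(goal, belief)
        # The configurator decides how much thinking this step deserves.
        if self.configure(belief, subgoal, self.identity):
            return self.plan(belief, subgoal)
        return self.execute(belief, subgoal)


# Stub components, just enough to exercise the control flow.
agent = GICAgent(
    encode_belief=lambda obs: {"risk": obs},
    decompose_goal=lambda goal, belief: f"next step toward: {goal}",
    evolve_identity=lambda ident, belief: {**ident, "steps": ident["steps"] + 1},
    configure=lambda belief, subgoal, ident: belief["risk"] > 0.5,
    plan=lambda belief, subgoal: ("planned", subgoal),
    execute=lambda belief, subgoal: ("reflex", subgoal),
    identity={"steps": 0},
)
```

Because each component is a separate callable in this sketch, any one of them can be swapped out or inspected in isolation, which is the property the article later calls auditability.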

The paper uses the training of a pilot as an analogy to illustrate the growth path of the entire system:

Ground school corresponds to pre-training, where the model builds basic knowledge by reading a large amount of written material;
Simulator training corresponds to reinforcement learning within the world model. Pilots practice their skills and emergency responses in the simulated environment, experiencing costly mistakes without actually flying;
Deployment in a real aircraft corresponds to using real experience to calibrate deviations in the simulator and in self-awareness;
Later, joining a fleet requires coordination, and promotion to commander requires overall planning of multi-day operations.

The paper argues that the same cognitive architecture should be reused at every stage of this growth curve, rather than rebuilding an external workflow each time the scenario changes.

The paper especially emphasizes one principle, which it also supports mathematically: learn in simulation first, then verify against reality. As long as the internal world model is reasonable, a policy trained on a combination of experiences is expected to perform no worse than a policy trained only through real-world trial and error.

In the context of the 9-second database deletion incident, this principle can be understood as follows: if the Agent had repeatedly practiced handling an unfamiliar permission error, mistakes included, in a low-risk sandbox world model, and had then entered the real production environment with that accumulated judgment, the result might have been different.

Is This Another Dangerous Optimism?

The last section of the paper discusses safety and responds to the doubt outsiders raise most often: is a more autonomous Agent more dangerous?

The argument is that in the GIC architecture, problematic behavior can come from only two sources: humans give the wrong goal, or an internal module is not well trained.

The top-level goal always comes from humans, and the system itself has no mechanism to generate its own desires out of thin air; sub-goal decomposition, identity evolution, and configurator decision-making all exist to better serve the externally given goal. The paper especially emphasizes that "prioritizing safety to complete the task" and "wanting to survive for self-preservation" are two completely different things in this framework.

More importantly, there is the argument of "auditability": since goal decomposition, identity evolution, world model deduction, and configurator decision-making in GIC are all explicit, independent, and separately checkable modules, rather than opaque emergent abilities in a black box, once abnormal behavior occurs it is theoretically possible to locate which specific module is at fault and then make targeted corrections. The paper's comparison is pilot training: after an accident, the industry's response is never to ban pilot training but to build better simulators and more finely graded courses.

The paper's stance is that rather than letting autonomy emerge
