
Being smart is no longer enough.

AI Deep Researcher · 2026-03-30 08:29
When answers become cheap, those who define evaluation criteria and build verification environments will command higher commercial premiums.

In the era of AI, the core value of models has shifted from mere "intelligence" to the "verifiability of right and wrong." The author points out that AI's breakthroughs in code and mathematics do not stem from the models becoming smarter, but from the fact that these environments provide a clear feedback loop, enabling the system to evolve autonomously through large-scale trial and error. In contrast, subjective fields such as creativity and writing have progressed slowly for lack of unified standards, devaluing traditional logical and expressive abilities. In the future, the scarcest competitive edge will no longer be a store of knowledge, but the ability to design systems that transform vague tasks into verifiable metrics. In short, when answers become cheap, those who define evaluation criteria and build verification environments will command higher commercial premiums.

In the past two or three years, the rapid development of AI has created an illusion that it is becoming smarter.

However, the truth may be the opposite. François Chollet, the father of Keras and a Google AI researcher, pointed out in a recent conversation that the most commercially valuable AI has not actually become smarter; it has just entered an environment where right and wrong can be verified.

Once right and wrong can be verified, AI can run trial and error automatically and at scale. This is why coding tools have moved quickly toward commercial delivery, while fields such as writing and creative work, where right and wrong are hard to determine, have progressed slowly.

This two-speed technological evolution is redefining what kind of "intelligence" is still valuable.

Facing the new technological coordinate system, if you still measure yourself by the old standards, you may have taken the wrong position.

Section 1 | AI is not smarter, but suddenly more useful

Code agents have suddenly become very useful, and the code they write can even be directly delivered. Mathematical proof assistants are also quickly catching up and performing more and more stably.

Many people attribute this to the models becoming smarter.

However, upon closer inspection, the key change does not lie in a leap in the models' intelligence itself, but rather in their entry into a closed loop where they can "get things done."

Take code as an example. Whether a piece of code is written correctly can be verified directly: does it run, does it throw errors, do the test cases pass? This means AI does not need to guess, nor does it need humans to correct it line by line. It can run the code repeatedly, check the results, and keep modifying. Each attempt leaves effective feedback, and once that feedback accumulates rapidly, the system's performance improves dramatically.
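The run–check–modify loop described above can be sketched in a few lines. Everything here is an illustrative stand-in, not any real agent's API: `candidates` plays the role of the model's successive attempts, and the test cases are the verification environment.

```python
def run_tests(code: str, tests) -> tuple[bool, str]:
    """Execute a candidate program against test cases; return (passed, feedback)."""
    namespace = {}
    try:
        exec(code, namespace)  # run the candidate code
        for args, expected in tests:
            got = namespace["solve"](*args)
            if got != expected:
                return False, f"solve{args} returned {got}, expected {expected}"
        return True, "all tests passed"
    except Exception as e:
        return False, f"error: {e}"

def refine_loop(candidates, tests):
    """Try candidate programs in order; each failure yields concrete feedback."""
    for code in candidates:
        ok, feedback = run_tests(code, tests)
        if ok:
            return code, feedback
    return None, "no candidate passed"

# Toy example: two candidates for "double a number"; the first is buggy.
candidates = [
    "def solve(x):\n    return x + 1",  # wrong: fails verification
    "def solve(x):\n    return x * 2",  # correct: passes all tests
]
winner, msg = refine_loop(candidates, [((3,), 6), ((0,), 0)])
```

The point is structural: because `run_tests` gives an unambiguous right/wrong signal, the loop needs no human in the middle.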

Therefore, code agents have become very useful in a very short period of time and are even approaching the level of direct delivery.

The same logic is spreading to other fields. Mathematics is next. Whether a proof holds can be checked by strict rules; whether a derivation step is correct can be verified logically. Once the right and wrong of such problems can be clearly determined, they will follow the same high-speed growth path as code.

This rule has even been verified in the development of AI testing itself.

ARC AGI is currently regarded as the most difficult test of AI intelligence. When V1 was released, base models scored below 10%, and there was no breakthrough until reasoning models appeared. The even harder V2 was released next, but it was quickly cracked: researchers had the AI generate similar tasks, solve them itself, verify the answers, and feed the successful cases back into training. Under this repeated cycle, the score was pushed to 97% within a few months.
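The generate / solve / verify / feed-back cycle just described can be caricatured as a toy simulation. Every piece here is an assumption made for illustration: the task generator, the "model" with a single `skill` number, and the training update are stand-ins, not the actual method used on ARC.

```python
import random

def generate_task(rng):
    """Hypothetical task generator: an (input, answer) pair whose answer is checkable."""
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    return (a, b), a + b

def attempt(task, skill: float, rng) -> int:
    """Stand-in for the model: `skill` is its chance of answering correctly."""
    _, answer = task
    return answer if rng.random() < skill else answer + 1

def self_train(rounds: int, seed: int = 0) -> float:
    """Generate tasks, verify answers, and 'train' only on verified successes."""
    rng = random.Random(seed)
    skill = 0.1  # weak starting point, like the sub-10% base models
    for _ in range(rounds):
        task = generate_task(rng)
        guess = attempt(task, skill, rng)
        if guess == task[1]:                # verification: right/wrong is checkable
            skill = min(1.0, skill + 0.01)  # feed each verified success back in
    return skill
```

The mechanism only works because the check `guess == task[1]` is free and unambiguous; remove it and there is nothing to feed back.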

As long as right and wrong can be verified, AI can achieve rapid evolution through massive trial and error.

However, in a different scenario, the situation is completely different.

Writing articles, doing creative work, and formulating strategy have no unified standards. What does "writing well" mean? Different people judge it very differently. AI can still generate content in these fields, but it cannot converge on a "correct answer" through its own trial and error the way it can with code.

As Chollet has noted, in these unverifiable fields the training data depends heavily on annotation by human experts. The cost is extremely high, so progress is slow and quick to hit a ceiling.

This has led to two completely different rhythms in the current industry:

One type of problem is progressing faster and faster, even showing an exponential explosion;

The other type of problem seems to be improving, but the performance is always unstable, and it is difficult to cross the commercialization threshold.

Why is this so? The underlying logic is actually very simple:

Can this problem be clearly verified?

If it can be verified, AI can step on the accelerator and move forward by itself; if it cannot be verified, it can only stay in the stage of "looking good."

So you can see that for the same AI, in some scenarios it can already replace human delivery, while in others it can only be an auxiliary tool. The model itself has not suddenly gained a higher IQ; it has just been trained to be more executable in an environment where right and wrong are clearly defined.

Section 2 | "Intelligence" is experiencing inflation

Peel away the technological surface of Section 1 and a harsher truth emerges: the "intelligence" long promoted in the workplace and in education is rapidly losing its premium.

For a long time, society's yardstick for ability was intuitive: broad knowledge, fast reactions, complete logical exposition. In the pre-AI era, when information was costly to acquire and slow to process, these traits were genuinely scarce, and "intelligence" itself constituted the core competitiveness.

Today, however, large models are flattening that barrier indiscriminately.

There is almost no threshold for acquiring information, content organization can be automated, and even complex expression and logical structuring can be generated instantly. You no longer need years of accumulation to quickly produce a result that looks "smart enough."

The most far-reaching impact of this technological change is not simply "machines replacing humans," but the reconstruction of the system by which ability is evaluated.

In the past, whoever could give an answer won; today, answers are the cheapest of commodities. The core question has become: who can use those answers to actually solve problems?

François Chollet's framework explains this precisely: he divides system capability into "intelligence (the ability to handle the unknown)" and "skill/knowledge (the ability to handle the known)." When a system has a large enough static store of knowledge, it does not need much real "intelligence" to perform well at most routine work.

The rapid progress of AI is, in essence, using brute-force compute and massive data to reduce a large number of tasks that once required human "intelligence and wisdom" to pure "knowledge retrieval." This has also split the current definition of AGI: is the goal "everything can be automated," or "learning by analogy the way a human does"? Today's AI is racing toward the former, but that is a stacking of skills, not a leap in intelligence.

This matches most people's current experience: the solutions AI provides look perfect but often cannot be implemented; it can analyze and explain a problem in detail, yet lacks the closed-loop ability to get things done.

This is exactly the dividing line between "intelligence (expression and logic)" and "usefulness (execution and results)."

When acquiring and expressing information becomes cheap, quick understanding and clear explanation alone no longer form a moat. These abilities remain the foundation, but they are no longer the deciding factor. Like today's compute and network bandwidth, once they become infrastructure, they cease to be yardsticks for telling people apart.

What really makes a difference has become another ability.

Section 3 | What kind of ability is becoming valuable

When "intelligence" is no longer scarce, what is the new scarce ability?

Many people will say execution, communication, or leadership. These matter, but they miss the heart of it.

The real answer is actually hidden in the dividing line mentioned in Section 1: Can you turn a thing into something verifiable?

In reality, most jobs do not naturally meet this condition. Writing articles, making plans, formulating strategy, doing creative work... these have vague goals and subjective standards, and right and wrong are hard to determine absolutely. So here AI remains an auxiliary tool and cannot self-evolve the way it does when running code.

Therefore, the truly scarce ability in the future is to redesign these vague things into verifiable tasks. This is not simply "breaking down goals" or "making a list," but a more fundamental system construction ability: building a verification environment.

Chollet calls it the "control mechanism." In essence, this is a set of rules designed by humans to tell AI how to conduct trial and error, how to verify, and how to optimize.

Last year, two startup companies, Poetic and Confluence Labs, proved the value of this ability while tackling the extremely challenging ARC V2 reasoning benchmark. Their solution was not to compete for a "smarter model," but to design a carefully engineered control mechanism: have the AI generate similar test questions, attempt to solve them with programs, verify the correctness of the answers, record the successful reasoning chains, and then feed those data back into training.

A few months later, Confluence Labs pushed the accuracy rate to 97%, and the task cost was lower. The reason is not that the model suddenly became smarter, but that someone transformed the originally vague reasoning task into a verification environment that can be run repeatedly and continuously optimized.

This logic can be completely transferred to more business fields.

Whoever can turn subjective customer-service conversations into quantifiable scoring dimensions, break down short-video scripts that run on "feel" into testable indicators like completion rate and audience resonance, or turn strategic planning into checkpoints that can be verified stage by stage will hold the key to turning AI from a "toy" into a "productive force."
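To make the first example concrete, here is a minimal sketch of turning "a good customer-service reply" into scoring dimensions a machine can check. The dimensions, keyword lists, and thresholds are all assumptions invented for the sketch, not an industry rubric.

```python
def score_reply(reply: str) -> dict:
    """Decompose a subjective 'good reply' into checkable yes/no dimensions."""
    words = {w.strip(".,!").lower() for w in reply.split()}
    return {
        # acknowledges the customer's situation (assumed keyword proxy)
        "acknowledges_customer": bool(words & {"sorry", "understand", "thanks"}),
        # commits to a concrete action (assumed keyword proxy)
        "offers_action": bool(words & {"will", "refund", "replace", "escalate"}),
        # stays concise (assumed threshold)
        "concise": len(reply.split()) <= 60,
    }

def passes(reply: str, required: int = 3) -> bool:
    """A reply is accepted when it satisfies all rubric dimensions."""
    return sum(score_reply(reply).values()) >= required

good = "Sorry for the trouble. I understand, and we will refund the order today."
bad = "That is not our problem."
```

Crude as the keyword proxies are, the shape is the point: once `passes` exists, an AI can generate thousands of candidate replies and be graded automatically, which is exactly the verification loop that code already enjoys.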

This is also why, among people using AI, some merely speed up their formatting while others rebuild the entire business flywheel. The difference lies not in the tools, but in the ability to redesign problems and define rules.

Chollet's suggestion is clear: the deeper your professional knowledge, the better you can use these tools. Instead of resisting the evolution of AI, you should learn to leverage it and go with the trend.

However, the "professional knowledge" here is no longer memorized industry common sense, but a brand-new translation ability: translating the vague experience of your field into clear metrics that AI can help optimize. Concretely, that means designing scoring standards, building test environments, and defining verification rules, so that every step the machine takes receives clear feedback.

In the future, the division of labor will be very clear: AI is responsible for solving problems, and humans are responsible for setting questions. Whoever can design good test papers will be more valuable.

Answers are depreciating, and standards are appreciating.

Conclusion | Take the right position

Many people still habitually measure themselves by the old standards: whether they are smart enough and whether they are hardworking enough.

However, the rules of the game have changed.

What is truly scarce is no longer "knowing more," but the ability to transform vague experience into verifiable problems.

Facing the new technological coordinate system, there is only one question left:

Is your ability being amplified by AI, or is it being replaced by AI?

Original link:

https://www.youtube.com/watch?v=k2ZLQC8P7dc&t=15s

Source: Official media/Online news

This article is from the WeChat official account "AI Deep Researcher", author: AI Deep Researcher, editor: Shen Si. Republished by 36Kr with authorization.