
Jason Wei, a core OpenAI researcher just reported to have left the company, has defined the boundaries of RL.

Friends of 36Kr · 2025-07-16 19:13
Meta has poached core researchers from OpenAI, including Jason Wei.

On July 16th, a senior AI journalist at Wired broke the news, citing two sources, that Jason Wei, a well-known OpenAI researcher, and fellow research scientist Hyung Won Chung are about to leave OpenAI and join Meta.

This time, Meta has poached truly core members from OpenAI.

Meta has poached OpenAI's core members

Both of the reported departures are deeply involved in OpenAI's core projects: Jason Wei is a core thought leader in OpenAI's transformation, and Hyung Won Chung is a core architect at OpenAI.

Jason Wei: A leading figure in OpenAI's reinforcement learning

Jason Wei excels at discovering and promoting deceptively simple concepts that profoundly change the capabilities of large language models, opening up new research paradigms for the entire field.

For example, Jason Wei's best-known contribution is Chain-of-Thought (CoT) prompting, which has become a cornerstone of essentially all subsequent AI reasoning work.
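As a rough illustration (our sketch, not from the article), here is what a chain-of-thought prompt looks like in practice; the worked exemplar follows the style of the original CoT paper, and no particular model or API is assumed:

```python
# A rough illustration of chain-of-thought prompting: the few-shot
# exemplar spells out its reasoning before the final answer, which
# nudges the model to reason step by step on the new question.

COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is
6 balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and
bought 6 more. How many apples do they have?
A:"""

# Sent to a language model, the exemplar's visible reasoning makes a
# worked "23 - 20 + 6 = 9" style answer far more likely than a bare
# (and often wrong) guess.
print(COT_PROMPT)
```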

His work at OpenAI is a natural extension of this trajectory: he participated in the research and development of the o1 and Deep Research models, both of which center on reasoning ability.

He was also an early explorer of the fine-tuning techniques now familiar to everyone, having experimented with them back in the NLP era. He is one of the core figures behind the FLAN project (Finetuned Language Models Are Zero-Shot Learners).

This research shows that fine-tuning a model on a large collection of NLP tasks described by natural-language instructions can significantly improve its zero-shot performance on unseen tasks.
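To make the FLAN recipe concrete, here is a minimal, hypothetical sketch of instruction-formatted training data; the field names and templates are invented for illustration and are not taken from the actual FLAN codebase:

```python
# A minimal sketch of the FLAN idea: one labeled NLP example (here,
# natural language inference) is rendered through several instruction
# templates, so the model learns to follow instructions rather than a
# single fixed input format. Fields and templates are invented.

example = {
    "premise": "The dog is sleeping on the couch.",
    "hypothesis": "An animal is resting indoors.",
    "label": "yes",  # entailment
}

templates = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? yes, no, or maybe?",
    "{premise}\nBased on the text above, is it true that "
    "\"{hypothesis}\"? yes, no, or maybe?",
]

# Each template turns the same example into a distinct
# instruction-following (prompt, target) training pair; mixing many
# tasks this way is what improved zero-shot transfer to unseen tasks.
for t in templates:
    print(t.format(**example), "->", example["label"])
```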

As for the "emergent abilities" that have been widely discussed since ChatGPT, that concept too was pioneered in a 2022 paper on which he was first author.

The paper points out that certain abilities are absent in small models yet appear unpredictably in large ones, implying that continuing to scale up models may unlock further unknown abilities.

This paper provided an important conceptual framework for the field's subsequent focus on "scaling".

These few studies alone, together with his status as a foundational contributor to o1, show Jason Wei's unique position: he can accurately identify the research directions with the highest leverage. CoT, instruction fine-tuning, and emergent abilities all embody this principle.

His departure therefore costs OpenAI not only a researcher capable of executing complex projects, but also a "visionary" able to reshape the entire field's landscape.

Hyung Won Chung: A core figure in OpenAI's Agent program

Hyung Won Chung is more like a "full-stack" AI architect.

His expertise covers everything from the underlying training systems to high-level model capabilities to agent applications; he is a key bridge between theory and practice.

Before joining OpenAI, he worked on infrastructure for large-scale training at Google Brain.

He is one of the core contributors to T5X, a JAX-based training framework that was used to train Google's PaLM model.

At OpenAI, Chung quickly became a key figure in its most central projects. His personal website lists him as a "Foundational Contributor" to the o1-preview, o1, and Deep Research models, and he is one of the authors of the GPT-4 technical report.

More importantly, his current research focus at OpenAI is "reasoning and agents", and it was he who led the training of OpenAI's most important agent model, Codex mini.

His understanding of agents can be glimpsed from a lecture he gave at Stanford in June 2024. The lecture started from his observation of the Transformer's evolution from a two-part encoder-decoder structure to a decoder-only structure, and presented some ideas on building AI products.

The following is from his tweet after the lecture, essentially the original version of what has since become the golden rule of agent construction: "less structure, more intelligence".

There is one dominant driving force in artificial intelligence: the exponential decline in the cost of compute, and increasingly end-to-end models expanding to make use of that compute.

However, this does not mean we should blindly adopt the most end-to-end approach possible, because that simply isn't feasible yet. Instead, we should find the "optimal" structure given the current levels of 1) compute, 2) data, 3) learning objectives, and 4) architecture. In other words: what is the most end-to-end structure that is just starting to show signs of life? Such structures are more scalable and, when scaled up, will eventually outperform alternatives with more hand-built structure.

Later, when one or more of these four factors improves (for example, we obtain more compute or find a more scalable architecture), we should re-examine the structures added earlier, remove those that hinder further scaling, and repeat the process.

As a community, we love adding structure but are reluctant to remove it. We need to do more cleaning up.

His departure will therefore undoubtedly be a heavy blow to OpenAI's agent development.

On July 15th, just as the news of his departure was breaking, Jason Wei published a personal blog post redefining the boundaries of AI capabilities. It gives us a detailed look at how this OpenAI thinker, focused on reasoning and reinforcement learning, judges the future.

In the article, he calls the framework for predicting the future boundaries of AI capabilities the Verifier's Law.

In the past we spoke vaguely of AI "getting smarter". This article offers a clear criterion instead: how hard a task is for AI to master depends not on how hard the task is to solve, but on how easy its results are to verify.

This understanding may already be common among people working on reinforcement learning, but the Verifier's Law probes verification more deeply through five concrete criteria.

Moreover, combining this framework for where AI can succeed quickly with the success of AlphaEvolve, the article even reaches the core of future human-AI collaboration: transforming complex, vague real-world problems into clearly verifiable tasks that AI can understand and optimize.

The following is the full text of the blog post. Passages marked "// Editor's note" are explanatory text added by the editor:

Asymmetry of Verification

The Asymmetry of Verification refers to the fact that for some tasks, it is much easier to verify whether a solution is correct than to solve the problem from scratch. As reinforcement learning (RL) gradually matures in a general sense, the asymmetry of verification is becoming one of the most important concepts in the field of artificial intelligence (AI).

Understanding the Asymmetry of Verification through Examples

If you pay attention, you will find that the asymmetry of verification is everywhere. Here are some typical examples:

Sudoku and crossword puzzles: Solving these puzzles takes a lot of time because you must try many candidate answers under various constraints, but checking whether a given answer is correct is trivial (a checker sketch follows this list).

Website development: Writing the code behind a website like Instagram takes an engineering team years, but any ordinary person can quickly verify that the site works.

Web browsing comprehension tasks (BrowseComp): Solving such problems usually requires browsing hundreds of websites, but verifying any given answer is usually much faster because you can directly search whether the answer meets the constraints.
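To make the first example concrete, here is a toy checker (our sketch, not the author's) showing just how cheap Sudoku verification is compared with solving:

```python
# Verifying a completed 9x9 Sudoku grid is a few lines of linear-time
# code, while producing a solution generally requires backtracking
# search over many candidates: the asymmetry of verification.

def is_valid_sudoku(grid: list[list[int]]) -> bool:
    """True iff every row, column, and 3x3 box contains 1..9 exactly once."""
    expected = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r + dr][c + dc] for dr in range(3) for dc in range(3)}
        for r in (0, 3, 6)
        for c in (0, 3, 6)
    ]
    return all(group == expected for group in rows + cols + boxes)
```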

Some tasks have near-symmetric verification: verifying an answer takes about as long as solving the problem itself. For example, verifying the answer to some math problems (such as adding two 900-digit numbers) takes almost as much effort as solving them yourself. The same goes for some data-processing programs: following someone else's code and verifying its correctness takes about as long as writing the solution yourself.

Interestingly, for some tasks verification can take far longer than proposing a solution. For example, fact-checking every statement in an article may take longer than writing the article itself (which recalls Brandolini's Law). The same holds for many scientific hypotheses: verification is harder than proposal. It is easy to propose a new diet ("only eat bison and broccoli"), but it takes years to verify whether that diet benefits the general population.

// Editor's note

Brandolini's Law, also known as "The Bullshit Asymmetry Principle", states that the energy required to refute a rumor or bullshit is an order of magnitude higher than that required to create it. This exactly describes those tasks where verification is more difficult than solving (or creating).

Improving the Asymmetry of Verification

One of the most important insights about the asymmetry of verification is that it can be improved by doing some up-front work on the task. For example, for a competition math problem, if you already have the answer key, checking any submitted final answer is trivial. Programming problems are another good example: although reading code and checking its correctness is tedious, if you have test cases with high enough coverage, you can check any given solution instantly; in fact, this is exactly what practice platforms like LeetCode do. For some tasks, verification can be improved but not made effortless: for a question like "name a Dutch football player", a list of well-known Dutch players helps, but verification often still takes some effort.
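A minimal sketch of this test-case style of verification, with an invented toy problem rather than a real LeetCode task:

```python
# Test cases turn slow code review into fast verification: instead of
# reading a submission line by line, run it against cases with good
# coverage. The task and cases below are invented for illustration.

def reverse_words(s: str) -> str:  # a candidate submission
    return " ".join(reversed(s.split()))

TEST_CASES = [
    ("hello world", "world hello"),
    ("  a  b ", "b a"),
    ("single", "single"),
]

def verify(solution, cases) -> bool:
    """Accept a submission iff it passes every test case."""
    return all(solution(x) == y for x, y in cases)

print(verify(reverse_words, TEST_CASES))  # True
```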

Verifier's Law

Why is the asymmetry of verification so important? Looking back at the history of deep learning, we have seen that almost anything that can be quantified can be optimized. In reinforcement-learning terms, the ability to verify a solution is equivalent to the ability to create an RL environment. Hence the conclusion:

Verifier's Law: The ease of training AI to solve a task is proportional to how verifiable the task is. All solvable tasks that are easy to verify will eventually be solved by AI.

More specifically, the ease of training AI to solve a task is proportional to the degree to which the task has the following attributes:

Objective truth: Everyone has a consistent view on what a good solution is.

Fast to verify: Any given solution can be verified within a few seconds.

Scalable to verify: Many solutions can be verified simultaneously.

Low noise: The verification result is as closely related to the quality of the solution as possible.

Continuous reward: For a single problem, it is easy to rank the quality of multiple solutions.
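To make these five attributes concrete, here is a toy verifier (our example, not the author's) for a sorting task. Sortedness gives objective truth; the check is fast, deterministic (low noise), and trivially batchable (scalable); and partial credit for partially ordered output gives a continuous reward. In RL terms, this function is the environment:

```python
# A toy verifier-as-reward for a sorting task, scoring in [0, 1].

def reward(candidate: list[int], reference: list[int]) -> float:
    """Score a candidate 'sorted' version of reference."""
    if sorted(candidate) != sorted(reference):
        return 0.0  # wrong multiset of elements: no credit
    pairs = list(zip(candidate, candidate[1:]))
    if not pairs:
        return 1.0
    # Continuous reward: fraction of adjacent pairs in order,
    # so partially sorted outputs can be ranked against each other.
    return sum(a <= b for a, b in pairs) / len(pairs)

print(reward([1, 2, 3, 4], [4, 3, 2, 1]))  # 1.0: fully sorted
print(reward([1, 3, 2, 4], [4, 3, 2, 1]))  # ~0.67: partial credit
print(reward([1, 2, 3], [9, 9, 9]))        # 0.0: wrong elements
```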

It is not hard to believe that the Verifier's Law holds: most benchmarks proposed in AI so far have been easy to verify, and they have largely been solved. Note that almost all popular benchmarks of the past decade meet the first four criteria; benchmarks that fail them struggle to become popular. And although most benchmarks do not meet the fifth criterion (a solution is either completely right or completely wrong), you can obtain a continuous reward by averaging the binary (0 or 1) rewards over many examples.
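A minimal sketch of that averaging trick; the `policy.solve` / `problem.verify` interface here is hypothetical:

```python
# Each individual problem yields only a binary 0/1 reward, but the
# mean over a batch of problems gives a continuous score that can
# rank policies against each other.

def batch_reward(policy, problems) -> float:
    """Mean pass rate of a policy over a set of verifiable problems."""
    outcomes = [1.0 if p.verify(policy.solve(p)) else 0.0 for p in problems]
    return sum(outcomes) / len(outcomes)
```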

Why does verifiability matter so much? In my view, the most fundamental reason is that when the criteria above are met, the amount of learning that can occur in a neural network is maximized: you can take a huge number of gradient steps, each carrying a large amount of useful signal. Iteration speed is crucial; it is why progress in the digital world is so much faster than in the physical world.

// Editor's note