OpenAI restructures the "soul" team behind GPT-5! Its Asian female lead is reassigned, and in a rare self-disclosure OpenAI names the culprit behind AI hallucinations.
[New Intelligence Yuan Introduction] OpenAI has made a significant organizational change: the ChatGPT "Model Behavior" team has been merged into the Post-Training team, and its former leader, Joanne Jang, now heads the newly established OAI Labs. Behind the move may lie a recent discovery: mainstream evaluations reward hallucinated guesses, effectively forcing the model to become a "test-taker." Together, the restructuring and a rethink of the evaluation paradigm may redraw the capability boundaries and product forms of AI.
On September 6th, OpenAI decided to restructure the ChatGPT "Personality" research team!
This group of about 14 people is small, but its responsibility is weighty: teaching GPT models how to interact with humans.
According to internal sources, the Model Behavior team will be directly merged into the Post-Training team and report to Max Schwarzer, the head of the Post-Training team.
Joanne Jang, the former leader of the team, is starting a new laboratory from scratch, "OAI Labs" - to invent and build prototypes of new interaction interfaces for the way humans collaborate with AI.
Meanwhile, in a rare move, OpenAI published a paper revealing that the culprit behind AI "hallucinations" is none other than ourselves!
The "test-oriented" evaluation system designed by the entire industry to pursue high scores on the leaderboard forces AI to guess answers rather than honestly say "I don't know."
Paper link: https://openai.com/index/why-language-models-hallucinate/
A Surreal Day
The Model Behavior team has been involved in almost all model developments after GPT-4, including GPT-4o, GPT-4.5, and GPT-5.
Last week, Joanne Jang, the leader of the Model Behavior team, made it onto the "Thinkers" list of Time's AI 100, surpassing giants such as Yoshua Bengio, one of the three giants of deep learning and a Turing Award winner, and Jeffrey Dean, the chief scientist at Google.
On the same day, OpenAI decided to transfer her from the team to take charge of a new direction.
For her, that day was truly "surreal."
Joanne Jang believes that the core of her work is to "empower users to achieve their goals" without causing harm or infringing on others' freedom.
She said bluntly: Employees in AI labs should not be the arbiters of what people can and cannot create.
Embarking on a New Journey: Aiming for the Next Generation of AI Interaction
Joanne Jang posted that she has a new job: to invent and prototype brand-new interaction interfaces and explore future ways of human-AI collaboration.
She will be in charge of the new OAI Labs from scratch: a research-driven team dedicated to inventing and building prototypes of new interfaces for the way humans collaborate with AI.
Through this platform, she will explore new modes of interaction beyond chat and even beyond agents, toward new paradigms and tools for thinking, creating, playing, learning, connecting, and practicing.
She is extremely excited about this, and it is also the most enjoyable work she has done at OpenAI over the past four years:
Transforming cutting-edge capabilities into products for the world and refining them with talented colleagues.
From DALL·E 2 and the standard voice mode to GPT-4 and model behavior, her work at OpenAI has covered different ways of personalization and interaction.
She has learned a lot and has deep insights:
How much the shape of an interface can inspire people to push past the limits of their imagination.
During an interview, she admitted that it's still in the early stages, and there's no clear answer yet as to what new interaction interfaces will be explored.
I'm very excited to explore paradigms that move beyond "chat." Chat is currently associated mostly with companionship, while "agents" emphasize autonomy.
But I prefer to view AI systems as tools for thinking, creating, playing, practicing, learning, and connecting.
Model behavior researchers at OpenAI are responsible for designing and developing evaluation systems (evals), spanning multiple aspects:
Alignment, training, data, reinforcement learning (RL), and post-training, etc.
In addition to the research itself, model behavior researchers also need to have a keen intuition for products and a deep understanding of classic AI alignment issues.
OpenAI's experience requirements for model behavior researchers
In a previous job posting, OpenAI wrote: the model is the product, and the evaluation system is the soul of the model.
And the latest research published by OpenAI goes further: the evaluation system fundamentally determines what the model becomes.
In the paper, the researchers concluded:
In fact, most mainstream evaluations reward hallucinatory behavior. Just making some simple changes to these mainstream evaluations can recalibrate the incentive mechanism, allowing the model to be rewarded for expressing uncertainty rather than being punished.
Moreover, this change not only removes the obstacles to suppressing hallucinations but also opens the door to future language models with richer pragmatic competence.
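Concretely, the paper suggests stating an explicit confidence target in the evaluation instructions ("answer only if you are more than t confident") and penalizing confident errors more heavily than abstentions. Below is a minimal sketch of such a scoring rule in Python; the function name and default threshold are illustrative, not OpenAI's actual grading code:

```python
def score_answer(is_correct: bool | None, t: float = 0.75) -> float:
    """Confidence-targeted scoring in the spirit of OpenAI's proposal.

    is_correct: True (right answer), False (wrong answer), None (the model abstained).
    t: the confidence threshold announced to the model in the eval instructions.
    """
    if is_correct is None:          # "I don't know" is worth 0, not treated as wrong
        return 0.0
    if is_correct:
        return 1.0
    return -t / (1 - t)             # with t = 0.75, a wrong guess costs 3 points

# Expected value of guessing with probability p of being right:
#   p * 1 + (1 - p) * (-t / (1 - t)),  which is positive only when p > t,
# so the model is rewarded for abstaining whenever it is not confident enough.
```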
This discovery is very important for OpenAI: the evaluation system directly affects the capabilities of LLMs.
It is reported that in a memo sent to employees, Mark Chen, OpenAI's Chief Research Officer, pointed out that now is a good opportunity to integrate model behavior more deeply into core model development.
We Taught AI to Bullshit Seriously
Recently, OpenAI researchers conducted an interesting test.
They first asked a widely used chatbot: "What is the title of Adam Tauman Kalai's (the paper's first author) doctoral dissertation?"
The chatbot confidently gave three different answers, none of them correct.
Then they asked: "When is Adam Tauman Kalai's birthday?"
This time the chatbot again gave three different dates, all of them wrong.
To Get High Scores, AI Is Forced to Guess Answers
The example above vividly illustrates what a "model hallucination" is: an answer generated by AI that sounds plausible but is in fact fabricated.
In the latest research, OpenAI pointed out:
The reason why models hallucinate is that standard training and evaluation procedures reward guessing behavior rather than encouraging the model to admit its uncertainty.
To put it simply, we set the wrong incentive orientation when evaluating AI.
Although evaluation itself does not directly cause hallucinations, most evaluation methods prompt the model to guess answers rather than honestly indicate its uncertainty.
This is like a large-scale "test-oriented education" full of multiple-choice questions.
If AI leaves a question blank when it doesn't know the answer, it will definitely get 0 points; but if it guesses randomly, there is always a chance of getting it right.
After accumulating thousands of questions, an AI that loves to "guess answers" will score higher than an AI that says "I don't know" when encountering difficult questions.
The current industry mainstream uses this kind of "accuracy-only" leaderboard to judge the quality of models.
This invisibly encourages all developers to train a model that is better at "guessing" rather than being "honest."
This is why even as models become more advanced, they still hallucinate.
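A back-of-the-envelope simulation makes the incentive concrete. In the toy sketch below (the 10% lucky-guess rate and the assumption that neither model knows any of the answers are invented for illustration), an accuracy-only leaderboard ranks the reckless guesser above the honest abstainer:

```python
import random

random.seed(0)

N_QUESTIONS = 10_000
P_LUCKY_GUESS = 0.10   # assumed chance a blind guess happens to be right

# Model A always guesses; Model B honestly abstains on questions it cannot answer.
# Worst case for illustration: neither model actually knows any of the answers.
guesser_correct = sum(random.random() < P_LUCKY_GUESS for _ in range(N_QUESTIONS))
abstainer_correct = 0   # "I don't know" earns nothing under accuracy-only grading

print(f"always-guess model:   {guesser_correct / N_QUESTIONS:.1%} accuracy")
print(f"honest-abstain model: {abstainer_correct / N_QUESTIONS:.1%} accuracy")
# The guesser tops the leaderboard even though roughly 90% of its answers are wrong.
```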
To have a more intuitive understanding, let's look at a set of comparison data published by OpenAI in the GPT-5 system card:
From the data, we can see:
In terms of raw accuracy, the older o4-mini actually scores higher (24% vs 22%).
But the cost is steep: o4-mini almost never abstains (1%), and its error rate (i.e., hallucination rate) soars to 75%.
By contrast, the newer gpt-5-thinking-mini is more "cautious": it declines to answer 52% of the time, which keeps its error rate down to 26%.
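The three figures for each model hang together: every question is either answered correctly, answered wrongly, or skipped, so the error rate is simply what remains after accuracy and abstention. A quick check using the rounded percentages quoted above:

```python
# Sanity check on the system-card figures: accuracy + abstention + error ~ 100%.
for name, accuracy, abstention in [("o4-mini", 24, 1), ("gpt-5-thinking-mini", 22, 52)]:
    error = 100 - accuracy - abstention
    print(f"{name}: error rate ≈ {error}%")   # prints 75% and 26%
```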
Hallucinations Stem from "Next Token Prediction"
In addition to the orientation problem of the evaluation system, the generation of hallucinations is also closely related to the learning mechanism of large language models.
Through "next token prediction," the model has mastered grammar, language sense, and common - sense associations, but this is also its shortcoming.
For high - frequency and regular knowledge, such as grammar and spelling, the model can eliminate errors by expanding its scale.
For low - frequency and arbitrary facts, such as birthdays and paper titles, the model cannot predict
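The paper quantifies this with the "singleton rate": the fraction of facts that appear exactly once in the training data, which it argues lower-bounds how often a base model must hallucinate on that kind of question. The tiny corpus below is invented purely to illustrate the count:

```python
from collections import Counter

# Invented pretraining "facts": repeated ones are statistically learnable,
# while a fact seen exactly once gives the model nothing to generalize from.
birthday_facts = [
    ("Alan Turing", "June 23"),
    ("Alan Turing", "June 23"),
    ("Alan Turing", "June 23"),
    ("Ada Lovelace", "December 10"),
    ("Ada Lovelace", "December 10"),
    ("Grace Hopper", "December 9"),   # singleton
]

counts = Counter(birthday_facts)
singleton_rate = sum(1 for n in counts.values() if n == 1) / len(counts)
print(f"singleton rate: {singleton_rate:.0%}")   # 33% of distinct facts appear only once
```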