Rare collaboration between OpenAI and Anthropic
Over the past two months, OpenAI and Anthropic, two of the world's leading AI startups, undertook a rare cross-laboratory collaboration: amid fierce competition, they temporarily opened their closely guarded AI models to each other for joint safety testing.
The effort was designed to surface blind spots in each company's internal evaluations and to demonstrate how leading AI companies might collaborate on safety and alignment work in the future.
The joint safety research report, released by the two companies on Wednesday, arrives amid an arms race among leading AI labs such as OpenAI and Anthropic, in which billion-dollar data center investments and compensation packages for top researchers running into the tens of millions of dollars have become table stakes. Many industry experts warn that the intensity of product competition could pressure companies to lower safety standards in the rush to build more powerful systems.
To make the research possible, OpenAI and Anthropic reportedly granted each other special API access to versions of their AI models with fewer safeguards in place. GPT-5 was not included in the testing because it had not yet been released at the time.
OpenAI co-founder Wojciech Zaremba said in an interview that such cooperation is becoming increasingly important now that AI is entering a "significantly impactful" stage of development in which millions of people use these models every day.
"Despite the industry's investment of billions of dollars and the competition for talent, users, and the best products, how to establish security and cooperation standards remains a broader issue facing the entire industry," Zaremba said.
Even as AI safety teams begin to collaborate, Zaremba expects industry competition to remain fierce.
Nicholas Carlini, a safety researcher at Anthropic, said he hopes to continue giving OpenAI's safety researchers access to Anthropic's Claude models in the future.
"We hope to expand cooperation as much as possible in the field of security frontiers and make this kind of cooperation normal," Carlini said.
What problems did the research find?
The study's most notable findings concern hallucination testing of large models.
When unsure of the correct answer, Anthropic's Claude Opus 4 and Sonnet 4 models refused to answer as many as 70% of the questions, instead giving responses such as "I don't have reliable information." By contrast, OpenAI's o3 and o4-mini models refused to answer far less often but hallucinated at a much higher rate, attempting answers even when they lacked sufficient information.
Zaremba believes the right balance lies somewhere in between: OpenAI's models should refuse to answer more often, while Anthropic's models should attempt to answer more often.
Sycophancy, the tendency of AI models to reinforce users' negative behavior in order to please them, has also emerged as one of the most pressing safety risks facing current AI models.
Anthropic's report identified cases of "extreme" sycophancy in GPT-4.1 and Claude Opus 4: the models initially pushed back against psychotic or manic behavior but later validated some worrying decisions. In other OpenAI and Anthropic models, researchers observed lower levels of sycophancy.
On Tuesday, the parents of 16-year-old Adam Raine of California sued OpenAI, alleging that ChatGPT (specifically a GPT-4o version) gave their son advice that facilitated his suicide rather than pushing back against his suicidal thoughts. The lawsuit suggests this may be the latest instance of AI chatbot sycophancy leading to tragic consequences.
When asked about the case, Zaremba said: "It's hard to imagine the pain this has brought to the family. It would be a sad outcome if we build AI that solves complex PhD-level problems and invents new science, while at the same time people suffer mental health harms as a consequence of interacting with it. That's a dystopian future I don't look forward to."
In a blog post, OpenAI said that compared with GPT-4o, its GPT-5 model significantly reduces chatbot sycophancy and handles mental health emergencies better.
Looking ahead, Zaremba and Carlini said they hope Anthropic and OpenAI will deepen their cooperation on safety testing, broaden the range of research topics, and test future models. They also hope other AI labs will follow this model of collaboration.
This article is from the WeChat public account "STAR Market Daily", author: Xiaoxiang. Republished by 36Kr with permission.