
Nature and Science simultaneously covered a paper that attempts to radically cure AI hallucinations.

新智元 (New Intelligence Yuan) · 2026-02-05 20:22
Farewell to hallucinations

Groundbreaking News in Nature: The 8-Billion-Parameter Small Model OpenScholar Ends "Parameter Worship." Abandoning Rote Memorization, It Eliminates Hallucinations with "Retrieval + Self-Check" and Outperforms Industry Giants on Scientific Review Tasks.

Yesterday, a paper published in the main issue of Nature introduced an open-source model called OpenScholar; the work was also covered by Science.

With only 8 billion parameters, this small model has beaten flagship models on scientific literature review tasks.

This is a signal of a paradigm shift: in rigorous scientific exploration, the all-knowing "black box" memory is a thing of the past, and the accurately callable "external" knowledge base is the future.

Saying Goodbye to Hallucinations

Before OpenScholar appeared, researchers had mixed feelings about general-purpose large models.

The chief complaint was a fatal flaw: hallucinations.

The numbers were shocking: when asked professional questions in fields such as biomedicine, AI models fabricated citations at rates as high as 90%.

They could confidently invent non-existent paper titles, authors, and even page numbers. For scientific research, where every word must withstand scrutiny, this unreliability is devastating.

OpenScholar was created to correct this deviation.

This system, jointly developed by the University of Washington and the Allen Institute for Artificial Intelligence (AI2), no longer tries to make the model "remember" all knowledge. Instead, it teaches the model to "look up information" like a human scholar.

OpenScholar does not rely on the fuzzy memories baked into its parameters. It is connected to a huge database of 45 million open-access papers.

When you ask a question, it does not generate an answer directly but follows a strict process (sketched in code after the list):

Retrieval: Quickly retrieve the most relevant fragments from 45 million papers.

Re-ranking: Use a cross-encoder to carefully screen the fragments and filter out irrelevant or weakly related ones.

Generation and Feedback: This is the most crucial step. After the model generates a draft answer, it reviews itself, asking: "Is there evidence to support this statement?" If it finds the evidence insufficient, it launches a second or third round of retrieval until every statement is backed by solid literature.
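To make the process concrete, here is a minimal sketch of such a retrieve, re-rank, generate, and self-check loop. It is an illustration of the idea, not OpenScholar's actual code: the objects `search_index`, `cross_encoder`, and `llm`, and all of their method names, are hypothetical placeholders.

```python
# Minimal sketch of a retrieve -> re-rank -> generate -> self-check loop.
# `search_index`, `cross_encoder`, and `llm` are hypothetical placeholders,
# not OpenScholar's actual API.

def answer_with_citations(question, search_index, cross_encoder, llm,
                          max_rounds=3, top_k=8):
    """Iteratively gather evidence until every claim in the answer is supported."""
    evidence, query, draft = [], question, ""
    for _ in range(max_rounds):
        # 1. Retrieval: pull candidate passages from the paper index.
        candidates = search_index.search(query, limit=100)

        # 2. Re-ranking: a cross-encoder scores each (question, passage)
        #    pair jointly and keeps only the top-scoring passages.
        scored = [(cross_encoder.score(question, p), p) for p in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        evidence.extend(p for _, p in scored[:top_k])

        # 3. Generation: draft an answer grounded in the collected evidence.
        draft = llm.generate(question=question, passages=evidence)

        # 4. Self-feedback: ask the model which statements lack support.
        unsupported = llm.find_unsupported_claims(draft, evidence)
        if not unsupported:
            return draft  # every statement is backed by a passage
        # Turn the unsupported claims into the next retrieval query.
        query = " ".join(unsupported)
    return draft
```

The key design choice is that the loop's exit condition is evidential support, not fluency: the model keeps retrieving until its own review step finds nothing unsupported.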

The result was a crushing victory. On the ScholarQABench benchmark, which covers fields such as computer science and physics, OpenScholar-8B not only surpassed the accuracy of the flagship models of the day but also cut inference cost by two orders of magnitude, to about $0.003 per query.

It proves that in a specific field, an undergraduate student with a "library" is more reliable than a hallucination-prone doctoral student without any resources.

DR Tulu: From "Answering Questions" to "In-Depth Research"

If OpenScholar solves the problem of "accuracy," its successor, DR Tulu (Deep Research Tulu), moves towards "depth."

Scientific research is often not a simple question-and-answer process but a long-term exploration and synthesis.

DR Tulu, released in November 2025, targets long and multi-dimensional "in-depth research" tasks.

Its core breakthrough lies in the introduction of "Reinforcement Learning with Evolving Rubrics" (RLER).

In earlier training regimes, it was hard for an automated judge to decide whether a several-thousand-word literature review was well written.

DR Tulu does not rely on a fixed scoring rubric. Instead, the model dynamically generates scoring criteria for the problem at hand as it searches and researches.

It learns both "what is a good research strategy" (such as exploring niche data sources) and "what is bad behavior" (such as piling up citations to meet the word count).
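As a rough illustration of the idea, the sketch below shows what reward computation with an evolving rubric might look like. The helper names (`llm_judge`, `generate_rubric`, `grade`, the `rubric_pool` object) and the averaging scheme are assumptions made for this example, not DR Tulu's published implementation.

```python
# Rough sketch of RLER-style reward computation with evolving rubrics.
# `llm_judge` and `rubric_pool` are assumed helpers, not DR Tulu's API.

def rler_reward(question, report, rubric_pool, llm_judge):
    """Score a long-form research report against a question-specific rubric."""
    # 1. Generate criteria tailored to this question instead of a fixed
    #    checklist, e.g. "explores niche data sources" (good) or
    #    "pads the text with citations to hit a word count" (bad).
    new_criteria = llm_judge.generate_rubric(question)

    # 2. Evolve the rubric: keep criteria that discriminate between good
    #    and bad reports, drop ones that every report trivially satisfies.
    rubric_pool.update(new_criteria)

    # 3. Grade the report criterion by criterion and average into a
    #    scalar reward for the reinforcement-learning update.
    scores = [llm_judge.grade(report, c) for c in rubric_pool.active()]
    return sum(scores) / len(scores)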

This training gives DR Tulu stronger planning ability.

Faced with complex scientific propositions, it can, like an experienced researcher, first formulate an outline, then conduct separate searches, and finally write a long report by synthesizing multi-source information.
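A plan-then-search loop of that kind could look like the following sketch; the helpers `make_outline`, `write_section`, and `synthesize` are illustrative names, not the actual DR Tulu interface.

```python
# Illustrative sketch of a plan -> search -> synthesize research loop.
# The helper methods are invented names, not DR Tulu's actual interface.

def deep_research(question, llm, search):
    # 1. Plan: break the proposition into an outline of sub-questions.
    outline = llm.make_outline(question)

    # 2. Research: run a separate, focused search for each section so the
    #    evidence for one sub-question does not drown out another.
    sections = []
    for heading in outline:
        passages = search(heading, limit=20)
        sections.append(llm.write_section(heading, passages))

    # 3. Synthesize: merge the sections into one coherent long report.
    return llm.synthesize(question, outline, sections)
```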

In the latest tests, DR Tulu-8B matches or even surpasses the flagship proprietary models of the day, and its code and weights are fully open-source.

The Mastermind: Akari Asai

The key figure behind this series of disruptive works is Akari Asai, who will join Carnegie Mellon University (CMU) in the fall of 2026.

This young scholar, with an undergraduate degree from the University of Tokyo and a doctorate from the University of Washington, is one of the most active voices in retrieval-augmented generation (RAG) in recent years.

During her internship at Meta AI, she worked on the knowledge bottleneck of large models.

Akari Asai's research philosophy is very clear: Don't try to fit the world into the model. Let the model learn to embrace the world.

The OpenScholar and DR Tulu projects she leads are not just technical improvements; they also carry a strong democratizing streak.

By open-sourcing high-performance small models and retrieval architectures, she is breaking the monopoly under which only technology giants could own top-tier scientific-research AI tools, giving scientists in resource-scarce regions around the world a tireless "super research assistant."

Conclusion

The essence of science is not memory but discovery.

When we free AI from the rote-memorization parameter race and give it the ability to look things up, verify, and reflect, we are no longer building a machine that can only chat but a sharp blade that can help humans cut through the vast ocean of knowledge.

In future scientific research, success may no longer depend on how many papers you have read but on how you can harness the AI assistant that has read all the papers.

References:

https://www.nature.com/articles/s41586-025-10072-4 

https://www.science.org/content/article/open-source-ai-program-can-answer-science-questions-better-humans 

This article is from the WeChat official account "New Intelligence Yuan." Author: New Intelligence Yuan. Republished by 36Kr with permission.