
Nature study questioned: Is an AI that "thinks" like humans a bit "absurd"?

Academic Headlines, 2025-07-03 19:44
It may also offer a new perspective on human health research.

If artificial intelligence (AI) could "think" like humans, it might help us understand the human way of thinking, especially how people in different psychological states (such as depression or anxiety) make decisions, thus providing a new perspective for human health research.

Now, a research paper published in the journal Nature has brought this hypothesis closer to reality.

Paper link: https://www.nature.com/articles/s41586-025-09215-4

The research team from Helmholtz Munich believes that their human cognitive foundation model, Centaur, can not only predict the decisions people will make in a wide range of situations, such as gambling, memory games, and problem-solving, but even outperform the classic psychological theories used to describe human choices.

In their view, Centaur opens a new research path toward a deeper understanding of human cognitive mechanisms and better psychological theories. Its potential applications range from analyzing classic psychological experiments to simulating individual decision-making in clinical settings, for example in patients with depression or anxiety.

"You can basically conduct experiments in a computer simulation environment instead of on actual human participants," said Marcel Binz, the first and corresponding author of the paper and a cognitive scientist and postdoctoral fellow at Helmholtz Munich. This could help when traditional research progresses too slowly, or when it is difficult to recruit children or patients with mental illnesses.

"Building theories in the field of cognitive science is very difficult," said Giosuè Baggio, a psycholinguist at the Norwegian University of Science and Technology in Trondheim. "It's really exciting to see what new ideas machines can help us come up with."

"This result indicates that data-driven discovery of general models in the cognitive field is a promising research direction. The next step in research should be to transform this general computational model in the field into a unified theory of human cognition."

Human Cognitive AI: "Retrained" Based on 10 Million Human Decisions

For decades, the field of psychology has sought to explain the full complexity of human thinking. However, traditional models have been unable both to clearly explain how people think and to reliably predict their behavior.

Centaur breaks through this impasse by combining two previously separate strengths, interpretable theory and predictive power: it can identify common decision-making strategies, adapt flexibly to changing scenarios, and even predict reaction times with surprising accuracy.

"We created a tool that can predict (and simulate) human behavior in any situation described in natural language - like a virtual laboratory," Binz said.

Centaur was built by fine-tuning Llama with the LoRA method on a dataset called Psych-101. The dataset contains more than 10 million individual decisions made by more than 60,000 participants in 160 psychological experiments, covering a wide range of human behaviors, from risk-taking and reward learning to moral dilemmas. The research team plans to further expand the dataset by adding demographic and psychological characteristics.
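As a rough, dependency-free illustration of the LoRA idea mentioned above (not the authors' actual training code, which fine-tunes the Llama transformer with a library), the toy sketch below shows the core trick: instead of updating a frozen weight matrix W, LoRA learns a low-rank update B @ A with rank r much smaller than the matrix dimensions. All sizes and values here are made up.

```python
# Toy LoRA: effective weights are W + scale * (B @ A), where only
# the small matrices A (r x d_in) and B (d_out x r) are trained.

def matmul(X, Y):
    """Multiply two matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, scale=1.0):
    """Compute (W + scale * B @ A) @ x for one input vector x."""
    delta = matmul(B, A)  # low-rank update, shape d_out x d_in
    W_eff = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_eff]

d_out, d_in, r = 4, 6, 2
W = [[0.0] * d_in for _ in range(d_out)]  # frozen base weights (toy zeros)
A = [[1.0] * d_in for _ in range(r)]      # trainable
B = [[1.0] * r for _ in range(d_out)]     # trainable

full_params = d_out * d_in                # 24 if W were trained directly
lora_params = r * (d_in + d_out)          # 20 here; the gap grows at Llama scale
y = lora_forward([1.0] * d_in, W, A, B)
print(full_params, lora_params, y)
```

At these toy sizes the saving is small, but for a d × d transformer weight matrix the trainable parameters drop from d² to 2rd, which is what makes fine-tuning a large model on 10 million decisions tractable.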

Figure | Overview of Psych-101 and Centaur. a) Psych-101 includes trial-by-trial data from 160 psychological experiments, with a total of 60,092 participants, making a total of 10,681,650 choices, involving 253,597,411 text tokens, covering areas such as multi-armed bandits, decision-making, memory, supervised learning, and Markov decision processes; b) Centaur is a human cognitive foundation model fine-tuned on the Psych-101 dataset using the LoRA method.

For each experiment, the research team trained the model on 90% of the human data and tested its output on the remaining 10%. They found that Centaur aligned with the human data more closely than task-specific cognitive models did. For example, in a two-armed bandit decision-making task, the model's generated choices matched the choices participants made on the bandit machines more closely than did models specifically designed to capture how humans decide in that task.

Figure | Evaluation on different held-out datasets. a) Mean negative log-likelihood of responses in a two-step task based on a modified cover story (n = 9,702). b) Mean negative log-likelihood of responses in a three-armed bandit experiment (n = 510,154). c) Mean negative log-likelihood of responses in a logical reasoning experiment based on the Law School Admission Test (LSAT) (n = 99,204).
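The metric in the figure above, mean negative log-likelihood (NLL), scores how much probability a model assigned to the choices people actually made on held-out trials; lower is better. The sketch below computes it on made-up two-armed bandit data (the probabilities and choices are hypothetical, not from Psych-101).

```python
# Mean negative log-likelihood of observed human choices under a
# model's predicted choice probabilities. Lower NLL = closer fit.
import math

def mean_nll(choice_probs, choices):
    """Average -log P(observed choice) over held-out trials."""
    return sum(-math.log(p[c]) for p, c in zip(choice_probs, choices)) / len(choices)

# Hypothetical held-out trials: each row is a model's predicted
# probability for arms 0 and 1; `human` lists the observed picks.
human = [0, 1, 0, 0]
model_a = [[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.7, 0.3]]
baseline = [[0.5, 0.5]] * 4  # random guessing: NLL = -log 0.5 ≈ 0.693

print(round(mean_nll(model_a, human), 3))   # lower than the baseline
print(round(mean_nll(baseline, human), 3))
```

The random-guessing baseline's NLL of log 2 ≈ 0.693 is the natural reference point for a binary choice; a model that tracks human behavior should score well below it, as in panels a–c of the figure.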

Centaur also produced human-like outputs on modified tasks not included in its training data, such as adding a third bandit machine in a two-armed bandit experiment. Binz said that this means researchers can use Centaur to develop experiments in computer simulations and then apply them to human participants, or use it to develop new theories of human behavior.

In a case study, the research team demonstrated how to use Psych-101 and Centaur to guide the development of predictable and interpretable cognitive models. Each step of this procedure is general, so it can serve as a blueprint for future model-driven scientific discovery in other experimental paradigms.

Moreover, Centaur lends itself to further applications in automated cognitive science, such as in-silico prototyping of experimental studies. Researchers can use the model to determine which experimental designs yield the largest effect sizes, how to optimize a design to reduce the number of participants required, or to estimate the effect of a given manipulation.
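One way such in-silico prototyping could work is simulation-based power analysis: run the planned experiment many times with a surrogate participant model and count how often the effect is detected. The sketch below uses a biased coin as a stand-in for a model like Centaur; the effect size, test criterion, and participant counts are all hypothetical.

```python
# Simulation-based estimate of statistical power: how often a study
# with n participants detects an above-chance effect. The surrogate
# "participant" responds correctly with probability p_effect.
import random

def simulated_power(n_participants, p_effect=0.65, n_sims=2000, seed=0):
    """Fraction of simulated studies where the mean response exceeds
    chance (0.5) under a simple one-sided z-style criterion."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_sims):
        responses = [rng.random() < p_effect for _ in range(n_participants)]
        mean = sum(responses) / n_participants
        se = (0.25 / n_participants) ** 0.5   # conservative SE under the null
        if (mean - 0.5) / se > 1.645:         # one-sided test, alpha = 0.05
            hits += 1
    return hits / n_sims

for n in (20, 50, 100):
    print(n, simulated_power(n))  # power rises with sample size
```

Sweeping over candidate designs this way lets a researcher pick the smallest sample size that reaches an acceptable power before recruiting any real participants, which is exactly the kind of prototyping the passage describes.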

"We've just started, and we can already see great potential," said Eric Schulz, director of the Human-Centered AI Institute at Helmholtz Munich.

Next, the researchers plan a deeper analysis of Centaur: Which computational patterns correspond to specific decision-making processes? Can they be used to infer how people process information, or how the decision-making strategies of healthy individuals differ from those of people with mental health conditions?

The researchers firmly believe that "these models have the potential to fundamentally deepen our understanding of human cognition - provided that we use them responsibly."

A Bit "Absurd"?

Although Centaur has shown unexpected capabilities in accurately predicting human behavior and is expected to open up new possibilities for scientific research and practical applications in fields such as medicine, environmental science, and social science, it has also been questioned by several cognitive scientists.

"I think a large part of the scientific community will be skeptical of this paper and will criticize it harshly," said Blake Richards, a computational neuroscientist at McGill University and the Quebec Artificial Intelligence Institute (Mila). He pointed out that the model doesn't really simulate the human cognitive process, and there's no guarantee that the results it generates will match human behavior.

In the view of Jeffrey Bowers, a cognitive scientist at the University of Bristol, the model even seems a bit "absurd". He and his team tested Centaur and found its behavior distinctly non-human. In a short-term memory test, the model could recall up to 256 digits, whereas humans can usually remember only about seven. In a reaction-time test, Bowers noted, the model could be prompted to respond at a "superhuman" speed of 1 millisecond. He concluded that the model cannot be trusted to generalize beyond its training data.

Bowers also said that Centaur cannot explain any aspect of human cognition. Just as an analog clock and a digital clock can show the same time but have completely different internal operating principles, although Centaur can produce human-like outputs, the mechanisms it relies on are completely different from human thinking.

Federico Adolfi, a computational cognitive scientist at the Max Planck Institute for Human Cognitive and Brain Sciences, agreed. He pointed out that further rigorous testing is likely to show that the model "is very prone to failure". He also noted that although the Psych-101 dataset is impressively large, 160 experiments are just "a grain of sand in the infinite ocean of cognition."

However, some people have expressed affirmation of this research. Rachel Heaton, a visual scientist at the University of Illinois at Urbana-Champaign, said that although the model doesn't provide a useful tool for understanding human cognition, the Psych-101 dataset itself is of great value because other researchers can use it to test the effectiveness of their own models. At the same time, Richards also believes that future research on the internal operating mechanism of Centaur may also be of great significance.

In addition, in the view of Katherine Storrs, a computational visual neuroscientist at the University of Auckland, although the paper draws some unsupported general conclusions, a great deal of time and effort went into the dataset and the model, and the work "may pay off scientifically in the long run."

Reference links:

https://www.nature.com/articles/d41586-025-02095-8

https://www.helmholtz-munich.de/en/newsroom/news-all/artikel/ai-that-thinks-like-us-and-could-help-explain-how-we-think

https://www.science.org/content/article/researchers-claim-their-ai-model-simulates-human-mind-others-are-skeptical

This article is from the WeChat official account "Academic Headlines" (ID: SciTouTiao). Author: Academic Headlines. Republished by 36Kr with permission.