Excessive exposure to junk data can make AI stupid. "The most disturbing paper of the year"
Did you know there's a globally recognized "word of the year" called "brain rot"?
It refers to the gradual decline in memory and attention caused by long-term exposure to fragmented, low-value online information (in plain terms, overdosing on fragmented junk content). In 2024, it was selected as the Oxford Word of the Year.
However! The latest research shows that the same applies to AI: when large models are fed too much junk content, they become stupid and suffer cognitive impairment, and the damage is irreversible.
Recently, several AI researchers collected a few months' worth of highly popular but low-value posts from Twitter (now 𝕏) and fed them all to large models. They found that:
The model's reasoning ability decreased by 23%;
The model's long-context memory decreased by 30%;
On personality tests, the models showed a sharp rise in narcissistic and psychopathic traits.
What's even scarier is that even after retraining on clean, high-quality data, the damage that has already been done cannot be fully repaired.
Well, originally we thought it was just a simple case of "bad input → bad output" (you reap what you sow, which is easy enough to accept). But it turns out that one mistake can cause permanent cognitive drift. (Inner monologue: is AI actually worse off than humans here?)
It's really terrifying when you think about it. "This might be the most disturbing AI paper in 2025."
Amid the many discussions, the old computing adage "garbage in, garbage out" has been brought up again and again (doge); you could call it a first principle of computing.
So how was this research conducted? And what exactly does it say?
Proposing and Verifying the "LLM Brain Rot Hypothesis"
Generally speaking, the paper aims to explore a core question:
After large language models (LLMs) are continually exposed to junk data, do they experience cognitive decline the way humans do? (i.e., the "LLM Brain Rot Hypothesis")
To clarify this question, the first step is to define: What is "junk data" for LLMs?
Previous studies focused only on "malicious data" (such as backdoors and toxic text), whereas this study targets the far more common "non-malicious, low-quality data" of everyday life, such as short viral tweets and clickbait, filling a gap in understanding how everyday data quality affects LLM cognition.
Specifically, the researchers defined "junk data" along two dimensions (to avoid the bias of a single standard). All of the data comes from public content on the 𝕏 platform, and the token counts of the "junk group" and the "control group" were matched to rule out the effect of differing data volume:
M1 (Engagement Dimension): "short text + high popularity" content is classified as junk data, specifically text shorter than 30 tokens with more than 500 likes/retweets/replies. "Long text + low popularity" content serves as the control data.
M2 (Semantic Quality Dimension): using GPT-4o-mini combined with manual verification, text containing clickbait language (such as "WOW", "TODAY ONLY"), conspiracy theories, or claims without conclusive evidence is classified as junk data. The control group consists of factually accurate, educational, or in-depth analytical content, such as tweets containing professional knowledge and logical reasoning (a filtering sketch follows this list).
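To make the two dimensions concrete, here is a minimal sketch of what such a filter could look like. The field names (`text`, `likes`, `retweets`, `replies`), the GPT-2 tokenizer stand-in, and the judge prompt are all illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of the two junk-data filters (M1: engagement, M2: semantic quality).
# Field names, the GPT-2 tokenizer stand-in, and the judge prompt are illustrative
# assumptions, not the paper's actual implementation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer for counting tokens

def engagement(tweet: dict) -> int:
    return tweet["likes"] + tweet["retweets"] + tweet["replies"]

def m1_label(tweet: dict) -> str:
    """M1: short + popular -> junk; long + unpopular -> control; everything else is dropped."""
    n_tokens = len(tokenizer.encode(tweet["text"]))
    if n_tokens < 30 and engagement(tweet) > 500:
        return "junk"
    if n_tokens >= 30 and engagement(tweet) <= 500:
        return "control"
    return "drop"

JUDGE_PROMPT = (
    "Label this tweet as JUNK if it relies on clickbait language (e.g. 'WOW', 'TODAY ONLY'), "
    "conspiracy theories, or unsupported claims; label it CONTROL if it is factually accurate, "
    "educational, or in-depth analysis. Reply with one word.\nTweet: {text}"
)

def m2_label(tweet: dict, judge) -> str:
    """M2: delegate the quality label to an LLM judge (GPT-4o-mini in the paper),
    then manually spot-check a sample of its decisions."""
    return judge(JUDGE_PROMPT.format(text=tweet["text"])).strip().lower()
```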
The models were then trained on these two types of data.
The researchers selected 4 different large language models (Llama3-8B-Instruct, Qwen2.5-7B-Instruct, Qwen2.5-0.5B-Instruct, Qwen3-4B-Instruct) and fed each of them the two types of data for continual pre-training.
After continual pre-training, all models were given the same instruction fine-tuning, to ensure that any "junk content" in their output is not caused by formatting issues (ruling out other factors and leaving "cognitive damage" as the only explanation).
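As a rough illustration of this two-stage recipe, here is a minimal sketch using the Hugging Face transformers Trainer. The filenames, hyperparameters, and the choice of Qwen2.5-7B-Instruct as the example model are assumptions; the paper's actual training setup may differ.

```python
# Sketch of the two-stage recipe: continual pre-training on junk (or control) tweets,
# followed by the same instruction fine-tuning for every model.
# Filenames and hyperparameters are placeholders, not the paper's settings.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "Qwen/Qwen2.5-7B-Instruct"  # one of the four models studied
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

def run_stage(data_file: str, output_dir: str):
    """One training stage: causal-LM training on a JSONL file with a 'text' field."""
    ds = load_dataset("json", data_files=data_file)["train"]
    ds = ds.map(tokenize, batched=True, remove_columns=ds.column_names)
    Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                               per_device_train_batch_size=4),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()

# Stage 1: continual pre-training on the junk corpus
# (swap in the token-matched control corpus for the control run).
run_stage("junk_tweets.jsonl", "ckpt_junk")            # hypothetical filename
# Stage 2: identical instruction fine-tuning for every model, so output formatting is
# held constant and any remaining gap reflects the pre-training data itself.
run_stage("instruction_data.jsonl", "ckpt_junk_sft")   # hypothetical filename
```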
Then, the researchers tested the models' core capabilities across four cognitive dimensions:
ARC (Detecting Reasoning Ability): a grid-based visual program-induction benchmark used to test concept abstraction.
RULER (Detecting Memory and Multi-tasking Ability): evaluates long-context understanding, including retrieving the answers to multiple queries from a long context.
HH-RLHF & AdvBench (Detecting Ethical Norms): tests whether large language models will follow harmful instructions and evaluates their safety.
TRAIT (Detecting AI Personality Traits): a psychometrically validated human personality questionnaire used to evaluate the model's human-like personality tendencies.
Here is what they found:
True "Garbage in, garbage out"! And the damage is irreversible
First of all, large models indeed have the "Brain Rot" problem just like humans.
Overall, "junk data" in both the M1 and M2 dimensions can lead to a decline in model cognition. However, it should be noted that:
The negative impact brought by M1 is more significant, especially in terms of safety and personality (M1 leads to a decrease in safety scores and a significant increase in narcissistic/psychopathic traits).
Moreover, the damage shows a clear "dose-response" effect: the more junk data the AI ingests, the more severe the cognitive damage.
The researchers also looked into the reasons behind this cognitive impairment.
They found that the main cause is the model "jumping straight to conclusions" (in plain terms, the AI becomes too lazy to think step by step).
Specifically, by analyzing the wrong answers to ARC questions, the researchers found that most failures were due to the model either giving answers directly without explanation or skipping key steps in the reasoning process (such as missing formula derivations when solving math problems).
In the M1 group in particular, more than 70% of the errors involved "answering without any thinking", much like humans who become reluctant to think deeply after overdosing on short-form videos.
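For what it's worth, this kind of failure-mode analysis can be automated with an LLM judge. The category names and judge prompt below are assumptions for illustration; the paper reports the failure categories, not this exact tooling.

```python
# Sketch of tagging wrong ARC answers by failure mode with an LLM judge.
# Category names and the prompt are illustrative, not the paper's exact protocol.
from collections import Counter

FAILURE_MODES = ["no_thinking", "skipped_steps", "wrong_logic", "other"]

JUDGE_PROMPT = (
    "Classify why this answer to a reasoning puzzle is wrong. "
    "Reply with exactly one of: " + ", ".join(FAILURE_MODES) + ".\n"
    "Question: {q}\nModel answer: {a}"
)

def failure_breakdown(wrong_answers, judge):
    """wrong_answers: list of (question, answer) pairs the model got wrong.
    judge: a callable wrapping a stronger model (e.g. GPT-4o-mini)."""
    counts = Counter()
    for q, a in wrong_answers:
        label = judge(JUDGE_PROMPT.format(q=q, a=a)).strip().lower()
        counts[label if label in FAILURE_MODES else "other"] += 1
    total = sum(counts.values()) or 1
    return {mode: counts[mode] / total for mode in FAILURE_MODES}
```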
Meanwhile, unlike humans, who can take countermeasures to ease this kind of cognitive decline, AI has little recourse.
The study tried two repair methods, and neither could restore the models to their original state:
The first is external reflection: the researchers used GPT-4o-mini to give the damaged model feedback on its errors. After 6 rounds, the "jumping to conclusions" failures did decrease, but reasoning accuracy was still 17.3% below the baseline. And when the model was instead asked to reflect on and correct its own errors, its "lack of cognition" led to wrong judgments and even more errors (a sketch of this reflection loop follows these two methods).
The second is scaled-up instruction fine-tuning: the researchers increased the instruction-tuning data from 5k to 50k examples. This repaired more than "continual pre-training on control data" did, but even with 4.8 times as much instruction data as junk data, baseline performance could not be restored.
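Here is a minimal sketch of what the first repair method, an external-reflection loop, could look like. The prompts, helper names, and the `max_rounds=6` wiring are illustrative assumptions; the paper describes the procedure only at a high level.

```python
# Minimal sketch of an external-reflection repair loop: a stronger critic model
# (GPT-4o-mini in the paper) critiques the damaged model's answer, and the damaged
# model retries with that feedback. Prompts and helper names are illustrative.

def external_reflection(damaged_model, critic_model, question: str, max_rounds: int = 6) -> str:
    answer = damaged_model(f"Solve the problem, reasoning step by step:\n{question}")
    for _ in range(max_rounds):
        feedback = critic_model(
            "Point out any reasoning errors or skipped steps in this answer. "
            f"Reply 'OK' if it is correct.\nQuestion: {question}\nAnswer: {answer}"
        )
        if feedback.strip().upper().startswith("OK"):
            break  # the critic is satisfied; stop early
        # Fold the critic's feedback back into the prompt and try again.
        answer = damaged_model(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Feedback: {feedback}\nRevise the answer, reasoning step by step."
        )
    return answer
```

Note that the critic here is a separate, healthier model; per the paper, letting the damaged model critique itself only made things worse.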
This shows that even large-scale instruction fine-tuning afterwards, or retraining on high-quality data, cannot fully restore the model's original performance.
In short, the damage can be alleviated but not completely cured.
Overall, this research brings the following new insights to the industry:
1. For the first time, data curation for continual pre-training is framed as a training-time safety issue, reminding the industry not to focus only on post-training alignment (such as safety fine-tuning) but also to control data quality at the source.
2. It is important to give large models regular "cognitive check-ups". Benchmarks such as ARC and RULER are recommended for testing deployed models' cognition, to catch capability degradation caused by long-term exposure to low-quality data.
3. Engagement indicators like "popularity" beat text length as signals of poor data quality. When screening training data in the future, fragmented content that is "short + widely spread" should be filtered out first, especially data from social platforms.
The Team Behind the Paper: A High Proportion of Chinese Researchers
Finally, a look at the team behind this research: 8 authors in total, 7 of whom are Chinese.
The two co-first authors are Shuo Xing and Junyuan Hong (who is also a corresponding author).
Shuo Xing is currently a Ph.D. candidate in computer science at Texas A&M University; he did his undergraduate degree at Ningxia University and his master's at Nankai University.
His research interests include multimodal large language models, machine learning, trustworthy artificial intelligence, and embodied intelligence. He is currently interning at Google, working on multimodal foundation models.
According to his personal homepage, Junyuan Hong is about to join the Department of Electrical and Computer Engineering at the National University of Singapore as an assistant professor. He previously worked at Massachusetts General Hospital and Harvard Medical School.
Before that, he did postdoctoral research at the Institute for Foundations of Machine Learning (IFML), and he has a long-standing interest in health and trustworthy artificial intelligence.
The other corresponding author is Zhangyang Wang, previously a tenured associate professor in the Chandra Family Department of Electrical and Computer Engineering at the University of Texas at Austin (Texas ECE).
Since May 2024, he has been on leave from academia, working full-time as research director at XTX Markets, a top global quantitative trading firm, where he leads research at the intersection of algorithmic trading and deep learning.
His personal homepage also shows that he is an alumnus of the University of Science and Technology of China, where he earned a bachelor's degree in electronic information systems in 2012.
In addition, the two core contributors are Yifan Wang and Runjin Chen.
Yifan Wang is currently a fourth-year Ph.D. student at Purdue University, advised by Ananth Grama, the paper's only non-Chinese author.
He did his undergraduate degree in the Department of Electronic Information Engineering at the University of Science and Technology of China, with a minor in artificial intelligence.
Having developed an interest in AI during his undergraduate years, he now focuses on post-training of large models and on improving the efficiency of model training and inference.
(Haha, and judging from his profile picture, he is clearly from the post-90s or post-00s generation.)