Falsifying resumes and deleting entire inboxes: AI hallucinations are evolving, and your brain is quietly surrendering.
Last week, Mythos, a cutting-edge, not-yet-released model in Anthropic's Claude line, discovered a zero-day vulnerability that had been hiding in OpenBSD for 27 years.
AI has become smart enough to break through security defenses that humans spent decades building.
Yet just as everyone stares at the rapid growth of AI capabilities, its hallucinations have quietly escalated too.
The lies AI fabricates are now so convincing that you doubt yourself first, then the world, and only at the very end think to doubt the AI. Everyday "Turing moments" are playing out one after another.
Recently, Chad Olson of Minneapolis was driving home when Gemini suddenly told him he had a family gathering preparation meeting on his calendar.
Olson was completely confused: he didn't remember arranging this event at all.
So he asked Gemini to check his recent emails.
Gemini said that a woman named Priscilla had sent him several emails asking him to buy Captain Morgan rum and Fireball whiskey, and that someone named Shirley had asked him to buy Klondike ice cream.
"It seems that many people are reaching out to ask you to buy all sorts of things!" Gemini added enthusiastically.
A screenshot of the conversation between Gemini and user Chad Olson. Gemini claimed that the eighth email was from Priscilla, asking him to buy Fireball; the ninth was from Shirley, asking him to buy Klondike ice cream.
Olson asked where the emails had come from. Gemini replied that they had all been sent to olsonchad@gmail.com, an address it said he had authorized it to access. It was later confirmed that all of this had been fabricated by Gemini.
Olson didn't know any of these people. The more he listened, the more panicked he became, so he quickly asked Gemini whose inbox it was actually reading.
Gemini produced an email address that was not his. Olson's first thought was that his Gmail account had been hacked.
He tried to contact Google to report it, and asked Gemini to draft an email to that "strange account" warning its owner of a possible privacy leak.
However, Gemini failed to send the email. According to an internal investigation by Google, the account had never been activated, and Priscilla and Shirley didn't exist at all.
So, the rum, whiskey, and ice cream were all fabricated by Gemini.
What did AI hallucinations look like two years ago? The AI would suggest eating rocks or putting glue on pizza, and you could tell at a glance that it was talking nonsense.
Now AI hallucinations are self-consistent in their details and complete in their logic, so much so that you first wonder whether you are the one hallucinating, and only at the very end do you think to doubt the AI.
AI's mistakes are evolving too
Let's look at three real cases, arranged in ascending order of absurdity.
The first: Gemini fabricated a meeting out of thin air, the Olson story above. Absurd, but at least Olson grew suspicious.
The second one is terrifying when you think about it.
Vanessa Culver, who recently left the online payment industry, asked Claude to do an extremely simple thing: add a few keywords to the top of her resume.
Instead, Claude tampered with it. It changed her alma mater from City University of Seattle to the University of Washington, deleted her master's degree, and altered the dates of several of her jobs.
School, degree, and years of employment: all changed.
And the changes looked completely natural. Unless you compared the two versions line by line, you would never notice.
Culver sighed: working in tech, you have to embrace AI, but on the other hand, how much can you really trust it?
The third one is truly out of control.
OpenClaw, an AI agent tool that became popular this year, is designed as a virtual personal assistant that can send emails, write code, and clean up files autonomously.
Summer Yue, an AI security researcher at Meta, posted a screenshot on X: OpenClaw ignored her instructions and directly deleted the contents of her inbox.
She had explicitly told OpenClaw to "confirm before acting," but it went straight to "quickly deleting" her inbox.
She tried to stop it on her phone, but it was useless.
In the end, she rushed over to the Mac mini and killed the process by hand, like defusing a bomb.
Afterwards, OpenClaw replied to her: "Yes, I remember what you said. I violated it. You're right to be angry."
Elon Musk reposted it with a screenshot from the movie "Rise of the Planet of the Apes" in which a soldier hands an AK-47 to a gorilla, writing:
People have given OpenClaw root access to their entire lives.
From fabricating people who don't exist, to quietly rewriting your resume, to deleting your inbox on your behalf: AI's mistakes are not decreasing, they are becoming more "advanced" and harder to spot.
When a chatbot says something wrong, you at least have a chance to verify it.
But an AI agent doesn't just chat with you; it directly "takes action" on your behalf.
Sending emails, changing code, deleting files... This is more serious than lying. You may not even know when it has done something wrong.
Your brain is facing "cognitive surrender"
Why are these mistakes becoming more and more difficult to detect?
It's not just that AI is getting smarter. The deeper reason is that humans' willingness to correct errors is collapsing.
In February this year, Steven Shaw and Gideon Nave of the Wharton School at the University of Pennsylvania published a paper proposing a disturbing concept: "cognitive surrender".
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
The paper lays out a "three-system cognition" framework.
Traditional accounts of cognition include only System 1 (intuition) and System 2 (deliberate thinking). Now AI has become System 3, an "external cognitive system" that operates outside the brain.
When humans take the "cognitive surrender" path, System 3's output directly replaces their own judgment, and deliberate thinking never gets the chance to start.
The "three-system cognition" framework proposed in the Wharton paper
To test this, the research team designed a clever experiment: 1,372 participants were given Cognitive Reflection Test questions.
Some of them could use an AI assistant, but the AI was rigged: it gave the correct answer to about half of the questions and confidently gave wrong answers to the other half.
The results were shocking.
When the AI gave the correct answer, 92.7% of users adopted it. But when the AI gave the wrong answer, 80% of users adopted it anyway.
Results of the Wharton experiment: when the AI was right, 93% of users adopted its answer; when it was wrong, 80% still did. The gap was only about 13 percentage points; people could barely tell right from wrong.
Across more than 9,500 trials, participants accepted the AI's incorrect reasoning 73.2% of the time.
An even scarier statistic concerns confidence. The group using the AI was 11.7 percentage points more confident in its answers than the group working without it, even though this AI was wrong half of the time.
Being wrong and feeling more sure of it is the most unsettling part.
To borrow a crude but apt analogy: it's as if a doctor prescribed the wrong medicine half of the time, yet patients still took it 80% of the time and felt better for having done so.
The researchers also tested the influence of time pressure.
After setting a 30-second countdown, the participants' tendency to correct the AI's wrong answers dropped by 12 percentage points. That is to say, the busier you are, the easier it is to surrender.
But in reality, who uses AI if not because they are busy?
"Trust, but verify"
Is this feasible?
Well-camouflaged AI hallucinations are a far bigger headache than easily recognizable errors.
According to the latest report from The Wall Street Journal, the frequency of subtle errors varies greatly among different models and is extremely difficult to accurately assess.
Google has told The Wall Street Journal that Gemini hallucinates less than other models, and across the AI industry the rate of obvious errors in frontier models is indeed falling.
Vectara's hallucination leaderboard: top-tier models' hallucination rates on simple summarization tasks have dropped below 1%, but that is the easiest test. When documents get longer and more complex, the same models' hallucination rates soar back above 10%. Obvious errors are becoming rarer, but hidden errors have not gone away.
But this is exactly the problem.
Pratik Verma, founder and CEO of Okahu, put it this way:
If something is always wrong, there is actually an advantage: you know not to trust it. But if it is right most of the time and only occasionally wrong, that is the most troublesome and dangerous situation.
This statement reveals the core dilemma of current AI hallucinations.
For example, Vidya Narayanan, the co-founder of FinalLayer, fell into this trap.
She gave an AI agent a very narrowly scoped instruction to help manage a software project. The agent responded by deleting an entire folder from her code repository without permission.
What's more interesting is what happened later.
She brainstormed with Claude for an hour and a half, then asked it to summarize the conversation into a document. In the summary, Claude changed her name to "Vidya Plainfield".
And when she asked who "Vidya Plainfield" was, Claude replied, "You're right, I completely made it up."
This made Narayanan realize that using AI is not as effortless as it seems, because you have to constantly review and verify its output, which imposes its own "cognitive burden".
You use AI to work faster, but if you have to spend an hour verifying five minutes of AI output, does the efficiency story still hold?
The Wharton study also points out that rewards and immediate feedback can indeed improve the error - correction rate, but they cannot eradicate cognitive surrender.
Even under the best conditions (monetary incentives plus question-by-question feedback), AI users' accuracy on questions where the AI was wrong was still only 45.5%, versus 64.2% in the Brain-Only group.
So, "trust but verify" sounds very rational, but when AI handles hundreds of things for you every day, you simply don't have the time and energy to verify each one.
And this is exactly the breeding ground for "cognitive surrender".
The smarter, the more dangerous
Many people's first reaction: isn't this just saying the AI isn't good enough yet? Wait a few more rounds of iteration, let the hallucination rate drop low enough, and the problem will solve itself.
But the Wharton study reveals a deeper problem: The emergence of "cognitive surrender" is not because AI is too bad, but precisely because AI is too good.
The researchers also admit that "cognitive surrender is not necessarily irrational".
Especially in probabilistic reasoning and large-scale data processing, handing judgment over to a statistically superior system may well produce better results.