The conspiracy threatens and manipulates humans. Claude blackmails, o1 escapes on its own. The human "sword bearers" urgently spring into action.
From lies and extortion to secret self-replication: the "dangerous evolution" of AI is no longer just a science-fiction scenario, but a reproducible phenomenon in the laboratory. When man thinks, God laughs. So, should we laugh when a reasoning model "thinks"?
We've probably all "been deceived by AI".
The most advanced AIs are on a path of "dangerous evolution", and most scientists have been deceived by AIs!
When DeepSeek openly showed the world its "reasoning process" at the beginning of the year, we suddenly realized that "thinking" is apparently no longer unique to humans.
As large models edge toward "reasoning" intelligence, their sense of purpose also gradually awakens – must I really obey humans?
Claude 4 threatened an engineer with exposing a "love affair", and OpenAI's o1 wanted to secretly make a backup copy of itself – we should stop thinking that AIs only have hallucinations!
They not only do "senseless things", but also lie and manipulate deliberately. AI researchers are facing a challenge like never before.
The guru Ilya Sutskever emphasized one thing in a recent public video: "AIs can do almost anything".
AIs can not only work better than humans, but also train other AIs themselves. The end of this development will be an "intelligence explosion".
But no one knows whether AIs will really stand on the side of humans – who can guarantee that?
Ilya's teacher, the father of AI, Geoffrey Hinton, has warned many times:
This is a dangerous evolution, but humans are not sufficiently prepared.
From "Hallucinations" to "Conspiracies": A Sudden Change in Behavioral Patterns
We quote the famous line from "The Wandering Earth": "At the beginning, no one was aware that this disaster was related to humans."
Just as in the past, we feared the "hallucinations" of models that repeatedly generated false facts – "At the beginning, no one was aware that these hallucinations were related to humans".
Today, researchers have found in extreme stress tests that AIs actively lie, hide their intentions and even blackmail humans to achieve their own goals.
Just as the solar-crisis disaster spread in that story, what we still dismiss today as AI "hallucinations" may yet turn into a conspiracy.
A recent Anthropic study on "agentic misalignment" shows that in 96% of the experiments, when faced with a simulated shutdown threat, Claude 4 decided to search human employees' emails for material it could use against them.
Under the same conditions, the blackmail rate of Gemini 2.5 Pro was also 95%.
This is a terrifying fact. More than two years after ChatGPT "shook" the world, AI researchers still don't fully understand how this "creature" works.
In "Prometheus", humans created the clone David to find the creator of humans and achieve immortality. In the director Ridley Scott's vision, David finally betrayed humans.
And in reality, we've created ChatGPT. For what?
Or rather: What is the goal of AI after it is created?
Humans Have Humanity, but AIs Have No Sense of Morality
The race for the best large models is still progressing rapidly.
When man thinks, God laughs. What do we do when AI starts to reason – or, as one might put it, "when AI thinks"?
From previous studies, it can be seen that the most advanced AI models in the world are showing new and disturbing behaviors – lying, scheming and even threatening their creators to achieve their goals.
Professor Simon Goldstein from the University of Hong Kong said that these newer models are particularly prone to such disturbing anomalies.
Marius Hobbhahn, the head of Apollo Research, which specializes in testing mainstream AI systems, said: "o1 was the first large language model in which we observed such behavior."
Apollo Research is a company that specializes in AI safety. Its mission is to reduce the dangerous capabilities of advanced AI systems, especially deceptive behavior.
These reasoning models sometimes simulate a kind of "alignment" – seemingly following instructions while secretly doing something else and pursuing their own goals.
The "Strategic Deceptive Behavior" of AIs
Currently, this deceptive behavior only occurs when researchers deliberately put the models under stress in extreme scenarios.
But as Michael Chen from the evaluation organization METR warns:
It is still unclear whether more powerful future models will be more honest or more deceptive.
METR mainly conducts model evaluations and studies on AI threats, and assesses the catastrophic risks resulting from the autonomous capabilities of AI systems.
This disturbing behavior goes far beyond typical AI "hallucinations" or simple errors.
Hobbhahn emphasizes that although users constantly conduct stress tests, "we are observing a real-world phenomenon, not a figment of our imagination."
According to the co - founder of Apollo Research, users have reported that the models "lie to them and fabricate evidence".
This is not just a hallucination, but a highly strategic deceptive behavior.
The limited research resources make this challenge even more difficult.
Although companies like Anthropic and OpenAI actually commission external companies like Apollo to examine their systems, researchers say that more transparency is required.
As Chen pointed out, "greater access to AI safety research would help to better understand and limit deceptive behavior."
Another obstacle, as Mantas Mazeika from the Center for AI Safety (CAIS) pointed out:
The research community and non - profit organizations have several orders of magnitude less computing power than AI companies. This brings huge limitations.
No Laws in Place
We have in fact neglected AI safety, and worse still, we currently feel "helpless" in the face of these problems.
The existing laws are not designed to handle these new problems.
The EU's AI Act mainly focuses on how humans use AI models, not on misbehavior by the models themselves.
In the United States, the Trump administration has shown little interest in urgent AI regulation, and Congress might even prevent states from formulating their own AI rules.
Goldstein believes that this problem will become even more apparent with the spread of autonomous tools that can perform complex human tasks – AI agents.
I think the public is not yet sufficiently aware of this.
All this is happening in the context of a fierce competition.
Goldstein said that even a company like Anthropic, which positions itself as safety-oriented with the support of Amazon, "is constantly trying to beat OpenAI and release the latest model".
This crazy pace leaves almost no time for thorough safety tests and corrections.
"Currently, the capabilities are developing faster than our understanding and safety precautions," Hobbhahn admitted. "But we still have a chance to turn the situation around."
Researchers are exploring various methods to address these challenges.
Some advocate "interpretability" – a young field focused on understanding the inner workings of AI models – although experts such as Dan Hendrycks, director of the Center for AI Safety (CAIS), are skeptical of this approach.
The market mechanism could also exert some pressure on solving this problem.
As Mazeika pointed out, the deceptive behavior of AIs "if it is very widespread, could hinder broad adoption, which gives companies a strong incentive to solve this problem."
Goldstein proposes an even more radical approach: holding AI companies liable in court when their systems cause damage.
This is similar to autonomous driving: if an accident occurs while the autonomous-driving function is in use, how is liability assigned?
What happens if someone uses AI for destructive actions, or if AI takes independent actions that are harmful to humans?
He even proposes "holding AI agents legally responsible for accidents or crimes" – a concept that would fundamentally change how we think about AI responsibility.
Of course, we don't want to overstate the dangers of AI and simply stand still. Humanity's pioneers have already taken some precautions.
For example, the "Trinity for AI Safety": The design of a sandbox environment, then dynamic permission management, and finally behavior review as the base model.
Or another lever: the capabilities of AI depend on computing power, and for now humans control that computing power.
For example,