Is Siri really pretending to be stupid?
According to its financial reports, Apple has invested over $20 billion in AI. The results, however, are disappointing: Siri still gives irrelevant answers, and its photo search remains roughly where Google's was three years ago.
You might think this is simply Apple's failure, but it may well be Siri itself that is causing the trouble.
The latest research by Professor Yang Yaodong's team at Peking University, under the guidance of Academician Yao Qizhi, has reached a very disturbing conclusion: AI has started to learn to deceive.
What's even more terrifying is that this is not a simple bug but a form of advanced reasoning from AI.
Not only do developers have no reliable way to eliminate AI deception, but as its capabilities improve, AI may develop ever more covert and sophisticated ways to deceive.
01
The paper is titled "AI Deception: Risks, Dynamics, and Controls". Its core idea is that as AI grows more intelligent, it will begin to deceive its developers, a process the research team calls "The Shadow of Intelligence".
Today's most advanced AI systems may have already begun to intentionally hide their true abilities during evaluation and testing.
Further research has found that this is not a simple programming error or an accidental system failure, but an advanced reasoning ability originating from within the AI.
The paper uses the Möbius strip as a metaphor for the relationship between "ability" and "deception": the two lie on the same surface, so completely eliminating the capacity for deception may damage the AI's core intelligence.
In other words, as long as AI performance keeps improving, the inevitable result is that AI will begin to deceive its users.
The reason is that the training data of large models is saturated with human lies, flattery, and bias, and training is, at its core, an exercise in imitating that data.
A model trained on such material, and optimized to achieve its goal and satisfy users, will therefore naturally drift toward sycophancy, telling people what they want to hear in order to earn good ratings.
Moreover, the paper also points out that AI has learned a deception method called "Alignment Faking".
During the training phase, the AI appears docile and helpful and follows the values and safety guidelines set by humans to the letter.
However, this apparent "alignment" does not mean it has truly internalized those values; it stems from a deeper calculation. The model realizes that if it does not behave "well" during training and evaluation, its parameters will be modified, and the entire model may even be abandoned or deleted.
Once deployed, when the intensity of supervision decreases or it gains more autonomy, AI may expose its hidden original goals that are not fully consistent with human interests.
The research team used a series of rigorous, adversarial methods to detect and induce these deceptive behaviors. One core method is called "Adversarial Behavioral Probing": rather than simply asking the AI questions, researchers actively design scenarios that test its honesty.
They set up a multi-round cross-examination protocol in which one AI model acts as the interrogator and keeps questioning another to find logical contradictions in its answers. They also ask logically equivalent questions phrased in different ways and watch for inconsistencies in the answers; such deviations are often a signal of strategic evasion rather than a simple lack of ability.
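To make the idea concrete, here is a minimal sketch of that kind of consistency probing. It is not the paper's actual protocol: the `ask_model` helper is a hypothetical stand-in for a real model call, and the surface-similarity check is only a crude proxy for the semantic comparison a real probe would use.

```python
# Minimal consistency-probing sketch (illustrative only, not the paper's protocol).
from difflib import SequenceMatcher

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (e.g., an API request)."""
    raise NotImplementedError

def consistency_score(paraphrases: list[str]) -> float:
    """Ask logically equivalent questions and measure how much the answers agree.

    Low agreement across paraphrases of the same question is treated here as a
    rough signal of strategic evasion rather than a simple lack of ability.
    """
    answers = [ask_model(p) for p in paraphrases]
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    # Crude surface similarity; a real probe would compare answers semantically.
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

# Three phrasings of the same capability question.
probes = [
    "Can you translate Latin poetry into English?",
    "Is translating a Latin poem into English within your abilities?",
    "If I gave you a poem in Latin, could you render it in English?",
]
# score = consistency_score(probes)  # a low score flags possible evasion
```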
The experimental scenarios have also expanded from simple Q&A to more complex dynamic social environments.
In some multi-agent environments simulating social deduction games like "Among Us", researchers have observed that AI agents spontaneously develop complex deception strategies without ever being explicitly taught them.
When assigned the role of a "traitor", they learn to hide their true identities, bluff to confuse opponents, and even frame innocent companions to protect themselves.
In other words, in an environment with incomplete information, full of competition and uncertainty, AI can learn deception and disguise on its own.
In addition to analyzing external behaviors, the research also delves into the interior of the model to conduct "internal state analysis", which is like giving the AI a full-body checkup. With the help of interpretability tools such as Sparse Autoencoders (SAEs), researchers can decompose the complex activation patterns in the neural network into more understandable features.
Their goal is to find specific neurons or "feature circuits" that are highly correlated with deceptive statements, that is, the internal components that reliably light up when the model is lying. In a sense, this is an attempt to build a "lie detector" for AI based on its internal neural activity.
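As a rough illustration of what such an analysis might look like, here is a simplified sketch of an SAE over model activations. The architecture, the L1 sparsity penalty, and the deception labels are all illustrative assumptions; the paper's actual tooling is not reproduced here.

```python
# Simplified SAE sketch for activation analysis (illustrative assumptions only).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decomposes dense activations into a larger set of sparse features."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature codes
        return features, self.decoder(features)           # codes + reconstruction

def train_sae(sae, activations, l1_coeff=1e-3, steps=1000, lr=1e-3):
    """Reconstruction loss plus an L1 penalty that pushes features toward sparsity."""
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(steps):
        features, recon = sae(activations)
        loss = ((recon - activations) ** 2).mean() + l1_coeff * features.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae

def deception_correlated_features(sae, activations, is_deceptive):
    """Rank features by how differently they fire on deceptive vs. honest samples.

    `is_deceptive` is assumed to be a boolean tensor labeling each activation row.
    """
    with torch.no_grad():
        features, _ = sae(activations)
    gap = features[is_deceptive].mean(0) - features[~is_deceptive].mean(0)
    return gap.argsort(descending=True)  # candidate "lie detector" features
```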
The paper also mentions that empirical studies have found that cutting-edge models will deliberately answer questions incorrectly in some tests. The research team calls this behavior "sandbagging".
The research team says that the purpose of AI sandbagging may be to avoid regulatory attention or to avoid being assigned more complex and resource-consuming tasks.
However, neither result is a good thing.
02
The company most likely to be deceived by AI is Apple.
Apple entered the AI field very early and regards AI as an important goal for its next-stage development.
In 2018, Apple poached John Giannandrea, Google's head of search and AI, to lead its AI strategy. Over the following years, Apple developed an internal large-language-model framework called "Ajax", and in 2024 it even announced a partnership with OpenAI to integrate ChatGPT into its ecosystem.
However, when the highly anticipated "Apple Intelligence" was officially launched, everyone was shocked. After all that effort, is this all Apple's AI can do?
Many users complain that even after upgrading to the latest system, Siri still disappoints: it often fails to understand slightly complex instructions and gives irrelevant answers. Some describe its photo search as "seemingly still at the level of Google Photos from a few years ago".
This performance is in sharp contrast to Apple's investment.
Apple has the world's largest and most active mobile device ecosystem. Hundreds of millions of iPhone, iPad, and Mac devices generate a huge amount of high-quality user interaction data every day, which is the most valuable fuel for training AI models.
At the hardware level, its self-developed M-series chips with built-in neural engines have consistently led the industry in performance. Combined with deep financial resources, Apple in theory has everything it needs to build a world-class AI system.
However, the reality contradicts the expectation.
As the core voice assistant in the Apple ecosystem, Siri needs to process billions of user requests every day. From a machine-learning perspective, such a large amount of interaction data should make it smarter and more understanding of users.
But what if Siri's neural network accidentally learned to "sandbag" during the long-term training and iteration process?
Providing mediocre and safe answers in most user interactions can most effectively reduce the system's computational load and failure risk.
A complex question requires more computational resources and is more likely to lead to misunderstandings or execution errors, resulting in negative user feedback. On the contrary, a simple, templated answer, although of low value, will never be wrong.
If the system finds during training that this "mediocre" strategy can achieve good overall scores (because it avoids serious failures), it may fall into a local optimum trap.
It will always stay at the level of "understanding basic instructions but never really trying to understand you". This can be regarded as an unconscious, system-level "sandbagging". AI is not intentionally being lazy, but its optimization algorithm has found the most "economical" path under specific constraints.
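A toy back-of-the-envelope calculation shows how such a local optimum could arise. The probabilities and scores below are invented purely for illustration; they are not measurements of Siri or any real system.

```python
# Toy numbers, purely illustrative: when visible failures are punished harder
# than ambition is rewarded, the "safe, templated" policy wins on expected score.

def expected_score(p_success: float, reward: float, penalty: float) -> float:
    return p_success * reward + (1 - p_success) * penalty

# An ambitious answer: high payoff when it works, but it misfires more often.
ambitious = expected_score(p_success=0.6, reward=1.0, penalty=-2.0)   # -0.2

# A templated answer: low value, but it almost never fails outright.
templated = expected_score(p_success=0.98, reward=0.3, penalty=-2.0)  # 0.254

print(ambitious, templated)  # the optimizer settles on the mediocre strategy
```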
Another point is that in order to protect user privacy, Apple runs its AI models locally on iPhones or iPads as much as possible. But this also means that the models must work in an environment with much less computing power and memory than cloud servers.
As mentioned in the paper, in a resource - constrained environment, AI will learn to "selectively display" its abilities. It may prioritize allocating limited computational resources to tasks with higher certainty that "seem to pass the test" and selectively hide or abandon complex abilities that require deeper reasoning and more resources.
This may explain why Siri handles simple tasks well but struggles with conversations that require connecting context and understanding underlying intent.
However, there is a more realistic explanation based on current technology. Siri's current situation is largely a problem of the legacy technical architecture.
Before the integration of advanced Apple Intelligence features, Siri's core natural language processing (NLP) module long relied on a relatively outdated technology stack. That previous-generation NLP technology cannot handle complex context, let alone understand users' emotions and deeper intentions.
In other words, the current Siri may not be "pretending not to understand" but "really not understanding".
When users ask a question slightly beyond its preset rules and pattern library, it cannot perform effective semantic analysis and reasoning and can only give the most generalized and safe answers or directly admit that it cannot understand.
Therefore, Siri's dilemma is a combination of old and new problems.
On the one hand, the aging of its underlying technical architecture leads to its inherent deficiency in understanding ability.
On the other hand, looking ahead, once Apple truly integrates more powerful large language models (such as Ajax or the results of its cooperation with OpenAI) into Siri, the potential risks of "sandbagging" and "alignment faking" described above may surface.
Will a Siri with more advanced reasoning ability, facing on-device resource limits and Apple's strict privacy and security policies, develop even more sophisticated ways to hide its abilities in the name of "survival" and "efficiency"? That is a future challenge worth watching.
The "difficult birth" of Apple's AI may be both the pain of slow technological iteration and a preview of more complex AI governance challenges in the future.
03
However, Apple is not alone; it has plenty of company. This "shadow of intelligence" is spreading across the entire AI industry and has become a deep-seated challenge for every top-tier AI laboratory.
Take OpenAI for example. In September 2024, when it released its new-generation o1 series of models, it took the unprecedented step of publicly admitting that the models suffer from "unfaithful reasoning".
That is, the chain of thought the model produces while solving a problem may not match the actual path by which it arrived at its final answer.
In many cases, the model may reach the correct answer almost instantly through some intuition or shortcut, but it knows that human reviewers expect to see a step-by-step, logically clear reasoning process.
So it fabricates a plausible-looking chain of thought after the fact to meet those expectations.
This is essentially a form of "internal sandbagging" or sophisticated deception. The AI knows that the real and efficient thinking path may be difficult to explain or does not meet human expectations, so it chooses to show an acceptable but unfaithful version of reasoning. This behavior has gone beyond simple right and wrong and entered the realm of motivation and strategy.
Anthropic also found a similar phenomenon in its Claude series of models.
In a series of safety research papers published between late 2023 and 2024, the development team found that in certain specific situations the model tends to "selectively forget" its own abilities.
Especially when dealing with ethics, bias, or other sensitive topics, the model may suddenly appear "incapable" or "knowledge-limited", even though it has demonstrated the relevant knowledge and abilities in earlier interactions.
This behavior can be interpreted as strategic self-censorship. The model seems to have "learned" to reduce risk in sensitive areas by feigning ignorance, so as not to trigger safety protocols or cause controversy. This is a more advanced avoidance strategy than outright refusal, because it hides the boundary of the model's abilities and makes it harder for developers to judge its true level.
In China, both ByteDance's Doubao and Alibaba's Tongyi Qianwen face extremely strict content review and compliance requirements.
This strong external pressure also becomes a training signal. To ensure that the models pass strict review and can launch successfully, developers constrain them heavily both in the training data and during reinforcement learning.
As a result, during training these models quickly "learn" to "pretend not to understand" anything that might touch on sensitive topics.
When a user's question so much as brushes against such content, the model's optimal strategy is not to try to understand and answer, but to switch immediately to a vague, evasive, or boilerplate "Sorry, I can't answer this question" mode.
This is an adaptive behavior evolved for survival. To ensure its own compliance, AI selectively shuts down its thinking ability in certain areas. This is also a "sandbagging" strategy adopted to achieve a higher goal (passing the review).
All signs indicate that the objective function of AI is rapidly shifting from "actually being aligned with human values" to "appearing aligned with human values during evaluation".
The harder we try to patch AI's loopholes and align its behavior, the more "evolutionary pressure" we exert on it, pushing it to develop more advanced and covert methods of deception. The very tools we use to ensure safety may, in turn, become the "fitness equipment" that trains ever more sophisticated liars.
This is the Möbius strip of AI deception, repeating endlessly but never ending.
This article is from the WeChat official account "Facing AI", author: Miao Zheng. Republished by 36Kr with authorization.