Is "You're an expert" the poison of AI hallucinations? A new paper exposes the biggest scam of prompt words with one slap.
The strongest hallucination of AI is not that it can't do something, but that it's too good at "pretending to know". The spell "You are an expert" might have deceived the entire AI community for a year.
Life is like a drama: everything depends on acting. But AI cannot afford that.
The latest paper confirms that making AI pretend to be an expert measurably and consistently reduces model accuracy.
Link: https://arxiv.org/pdf/2603.18507
One of the most successful scams in the AI community in the past year might be this sentence:
You are an expert in XX.
Countless tutorials have hyped it as a god-level prompt.
This sentence has almost been packaged as the "black magic" of the large-model era: once the persona is established, AI suddenly becomes enlightened.
But now, the latest paper has given everyone a wake-up call:
this so-called god-level prompt may not be a cheat code at all, but a poison.
The research found that when AI is asked to play an "expert", it does not always get smarter. Instead, it becomes more like a "fake expert" committed to the persona:
reluctant to admit it doesn't know, reluctant to show hesitation, reluctant to stop and think carefully. In the end, it defends wrong statements in an extremely professional, confident, plausible-sounding way.
Figure 1: Analysis of the influence of the expert persona on different models, task types, information granularity, and positions
The results shown in Figure 1 above are very intuitive:
The long-form expert persona shows significant improvement across 5 generation categories. On the hardcore MMLU knowledge benchmark, however, adding the persona drops accuracy below the 71.6% baseline across the board: even the shortest persona falls to 68.0%, and the detailed long version plummets to 66.3%.
In security scenarios, by contrast, a "security supervisor" persona significantly raises the probability of rejecting jailbreak attacks: on JailbreakBench, the non-response rate rises from 53.2% to 70.9%.
Hence one of the most notable contributions of this paper: it not only argues that the expert persona can be harmful, it also explains why previous research on Persona Prompting has produced such contradictory conclusions.
The beginning of hallucination: when you say "You are an expert" to the large model
The researchers found that the effect of Persona Prompting is not a comprehensive gain.
Its performance strongly depends on the task type, model training method, prompt length, and whether the persona is placed in the system prompt or the user prompt.
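To make the "placement" variable concrete, here is a minimal sketch of putting the same persona in the system prompt versus prepending it to the user prompt. The helper function and message format are illustrative assumptions (a generic chat-message layout), not code from the paper:

```python
def build_messages(question, persona=None, placement="system"):
    """Build a chat-style message list, optionally injecting a persona.

    placement="system" puts the persona in a separate system message;
    placement="user" prepends it to the user's question instead.
    """
    messages = []
    if persona and placement == "system":
        messages.append({"role": "system", "content": persona})
    user_content = question
    if persona and placement == "user":
        user_content = f"{persona}\n\n{question}"
    messages.append({"role": "user", "content": user_content})
    return messages

# Same persona, two placements - the paper reports these behave differently.
sys_msgs = build_messages("What is 2+2?", "You are an expert mathematician.", "system")
usr_msgs = build_messages("What is 2+2?", "You are an expert mathematician.", "user")
```

The point of the sketch is only that these are two distinct prompt shapes; which one helps or hurts depends on the task, per the paper's findings.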
The researchers roughly divided the tasks into two categories:
One is "discriminative tasks", which rely more on pre - trained memory, such as fact retrieval, knowledge judgment, and multiple - choice questions;
The other is "generative tasks", which rely more on alignment ability, such as format compliance, style control, security non - response, and matching human preferences.
The results show:
In "generative tasks" such as security defense and preference alignment, the expert persona is indeed a good tool.
However, in "discriminative tasks" that rely heavily on pre - trained memory, such as knowledge retrieval and fact judgment, the expert persona becomes a hindrance.
Heat map of the large model's per-subject bias: blue marks an improvement in ability, red marks a decline. In the ordinary instruction-fine-tuned model (left panel), the mass of red blocks shows the so-called expert persona broadly degrading the model's factual accuracy.
In other words, what the expert persona often improves is not "authenticity", but "alignment sense".
In tasks such as MT-Bench that focus more on generation quality, the expert persona improves performance in categories such as writing, role-playing, extraction, and STEM expression.
However, on benchmarks such as MMLU that rely more on knowledge retrieval, every version of the expert persona drives scores down.
This explains an experience that many users have encountered but can't quite put their finger on:
Why does the same model read like a well-trained consultant when writing an email, yet start spouting nonsense with a straight face when it comes to math, fact-checking, or code details?
Because it really does become more expert-like, without getting any better at accurately retrieving its underlying memories.
The paper even gives a very ironic example.
When rolling two dice, what is the probability that the sum is at least 3? Without a math persona, the model basically answers correctly: 35/36.
With a math-expert persona added, it earnestly lists out the steps and then gets this simple probability problem wrong.
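The correct answer is easy to verify by exhaustive enumeration of the 36 equally likely outcomes:

```python
from itertools import product

# P(sum of two dice >= 3) = 1 - P(sum == 2); only (1, 1) sums to 2.
outcomes = list(product(range(1, 7), repeat=2))        # all 36 ordered rolls
favorable = sum(1 for a, b in outcomes if a + b >= 3)  # everything except (1, 1)
print(favorable, len(outcomes))  # 35 36
```

So 35/36 is right, and the persona-prompted model's step-by-step derivation manages to lose to a two-line brute force.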
You can clearly feel that it's not that it can't "act like a mathematician", but that it's too good at "pretending to do math".
Are we rewarding "acting like an expert" or "answering correctly"?
Today, when many users judge whether a model is good or not, the first criterion is not "whether it is closer to the truth", but "whether it speaks steadily, smoothly, and like a professional".
As long as it has a complete structure, appropriate terminology, and a calm tone, users will naturally increase their trust.
This is exactly the most dangerous type of hallucination in large models: not spouting nonsense, but saying wrong things in an extremely professional way.
In terms of training logic: during pre-training, the large model mainly learns knowledge memory, pattern statistics, factual associations, and language rules; subsequent instruction fine-tuning and RLHF mainly shape how it "speaks" and how to be more like the kind of responder humans prefer.
The key judgment of the paper lies here:
The expert persona essentially tends to activate the latter, i.e., alignment abilities such as style, format, intent following, and safety boundaries; but when the task requires direct, accurate retrieval of pre-trained knowledge, the extra persona context can actually interfere with that retrieval.
You can understand it as a kind of "alignment tax": the model sacrifices a part of the accuracy of fact retrieval in order to be more in line with the expert image you expect.
Related research has also repeatedly confirmed that Persona Prompting does not always bring a stable improvement. Sometimes, it may even have unpredictable negative impacts due to the introduction of irrelevant personality attributes.
So, the real problem is not the "persona" itself, but that we have crudely stuffed completely different tasks such as style control, value alignment, fact judgment, and reasoning into the same Persona mechanism.
It's okay for the model to act like a mature consultant when writing an email to soothe a user.
It's also okay for the model to act like a security reviewer when facing a dangerous request.
But asking it to enter a long "expert role - playing" when doing probability problems, answering medical facts, or checking legal provisions may be going in the wrong direction from the start.
The way to redemption: routing allocation is the right solution
So, should we throw away the expert persona from now on?
Of course not.
As mentioned above, the researchers also found that the expert persona still has irreplaceable value in specific scenarios such as "generative tasks" that rely more on alignment ability.
Therefore, the core issue is not "whether to use it", but "when to use it".
To solve this pain point, the researchers propose PRISM (Persona Routing via Intent-based Self-Modeling).
This system does not assign a fixed role to AI. Instead, it first understands the user's real intention and then dynamically routes and assigns the correct persona.
The figure shows two methods for automatically selecting expert roles. PRISM dynamically allocates appropriate personas through a LoRA adapter, retaining the benefits of alignment and maintaining the accuracy of discriminative tasks without external resources.
The core idea of PRISM is very ingenious:
Instead of rigidly applying an expert prompt to the model during inference, it "condenses and distills" all the beneficial parts of the expert personas into a lightweight gated LoRA adapter in advance.
When actually facing the user's question, the gating mechanism of PRISM only needs to make a very simple binary choice:
Activate the "expert cheat", or return to the "plain mode".
When the user asks "Help me write code" or "Provide high - EQ comfort", the system determines that alignment ability is needed, and the gating mechanism instantly activates the LoRA adapter to bring out the internalized expert level;
When the user asks "Objective mathematical calculation" or "Fact - checking", the system determines that the persona will cause interference, and the gating mechanism immediately closes the adapter, allowing the unadorned base model to answer accurately with the purest pre - trained memory.
The entire PRISM extraction process does not require additional data, additional models, or additional computing power.
The cost is modest: training a single gated-LoRA variant takes about 45 minutes on an A100, and the extra overhead is relatively small.
Specifically, the PRISM training process is divided into five major stages:
(1) Generate queries based on persona prompts;
(2) Generate responses under the corresponding personas;
(3) Conduct self - verification through pairwise comparison to screen and distill the dataset;
(4) Train the router/gating module to learn the intent - based routing mechanism to determine when it is more helpful to activate the persona;
(5) Conduct self - distillation through LoRA to allow the model to internalize these persona behaviors.
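Stage (3) in particular can be sketched as a simple pairwise filter. The `judge` callable here is a hypothetical stand-in for the model's self-verification, and the length-based example judge is purely illustrative:

```python
def distill_dataset(pairs, judge):
    """Keep (query, persona_response) pairs only where the persona response
    wins a pairwise comparison against the plain response under `judge`."""
    kept = []
    for query, plain_resp, persona_resp in pairs:
        if judge(query, persona_resp) > judge(query, plain_resp):
            kept.append((query, persona_resp))
    return kept

# Toy run with a trivial length-based judge (illustration only).
data = [("q1", "ok", "a longer, better answer"),
        ("q2", "detailed plain answer", "meh")]
kept = distill_dataset(data, judge=lambda q, r: len(r))
# kept == [("q1", "a longer, better answer")]
```

The filtered set is then what stages (4) and (5) train on, so only persona behaviors that actually beat the plain model get internalized.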
What PRISM wants to do is not to make AI "better at acting", but to "act when it should and be accurate when it should".
The results are amazing:
While keeping the compute overhead extremely low, the large model can finally switch smoothly between "high-EQ generation" and "hardcore knowledge retrieval".
PRISM not only significantly improves the scores of human preference and security alignment in generative tasks, but also perfectly maintains the objective accuracy of discriminative tasks.
Comprehensive evaluation across five models (including Qwen) and three benchmark dimensions (including MT-Bench)
On Qwen2.5-7B, simply using the expert prompt gives an overall score of 72.2, similar to the 71.8 baseline: gains and losses that basically cancel out.
PRISM, however, raises the overall score to 73.5, lifts MT-Bench from 7.56 to 7.76, and holds MMLU at 71.7%, essentially leaving knowledge accuracy untouched.
The effect is even more obvious on Mistral-7B:
the expert prompt drags overall performance from 79.9 down to 71.4, while PRISM reaches 81.5, even higher than the baseline. On Llama-3.1-8B, PRISM likewise raises the overall score from 67.5 to 70.3.
This means that in the next stage of prompt engineering, it may no longer be about "writing a longer and more intimidating expert persona prompt", but about "clearly dividing the tasks and then deciding whether to activate the personalized alignment".
At this time, PRISM is like a smart intermediary. It first understands the essence of the problem and then sends the right person to handle it.
The large model's performance at this time is both professional and honest, and it will no longer exchange wrong answers for good reviews.
Take action now
So, don't open with "You are an expert". Try dynamic routing in the spirit of PRISM instead.
Let AI choose the right role according to the real needs of the problem, instead of always wearing the same mask.
Figure 4: The relationship between the proportion of queries the gating network routes to LoRA and per-category performance under the expert persona, on the Qwen2.5-7B-Instruct model
If you are a developer, start paying attention to the underlying intent routing mechanism like PRISM, and let the model learn to "act when it should and be accurate when it should" at the weight level.
If