Why does AI "lose its voice"?
Recently, fans of TNT (Teens in Times) posted a set of screenshots on social media. They had asked MiniMax, "Who is the leader of TNT?" The model hesitated and produced completely wrong answers such as "Ma Jiaxuan" and "Ma Siqi." However, when they rephrased the question as "What experiences does the leader of TNT have?", the model responded fluently. The AI seems to know who Ma Jiaqi is; it just cannot say his name.
Image source: Xiaohongshu
We have long been familiar with two types of AI glitches.
One is hallucination: the model spouts nonsense with a straight face, fabricating nonexistent papers and nonexistent names in great detail. [1]
The other is sycophancy: the model agrees with whatever you say and follows your preferences. Especially on questions that involve taking a stance, it sacrifices accuracy to please. [2]
The Ma Jiaqi incident reveals a new glitch: the model knows the answer, can describe it indirectly, and can prove that it knows from a dozen angles, but just can't directly say the answer itself.
This phenomenon has an established name: under-trained tokens. Researchers scanned the vocabularies of a batch of mainstream open-source models such as GPT-2, Llama, and Mistral. A vocabulary is a fixed list compiled before the model is trained, enumerating every character combination the model can use. Each item is called a token, and everything the model says must be assembled from this list. The researchers wanted to know whether the list contained tokens that the model "never really learned." The scan found that such tokens are widespread, with thousands in each model's vocabulary. [3]
Readers familiar with this field may have heard of an earlier case: SolidGoldMagikarp. In early 2023, users in the LessWrong community stumbled on the fact that whenever GPT-3 encountered this string, it would output garbled text, talk to itself, or even insult users. At the time, people passed it around as a curiosity. It is the prequel to the Ma Jiaqi incident.
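The kind of vocabulary scan described above can be sketched in a few lines. The sketch rests on one assumption: tokens that were barely updated during training tend to keep embeddings that resemble those of tokens known never to appear in the data, so similarity to that reference group is a warning sign. The toy embeddings, the token names, and the 0.95 threshold are all hypothetical, and the published method differs in its details.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flag_undertrained(embeddings, reference_tokens, threshold):
    """Return tokens whose embedding points the same way as known-unused ones."""
    reference = mean_vector([embeddings[t] for t in reference_tokens])
    return sorted(
        token
        for token, vector in embeddings.items()
        if token not in reference_tokens and cosine(vector, reference) >= threshold
    )

# Hypothetical 2-dimensional embeddings; real ones have thousands of dimensions.
embeddings = {
    "<unused_0>": [0.10, 0.10],   # placeholder tokens known never to occur in text
    "<unused_1>": [0.12, 0.09],
    "hello": [2.00, -1.00],       # well-trained tokens have moved far away
    "def": [-1.50, 2.50],
    "rare_name": [0.11, 0.10],    # looks just like the unused group
}
suspects = flag_undertrained(embeddings, {"<unused_0>", "<unused_1>"}, threshold=0.95)
print(suspects)  # ['rare_name']
```

A flagged token is only a candidate: confirming that the model really cannot produce it still requires prompting the model directly.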
Tokenizer: The "Granularity" of AI's View of the World
To understand why the model can't say a certain name, we first need to see how it reads words.
Large models process text not by characters but by tokens. Before a piece of Chinese text enters the model, it will first be cut into several tokens by a component called a tokenizer. The model only performs calculations on tokens and then reassembles the tokens into text.
Segmentation is based on frequency in the pre-training corpus: high-frequency combinations are merged into one token, and low-frequency sequences are split into smaller pieces. The underlying algorithm is BPE (Byte Pair Encoding), which was adapted for neural text processing in 2016. [4] It is essentially a data-driven merging process: starting from small units, it repeatedly merges the adjacent pair that co-occurs most often in the corpus, and the merges it learns decide which combinations are treated as "one piece."
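The merging process can be illustrated with a toy version of BPE training. This is a minimal sketch, not a production tokenizer: it starts from romanized syllables rather than bytes, and the corpus frequencies are invented for illustration.

```python
from collections import Counter

def learn_bpe_merges(word_freqs, num_merges):
    """Learn up to `num_merges` merges from {tuple_of_symbols: frequency}."""
    vocab = dict(word_freqs)
    merges = []
    for _ in range(num_merges):
        # Count each adjacent symbol pair, weighted by how often the word occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word, fusing the winning pair into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if symbols[i:i + 2] == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab

# Hypothetical corpus statistics: the given name is frequent on its own.
corpus = {("Jia", "qi"): 50, ("Ma", "Jia", "qi"): 30, ("Ma", "shang"): 5}
merges, vocab = learn_bpe_merges(corpus, num_merges=1)
print(merges)  # [('Jia', 'qi')]
print(vocab)
```

With these counts the pair ("Jia", "qi") occurs 80 times and wins the first merge, so the full name ends up as the two symbols "Ma" and "Jiaqi".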
In MiniMax's vocabulary, "Ma Jiaqi" is split into two tokens: "Ma" and "Jiaqi." The two characters "Jiaqi" appear together often enough, as an idol's name, that the tokenizer merged them into a standalone token, while "Ma" is a token on its own.
Whether a word is processed as a whole or in parts makes a real difference to the model.
Incidentally, tokenizer granularity varies greatly across languages. One study compared how many tokens the same content is split into in different languages and found differences of up to 15 times. [5] In other words, a Chinese news article that enters the model as a hundred tokens may need more than a thousand once it is translated into Burmese or Amharic.
This sounds abstract, but it has several practical impacts.
The first is cost. Most large-model APIs bill by the token, so writing the same article in one language may cost more than ten times as much as in another.
The second is context length. The model's context window has an upper limit, so a tenfold increase in token count means the same window holds an order of magnitude less content.
The third is quality of understanding. The more finely a word is chopped up, the more its meaning is scattered across multiple tokens, and the harder it is for the model to process.
Low-resource languages are at a disadvantage on every count. This is a different kind of systematic bias, but it shares an underlying structure with the aphasia mechanism in the Ma Jiaqi incident: the tokenizer determines the starting point of everything.
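The first two impacts are plain arithmetic. The sketch below uses the article's round numbers (a hundred tokens versus a thousand); the context window size and the price are hypothetical values chosen only to make the calculation concrete.

```python
def articles_per_window(context_window, tokens_per_article):
    """How many articles of a given token length fit in one context window."""
    return context_window // tokens_per_article

def api_cost(num_tokens, price_per_million):
    """Cost of processing `num_tokens` at a per-million-token price."""
    return num_tokens * price_per_million / 1_000_000

CONTEXT_WINDOW = 8_000      # hypothetical window size, in tokens
PRICE_PER_MILLION = 2.0     # hypothetical price, arbitrary currency units

compact_tokens = 100        # the article in a well-covered language
fragmented_tokens = 1_000   # the same article in a poorly covered language

ratio = fragmented_tokens / compact_tokens
print(ratio)  # 10.0
print(articles_per_window(CONTEXT_WINDOW, compact_tokens))     # 80
print(articles_per_window(CONTEXT_WINDOW, fragmented_tokens))  # 8
print(api_cost(compact_tokens, PRICE_PER_MILLION))
print(api_cost(fragmented_tokens, PRICE_PER_MILLION))
```

The same tenfold factor shows up twice: once on the bill, and once in how much material the model can hold in view at the same time.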
"Jiaqi" is an independent token. Next, we'll see the fate of this token in the model's "brain."
Well Learned in Pre-training, Pushed Out in Post-training
The training of large models is divided into two stages.
Pre-training uses a massive amount of Internet text, on the order of trillions of tokens. At this stage, the model learns basic language skills and world knowledge. It has seen Wikipedia, news, forums, and fan fiction. The three characters "Ma Jiaqi" have probably appeared hundreds of thousands of times in the corpus.
Post-training uses carefully selected dialogue data, and the volume drops sharply, to millions or tens of millions of samples. This stage teaches the model how to chat, how to follow instructions, and how to avoid profanity. The paradigm was established by the OpenAI team in the 2022 InstructGPT paper. [6] Supervised fine-tuning (SFT) plus reinforcement learning from human feedback (RLHF) has since become the industry standard.
MiniMax's engineers investigated and found that the token "Jiaqi" had been seen during pre-training and that its vector distribution was normal. [7] In other words, the model still recognized Ma Jiaqi at the end of pre-training.
The problem lies in post-training. In the carefully selected SFT dialogue data, fewer than five samples contain "Jiaqi." Throughout the post-training stage, this token was hardly practiced at all.
Here the second key concept comes in: catastrophic forgetting.
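A check of this kind is easy to run on any fine-tuning set. The sketch below counts how many samples contain each token and lists the tokens that fall below a threshold. The token strings, the four toy samples, and the threshold are hypothetical; this is not MiniMax's tooling.

```python
from collections import Counter

def samples_per_token(tokenized_samples):
    """Count, for each token, the number of samples it appears in."""
    counts = Counter()
    for sample in tokenized_samples:
        counts.update(set(sample))  # a sample counts once per token, however often it repeats
    return counts

def rarely_seen(vocab, counts, threshold):
    """Tokens of the vocabulary that appear in fewer than `threshold` samples."""
    return sorted(token for token in vocab if counts[token] < threshold)

# Hypothetical tokenized SFT samples.
sft_samples = [
    ["<tool_call>", "def"],
    ["hello", "Ma", "Jiaqi"],
    ["hello", "<tool_call>"],
    ["def", "hello", "hello"],
]
vocab = ["Ma", "Jiaqi", "<tool_call>", "hello", "def"]

counts = samples_per_token(sft_samples)
print(rarely_seen(vocab, counts, threshold=2))  # ['Jiaqi', 'Ma']
```

Iterating over the whole vocabulary, rather than over the tokens that happen to occur, is what makes tokens with zero exposure visible.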
The term dates back to connectionist research in the late 1980s, and a widely cited paper published in PNAS in 2017 brought it to the forefront of deep learning. [8] When a neural network learns a new task, it loses ability on the original task because its parameters are repeatedly overwritten by the new data. The phenomenon is being taken seriously again in the era of large models: an empirical study of continual fine-tuning reports that catastrophic forgetting is widespread in large models and worsens as model size increases. [9]
What exactly happens in the vector space?
During post-training, high-frequency tokens such as tool-call markers, code symbols, everyday dialogue words, and safety-related rejection templates appear again and again. The vector parameters of these tokens are continuously updated, crowding low-frequency tokens out of their positions in the high-dimensional space, much as tectonic plates press against one another.
The vector of "Jiaqi" is pushed out of the region where it would be generated with the right probability. When the model wants to output "Jiaqi," it either cannot reach this token at all, or the token's probability is swamped by similar-sounding candidates (homophones of "Jiaqi," or "Qiqi") and by similar-looking ones such as "Jiaxuan" or "Siqi." The result is a string of answers that leave people unsure whether to laugh or cry.
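How a drifting vector turns into a wrong name can be shown with a two-dimensional toy. The sketch assumes the usual output step, in which each candidate's score is the dot product of the model's hidden state with that token's output vector, followed by a softmax. All vectors are invented; only the direction of the effect matters.

```python
import math

def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def next_token_probs(hidden, output_vectors):
    """Score each token by its dot product with the hidden state, then normalize."""
    logits = {
        tok: sum(h * w for h, w in zip(hidden, vec))
        for tok, vec in output_vectors.items()
    }
    return softmax(logits)

hidden = [1.0, 0.0]  # hypothetical hidden state that "means" the correct name

# Hypothetical output vectors at the end of pre-training.
before = {"Jiaqi": [5.0, 0.0], "Jiaxuan": [2.0, 1.0], "Siqi": [1.5, -1.0]}
# After post-training, only the rarely practiced token has drifted.
after = dict(before, Jiaqi=[1.0, 3.0])

probs_before = next_token_probs(hidden, before)
probs_after = next_token_probs(hidden, after)
print(max(probs_before, key=probs_before.get))  # Jiaqi
print(max(probs_after, key=probs_after.get))    # Jiaxuan
```

The hidden state is identical in both runs, which mirrors the article's point: what the model "means" has not changed, only the token's ability to win at the output.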
The academic community has a related concept called the alignment tax: during alignment, the model loses part of its pre-training ability, and its accuracy, breadth of knowledge, and generation diversity are all compromised to varying degrees. Reducing this "tax" has become a research direction in its own right. [10]
So it is not that the AI fails to recognize Ma Jiaqi. In the process of being taught how to speak, it simply forgot how to say the two characters "Jiaqi."
Understand AI's "Aphasia" through Human "Hesitation"
By now, the mechanism of AI aphasia is clear: the semantic pathway is intact, but the surface-generation pathway is broken. The model has an internal representation of Ma Jiaqi, but this representation cannot reach the output end.
This fault pattern, present internally but unavailable externally, has a mature research paradigm in cognitive science that we can borrow: the tip-of-the-tongue (TOT) phenomenon.
In 1966, Brown and McNeill turned the tip-of-the-tongue phenomenon into a repeatable experimental paradigm. [11] They read dictionary definitions to subjects and asked them to report the word that came to mind. Subjects who got stuck on the word "sextant" would blurt out "secant" and "sexton." They could accurately report the first letter, the number of syllables, and how many times the letter "s" occurs, but could not say "sextant" itself.
Looking back at AI aphasia through the tip-of-the-tongue framework clarifies three points that were previously vague.
Aphasia Doesn't Equal Amnesia
When humans get stuck on "sextant," they still know the existence, use, and approximate pronunciation of the word. When MiniMax gets stuck on "Jiaqi," it can still describe his identity, debut time, variety shows, and representative works.
This is a counterintuitive conclusion: in this kind of fault the model has not forgotten anything; it just cannot retrieve the information. When evaluating a model, "whether it can output" and "whether it knows" should be measured separately.
The engineering implication is straightforward. The common evaluation method scores "right or wrong output": give a question and check whether the model answers correctly. But a model may fail repeatedly at the output end while its internal representation is perfectly sound. Diagnosing this kind of fault requires probe-style evaluation that inspects the model's internal activations and checks whether the relevant representation is intact.
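Measuring the two questions separately can be as simple as keeping two columns instead of one. In the sketch below, `knows` stands for whatever evidence of internal knowledge is available (an indirect question answered correctly, or a probe on internal activations), and `can_output` is the usual direct-answer check. The field names, category labels, and records are hypothetical.

```python
from collections import Counter

def diagnose(knows, can_output):
    """Place one test item on the two-axis grid."""
    if knows and can_output:
        return "healthy"
    if knows:
        return "aphasia-like: knows but cannot say"
    if can_output:
        return "says it without supporting knowledge"
    return "knowledge missing"

# Hypothetical evaluation records for four names.
results = [
    {"item": "name A", "knows": True, "can_output": True},
    {"item": "name B", "knows": True, "can_output": False},
    {"item": "name C", "knows": False, "can_output": False},
    {"item": "name D", "knows": True, "can_output": False},
]

tally = Counter(diagnose(r["knows"], r["can_output"]) for r in results)
accuracy = sum(r["can_output"] for r in results) / len(results)

print(accuracy)  # 0.25
for category, count in tally.items():
    print(category, count)
```

A single accuracy figure of 0.25 hides the fact that two of the three failures are retrieval failures rather than gaps in knowledge, and those call for a different repair.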
Frequency and Connection Strength Are More Critical
The Transmission Deficit Hypothesis, proposed in 1991, explains why older adults experience the tip-of-the-tongue phenomenon more often than younger adults. [12] The crux of the theory is not that the word was never learned, but that it has not been used recently, so the connections between nodes have weakened.
Applied to AI aphasia, this framework maps almost exactly:
"Jiaqi" was seen during pre-training, and its vector distribution was normal. During post-training it was squeezed by high-frequency tokens, and its connection strength was weakened in relative terms. This parallels what happens to words that older adults have heard but rarely use.
The engineering implication is again straightforward. The solution is not to add more pre-training data; the name had already been seen hundreds of thousands of times. The repair MiniMax chose is to guarantee every token in the vocabulary a minimum amount of training exposure, which is precisely the idea of protecting the connection strength of low-frequency tokens.
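One way to picture a minimum-exposure guarantee is as a top-up pass over the fine-tuning set. The sketch below is an illustration of the idea only, not MiniMax's procedure: it walks through a pool of extra candidate samples and keeps a candidate whenever it contains a token that is still below the floor. The pool, the vocabulary, and the floor of 2 are hypothetical.

```python
from collections import Counter

def top_up(sft_samples, candidate_pool, vocab, min_count):
    """Greedily add candidates until each vocabulary token occurs `min_count` times."""
    counts = Counter(token for sample in sft_samples for token in sample)
    added = []
    for sample in candidate_pool:
        # Keep the candidate only if it helps a token that is still under the floor.
        if any(counts[token] < min_count for token in sample if token in vocab):
            added.append(sample)
            counts.update(sample)
    still_short = sorted(token for token in vocab if counts[token] < min_count)
    return added, still_short

# Hypothetical data: the SFT set never mentions the name.
sft_samples = [["hello", "<tool_call>"], ["hello", "def"], ["<tool_call>", "def"]]
candidate_pool = [
    ["Ma", "Jiaqi", "hello"],
    ["hello", "def"],
    ["Ma", "Jiaqi", "def"],
    ["Ma", "Jiaqi"],
]
vocab = {"hello", "<tool_call>", "def", "Ma", "Jiaqi"}

added, still_short = top_up(sft_samples, candidate_pool, vocab, min_count=2)
print(len(added))   # 2
print(still_short)  # []
```

Tokens that remain in `still_short` are the ones for which the pool has no material at all, which is itself useful to know before training starts.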
Alternative Outputs Are Diagnostic Signals
The approximate words that humans blurt out in the TOT state are not random. "Secant" and "sexton" pop up because they share the first syllable and the word-form structure with "sextant." This is a by-product of spreading activation: the neighboring nodes of the target word are partially activated, but the target word itself is not activated enough.
When the AI is aphasic and blurts out "Jiaxuan" or a homophone of "Jiaqi," the mechanism is isomorphic. These are wrongly activated neighbors of "Jiaqi" in the vector space: they are close to "Jiaqi" in pinyin, or share a character with it, or frequently co-occur with it in the naming conventions of Chinese idols.
In terms of engineering implications, observing the form of the model's wrong outputs can better locate the fault level than simply counting the error rate. How it goes wrong determines whether the problem is at the tokenizer layer, the representation layer, or the decoding layer. Summarizing all errors into an accuracy number is like throwing the diagnostic signals into the trash.
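Keeping the form of each error takes very little code. The sketch below compares every wrong output with the target, unit by unit, and sorts it into a coarse category before tallying. Names are represented as tuples of romanized syllables, the categories are illustrative rather than standard, and the sample outputs are hypothetical apart from the two names reported in the incident.

```python
from collections import Counter

def error_form(target, output):
    """Describe how an output relates to the target name."""
    if output == target:
        return "correct"
    # A unit shared at the same position marks a near neighbor of the target.
    if any(a == b for a, b in zip(target, output)):
        return "near neighbor"
    return "unrelated"

target = ("Jia", "qi")
outputs = [
    ("Jia", "xuan"),   # shares the first unit
    ("Si", "qi"),      # shares the second unit
    ("Jia", "qi"),     # the right answer
    ("Xiao", "ming"),  # hypothetical unrelated output
]

forms = Counter(error_form(target, out) for out in outputs)
print(forms["near neighbor"])  # 2
print(forms["correct"])        # 1
print(forms["unrelated"])      # 1
```

A high share of near neighbors suggests the model is landing close to the target and slipping at the last step, whereas mostly unrelated outputs would point to a different kind of failure.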
The human tip-of-the-tongue phenomenon provides a set of ready-made diagnostic terms polished by half a century of research, such as semantic nodes, transmission deficit, and spreading activation, and these terms fit the new object of AI aphasia well.
A New Entry in the Genealogy of AI Errors
Looking at "aphasia" in the overall picture of AI error research, we can find that it is not an isolated wonder, but a new position on a map that has taken initial shape.
The following table lists several main types of AI errors that have been located, named, and systematically studied by the academic community.
Putting these errors side by side, we can see similar situations and structures.
First, these errors cannot be captured by a single accuracy metric. Each type requires its own evaluation set. XSTest is specifically designed to measure excessive rejection. Lost in the Middle uses the needle-in-a-haystack method to measure loss in the middle of the context. Magikarp automatically finds under-trained tokens through statistical properties of the token embedding space. A single question-answer accuracy rate detects none of these faults.
Second, each type of error corresponds to a specific stage of the training process. The root of hallucination lies in the mismatch between pre-training knowledge and the generation mechanism. The roots of sycophancy and excessive rejection lie in preference modeling or safety alignment in the RLHF stage. The root of the reversal curse lies in the directionality of autoregressive training.
Third, errors that sit at the same or nearby coordinates are related to one another. For example, aphasia and the reversal curse both belong to the category of "knowing the knowledge but unable to retrieve it." The former is stuck on the sparsity of token exposure, and the latter on the directionality of representation. The mechanisms are different, but the symptoms are adjacent.
Finally, a note on the research process: almost every cell on this map was first discovered by users and the community, and only then located, named, and quantified by researchers. The reversal curse first surfaced when people noticed that GPT-4 could answer "Who is Tom Cruise's mother?" but could not answer the reverse question of who her son is: the fact was stored in one direction only. Excessive rejection first surfaced as user complaints that ChatGPT refused to explain how to "kill a Python process" because it read "kill" as literal killing. Aphasia, in a sense, was discovered by fans of an idol group.
AI error research has the nature of "discovering while using." Researchers usually don't predict how the model will malfunction. Instead, after users report malfunctions, they go back to locate the mechanism, name the phenomenon, and make repeatable evaluations.
The Ma Jiaqi incident adds a clearly marked position to this map. The map is not closed, and new errors will be discovered, named, and added in the future.
This problem could be located because of the sheer density of questions from the fan community. An idol's name was tested over and over, so the problem was exposed in plain view. Had it been an ordinary word, people would have shrugged off a few mistakes, and no one would have gone back to ask why.
MiniMax's repair plan is also straightforward: in the post-training stage, guarantee every token in the vocabulary a minimum amount of training exposure, so that the connection strength of low-frequency tokens is not continuously squeezed by high-frequency tokens.
The technical report is matter-of-fact. In the new version, the model can say the two characters "Jiaqi."
The health of an AI system is quietly defined by how intensively it is used. Where usage is frequent and intense, faults are detected quickly. Where usage is infrequent, or the users are marginal groups, faults accumulate silently.
After fixing the issue of "Jiaqi," the next squeezed token is still somewhere in the vocabulary.
Footnotes:
[1] Ji, Z., et al. (2022). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, arXiv:22