Can AI hallucinations be solved?
Recently, the startup Axiom has shaken Silicon Valley's AI circle and caught the attention of the tech world. Led by 25-year-old Hong Letong from Guangdong, China, the company, founded less than two years ago with a team of just over 20 people, has carved out a path in the niche field of using mathematics to solve AI hallucinations, securing $200 million in Series A financing and becoming a unicorn valued at $1.6 billion.
This is not only an inspiring startup story; it also touches the most sensitive nerve in today's AI industry. While everyone races ahead on AI large-model applications, the "AI hallucination" that lurks in the background and can easily lead to catastrophic consequences is often selectively ignored. Facing this hidden danger, which could shake the foundation of trust in AI at any time, Tan Yinliang, professor of Decision Science and Management Information Systems at CEIBS, offers an in-depth analysis of how AI hallucinations arise and of practical ways to break the deadlock.
While the entire AI industry fanatically pursues larger models and ever more realistic generation, Hong Letong and her startup Axiom have turned to the most niche and difficult problem in the AI field: solving AI hallucinations.
The method adopted by Axiom is called "formal verification." Simply put, it means transforming the vague and probabilistic inference process of AI into a deterministic process where each step can be checked, proven, and held accountable using mathematics and logic.
However, to truly understand why Axiom is so popular, one first needs to grasp a key point: AI hallucination is not merely a "flaw"; it is a "feature."
Correctly Understanding AI Hallucinations
When people catch large language models "talking nonsense with a straight face," their first reaction is usually that the AI is no good, the algorithm is wrong, or the training data is insufficient. In fact, AI hallucination is an inevitable product of how current AI models work; it can even be called the source of their "creativity."
Let's first look at humans themselves. There is a classic optical illusion, the checker-shadow illusion: on a checkerboard, partly covered by the shadow cast by a green cylinder, two squares are marked A and B. The question is: which square, A or B, is darker?
Most people, at first glance, firmly believe that A, in the bright area, is dark gray, while B, in the shadow, is light. Yet if you use a color picker on your computer to sample their actual pixels, you will be shocked to find that A and B are exactly the same shade of gray.
Under the interference of the cylinder's shadow and the surrounding black-and-white checkerboard, our brain deceives us, "stubbornly" processing the information according to lifelong experience. No matter how closely you look, the illusion persists.
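If you want to check the claim yourself, a few lines of code are enough to sample the two squares. Below is a minimal sketch using the Pillow imaging library; the file name and the two coordinates are placeholders and assume you have saved the illusion image locally and located a point inside each square.

```python
from PIL import Image  # third-party: pip install pillow

# Assumes the illusion is saved as "checker_shadow.png" and that the two
# coordinates below (placeholders) fall inside squares A and B respectively.
img = Image.open("checker_shadow.png").convert("RGB")
pixel_a = img.getpixel((130, 160))  # a point inside square A
pixel_b = img.getpixel((180, 230))  # a point inside square B

# In Adelson's original image both samples return the same RGB gray,
# even though the brain insists one square is darker than the other.
print("A:", pixel_a, "B:", pixel_b)
```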
Why do humans have such "hallucinations"? In fact, the human brain itself is an extremely advanced "prediction machine." When processing visual information, the brain does not record pixels rigidly like a camera. Instead, it automatically "fills in the blanks" and "constructs" the most logical reality based on the surrounding light and shadow, context, and historical experience.
In 99% of cases, this "filling in the blanks" helps us quickly make sense of the three-dimensional world. But in that 1% of unusual images, the brain guesses wrong, and a visual illusion is formed.
The working principle of AI models is surprisingly similar to this prediction mechanism of the human brain. Its essence is a "probabilistic black box." AI learns the statistical laws of human language from a vast amount of data, and its core task is always to predict the next most likely token based on the context.
AI generates text "fluently." When a problem is strongly covered by its training data, it can output astonishingly accurate answers. But when it hits a blind spot, the same mechanism still drives it to "fluently" produce the sentence that sounds most plausible by probability, which is how AI ends up "confidently" fabricating information.
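To make the mechanism concrete, here is a minimal, illustrative sketch of next-token prediction. The tokens and scores are invented, and no real model works with a four-word vocabulary, but the sampling logic is the same in spirit.

```python
import math
import random

# A minimal, illustrative sketch of next-token prediction (not any real model).
# A language model assigns a score (logit) to each candidate token; softmax turns
# the scores into probabilities, and the next token is sampled from them.
def softmax(logits):
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

# Invented scores for continuations of "The capital of France is ..."
logits = {"Paris": 9.1, "Lyon": 5.3, "Berlin": 4.8, "a": 3.0}
probs = softmax(logits)

# Sampling: the likeliest token usually wins, but nothing here consults a fact;
# the procedure only follows probabilities, so fluent output is never guaranteed true.
next_token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs, "->", next_token)
```

Nothing in this procedure looks up a fact; it only follows probabilities, which is exactly why the output can be fluent and wrong at the same time.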
Therefore, AI hallucinations, just like humans misjudging the colors of A and B in the shadow, are the natural product of a fundamental operating mechanism.
Many top AI researchers hold similar views. Andrej Karpathy, a founding member of OpenAI, has offered a striking framing: we should not treat large language models as perfectly rigorous search engines or knowledge bases, but rather as "dream machines."
In his view, large models are essentially "dreaming." When we type in a prompt, we are guiding the direction of the dream. The so-called "hallucination" is not an occasional error in otherwise normal operation; it is the model's default operating state.
When the "dreams" of the large model happen to match reality, we call it a "correct answer." When its "dreams" deviate from the factual basis and just slide along the inertia of language probability, we call it a "hallucination."
It is precisely this probability-based "filling in the blanks" that lets AI write poems in the style of Shakespeare and help us plot science-fiction novels. If we eliminated hallucinations entirely and required it to output only established facts, AI would lose its creativity, its generalization ability, and its human-like spark, and degenerate into a rigid traditional database.
The Necessity of Solving AI Hallucinations
Since hallucinations are a feature of AI and the source of creativity, why is the entire industry so eager to solve them?
If AI is only used for chatting, writing articles, or making memes, hallucinations hardly matter. But when AI is used for quantitative financial trading, driver assistance, medical judgment, or the design of high-precision semiconductor chips, the tolerance for error is close to zero.
The most dangerous thing about AI hallucinations is that they package errors in fluent, complete, even authoritative language, making them look like well-reasoned conclusions.
For experts, the errors may still be identifiable. But for users who have not yet developed professional judgment, especially students, newcomers to an industry, or people using AI outside their own field, these "seemingly correct" answers are the most likely to be accepted at face value.
This creates a new cognitive risk. In the past, when people looked things up, they could at least see the source, the version, and the context. With answers reorganized by AI, it is unclear which parts come from real data and which were simply "made up" by the model as it followed linguistic probability.
Students may take wrong explanations as knowledge frameworks, young practitioners may bring wrong suggestions into their work processes, and ordinary users may make wrong decisions on high-risk issues such as law, health, and finance. These are the most worrying hidden dangers and harms of "AI hallucinations."
This is precisely the threshold large models must cross on the way to high-stakes applications. A black-box system that can only give answers, but cannot prove how it got them, is hard to truly trust.
Exploration of Different Paths to Solve AI Hallucinations
To be fair, formal verification is not the only route the industry is exploring. At present, however, different routes address problems at different levels.
The first type is "making AI traceable." The most typical example is retrieval-augmented generation (RAG). Before answering, the model first searches an enterprise knowledge base, papers, legal provisions, or product documents, and then generates its answer from the retrieval results. But the retrieved data may be incomplete, the model may misread it, and the citations may not actually support the conclusions.
RAG is therefore more like plugging an external memory into AI than fundamentally changing the model's reasoning mechanism.
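As a rough illustration of the pattern, here is a minimal RAG sketch. The toy keyword retriever, the two sample documents, and the placeholder call_llm() function are all assumptions made for demonstration; real systems use vector embeddings and an actual model API.

```python
# A minimal RAG sketch: a toy keyword retriever over two sample documents and a
# placeholder call_llm() function; real systems use vector search and a model API.
documents = [
    "Policy 12.3: Refunds are issued within 14 days of purchase.",
    "Policy 7.1: Annual leave accrues at 1.5 days per month.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by naive keyword overlap with the query.
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(words & set(d.lower().split())))[:k]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real model call")

def answer(query: str) -> str:
    context = "\n".join(retrieve(query, documents))
    # The prompt asks the model to answer only from the retrieved context,
    # but nothing enforces that: it may still misread or ignore the sources.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```

Even here, the final step is still probabilistic generation: retrieval narrows what the model sees, but cannot force it to stay faithful to it.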
The second type is "teaching AI not to answer randomly." For example, through training and evaluation mechanisms, the model is punished for giving wrong answers and rewarded for saying "I don't know" when it is uncertain, asking for more information, or actively providing confidence levels.
However, this method only improves honesty and calibration ability and still cannot guarantee that every conclusion is strictly proven.
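One way to picture this is as a scoring rule that makes confident guessing unattractive unless the model is reasonably sure. The rule below is a toy example, not any lab's actual training objective.

```python
# A toy scoring rule that rewards calibrated behaviour (illustrative only,
# not any lab's actual training objective).
def score(answer_correct: bool, abstained: bool) -> float:
    if abstained:
        return 0.0                          # "I don't know" is treated as neutral
    return 1.0 if answer_correct else -2.0  # confident errors cost the most

# Expected score of guessing with probability p of being right:
#   p * 1.0 + (1 - p) * (-2.0)  >  0  only when  p > 2/3,
# so under this rule the model is better off abstaining unless it is fairly sure.
```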
The third type is "letting AIs check each other": self-reflection, multi-round verification, cross-review by multiple models, verifier models, critic models, and tool-assisted checks.
However, this is essentially one probabilistic model checking another probabilistic model, which at best reduces the error rate.
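A minimal sketch of this generator-plus-critic pattern is shown below; both functions are placeholders standing in for calls to real models.

```python
# A minimal generator-plus-critic sketch; both functions are placeholders
# standing in for calls to real models.
def generate(question: str) -> str:
    raise NotImplementedError("placeholder for the generator model")

def critique(question: str, answer: str) -> bool:
    raise NotImplementedError("placeholder for the critic/verifier model")

def answer_with_review(question: str, max_rounds: int = 3) -> str:
    draft = generate(question)
    for _ in range(max_rounds):
        if critique(question, draft):   # accept only if the critic approves
            return draft
        draft = generate(question)      # otherwise try again
    return draft  # still probabilistic: the critic itself can be wrong
```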
The fourth type is "connecting AI to external tools and structured systems": calculators, databases, knowledge graphs, code interpreters, simulation platforms, compilers, search engines, and rule engines. Numerical calculation is handed to the calculator, code execution to the compiler, and so on.
This is a very practical engineering route and is currently one of the more common directions for deploying enterprise-level large models in China.
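The sketch below illustrates the idea with the simplest possible tool, a calculator: arithmetic is computed deterministically rather than predicted. The router and the safe-evaluation helper are illustrative, not any particular framework's API.

```python
import ast
import operator

# A minimal sketch of routing arithmetic to a deterministic tool instead of
# letting the model predict the result; the router and helper are illustrative.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str) -> float:
    """Safely evaluate a plain arithmetic expression such as '17 * 42 + 3'."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def route(user_request: str):
    # Toy router: anything containing digits goes to the calculator,
    # everything else would be passed to the language model.
    if any(ch.isdigit() for ch in user_request):
        return calculator(user_request)
    return "pass to language model"

print(route("17 * 42 + 3"))  # 717, computed rather than predicted
```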
In summary, faced with AI hallucinations, these methods are more like "patching": the underlying probabilistic black box remains, so they treat the symptoms rather than the root cause. So far, no general solution achieves the certainty of formal verification across all open-ended scenarios.
Why Formal Verification?
Axiom's approach is completely different. It reaches for the most niche yet most hardcore weapon in computer science: formal verification, which requires every step of reasoning to be written as a rigorous logical chain that a machine can check.
That means the goal is not simply to make AI err less often, but to turn "sounds right" into "verifiably right" on the questions that matter.
Axiom has introduced Lean, a formal mathematical programming language in which every proof can be checked by machine.
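To give a feel for what "machine-checked" means, here is a tiny Lean 4 example; it is purely illustrative and is not Axiom's internal code.

```lean
-- A tiny Lean 4 example of a machine-checked statement (illustrative only,
-- not Axiom's internal code). If any step were wrong, Lean's kernel would
-- reject the proof rather than let it through.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A claim without a valid proof simply does not compile:
-- theorem broken (a : Nat) : a + 1 = a := rfl   -- rejected: not provable
```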
The ingenuity of this system is that it does not try to strip the large model of its "intuition" and "creativity." Instead, it gives the "dream machine" an impartial "logical referee."
In Axiom's system, the AI model still does what it does best: making conjectures, searching, generating intuition and inspiration. But every step of the resulting reasoning must be written in Lean's formal language and submitted to the verification system for checking.
If any step contains a logical leap or an oversight, the Lean verifier rejects it and sends it back for the model to re-derive. And as long as the conclusion is derived legitimately, step by step, from the premises, the answer carries machine-checkable certainty.
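Sketched in Python, the loop described above might look as follows; propose_lean_proof() and check_with_lean() are hypothetical placeholder names, not Axiom's actual interfaces.

```python
# A sketch of the generate-then-verify loop described above; propose_lean_proof()
# and check_with_lean() are hypothetical placeholders, not Axiom's actual interfaces.
def propose_lean_proof(claim: str, feedback: str | None = None) -> str:
    raise NotImplementedError("placeholder: the model drafts a Lean proof attempt")

def check_with_lean(proof: str) -> tuple[bool, str]:
    raise NotImplementedError("placeholder: run the Lean checker, return (ok, errors)")

def prove(claim: str, max_attempts: int = 5) -> str | None:
    feedback = None
    for _ in range(max_attempts):
        proof = propose_lean_proof(claim, feedback)  # intuition: the model conjectures
        ok, feedback = check_with_lean(proof)        # rigour: the checker decides
        if ok:
            return proof                             # every step machine-verified
    return None                                      # nothing unverified is returned
```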
It is worth noting that Silicon Valley is not the only place exploring this route. In recent years, China has also begun to promote "verifiable AI" as an important direction for large-model reliability, and some teams have conducted research and competition work on automatically translating natural-language mathematical problems into formal proof code.
In early April 2026, the AI4Math team at Peking University proposed a framework built from two agents: Rethlas, which searches for proof ideas in natural language the way a mathematician would, and Archon, which translates those ideas into Lean 4 proof code and submits it to the machine proof system for step-by-step checking.
This design serves two core requirements at once: letting AI produce answers that "sound right" and ensuring that its mathematical reasoning passes the logical test of a formal system.
However, the field is still in its early days. The threshold for formal verification is high, requiring mathematics, logic, programming-language theory, and a large-scale formal knowledge base. In the short term, it is unlikely to replace every method for managing AI hallucinations; it is more likely to break through first in high-value, high-risk fields such as mathematics, code, scientific research, chips, and security protocols.
For broader everyday applications, the more realistic approach today is still to combine RAG, knowledge graphs, domain data, manual review, and tool calls to bring the hallucination rate down to a controllable range.
Hallucinations stem from the underlying mechanism of large models; it is precisely this probabilistic generation that lets AI associate, generalize, and create. At the same time, we must recognize that when AI shifts from a "talking tool" to a "decision-making system," hallucinations become social, educational, and industrial risks.
This is exactly where Axiom matters. Solving AI hallucinations means subjecting AI, on key questions, to verification more rigorous than human intuition. Embracing AI's creativity while using mathematics, logic, tools, and systems to constrain its behavior may be the only way for AI to become truly trustworthy, usable, and reliable.
Professor Introduction
Dr. Tan Yinliang is a professor of Decision Science and Management Information Systems at CEIBS. He was a visiting scholar at the Stanford Artificial Intelligence Research Institute and the Digital Economy Laboratory. Before joining CEIBS, he was a tenured professor, Bauer Chair Professor, Director of the DBA program, and Chair of the Supply Chain Management Department at the C.T. Bauer College of Business, University of Houston. He also served as Deputy Director of the Artificial Intelligence Research Center at the University of Houston and as a senior researcher at the Hewlett-Packard Enterprise (HPE) Data Science Research Center. Before that, he was an assistant professor, associate professor, and Executive Director of the Goldring Center for International Education at the Freeman School of Business, Tulane University, where he was granted tenure and a chair professorship.