Building LLMs: The Knowledge Graph Foundation Every AI Project Needs
"Mr. Schwartz, I've reviewed your objection," said Federal Judge Kevin Castel, his tone steady yet emphatic. "You cited six cases to support your client's position. I'd like to discuss the case of Varghese v. China Southern Airlines."
Steven Schwartz, a lawyer with decades of experience, sat up straight in his chair. "Yes, Your Honor. It's a 2019 decision from the Eleventh Circuit Court of Appeals, which directly supports—"
"I can't find it," the judge interrupted. "The citation number you provided—925 F.3d 1339—doesn't appear in any database my clerk has checked. Can you provide the court with a complete copy of the judgment?"
Schwartz felt a twinge of worry. "Of course, Your Honor. I'll submit it immediately." Back in his office, Schwartz turned to his source once more. He typed into ChatGPT: "Is the case of Varghese v. China Southern Airlines, 925 F.3d 1339 (Eleventh Circuit Court of Appeals, 2019) a real case?" ChatGPT replied confidently: "Yes, the case of Varghese v. China Southern Airlines, 925 F.3d 1339 is a real case. You can find it in authoritative legal databases such as LexisNexis and Westlaw."
Relieved, Schwartz asked ChatGPT for more details about the case. The AI obliged, generating what appeared to be excerpts from the judgment, complete with convincing legal arguments and correctly formatted citations.
He submitted these materials to the court.
Three weeks later, Judge Castel issued a stern order: "The Court is presented with an unprecedented circumstance. Six of the submitted cases appear to be bogus judicial decisions with bogus quotes and bogus internal citations."
These six cases were completely fabricated. No court had ever heard them. They simply didn't exist.
In a subsequent affidavit, Schwartz admitted that he "had never used ChatGPT for legal research before" and therefore did not know its content could be inaccurate. He told the court he had thought ChatGPT was "like a super search engine," a seemingly reasonable but fundamentally wrong assumption now shared by millions of professionals across industries.
What went wrong?
The Schwartz case reveals a fundamental misunderstanding of the capabilities of Large Language Models (LLMs). There is a world of difference between asking ChatGPT "What is the Taj Mahal?" and asking it "What legal precedents support my client's position in an aviation accident case?"
The first kind of query calls for general knowledge, information that is widely available and relatively stable. The second requires consulting a specific, authoritative, ever-evolving body of legal precedent, accumulated over centuries of jurisprudence, in which accuracy is paramount and every citation must be verifiable.
We have long known that Large Language Models (LLMs) hallucinate, and considerable effort has gone into mitigating the problem: Reinforcement Learning from Human Feedback (RLHF), better training data curation, and confidence scoring have all played a role. But context is crucial. An LLM may perform well on general topics yet fail badly on domain-specific queries that demand authoritative sources.
Retrieval-Augmented Generation (RAG), which splits documents into chunks and retrieves relevant passages on demand, partially addresses this. When you need to answer questions grounded in a body of text, RAG works reasonably well. But when your knowledge base is the product of years of practice, such as legal precedents, medical guidelines, financial regulations, or engineering standards, simple chunk-based retrieval cannot deliver the required accuracy and contextual understanding. You need to know not only what a case says but also how it relates to other cases, its scope of application, its jurisdiction, and whether later rulings have changed its validity.
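To make the limitation concrete, here is a minimal sketch of what plain chunk-based retrieval does. The embed() function is a stand-in assumption for illustration; a real system would call a trained sentence-embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding for illustration only; a real pipeline would
    # call a trained sentence-embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Plain RAG: rank chunks by cosine similarity to the query.
    # Note what is missing: nothing here knows whether a retrieved
    # passage is still good law, which jurisdiction it belongs to,
    # or how it relates to other cases.
    q = embed(query)
    return sorted(chunks, key=lambda c: float(q @ embed(c)), reverse=True)[:top_k]
```

The ranking is driven purely by textual similarity; the relationships a legal domain cares about never enter the computation.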
Hallucinations and retrieval limitations, however, are only part of the problem. The architectural challenges go further:
• Their knowledge is opaque: Information is stored in the form of billions of parameters, which cannot be inspected or explained. You can't audit what the model "knows" or verify its sources of information.
• They can't be easily updated: Incorporating new information—new legal precedents, updated regulations, or revised medical guidelines—requires expensive retraining or complex fine-tuning.
• They lack domain foundation: General LLMs lack expert knowledge, business rules, and regulatory requirements, which determine whether their results are truly useful in a professional environment.
• They don't provide an audit trail: You can't trace how they reach their conclusions, which makes them unsuitable for environments that require accountability.
These are not minor technical issues but architectural ones that determine whether an AI project succeeds or fails. Gartner predicts that by the end of 2027, more than 40% of agentic AI projects will be cancelled because of escalating costs and unclear business value. The reason is simple: enterprises have deployed powerful Large Language Model (LLM) technology without the knowledge infrastructure needed to make it trustworthy.
The Schwartz case makes it plain that unless Large Language Models (LLMs) can draw on real, consistent, verifiable data, they cannot by themselves serve as reliable question-answering tools for critical applications. And there is no shortcut: simply feeding more documents into a RAG pipeline, or hoping that better prompts will compensate, ignores the root of the problem.
Knowledge must be organized so that it is easy to manage, always up to date, well maintained, and, crucially, structured to support the kind of reasoning your application requires. The real question is not whether the LLM is powerful enough, but what structure the knowledge should take and what processes we build around it to create, maintain, and access that knowledge correctly.
This is where knowledge graphs come in.
What is a knowledge graph?
A knowledge graph is not just a database. It is an evolving graph data structure composed of typed entities, their attributes, and meaningful named relationships. Knowledge graphs are built for specific domains and integrate structured and unstructured data to create knowledge for both humans and machines.
A knowledge graph therefore rests on four pillars:
1. Evolution: Continuously updated information that can seamlessly integrate new data without structural adjustments.
2. Semantics: Representing meaningful data through typed entities and explicit relationships to capture domain knowledge.
3. Integration: Flexibly combining structured and unstructured data from multiple sources.
4. Learning: Supporting queries, visualization, and reasoning by both humans and machines.
Crucially, the knowledge in a knowledge graph is auditable and interpretable—users can accurately trace the source of information and verify it against authoritative sources.
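As a concrete illustration, here is a minimal sketch of these four pillars in Python. The entity and relation types, identifiers, and the example facts are invented for this sketch rather than taken from any particular product:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    id: str
    type: str                      # typed entities, e.g. "Case" or "Court"
    attributes: dict = field(default_factory=dict)

@dataclass
class Relation:
    source: str                    # entity id
    name: str                      # meaningful named relationship
    target: str                    # entity id
    provenance: str                # where the fact came from: the audit trail

class KnowledgeGraph:
    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        self.relations: list[Relation] = []

    def add_entity(self, e: Entity) -> None:
        # Evolution: new entities integrate without schema migrations.
        self.entities[e.id] = e

    def add_relation(self, r: Relation) -> None:
        self.relations.append(r)

    def neighbors(self, entity_id: str, relation: str) -> list[Entity]:
        # Learning: traversing named relationships is the basis for
        # queries, visualization, and reasoning.
        return [self.entities[r.target] for r in self.relations
                if r.source == entity_id and r.name == relation]

# Integration and semantics: facts from any source become typed,
# traceable statements. The case below is purely illustrative.
kg = KnowledgeGraph()
kg.add_entity(Entity("case:example_2019", "Case", {"year": 2019}))
kg.add_entity(Entity("court:11th_circuit", "Court"))
kg.add_relation(Relation("case:example_2019", "decided_by",
                         "court:11th_circuit", provenance="Westlaw"))
```

Every statement carries its provenance, which is what makes tracing a fact back to an authoritative source possible at all.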
Intelligent Advisor Systems and Autonomous Systems
Before discussing how to combine these technologies, we need to understand a key difference in the deployment methods of intelligent systems.
Not all intelligent systems are created equal. Intelligent Autonomous Systems operate independently, making decisions and taking actions on behalf of users with minimal oversight; a self-driving car, for example, must act in real time without waiting for human input.
Intelligent Advisor Systems (IAS), by contrast, are designed to assist rather than replace human judgment. Their role is to provide information and advice; their main functions are decision support, situational awareness, and user interaction. They are built to let users explore options, ask questions, and get detailed explanations in support of their decision-making.
Figure: (a) an Intelligent Autonomous System; (b) an Intelligent Advisor System.
For critical applications such as legal research, medical diagnosis, financial analysis, and compliance monitoring, advisor systems that augment rather than replace human expertise are not merely the better choice; they are essential. The system architecture must support the gatekeeping responsibility rather than bypass it.
Hybrid approach: LLM + KG
When we combine a knowledge graph with an LLM, we create a system whose whole is greater than the sum of its parts (see the sketch after this list):
1. KG provides the foundation:
• Structured, verified knowledge that can serve as a factual basis.
• Explicit representation of domain rules and constraints.
• Audit trails record how conclusions are reached.
• Dynamic updates can be made without retraining the model.
2. LLM provides the interface:
• Natural language query processing.
• Automatic extraction of entities from unstructured data to populate the knowledge graph.
• Translating complex graph queries into plain language.
• Summarizing results into readable reports.
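A minimal sketch of how these two roles fit together. Here llm_parse_query and llm_summarize are stand-ins for real LLM calls, and the search callable stands in for a graph query; all three are assumptions for illustration:

```python
from typing import Callable

def llm_parse_query(question: str) -> dict:
    # Stand-in for an LLM call that turns natural language into a
    # structured graph query. Hard-coded here for illustration.
    return {"entity_type": "Case", "topic": "aviation accident"}

def llm_summarize(facts: list[dict]) -> str:
    # Stand-in for an LLM call that renders verified facts as prose.
    return "; ".join(f["citation"] for f in facts)

def answer(question: str, search: Callable[[dict], list[dict]]) -> str:
    query = llm_parse_query(question)               # LLM: language in
    facts = search(query)                           # KG: verified facts only
    if not facts:
        return ("No cases exactly matching this pattern were found. "
                "Please consider refining the query.")
    summary = llm_summarize(facts)                  # LLM: language out
    sources = sorted({f["provenance"] for f in facts})
    return f"{summary}\n\nSources: {', '.join(sources)}"  # audit trail intact
```

The design choice worth noticing: the LLM never supplies facts, only language. Anything it cannot find in the graph surfaces as an explicit "not found" rather than a plausible invention.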
Consider how Schwartz's disaster could have been avoided with such a hybrid system. It could:
1. Use the LLM to process natural language queries.
2. Query the knowledge graph for verified information with real citations and sources.
3. Present the results with context: "Twelve verified cases were found in authoritative databases, with citations."
4. Provide verification links to the actual sources.
5. Mark uncertainties: "No cases exactly matching this pattern were found. Please consider the following alternatives."
Most importantly, when asked "Does this case really exist?", the system would answer: "This case citation cannot be verified in the authoritative database. Status: unverified."
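That last behavior needs almost no machinery: a citation either resolves to a node in the graph or it does not. A minimal sketch, with an illustrative citation index standing in for the knowledge graph:

```python
VERIFIED_CITATIONS = {
    # Populated from authoritative databases; this entry is a real
    # Supreme Court aviation case, included purely as an example.
    "516 U.S. 217": "Zicherman v. Korean Air Lines Co. (1996)",
}

def verify_citation(citation: str) -> str:
    case = VERIFIED_CITATIONS.get(citation)
    if case is None:
        # The fabricated citation from the Schwartz episode lands here.
        return (f"{citation}: cannot be verified in the authoritative "
                "database. Status: unverified.")
    return f"{citation}: {case}. Status: verified."

print(verify_citation("925 F.3d 1339"))  # -> Status: unverified.
```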
Research from industry-leading companies consistently shows that hybrid systems address the core challenges that cause AI projects to fail:
• Hallucinations are mitigated by grounding LLM responses in facts held in a verifiable knowledge graph.
• Knowledge stays current through dynamic updates: the LLM can access the latest information through an evolving knowledge base without retraining.
• Interpretability is achieved through transparent information paths.
• Domain accuracy improves because the knowledge graph encodes the expert knowledge, regulations, and relationships that general-purpose LLMs lack.
Building trustworthy AI systems
The judge in the Schwartz case noted that "technological advances are commonplace and there is nothing inherently improper about using a reliable artificial intelligence tool for assistance," but emphasized that "existing rules impose a gatekeeping role on attorneys to ensure the accuracy of their filings."

This principle applies universally: every professional deploying AI has a gatekeeping responsibility. The question is whether your AI system's architecture supports that responsibility or undermines it.
The future of AI in critical applications, across every industry, depends on building intelligent advisor systems that combine the structured knowledge and interpretability of knowledge graphs with the natural-language understanding and pattern-recognition capabilities of large language models. It is not a matter of choosing between technologies but of recognizing that LLMs alone cannot deliver trustworthy AI. Knowledge graphs provide exactly the foundation they lack.
An organization that deploys this technology without that foundation will see its projects fail, not because the technology is not powerful enough, but because powerful technology without grounding is unreliable. Used properly, with each technology's strengths complementing the other's weaknesses, these systems can genuinely augment human intelligence.
This article is from the WeChat official account "Data-Driven Intelligence" (ID: Data_0101), author: Xiaoxiao. It is published by 36Kr with authorization.