AI is rewriting the rules of data governance, yet you are still using methods from a decade ago
In the AI race of 2026, the all-out competition to scale up model parameters is long over. The focus has shifted to whether AI agents can actually be deployed. AI that solves real-world problems is what creates long-term business value.
At the end of last year, a large financial institution completed a data governance project. It took two years, cost over 30 million, and delivered a data standard system and a quality monitoring platform covering the entire bank.
It sounds good. However, the actual situation is that three months after the project was accepted, the business department began to complain about inconsistent reports, inconsistent customer data, and frequent errors in regulatory submissions. Although the governance system was "established", the data quality did not really improve.
This is not an isolated case. According to industry research, more than 70% of enterprise data governance projects fail to achieve the expected results. A data governance project at an average medium-sized enterprise requires more than ten people and takes over six months, yet the results are often a mess: a lot of money is spent, the data remains chaotic, and the data people need is still hard to find.
Where does the problem lie?
The answer is simple: when data governance is purely "human-driven", the ceiling is too low. If you rely on writing rules by hand, checking dictionaries by hand, and holding endless meetings to unify standards, you can never keep up with the growth of data volume or the speed of business change.
But in 2026, a fundamental change is taking place. AI is not just "patching" the old methods; it is changing the underlying logic of data governance.
Let's go through the changes one by one.
I. From "Human-Driven" to "Intelligence-Driven": The Generational Gap in Efficiency
The essence of traditional data governance is the "handicraft workshop" model.
Start with a full-scale inventory of data assets. A data engineer opens Excel and labels each of the tens of thousands of tables in the database one by one: What is this table for? What do the fields mean? Which systems are they associated with? The whole process takes two to three months and requires repeated back-and-forth with the business department. Even then the results may not be correct, because the business department itself is not sure whether some fields in the old systems are still in use.
And this is just the metadata inventory. Next come data standard setting, quality audit rule configuration, data lineage tracking, and master data encoding, and every step is labor-intensive. Statistics from Shulie Technology show that in a typical data governance project, more than half of the cost is consumed in the data processing stage.
After the introduction of AI, the efficiency improvement is generational.
In June this year, IT Home released a selection and evaluation report on China's data governance platforms in 2026, which compared the Data Agent capabilities of six mainstream vendors side by side. Among them, Baifendian Technology's AI-DG product, built on an architecture of "vertical large model + multi-agent collaboration", has increased data integration efficiency by 80% compared with the traditional model and shortened the governance delivery cycle by an average of 70%.
How is it achieved? The core logic is to use large models to replace humans in "understanding" and "judging", and use Agents to "execute".
For example, to standardize the data of a new business system in the traditional model, a data engineer first reads through hundreds of fields, then searches the industry standard documents, and then matches each field to an appropriate standard according to the enterprise's existing data dictionary. It is like translating with three reference books open at once: slow, tiring, and error-prone.
AI-DG works differently: the resource inventory Agent automatically scans the source system to generate a ledger, the standard design Agent automatically recommends data element definitions according to industry specifications, and the model planning Agent directly generates suggestions for the layered data warehouse architecture. You only need to say in natural language, "Govern this system according to the financial industry standards", and the whole process runs automatically.
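To make the "recommend a standard for each field" step concrete, here is a minimal sketch. It is not how AI-DG works internally: plain string similarity from Python's standard library stands in for the large model, and the standard elements, the threshold, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical standard data elements (illustrative only, not taken from
# any real industry specification).
STANDARD_ELEMENTS = {
    "customer_id": "Unique identifier of a customer",
    "account_balance": "Current balance of an account",
    "transaction_date": "Date on which a transaction occurred",
}

def normalize(name: str) -> str:
    """Lowercase a field name and unify its separators."""
    return name.lower().replace("-", "_").replace(" ", "_")

def recommend_standard(field: str, threshold: float = 0.6):
    """Return (standard_name, score) for the best match, or None if too weak."""
    best_name, best_score = None, 0.0
    for std in STANDARD_ELEMENTS:
        score = SequenceMatcher(None, normalize(field), std).ratio()
        if score > best_score:
            best_name, best_score = std, score
    if best_score >= threshold:
        return best_name, round(best_score, 2)
    return None

def build_ledger(source_fields):
    """Map every scanned field to a recommendation; None means 'needs review'."""
    return {f: recommend_standard(f) for f in source_fields}

ledger = build_ledger(["CUST_ID", "acct_balance", "zzz"])
```

Fields that score below the threshold come back as None, which is where a human reviewer (or a stronger model) would step in.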
This is not a gradual improvement in efficiency but a complete reconstruction of the production mode. It is less like moving from manual calculation to a calculator and more like moving from a calculator to a spreadsheet: the entire operating paradigm has changed.
II. Data Agent: A "Super Employee" Appears in Data Governance
If using large models to assist in data governance was just the story in 2025, then the keyword in 2026 is: Data Agent.
IDC's "Data Agent Market Map", released in the first quarter of this year, offers a set of forecasts: by 2028, 60% of China's top 500 enterprises will deploy enterprise-level Data Agents, and by 2026, half of enterprises will deploy data analysis Agents to automate daily tasks and accelerate strategic decision-making.
A Data Agent is not simply an "AI assistant that answers questions one at a time". It is a closed-loop system of "perception-decision-execution-learning".
The perception layer collects database logs, API call records, and user operation behaviors in real time, like a 24/7 nervous system installed on the data system. The decision-making layer uses large models combined with rule engines and private knowledge bases to judge whether data is compliant, whether it needs repair, and whether standards conflict. The execution layer automatically triggers repair actions: sending alarms, blocking operations, and scheduling cleaning tasks. The learning layer continuously optimizes strategies from historical events, making the Agent smarter over time.
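The four layers can be sketched as one small loop. This is an illustration under stated assumptions, not a real product's design: a hard-coded rule stands in for the large model and knowledge base in the decision layer, and the `DataAgent` class, the event fields, and the "two prior blocks" escalation rule are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    user: str
    action: str      # e.g. "query" or "export"
    columns: list

@dataclass
class DataAgent:
    sensitive_columns: set
    history: list = field(default_factory=list)        # learning-layer memory
    flagged_users: dict = field(default_factory=dict)  # user -> blocked count

    def perceive(self, raw: dict) -> Event:
        """Perception: turn a raw log record into a structured event."""
        return Event(raw["user"], raw["action"], list(raw.get("columns", [])))

    def decide(self, event: Event) -> str:
        """Decision: a plain rule standing in for model + rule engine."""
        sensitive = any(c in self.sensitive_columns for c in event.columns)
        if event.action == "export" and sensitive:
            return "block"
        if sensitive and self.flagged_users.get(event.user, 0) >= 2:
            return "alert"
        return "allow"

    def execute(self, event: Event, decision: str) -> str:
        """Execution: here we only report the action that would be taken."""
        return f"{decision}:{event.user}:{event.action}"

    def learn(self, event: Event, decision: str) -> None:
        """Learning: remember outcomes so later decisions can be stricter."""
        self.history.append((event, decision))
        if decision == "block":
            self.flagged_users[event.user] = self.flagged_users.get(event.user, 0) + 1

    def handle(self, raw: dict) -> str:
        event = self.perceive(raw)
        decision = self.decide(event)
        result = self.execute(event, decision)
        self.learn(event, decision)
        return result
```

In this sketch a user's ordinary query on a sensitive column is allowed at first, but the same query raises an alert once that user has had two exports blocked, which is the "learning" step feeding back into "decision".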
A data governance practitioner with ten years of experience shared a real implementation path on Zhihu: a bank started with the single scenario of "intercepting the external distribution of sensitive data", launched it in three weeks, and blocked 12 illegal operations in the first month, reducing compliance risk by 90%. It then gradually expanded to data quality anomaly detection and automatic metadata change notification, and finally covered end-to-end data governance.
Here is a key understanding: The value of Data Agent does not lie in "showing off skills", but in transforming the data governance team from "firefighters" to "strategic planners". In the past, you spent your time running SQL to find problems, manually repairing data, and writing emails to urge the business department to change the standards. Now the Agent does all these jobs, and what you really should do is to define governance strategies, design data architectures, and explore data value with the business department.
Yonyou also released a multi-agent collaboration platform for data governance this year. Its core logic leans toward "source governance": as financial vouchers are generated in the ERP, the governance Agent automatically verifies whether the data items meet the standards, integrating prevention beforehand, control in process, and traceability afterward. Pushing governance capability back to the business source essentially uses AI to spend the governance budget where it is most valuable: preventing dirty data from being generated, rather than cleaning it up afterward.
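A rough sketch of what validating at the source can look like: the record is checked at creation time and rejected before it is ever stored. The voucher fields, the rules, and the function names below are hypothetical and do not reflect Yonyou's actual platform or any ERP's API.

```python
from datetime import date

# Hypothetical data standards for a financial voucher (illustrative only).
VOUCHER_RULES = {
    "voucher_no": lambda v: isinstance(v, str) and v.startswith("V") and v[1:].isdigit(),
    "amount": lambda v: isinstance(v, (int, float)) and v > 0,
    "currency": lambda v: v in {"CNY", "USD", "EUR"},
    "posting_date": lambda v: isinstance(v, date) and v <= date.today(),
}

def validate_voucher(voucher: dict) -> list:
    """Return a list of violations; an empty list means the voucher is clean."""
    violations = []
    for field_name, rule in VOUCHER_RULES.items():
        if field_name not in voucher:
            violations.append(f"{field_name}: missing")
        elif not rule(voucher[field_name]):
            violations.append(f"{field_name}: violates standard")
    return violations

def create_voucher(voucher: dict, store: list) -> bool:
    """Store the voucher only if it passes every rule (prevention at the source)."""
    if validate_voucher(voucher):
        return False
    store.append(voucher)
    return True
```

The design choice is that dirty records never reach `store`, so no downstream cleaning job is needed for them.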
A point worth remembering: do not try to implement a Data Agent in one step. Pick a high-frequency, high-pain, and relatively well-defined scenario to start with, collect the first batch of real business feedback, and then expand gradually. Data governance is never about "building it and being done", but about "using it effectively".
III. From "Post-Mortem Accountability" to "Pre-Event Prevention": The Fundamental Transformation of the Governance Paradigm
Traditional data governance has an awkward trait: it is "reactive".
When there are data quality problems, reports are inconsistent, or regulatory authorities are coming for inspections, only then do people start to rush to "conduct governance". A lot of rules are established, a lot of documents are written, and a lot of monitoring is configured. After all the work is done, the data remains chaotic.
Why? Because human attention is limited. A data governance team can monitor at most a few dozen core indicators at once, while the data environment of a medium-sized enterprise may generate hundreds or thousands of new data anomalies every day. You can't keep up.
AI has turned this situation around.
The principle is not complicated: AI can scan the data stream in real time without rest, and identify problems, raise early warnings, and even repair them automatically as soon as they occur. In the industry, this is called an upgrade from "post-event inspection" to "real-time supervision".
Take a very specific scenario: data lineage management. Anyone who has done data development knows that changing a field in an upstream table can affect dozens of downstream reports and hundreds of ETL tasks. The traditional approach is either to rely on documents (which are often outdated) or to investigate manually (which is extremely inefficient). As a result, downstream reports often break after everyone has left work, and the problems are only discovered the next morning.
With their reasoning ability, large models can analyze the table join relationships in SQL statements directly and generate an end-to-end data lineage map automatically. As soon as something changes upstream, the system analyzes the downstream impact and sends a change notice. The data operation and maintenance Agent of Alibaba Cloud DataWorks, upgraded this year, takes this logic as far as "automatic diagnosis + online execution": it combines dependency links, resource levels, and historical operation trends to generate a structured diagnostic report automatically.
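The lineage idea can be sketched in a few lines: extract "source table feeds target table" edges from SQL, then walk the graph to find everything downstream of a change. Note the substitution: simple regular expressions stand in for the large model (or a real SQL parser) and only handle straightforward INSERT INTO / CREATE TABLE ... FROM / JOIN statements. The table names and function names are hypothetical, and this is not the DataWorks implementation.

```python
import re
from collections import defaultdict, deque

# Naive patterns: good enough for simple ETL statements, not for full SQL.
TARGET_RE = re.compile(r"\b(?:INSERT\s+INTO|CREATE\s+TABLE)\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

def build_lineage(statements):
    """Return {upstream_table: {tables written directly from it}}."""
    downstream = defaultdict(set)
    for sql in statements:
        target = TARGET_RE.search(sql)
        if not target:
            continue  # statement writes nothing we can track
        for source in SOURCE_RE.findall(sql):
            if source != target.group(1):
                downstream[source].add(target.group(1))
    return downstream

def impacted_tables(lineage, changed):
    """Breadth-first walk: every table that depends on `changed`, directly or not."""
    seen, queue = set(), deque([changed])
    while queue:
        table = queue.popleft()
        for child in lineage.get(table, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

etl = [
    "INSERT INTO dwd.orders SELECT * FROM ods.orders o JOIN ods.customers c ON o.cid = c.id",
    "INSERT INTO ads.daily_sales SELECT dt, SUM(amt) FROM dwd.orders GROUP BY dt",
    "CREATE TABLE ads.vip_report AS SELECT * FROM ads.daily_sales WHERE amt > 100",
]
lineage = build_lineage(etl)
```

Calling `impacted_tables(lineage, "ods.customers")` on this example returns all three downstream tables, which is the list a change notice would need to go out to.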
A government case from Yixinghuachen illustrates the point well. The approval of a major investment project used to rely entirely on manual document review, and the approval cycle could take three to six months. The team used large models to extract approval rules and document materials into structured form and built an approval knowledge base, so that the system can automatically extract key information and generate summaries to help approval staff decide quickly. During the trial operation, the document review cycle was shortened to less than a week, and the overall review speed more than doubled.
This is not "AI replacing humans", but "AI doing the things that humans should not do". It liberates humans from mechanical, repetitive, and large - scale reading tasks to do truly valuable work that requires judgment, decision - making, and creativity.
IV. AI for Data and Data for AI: They Are Inseparable
What has been mentioned above is all about "how AI helps improve the efficiency of data governance". But if we only talk about this half, we only see one side of the coin.
The other half is equally important: high-quality data governance is, in turn, the fundamental prerequisite for AI to deliver real value.
The old rule of "garbage in, garbage out" still holds in the era of large models. No matter how many parameters your model has or how many training rounds you run, if the data fed to the model has inconsistent standards, missing core fields, and poor annotation quality, the output will be unreliable. The depth to which large models can be applied in vertical fields depends directly on how solid the data governance is.
This is what the industry often calls "mutual promotion": AI makes data governance more efficient, and high-quality data governance makes AI more reliable. The two form a self-reinforcing positive feedback loop.
Tencent Cloud WeData echoes this logic at the product level. Its Unity Semantics technology supports "defining indicator standards in one place and reusing them in multiple places". Whether humans are reading reports or AI is running analysis, both use the same data and the same standards. The analysis results produced by AI and the manual reports therefore rest on the same "factual basis", and there is no absurd situation like "AI says 1 million were sold, but the report says 800,000 were sold".
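The "define once, reuse everywhere" principle can be illustrated with a tiny metric registry. This is a generic sketch, not WeData's Unity Semantics API: the `Metric` class, the `units_sold` definition, and both consumer functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    compute: Callable[[list], float]

METRICS = {}

def register(metric: Metric) -> None:
    """Each metric name may be defined exactly once."""
    if metric.name in METRICS:
        raise ValueError(f"metric already defined: {metric.name}")
    METRICS[metric.name] = metric

# The single, shared definition: only completed orders count as sold.
register(Metric(
    name="units_sold",
    description="Total quantity across completed orders",
    compute=lambda orders: sum(o["qty"] for o in orders if o["status"] == "completed"),
))

def report_view(orders) -> str:
    """What a human sees in the report."""
    return f"Units sold: {METRICS['units_sold'].compute(orders)}"

def ai_context(orders) -> dict:
    """What is handed to an AI analysis step: same value, plus its definition."""
    m = METRICS["units_sold"]
    return {"metric": m.name, "definition": m.description, "value": m.compute(orders)}

orders = [
    {"qty": 3, "status": "completed"},
    {"qty": 2, "status": "completed"},
    {"qty": 4, "status": "cancelled"},
]
```

Because both consumers call the same `compute` function, the report and the AI cannot disagree about whether cancelled orders count.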
Looking further ahead, the core goal of data governance is changing: data should not only be "understandable and traceable" to humans; it should also form a high-quality data supply system for AI. This applies especially to unstructured data such as PDF documents, meeting recordings, email exchanges, and operation logs. These were basically "blind spots" in traditional data governance, but in the AI era they are precisely the most critical fuel for large models to release their value.
A national-level key research institution carried out exactly such a project: it used large models and RAG technology to perform structured extraction and knowledge base construction on decades of accumulated unstructured text, and then quickly developed several intelligent assistant Agents for professional fields on that basis. In the past, governing unstructured data at scale was basically impossible. Now, with AI, it is not only feasible but also highly valuable.
V. Now Is the Best Time to Take Action
There are two sets of numbers in IDC's "2026 China Data Governance Market White Paper" that are worth paying attention to.
The first: in 2025, China's data governance market exceeded 35 billion in size, a year-on-year growth of nearly 29%. The second: within that market, AI-driven intelligent data governance solutions accounted for more than 50% for the first time.
Taken together, the meaning is clear: the data governance market itself is growing rapidly, and AI-driven products are the main force behind that growth. Traditional "handicraft workshop" data governance has entered the countdown to a rapid exit.
For enterprises that are currently conducting or planning to conduct data governance, here are three action suggestions.
First, stop fantasizing about getting good results simply by adding headcount. The complexity of data governance is growing exponentially: data volumes are increasing, data sources are becoming more diverse, and the business is changing faster. No matter how many people you invest, you can't keep up. You must embed AI tools into the core links of data governance, not as "icing on the cake", but as the "engine itself".
Second, start from a small entry point and close the loop first. Don't start with a "group-level, end-to-end data governance platform". You will very likely spend two years compiling piles of documents and be back at the starting point six months later. Pick the most painful scenario (sensitive data compliance, unification of core indicator standards, or master data quality) and solve it with a Data Agent first. Get real efficiency gains and business feedback, and then use that confidence to drive a larger-scale rollout.
Third, change the end goal of data governance from "for human viewing" to "for AI use". If your data governance system exists only for humans to check reports and run analysis, you have unlocked only half of its value. Use the well-governed data as high-quality training and inference material for large models, so that AI can truly understand your business, and the value generated will be of a different order.
Data governance is never a one-time project that is "done and over with", but a capability system that requires continuous operation. In the past, the operating cost of this system was too high, so most enterprises chose to "abandon it after completion". The introduction of AI has, for the first time, made "continuous governance" economically feasible.
This is the turning point the data governance industry has been awaiting for twenty years. The window for action is right now.
This article is from the WeChat official account "Data-Driven Intelligence" (ID: Data_0101), written by Xiaoxiao, and is published by 36Kr with authorization.