Differences between China's data governance and the models of Western countries

王建峰 2025-12-23 11:51

China is building unprecedented infrastructure to turn data into a tradable "factor of production," a strategic economic resource on par with land, labor, capital, and technology. This approach differs fundamentally from Western data governance: instead of treating data primarily as a privacy issue, China treats it as a national economic asset that requires a state-led market mechanism. For international AI practitioners, understanding this framework is crucial because it shapes compliance requirements, technical architectures, and access to Chinese data for model training.

The Policy Logic of Data as National Infrastructure

China's data element ecosystem stems from a specific diagnosis: massive data resources are scattered across government agencies, state-owned enterprises, and private technology platforms, forming information silos that cause market failures and hinder economic development. The policy response treats this as a resource allocation problem that requires government intervention, and it positions the state as the coordinator of data transactions within what policymakers call the "socialist market economy with Chinese characteristics."

In December 2022, the Central Committee of the Communist Party of China and the State Council issued the landmark "Twenty Provisions on Data," establishing the four pillars of data governance. The most innovative is the first pillar, which addresses data property rights through "structural separation": rather than conferring data ownership, the framework distinguishes between the right to hold data, the right to process it, and the right to operate data products. This workaround sidesteps the philosophical problem of data "ownership" (data is non-rivalrous and can be replicated indefinitely) while still allowing data to be traded. The remaining pillars establish a data circulation and trading system, a revenue distribution mechanism, and a security governance system based on classified data management.

The second milestone came in August 2023, when the Chinese Ministry of Finance issued what the article describes as the world's first national accounting treatment regulations for data assets, which took effect in January 2024. Enterprises can now recognize eligible data resources on their balance sheets as inventory (data held for sale) or intangible assets (data used to provide services). Although implementation is still at an early stage, this accounting change signals China's commitment to financializing data and treating it as economic infrastructure.

In October 2023, the National Data Administration was established under the National Development and Reform Commission, consolidating coordination that had previously been scattered across local management departments. The National Data Administration is responsible for data development and circulation, while the Cyberspace Administration of China (CAC) is responsible for data security. This division of labor reflects the ongoing tension between opening up data flows and maintaining control over data.

The Technical Infrastructure of "Data is Available but Invisible"

China's data infrastructure operates based on a key principle: performing calculations on data without exposing the underlying records. The technical architecture to achieve this goal consists of three integrated layers.

State-supported data exchanges serve as intermediary platforms responsible for listing, pricing, and trading data products. The Shanghai Data Exchange (launched in November 2021) is a national model, and its international section was established in April 2023 to promote cross-border transactions and establish partnerships with international data providers. The Beijing International Big Data Exchange and the Shenzhen Data Exchange form the main hub network. These exchanges are responsible for product registration, including metadata and usage rights, implementing the separation of three rights, requiring buyers to clarify usage scenarios before approval, and integrating third-party certification, security verification, and compliance checks.

Privacy-preserving computing platforms form the enabling layer. WeBank's FATE (Federated AI Technology Enabler), hosted by the Linux Foundation since 2019, provides industrial-grade federated learning and secure multi-party computation protocols. Ant Group's SecretFlow, open-sourced in 2022, covers almost all mainstream privacy computing technologies. These platforms combine four techniques: federated learning for collaborative model training without centralizing raw data, secure multi-party computation for evaluating joint functions without revealing inputs, trusted execution environments for hardware-isolated enclaves, and homomorphic encryption for computing on encrypted data.
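To make the federated learning idea concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration only and does not use the FATE or SecretFlow APIs; the function names, the toy linear model, and the learning rate are all assumptions chosen for brevity. Each client computes an update on its own records, and only model weights, never raw data, reach the coordinator.

```python
from typing import List, Tuple

Weights = Tuple[float, float]          # (slope, intercept) of a toy model y = w*x + b
Dataset = List[Tuple[float, float]]    # local (x, y) pairs that never leave the client


def local_update(weights: Weights, data: Dataset, lr: float = 0.1) -> Weights:
    """One gradient-descent step on a client's private data (mean squared error)."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Coordinator step: average client weights, weighted by local dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train(clients: List[Dataset], rounds: int = 300) -> Weights:
    """Run several rounds; only weights are exchanged, not the datasets."""
    global_weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights
```

Production platforms add secure aggregation, authentication, and fault handling on top of this loop; the sketch only shows the data-stays-local pattern.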

Trusted data spaces are the production infrastructure for secure data flow in China. The action plan issued by the National Data Administration in November 2024 aims to build more than 100 trusted data spaces by 2028, and pilot projects covering enterprise, industry, city, and cross-border applications were announced in 2025. These spaces embed digital contracts that provide automatic compliance enforcement, real-time monitoring, complete audit trails, and multi-party coordination, and they connect data providers, users, and regulators through blockchain-based traceability.
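The "complete audit trail" property can be illustrated with a hash-chained log, the basic building block behind blockchain-style traceability. This is a hypothetical sketch, not the design of any actual trusted data space: the `AuditLog` class, its field names, and the choice of SHA-256 over JSON are assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash used before the first entry


def _digest(record: Dict[str, str]) -> str:
    """Hash a record deterministically (sorted keys give a stable serialization)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def append(self, actor: str, action: str, dataset_id: str) -> Dict[str, str]:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {
            "actor": actor,
            "action": action,
            "dataset_id": dataset_id,
            "prev_hash": prev_hash,
        }
        record["hash"] = _digest(record)  # computed before the "hash" key exists
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

A regulator holding only the latest hash could detect after-the-fact edits to earlier entries, which is the property the traceability requirement relies on.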

The Intersection of Data Infrastructure and AI Model Training

China's data element ecosystem is closely related to large language model development through various mechanisms. The Beijing International Big Data Exchange Center launched the "AI Alchemy Project" to collect global training datasets. In March 2024, Shanghai established the government-led Shanghai Kupasi Technology Co., Ltd., specifically responsible for collecting AI corpora. Shanghai's "5+6" vertical corpus project targets sectors such as finance, manufacturing, education, healthcare, cultural tourism, and urban governance.

The compliance framework for AI training data stems from the Interim Measures for the Administration of Generative AI Services (August 2023) – the world's first administrative regulation specifically targeting generative AI. Training data must come from legal sources, not infringe on intellectual property rights, and when personal information is involved, consent must be obtained or a legal basis provided in accordance with the Personal Information Protection Law. The initially strict requirement of "ensuring" data quality has been significantly relaxed to "improving" data quality – this reflects a pragmatic consideration of the challenges of large-scale implementation.

Three national standards took effect in November 2025: GB/T 45652-2025 for pre-training data, GB/T 45654-2025 for service security, and GB/T 45674-2025 for annotation. The GB/T prefix marks them as recommended national standards rather than mandatory ones. These standards codify detailed annotation requirements, including trained personnel, content validity spot checks, and standardized supervision.

Privacy-preserving computing technologies provide a compliant way to access sensitive data. Federated learning enables hospitals to collaboratively train medical AI models without sharing patient records. Secure multi-party computation (MPC) allows financial institutions to jointly develop risk models without revealing proprietary data. The DeepLink technology stack of the Shanghai AI Laboratory shows where the field is heading: hybrid training now spans Beijing, Shanghai, and Qinghai over telecommunications networks. This distributed architecture, shaped in part by US export controls that have made it necessary to combine GPUs from multiple vendors, shows how privacy-preserving distributed training can enable compliant model development across jurisdictions.
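As a rough illustration of how MPC lets institutions compute a joint result without revealing inputs, the sketch below uses additive secret sharing to compute a sum. It is a simplified, semi-honest toy and not the protocol of any specific platform; the modulus, function names, and single-process simulation of the parties are assumptions.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # assumed modulus; inputs must be non-negative and below it


def share(value: int, n_parties: int) -> List[int]:
    """Split a value into n random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    """Recombine shares; any subset smaller than the full set looks random."""
    return sum(shares) % PRIME


def secure_sum(private_values: List[int]) -> int:
    """Each party shares its input; parties only ever see shares, never inputs."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j
    distributed = [share(v, n) for v in private_values]
    # party j adds up the shares it received and publishes only that partial sum
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)
```

For example, three banks could learn their combined exposure with `secure_sum([120, 45, 300])` while no bank learns another's individual figure, assuming the parties follow the protocol and do not collude.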

China and GDPR: Fundamentally Different Operational Assumptions

To understand China's approach, one must grasp the conceptual differences between it and the EU's GDPR framework. These differences are not only reflected at the regulatory level but also at the ideological level.

The EU views data governance as rights protection: Data privacy protection extends individual autonomy, shielding individuals from excessive interference by enterprises and state power. This people-centered tradition holds that personal data essentially belongs to individuals, and relevant regulations aim to maintain this relationship. The resulting framework focuses on consent mechanisms, purpose limitation, and data minimization – all aimed at protecting individuals from exploitation.

China views data governance as industrial policy: Data is a strategic national resource whose value is underestimated and currently distorted in an unregulated market. The state's main role is not protection but allocation – guiding data flow to maximize national economic benefits. The Personal Information Protection Law still focuses on individual interests, but these interests must operate within a framework that prioritizes data production potential.

This explains several notable features of China's approach to data: mandatory localization of certain categories of data, security assessments for cross-border data transfers, state-backed exchanges rather than a purely private market, and a clear push for enterprises to recognize data on their balance sheets. The underlying logic is that data resources, like land or mineral resources, require coordinated development and utilization rather than fragmented individual control.

For international practitioners, this creates a "dual-stack" reality: many multinational companies now run a global IT architecture and a separate, localized IT architecture in China. Three laws, the Cybersecurity Law (2017), the Data Security Law (2021), and the Personal Information Protection Law (2021), impose overlapping compliance requirements. Depending on data type and volume, cross-border data transfers require a government security assessment, third-party certification, or the use of China's standard contractual clauses.

Recent signs indicate a policy adjustment: regulations that took effect in March 2024 relaxed some of the stricter cross-border requirements, but the basic framework, which treats data as a production factor managed by the state, remains unchanged.

Technical Architecture Requirements for AI Companies

Companies developing or deploying AI systems in China must build multiple functions into their technical architectures.

Data traceability systems must record the source of all training data, maintain consent records for personal information, and flag data that may be classified as "important data," which requires government approval before cross-border transfer. The definition of important data varies by industry and keeps evolving, so continuous monitoring is required.
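A traceability system of this kind could start from a per-source provenance record like the one sketched below. The schema is hypothetical: the field names and the two checks are assumptions derived from the requirements described above, not from any official template, and real classification of "important data" would depend on industry-specific catalogs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingDataRecord:
    """Provenance entry for one training data source (illustrative fields only)."""
    source_id: str
    origin: str                               # where the data was obtained
    license: str                              # legal basis for using the content
    contains_personal_info: bool = False
    consent_reference: Optional[str] = None   # pointer to consent or other legal basis
    important_data_candidate: bool = False    # flagged for possible "important data" status


def compliance_issues(record: TrainingDataRecord) -> List[str]:
    """Return human-readable issues that should block or gate use of the source."""
    issues: List[str] = []
    if record.contains_personal_info and not record.consent_reference:
        issues.append("personal information without a recorded consent or legal basis")
    if record.important_data_candidate:
        issues.append("possible important data: review before any cross-border transfer")
    return issues
```

Keeping the checks as a pure function over the record makes it straightforward to re-run them whenever an industry definition of important data changes.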

Content security infrastructure includes filtering of pre-training corpora for prohibited content, real-time output auditing, and model optimization capabilities to remediate violations within the three-month time limit stipulated by regulations. Algorithm filing and registration must comply with the CAC's requirements.

Privacy-preserving data pipelines can access sensitive Chinese data in a compliant manner: Federated learning is used for distributed training, differential privacy for corpus anonymization, and TEE/MPC integration for secure multi-party scenarios. These are not just compliance mechanisms but also key to enhancing competitiveness – they provide access to data resources that would otherwise be unavailable.
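Of the techniques listed, differential privacy is the easiest to show in a few lines. The sketch below applies the Laplace mechanism to a counting query; it is a textbook illustration rather than a corpus anonymization pipeline, and the function names and the `epsilon` parameter value are assumptions for the example.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    records: Iterable[T],
    predicate: Callable[[T], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so the Laplace noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Smaller values of `epsilon` add more noise and give stronger privacy; the right setting is a policy decision, not something this sketch can determine.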

As US export controls push developers to combine domestic accelerators such as Huawei Ascend, Biren, Moore Threads, and Cambricon in multi-vendor clusters, heterogeneous computing infrastructure has become standard. Long-distance training across data centers has been verified, demonstrating how privacy-preserving distributed training can operate at scale.

Summary: Data Governance Requires Different Models

China's data element initiative represents a new data governance architecture that international AI practitioners cannot ignore. This framework solves practical coordination problems through mechanisms that are very different from Western models – breaking data silos, promoting cross-organizational collaboration, and establishing pricing mechanisms.

The technological innovation is significant: industrial-grade federated learning platforms, blockchain-based trusted data spaces, state-supported exchanges with integrated compliance, and distributed heterogeneous training infrastructure. These are not just regulatory measures but also practical tools for AI development under China's conditions.

For practitioners, the key is that China has built and will continue to build a parallel data infrastructure optimized for different assumptions about the relationships between individuals, enterprises, and the state. Whether as partners, competitors, or observers, participating in China's AI development requires an understanding of this infrastructure, which is not a deviation from universal standards but a coherent alternative system with its own logic, capabilities, and constraints.

This article is from the WeChat official account "Data-Driven Intelligence" (ID: Data_0101), author: Xiaoxiao. It is published by 36Kr with authorization.