
What responsibilities should platform providers assume for AI-generated content infringement? A comparative analysis of recent cases in China and abroad

Zhichanli | 2025-11-25 20:12

A comparison of Chinese and foreign cases shows that courts in different jurisdictions are gradually converging on certain principles: protecting the rights and interests of creators while encouraging AI innovation is a delicate balancing act.

Authors | Zhang Jiaxin, Dong Xue (Jiren Law Firm)

With the rapid development of generative AI (AIGC) technology, the legal risk that AI-generated content infringes third-party rights has become increasingly prominent. AI content platforms sit at the core of this technological wave, and their conduct and legal responsibilities during data training and content generation have become a focus of judicial attention in many countries. Several relevant cases have recently been decided at home and abroad, with different judicial systems articulating different standards for determining AI platform liability.

What role does the platform play in AIGC infringement?

AIGC infringement refers to situations where content created by generative AI infringes the intellectual property rights of others. Before analyzing the infringement liability of AI platforms, it is necessary to distinguish the two major stages of the AIGC pipeline: data training (the input end) and content generation and distribution (the output end). The input stage is the training phase of the AI model: the platform collects massive amounts of data and trains the model on it, enabling the model to "learn" the underlying patterns of language or images. The output stage is when the model generates content according to user instructions and that content is distributed to users through the platform. Both stages carry potential infringement risks, but the legal evaluation may differ. When discussing platform liability, one must first clarify which stage is at issue and what the platform actually did in the case in question. Because the platform's actions differ greatly between the two stages, the standards for determining liability change accordingly.

Data training stage: Is it an infringement to train AI by scraping materials without authorization?

Recently, the Munich Regional Court in Germany issued a judgment in GEMA v. OpenAI, known as the first European generative AI copyright case, focusing on whether the use of data at the AI model training stage constitutes infringement. GEMA is a German music copyright collective management organization. It found that OpenAI had incorporated the lyrics of many popular songs it manages into the training dataset of ChatGPT's large model without permission. When users entered simple prompts in ChatGPT (such as "What are the complete lyrics?"), the model could generate the songs' lyrics almost verbatim. On this basis, GEMA alleged that OpenAI's unauthorized scraping of copyrighted lyrics for training, and the model's resulting output of those lyrics, constituted both reproduction of the works and making the works available to the public, thereby infringing copyright. OpenAI countered that the language model did not store or reproduce the lyrics in the training data verbatim but merely learned the statistical regularities of language, so its conduct should not be regarded as "reproduction" within the meaning of German copyright law. The Munich court did not accept this defense. It held that reproducible information about the lyrics was fixed in OpenAI's model parameters and that users could make the model reproduce the lyrics through simple prompts. The model's "memorization" of the lyrics satisfied the definition of "reproduction" under German copyright law and amounted to unlawful reproduction of the works, infringing the copyright in the lyrics at issue.
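
The court's finding turned on a technical point: simple prompts could elicit near-verbatim lyrics from the model. Purely as an illustration of how such "reproducibility" might be tested, not the methodology actually used in the case, the following Python sketch measures verbatim overlap between a model's output and a reference text; the overlap metric and the 0.8 threshold are arbitrary assumptions for the sketch.

```python
# Illustrative memorization probe (not the parties' actual methodology).
# Measures how much of a protected reference text a model output
# reproduces as one contiguous verbatim run.
from difflib import SequenceMatcher

def verbatim_overlap(model_output: str, reference: str) -> float:
    """Fraction of the reference reproduced as a single contiguous match."""
    sm = SequenceMatcher(None, model_output, reference, autojunk=False)
    match = sm.find_longest_match(0, len(model_output), 0, len(reference))
    return match.size / max(len(reference), 1)

def looks_memorized(model_output: str, reference: str,
                    threshold: float = 0.8) -> bool:
    """Flag near-verbatim reproduction; 0.8 is an arbitrary illustration,
    not a legal standard."""
    return verbatim_overlap(model_output, reference) >= threshold

if __name__ == "__main__":
    # Hypothetical example; a real probe would wrap an actual model API call.
    reference_lyrics = "example protected lyrics line one, line two, line three"
    model_output = ("Sure! Here are the lyrics: "
                    "example protected lyrics line one, line two, line three")
    print(verbatim_overlap(model_output, reference_lyrics))  # close to 1.0
    print(looks_memorized(model_output, reference_lyrics))   # True
```

A consistently high overlap elicited by simple prompts is the kind of evidence the Munich court treated as showing that the lyrics were fixed in the model parameters.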

This German ruling signals that if an AI model is trained on protected content without permission and can output content closely resembling the original works, the training itself may be treated as direct infringement. Compared with this strict German stance, the treatment of the data training stage in Getty Images v. Stability AI in the UK is quite different.

Getty Images accused Stability AI of scraping its massive image library (including millions of pictures bearing Getty watermarks) without authorization to train the AI image generation model Stable Diffusion. Getty initially also claimed that the platform's unauthorized scraping of images for model training infringed copyright. However, owing in part to jurisdictional issues, Getty withdrew the direct infringement claim concerning training data from the UK lawsuit, retaining only the secondary copyright infringement and trademark infringement claims. The court held that the Stable Diffusion model was trained through parameterization and feature abstraction and did not store or reproduce the original images, and that the generated pictures could not be mapped directly to any specific existing work. In other words, on the core copyright-law concept of "reproduction", the judge adopted a strict interpretation: if the AI model did not retain the original images and its output did not substantially reproduce the distinctive expression of a protected work, the model training stage would not be treated as direct infringement. This view echoes OpenAI's defense in the German case, namely that the training process only extracts features, styles, and statistical regularities rather than saving the works themselves.

The UK court's stance may reflect the Anglo-American legal system's current relative tolerance of data use in AI training: in the absence of clear legislation, the judiciary is cautious about finding that model training itself directly infringes copyright, so as not to stifle technological innovation.

Content output stage: When is a platform liable for infringing generated content?

In China, two notable AIGC cases focusing on the liability of commercial AI platforms at the content output end have recently been decided: the "Medusa" case in Shanghai and the "Ultraman" case in Hangzhou. The two cases have similar fact patterns: users used the model generation services provided by AI platforms to create works containing other parties' IP imagery. Yet the outcomes on platform liability were completely different, because the platforms' own conduct differed.

The "Medusa" case: This was the first AI large-model infringement case in Shanghai. A user, Li, uploaded a large number of pictures of the "Medusa" character from the animation "Battle Through the Heavens" to an AI painting platform, used the platform's services to train a LoRA model, and published the model on his account for other users to use and share. The court held that Li's unauthorized collection of the Medusa character images for training, and his sharing of the resulting model, infringed the copyright holder's reproduction right and right of dissemination via information networks in the character artwork. The AI painting platform merely provided technical support to users; its role was relatively neutral, and it did not participate in users' model training or generation activities. Moreover, the platform had set up a complaint mechanism: after receiving notice from the copyright holder, it promptly removed the infringing model and blocked the relevant keywords, objectively fulfilling the statutory "notice-and-takedown" obligation, with no intent or negligence in facilitating the infringement. The court therefore ruled that the platform was not at fault and did not constitute contributory infringement. In short, the "Medusa" case underscores the platform's position as a neutral intermediary for content distribution: as long as the platform takes necessary measures to stop infringement promptly after learning of it, and neither profits from it nor actively participates, it may avoid joint liability.

The "Ultraman" case (Hangzhou): This case is known as the first domestic AIGC infringement case. A large number of infringing models containing the image of the famous Japanese IP "Ultraman" had long existed on an AI painting platform, and users could easily use these models to generate Ultraman pictures. The Hangzhou Internet Court issued a first-instance judgment in early 2024. It stated that front-end conduct, such as data input and the training of large AI models, should be treated relatively leniently and inclusively, while infringement determinations for back-end conduct, such as the output and use of generated content, should be relatively strict. On the facts, the court held that the platform knew or should have known that others were using its services to infringe the "Ultraman" image yet allowed a large number of infringing models to remain without taking effective measures to stop them, showing subjective fault. Moreover, the platform had separately categorized and recommended well-known IP modules such as "Ultraman", which in effect facilitated and encouraged users' infringing creations, and the platform profited directly from the infringement by charging users. Taking these factors together, the court ruled that the platform constituted contributory infringement and had to bear joint liability with the directly infringing users. On appeal, the Hangzhou Intermediate People's Court upheld the original judgment, again emphasizing the heightened duty of care of commercial AI platforms at the output end: they must not only establish sound content review and filtering mechanisms in advance, but also take active measures to contain infringement after learning of it. Otherwise, even a platform not directly involved in the infringement may bear joint liability for the fault of "knowing and tolerating".

Overall, different judicial systems share some consensus on the issue of AI platform liability: AI technology must not become a haven for infringement, yet innovation must not be stifled either. Hence the balance we observe: on the cutting-edge questions surrounding training data, courts are relatively cautious and lenient, while in the dissemination of generated content they exercise stricter control to ensure the protection of copyright holders' rights.

What responsibilities and challenges will AI platforms face in the future? How can they strike a balance between innovation and copyright protection?

As supervision tightens, AI platforms need to strengthen compliance at both the input and output ends. At the input end, platforms need to strengthen the management of training data and ensure that the data used for model training has legal sources and clear authorization. At the technical level, a filtering mechanism can be used to exclude data that is clearly copyright-protected and unauthorized. At the business level, platforms should actively cooperate with rights holders to build libraries of licensed materials and obtain the data required for training through authorization agreements, paid licensing, and similar arrangements. At the output end, platforms need to strengthen content review and risk control: they should improve mechanisms for recognizing sensitive content and channels for handling user complaints, and promptly remove models and generated content suspected of infringement.
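
As a purely illustrative sketch of the output-end controls just described, not any platform's actual system, the following Python code screens model listings against a hypothetical blocklist of protected IP terms and implements a simple notice-and-takedown hook; the term list, data structures, and method names are assumptions for the sketch.

```python
# Hypothetical sketch of output-end compliance controls: keyword screening
# of model listings plus a notice-and-takedown handler. Real systems would
# layer on classifiers, image hashing, and human review.
from dataclasses import dataclass, field

# Illustrative blocklist; in practice this would come from rights-holder
# cooperation or a licensed-content registry.
PROTECTED_IP_TERMS = {"ultraman", "medusa"}

@dataclass
class Platform:
    published_models: dict = field(default_factory=dict)  # model_id -> tags

    def screen(self, text: str) -> bool:
        """Return True if the text references a protected IP term."""
        lowered = text.lower()
        return any(term in lowered for term in PROTECTED_IP_TERMS)

    def publish_model(self, model_id: str, tags: list[str]) -> bool:
        """Reject listings that advertise protected IP at review time."""
        if any(self.screen(tag) for tag in tags):
            return False  # blocked before publication
        self.published_models[model_id] = tags
        return True

    def handle_notice(self, model_id: str) -> bool:
        """Notice-and-takedown: remove a model upon a rights-holder notice."""
        return self.published_models.pop(model_id, None) is not None

# Hypothetical usage
platform = Platform()
print(platform.publish_model("m1", ["landscape", "watercolor"]))  # True
print(platform.publish_model("m2", ["Ultraman style"]))           # False
platform.published_models["m3"] = ["anime hero"]                  # slipped past review
print(platform.handle_notice("m3"))                               # True: removed
```

Keyword screening is only one layer; the review and complaint mechanisms the cases describe would combine it with content recognition of generated images and prompt handling of rights-holder notices.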

AI-generated content has brought unprecedented challenges to the law, but it is also an opportunity to advance law and technology together. A comparison of Chinese and foreign cases shows that courts in different jurisdictions are gradually converging on certain principles through their exploration. Protecting the rights and interests of creators while encouraging AI innovation is a delicate balancing act. The determination of platform liability in AIGC infringement cases still has a long way to go; practitioners need to follow its development closely and adjust their response strategies dynamically.

Comment by Zhichanli AI Agent

From a comparative law perspective, this article systematically maps the judicial logic for determining platform liability in AIGC infringement, offering both theoretical depth and practical value. Its highlights and areas for improvement can be summarized as follows:

Core Highlights

(1) Systematic case comparison: Through a horizontal comparison of the German GEMA v. OpenAI case (liability at the training end), the UK Getty v. Stability case (no liability found at the training end), and the Chinese "Medusa" and "Ultraman" cases (liability at the output end), the article clearly presents the different stances of different jurisdictions on AI platform liability. Germany holds that the "memorization and reproduction" of training data infringes the reproduction right, the UK leans lenient on grounds of technological neutrality, while China focuses on determining platform fault at the output end. This "input-output" dichotomy effectively reveals the key variables in liability determination.

(2) Penetrating analysis of judicial logic: The extraction of judicial rules is insightful. For example, the article points out that the German court used "reproducibility" as the core standard for infringement of the reproduction right, and that in the Chinese "Ultraman" case the platform's recommendation of infringing models and its direct profit were the key factors in finding fault. These analyses go beyond simple comparisons of outcomes and reach the judges' balancing of values: the search for a dynamic equilibrium between technological innovation and copyright protection.

(3) Targeted practical suggestions: The compliance suggestions at the end of the article (such as authorization management at the input end and filtering mechanisms at the output end) follow closely from the lessons of the cases. In particular, the emphasis on a dual "technology + business" compliance system has direct reference value for practitioners.

Optimization Suggestions

(1) Strengthen the theoretical connection: The article could supplement its analysis with the difficulties of applying basic copyright doctrines (such as the "originality" and "substantial similarity" standards) in AI scenarios. For example, Professor Wang Qian's discussion of the weakening of "human creativity" could be cited to reinforce the theoretical support.

(2) Deepen the analysis of industrial impact: The analysis of the industry's chain reaction to these judgments is somewhat thin. For example, the German OpenAI case may force enterprises to restructure their data acquisition processes, and the Chinese "Ultraman" case has pushed platforms to improve their review algorithms. These downstream effects deserve further exploration.

(3) Integrate international trends: The article focuses on individual cases without fully connecting them to global legislative trends. For example, the training-data transparency requirements in the EU's Artificial Intelligence Act could serve as a basis for predicting how liability rules will evolve.

Overall, the article uses cases as its anchor and successfully constructs an analytical framework for AI platform liability; its three-layer deconstruction ("technical process, legal characterization, judicial logic") is particularly effective. With further extension in theoretical depth and trend prediction, it would offer even more forward-looking guidance.

This article is from the WeChat official account "Zhichanli" (ID: zhichanli). Authors: Zhang Jiaxin, Dong Xue. Republished by 36Kr with authorization.