What Responsibility Must Platforms Bear for Copyright Infringement by AI-Generated Content? A Comparison and Interpretation of Recent Cases in China and Abroad
By comparing Chinese and foreign cases, one finds that the judicial systems of different countries are gradually converging on certain shared principles. Balancing the protection of creators' rights with the promotion of artificial intelligence innovation is an art that demands careful judgment.
Author | Zhang Jiaxin, Dong Xue Jiren Law Firm
With the rapid development of generative artificial intelligence (AIGC), the legal risk that AIGC products infringe copyright has become increasingly prominent. AI content platforms play a central role in this technological wave, and their conduct and legal responsibilities in data training and content generation are a central focus for courts across jurisdictions. Several relevant cases have recently emerged in China and abroad, and different judicial systems have developed differing standards and views on determining AI platforms' liability.
What role does the platform play in AIGC copyright infringement?
AIGC copyright infringement refers to situations where content created by generative AI infringes the intellectual property rights of others. Before analyzing the copyright liability of AI platforms, it is necessary to distinguish two essential phases in the AIGC content pipeline: data training (the input side) and content generation and distribution (the output side). The input phase is the training stage of the AI model: the platform collects a large amount of data to train the model so that it can "learn" the underlying patterns of language or images. The output phase is the stage in which the model generates content according to users' instructions and the platform distributes that content to users. Copyright infringement risks exist in both phases, but the legal evaluation may differ. When discussing a platform's liability, one must therefore first establish which actions the platform took at which stage of the case at hand: the platform's conduct differs significantly between the phases, and with it the criteria for determining liability.
Data training phase: Is it copyright infringement to collect materials for AI training without permission?
Recently, the Munich court in Germany issued a ruling in GEMA's lawsuit against OpenAI, regarded as Europe's first judgment in the field of generative AI and copyright. The ruling addresses whether the use of data in the training stage of an AI model constitutes copyright infringement. GEMA is the German collective management organization for music copyrights. It found that OpenAI had included the lyrics of several popular songs it manages, without permission, in the training dataset of the large language model behind ChatGPT. When users enter a simple prompt in ChatGPT (e.g., asking "What are the complete lyrics?"), the model can reproduce the lyrics almost verbatim. GEMA therefore accused OpenAI of using copyrighted lyrics for training without permission, with the result that the model outputs those lyrics, which amounts to reproducing the works and communicating them to the public and thus infringes copyright. OpenAI argued in response that the language model does not store or copy the lyrics in the training data verbatim but only learns the statistical regularities of language, and therefore should not be regarded as "copying" within the meaning of the German Copyright Act. The Munich court did not accept this argument. It found that reproducible information about the lyrics was fixed in OpenAI's model parameters, and that users could make the model reproduce the lyrics through simple prompts. OpenAI's "memorization" of the lyrics meets the definition of "copying" under the German Copyright Act and therefore constitutes an unlawful reproduction of the works, infringing the copyright in the lyrics concerned.
This German ruling emphasizes that if an AI model is trained on protected content without permission and can output content closely resembling the original works, the training itself may be regarded as direct copyright infringement. In contrast to Germany's strict stance, the treatment of the data training phase in Getty Images v. Stability AI in the UK was different.
Getty Images accused Stability AI of using its large image collection (including millions of images bearing Getty watermarks) without permission to train the AI image generation model Stable Diffusion. At the outset, Getty also claimed that the platform had collected images for model training without permission and thereby infringed copyright. For jurisdictional and other reasons, Getty withdrew the claim of direct copyright infringement concerning the use of training data in the UK proceedings and maintained only its claims of secondary copyright infringement and trademark infringement. The court found that the Stable Diffusion model is trained through parameterization and feature abstraction and does not store or reproduce the original images; its generated images cannot be traced back to any specific existing work. In other words, on the central concept of "copying" in copyright law, the judge adopted a strict interpretation: if the AI model does not store the original images and its output does not substantially reproduce the distinctive expression of a protected work, the model's training stage is not regarded as direct copyright infringement. This view resembles OpenAI's argument in the German case, namely that the training process only extracts features, styles, and statistical regularities and does not store the works themselves.
The stance of the UK court may reflect the relative tolerance of the common-law system toward data use in AI training: in the absence of clear statutory rules, courts tend to be cautious in finding that model training itself constitutes direct copyright infringement, so as not to undermine technological innovation.
Recently, two notable AIGC cases in China have focused on commercial AI platforms' responsibilities on the output side: the "Medusa" case in Shanghai and the "Ultraman" case in Hangzhou. The two cases share a similar background: users used an AI platform's model-generation services to create works featuring others' IP characters. Yet the platforms' liability differed, owing to differences in the platforms' own conduct.
The "Medusa" case: This was Shanghai's first copyright infringement case involving a large AI model. A user surnamed Li used an AI painting platform's services to upload a large number of images of the character "Medusa" from the animation "Battle Through the Heavens" and train a LoRA model, then published the model on his own account on the platform for other users to use and share. The court found that Li had collected the Medusa character images, trained the model, and shared it without permission, infringing the copyright owner's reproduction right and right of communication through information networks in the character's image. The AI painting platform merely provided technical support to users; its role was relatively neutral, and it did not participate in users' model training or generation. Moreover, the platform had set up a complaint module and, after receiving notice from the copyright owner, promptly removed the infringing model and blocked the relevant keywords. Objectively, it fulfilled its statutory "notice-and-takedown" obligations; subjectively, it showed no intent or negligence in facilitating the infringement. The court therefore found the platform not at fault and did not hold it jointly liable for the users' infringement. In short, the "Medusa" case underscores the platform's neutral role as an intermediary in content distribution: as long as the platform takes immediate measures to stop infringement upon learning of it, and neither profits from nor actively participates in it, it need not bear joint liability.
The "Ultraman" case (Hangzhou): This case is regarded as China's first AIGC platform copyright infringement case. On an AI painting platform, a large number of infringing models featuring the famous Japanese IP character "Ultraman" had long been available, and users could easily use them to generate Ultraman images. In early 2024, the Hangzhou Internet Court issued a first-instance ruling. The court held that preliminary actions such as data input and training of large AI models should be treated relatively leniently, while infringement in downstream actions such as the output and use of generated content should be assessed more strictly. In this case, the court found that the platform knew or should have known that users were using its services to infringe the "Ultraman" character, yet allowed large numbers of infringing models to remain without taking effective measures to stop them; it was therefore subjectively at fault. The platform had also separately categorized and recommended modules for well-known IP such as "Ultraman", which in practice facilitated and encouraged users' infringing creations, and it profited directly from these infringing activities by charging users fees. On these grounds, the court held that the platform bore joint liability for the infringement together with the directly infringing users. The ruling was upheld on appeal. It again underscores the high duty of care of commercial AI platforms on the output side: they must not only establish comprehensive content review and filtering systems but also take active measures to stop infringement once they learn of it. Otherwise, even a platform not directly involved in the infringement may bear joint liability on the basis of fault arising from its knowledge of, and negligence toward, the infringement.
Overall, a certain consensus is emerging across judicial systems regarding AI platforms' responsibilities: AI technology must not serve as a shelter for copyright infringement, but innovation must not be stifled either. Hence the balance we observe: cautious, lenient treatment of the forward-looking questions around data training, and stricter control over the distribution of generated content to protect copyright owners.
What responsibilities and challenges will AI platforms face in the future? How can the balance between innovation and copyright protection be achieved?
With regulation tightening, AI platforms must strengthen compliance on both the input and output sides. On the input side, they need to improve the management of training data and ensure that data sources for model training are lawful and the permissions clearly defined. At the technical level, they can use filtering systems to exclude obviously copyrighted, unauthorized data; at the business level, they can actively cooperate with copyright owners to build libraries of licensed materials and obtain the data needed for training through license agreements or paid permissions. On the output side, they must strengthen content review and risk management: improving systems for detecting sensitive content and channels for handling user complaints, and promptly removing suspected infringing models and outputs from the platform.
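To make the output-side measures above concrete, the following is a minimal, purely illustrative sketch of two such compliance mechanisms: a keyword filter screening generation requests for unlicensed IP, and a notice-and-takedown handler that removes a reported model from the catalog. All names (BLOCKED_IP_TERMS, review_generation_request, handle_takedown_notice) are hypothetical; real platforms would use far more sophisticated detection (image classifiers, fingerprinting) than string matching.

```python
# Illustrative sketch only: not any real platform's API.
# Terms here stand in for IP names the platform knows are unlicensed.
BLOCKED_IP_TERMS = {"ultraman", "medusa"}

def review_generation_request(prompt: str, model_tags: list[str]) -> dict:
    """Screen a generation request (prompt plus the tags of the
    community model invoked) for references to unlicensed IP."""
    text = " ".join([prompt, *model_tags]).lower()
    hits = sorted(term for term in BLOCKED_IP_TERMS if term in text)
    return {"allowed": not hits, "blocked_terms": hits}

def handle_takedown_notice(model_catalog: dict, model_id: str) -> bool:
    """Remove a reported model promptly after a rights-holder notice,
    in the spirit of the notice-and-takedown obligation discussed above.
    Returns True if the model existed and was removed."""
    return model_catalog.pop(model_id, None) is not None
```

For example, review_generation_request("draw ultraman fighting a monster", []) would flag the request, while a prompt with no blocked terms passes. The point of the sketch is the two hooks themselves, screening before generation and removal after notice, which map onto the fault criteria the "Medusa" and "Ultraman" courts applied.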
AI-generated content poses an unprecedented challenge to the law, but it is also an opportunity to advance law and technology together. As the comparison of Chinese and foreign cases shows, courts in different jurisdictions are gradually converging on shared principles, yet determining platform liability in AIGC copyright cases still has a long way to go. Industry participants must keep abreast of developments and adjust their strategies dynamically.
Evaluation by the Zhichanli AI Agent
This article systematically organizes, from a comparative-law perspective, the logic of judicial determinations of platform liability in AIGC copyright infringement cases. It is both theoretically grounded and practically valuable. Its strengths and areas for improvement can be summarized as follows:
Main achievements
(1) Systematic case comparison: Through a horizontal comparison of the German GEMA v. OpenAI case (infringement on the training side), the British Getty v. Stability case (exemption on the training side), and the Chinese "Medusa" and "Ultraman" cases (liability on the output side), the article clearly shows the differing stances of legal jurisdictions on AI platform liability. Germany holds that "memorization and reproduction" of training data infringes the reproduction right; the UK is more tolerant on grounds of technical neutrality; and China focuses on determining platform fault on the output side. This two-part "input-output" framework effectively highlights the key variables in determining liability.
(2) In-depth analysis of judicial logic: The summary of decision rules is clear. For example, the article points out that the German court treated "reproducibility" as the core criterion for infringement of the reproduction right, while in the Chinese "Ultraman" case the platform's recommendation of infringing models and its direct profit-making were key criteria for finding fault. These analyses go beyond a simple comparison of outcomes and probe the judges' underlying value considerations, namely finding a dynamic balance between technological innovation and copyright protection.
(3) Targeted practical suggestions: The compliance suggestions at the end of the article (e.g., permission management on the input side and…