
The History of Large-Model Rebranding

直面AI (Face AI) · 2025-07-14 17:29
Is rebranding a product really plagiarism, or is it inspiration?

The controversy over whether Huawei's Pangu large model is a rebranded version of Alibaba Cloud's Qwen large model has once again pushed the debate over "original development" versus "rebranding" of models to the forefront.

Looking back three years, when ChatGPT had just opened the era of large models, rebranding was still at the stage of small-scale imitation of ChatGPT: by calling ChatGPT's API and adding a "Chinese UI" on top, operators could sell memberships in WeChat groups priced by the number of calls. That year, rebranding became many people's first ticket into the AI wealth story.

Meanwhile, among the companies that set out to develop large models independently, many also borrowed from ChatGPT. Although these enterprises had self-developed model architectures, during the fine-tuning stage they more or less used data generated by dialogue models such as ChatGPT or GPT-4. Such synthetic corpora not only ensured data diversity but were also high-quality data already aligned by OpenAI. Borrowing from ChatGPT was an open secret in the industry.

Since 2023, the large-model track has entered the open-source era, and training on open-source frameworks has become the choice of many startup teams. As more and more teams publish their research, promoting the exchange and iteration of technology, rebranded development has become more common, and controversial rebranding incidents have multiplied: one suspected case after another has hit the trending lists, only to be explained and clarified by the parties involved.

The domestic large-model industry has been moving forward in a cycle of "rebranding" and "being rebranded".

01 The Year of GPT's Popularity: Counterfeit APIs and Data Creation

Looking back at the evolution of AI, all the large models we see today descend from the same ancestor: the Transformer neural network architecture released by the Google Brain team in 2017. The original Transformer consists of an encoder and a decoder; the encoder is responsible for understanding the input text, and the decoder for generating the output text.

Three mainstream Transformer variants are still used in large language models today: Decoder-only (such as the GPT series), Encoder-Decoder (such as T5), and Encoder-only (such as BERT). The most attention-grabbing and widely applied of these is the Decoder-only, GPT-style architecture, which has continuously spawned new variants.
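As a quick illustration of the three families, here is a minimal sketch using Hugging Face's transformers library; the checkpoints below are common public examples, not models discussed in this article:

```python
# One representative checkpoint per Transformer architecture family.
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Encoder-only: reads the whole input bidirectionally; suited to understanding tasks.
encoder_only = AutoModel.from_pretrained("bert-base-uncased")

# Encoder-Decoder: the encoder understands the input, the decoder generates the output.
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Decoder-only (GPT-style): autoregressively predicts the next token.
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")
```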

In November 2022, OpenAI launched ChatGPT, built on GPT-3.5. It gained tens of millions of users shortly after release, officially bringing LLMs onto the public stage and making the GPT architecture the mainstream of AI. As ChatGPT fired the first shot of the large-model era, major manufacturers flocked to the large-model R&D track. And since domestic users could not access ChatGPT directly, some small workshops saw the profit to be made in rebranding.

From the end of 2022, many counterfeit ChatGPTs emerged on the Internet. At that stage, rebranding involved essentially no secondary development: many developers simply wrapped the API and sold access for profit.
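Technically, such "wrapping" amounted to little more than a thin proxy. A minimal sketch, assuming Flask and the standard OpenAI Python client; the endpoint and the per-call membership metering are hypothetical stand-ins for how these sites operated:

```python
# A thin reseller proxy: forward user prompts to the OpenAI API and
# meter calls per member. Membership logic is a hypothetical illustration.
from flask import Flask, jsonify, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key="sk-...")  # the reseller's own OpenAI key

quota = {}  # member_id -> remaining calls, sold "by the number of calls"

@app.post("/chat")
def chat():
    body = request.get_json()
    member = body["member_id"]
    if quota.get(member, 0) <= 0:
        return jsonify(error="quota exhausted, please renew"), 402
    quota[member] -= 1
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": body["prompt"]}],
    )
    return jsonify(answer=resp.choices[0].message.content)
```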

From the end of 2022 through 2023, hundreds of ChatGPT mirror sites emerged in China, including the once-famous official account "ChatGPT Online". Its operator obtained access to the OpenAI API and resold it at a markup through its own front end. This crude form of rebranding was soon noticed by regulators: Shanghai Entropy Cloud Network Technology Co., Ltd., the company behind "ChatGPT Online", was fined 60,000 yuan for suspected imitation of ChatGPT, the first administrative penalty for "ChatGPT rebranding".

On the other hand, other models released during the same period often produced "GPT-flavored" responses, and the enterprises behind them faced rebranding suspicions as well.

In May 2023, netizens found that iFlytek's Spark large model would produce answers such as "I was developed by OpenAI" in some Q&A sessions, and the news that "iFlytek Spark is suspected of rebranding ChatGPT" quickly spread.

This was not an isolated case. Even DeepSeek V3, released in 2024, had the same problem: some users reported that it behaved abnormally during testing, claiming to be OpenAI's ChatGPT. The company explained that a large amount of ChatGPT-generated content had likely been mixed into the training data, causing the model's "identity confusion".

The data pollution caused by the growing share of AI-generated content in public information on the Internet is indeed one possible explanation for these "GPT-flavored" conversations. Another possibility is that the R&D team deliberately used datasets constructed with OpenAI models such as ChatGPT during fine-tuning, the practice known as "data distillation".

Data distillation is an efficient, low-cost method of knowledge transfer in large-model training: a powerful "teacher model" (such as GPT-4) generates a large amount of high-quality Q&A data, which is then fed to a "student model" to learn from.
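A minimal sketch of that teacher-student loop, assuming the standard OpenAI Python client for the teacher; the prompts, model name, and output file are illustrative, and the student's fine-tuning step is left to any SFT pipeline:

```python
# Stage 1: query a strong "teacher" model for high-quality Q&A pairs,
# saved as JSONL. Stage 2 (training the student on this file) can use
# any supervised fine-tuning pipeline, e.g. TRL's SFTTrainer.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompts = ["Explain overfitting simply.", "What is a Transformer?"]

pairs = []
for p in prompts:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": p}]
    )
    pairs.append({"prompt": p, "response": resp.choices[0].message.content})

with open("distilled.jsonl", "w") as f:
    for ex in pairs:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```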

In fact, OpenAI went fully closed-source after GPT-3, so competitors who want to develop their own large models cannot rebrand OpenAI's products at the architecture level. These enterprises do have their own technical accumulation and have released their own architecture-level research. But to guarantee training quality, borrowing data from stronger models is undoubtedly a shortcut.

Although using ChatGPT/GPT-4 to generate training data is an open secret in the industry, few cases were publicly disclosed until the famous "ByteDance copying homework" incident. In December 2023, The Verge reported that ByteDance had used a Microsoft OpenAI API account to generate data for training its own AI model, in violation of Microsoft's and OpenAI's terms of use. Shortly after the news broke, OpenAI was reported to have suspended ByteDance's account.

ByteDance later stated that the incident occurred when some engineers on its technical team applied GPT's API service to experimental research during early model exploration; the model was only for testing, was never planned for launch, and was never used externally. According to ByteDance, its use of the OpenAI model predated the release of the usage regulations.

Regarding this, Ye Zhiqiu of the algorithm department at a leading domestic AI enterprise told Face AI (ID: faceaibangg) that the general view in the industry is that data distillation should not count as rebranding: "Data distillation is just a means of using a sufficiently capable model to generate data for the additional training of another model in a vertical field."

Additional training (continual training) is a common way to improve model performance: by continuing to train on new data, a model adapts better to new tasks and fields. "If using data distillation for additional training counts as rebranding, then this technology should simply not be allowed," Ye Zhiqiu explained.
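As a rough illustration of continual training under these assumptions (the checkpoint, corpus file, and hyperparameters below are placeholders, not any company's actual setup), one can keep training an existing causal LM on new in-domain text with the transformers Trainer:

```python
# Continual training: start from an already-trained checkpoint and
# continue language-model training on a new vertical-domain corpus.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stands in for any existing base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# New in-domain text the base model has not seen (placeholder file name).
data = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="continued", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # resumes learning on the new data, adapting the model
```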

By 2025, the large-model development market has matured, and "counterfeit rebrands" that simply resell API calls have largely disappeared. At the application level, with the rapid iteration of the AI Agent field, calling APIs to build AI tools has become the norm: general-purpose agents like Manus have entered the market, and rebranding at the application level is now a common technical means.

In model development itself, however, the arrival of the open-source era has pushed rebranding into a new round of debate.

02 The Era of Open-Source Large Models: Everyone Can Use Them

In 2023, many manufacturers chose to open-source their model solutions so that the developer community would drive the iteration of models and applications. Meta's open-sourcing of LLaMA 2 in July 2023 marked the AI industry's entry into the open-source era, and more than a dozen domestic models were subsequently launched as fine-tuned versions of LLaMA 2. Meanwhile, secondary development on open-source architectures became a new focus of rebranding controversy.

In July 2023, Baichuan Intelligence CEO Wang Xiaochuan responded to outside suspicions that its open-source model Baichuan-7B was a rebranded LLaMA. He noted that the LLaMA 2 technical report listed about nine technological innovations, six of which had already been achieved in the model Baichuan Intelligence was developing. "Compared with LLaMA 2, our technical thinking is not simple copying and borrowing. We have our own ideas."

Just a few months later, the domestic AI circle saw an even more intense rebranding storm. In November 2023, Jia Yangqing, former vice president of technology at Alibaba and inventor of the deep-learning framework Caffe, wrote on WeChat Moments that a certain rebranded model's approach was "changing the name in the code from LLaMA to their own and then changing a few variable names." It was later confirmed that this pointed directly to the Yi-34B model from Zero One Everything, and the open-source era's rebranding controversy was out in the open.

For a while, major technical communities debated fiercely whether Zero One Everything had violated LLaMA's open-source license. Hugging Face engineer Arthur Zucker then gave his opinion: LLaMA's license mainly restricts the model weights, not the model architecture, so Zero One Everything's Yi-34B did not violate it.

In fact, adopting an open-source architecture is only the first step in building a new model. Zero One Everything explained in its description of Yi-34B's training that the process is like cooking: the architecture only determines the raw ingredients and the general steps... Most of the effort went into the training method, data ratio, data engineering, detailed parameters, and "baby-sitting" (monitoring the training process).

For the AI industry, one purpose of promoting open-source technology is to stop "reinventing the wheel". Developing a brand-new model architecture from scratch and running a full pre-training is enormously costly. Open-sourcing by leading enterprises reduces wasted resources, and new teams can quickly move into model iteration and application scenarios through rebranding. Baidu CEO Robin Li once said: "It doesn't make much sense to recreate a ChatGPT. There are great opportunities in developing applications based on large language models, but there is no need to reinvent the wheel."

From 2023 to 2024, the AI industry saw a "battle of a hundred models". Among domestic large models, about 10% are base models; the other 90% are industry and vertical models fine-tuned on open-source bases with domain-specific datasets. Rebranding has let a large number of small and medium-sized teams stand on the shoulders of giants and focus on engineering and application exploration in specific fields.

Today, sorting Hugging Face by "popularity" and taking text models as an example, DeepSeek R1/V3, Llama 3.2/3.3, Qwen2.5, and France's Mistral series all sit at the top, with download counts ranging from the hundreds of thousands to the millions. Open-sourcing has greatly accelerated the industry's evolution: Hugging Face now hosts more than 1.5 million models, the vast majority of them derivatives of open-source architectures, such as SFT fine-tuned and LoRA fine-tuned versions.
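That "sort by popularity" view can be reproduced programmatically. A small sketch with the huggingface_hub client; the task filter and result count are arbitrary choices for illustration:

```python
# List the most-downloaded text-generation models on the Hugging Face Hub.
from huggingface_hub import list_models

for m in list_models(filter="text-generation", sort="downloads",
                     direction=-1, limit=10):
    print(m.id, m.downloads)
```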

Meanwhile, lightweight fine-tuning methods such as LoRA and QLoRA keep driving down the cost of targeted fine-tuning, giving small and medium-sized teams a practical foundation for model development. A McKinsey survey in May this year found that 92% of enterprises improved business efficiency by 24%-37% by fine-tuning open-source large models.
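Why these methods are cheap is visible in a minimal LoRA setup with the peft library: only small low-rank adapter matrices are trained while the base weights stay frozen. The base model and hyperparameters below are illustrative:

```python
# Wrap a base model with LoRA adapters; only the adapters are trainable.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the adapter output
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```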

Since 2023, open-sourcing has steadily lowered the threshold for model development. Alongside a healthy ecosystem of a hundred blooming models, some opportunistic, bad-faith rebranding has also appeared.

In May 2024, a research team from Stanford University released a model called Llama3-V, claiming that a SOTA multimodal model comparable to GPT-4V could be trained for only $500 (about 3,650 yuan).

But netizens soon found that Llama3-V overlapped heavily with MiniCPM-Llama3-V 2.5 ("Mianbi Xiaogangpao"), the 8B multimodal open-source model released in the same month by the Chinese company Mianbi Intelligence. Once the evidence of rebranding and plagiarism was confirmed, the team deleted the repository and disappeared. The incident shows, on one hand, that domestic models have become rebranding targets because of their strong performance; on the other, it has once again prompted the industry to think about the compliance boundaries of rebranding in the open-source era.
