
With a dual-network architecture separating data from inference, a "dual-brain" large-model all-in-one machine breaks through the compute bottleneck for deployment | Early-stage project

Huang Nan | 2024-11-29 14:30
Large models are moving from the era of Scaling Law to the era of "Real-time Learning".

Author|Huang Nan

Editor|Yuan Silai

In the wave of large-model technology, the industry has followed the Scaling Law as a first principle. That held until the tech media The Information reported exclusively that the training results of OpenAI's next-generation flagship model, Orion, may fall far short of expectations, with only a negligible performance improvement over GPT-4. This has prompted practitioners to think hard about the development path of large models: is the Scaling Law the only direction?

Large models built on the Scaling Law have long faced significant bottlenecks in deployment. To improve model capability, vendors keep expanding pre-training data, training compute, and parameter counts. This is not only costly; homogeneous algorithms also lead to homogeneous data scales and training compute, so output capabilities ultimately converge. A further challenge is whether a large model can effectively learn customer data and become a domain expert.

The drawbacks of centralized brute-force training that relies solely on the Scaling Law are now apparent. A large model's "intelligence" is not determined by parameter scale alone; what enterprise customers focus on is how the model performs in real scenarios. To break down the wall between models and deployed applications, "Chuan Shen Internet of Things", a company Hardcore recently spoke with, argues that the centralized pre-training paradigm deserves re-examination and that a real-time learning-and-training paradigm is worth exploring.

He Enpei, chairman of "Chuan Shen Internet of Things", pointed out that at the same parameter count, the more advanced a model's algorithm and architecture, the less training compute and training data it requires. This does not compromise the model's capabilities; on some metrics it can even surpass larger models with conventional architectures. "By comparison, a small-parameter model with an efficient algorithm and architecture is better suited to commercial deployment, and it can still meet the needs of general scenarios."

He Enpei, founder of Chuan Shen, delivering a keynote titled "Exploration and Practice of a Large Model Based on a Dual-Network Architecture with Data-Inference Separation"

Based on this concept, "Chuan Shen Internet of Things" built its released Renduo Large Model on a dual-network architecture that separates the inference network from the data-learning network. The model is independently developed across the full technology stack and uses no open-source code or frameworks.

The customer-data learning network works like the human left brain: it focuses on the dynamic management and iterative training of data, continuously feeding knowledge into the model. The inference network works like the right brain: a base network pre-trained on large volumes of data, with strong reasoning and generalization capabilities.

This collaborative dual-network design can effectively cut the compute cost of training and avoid problems such as base-model degradation and weakened generalization caused by fine-tuning. At the same time, the data-learning network can absorb an enterprise's historical data and learn, in real time, new data generated by ongoing business operations. The two networks then work together to produce the results customers need.
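The division of labor described above can be sketched conceptually. The following is a minimal, hypothetical Python illustration, not Chuan Shen's actual implementation (whose internals are not public): a frozen pre-trained "inference network" supplies general capability, while a lightweight "data-learning network" is the only component updated when new enterprise data arrives, so the base model's weights, and hence its generalization, are never touched.

```python
# Conceptual sketch only: all class and method names are illustrative,
# not Chuan Shen's real API.

class InferenceNet:
    """Pre-trained 'right brain': its weights are never modified."""
    def reason(self, query):
        # Stands in for general-purpose reasoning by the frozen base model.
        return f"general answer to '{query}'"

class DataLearningNet:
    """'Left brain': continuously absorbs customer data into its own state."""
    def __init__(self):
        self.knowledge = {}

    def learn(self, records):
        # Real-time update: only this network changes, so updates
        # cannot degrade the frozen base model.
        self.knowledge.update(records)

    def lookup(self, query):
        return self.knowledge.get(query)

class DualBrainModel:
    """The two networks collaborate to answer a query."""
    def __init__(self):
        self.right = InferenceNet()
        self.left = DataLearningNet()

    def answer(self, query):
        # Prefer enterprise-specific knowledge; fall back to general reasoning.
        domain_fact = self.left.lookup(query)
        if domain_fact is not None:
            return f"{domain_fact} (grounded by learned data)"
        return self.right.reason(query)

model = DualBrainModel()
print(model.answer("warranty policy"))            # falls back to the frozen base
model.left.learn({"warranty policy": "2 years"})  # real-time data update
print(model.answer("warranty policy"))            # now uses enterprise knowledge
```

The key design point mirrored here is that `learn` touches only the left-brain state, which is why, per the article, fine-tuning-style degradation of the base network is avoided.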

Tests show that the Renduo Large Model, built on data-inference separation, breaks through the limits of conventional large-model architectures: context input length is unconstrained, and hundreds of millions of user data records can be compressed into the neural network with deep knowledge understanding, coming very close to a "real-time" data-learning mode. Even a very small data update can be uploaded and compressed quickly, iterating the model into a customized large model for the enterprise.

The Renduo Large Model comes in two versions, 2.1B and 9B parameters. On compute cost, its training and inference compute costs are 10%-20% and 25%-50%, respectively, of an equivalently sized conventional large model.

Hardcore has learned that "Chuan Shen Internet of Things" has now applied the dual-network, data-inference-separated large model to the Renduo "Dual-Brain" Large Model all-in-one machine, which is about to launch. Built on this dual-brain mode, the machine addresses pain points such as off-site training of customer data, limited effectiveness of vector retrieval, and high talent investment, enabling local real-time learning of updated data so the system quickly becomes an "enterprise knowledge expert".

On the security and privatization of customer data, the Renduo "Dual-Brain" all-in-one machine can be deployed and trained locally, with no need to upload data to the public cloud, ensuring data privacy and security. Its fully in-house technology stack and strong cost-performance also go some way toward addressing the pain points customers face when adopting large models: high hardware investment, high energy consumption, technical security risks, and software vulnerabilities.