
RenDu Dual-Brain Large Model: a first-of-its-kind technical route in China that breaks through the boundaries of AI technology

36Kr Industry Innovation · 2024-11-19 10:15
Exploring the separation of data and inference to cut the cost of large models and improve their efficiency.

Zen Buddhism says: "See the great in the small. One flower is a world, and one leaf is a Bodhi." It teaches us that even in something as small as a leaf, we can perceive the grand wisdom of Bodhi.

In the AI era, enterprises undergoing intelligent transformation often face high decision-making costs, large investments, and unpredictable results. Against this backdrop, they are eager to break away from the traditional logic of AI training and inference and to practice the idea of "seeing the big from the small" in the era of large models: much like grasping the true meaning of Bodhi from a single leaf, they want to manage the path to intelligence more concisely and efficiently.

As early as 2021, during the rapid development of AI technology, some voices pointed out that the Scaling Law might have its limits. However, the continuous iteration and dramatic technological leaps from GPT-2 to GPT-3 and then to GPT-3.5 made the effectiveness of the Scaling Law widely recognized. Yet as GPT-4 appears to approach the limit of the data humanity has available for training, exploration along this line seems to have hit a bottleneck.

Against this backdrop, the industry has begun to discuss in depth what new strategies and directions are needed on the road to AGI beyond relying on the Scaling Law. The goal of large models should be to pursue greater "wisdom", not merely a larger parameter count. Customers expect a large model to work in their actual scenarios like an expert who understands their business and solves practical problems, not like an external consultant detached from the business, and certainly not on the assumption that more parameters are always better. For a large model to truly act as an in-house expert serving customers, it cannot stay at the level of surface interaction; it must deeply understand and mine the customer's actual data. The centralized pre-training paradigm therefore needs to be re-examined, and a real-time learning and training paradigm is worth exploring.

1. The future of large models cannot be bet entirely on the Scaling Law

Many models are following OpenAI's path, blindly enlarging the model's "brain capacity" (that is, its parameter count) in the belief that this will make the model smarter. However, a growing number of recent papers show that a large model's intelligence is not directly proportional to its parameter count; in some cases, intelligence may even decline as the parameter count grows.

Recently, Transn's RenDu large model has taken a different approach, using a dual-network architecture to achieve data-inference separation: the reasoning network is decoupled from the data-learning network. The design can be understood as two collaborating, linked brains. One is the customer-data learning network, the "data brain", which focuses on the dynamic management and iterative training of data and continuously injects knowledge into the model. The other is the reasoning network, the "reasoning brain", a base network pre-trained on large amounts of data with good reasoning and generalization capabilities. The two networks cooperate by sharing the embedding layer and intermediate representation layers, forming an efficient division of labor similar to a "main brain" and an "auxiliary brain" that supports both independent training and joint inference.
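The article does not disclose implementation details, but the description above suggests a layout roughly along the following lines. The PyTorch sketch below is a minimal, hypothetical illustration assuming a shared embedding layer, a pre-trained "reasoning brain", a smaller trainable "data brain", and a simple fusion of their intermediate representations; all class names, layer sizes, and the fusion step are illustrative assumptions, not Transn's actual RenDu design.

```python
# Hypothetical sketch of a "dual-brain" layout with a shared embedding layer.
# Sizes and the fusion step are illustrative assumptions only.
import torch
import torch.nn as nn

class DualBrainModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        # Shared embedding layer used by both "brains".
        self.embedding = nn.Embedding(vocab_size, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        # Reasoning brain: pre-trained base network, kept fixed in deployment.
        self.reasoning_brain = nn.TransformerEncoder(make_layer(), n_layers)
        # Data brain: smaller network that keeps learning customer data.
        self.data_brain = nn.TransformerEncoder(make_layer(), n_layers // 2)
        # Fuse the two intermediate representations before the output head.
        self.fuse = nn.Linear(2 * d_model, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        x = self.embedding(token_ids)          # shared representation
        h_reason = self.reasoning_brain(x)     # general reasoning features
        h_data = self.data_brain(x)            # customer-knowledge features
        h = self.fuse(torch.cat([h_reason, h_data], dim=-1))
        return self.lm_head(h)                 # joint inference over both brains
```

In this toy layout, "independent training" would mean updating only one brain's parameters at a time, while "joint inference" is simply the forward pass that combines both brains' outputs.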

With this innovative model, Transn has become the first artificial intelligence company in the global large model field to realize the data-inference separation technical route, a significant breakthrough for Chinese artificial intelligence in the industry.


(1) Breaking the context-length limit to achieve real-time data learning

The dual-network, data-inference-separated architecture breaks through the limitations of the conventional integrated architecture in which data and inference are mixed in a single network. Once the reasoning brain has matured, the data brain can continuously learn from incoming data without affecting the reasoning brain's capabilities. As a result, the context input length is no longer a hard limit for the dual-network architecture: data on the order of 100 million characters or more can be compressed into the neural network to achieve deep knowledge understanding.

The RenDu model's architecture does not need to store knowledge in a huge number of parameters. Instead, it relies on the data brain to learn data in real time within the customer's own scenario. This significantly reduces the parameter scale and, in turn, the hardware costs of training and inference.

The architecture can keep learning as the customer's business generates new data, continuously refining the data it has compressed into the network. In the data-inference-separated mode, updating the data network has minimal impact on the reasoning network, adapts to a wide range of scenarios, handles data flexibly, and can shorten training time to the minute level.
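As a rough illustration of this "update the data brain without touching the reasoning brain" idea, the hypothetical sketch below freezes the reasoning-brain parameters and applies gradient updates only to the remaining components as new customer tokens arrive. It reuses the DualBrainModel sketch above; which components stay trainable, the hyperparameters, and the random token batches are all placeholder assumptions, not details from the article.

```python
# Assumes the DualBrainModel sketch defined earlier is in scope.
import torch

model = DualBrainModel()

# Freeze the pre-trained reasoning brain; keeping the embedding, data brain,
# fusion layer, and output head trainable is an assumption for illustration.
for p in model.reasoning_brain.parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

def absorb_new_data(token_batches, steps=8):
    """Compress a stream of new customer tokens into the data brain only."""
    model.train()
    for step in range(steps):
        tokens = token_batches[step % len(token_batches)]   # (batch, seq_len)
        inputs, targets = tokens[:, :-1], tokens[:, 1:]     # next-token targets
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                    # reasoning brain untouched

# Example: random token batches standing in for freshly generated business data.
fake_batches = [torch.randint(0, 32000, (2, 64)) for _ in range(4)]
absorb_new_data(fake_batches)
```

Because the optimizer only holds the unfrozen parameters, each incremental update leaves the pre-trained reasoning network exactly as it was, which is the property the article attributes to the dual-brain mode.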

(2) Data learning and training can be completed locally at the customer's site, ensuring data security

The dual-network large model architecture reduces the computing power and energy costs of training and inference, and it also avoids the degradation of base-model capability and weakened generalization that occur when an integrated large model is fine-tuned on customer data. Moreover, the data network learns customer data without requiring additional computing power or dedicated specialists, and training can take place on the customer's premises, covering both the enterprise's historical data and newly generated data, which removes the enterprise's concerns about data security.

RenDu's data-inference-separated dual-brain mode addresses the three major problems of customizing large models for customers: customer data having to leave the premises, poor results from vector-retrieval approaches, and high investment in specialized talent. It realizes local real-time learning, allowing customer data to be quickly transformed into an "expert" that serves the customer. Importantly, because training on customer data happens locally and nothing is transmitted to the public cloud, the privacy and security of the data are ensured.

2. The Scaling Law is not omnipotent: performance per parameter is what matters, and domestic enterprises need to find an alternative path

In the Chinese market, large language models have not fully borne out the Scaling Law. In the AGI field, the Scaling Law involves three elements, computing power, algorithms, and data, and following it requires enormous financial backing. For a while, some large international companies even claimed that in the era of big data and massive computing power, algorithms were worth little.

He Enpei, the founder of Transn, believes that the large model route that relies solely on the Scaling Law has hit a bottleneck, and that a genuine breakthrough must come from algorithms and architectures. In fact, under different algorithms and frameworks, model performance is not always proportional to parameter scale: small-parameter models with innovative architectures and efficient algorithm designs can deliver strong performance and even surpass conventional large-parameter models on specific metrics.

Currently, the data-inference-separated dual-network architecture has been applied in the RenDu "dual-brain" large model all-in-one machine, whose built-in RenDu models come in two parameter sizes, 9B and 2.1B. In a number of domestic and international evaluations, the 9B model stands out against models with hundreds of billions or even trillions of parameters, achieving leading performance with far fewer parameters.

In fact, the high cost of investment has already made both large model developers and adopting enterprises hesitate, and the ideal solution is clearly for enterprises to put large models into use at the lowest possible cost. Compared with large-parameter models, small-parameter models consume less computing power and fewer resources, are better suited to commercial deployment, and meet the requirements of general application scenarios, making them a convenient way to verify the feasibility of deploying large models. Transn will therefore increase R&D investment in improving large model capabilities through algorithms and architectures, and continue to iterate.

He Enpei firmly believes that "winning with algorithms" is a technical path with Chinese characteristics, one that plays to the ingenuity of Chinese engineers and is particularly important in the AI era represented by large models. There are believed to be many teams like Transn in China working quietly and leading intelligent innovation with distinctive ideas. Although they have not yet stepped into the spotlight, they will eventually become an important force in the development of Chinese AI technology.