
Breaking from the Transformer route, Caiyun Technology launches general-purpose large model Yunjin Tianzhang | Frontline

Wang Fangyu | 2024-11-14 20:03
The first general-purpose large model built on the DCFormer architecture.

Written by Wang Fangyu

Edited by Su Jianxun

At present, the underlying technology of the vast majority of generative AI products stems from the Transformer architecture that Google proposed in 2017. Chinese AI startup Caiyun Technology, however, has taken a different path, developing a new model architecture called DCFormer and launching a product built on it.

On November 13, Caiyun Technology released Yunjin Tianzhang, the first general-purpose large model based on the DCFormer architecture, at its Beijing headquarters.

According to CEO Yuan Xingyuan, Yunjin Tianzhang can endow characters in a fictional world setting with basic abilities such as programming and mathematics. It can rapidly expand or condense long passages of text and make sweeping changes to an article's style, while retaining the basic question-answering, mathematics, and programming abilities of other models.

Beyond the application scenarios it excels at, the biggest difference between Yunjin Tianzhang and conventional large models lies in the underlying architecture. According to the company, by improving the attention matrix, DCFormer converts compute into intelligence up to 1.7 to 2 times more efficiently than Transformer on the same training data.
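The article does not spell out the mechanism, but the DCFormer paper describes dynamically composable multi-head attention, in which the attention maps of the individual heads are mixed across the head dimension rather than kept independent. Below is a minimal, illustrative PyTorch sketch of that general idea, not Caiyun's implementation (their actual code is open-sourced on GitHub): the class name, the identity-initialized static mixing matrix, the low-rank dynamic branch, and all hyperparameters here are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComposedAttentionSketch(nn.Module):
    """Toy sketch of cross-head composition of attention scores.

    Illustrative only: each output head's attention map is a learned
    combination of all heads' maps (static term) plus a small
    input-dependent combination (dynamic term).
    """

    def __init__(self, d_model: int, n_heads: int, rank: int = 2):
        super().__init__()
        assert d_model % n_heads == 0
        self.h = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Static cross-head mixing (H x H); identity init makes the
        # composition start out as a no-op.
        self.static_mix = nn.Parameter(torch.eye(n_heads))
        # Low-rank projections that turn each query vector into dynamic,
        # input-dependent mixing weights over the heads.
        self.dyn_down = nn.Linear(self.d_head, rank, bias=False)
        self.dyn_up = nn.Linear(rank, n_heads, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, time, d_head).
        q = q.view(b, t, self.h, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.h, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.h, self.d_head).transpose(1, 2)

        # Per-head attention scores: (b, h, t, t).
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5

        # Static composition: mix score maps across heads with a
        # learned H x H matrix.
        composed = torch.einsum('ij,bjqk->biqk', self.static_mix, scores)

        # Dynamic composition: mixing weights that depend on the query
        # at each position, so the head combination varies with input.
        dyn = self.dyn_up(self.dyn_down(q))   # (b, h_in, t, h_out)
        dyn = dyn.permute(0, 3, 2, 1)         # (b, h_out, t, h_in)
        composed = composed + torch.einsum('biqj,bjqk->biqk', dyn, scores)

        attn = F.softmax(composed, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(out)


x = torch.randn(2, 16, 64)                    # (batch, seq_len, d_model)
y = ComposedAttentionSketch(d_model=64, n_heads=8)(x)
print(y.shape)                                # torch.Size([2, 16, 64])
```

The intuition behind the efficiency claim is that fixed heads waste capacity on redundant patterns; letting heads recombine per input lets the same parameter and compute budget express more distinct attention behaviors.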

Furthermore, DCFormer improves on Transformer rather than replacing it, so it can be layered onto existing models instead of being mutually exclusive with them. In principle, any large model built on the Transformer architecture could adopt DCFormer to reduce costs.

The paper describing Caiyun Technology's DCFormer architecture was officially published in May this year at the 41st International Conference on Machine Learning (ICML 2024), one of the top three conferences in international machine learning. In addition, DCFormer's model code, weights, and training dataset have been fully open-sourced on GitHub.

Why take a different path with the DCFormer architecture? Yuan Xingyuan told 36Kr that AI's huge energy demand during operation has become an industry consensus, and that improving the underlying model architecture to raise efficiency is the best strategy for meeting that challenge. Greater model efficiency can also cut the cost of upgrading and iterating AI and accelerate the arrival of the AI era.

Although the DCFormer architecture can reduce the cost of training and inference for large models, Caiyun Technology remains relatively cautious in its commercial exploration and keeps a close eye on return on investment.

At present, Caiyun Technology has three consumer-facing AI products: Caiyun Weather, Caiyun Xiaomeng, and Caiyun Xiaoyi. It has surpassed $10 million in ARR (annual recurring revenue) in the global market and is one of the few AI companies in China to achieve profitability. Its latest financing is a B2 round invested personally by former Kuaishou CEO Su Hua, at a pre-money valuation of $120 million.

Yuan Xingyuan told 36Kr that Caiyun Technology's research and application development of the DCFormer architecture mainly serves its own business. The new Caiyun Xiaomeng V3.5, built on DCFormer, can currently generate several hundred to one thousand words at a time while maintaining logical coherence and rich detail. In the future, it is expected to reach 2,000 to 5,000 words per generation, pursuing both a stronger level of intelligence and higher user engagement.