
Kai-Fu Lee responds to claims of giving up pre-training: a single training run of a large model costs three to four million US dollars, which top companies can afford | Frontline

Zhou Xinyu · 2024-10-16 20:24
ZeroOneWorld's latest model has outperformed GPT-4o on the Chatbot Arena leaderboard.

Written by Zhou Xinyu

Edited by Su Jianxun

"ZeroOneWorld will never give up pre-training."

On October 16, 2024, Kai-Fu Lee, founder and CEO of the AI unicorn ZeroOneWorld, publicly made this pledge. On the same day, ZeroOneWorld released Yi-Lightning, its latest flagship large model built on an MoE (Mixture of Experts) architecture.

This is ZeroOneWorld's first model update in five months.

Kai-Fu Lee said that Yi-Lightning was trained on just 2,000 GPUs over only one and a half months, at a cost of just over 3 million US dollars, or 1% to 2% of what Elon Musk's xAI spends.

Despite the low training cost, Yi-Lightning's performance is not compromised. On the overall Chatbot Arena leaderboard maintained by the LMSYS team at the University of California, Berkeley, Yi-Lightning is tied for 6th place with xAI's Grok-2-08-13 model and surpasses OpenAI's GPT-4o-2024-05-13.

The overall Chatbot Arena leaderboard of large language model ability, from the LMSYS team. Source: ZeroOneWorld

It is worth noting that in Chinese ability, Yi-Lightning is tied for second place with OpenAI's latest o1-mini; in mathematical ability, it is tied for third place with Gemini-1.5-Pro-002, second only to o1, which is strong in mathematics and logical reasoning.

In terms of pricing, Yi-Lightning also sets the lowest price among ZeroOneWorld's models: 0.99 yuan per million tokens.
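At that list price, per-request cost is easy to estimate with back-of-the-envelope arithmetic. A minimal sketch (the token counts below are illustrative assumptions, not figures from ZeroOneWorld):

```python
PRICE_PER_M_TOKENS_CNY = 0.99  # Yi-Lightning list price: yuan per million tokens

def cost_cny(tokens: int) -> float:
    """Cost in yuan for a given total token count at the list price."""
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS_CNY

# A hypothetical chat turn of ~2,000 tokens costs a fraction of a fen:
print(round(cost_cny(2_000), 5))          # → 0.00198
# Two billion tokens (e.g. a large batch workload):
print(round(cost_cny(2_000_000_000), 2))  # → 1980.0
```

In other words, at this price even heavy workloads cost only thousands of yuan, which is the point of the aggressive pricing.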

Pricing of ZeroOneWorld's models. Source: ZeroOneWorld's official website

Of course, a leaderboard ranking does not necessarily reflect a model's ability in real task scenarios. At the press conference, ZeroOneWorld focused on practicality and let Yi-Lightning show its strength.

For example, compared with Yi-Large, released in May 2024, Yi-Lightning's time to first packet (the time from receiving a request to the moment the system starts outputting a response) is halved, and its maximum generation speed is nearly four times higher.

In a concrete translation scenario, Yi-Lightning translated faster than the latest flagship models from Doubao, DeepSeek, and Tongyi Qianwen, and its translations were more faithful, expressive, and elegant.

The performance of the four models on the same translation task. Source: ZeroOneWorld

At the press conference, Kai-Fu Lee also revealed the training strategy behind Yi-Lightning:

  • Unique hybrid attention mechanism: improves performance while reducing inference cost when processing long-sequence data;

  • Dynamic Top-P routing mechanism: automatically selects the most suitable combination of expert networks according to task difficulty;

  • Multi-stage training: lets the model absorb different knowledge at different stages, makes it easier for the data team to adjust the data mix, and keeps training fast and stable at each stage.
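ZeroOneWorld has not published implementation details, but the dynamic Top-P routing idea can be sketched: instead of always activating a fixed top-k experts, keep the smallest set of experts whose cumulative routing probability reaches a threshold p, so harder tokens (flatter router distributions) get more experts. A minimal illustrative sketch, in which the function name, threshold value, and logits are all assumptions:

```python
import numpy as np

def top_p_route(router_logits, p=0.5):
    """Nucleus-style (Top-P) expert selection for one token:
    pick the smallest expert set whose cumulative routing
    probability reaches p, then renormalize their weights."""
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # experts, most likely first
    csum = np.cumsum(probs[order])
    k = int(np.searchsorted(csum, p)) + 1  # smallest k with csum[k-1] >= p
    chosen = order[:k]
    weights = probs[chosen] / probs[chosen].sum()
    return chosen, weights

# A confident router (peaked logits) activates few experts;
# an uncertain router (flat logits) activates more.
peaked = np.array([4.0, 0.5, 0.2, 0.1])
flat   = np.array([1.0, 0.9, 0.8, 0.7])
print(len(top_p_route(peaked)[0]))  # → 1
print(len(top_p_route(flat)[0]))    # → 2
```

The appeal of such a scheme is that compute per token becomes adaptive: easy tokens route cheaply while difficult ones recruit more capacity, rather than every token paying a fixed expert budget.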

At its May 2024 press conference, ZeroOneWorld released "Wanzhi", a consumer (C-end) productivity product for the Chinese market. Five months later, its enterprise (B-end) commercialization roadmap has also made new progress: AI 2.0 digital humans aimed at scenarios such as retail and e-commerce.

Behind the AI 2.0 digital humans is multi-modal collaborative training across an e-commerce sales-talk large model, a persona large model, and a live-streaming voice large model. The digital humans have now been connected to Yi-Lightning: users only need to specify the goods for sale and the gender and tone of the voice, and the corresponding digital human is generated.

ZeroOneWorld's AI 2.0 digital humans before and after connecting to Yi-Lightning. Source: ZeroOneWorld

Today, large model development has entered the deep-water zone of technical exploration. Even after pledging to "never give up pre-training", Kai-Fu Lee admitted: "But not every company can do this, and the cost of doing it is relatively high. In the future, there may be fewer and fewer large model companies doing pre-training."

However, Kai-Fu Lee remains optimistic about the current six large model unicorns:

"As far as I know, all six of these companies have raised sufficient funding. For our pre-training production runs, the training cost is three to four million US dollars per run, an amount that top companies can afford. I believe that as long as China's six large model companies have good enough talent and the determination to do pre-training, financing and chips will not be a problem."
