vivo puts a 3B on-device large model into phones and releases an AI agent built for phones | Frontline
Author | Qiu Xiaofen
Editor | Su Jianxun
On October 10, the 2024 vivo Developer Conference (VDC) was held at the Shenzhen International Convention and Exhibition Center, where vivo showcased its latest progress in large models.
AI has taken up the most stage time at vivo's developer conferences over the past two years. Zhou Wei, president of the vivo AI Global Research Institute, told media including 36Kr that vivo has invested more than 23 billion yuan in AI over the past six years.
At last year's developer conference, vivo released the Lanxin Large Model Matrix, made up of five self-developed large language models across three parameter scales: the 1-billion, 10-billion, and 100-billion levels. If last year vivo aimed for a "big and comprehensive" lineup, this year, after a year of refinement, its AI strategy focuses more on putting AI into practice and tying it to real usage scenarios.
Lanxin Large Model Matrix
The on-device large model, a natural fit for the mobile phone, was the focus of the AI portion of this developer conference. vivo released the Lanxin on-device large model with 3 billion parameters (hereinafter "Lanxin 3B"). Until now, phone makers had mostly been competing with models at the 6B and 7B scale.
Zhou Wei said the industry used to be obsessed with pushing the upper limit of model size, but chasing ever larger parameter counts on the device is pointless: within a phone's limited resources, a bigger model consumes more memory and power without delivering a meaningful benefit. The vivo team found that 3B is the parameter count best suited to on-device use on phones.
According to vivo, "Lanxin 3B" is nearly on par with the industry's 7B to 9B models in capabilities such as dialogue writing, summarization, and information extraction. vivo also provided a set of comparison figures: relative to Lanxin 7B, Lanxin 3B delivers a 300% performance improvement and a 46% reduction in power consumption, and it occupies only 1.4GB of memory.
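As a rough sanity check on the 1.4GB figure, the short sketch below works out how many bits per parameter that footprint implies. It rests on assumptions vivo did not state: that the model has exactly 3 billion parameters and that the entire 1.4GB is model weights (in practice, activations and caches would also use memory). The function name `bits_per_parameter` is purely illustrative.

```python
# Back-of-the-envelope estimate only; not vivo's published methodology.
# Assumptions: exactly 3e9 parameters, and the whole footprint is weights.

def bits_per_parameter(memory_bytes: float, n_params: float) -> float:
    """Average number of bits available per parameter for a given footprint."""
    return memory_bytes * 8 / n_params

N_PARAMS = 3e9   # assumed parameter count for "Lanxin 3B"
GB = 1e9         # decimal gigabyte
GIB = 2**30      # binary gibibyte

decimal_estimate = bits_per_parameter(1.4 * GB, N_PARAMS)
binary_estimate = bits_per_parameter(1.4 * GIB, N_PARAMS)

print(f"1.4 GB  -> {decimal_estimate:.2f} bits per parameter")
print(f"1.4 GiB -> {binary_estimate:.2f} bits per parameter")
```

Under these assumptions, both readings of "1.4GB" come out to roughly 4 bits per parameter, which would be consistent with low-bit weight quantization, although vivo has not disclosed the precision it uses.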
The Lanxin Large Model Matrix is not limited to the on-device model, however. vivo also announced upgrades to its other large models, covering language, speech, image, and multimodal.
For example, vivo's new large language model builds on its 100-billion-level cloud model. This time, vivo focused on optimizing intent understanding and task planning, and says overall capability has improved by 30% compared with last year.
The new Lanxin Speech Large Model is better at accurately understanding natural-language semantics and at simulating human voices.
The Lanxin Image Large Model has focused this year on strengthening Oriental aesthetics and Chinese characteristics.
The Lanxin Multimodal Large Model has upgraded its visual perception and understanding capabilities.
Zhou Wei said that the cost of calling the cloud-based large model from a phone has now dropped to "less than one cent per call".
The cost reduction comes not only from lower cloud costs but also from vivo's continued push to move workloads onto the device at scale. "This year, we have moved more than ten, even dozens, of functions onto the device. In the future, chatting, recognition, decision-making, and execution may all run on the device."
To date, vivo's AI capabilities cover more than 60 countries and regions, serve more than 500 million phone users, and have generated more than 3 trillion large-model tokens.
These model upgrades lay a solid technical foundation, but further productization is needed before users can feel the difference. At this developer conference, vivo presented "PhoneGPT", a phone AI agent built on Lanxin large model technology.
PhoneGPT
Judging from the demonstration, this agent reshapes how users interact with their phones. Built on vivo's voice assistant "Lanxin Xiaov", it can recognize and operate on-screen interfaces and take over the audio channel to hold conversations autonomously, completing tasks the user assigns, such as booking a restaurant or ordering coffee.
Bringing AI to the phone depends on a capable operating system. vivo is also exploring deep integration of AI and the OS, with the Lanxin Large Model as the technical foundation. At this developer conference, vivo launched its new-generation operating system, OriginOS 5 ("Original System 5").
Zhou Wei said that rebuilding the operating system means rebuilding the entire interaction experience and the digital service experience.
On the interaction side, OriginOS 5 lets users copy with a single press and drag with a single press to handle multitasking. The system also supports new voices, such as the dialects of the Miao and Zhuang ethnic groups.
On the digital service side, vivo has launched the new Xiaov Circle Search function, powered by Lanxin Multimodal Large Model technology.
According to vivo, the function builds on existing text search by combining image recognition with a circle-to-select gesture to pinpoint exactly what the user wants to look up, offering a more convenient "one-circle search".
OriginOS 5 also upgrades the "Atom Island" function. Beyond notifications, it uses the intent recognition of "Lanxin Xiaov" to analyze what the user currently needs and proactively offer follow-up services.