
vivo: How many times must a company overturn itself to make an AI phone?

36Kr Brand, 2024-10-11 18:26
On AI phones, vivo is paving the way.

Although vivo has ranked first in China's domestic smartphone market share for three consecutive years, it still seems somewhat out of step with the rest of the industry.

For instance, while others decided that the phone business had lost its appeal and turned to building industrial ecosystems or eyeing the automotive industry, vivo stayed committed to phones: from imaging to the operating system, and from upgrading AI capabilities to defining chip requirements from the product side.

Only in October of this year did vivo suddenly begin telling a "new story" about large AI models, the operating system, and AI phones.

How Do You Define an AI Phone?

This story is not a grand narrative. On the surface, it is yet another case of an established company embracing and exploring a new technology; in essence, it is an ideal case study of how an industry leader overturns itself when there is no "answer key" for the road ahead.

vivo set up a dedicated internal AI research team as early as 2017, and in 2023 it launched both "Blue Heart Little V", an assistant built on large model technology, and the "Blue Heart Large Model", its self-developed family of general-purpose large models. But is an AI assistant on the phone, plus a feature that erases passers-by from photos in the gallery, really all there is to an AI phone?

Such a definition is rather opportunistic.

Answering this question took vivo through a long and painful period.

Last year, led by vivo Vice President Zhou Wei, the company spent a full 11 months working out what exactly an AI phone is.

The difficulties came unexpectedly. vivo's "user-oriented" research, which had always served it well, was of little help with something as new as the AI phone. Ask someone 200 years ago how to make a carriage go faster, and he would only tell you to buy a more expensive horse. The same story is playing out again in the era of large models.

When there is no road ahead, build one. In the last six months of its quiet period, vivo decided to "build, from scratch, a car no one had ever seen."

At the vivo Developer Conference on October 10, vivo officially announced its new AI strategy, Blue Heart Intelligence: personal intelligence built by deeply integrating large model technology with the phone operating system. The goal is to keep making interaction more natural and intuitive for users and to deliver a smarter, warmer experience.

But how does this combination of AI and operating system differ from the smart gallery and smart memo features of the past? vivo breaks the deep integration into three steps:

The first step is to rebuild from the ground up, starting with interaction.

Inside vivo, every new product launch is preceded by a review and analysis of the industry's underlying logic. Zhou Wei once gave the team a question to think over: why were touchscreens able to replace keypad phones like Nokia's?

The most basic reason is that touch interaction is more convenient than a panel of buttons: swiping left and right is more flexible, and opens up more possibilities, than pressing keys. Later, touchscreen phones moved from being operated with fingernails to being operated with fingertips. The essential change in each case was to bring operation and interaction closer to people's natural habits and intuition.

Following this line of thought further: what is more efficient than touch? Voice, without question. Once integrated into the operating system, this seemingly overdiscussed technology can bring a major breakthrough in the product experience.

Unlike touch gestures, however, voice has no single global standard: different countries and regions speak different languages. For this reason, vivo's voice large model has been specifically adapted to Cantonese, Sichuanese, Northeastern Mandarin, the Henan dialect, and even the Miao language, aiming for more natural and emotionally expressive human-computer conversation.

With the interaction path shortened, vivo's second move targets the service experience between people and the digital world: AI phones need to shift from passive to proactive. Put simply, as the "digital companion" that knows its user best, the phone should not merely respond to requests.

This shift from passive to proactive can be broken down into three directions. First, use AI to comprehensively upgrade the phone's basic functions, rebuilding 15 essential functions such as the input method, phone calls, notes, and scanning, and building a platform of shared AI capabilities. Second, build a framework and platform for connecting services, including an agent platform that combines development and distribution for developers, and a complete intent framework for lightweight, atomic services, so that people can find services and services can also proactively match people's needs. Third, build a personal intelligence system framework that turns the phone into a dedicated personal assistant.

For example, "Little V Suggestions" can sit on the home screen as a persistent widget, offering proactive, thoughtful services around the clock. When the user is on a business trip, Little V can anticipate what is needed and offer the most appropriate suggestion at each stage: a ride-hailing suggestion when heading out, a boarding-gate reminder at the airport, a city guide on arrival, or a check-in reminder before reaching the hotel.

Another example is "Little V Memory", which not only understands what the user has in mind but also quietly remembers the time spent with the user. Little V also organizes the articles and videos the user saves day to day, and on some flagship models it uses on-device analysis to present the saved content in a more logical way.

Beyond that, must the phone connect only to the digital world? vivo's answer is no: with the help of large models, the phone should also be able to reshape the connection between people and the physical world.

For example, the Blue Heart upgrade of "vivo See", paired with wireless earphones and a compatible camera, can help blind people by telling them where the shampoo, conditioner, and body wash are and how to board a bus, and even what is on display in a museum. Visually impaired people can thus better perceive the world in front of them and explore its beauty.

The Dragon-Slaying Sword of Technology and the Old Battlefield of Application

Once vivo had settled what it would do and what it could do, the story splits into two tracks: forging the dragon-slaying sword of technology, and finding the battlefield on which to wield it.

On the technology side, vivo made four key AI announcements at the 2024 vivo Developer Conference:

Key point 1, [Upgraded language large model]: vivo officially launched the cloud capability of its hundred-billion-parameter-scale Blue Heart Large Model, focusing on optimizing intent understanding and dispatch as well as task planning. Overall capability is up 30% compared with last year, and the model continues to rank in the domestic top tier on the CMMLU and SuperCLUE leaderboards.

Key point 2, [Released the Blue Heart On-device Large Model 3B]: To address the industry's "impossible triangle" of small model size, strong capability, and low power consumption, vivo launched the new 3-billion-parameter Blue Heart On-device Large Model 3B. In capabilities such as conversational writing, summarization, and information extraction, it can outperform the industry's 7B-9B models. Compared with Blue Heart 7B, peak performance is up 300%, power consumption in balanced mode is down 46%, and memory use is down 63%. Peak output speed reaches 80 words/s, system current draw is only 450 mA, and the memory footprint is only 1.4 GB.

Key point 3, [Released the Blue Heart Voice Large Model]: vivo's self-developed Blue Heart Voice Large Model can accurately understand natural language, perceive emotion, and simulate human voices. It supports simultaneous interpretation in Chinese, English, Japanese, Korean, and Thai, as well as translation among more than 15 languages.

Key point 4, [Released the Blue Heart Image Large Model and the Blue Heart Multimodal Large Model]: vivo has upgraded the Blue Heart Image Large Model for Chinese characteristics and Eastern aesthetics. This year, the Blue Heart Multimodal Large Model gained improved context understanding and memory, enabling a deeper understanding of the phone screen and smoother, more natural real-time conversation over a video stream.

With the dragon-slaying sword of technology in hand, where should vivo use it?

Rather than redefining the phone, vivo prefers to describe this as optimizing existing functions. Years of product experience have taught vivo that, although we may be thoroughly used to today's phone functions such as calls, text messages, photo editing, and the calculator, they are still a long way from perfect.

This is a new battle on an old battlefield, and the next key task is to pin down exactly which familiar areas still have large room for improvement.

Zhou Wei recalls that every year he sets aside three months to go into seclusion. "We have more than 130 tracks, corresponding to more than 130 technical directors and senior directors. I spend four and a half days of every five-day week in meetings with them, asking: how are you going to do communications, and what is the mission of communications? What are the goals, one, two, three, that you are going to accomplish? Every track has gone through this process."

The signature result of optimizing existing functions is search. The newly launched agent Little V Circle Search can be summoned by long-pressing the navigation bar, and users can drag and drop images, files, and text directly into it for processing. In addition to the existing voice and text input, users can circle content with a fingertip, the most natural gesture, to send what they want to know to Little V, and use Little V Search to quickly find the local documents or services they need. Tapping an item in the preview list opens it directly.

The representative system-level optimization is the updated memory management in OriginOS 5. As apps such as WeChat and Honor of Kings take up ever more memory, insufficient memory and lag have become top concerns for many users. The industry's usual answer has been to add more hardware. Going beyond that, vivo pioneered the Memory Ledger mechanism at the software level on Android and iterated on its Unfair Scheduling Mechanism 3.0 and Virtual Graphics Card 2.0. By optimizing storage, compute efficiency, and display together, it makes "heavy loads feel light and keeps things smooth over long periods", handling heavy-load scenarios such as large mobile games more easily and delivering a very smooth experience in both look and feel.

Having decided to optimize through subtraction, the next questions are how to subtract and who will do it.

Using an Engineering Approach to Build Large Models

In doing this subtraction, vivo has compared itself to the general contractor on a huge project, whose most important jobs are coordination and building prototypes.

The focus of prototyping is agents. Take the smart home hub, one of the most commonly used: in the past, controlling smart home devices from a phone often meant cumbersome steps and frequent incompatibility between device models. To solve this, vivo trained an agent that can operate more than 4,000 air conditioner models. Compared with earlier approaches to the same smart home control, the new agent generalizes far better across air conditioner models.

With this as a template, vivo's next task is to build a large ecosystem and negotiate partnerships one by one. vivo's idea is to break out of the narrow mindset in which each major app builds agents only around itself; instead, vivo builds a more general agent square from the phone side.

In between, vivo puts its main effort into interface standards, integration, and user interaction paradigms. Put more plainly: at the phone operating system level, once the user's intent has been recognized, a middle platform matches it to the appropriate agent in the agent square and links that agent's capabilities to the user's needs. This solves the user's problem and helps the app acquire users.

Along the way, vivo steps back. If 50 teams in an app category are already doing something, vivo will not do it itself. For example, when a user wants music, the agent invoked is not vivo's own but one from a music platform such as QQ Music, which can provide the most professional answer.
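The matching step and the "step back" preference can be pictured with a minimal sketch. It assumes intent recognition has already produced an intent label, and every name in it (`Agent`, `route`, the intent strings, the sample agents) is hypothetical; this is not vivo's actual middle-platform interface. It only illustrates the rule described above: when a partner's agent can handle an intent, it wins over vivo's own, which serves as a fallback.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical agent record in the "agent square".
    name: str
    provider: str  # "vivo" for vivo's own agents, anything else for a partner
    intents: set[str] = field(default_factory=set)


def route(intent: str, square: list[Agent]) -> Agent | None:
    """Match an already-recognized intent to an agent, preferring partners."""
    candidates = [a for a in square if intent in a.intents]
    if not candidates:
        return None  # no agent can handle this intent
    # False sorts before True, so partner agents come first;
    # vivo's own agent is only chosen when no partner qualifies.
    candidates.sort(key=lambda a: a.provider == "vivo")
    return candidates[0]


# Usage with hypothetical agents:
square = [
    Agent("vivo Music Helper", "vivo", {"play_music"}),
    Agent("QQ Music", "qq_music", {"play_music"}),
    Agent("vivo Weather", "vivo", {"weather"}),
]
print(route("play_music", square).name)  # QQ Music
print(route("weather", square).name)     # vivo Weather
print(route("book_flight", square))      # None
```

Sorting on a boolean key keeps the rule explicit and, because Python's sort is stable, preserves the square's original order among agents of the same kind.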

Guided by this approach, driven by user problems, with ecosystem partners in front and vivo in the back, vivo has so far learned how to operate millions of apps.

And when quantity turns into quality, something remarkable happens: after a new app is downloaded, the AI learns the app's hidden features and tricks sooner than people do.

It has been more than ten years since Jobs declared that he would reinvent the phone. Over that time, the global phone industry has kept moving along the path Jobs first envisioned: software ecosystems at the center and simplified touch interaction. Phone makers have competed along the way, but mostly with a tacit understanding that they were all moving in the same direction.

Now, large models are redefining the phone once again.

In the past, phones could only be operated by touch, the calculator had to be a standalone function, and the gallery was just a simple sorting of photos... These rigid experiences, which had practically become muscle memory, have all been overturned overnight.

What vivo now sets out to redefine is itself and its own past achievements.

This is bound to be a long and arduous road, and staying down-to-earth is the only ticket to the future.