Why are we still unable to use AI phones?
In 2007, Steve Jobs said at a press conference that the iPhone was "a revolutionary phone."
There's nothing wrong with that statement. But if Jobs were alive today, he would probably feel that the revolution didn't go far enough.
On the surface, the evolution of mobile phones follows a simple chain: technology changes the ecosystem, and the ecosystem changes how consumers behave. But step back and there's a deeper logic: the relationship between humans and their phones has changed qualitatively, and the boundary between human and machine keeps blurring.
In the Nokia era, mobile phones were communication tools. You used them to make calls and send texts, and they only mattered when you actively picked them up.
After the iPhone appeared, mobile phones became "external organs." They extend your memory (memos, photo albums), your perception (maps, search), and your social life (WeChat, Weibo). Today it's almost impossible to leave home without a phone, and a day without one feels endless; it has become our "silicon-based organ."
Now, AI phones aim to make the third leap: from "external organ" to "alter ego." The phone exercises its own judgment; you give it a vague instruction and it handles the rest. This shouldn't come as a surprise. Now that AI has arrived, this is where things are headed as long as the technology keeps improving.
First-generation phones only cost you money. Second-generation phones cost you money and also claim your attention. Third-generation phones cost money, claim your attention, and also take away some of your control.
The essential difference between the third-generation "alter ego" and the second-generation "external organ" is that the external organ waits for your instructions, while the alter ego thinks for you.
Why can't we rely solely on software upgrades?
A reasonable question is: instead of building new phones, why not push an AI update to existing phones and get these features directly?
Because a real AI experience requires three things at once, and a software update can't deliver any of them.
The first is on-device (edge-side) computing power.
If we're building AI phones, large models have to run locally. And a text-only model isn't enough: the phone must understand and reason over voice, images, text, and video without relying on the network. In other words, it has to process voice, images, and text in real time with no internet connection and reason across those modalities.
To achieve such an ability, a dedicated NPU (Neural Processing Unit) is needed.
So, what's the order of magnitude of the NPU required to run a usable large model on the edge side?
The answer is: There's no upper limit.
The 16-core Neural Engine in Apple's A18 can perform 35 trillion operations per second. The NPU in MediaTek's Dimensity 9400 delivers roughly twice that. These are the flagship phone chips we can buy today.
Just three or four years ago, the NPU in the Snapdragon 8 Gen 1 offered only a little over one-tenth of what today's flagships deliver. The Snapdragon 8 Gen 2's NPU is more than four times faster than the 8 Gen 1's, and the 8 Gen 3 nearly doubles that again.
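A back-of-the-envelope sketch shows why NPU throughput matters once inputs become multimodal. It assumes the common rule of thumb that a transformer spends about 2 operations per parameter per input token, a hypothetical 1,000-token prompt (images are split into many tokens, so multimodal prompts grow quickly), and perfect NPU utilization, which real chips never reach. The figures are illustrative, not benchmarks.

```python
def prefill_seconds(params: float, prompt_tokens: int, tops: float,
                    utilization: float = 1.0) -> float:
    """Rough time to process a prompt before the first output token appears.

    Assumes ~2 operations per parameter per token (a rule of thumb) and that
    the NPU sustains `utilization` of its advertised TOPS rating.
    """
    total_ops = 2 * params * prompt_tokens
    return total_ops / (tops * 1e12 * utilization)


if __name__ == "__main__":
    # 7B-parameter model, hypothetical 1,000-token multimodal prompt.
    for label, tops in [("35 TOPS (A18-class NPU)", 35.0),
                        ("3.5 TOPS (one-tenth of that)", 3.5)]:
        print(f"{label}: {prefill_seconds(7e9, 1000, tops):.1f} s")
```

Even under these idealized assumptions, a chip with one-tenth the throughput waits ten times longer before it can start answering, and real utilization is well below 100%.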
This is why AI phones have to start from scratch: the NPUs in older chips were never designed for workloads like on-device cross-modal reasoning, just as no amount of tuning will get a bicycle up to F1 speed.
The second is memory bandwidth.
Here we need to explain a principle many people overlook: when a large model is generating text, the real bottleneck is often not computation but data movement. For every token it generates (roughly a character or a word), the model has to stream essentially all of its billions of weights from memory to the processor, and it also has to read and update the KV cache, the stored attention keys and values of every earlier token.
No matter how powerful the NPU is, if memory can't keep up, the NPU just sits idle waiting for data. Take a simplified 7-billion-parameter (7B) model as an example: a barely usable generation speed is about 19 characters per second, and even that requires the latest memory standard. Two or three years ago, the memory bandwidth of mainstream flagship phones was only half of today's, or even less. Ask the model to say something and it would answer like Xie Ruolin, squeezing out one word at a time.
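The memory argument can be checked with simple arithmetic. This sketch assumes 4-bit quantized weights (so a 7B model occupies about 3.5 GB) and a hypothetical LPDDR5X-8533 configuration on a 64-bit bus; it also treats one token as one character and ignores KV cache traffic, so it gives an upper bound rather than a measured speed. Under those assumptions the ceiling lands close to 19 per second, and halving the bandwidth halves it.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: transfers per second x bytes per transfer."""
    return transfer_rate_mt_s * 1e6 * (bus_width_bits / 8) / 1e9


def max_decode_tokens_per_s(params: float, bits_per_weight: int,
                            bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed when every token must stream all weights once.

    Ignores KV cache traffic, activations, and real-world efficiency losses,
    so actual speeds will be lower.
    """
    model_bytes = params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes


if __name__ == "__main__":
    today = peak_bandwidth_gb_s(8533, 64)  # assumed LPDDR5X-8533, 64-bit bus
    older = today / 2                       # "half of today's", per the text
    for label, bw in [("today", today), ("2-3 years ago", older)]:
        rate = max_decode_tokens_per_s(7e9, 4, bw)  # 7B model, 4-bit weights
        print(f"{label}: {bw:.1f} GB/s -> at most {rate:.1f} tokens/s")
```

Note that NPU speed does not appear anywhere in the bound: once the weights cannot arrive faster, a faster NPU only waits longer.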
This is the consequence of insufficient memory bandwidth: an on-device large model either can't run at all or is too slow to be of practical value.
The third, and most important, is the permission architecture of the operating system.
Traditional smartphone operating systems confine each app to its own sandbox, so apps can't freely read or write each other's data. When an app needs data outside its sandbox, it has to ask us for permission.
This design protects security but fundamentally blocks the possibility of AI "connecting everything."
The reason is that a real AI assistant needs to check your calendar, read your email, operate the map, and send messages. That requires redesigning the permission model at the OS level: as your alter ego, it must hold your permissions, and no patch can achieve that. It's like an ancient emperor who couldn't turn a minister into an imperial envoy with a single edict; he had to grant titles such as "Kai Fu Yi Tong San Si" or "Jia Jie Yue," which allowed the minister to open his own office and appoint his own staff.
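The contrast between the two permission models can be sketched in a few lines. Both classes, their names, and the (resource, action) scope format are hypothetical illustrations, not any real OS's API: the first mimics per-app sandboxing, the second a user-to-agent delegation in which the assistant acts with a scoped slice of the user's own authority and leaves an audit trail.

```python
from dataclasses import dataclass, field


@dataclass
class SandboxedApp:
    """Today's model: an app sees its own data plus what the user explicitly granted."""
    name: str
    granted: set = field(default_factory=set)  # e.g. {"location"} after a permission prompt

    def can_access(self, resource: str) -> bool:
        owner = resource.split("/", 1)[0]  # "mail/inbox" -> "mail"
        return owner == self.name or owner in self.granted


@dataclass
class DelegatedAgent:
    """Hypothetical 'alter ego' model: the user delegates scoped (resource, action)
    rights that cut across apps, and every attempt is logged for review."""
    scopes: set
    audit_log: list = field(default_factory=list)

    def act(self, resource: str, action: str) -> bool:
        allowed = (resource, action) in self.scopes
        self.audit_log.append((resource, action, allowed))
        return allowed


if __name__ == "__main__":
    maps = SandboxedApp("maps", granted={"location"})
    print(maps.can_access("maps/history"), maps.can_access("mail/inbox"))  # True False

    agent = DelegatedAgent(scopes={("calendar", "read"), ("mail", "read"),
                                   ("messages", "send")})
    print(agent.act("mail", "read"), agent.act("bank", "transfer"))  # True False
    print(agent.audit_log)
```

The design point is that the agent's rights are defined per user and per action rather than per app, which the sandbox model has no place for.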
Developing an AI phone is therefore a systems project spanning hardware and software. Every layer, from chip and memory to OS and model, has to be redesigned before the "alter ego" becomes technically feasible. Otherwise, all you have is a phone running a fairly capable large model, which can't really be called an AI phone.
The biggest obstacle isn't technology but the ecosystem
The technical challenges can be solved with money and time, so they aren't the real obstacles.
The truly difficult problem lies in the business ecosystem.
At the end of 2024, ByteDance partnered with ZTE to launch the Doubao phone (Nubia M153). Its technical approach is quite radical: through a GUI Agent, the AI reads what's on screen and simulates manual operations, bypassing the limits of traditional APIs. In theory, it can order takeout, send messages, and book flights for you, crossing the boundaries of any app. The engineering prototype sold out the moment it launched and was resold at ten times the original price.
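The core of a GUI Agent is an observe-act loop that needs no cooperation from the app. The sketch below is a toy under loud assumptions: the app is a hypothetical dictionary of screens, "reading the screen" is a dictionary lookup, and the steps come from a fixed plan instead of a model. It is not how the Doubao assistant is actually implemented.

```python
# Hypothetical app model: each screen lists the labels visible on it
# and the screen that tapping each label leads to.
FAKE_APP = {
    "home": {"Search": "search"},
    "search": {"Noodle House": "menu"},
    "menu": {"Add beef noodles": "cart"},
    "cart": {"Place order": "paid"},
}


def run_gui_agent(app: dict, start: str, plan: list) -> list:
    """Observe-act loop: read what is on screen, tap the planned label, repeat.

    A real GUI agent would choose each step with a vision-language model;
    here the decisions are a fixed plan so the loop itself is easy to see.
    """
    screen = start
    trace = [screen]
    for target in plan:
        visible = app.get(screen, {})  # stand-in for reading the screen
        if target not in visible:
            raise LookupError(f"{target!r} is not on screen {screen!r}")
        screen = visible[target]       # stand-in for a simulated tap
        trace.append(screen)
    return trace


if __name__ == "__main__":
    steps = ["Search", "Noodle House", "Add beef noodles", "Place order"]
    print(run_gui_agent(FAKE_APP, "home", steps))
```

Because the loop relies only on what is visible on screen, the app has no API it can withhold to stop it.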
Then it went nowhere. Platforms from WeChat to Taobao, along with various banks, all but jointly boycotted the phone.
The reason is simple: such a phone threatens the core interest of almost every Internet platform, namely data monetization.
Every super app is essentially a data collection machine. WeChat knows who you chat with every day, Taobao knows what you bought last month, Meituan knows which neighborhood you live in, and Douyin knows what content you like to watch. With that data in hand, these platforms can build a profile of you and target ads precisely, directly improving monetization. For example, the products Taobao pushes to me and the merchants Meituan recommends to me are completely different from what shows up on my wife's phone. On the same platform, different "traffic taxes" are levied according to different user profiles, and data is monetized layer by layer.
Now, if the AI assistant on the phone can freely call this data, the situation will be completely different.
Because the AI ranks options by its own criteria rather than by each platform's algorithm. It's Thursday, so the platform's algorithm may recommend KFC, but the AI may suggest a light meal such as a salad because it recently analyzed your medical checkup report.
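The Thursday example comes down to two different scoring functions over the same candidates. Every weight and field here is invented for illustration: the platform's score multiplies relevance by the merchant's ad bid, while the assistant's score ignores bids and subtracts a penalty when a recent checkup suggests avoiding heavy food.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    relevance: float     # match with past behavior, 0..1 (hypothetical)
    ad_bid: float        # what the merchant pays for placement (hypothetical)
    high_calorie: bool


def platform_rank(options):
    """Hypothetical platform ranking: relevance weighted by paid placement."""
    return sorted(options, key=lambda o: o.relevance * o.ad_bid, reverse=True)


def assistant_rank(options, checkup_flag: bool):
    """Hypothetical on-device ranking: ignores bids, uses the user's own health data."""
    def score(o):
        penalty = 0.5 if (checkup_flag and o.high_calorie) else 0.0
        return o.relevance - penalty
    return sorted(options, key=score, reverse=True)


if __name__ == "__main__":
    menu = [Option("KFC", 0.9, 2.0, True), Option("Salad", 0.7, 0.5, False)]
    print([o.name for o in platform_rank(menu)])                      # ['KFC', 'Salad']
    print([o.name for o in assistant_rank(menu, checkup_flag=True)])  # ['Salad', 'KFC']
```

The bid never enters the assistant's score, which is the whole commercial problem for the platform.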
Do you see why the giants are jointly boycotting AI phones?
On an AI phone, recommendations happen in the phone's AI interface, not in the Meituan app. Users get recommendations without opening Meituan, so the link between Meituan's recommendation engine and its users is quietly bypassed. Meituan still delivers the food, but it no longer has a say in what food gets delivered.
In short, on an AI phone, what to push and what not to push is determined by the AI on your phone, not by the algorithms of giants like Taobao, Meituan, and Douyin.
This is a big problem, folks. The user data the platforms have painstakingly accumulated over the years would overnight become fuel for the AI. The place where users make decisions would shift from the product page to the AI interface, and the platforms would be bypassed entirely.
So at that point, would you still pay these large platforms for "bidding rankings"? In the past, because the platforms decided whom to feature on their homepages, everyone was willing to buy traffic, play the bidding game, and pay to get onto the homepage.
But on an AI phone, it's the phone's AI that decides what to recommend, not the platform. So why wouldn't I put my money into the AI phone maker instead and have its AI put in a good word for me?
This cuts off a major source of traffic revenue for traditional Internet platforms, so of course they'll fight back.
Boycotting the Doubao phone was therefore the only rational choice for the platforms: it threatened to cut them off at the root, and it had to be stopped, both emotionally and rationally.
The contradiction won't be solved but bypassed
Since app makers have so many concerns, how will this obstacle eventually be dealt with?
Laoju's judgment is: This contradiction won't be solved but bypassed.
The first way is more like piercing through the wall rather than bypassing it - the regulatory authorities force the opening through administrative orders.
The EU's Digital Markets Act (DMA) has already forced Apple, Google, and others to open up interoperability. China has also been pushing platform interconnection: in 2021, regulators required WeChat and Taobao to stop blocking each other's external links, which was a clear signal. In the AI era, data interconnection brings higher efficiency and enables more powerful products, so pushing it further would be a natural next step.
However, this approach clearly gives too little weight to the interests of the large Internet platforms.
Keep in mind that a platform's data moat isn't just a monopoly tool; it's also the return on large companies' long-term investment. The social graph WeChat has built over more than a decade and the transaction data accumulated by countless merchants and consumers on Taobao are valuable corporate assets. They represent real R&D investment, operating costs, and risk-taking, the work of thousands of engineers, and the jobs and livelihoods of hundreds of thousands of people.
Forced opening would raise a legal question of justification, and commercially it would amount to declaring that such investment earns no return. A more practical problem: regulators can order platforms to allow external links, but it's hard to define precisely where "AI data access rights" begin and end. Platforms can easily offer an interface that complies on paper but is hollowed out in substance, which would clearly degrade the user experience of AI phones and make the whole effort not worth it.
Forcing it through would therefore be not only inefficient but also full of uncontrollable risks, and the associated costs would be very high.
The second way is that the operating system replaces the app as the new entry point.
Compared with "piercing through," this is more like a surprise attack, a covert operation.
Every phone needs an operating system, and the operating system can always drive apps. So is it possible that once the operating system is AI-enabled, the phone naturally becomes an AI phone?
Today, when you open a food-delivery app, every step from searching and browsing to recommendations and checkout happens inside the app's interface, and the app sees your entire decision-making process and all the behavioral data. But once the operating system takes over as the entry point, you just say a word to Siri/Xiaoyi/Xiaoai, and the assistant reads the screen and operates the app by itself. Apart from confirming payment, you don't see any of the process; the assistant thinks and decides for you.
As for the food-delivery app, it won't even catch a glimpse of you. It only receives an instruction: deliver this order to this address.
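The shift in who sees what can be expressed as a data contract. In this sketch, which uses made-up names and a deliberately trivial decision rule, the assistant keeps the request and the user's context on the device, and the delivery app's entire view of the interaction is one small `DeliveryOrder` record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryOrder:
    """Everything the delivery app receives in this hypothetical flow."""
    merchant: str
    items: tuple
    address: str


def os_assistant_handle(utterance: str, user_context: dict) -> DeliveryOrder:
    """Hypothetical OS-level assistant: decides privately, emits only a bare order.

    The utterance, history, and health signals stay on the device; none of
    them are fields of DeliveryOrder, so the app never sees them.
    """
    wants_light_meal = user_context.get("recent_checkup_flag", False)
    if "lunch" in utterance.lower() and wants_light_meal:
        merchant, items = "Salad Bar", ("chicken salad",)
    else:
        merchant, items = "KFC", ("burger meal",)
    return DeliveryOrder(merchant, items, user_context["address"])


if __name__ == "__main__":
    ctx = {"address": "Building 3, Unit 2", "recent_checkup_flag": True}
    print(os_assistant_handle("Get me lunch", ctx))
```

Whatever the decision logic looks like in practice, the point is the narrow output type: the app fulfills the order but learns nothing about why it was placed.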
Platforms certainly don't want to see this. The problem is that once phone makers decide to do it, platforms may have to go along, just as developers complained about the App Store's commission but still listed their apps there, because that's where the users are. If users someday make all their decisions in the OS's AI interface, app makers that don't connect will lose traffic, and those that do will have to accept a demotion. It's a dilemma with no good options.
In this way, the food-delivery platform goes from a "platform" that controls the user's entire journey to an "outsourcing contractor" that only takes and fulfills orders. User data, advertising revenue, and the user relationship all go to the phone maker. All that's left for the platform is delivery, the segment with the thinnest margins.
The third way is a complete bypass: open a second battlefield and define AI phones by an entirely new set of rules, building a brand-new data layer outside the app ecosystem. This is the most radical approach and deserves a separate explanation.
The first two ways, whether regulators forcing platforms open or the operating system seizing the entry point, still fight on the existing battlefield: both sides compete for the right to use the data recorded in apps.
But what if the data never goes through apps at all? What then?
After all, we hold our phones for seven or eight hours a day, so the phone's sensors, microphone, camera, GPS, and other modules spend those seven or eight hours with us too.
None of the data recorded by these modules passes through WeChat or Taobao. But when combined, the user profile they piece together may be more complete and real than that mastered by any single app.
This is the real subversiveness of the third way: It redefines what it means to "understand the user." In the past, understanding the user meant knowing what they said, what they bought, and what they searched for - this was the strategy of apps in the smartphone era. In the future, understanding the user may mean perceiving their state, rhythm, emotions, and habits - this is the strategy of sensors in the AI phone era.
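To make "perceiving state and rhythm" concrete, here is a deliberately simple sketch of turning raw sensor readings into an hour-by-hour profile without touching any app's data. The sample fields, thresholds, and labels are all hypothetical; a real system would use learned models and far richer signals.

```python
from dataclasses import dataclass


@dataclass
class SensorSample:
    hour: int            # 0-23
    steps: int           # accelerometer step count during that hour
    speed_kmh: float     # average speed from GPS
    screen_minutes: int  # minutes the screen was on
    noise_db: float      # microphone loudness level only, not content


def infer_state(s: SensorSample) -> str:
    """Hypothetical hand-written rules; a real system would learn these from data."""
    if s.speed_kmh > 20:
        return "commuting"
    if s.steps == 0 and s.screen_minutes == 0 and (s.hour >= 23 or s.hour < 7):
        return "sleeping"
    if s.steps > 1000:
        return "exercising"
    if s.noise_db > 70:
        return "in a noisy place"
    return "idle"


def daily_rhythm(samples):
    """Hour-by-hour state labels: the raw material of a sensor-based profile."""
    return [(s.hour, infer_state(s)) for s in samples]


if __name__ == "__main__":
    day = [
        SensorSample(2, 0, 0.0, 0, 30.0),
        SensorSample(8, 300, 35.0, 20, 65.0),
        SensorSample(12, 150, 0.0, 25, 75.0),
        SensorSample(19, 4000, 9.0, 0, 55.0),
    ]
    print(daily_rhythm(day))
```

None of these inputs pass through WeChat or Taobao, yet a few weeks of such labels already describe sleep, commute, and exercise habits.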
As long as you're willing to sacrifice a lot of your privacy and be constantly monitored by the phone, you can get a more extreme user experience,