
Beyond mobile phones, what other opportunities are there for AI hardware? Let's start with Doubao Mobile.

中欧商业评论 (CEIBS Business Review), 2026-01-29 16:48
Doubao's Dilemma: Misaligned Ecological Niches and Three Paths to Break Through in AI Hardware

Editor's Note

In a seemingly ossified "winner-takes-all" market, challengers with sufficient strategic patience and systematic capabilities can still break the deadlock.

Recently, Doubao Mobile Assistant has sparked a rather dramatic controversy. This AI application, which claims to be able to "take over the phone", was at first hailed by users as "the AI teammate who finally understands me" thanks to capabilities such as simulated clicking and screen recognition. It addressed the pain point of users constantly switching between applications and painted a picture of a future in which intelligent agents are highly autonomous. The enthusiasm did not last. WeChat, Alipay, and many bank apps successively issued risk warnings and implemented technical blocks, quickly turning the frenzy into an awkward "ecosystem encirclement".

Public opinion has split along typical lines. Supporters condemn the Internet giants for using their monopoly position to stifle innovation and see the blocks as an attempt by incumbents to suppress new technology. Opponents worry about data security and privacy boundaries and believe that this "system-level takeover" brings uncontrollable risks. But if we step outside the binary framing of "innovation versus monopoly" and look at the episode from the perspective of business evolution, we find that neither side may have touched the core of the problem.

The two issues at the center of this debate, the right to innovate and the security of the ecosystem, are undoubtedly important, but both sides have avoided a more fundamental question: in this digital ecosystem, where exactly does Doubao stand? Why has its technical path triggered such strong countermeasures? This article explores the answer along three dimensions: ecological niche, connection mode, and implementation path. We also want to ask: in the golden age of AI combined with hardware, where are the real opportunities?

Doubao's Triple Dilemmas

Dilemma 1: Fundamental Mismatch of Ecological Niche

To understand Doubao's first dilemma, we need to introduce a key concept: ecological centrality. Simply put, centrality measures the degree of control a product has over users' core resources, including operating-system-level permissions, account systems, data sovereignty, and the influence over the ecosystem that results from them. The higher a product's centrality, the greater its say in the ecosystem and the less it is constrained by others. The mobile phone is the epitome of a high-centrality product and a super-battlefield that the giants guard closely.

So what is the ecological niche of Doubao Mobile Assistant? It is not an operating system, so it cannot schedule resources from the bottom layer. It is not a super app either, so it lacks an independent account system and accumulated data. It is a third-party application that attempts to "take over" the operation flows of other applications through simulated clicking and screen recognition. The essence of this niche is low centrality: it has no substantial control over users' core resources.

This mismatch leads to structurally unequal competition. Doubao tries to seize control of a high-permission ecosystem from a low-permission position. It wants to direct WeChat to send messages, have Alipay complete transfers, and command bank apps to query balances, yet it has neither the underlying permissions these giants hold nor the trust that users place in them. This is like trying to set up a new coordination mechanism inside an existing ecosystem without the authorization or support of whoever dominates it. Even if the newcomer temporarily has the technical means, it will inevitably face countermeasures, because the approach fundamentally challenges the existing distribution of permissions.

The deeper contradiction is that Doubao's value proposition rests on eroding the control held by the existing ecosystem. The more successful it is, the more it threatens the core interests of the operating system and the super apps. This is not a confrontation between technological innovation and incumbents but a low-centrality tool trying to rewrite the rules of a high-centrality ecosystem. In business competition, the cost of such a mismatch is often fatal.

Dilemma 2: Structural Fragility of Connection Mode

Doubao's technical path seems ingenious but is actually built on an extremely fragile foundation. This fragility is reflected in two aspects: technology and interests, which together constitute the fate of a parasitic connection.

From a technical perspective, Doubao relies on simulated clicking and screen recognition to achieve cross-application operations. This means it must accurately identify the interface elements of each app: where the buttons are, where the input boxes are, and where the confirmation keys are. These interfaces are not static. Any WeChat update may adjust the layout of the chat interface, a single Alipay redesign can completely change the transfer flow, and bank apps ship security upgrades even more frequently. Whenever an interface changes, the operation scripts Doubao has carefully constructed can stop working. This dependence is not an accidental technical defect but an inherent attribute of the parasitic model: the parasite's survival depends on the stability of the host, and the host not only has no obligation to stay stable for the parasite but also has an incentive to expel it by making changes.
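The fragility described above can be illustrated with a toy model. The sketch below is not Doubao's actual implementation; the `Element` type, the two layouts, and the recorded tap coordinate are all hypothetical. It assumes a script that replays a tap at a coordinate recorded against one version of an app, and shows what happens when a redesign swaps two buttons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A hypothetical on-screen element: a label and its position."""
    label: str
    x: int
    y: int


def element_at(screen, x, y):
    """Return the label of the element at (x, y), or None if nothing is there."""
    for el in screen:
        if (el.x, el.y) == (x, y):
            return el.label
    return None


# Hypothetical layouts before and after a redesign that swaps two buttons.
layout_v1 = [Element("Send", 300, 900), Element("Cancel", 100, 900)]
layout_v2 = [Element("Cancel", 300, 900), Element("Send", 100, 900)]

# The automation script recorded this tap while running against layout_v1.
RECORDED_TAP = (300, 900)


def replay(screen):
    """Replay the recorded tap and report which element it lands on."""
    return element_at(screen, *RECORDED_TAP)


print(replay(layout_v1))  # the script taps the button it was recorded against
print(replay(layout_v2))  # after the redesign, the same tap lands elsewhere
```

In this toy case the script does not fail loudly after the redesign; it simply taps a different button. Real screen recognition is more robust than a fixed coordinate, but it still depends on a layout that the host app is free to change at any time.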

Even more fatal is the conflict at the interest level. This conflict exists in three progressive levels.

The first level is the competition for data sovereignty. To achieve true intelligence, an AI assistant must deeply understand users' behavior patterns, which means it needs access to WeChat's social relationship graph, Alipay's consumption records, the calendar's schedule, and the health app's physical sign data. However, these data are precisely the foundation on which super apps build their moats. WeChat's social relationship chain supports its precise advertising placement and mini-program ecosystem, and Alipay's consumption data is the core asset for its credit assessment and financial services.

The second level is the zero-sum game of business logic. Asking Tencent and Alibaba to open these core data interfaces to a third-party system is equivalent to asking them to dismantle their own business barriers. This is not a problem of technical integration but a fundamental conflict over how interests are distributed. Every bit of data sovereignty that Doubao acquires is a piece of territory the giants lose. When WeChat discovers that Doubao is reading users' chat records, and when Alipay discovers that Doubao is analyzing users' consumption behavior, their risk-control systems will make the only rational response: block it. This is not conservatism but an instinct for commercial self-defense.

The third level is psychological resistance from users. Even if the giants are willing to compromise under regulatory pressure, will users accept this deep binding? When a system needs to fully penetrate your digital life in order to provide value, it is no longer a tool that can be replaced at any time; it has become the custodian of your digital personality. This deep dependence brings not only anxiety about data leakage but also an instinctive resistance to the loss of autonomy. Once all behavior trajectories are entrusted to AI for overall planning, the relationship between users and the system changes from "active use" to "passive dependence", and crossing this psychological threshold is much harder than implementing the technology.

A parasitic connection cannot establish a stable business relationship. The host can cut off the parasite's room to survive at any time through technical upgrades or strategic adjustments. Doubao's dilemma is not that its technology is insufficiently advanced but that it has chosen a connection mode that is fundamentally unsustainable. By contrast, if a symbiotic connection that also benefits the host can be established, the situation changes completely. This is precisely the core logic of Mode 3 later in this article.

Dilemma 3: Fatal Defects in the Implementation Path

The invention of the automobile was not about installing an engine on a horse but about reconstructing the underlying logic of the entire means of transportation - from biological power to mechanical power, from the carriage frame to the internal combustion engine chassis. This metaphor is important because it reveals the essence of a technological revolution: True innovation is not about superimposing new technologies on the old model but about fundamentally reconstructing the implementation path.

To be clear, the "AI middle layer" itself is not the wrong direction. Voice interaction, intention understanding, and intelligent agents are all inevitable trends in the evolution of human-machine interaction. The problem is not the introduction of AI as an intermediary but the specific implementation method Doubao chose: achieving automation through simulated clicking. This is exactly the kind of transitional approach that "installing an engine on a horse" describes. The physical structure of the horse (the app's UI) was not designed for the engine (AI automation). The forced combination retains all the defects of the old model (interface fragility, permission restrictions) and adds the costs of the new technology (supervision burden, trust anxiety), ultimately producing a "mechanical horse" that runs slowly and is prone to falling apart.

This path design leads to a fatal experience defect: the embarrassment of semi-automation.

The core problem lies in the fuzzy trust boundary. When you operate manually, each step is completed with both visual and tactile feedback, and you can detect and correct an error the moment it occurs. When AI takes over through simulated clicking, however, you must perform a global check at the last step. When booking a flight, would you really dare to pay without checking the flight information? Before sending a message, would you really dare to send it without confirming the recipient and content? This mandatory second confirmation turns the original "3-step operation" into "watching the AI execute 10 steps + 1 step of manual confirmation". Users' attention never leaves the screen, and they are even more nervous because they cannot intervene in real time.

Doubao's semi-automation achieves neither the sense of liberation of full automation nor the sense of control of manual operation. It falls into a "structural trust challenge": users can neither let go completely nor intervene directly. The root of this experience problem is not that the technology is insufficiently intelligent but that the wrong implementation path was chosen. Once you choose to "take over" users' operations through simulated clicking, you are bound to face this challenge.

The more fundamental contradiction is that the direction of mobile phones over 20 years of evolution has been to eliminate the middle layer and let people reach information directly. From the intelligent prediction of input methods, to shortcut operations in apps, to one-screen access through widgets, every optimization aims to reduce the number of steps users must take. The "automation" Doubao achieves through simulated clicking instead re-inserts a complex and opaque middle layer. This is not a question of technological advancement but a fundamental regression in product experience.

From Dilemma to Outlet

Doubao's triple dilemmas reveal a broader problem: on the super-battlefield of mobile phones, closely guarded by giants, trying to seize control through parasitic technical means is a war that cannot be won. This does not mean that Doubao should abandon AI hardware. Doubao has leading domestic AI capabilities, and these are real hard power. The problem lies not in the capabilities but in how they are released.

Since the mobile phone battlefield is full of difficulties, we need to find a new direction. Three requirements follow from the dilemmas above: avoid direct confrontation with high-centrality giants, establish a sustainable connection mode, and choose the right implementation path. Three completely different models emerge as a result (Table 1).

The three models each have a different focus. Mode 1 reconstructs the "value proposition" by making the hardware the local custodian of users' digital assets and building an irreplaceable moat through data accumulation. Mode 2 reconstructs the "key activities" by designing dedicated AI hardware for a specific high-value vertical scenario to build an end-to-end closed loop. Mode 3 reconstructs the "channel path" by bypassing the interface layer of super apps and establishing an API-based symbiotic connection, with a conversational AI as the entry point for user intent. None of these models attempts to control users' mobile phones and existing apps; each creates new value space by reconstructing an element of the business model.

Table 1 Three Business Models of AI Hardware

Mode 1: Local Data Hub - Reconstructing the "Value Proposition"

Core logic: Become the local custodian of users' digital assets and build an irreplaceable moat through data accumulation.

The key to this model is to make the hardware device the hub of users' digital life, locally storing and managing users' most core digital assets - account credentials, personal preferences, historical operation records, private data, etc. Different from cloud services, local custody brings three unique advantages: absolute privacy control, as sensitive data is not uploaded to the cloud and users have complete ownership and control; offline availability, as it does not depend on network connection and the device can always call local data to provide services; instant response, as the access speed of local data far exceeds that of the cloud, bringing a better user experience.

As usage time increases, more and more data accumulates on the device, creating a strong lock-in effect. When users' digital identities, habits, and preference settings are deeply bound to this device, the cost of replacing it becomes extremely high. More importantly, these local data open up an irreplaceable value-added space for AI: personalized recommendations based on users' historical behavior, intelligent decision-making based on preference data, and considerate services based on private information, all of which can only be achieved by local AI.

Inspiration for AI hardware: Doubao can launch an AI hardware device positioned as a "personal digital asset butler". Instead of simulating clicks on apps on the mobile phone, it lets users actively entrust core data to the device. For example, through an import function, users can migrate historical chat records, work documents, personal photos, health data, and so on to the device in one go. The device builds a semantic index with a local AI model, and users can ask at any time: "In which meeting did I discuss the quarterly goals?" or "Help me find all the discussions about Project A last year."
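To make the idea of an on-device index concrete, here is a minimal sketch. It is an illustration under stated assumptions, not a description of any real product: the `LocalIndex` class, the document IDs, and the sample texts are all hypothetical, and simple word-count vectors with cosine similarity stand in for the embeddings a local AI model would presumably produce. Everything runs in memory with the Python standard library, so no data leaves the machine.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector (a stand-in for model embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalIndex:
    """Hypothetical on-device index: documents are stored and searched locally."""

    def __init__(self):
        self._docs = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, text):
        self._docs.append((doc_id, vectorize(text)))

    def search(self, query, top_k=1):
        """Return the IDs of the top_k documents most similar to the query."""
        q = vectorize(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:top_k]]


# Hypothetical imported records.
index = LocalIndex()
index.add("meeting-03-12", "Weekly sync: discussed quarterly goals and hiring plan")
index.add("meeting-03-19", "Design review for the login page")
index.add("chat-project-a", "Project A launch discussion with the vendor")

print(index.search("In which meeting did I discuss the quarterly goals?"))
print(index.search("Project A discussions last year"))
```

Word overlap is a crude proxy for meaning (it does not match "discuss" with "discussed", for instance), which is why a real device would need a proper local model. The structural point still holds: both the index and the queries stay on the device.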

The key lies in two points: users have complete sovereignty over the data, and the device only serves as a local processor without uploading data to the cloud, fundamentally eliminating privacy concerns. As users import more and more data, the value of the device increases exponentially - it changes from a tool to an "external brain" of users' digital life, and users are neither willing nor able to replace it easily.

Mode 2: Special-Purpose Hardware for Vertical Scenarios - Reconstructing the "Key Activities"

The core of this model is to design dedicated AI hardware for a specific high-value scenario. Here the hardware is not just a container for algorithms; through dedicated sensors or processing capabilities, it delivers an efficiency gain of more than 10 times in that scenario. Its key feature is an "end-to-end closed loop": the hardware can deliver its core value on its own, without relying on the APIs of external service providers or the general computing power of mobile phones.

Before discussing successful cases, we must examine the failure of the Humane AI Pin. This product, once highly anticipated, attempted to completely replace smartphones through projection and voice interaction, challenging general scenarios. Its failure provides three key lessons:

Lesson 1: Do not challenge smartphones in the fields where they are strongest. In general scenarios such as sending emails, checking maps, and browsing information feeds, 20 years of evolution have brought smartphone touch interaction to peak efficiency: muscle memory combined with visual feedback gives millisecond-level response and confirmation.

Lesson 2: A change of interaction mode must improve efficiency rather than reduce it. Using voice interaction to replace efficient "what you see is what you get" touch interaction not only fails to improve efficiency but also increases users' cognitive burden. Users found that sending an email with the AI Pin took repeated confirmations to make sure the voice recognition was accurate, whereas it takes only a few seconds on a smartphone.

Lesson 3: Cool technology does not equal a good product. This is a typical case of "innovation for the sake of innovation". Although the technology is cool, the experience regresses. The opportunities for AI hardware thus become clear: not to challenge the general scenarios of smartphones but to find breakthroughs in vertical scenarios where smartphones are ineffective.

In contrast, Plaud Note shows the right way. This Shenzhen-based company, founded in 2022, raised more than 20 times its funding target on Kickstarter in 2023, and its shipments exceeded 500,000 units in 2024. It did not attempt to replace smartphones but focused on the extremely narrow scenario of "business recording".

In business meetings or phone calls, recording with a mobile phone faces many pain points: poor sound quality in noisy environments, no way to distinguish speakers in multi-person meetings, and recording files that take up a lot of storage. Even more painful is the post-meeting cleanup. Listening to a one-hour recording and manually organizing notes often takes two hours, and it is hard to retrieve key information quickly. The solution provided by Plaud Note is a typical "vertical, special-purpose" strategy. At the hardware level, it attaches to the back of the phone via MagSafe and uses a beam-forming microphone array to capture studio-grade, high-quality audio and directly completes noise reduction