
Who is defining AI hardware in 2026?

晓曦 2026-05-22 13:45
Full-scenario symbiosis changes the commercialization logic of AI hardware

In 2026, AI hardware is at a critical point in its industrial leap and is leaving behind the stage of piling up scattered concepts.

The series of national standards for "Intelligent Grading of Artificial Intelligence Terminals", jointly issued by the Ministry of Industry and Information Technology, the Ministry of Commerce, and the State Administration for Market Regulation, has given this restless sector a clear yardstick. The standards divide terminal intelligence into four levels, L1 to L4, rising step by step from the response level to the collaborative level.

The standards define five capability elements: perception, cognition, execution, memory, and learning. They cover seven categories, including mobile phones, computers, TVs, glasses, car cockpits, speakers, and headphones. In effect, they outline the first batch of AI hardware forms expected to reach mass adoption, and they provide specific testing methods.

Consumers no longer need to work through the technical logic or rely on manufacturers' self-promotion to figure out how smart a device is.

Almost at the same time as the standards were released, Alibaba Cloud showcased multiple AI hardware products already in deployment at the Alibaba Cloud Summit held on May 20th. It also announced the "Qwen Intelligent Hardware X Tmall Cooperation Plan", launched jointly with Tmall. The plan includes exclusive rights to the Qwen model, traffic support on the scale of hundreds of millions from Tmall, and brand exposure resources across all channels. The two parties will jointly invest over 100 million in resources to help hardware manufacturers achieve a leap in value across three dimensions (technology, brand, and sales channels) and to accelerate the emergence of new species of AI hardware.

The Tmall 618 promotion is about to kick off, and multiple AI hardware products equipped with Qwen capabilities will be unveiled on Tmall. The two platforms will jointly provide traffic and brand exposure resources to speed up the commercial rollout of AI hardware. The state has drawn a pyramid for AI hardware, while cloud providers offer the capability base needed to climb it.

These rapidly occurring changes point to the same trend:

AI hardware is moving from proof of concept on the edge side to large-scale adoption through edge-cloud collaboration, and the release of AI cloud service capabilities coincides with this turning point.

01. Who Stays at L1 and Who Rushes to L4?

From L1 to L4, each step up corresponds to a higher capability threshold.

L1 devices can only execute preset instructions. In essence, they are smart versions of traditional appliances. L2 devices begin to act as tools: users can actively call certain functions.

Yu Xiuming, deputy director of the China Electronics Standardization Institute, said when interpreting the standards that, based on research, testing, and analysis, products with relatively high user ownership are currently at the L1 and L2 levels in general, while some new products can reach L3.

Overall, AI terminals are evolving in parallel along three paths: upgrading traditional terminals, expanding the quantity of emerging terminals, and exploring future terminals.

The real watershed is at the L3 assistance level. The core of L3 is that the terminal can fully understand users' instructions and intentions, and has the ability to actively identify and provide services.

Take smart air conditioners as an example. An L3-level device can automatically detect sweat on the user's forehead and proactively lower the temperature. After the user selects "leave home" mode, the camera first checks whether anyone is still at home and turns off the lights only after the person has left. These actions require combined input from audio, video, and sensors to support complex intention recognition and judgment. The standards require the device to understand complex intentions, perform chain reasoning, and retain long-term memory, which means the device must not only answer what something is but also understand why and even predict what to do next.
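The level descriptions above can be summarized in a minimal Python sketch. The names `Level`, `DeviceProfile`, and `estimate_level` are hypothetical, and the checks are simplified from this article's wording rather than taken from the standards' actual testing methods. The L4 check assumes that "collaborative" means working together with other devices.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    L1 = 1  # executes preset instructions only
    L2 = 2  # tool-like: the user actively calls functions
    L3 = 3  # assistance: understands intent and serves proactively
    L4 = 4  # collaborative: assumed here to mean cross-device cooperation


@dataclass
class DeviceProfile:
    # Hypothetical capability flags, loosely following the article's description.
    user_invoked_functions: bool = False
    understands_complex_intent: bool = False
    chain_reasoning: bool = False
    long_term_memory: bool = False
    cross_device_collaboration: bool = False


def estimate_level(p: DeviceProfile) -> Level:
    """Return a rough level estimate; not an official test procedure."""
    meets_l3 = (
        p.understands_complex_intent and p.chain_reasoning and p.long_term_memory
    )
    if meets_l3 and p.cross_device_collaboration:
        return Level.L4
    if meets_l3:
        return Level.L3
    if p.user_invoked_functions:
        return Level.L2
    return Level.L1


if __name__ == "__main__":
    air_conditioner = DeviceProfile(
        user_invoked_functions=True,
        understands_complex_intent=True,
        chain_reasoning=True,
        long_term_memory=True,
    )
    print(estimate_level(air_conditioner).name)
```

In this sketch a device that lacks any one of the three L3 abilities falls back to L2 or L1, which mirrors the article's point that L3 is the real watershed.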

Some hardware manufacturers have stood still at the L1 level over the past few years, and they share several typical characteristics.

One is a product definition that is too closed: it solves a single function and reserves no sensor or computing headroom for later upgrades. Another is excessive reliance on lightweight edge-side models, which causes capabilities to break down in complex scenarios.

There is also a less visible problem: packaging L1 functions as L2 or L3 gimmicks. Such products will quickly be exposed by the standard tests, and consumers will vote with their feet.

On this point, Chen Liwei, deputy general manager of the Solution Architecture Department of the Public Cloud Business Unit of Alibaba Cloud Intelligence Group, believes that the entire hardware industry is moving from L2 to L3. Whoever builds the L3 infrastructure first and delivers an L3-level product experience will be able to capture a larger market share.

Staying at L1 or even L2 is no longer a safe zone. Entering the L3 stage smoothly requires multi-modal perception and generalized reasoning to work together.

At this Alibaba Cloud Summit, Qwen's flagship model, Qwen3.7-Max, was also officially launched. In the global blind-test ranking of large models run by the third-party organization Arena, Qwen3.7-Max ranks first among domestic models and is comparable to the world's strongest models.

Qwen3.7-Max was designed to make the model the core of the Agent, with the ability to plan autonomously, iterate continuously, and collaborate across devices. This technological upgrade matches the L3 requirements for the perception and cognition elements. The multi-modal interaction development kit that Alibaba Cloud provides for the intelligent hardware industry now fully supports access to Qwen3.7-Max.

The stronger the cloud-based generalization ability, the lower the L3 adaptation cost of the hardware. Chen Liwei also pointed out: "Today, no single hardware product can achieve an end-to-end closed-loop user experience through a single model. The solution must be a combination of multiple models."

02. Edge-Cloud Collaboration Becomes a Necessity

After the L3 assistance level, the L4 collaborative level will be an even greater leap.

From the existing definition, the core feature of L4 is not whether a single device is smarter, but that multiple devices form an intelligent system. When a user enters the home, the glasses, speakers, robots, and cockpits will automatically share memories and then serve the user in the physical world.

The biggest challenge hardware manufacturers will face in bringing technology and products to the L4 level is therefore system integration and device collaboration.

In the standards' classification table, most products, from mobile terminals to glasses and headphones, are marked as edge-cloud collaboration. The logic is straightforward: real-time response depends on the edge side and complex reasoning depends on the cloud side, which is currently the best available approach to making devices intelligent.
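The division of labor described above can be sketched as a simple routing rule. This is an illustrative assumption, not a scheme defined by the standards: the `Request` type, the `route` function, and the combined "edge+cloud" case (the edge acknowledges immediately while the cloud reasons) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    description: str
    needs_real_time_response: bool
    needs_complex_reasoning: bool


def route(req: Request) -> str:
    """Decide where a request is handled in this simplified model."""
    if req.needs_complex_reasoning and req.needs_real_time_response:
        # Assumption: the edge replies at once, the cloud does the reasoning.
        return "edge+cloud"
    if req.needs_complex_reasoning:
        return "cloud"
    # Simple requests stay on the device, real-time or not.
    return "edge"


if __name__ == "__main__":
    for r in [
        Request("wake-word detection", True, False),
        Request("summarize yesterday's footage", False, True),
        Request("spoken question about the room", True, True),
    ]:
        print(r.description, "->", route(r))
```

A real product would weigh more factors, such as connectivity, power budget, and privacy, but the two flags capture the split the article describes.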

Ecovacs' housekeeping robot "Bajie" is a typical example. Because open-source models can keep iterating, Ecovacs chose to connect to the Qwen large model early on.

The core challenge for a housekeeping robot is the non-standard nature of the home environment, which brings high safety requirements, high information density, and very long-tail needs. One of the solutions in Ecovacs' "Bajie" is to encapsulate the robot's atomic capabilities (grasping, picking and placing, perception, planning) into APIs that are easy for the model to understand. The cloud side handles complex tasks such as environmental perception and action decomposition based on Qwen3.6-Plus.

When the user gives a vague instruction such as "tidy up the living room", the cloud side first works out what objects are in the living room and what counts as tidy, then breaks the task down into a series of action instructions and sends them to the robotic arm. None of this understanding is pre-programmed. The intelligent agent on "Bajie" strings the tasks together on its own initiative.
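The pattern of wrapping atomic capabilities as callable APIs and letting a cloud-side planner chain them can be sketched as follows. This is not Ecovacs' implementation: the registry, the capability functions, and the canned `plan_in_cloud` stand-in for the cloud model are all hypothetical and exist only to show the shape of the idea.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry mapping capability names to robot-side functions.
CAPABILITIES: Dict[str, Callable[[str], str]] = {}


def capability(name: str):
    """Register a function as an atomic capability the planner may call."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        CAPABILITIES[name] = fn
        return fn
    return register


@capability("perceive")
def perceive(target: str) -> str:
    return f"perceived {target}"


@capability("grasp")
def grasp(target: str) -> str:
    return f"grasped {target}"


@capability("place")
def place(target: str) -> str:
    return f"placed {target}"


def plan_in_cloud(instruction: str) -> List[Tuple[str, str]]:
    """Stand-in for the cloud model: returns a canned plan for one instruction."""
    if instruction == "tidy up the living room":
        return [
            ("perceive", "living room"),
            ("grasp", "cup"),
            ("place", "cup on shelf"),
        ]
    return []


def execute(plan: List[Tuple[str, str]]) -> List[str]:
    """Run each planned step through the matching atomic capability."""
    log = []
    for name, target in plan:
        if name not in CAPABILITIES:
            raise ValueError(f"unknown capability: {name}")
        log.append(CAPABILITIES[name](target))
    return log


if __name__ == "__main__":
    print(execute(plan_in_cloud("tidy up the living room")))
```

Keeping the capability set small and explicitly named is what lets a planner compose steps it was never specifically programmed for, while the check in `execute` rejects any step the robot does not actually support.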

Ecovacs has also opened up the system, atomic capabilities, and simulation platform of "Bajie", so that more ecosystem partners can take part in algorithm development and application deployment for household robots through "Bajie".

The Shenmu product series from Hangzhou Yanjimei also confirms the necessity of edge-cloud collaboration. Yanjimei focuses on low-power intelligent imaging, and the core of its products is solving the power supply and network communication problems of cameras so that they can work without a fixed network connection or an external power supply. The challenge that comes with low power consumption is that the edge-side chip has limited computing power and cannot carry the inference load of large-scale models.

Their solution is to perform real-time labeling and preliminary processing on the edge side. The edge-side AI chip identifies people, cars, and non-motorized vehicles in the picture, then uploads the text and picture information to the cloud side through a low-power 4G beacon. The cloud side then builds in-depth understanding and structured memory based on the Qwen large model, so users can query the camera as if searching a photo album, for example "What color cats appeared at the door yesterday afternoon". This kind of experience is almost impossible to achieve with a pure edge-side solution.
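A minimal sketch of the "search the camera like an album" idea follows. It assumes the cloud side stores one record per event, combining the coarse edge labels with a richer text description; the `Event` fields, the `search` function, and the sample data are hypothetical, and a keyword match stands in for whatever retrieval the real system uses.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Event:
    timestamp: datetime
    edge_labels: List[str]   # coarse labels produced on the edge chip
    cloud_description: str   # richer description assumed to come from the cloud model


def search(events: List[Event], keyword: str,
           start: datetime, end: datetime) -> List[Event]:
    """Return events in [start, end) whose description mentions the keyword."""
    keyword = keyword.lower()
    return [
        e for e in events
        if start <= e.timestamp < end and keyword in e.cloud_description.lower()
    ]


if __name__ == "__main__":
    events = [
        Event(datetime(2026, 5, 21, 15, 10), ["person"],
              "a courier leaves a parcel at the door"),
        Event(datetime(2026, 5, 21, 16, 30), [],
              "an orange cat sits at the door"),
        Event(datetime(2026, 5, 22, 9, 0), [],
              "a black cat walks past the door"),
    ]
    hits = search(events, "cat",
                  datetime(2026, 5, 21, 12, 0), datetime(2026, 5, 21, 18, 0))
    for e in hits:
        print(e.timestamp, e.cloud_description)
```

The edge labels alone could not answer a question about cat colors, which is why the structured memory sits on the cloud side in this design.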

Based on this architecture, the company's paid conversion rate has increased by 25%, the average customer unit price has increased by 30%, and the continuous retention rate of paid users has reached over 75%. AI capabilities have been directly transformed into commercial competitiveness.

The division-of-labor model of edge-cloud collaboration is becoming an industry consensus, and the role of cloud providers has also changed significantly.

In the past, cloud providers supplied only cloud resources such as computing power and storage. Now they provide edge-cloud collaboration and an infrastructure base built around Agents. They package visual understanding, task planning, and even front-end code generation into callable services, which lowers the development-layer barrier for hardware manufacturers that want to embed AI capabilities into existing systems.

Chen Liwei also summarized the four core challenges Alibaba Cloud currently faces: model combination, engineering complexity, continuous operation ability, and the data closed loop.

When it comes to model combination and engineering, it is worth mentioning the previously released next-generation full-modal large model Qwen3.5-Omni.

Qwen3.5-Omni has achieved SOTA results on 215 tasks spanning audio-video understanding, recognition, and interaction, greatly improving the real-time interaction experience and showing "high emotional intelligence". More surprisingly, Qwen3.5-Omni demonstrates audio-video Vibe Coding: when users describe their needs in front of the camera, the model can independently generate complex product code such as apps, web pages, and games. This real-time full-modal ability provides a key technical foundation for AI hardware to move from L1 and L2 to L3 and L4.

While full-modal models are maturing, hardware manufacturers are also exploring differentiated implementation paths.

For example, Leishen Robotics, a company focusing on consumer-facing (toC) humanoid robots, is trying an interesting take on edge-cloud collaboration. Users can completely take over the robot's AI system from their own computers or local intelligent agents over the home local area network, giving the robot customized capabilities such as smart home control, dialect conversation, and personalized topics.

Guangfan Technology, which has just launched the world's first AI headset with visual perception, has observed that the biggest change in the AI hardware industry over the past year is "speed". Software and hardware are iterating at an astonishing pace. AI has evolved from simple chat to intelligent agents with self-learning capabilities, and what it can do is expanding significantly every day. Guangfan's implementation path is to build an AI-native operating system with a wider scope than OpenClaw, covering multi-modal interaction, hardware scheduling, software scheduling, and computing power scheduling.

The exploration by these "front-line players" shows that edge-cloud collaboration is a long-term theme that is "difficult but correct". Cloud-based intelligence is evolving rapidly, while execution and hardware scheduling on the edge side remain the key variables that determine how far AI hardware can advance in intelligence.

03. Where the Collaboration Boundary Is, Where the Market Is

In addition to providing technological guidance, the significance of the grading standards also lies in releasing signals at the commercial level.

Consumers can evaluate products based on the L1 to L4 levels. Driven by this, hardware manufacturers will also have a clear upgrade roadmap.

For start-up companies in particular, developing multi-modal models and inference frameworks in-house is unrealistic. More manufacturers need a standardized AI base and a clear path to commercial returns.

The commercial potential of AI hardware services can be seen in the high user stickiness of Luka Doctor's AI learning camera. Luka Doctor's public data shows that the average daily usage time of early users was just over 30 minutes. After the product connected to Qwen3.6-Plus, average daily usage time increased by 50%, and about 50 million user photos interacted with AI every month. More accurate object recognition and OCR have led to more frequent image recognition, and stronger generalized reasoning has increased the number of question-and-answer rounds. Quantifiable progress in the AI base is reflected directly in a qualitative change in user stickiness.

Once users interact with a hardware device hundreds of times a day and accumulate a large amount of personal interest data, a natural need emerges: how can these memories and preferences be linked to other devices? One example is continuing to set learning tasks based on the data from school devices.

Once the intelligence of a single device reaches a certain height, the real market potential will lie in system-level intelligence under full-scenario symbiosis.

The core feature of the L4 collaborative level mentioned in the standards is cross-device collaboration and user preference memory. A mobile phone, a pair of glasses, a cockpit, and a speaker form an intelligent network around the user.

When you wear your glasses into the car, the cockpit automatically switches to your driving preferences. When you speak to the speaker, the robot starts to tidy up the living room. A consistent experience requires all devices to share the same cloud-based intelligent base, and it also requires cloud providers to offer a unified identity, memory, and execution scheduling system.
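The shared identity and memory layer described above can be illustrated with a small sketch. Everything here is hypothetical: `SharedMemory` stands in for a cloud-side store keyed by a unified user ID, and `Device` is any terminal that reads from it once it recognizes the user. No real product API is implied.

```python
from typing import Any, Dict


class SharedMemory:
    """Hypothetical cloud-side store of user preferences, keyed by user ID."""

    def __init__(self) -> None:
        self._prefs: Dict[str, Dict[str, Any]] = {}

    def remember(self, user_id: str, key: str, value: Any) -> None:
        self._prefs.setdefault(user_id, {})[key] = value

    def recall_all(self, user_id: str) -> Dict[str, Any]:
        # Return a copy so devices cannot mutate the store by accident.
        return dict(self._prefs.get(user_id, {}))


class Device:
    """Any terminal that shares the same memory base."""

    def __init__(self, name: str, memory: SharedMemory) -> None:
        self.name = name
        self.memory = memory

    def on_user_recognized(self, user_id: str) -> Dict[str, Any]:
        """Load the preferences this user has accumulated on other devices."""
        return self.memory.recall_all(user_id)


if __name__ == "__main__":
    memory = SharedMemory()
    phone = Device("phone", memory)
    cockpit = Device("cockpit", memory)

    # A preference learned on one device...
    phone.memory.remember("user-1", "cabin_temperature_c", 22)
    # ...is available on another as soon as the user is recognized.
    print(cockpit.on_user_recognized("user-1"))
```

The design point is that preferences live with the user's identity rather than with any single device, which is what makes the experience consistent as the user moves between terminals.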

Full-scenario symbiosis will directly change the commercial logic of AI hardware.

In the past, most hardware businesses made money from the supply chain, and each sale was a one-off transaction. The addition of AI opens up new possibilities: in the future, premium services can also be delivered continuously through subscriptions.

In collaborative scenarios, users are more willing to pay for a continuous experience across devices, such as subscribing to personal assistant services or purchasing scenario-based skill packages. As a result, the distribution of value across the entire sector will be reshuffled.

Take an existing example. After Rokid glasses connected on the edge side to JVS Claw, Alibaba's version of OpenClaw, office workers could efficiently complete operations such as creating calendar entries, replying to WeChat messages, and making payments. If these high-frequency behaviors can be further integrated and consolidated into scenarios that improve work efficiency, they could extend into subscription services for life assistants.

During the 618 promotion, Tmall also launched dozens of PC brands equipped with JVS Claw, fully integrating intelligent assistants and ushering in the Agent PC era.

Hardware has become the entrance to services, not the end.

The wave of market restructuring will favor products that can integrate into this intelligent network and will gradually leave behind L1-level devices that remain isolated islands.

The grading standards point to the industry's endgame, edge-cloud collaboration offers a definite path, and the standardized capabilities of cloud providers are making that path wider and smoother.