In the first battle of AI-native hardware, how will Alibaba respond with its smart glasses?
As large models progress from the generation phase to the completion phase, all technology companies are asking the same question: When AI truly integrates into human life, where will it start? Will it be on web pages, apps, or some new portable terminals? The answer is becoming increasingly clear: Hardware might be the most crucial starting point in the AI-native era.
Large technology companies are exploring the form of the next-generation entry point through AI hardware. These attempts point to a common understanding: Mobile phones may not be the best containers for AI, and screens struggle to support continuous conversations, perception, and proactive services.
Technology giants with sufficient ambition for AI cannot afford to abandon hardware.
Against this backdrop, Alibaba unveiled its first self-developed AI glasses, the Quark AI Glasses, at the 2025 World Artificial Intelligence Conference.
Since AI hardware became an industry hotspot last year, product launches and demos have sprung up one after another. This launch, however, is not just the debut of a consumer electronics product. It is also Alibaba's first tangible step in putting AI capabilities into a physical terminal since the consolidation of its AI to C business.
Alibaba has become the second global technology platform company, after Meta, to bring smart glasses powered by large model capabilities to market.
AI + Hardware + Ecosystem Collaboration: The Physical Embodiment of Alibaba's Full-Stack Capabilities
Alibaba has been moving intensively to reach consumer (C-end) users. At the end of 2024, the Tongyi App was folded into Alibaba's Smart Information Business Group to pursue the AI to C strategy jointly with Quark, and Quark was explicitly designated Alibaba's flagship AI application. This structural change sketched the first outline of Alibaba's strategic path in the large model era.
After entering 2025, Quark gradually became the most prominent and powerful interface in this system:
In March, Quark announced a full upgrade to the "AI Super Box" — an intelligent task center that integrates conversation, search, execution, and decision-making, no longer a traditional search box.
During the college entrance examination season in June, Quark launched features such as "In-depth College Entrance Examination Search," "Intelligent College Application Report," and "Intelligent College Selection." It generated over 12 million personalized college application reports, with more than half of the candidates from third- and fourth-tier cities.
In July, as large model companies competed for the "AI + Health" scenario, Quark's health large model passed chief-physician-level evaluations in 12 core medical disciplines in China, becoming the first "Chief Physician-level AI Doctor" within a consumer product.
Each breakthrough we see now is actually the result of long-term early investment. The launch of the Quark AI Glasses is also a natural outcome.
It is no coincidence that Quark is the one taking this path. As a rare "neutral" tool product within the Alibaba ecosystem, Quark enjoys an excellent reputation among young users. It also has long-term accumulation in underlying capabilities such as voice, semantics, and imaging. Combined with its product-algorithm coupling mechanism, proven in scenarios such as search, health, and education, it has become the most suitable carrier for the AI assistant form.
As early as January 2025, 36Kr reported that the Quark team was exploring AI glasses. This time, Alibaba chose a pair of glasses as a breakthrough point to enter the hardware terminal market in the large model era.
It's not just the debut of a product but also a concrete manifestation of Alibaba's AI to C strategy. The AI glasses are defined as the physical carrier of the super entrance. Through them, Alibaba aims to create a portable AI assistant with real perception and action capabilities.
This positioning also reflects Alibaba's judgment on the next-generation terminal form: the next entrance will not be a mere combination of software and hardware but a closed loop built from the integrated capabilities of "model + hardware + ecosystem."
Specifically, these glasses integrate Alibaba's capabilities in multiple dimensions:
In terms of hardware, the Quark AI Glasses team has rich terminal experience, and most of its core members come from the hardware industry.
In terms of software and algorithms, Quark's capabilities in voice recognition, semantic understanding, and image Q&A have been validated in its app over the past few years. On this foundation, it has built multiple vertical-scenario models on top of Tongyi Qianwen.
Finally, combined with the ecosystem capabilities of Fliggy, Alibaba Business Travel, Gaode, Alipay, Taobao, and others, these form a collaborative loop running from scenario to instruction to invocation to feedback.
These capabilities define several core features that distinguish the Quark AI Glasses from similar products. Song Gang, the head of the Quark AI Glasses, told us: first, it must be a comfortable pair of glasses to wear; second, it must be an intelligent terminal available around the clock; most importantly, it must be a portable AI super assistant. "We prioritize the super AI assistant."
The shift from device to assistant is not just a slogan; it is reflected in how functions are defined. The team did not try to differentiate on the display side but instead emphasized "high-frequency and essential" scenarios such as voice interaction, first-person shooting, and recognition Q&A, focusing in the early product stage on building sustainable basic capabilities.
In actual interaction, the Quark AI Glasses are equipped with voice and multi-modal large models and have core capabilities such as semantic understanding, multi-round conversation, and billion-level image retrieval. Even under real conditions such as complex lighting and accent differences that deviate from the test environment, its recognition and response performance remains relatively stable.
Combined with Alibaba's own businesses, more scenarios can be linked. For example, voiceprint payment can be implemented on top of bone conduction, and the glasses can take the user's health and exercise data as multi-modal input.
"We can provide a closed-loop experience," said Song Gang.
With comprehensive capabilities, the Quark AI Glasses have evolved from a single-function shooting device to a personal assistant that can accompany users in their daily lives, study, and work.
The path and logic are relatively clear, but the real challenge lies in implementation. For Alibaba, this is not only an exploration experiment of a new hardware form but also a crucial battle to verify whether its full-stack AI capabilities can form a closed loop.
Where Are the Boundaries of a Pair of Glasses?
Steve Jobs once said in an interview with a Boston public TV channel in 1990, "Users can't predict products they haven't seen. Only when the product is in front of them can they provide useful feedback."
Even today, it is still difficult for ordinary consumers to imagine the ultimate form of AI hardware. For C-end users, large models still mostly live on mobile phone screens.
However, both Ray-Ban Meta and Quark AI Glasses show that tech giants are clearly looking for the next-generation terminal — a physical form that can truly change the interaction mode and serve as both an entrance and an exit.
The Quark AI Glasses don't want to simply transplant the app into hardware, nor do they want to copy Ray-Ban Meta like other domestic AI glasses. Instead, they aim to make an attempt based on a new interaction logic.
Although Meta is also working on glasses, the underlying logic differs. Ray-Ban Meta focuses on light social interaction and photo sharing, emphasizing "recording." The Quark AI Glasses, by contrast, emphasize understanding and execution. Meta's glasses "capture the world you see," while Quark's "understand the world you face." The former is closer to a hardware version of Instagram; the latter is a new kind of AI assistant.
That is also why not every company is suited to build such a product.
The product definition therefore starts from "high-frequency life scenarios." The glasses do not chase flashy single-point AI features but respond to users' general needs: things they cannot see clearly, cannot remember, and cannot solve. In the Fliggy travel-itinerary demo shown at the press conference, a user only needs to ask, "What time is my flight?" The glasses automatically retrieve the itinerary and display the boarding gate on the lens, with no need to take out a phone and hunt through an app. Combined with Gaode's indoor navigation capabilities, the entire route can be guided by voice.
Behind these scenarios is a dedicated development team formed jointly by Quark and multiple business units within Alibaba. Rather than simply calling each other's APIs, they carry out deep, agent-oriented customization to move these businesses from passive response to proactive service. Beyond Fliggy and Alibaba Business Travel, Gaode, Alipay, Taobao, and others have launched similar collaborations: QR code payment, finding the same product and comparing prices, cycling navigation, and express delivery reminders are each building micro AI links in different life scenarios.
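The article does not detail how these agent-oriented integrations work internally. As a rough illustration only, not Alibaba's actual implementation, the pattern can be pictured as parsing a voice query into an intent, routing it to an ecosystem service, and formatting the result for the lens; every name below (`MockFlightService`, `handle_query`, etc.) is hypothetical:

```python
# Hypothetical sketch of an agent-style ecosystem call: a voice query is
# parsed into an intent, routed to a (mock) ecosystem service, and the
# result is formatted for display. All names here are illustrative.

from dataclasses import dataclass


@dataclass
class Itinerary:
    flight_no: str
    departure: str  # e.g. "18:40"
    gate: str


class MockFlightService:
    """Stand-in for an ecosystem service such as a travel-itinerary API."""

    def next_itinerary(self, user_id: str) -> Itinerary:
        return Itinerary(flight_no="MU5101", departure="18:40", gate="C12")


def parse_intent(utterance: str) -> str:
    # A real system would use a large model for intent understanding;
    # keyword matching keeps this sketch self-contained and runnable.
    if "flight" in utterance.lower():
        return "query_flight"
    return "unknown"


def handle_query(utterance: str, user_id: str, flights: MockFlightService) -> str:
    intent = parse_intent(utterance)
    if intent == "query_flight":
        it = flights.next_itinerary(user_id)
        return f"Flight {it.flight_no} departs at {it.departure}, gate {it.gate}."
    return "Sorry, I didn't understand that."


print(handle_query("What time is my flight?", "user-1", MockFlightService()))
```

The point of the sketch is the shape of the loop, scenario to instruction to invocation to feedback, rather than any specific API.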
This is something only Alibaba can do, because Alibaba has spent years handling and integrating so many aspects of daily life. The ecosystem already exists and only needs to be activated. Other companies, however far-reaching their ideas, are often held back not by the idea but by the path: they must start from scratch and stitch each piece together one by one, and every break in the chain delays the final AI product.
In terms of function implementation, several current ability boundaries commonly faced by AI glasses are being gradually broken through.
In voice interaction, for example, the Quark AI Glasses carry five microphones plus a bone conduction system. They can be woken accurately even in noisy environments, understand the intent behind multi-round instructions via the Tongyi Qianwen model, and then have tasks routed by the self-developed Master Agent central control system, which significantly reduces response latency. In the image Q&A (VQA) scenario, Quark combines its self-developed image blur detection algorithm, SuperRAW technology, and billion-scale image retrieval with inference from the Tongyi Qianwen large model to achieve faster recognition and higher answer quality.
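The article names the Master Agent only as a "central control system" that distributes understood intents. One minimal, entirely hypothetical way to picture that dispatch pattern (not Quark's actual design) is a registry that maps intents to specialized sub-agents:

```python
# Hypothetical sketch of a central-dispatch ("Master Agent") pattern:
# an understood intent is matched to a registered sub-agent handler.
# This illustrates the pattern only, not Quark's internal architecture.

from typing import Callable, Dict


class MasterAgent:
    def __init__(self) -> None:
        self._agents: Dict[str, Callable[[str], str]] = {}

    def register(self, intent: str, handler: Callable[[str], str]) -> None:
        """Attach a sub-agent that handles one class of intent."""
        self._agents[intent] = handler

    def dispatch(self, intent: str, payload: str) -> str:
        """Route a parsed intent to its sub-agent, or fail gracefully."""
        handler = self._agents.get(intent)
        if handler is None:
            return "No agent available for this request."
        return handler(payload)


master = MasterAgent()
master.register("navigation", lambda q: f"Starting navigation to {q}.")
master.register("vqa", lambda q: f"Identifying object in image: {q}.")

print(master.dispatch("navigation", "Gate C12"))
```

Keeping dispatch in one place is what lets new capabilities (payment, navigation, VQA) be added as sub-agents without changing the interaction front end, which is plausibly why a central controller also helps keep response latency predictable.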
The integration of multi-modal capabilities gives the Quark AI Glasses genuine understanding ability, where many competing AI glasses stop at taking photos. Visiting a museum, encountering an unfamiliar plant on the road, or shooting street photos, users can ask about whatever is in front of them and get an accurate, immediate AI explanation.
But the boundaries don't stop there.
A common failure mode when Internet companies build hardware is focusing on software and neglecting the hardware itself. Success in the Web and App eras accustomed these companies to rapid, incremental iteration, which fits poorly with consumer electronics, where development cycles are far longer. Apple, for example, spent nearly two years developing the epoch-making iPhone 4.
Fortunately, the Quark team has enough hardware talent to understand the requirements and bottlenecks. In the design, the team tackled the industry-wide trade-off between battery life and wearing comfort. A detachable temple design plus a portable battery-swap case the size of an earphone case lets users replace the battery at any time for all-day use, and a dual-chip system switches the main control chip according to load to manage power consumption. Lightweight materials such as titanium alloy are used, the temples adjust elastically to fit different head sizes, and the ergonomic structure of the nose pads and ear bends minimizes pressure during long wear.
The team doesn't shy away from the complexity of this challenge. As Song Gang said, "The complete experience link of AI glasses is relatively long, so you can't have obvious ability weaknesses. Once you have obvious weaknesses, it's easy to lead to an incomplete experience, and users may find it hard to believe that you can create a product with a good experience."
The mission of AI glasses as a new terminal is not a one-time function stacking but to truly find a lightweight, efficient, and trustworthy interaction method between humans and the world.
The Battle for the Entrance Is Also a Battle for the Paradigm
Actually, the attempt to use hardware as an AI entrance is not limited to glasses. As early as 2017, Chinese Internet giants started the battle for smart speakers.
However, the AI capabilities at that time were insufficient to support complex interactions. Speakers could receive instructions and be awakened, but when users really wanted the AI to perform a task, they would encounter obstacles.
All this changed fundamentally with the arrival of large models: understanding context, continuous conversation, proactive service. A new form of human-computer interaction may be emerging, and it calls for a new kind of terminal: not a touch screen, not an icon, not a web page, but a carrier that continuously perceives and understands ahead of the user.
Mobile phones are powerful, but they are devices that must be actively used: the user has to take the phone out, open an app, and issue explicit instructions. The truly ideal terminal of the AI era is online by default; it understands first and then acts, without waiting for the user to take the initiative.
Glasses are naturally within people's line of sight and hearing range and have a better first-person perception ability than mobile phones. Combined with the large model's unified understanding of vision, language, and knowledge, they can recognize the world in front of people in real-time, proactively push information, and implement multi-modal mixed instructions, replacing various complex operation steps.
The significance of AI glasses may not be the next popular hardware but the physical interface of the next-generation operating system.
That's why global technology platform companies, including Meta, Alibaba, ByteDance, and Google, have all turned their attention to AI hardware. Whether it's headphones, glasses, projectors, or rings, they are essentially answering the same question: How can we make AI available at any time when it truly becomes powerful? Where is the entrance? Who will own it?
Today, the step Alibaba takes with the Quark AI Glasses is a systematic response to this question.
This article is from the WeChat official account "Intelligent Emergence." Author: Xiaoxi. Republished by 36Kr with permission.