The battle in the AI hardware market is fierce. DingTalk achieved a turnaround in just three months.
The AI hardware track has never been so crowded.
From Stream, a conversational smart ring developed by two former Meta employees, to glasses with built-in AI assistants, portable recording cards, and even pendants and wristbands, technology companies are trying to embed artificial intelligence into every kind of wearable device. Even without Elon Musk's assertion that "edge intelligence will rise," a hardware battle for the interaction entry point has already fully erupted in Silicon Valley.
Domestic tech giants are no less aggressive. At the end of August, at the release of a major version codenamed "Fern," DingTalk unveiled its first AI hardware product, DingTalk A1, which carries DingTalk's hardware DNA. Within Alibaba, this is a team that has won hardware battles before. To many outside observers, it looked like just another push into smart hardware, with "a tech giant entering the arena" simply drawing more attention to an already crowded track.
But many people may have misunderstood why DingTalk decided to enter the AI hardware field at this time.
Many are making grand claims and presenting various magnificent narratives about how AI will change the world, which attracts attention and capital. DingTalk is different.
If AI can change the world, it must involve the majority of people. First, people need to help AI understand the physical world. To enable everyone to actively use tools to help AI improve its understanding of the physical world, the popularization of tools is an inevitable path.
In DingTalk's view, so-called AI popularization cannot be imposed from the top down by a few companies. It is shaped by the daily experiences of ordinary employees: the creation of a form, a one-on-one response from a DingTalk customer service representative, or a DingTalk programmer visiting customers across the country. All of this happens in a trivial and quiet way, like a gentle stream.
But those who understand DingTalk's long-standing persistence may agree that DingTalk A1 was not built to defeat a particular company or to be a blockbuster product. It is the first move in DingTalk's AI popularization concept and the starting point of a long journey.
"The Silent Majority" and the Ever-Present AI
By the end of 2024, there were more than 60 million enterprises in China. Small and medium-sized enterprises accounted for the vast majority of them, sustaining the vitality of the Chinese economy.
However, when the AI wave roared in, some small and micro-enterprises were far from the action. They were afraid of being left behind by the times, and "we must do AI" had become a consensus. But they were at a loss as to how to do it and how to use it. They became the "silent majority" in the current AI boom.
First of all, there is a technological threshold.
Currently, large AI models are still productivity tools for a minority. They require continuous adjustment and optimization. For a small company short of people, money, and time, the barrier to adoption is obviously very high. Take Yiwu as an example: the vast majority of enterprises there have fewer than ten employees. They don't have a clear "management department," rarely hold "meetings," and have hardly any formal processes.
"We see AI everywhere on short-video apps every day, and it seems that AI has taken over the whole world. But when you really go into enterprises, you'll find that many of them can't even use basic office software well," observed a DingTalk service provider.
Many enterprises don't know how to implement large AI models and don't have the extra energy to adapt to various AI tools. These users expect a product that can solve daily problems in the simplest and most efficient way and visibly improve productivity.
This is the real enterprise-level market in China: multi-layered and complex.
Therefore, truly popularizing AI is not about making the models more powerful but about eliminating the barrier to entry.
As a product that has been in this market for many years, DingTalk clearly realizes at this moment that software as an entrance has reached its limit: software has a high startup cost. As a customer said in DingTalk's research, "When I really want to record something, I don't have time to open the software."
This is the origin of DingTalk A1, and its mission is clear: to make "AI always present."
Thomas Edison lit the first incandescent lamp of the electrical era, and Apple's iPhone, which replaced the keyboard with touch, opened the mobile Internet era. Every leap in a technological revolution ultimately needs a piece of physical hardware as its carrier. After all, humans interact and connect more intensively with tangible objects.
As DingTalk's first AI hardware, DingTalk A1 takes the form of a card-type voice recorder, chosen mainly so that it can be used unobtrusively. After all, AI hardware is a new thing, and the interaction interface must be simple and clear enough for users to get started quickly. DingTalk A1 can be attached to a mobile phone, so users can carry it anywhere, at any time, in any scenario. It is operated with a single button. Users only need to do two things: press the button and start.
For example, the manager of a sales company may suddenly think of a to-do item or a solution and casually turn on A1 to record it. "It's much more convenient than opening the app on the phone. People won't find it strange, and it's also convenient for me."
Looking back a few years from now, the significance of DingTalk A1 will go far beyond that of a card-type voice recorder. It is more like a bridge connecting the digital and physical worlds, and a key foothold in the era of "spatial intelligence."
The First Key to Unlock Spatial Intelligence
On November 10th, Fei-Fei Li, a professor at Stanford University, published an article proposing the concept of "spatial intelligence." She believes that this will become the next peak of AI technology. Current AI systems, represented by large language models, are proficient in generating text and images but remain confined to the "world of language" and lack a real understanding of real-world space, physical laws, and causal relationships.
The interactions between people, and between people and objects, make the world far more complex than language alone can capture. Undoubtedly, new-generation large AI models need genuine agency: perceiving, reasoning, and acting in the real world as humans do.
Currently, we are enriching input methods through cameras, microphones, sensors, etc. Through voice intelligence, visual intelligence, and tactile intelligence, we are enabling AI to gradually open its "five senses" and understand the meaning of human behavior in the physical space step by step, ultimately "reconstructing" the world at the geometric and physical levels.
If we agree with this path, DingTalk A1 is undoubtedly the beginning of establishing voice intelligence.
Through this zero-threshold interface, AI can obtain continuous, unstructured, and multi-dimensional spatial information.
Compared with ordinary smartphones, which can only use a microphone, DingTalk A1 is equipped with 5 omnidirectional microphones and 1 bone-conduction microphone and can recognize sounds within 8 meters. As a result, the breadth and depth of the information it captures far exceed those of a mobile phone.
If the data previously fed into enterprise-level large models was a trickle, then with DingTalk A1 the incoming data will be a mighty river.
For example, the AI product "AI Listening and Recording" that cooperates with DingTalk A1 has added a "visualized recording" function. Using the 5 microphones of DingTalk A1, it can identify different speakers and their positions in space through voiceprints. When reviewing the recording, the interface will visually show who spoke when and where, restoring the meeting scene.
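The article does not say how A1 locates speakers. As a rough illustration only, one textbook way a microphone array can estimate where a sound comes from is the time difference of arrival between two microphones. The sketch below assumes a distant (far-field) source; the function name and the 5 cm microphone spacing are hypothetical and are not taken from DingTalk's design.

```python
import math

# Approximate speed of sound in air at room temperature (assumed value).
SPEED_OF_SOUND_M_S = 343.0


def arrival_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the direction of a far-field sound source from one mic pair.

    delay_s is how much later the sound reaches the second microphone than
    the first. The result is the angle in degrees measured from the line
    perpendicular to the pair (0 means the speaker is straight ahead).
    This is a hypothetical helper, not DingTalk's algorithm.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.05  # hypothetical 5 cm between two microphones
    delay = 0.5 * spacing / SPEED_OF_SOUND_M_S  # a delay of roughly 73 microseconds
    print(f"estimated angle: {arrival_angle_deg(delay, spacing):.1f} degrees")
```

With several microphones, estimates from multiple pairs can be combined to narrow down a position, which is why an array can capture spatial information that a single microphone cannot.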
Ultimately, DingTalk hopes that this information can be distilled into "knowledge" and "wisdom," improving the productivity of ordinary employees and business owners at work and helping enterprises build their own AI capabilities. In other words, it forms a complete closed loop from data collection to model construction, decision-making assistance, and feedback learning.
In the process of forming this closed loop, the relationship between humans and AI evolves from one-way input and output to two-way human-machine collaboration. With software and hardware integrated, DingTalk AI can process the information collected by DingTalk and feed it into the entire work process.
For DingTalk users, the most obvious change is the assistance in "decision-making and action," which turns DingTalk from passive work software into a self-driven advisor.
For example, an entrepreneur in the RV import and export business used to call an overseas customer, take notes, translate, and then write an email himself. Now, with DingTalk A1 in use during meetings, the call is transcribed directly into Chinese in real time, and an email is generated according to the meeting requirements. All he needs to do is make the final revisions and review.
For employees, this efficiency improvement adds no extra time cost and can help them keep improving at their work. For example, a social worker records meetings with DingTalk A1. DingTalk can organize the text through AI and provide analysis and summaries based on the meeting minutes. The AI assistant can tell the social worker what methods were used in completing the task, which aspects were done well, and which were lacking.
This is a silent revolution led by DingTalk. In the offices and warehouses of countless enterprises, DingTalk uses AI to liberate the productivity of small and medium-sized enterprise managers and front-line employees. As Mo Shang, a staff member in DingTalk's service center, said, "What DingTalk is doing is actually laying pipelines. Only by building these pipelines and providing the best service can AI capabilities, computing power, and data, like water, electricity, and gas, really flow into every small enterprise."
Alibaba's First Truly Successful AI Hardware
In the enterprise-level AI market, data comes from tens of millions of enterprises across the country. Data is undoubtedly the raw material for AI applications, and computing power is the foundation of the AI world.
Globally, the arms race in AI computing power and talent has reached the middle stage. After entering the era of spatial intelligence AI, it is foreseeable that the territories of major forces will continue to expand.
Take enterprise-level AI hardware as an example. The strong industrial chain in China's Pearl River Delta region has completely eliminated the barrier to hardware manufacturing. For just a few dozen yuan, you can manufacture a card-type voice recorder in Shenzhen. Clearly, what AI hardware ultimately competes on is still AI software capabilities, and even the development capabilities of the entire system and ecosystem.
Only the strong players that keep advancing will remain in the end. Therefore, the responsibility of AI popularization must be borne by a national-level application like DingTalk. After all, DingTalk's AI relies on the computing power, technology, and talent of the entire Alibaba Group.
As an important part of Alibaba's strategy, DingTalk demonstrates this strength most directly. At this year's press conference, DingTalk released more than 10 AI products at once, including DingTalk One, AI Search and Inquiry, AI Spreadsheet, AI Listening and Recording, and the smart hardware DingTalk A1. Launching a product matrix of this quantity, quality, and speed is beyond the reach of startups.
Relying on Alibaba's technology also makes DingTalk's AI products remarkably efficient. DingTalk's AI Spreadsheet and Alibaba Cloud's ADB-PG database team jointly launched O-Table, a storage-computing integrated architecture that supports real-time updates of tens of millions of rows in a single table within seconds.
DingTalk A1 is, strictly speaking, Alibaba's first truly successful AI hardware. It is also the first card-type voice recorder in the industry to add a real-time transcription function. Real-time means the AI model has no time to make corrections, so the tolerance for error is extremely low. On the computing side, switching from post-recording transcription to synchronous transcription requires enough computing power to handle high-concurrency scenarios.
Currently, card-type voice recorders on the market can only upload recordings to the cloud after the recording is finished, where they are then processed by an external large model. They do not offer real-time transcription because it would exponentially increase the difficulty and cost of transcription.
More importantly, card recorders generally charge by recording duration, counting only the final duration of the recordings uploaded to the cloud. Once real-time transcription is added, the entire charging model needs to be rebuilt. So the absence of this seemingly simple function in current products is actually an inevitable result of the computing power gap.
The burden of pioneering falls on DingTalk precisely because it has the resources to clear these barriers, which is why DingTalk A1 dares to attempt real-time transcription. Moreover, DingTalk A1 provides 1,000 minutes of free recording time, the highest in the industry, and its real-time transcription does not count toward that duration.
Undoubtedly, as a hardware product, DingTalk A1 is not a high-ROI product. The computing power cost behind it is difficult to amortize in the short term.
In fact, DingTalk is also the first in the industry to adopt a pay-by-effect model on the AI software side. While enterprises are still hesitant about paid AI products, DingTalk, as a leading enterprise, is willing to pay the "entrance ticket" for small and medium-sized companies. The ultimate goal is to enable enterprises to "dare to use, be able to use, and afford AI."
This can be regarded as DingTalk's way of taking responsibility for AI popularization.
"They Really Want to Do This Well"
"Many people think that elites can change the world. No, it's the down-to-earth elites who change the world," said Wuzhao, the founder of DingTalk, in an interview with 36Kr after the release of DingTalk A1 in August.
Michael Sandel, a political philosophy scholar, once proposed that the success of elites not only depends on their own efforts but also on factors such as birth, luck, and timing. Once elites think this luck is taken for granted, they will fall into the "arrogance of elites."
As one of Alibaba's core products, DingTalk can naturally mobilize a large amount of resources and funds, and its starting line is far ahead of its peers.
For example, behind the more than a dozen products DingTalk released this time, there is extensive support from Alibaba. Taking the product "AI Listening and Recording" as an example, DingTalk completed more than 100 million hours of audio-video data training with Tongyi Laboratory. As a result, the recognition accuracy for more than 30 Chinese dialects and 140 global languages reached 90%. They completed in one month what a startup would take years to achieve. Applied to the DingTalk A1 hardware, this capability quickly sets it apart from others in voice transcription.
This is a reality that could easily breed pride: although the product's success is due to the team's efforts, DingTalk's inherited resource advantages cannot be denied.
However, the DingTalk team always keeps its distance from this "sense of superiority" and strives to maintain a humble and down-to-earth style.
If you look deep into DingTalk, you'll find that this is a very "practical" team, from top-level management to front-line employees. Its managers and R&D staff have to visit customers in person and listen to many seemingly trivial questions, such as why they cannot find an entry point or why there is a charge. The product team of DingTalk A1 is the same. Some users on Xiaohongshu praised this team: in the community, official staff reply to users' questions one by one every day. It can be seen that they really want to do this well.
During the just-ended Double 11, DingTalk A1 became a dark horse in the voice recorder category, leading in sales volume and turnover on platforms such as Douyin and Tmall. Its sales volume exceeded 10,000 units in just one week, leaving a group of new and old players behind. Only 3 months after its release, DingTalk has begun to reshape this track as a come-from-behind challenger.
For DingTalk, DingTalk A1 is just the beginning of truly defining a new work style for the AI era, and one piece of the larger puzzle of how AI will change human life. DingTalk's hundreds of millions of users, and even its competitors, are all helping to build that puzzle. They will compete and cooperate, but all will move towards a common future.