AI Agents Invade the Office
Not long ago, at the World Artificial Intelligence Conference (WAIC) in Shanghai, the "AI Productivity Wonder House" booth was crowded with people.
A corporate manager simply described the requirement in one sentence: "You act as an intelligent customer service assistant, responsible for answering customers' questions based on the document", and then uploaded a product document.
In less than a minute, an AI customer service agent capable of handling professional inquiries was built, and it could answer customers' questions instantly.
This scene shows how deeply AI Agents have penetrated the office: no longer just a concept demonstration, they have become "digital employees" that can shoulder KPIs and integrate into core processes.
01 From "Showcasing Skills" to "Pragmatism"
In fact, AI's adoption in the office did not happen overnight; it developed step by step, from shallow to deep.
Two years ago, just as ChatGPT emerged, the industry was pushing deeper into large-model technology on one hand and exploring applications of large models in different fields on the other. With Microsoft Office Copilot and WPS AI 1.0 as representatives, AI officially entered the office.
At this point, AI-enabled office work was just getting started. As a functional plug-in, AI assisted with text generation, format optimization, and basic data analysis. Its defining trait was "passive response": users issued an instruction and AI performed a single action, without forming a complete task loop. We can call this the Copilot Assistance Stage.
As large-model capabilities were applied in depth through products such as DingTalk AI and the BetterYeah platform, AI Agents in the office gradually showed task automation and early autonomy. By mid-2024, AI-enabled office work had entered the Agent Task Stage.
At this stage, AI can understand context from instructions and complete tasks in multiple steps. For example, the more than 3300 AI assistants created by Cainiao employees can automatically handle 80% of HR consultations; the "Business Travel Data Inquiry AI" can generate personalized reports in 3-5 seconds, saving thousands of person-days annually. Although AI at this stage begins to take over standardized processes, it still depends heavily on manually defined rules.
The recently concluded WAIC shows that AI Agents in the office have evolved again. AI has become a "digital employee", deeply embedded in business processes and taking on responsibilities.
LuminaSphere, launched by EHGO, uses an "Assistant/Bag" architecture that deploys dedicated AI assistants by department (finance, HR, legal) with role permissions and connects directly to DingTalk/WeChat to push results. The Shizai Agent handles operations in more than 20 financial scenarios at Hebei Telecom, cutting the single-scenario processing time from 2 hours to 10 minutes. Yongsheng Property uses DingTalk AI to analyze the morning meeting content of more than a thousand projects across the country, reducing management headcount from 15 to 3.
From the above cases, we can see that AI Agents already have the capabilities of domain knowledge, permission awareness, and execution feedback.
It is worth noting that, as Agent capabilities are fully integrated into office platforms such as DingTalk and Enterprise WeChat, a platform-based ecosystem of AI Agents has taken shape.
Take the DingTalk ecosystem as an example. Employees of Cainiao Group have created more than 3300 AI assistants on DingTalk. "Cainiao AI" resolves 80% of HR consultations with an accuracy rate of nearly 90%, reducing the number of knowledge-base administrators by 30%. Belle Fashion's "Bailian AI", built on DingTalk, simulates scenarios to help shopping guides practice, increasing the sales of the pilot brand in Tianjin. Its "Ten-Thousand-Group Linkage" model improves replenishment efficiency, and more than 8000 stores collaborate efficiently through DingTalk.
02 Three Driving Forces and the Key to Breaking the Deadlock
AI-enabled office work has been able to keep evolving and to take off this year because of three driving forces.
Firstly, on the demand side, rising labor costs, together with the need to solve the "three highs" pain points of high-frequency operations, high error rates, and high repetition rates in specific work, have pushed AI Agents from the laboratory into the office.
Secondly, on the technical side, the integration of LLM + RPA + low-code has broken through the bottleneck of task loops. For example, the ISSUT screen semantic analysis technology of the Shizai Agent has increased the understanding ability by 10 times.
Thirdly, on the ecosystem side, platforms such as DingTalk/Enterprise WeChat have become natural test beds, and low-threshold development tools allow business personnel to build Agents on their own.
In practice, how do AI Agents solve office workers' actual problems? The cases above show that AI-enabled office work has shifted from local efficiency gains to reshaping core business. The key to breaking the deadlock lies in "precise targeting of pain points + in-depth integration of technologies".
When the Shizai Agent digital employee was deployed in more than 20 financial scenarios at Hebei Telecom, it directly addressed the "three highs" pain points of financial work (high-frequency operations, high error rates, high labor costs). Its core technology combines generative AI with traditional RPA: the self-developed vertical process large model TARS provides intelligent understanding and, combined with the screen semantic analysis technology (ISSUT), has increased the efficiency of the covered automated scenarios by 10 times. Scenarios such as procurement data retrieval achieve "instant response", and the labor release rate reaches 90%.
Belle Group's "Ten-Thousand-Group Linkage" model, which deploys more than 800 business AI nodes on the BetterYeah platform, breaks down the data silos between intelligent systems and reshapes the business process.
The core idea of "Ten-Thousand-Group Linkage" is low-code + seamless system integration. Business personnel can quickly create AI assistants and connect systems such as ERP and CRM through MCP (a tool protocol layer), closing the loop from automatic inventory monitoring and early warning to automatic replenishment.
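The closed loop from inventory monitoring to replenishment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Belle's or BetterYeah's actual implementation: the `InMemoryERP` class, the `check_and_replenish` function, the SKU names, and the threshold values are all hypothetical stand-ins for real ERP tools that would be reached through a tool protocol layer.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryERP:
    """Hypothetical stand-in for an ERP system exposed as tools."""
    stock: dict[str, int]
    orders: list[tuple[str, int]] = field(default_factory=list)

    def get_stock(self, sku: str) -> int:
        return self.stock[sku]

    def create_replenishment_order(self, sku: str, quantity: int) -> None:
        self.orders.append((sku, quantity))


def check_and_replenish(erp: InMemoryERP, reorder_point: int, target_level: int) -> list[str]:
    """Monitor stock, raise a warning, and place an order for each low SKU."""
    warnings = []
    for sku in sorted(erp.stock):
        level = erp.get_stock(sku)
        if level < reorder_point:
            # Early warning and replenishment happen in the same pass,
            # so no human has to relay the alert to a second system.
            warnings.append(f"{sku}: stock {level} below reorder point {reorder_point}")
            erp.create_replenishment_order(sku, target_level - level)
    return warnings


# Illustrative data only; the SKUs and numbers are invented for this sketch.
erp = InMemoryERP(stock={"boot-38": 2, "sneaker-41": 40})
alerts = check_and_replenish(erp, reorder_point=10, target_level=30)
print(alerts)
print(erp.orders)
```

In a real deployment the two ERP methods would presumably be remote tool calls rather than in-memory operations, but the monitor, warn, and order sequence would keep the same shape.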
Like Belle Group's "Ten-Thousand-Group Linkage" model, China Merchants Securities' private AI assistant "Zhaoxiaoju" also handles multi-scenario office work in one place through system integration.
The most advanced implementation paradigm for AI Agents pushes further toward complex decision-making and human-machine co-creation.
For example, SenseTime's office assistant Xiaohuaxiong, based on the SenseNova 6.5 large model, has achieved a breakthrough in "text-image interleaved thinking chain" technology: it can handle complex multi-modal inputs, conduct in-depth fusion analysis, and output results in multi-modal form. In actual office use, Xiaohuaxiong can analyze complex Excel tables, conduct a global analysis by constructing a multi-modal thinking chain, and finally generate a structured report.
Its technical foundation is the early alignment of visual and language representations, which improves perception efficiency and the depth of modal fusion and lets AI evolve from an "executor" into an "analytical partner". Belle's "Bailian AI" trains shopping guides by simulating scenarios and significantly increased the sales of the pilot brand in Tianjin, demonstrating how AI can support decision-making in unstructured scenarios.
03 Defect Repair and Ecosystem Reconstruction
It is not difficult to see from the above analysis that the implementation of AI Agents has been quite successful. However, from the actual experience and feedback of users, there are still unsolved defects in the implementation of AI Agents in the office.
Firstly, there is the contradiction between development efficiency and implementation depth. Many enterprises face the dilemma of "getting a demo in a week but not being able to use it well in half a year" when implementing AI Agents.
In the early stage of development, it takes a long time to sort out work processes and requirements. In the later stage, due to AI's lack of business understanding, it requires manual data feeding and training like "guiding interns", which to some extent adds a new work burden. Although platforms such as BetterYeah reduce the threshold through "generating an Agent with one sentence", the customization of complex business flows still depends on professional development.
Secondly, there is the contradiction between data fusion and system isolation. Enterprise data is usually scattered across silos such as ERP, CRM, and IoT. LLMs cannot call key information in real time, and traditional interfaces are relatively expensive to develop, so AI decision-making lacks context. Some vendors' AI Agent products solve this with private deployment, but workflows designed this way lose the potential of cloud-based collaboration.
Finally, there is the contradiction between task closure and execution breakpoints. Currently, most LLMs can only generate suggestions and cannot perform final operations such as approval and order dispatching. In one case, an automobile enterprise had to rework a batch of products because AI missed compliance reviews; a closed loop was achieved only after an "ISO standard verification" node was made a fixed step.
Problems such as unclear task decomposition and lagging execution feedback keep AI from moving from "suggestor" to a real "responsible entity".
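The idea of a fixed verification node can be illustrated with a small Python sketch in which the final operation refuses to run unless the compliance step has already passed. Everything here is an assumption made for illustration: the `Task` fields, the function names, and the placeholder check are hypothetical and do not describe the automobile enterprise's actual system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    compliance_checked: bool = False
    status: str = "proposed"


def verify_compliance(task: Task) -> Task:
    # Placeholder for a real standards check (for example, an ISO verification step).
    task.compliance_checked = True
    return task


def dispatch(task: Task) -> Task:
    # Final operation: refuse to run unless the mandatory node has passed.
    if not task.compliance_checked:
        raise RuntimeError(f"{task.name}: compliance verification missing")
    task.status = "dispatched"
    return task


def run_pipeline(task: Task, steps: list[Callable[[Task], Task]]) -> Task:
    """Apply each step in order and return the resulting task."""
    for step in steps:
        task = step(task)
    return task


# With the verification node in place, the task reaches its final state.
done = run_pipeline(Task("batch-001"), [verify_compliance, dispatch])
print(done.status)

# Skipping the node is blocked at dispatch time instead of surfacing later as rework.
try:
    run_pipeline(Task("batch-002"), [dispatch])
except RuntimeError as err:
    print("blocked:", err)
```

Putting the guard inside the final operation, rather than trusting the pipeline definition, means a misconfigured workflow fails loudly before anything is dispatched.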
Through the window of WAIC, we can also glimpse where AI-enabled office work is heading.
In terms of technical architecture, the "golden triangle" of MCP + LLM + Agent is becoming the new standard.
As a "universal plug", MCP standardizes the connection between tools and data; the LLM is responsible for task planning; and the Agent schedules execution and feeds back status. The data flow module of Volcengine HiAgent 2.0 is designed this way, supporting full-process automation from cleaning to optimization.
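The division of labor among the three parts can be sketched as follows. This is a toy illustration under stated assumptions, not the HiAgent 2.0 design and not the real MCP SDK: the `TOOLS` registry plays the tool-layer role, the keyword-based `plan` function is a hypothetical stand-in for an LLM planner, and `run_agent` is an invented name for the scheduling loop.

```python
from typing import Callable

# Tool layer: a registry mapping tool names to callables. It plays the role
# the article assigns to MCP, but it is NOT the real MCP SDK.
TOOLS: dict[str, Callable[[str], str]] = {
    "clean": lambda text: " ".join(text.split()),
    "uppercase": lambda text: text.upper(),
}


def plan(goal: str) -> list[str]:
    """Hypothetical planner standing in for an LLM: returns tool names in order."""
    steps = []
    if "clean" in goal:
        steps.append("clean")
    if "uppercase" in goal:
        steps.append("uppercase")
    return steps


def run_agent(goal: str, data: str) -> tuple[str, list[str]]:
    """Agent: executes the plan step by step and records a status log."""
    log = []
    for tool_name in plan(goal):
        tool = TOOLS.get(tool_name)
        if tool is None:
            # Defensive branch for a plan that names an unregistered tool.
            log.append(f"{tool_name}: failed (unknown tool)")
            break
        data = tool(data)
        log.append(f"{tool_name}: ok")
    return data, log


result, status = run_agent("clean then uppercase", "  quarterly   report ")
print(result)
print(status)
```

Because the planner only emits tool names and the agent only looks them up in the registry, either side can be swapped out (a real LLM for `plan`, real connectors for `TOOLS`) without touching the other.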
Interaction should be multi-modal rather than single-mode. Text-image, voice, and video interactions should not only become mainstream but also connect seamlessly.
Through "visual encoder optimization + deep and narrow backbone model", SenseTime SenseNova 6.5 achieves text-image interleaved reasoning, and its humanoid robot can smoothly explain PPTs and interact in real time. The spread of DingTalk Flash Notes across meeting scenarios in different industries shows that office interaction is breaking free of its dependence on text.
In terms of implementation and application, AI should be upgraded from a "tool" to an "organizational member". BetterYeah's Nova Agent lets agents negotiate and cooperate like human teams; HiAgent 2.0's "Digital Employee Dispatch Station" can customize, manage, and assess AI performance.
It is conceivable that in the future, enterprises may operate in the mode of "human director + AI execution team", and "one-person companies" may even emerge, where entrepreneurs rely on AI teams to support core operations.
04 Conclusion
In the finance center of Hebei Telecom, the digital employee on the screen is automatically entering invoice information into the system. A task that used to take human employees several hours now requires only a single click. This seemingly insignificant scene is a microcosm of the drastic change in office logic. As AI Agents move from processing tables to shouldering KPIs, and from executing commands to active collaboration, the "human-machine relationship" in the office has been permanently rewritten.
Just as DingTalk AI takes root in the property, retail, and education industries, and the Shizai Agent frees up labor in the telecommunications field, the essence of the AI-enabled office revolution is not the stacking of tools but the reconstruction of production relations. The future competitiveness of enterprises will depend on whether they can integrate the "brain" of the LLM, the "hands and feet" of the Agent, and the "nerves" of MCP into one organism, where there is no boundary between humans and machines, only the symbiotic evolution of a society of intelligent agents.
This article is from the WeChat official account "Insight New Research Institute" (ID: DJXYS - 0309). Author: Someone concerned about artificial intelligence. Republished by 36Kr with authorization.