
Cloud-Edge Collaboration: How Does AI Define the Next Generation of Intelligent Terminals?

Xiaoxi · 2025-09-05 11:49
Forming a replicable paradigm of cloud-edge collaboration to accelerate the large-scale implementation of AI capabilities across all industries.

A new wave of artificial intelligence is surging from the cloud to every screen and device around us. From AI PCs and AI phones to a variety of intelligent robots, an industrial transformation is accelerating. This is not simply a functional transplant but an evolution jointly driven by AI and terminal experience, which will reshape the way humans interact with machines.

However, when AI moves towards the terminal, the challenges it faces are far more complex than expected. How to balance cost and experience? How to ensure data security and privacy? How to make general large models adapt to the unique needs of various industries?

There is a clear industry consensus: the rise of edge-side intelligence is not about separating from the cloud but about building an intelligent model of "cloud-edge collaboration". This model is no longer simple task allocation but precise placement based on scenario requirements, aiming to achieve the best balance among experience, cost, and security. In actual implementation, this collaborative strategy is mainly determined by four core driving forces:

● Real-time performance and safety. In fields such as robotics and autonomous driving, even a millisecond of delay can lead to safety issues. The "neural-level control" of robots must close its loop on the edge side; relying solely on round-trip communication with the cloud is an unacceptable risk. This is the core value of edge-side computing power: it ensures the reliability of intelligent devices at critical moments.

● Data privacy and compliance. Protecting the privacy of user data is of utmost importance. Some data is not suitable for direct upload to the cloud, so local processing becomes an inevitable choice. This means that a sufficiently powerful model must be deployed on the edge side to handle sensitive information.

● Balance between cost and efficiency. Placing high-frequency, relatively simple tasks on the edge side greatly reduces cloud inference cost. "Heavy tasks" such as knowledge Q&A, complex logical reasoning, and model iteration and upgrading should be handed over to the cloud, where computing power is more abundant and efficiency is higher. This refined division of labor is the key to large-scale commercial deployment.

● Global collaboration and optimization. The cloud is more like a collaboration and management platform. It integrates scattered terminal devices into an organic whole, ensures the optimal operation of the system from a global perspective, and achieves a synergistic effect of "1 + 1 > 2" through cluster scheduling, thus solving the management problem of large-scale device deployment.

Creating intelligent terminals with an excellent experience is a complex systems engineering project, far beyond what a single enterprise can complete independently. Every link in the industrial chain, including hardware, cloud platforms, algorithms, and data, is indispensable. An open and collaborative ecosystem has become the consensus for advancing the industry.

In this ecosystem, the roles are clearly divided: hardware manufacturers provide the physical carriers, algorithm companies focus on the research and optimization of core models, and platforms like Alibaba Cloud, with full-stack AI cloud capabilities, play the role of the "intelligent cornerstone". They provide not only the models themselves but also a complete set of "full-stack" services spanning underlying computing power, data processing, model training and deployment, and application development. This lets innovators focus their energy on what they do best, whether that is insight into user needs or an understanding of human-machine interaction, thus accelerating product R&D and iteration.

The following is an edited transcript of the dialogue between Zou Ping, dean of the 36Kr Research Institute, and Huang Bolin, vice president of CVTE's Future Education Group (Seewo); Zhang Zhizheng, co-founder of Galaxy Universal; Yan Xin, senior algorithm engineer for embodied intelligence at Xinyan Group; and Zheng Haichao, director of Tongyi large model solutions at Alibaba Cloud Intelligence Group:

01. Path selection and methodology for the implementation of edge-side intelligence

36Kr: Some enterprises choose to deploy small models on the edge side, while others choose to serve users by calling models in the cloud. How did you choose your own technology development path?

Huang Bolin: First, we should return to the customer's needs. Seewo is an education brand focusing on education informatization. A representative product of ours is the Seewo Learning Machine. The needs of children of different age groups are different. Whether to place the large model on the edge side or the cloud side, we mainly consider whether it can meet the needs of children at that age.

The technology path we have chosen is a combination of edge and cloud. For problems that can be solved by traditional models such as image recognition, we place them on the edge side. Relatively complex and non-standard problems are sent to the cloud. Ultimately, it all serves the customer's needs.
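To make this concrete, here is a minimal sketch of that kind of edge/cloud dispatch in Python. All names (task types, handler functions) are illustrative assumptions, not Seewo's actual stack:

```python
# Hypothetical edge/cloud dispatcher: simple, well-structured tasks stay on
# the device; complex, open-ended ones go to the cloud. All names here are
# illustrative assumptions, not Seewo's actual implementation.

EDGE_TASKS = {"image_recognition", "ocr", "wake_word"}  # assumed local-capable tasks

def handle_request(task_type: str, payload: bytes) -> dict:
    if task_type in EDGE_TASKS:
        return run_local_model(task_type, payload)   # on-device small model
    return call_cloud_model(task_type, payload)      # large model in the cloud

def run_local_model(task_type: str, payload: bytes) -> dict:
    # Placeholder for an on-device model, e.g. an image classifier.
    return {"source": "edge", "task": task_type}

def call_cloud_model(task_type: str, payload: bytes) -> dict:
    # Placeholder for a cloud LLM call (e.g., over HTTPS).
    return {"source": "cloud", "task": task_type}
```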

Zhang Zhizheng: The change large models bring to the field of embodied intelligence is that they give robots a smart brain. Galaxy Universal hopes to build robots driven by large models.

Traditional robot motion control relies on hard-coded programming, which means the robot's actions are fixed and attention centers on repeatable positioning accuracy. After the emergence of large models, the most significant change is that we can build the robot's brain oriented directly toward task success rate.

What does having a smart brain mean? First, it means a qualitative leap in the generalization ability of robots. Second, it is not a single-point replacement relationship but will lead to the reconstruction of the entire collaborative relationship and interaction mode. Third, large models connect the entire closed-loop of data, models, and computing power. When we apply the large model of embodied intelligence to specific scenarios, it can learn and iterate by itself, form a complete solution, and empower thousands of households and various industries, ultimately forming productivity.

Yan Xin: The robots that Xinyan Group is mainly working on are emotional companion robots centered on emotional intelligence.

We adopt a strategy combining the cloud, the edge, and the terminal. On the edge side, the robot perceives multi-modal, unstructured data through the model and needs to provide real-time, robust, and secure support and services. In the cloud, we mainly use models with larger parameter counts to provide more planning and decision-making capability.

36Kr: As a digital platform empowering various industries, can Alibaba Cloud summarize some methodologies for business implementation that address the fragmented needs of different industries?

Zheng Haichao: From a broader perspective, ultimately, we still need to look at the actual implementation of enterprise scenarios and the closed-loop of the entire business value. When implementing, there are three combinations:

The first is the combination of large models and small models. To complete a full business workflow, it may be necessary to deploy some small models on the edge side. If a larger model turns out to be necessary for very good results, then it has to go to the cloud.

The second is the combination of the cloud and the edge. For example, the voice scenario is more sensitive to latency and requires some pre-processing. There is also some data related to security and privacy, and users prefer to keep this data on the edge side.

The third is the combination of generative large models and discriminative models. When doing business, it is not necessary to invoke a model all the time; rules can also be written. If a simple request matches a rule, it can be processed directly without going through the large model, since large models inevitably increase latency and cost, and computing power on the edge side is relatively limited. So, from the perspective of actual business implementation, we believe the key lies in these three combinations.
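As a rough illustration of this third combination, the sketch below short-circuits simple requests with rules before falling back to a generative model. The rules and function names are hypothetical, not Alibaba Cloud's actual routing logic:

```python
import re

# Hypothetical "rules before models" router: cheap deterministic rules handle
# simple, high-frequency requests; only unmatched requests reach the large
# model.

RULES = [
    (re.compile(r"^(hi|hello)\b", re.I), "Hello! How can I help?"),
    (re.compile(r"\bvolume (up|down)\b", re.I), "ACTION: adjust_volume"),
]

def route(user_text: str) -> str:
    for pattern, response in RULES:
        if pattern.search(user_text):
            return response             # rule hit: no model invoked, near-zero cost
    return call_large_model(user_text)  # fallback: generative large model

def call_large_model(user_text: str) -> str:
    # Placeholder for a cloud LLM call.
    return f"[LLM] {user_text}"
```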

02. Technical breakthroughs for the large-scale implementation of edge-side AI

36Kr: From the initial exploration to large-scale implementation, what specific technical bottlenecks did Galaxy Universal encounter in the aspects of model lightweighting, computing power adaptation, and inference optimization, and what solutions were explored?

Zhang Zhizheng: Galaxy Universal hopes to promote the most advanced large models of embodied intelligence to different scenarios.

For example, the industrial scenario attaches great importance to privacy, as well as to the rhythm and efficiency of work, so neural-level control needs to be placed on the edge side. What needs to be considered uniformly across these diverse scenarios and needs? First, behaviors related to safety. Second, latency. Third, data privacy: data involving user privacy, for example, can only be processed on the edge side and cannot be placed in the cloud.

So, what should be placed in the cloud? First, we place the learning process in the cloud as much as possible, because the cloud has more abundant computing power and higher learning efficiency. Second, tasks where the latency requirement is relaxed but the complexity is high, for example, asking a robot to untangle a tangled wire harness.

36Kr: From software to hardware, users have different requirements for intelligence. How does Xinyan design its AI companion robot's algorithms to operate in complex, unstructured environments, and what challenges arise in model deployment?

Yan Xin: The home scenario is a very complex and open environment with unstructured, multi-modal data. Acquiring this data through multi-modal models and extracting emotional information from it poses a great algorithmic challenge.

Thanks to Alibaba Cloud, with multi-modal models like Qwen2.5-Omni and the rich multi-modal data accumulated by Cetecom over the past decade, we have made many explorations in this direction, which also requires stable quantization of the model.

The robot must operate in a real-time, stable, and safe manner. We divide the entire process into three parts: perception, planning, and decision-making. Decision-making and planning are provided by the cloud model, which therefore needs very strong reasoning and agent capabilities. For a home companion robot, privacy, security, and real-time requirements mean the perception model must be placed on the edge side.
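A minimal sketch of this perception/planning split, assuming hypothetical types and handlers (not Xinyan's actual code): only structured, de-identified events leave the device, while raw audio and video stay local.

```python
from dataclasses import dataclass

# Hypothetical perception/planning split for a companion robot: raw sensor
# data is processed on the edge, and only structured, de-identified events
# are sent to the cloud planner.

@dataclass
class PerceptionEvent:
    emotion: str        # e.g. "happy", extracted on-device
    confidence: float
    # Note: no raw audio or video leaves the device.

def perceive_on_edge(raw_frame: bytes) -> PerceptionEvent:
    # Placeholder for an on-device multi-modal perception model.
    return PerceptionEvent(emotion="happy", confidence=0.92)

def plan_in_cloud(event: PerceptionEvent) -> str:
    # Placeholder for a cloud model with stronger reasoning/agent abilities.
    return f"Respond warmly (detected {event.emotion}, p={event.confidence})"
```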

36Kr: Can Seewo share a relatively successful AI application of cloud-edge collaboration and tell us what changes in experience it has brought to teachers and students respectively?

Huang Bolin: A typical example is evaluating whether a class was taught well. The classroom is a typical scenario where the people and tasks are fixed, but what actually happens is extremely complex.

The challenge for large models lies in how to respond to this situation. We started R&D on this project two years ago. First, we divided the task into eight parts, uploaded them segment by segment to the cloud large model, and after getting the results back, assembled them into a complete report. The final delivery to teachers and students also demands timeliness: we managed to generate an evaluation report for a class within 5 minutes.

The challenges behind this are twofold: the evaluation itself must be fair, which involves fairly specialized teaching-evaluation methods, and timeliness must also be ensured, which involves pre-processing complex tasks. We process the data while recording, so the report can be generated 5 minutes after class. We place a roughly 7B model locally and pair it with Tongyi Qianwen for intent recognition to achieve better results.
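A rough sketch of this "process while recording" pattern: the lesson is split into segments (eight, per the talk), each segment is analyzed as soon as it is available, and per-segment results are assembled into one report at the end. All function names are assumed for illustration:

```python
import concurrent.futures

NUM_SEGMENTS = 8  # the talk mentions splitting the task into eight parts

def split_lesson(recording: bytes, n: int = NUM_SEGMENTS) -> list[bytes]:
    # Naive equal-size split into exactly n parts; a real system would
    # segment along the lesson's structure instead.
    size = max(1, len(recording) // n)
    return [recording[i * size:(i + 1) * size] for i in range(n - 1)] + \
           [recording[(n - 1) * size:]]

def analyze_segment(segment: bytes) -> str:
    # Placeholder for a cloud LLM call evaluating one classroom segment.
    return "segment summary"

def build_report(recording: bytes) -> str:
    # Analyze segments concurrently so most work finishes before class ends.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        parts = list(pool.map(analyze_segment, split_lesson(recording)))
    return "\n".join(parts)  # assemble the complete evaluation report
```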

36Kr: To meet the non-universal needs under different industry characteristics, what standardized tools or solutions can Alibaba Cloud provide?

Zheng Haichao: First, Tongyi, the open-source foundational large model independently developed by Alibaba, is the first to be open-sourced across "full size, full modality, and multiple scenarios", covering modalities such as text, image, video, voice, and coding; within each modality we also provide model services of different specifications. Second, to help everyone run their business better, we provide a complete training environment: the large model service platform, Bailian, lets you use various base models, upload your own data, conduct mixed training combined with data from the Tongyi model family, and then deploy the model and run inference. Finally, if you have specific needs, we can do further customization together; based on Alibaba Cloud's customization tool platform, we can help you create a customized model of your own.
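As a hedged illustration of using Bailian-hosted models programmatically, the sketch below calls a Qwen model through the platform's OpenAI-compatible endpoint. The base URL and model name follow Alibaba Cloud's public documentation at the time of writing and should be verified against the current docs:

```python
import os
from openai import OpenAI

# Minimal sketch: call a Qwen model hosted on Bailian (Model Studio) via its
# OpenAI-compatible endpoint. Verify the base URL and model name against the
# current Alibaba Cloud documentation before use.

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-plus",  # one of several available Qwen model specifications
    messages=[{"role": "user",
               "content": "Summarize cloud-edge collaboration in one sentence."}],
)
print(response.choices[0].message.content)
```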

03. Building a win-win ecosystem with full-stack AI capabilities

36Kr: What measures will Alibaba Cloud take to respond to the needs of various industries and promote open source, openness, and industry co-construction?

Zheng Haichao: Alibaba Cloud will cooperate with everyone at multiple levels.

At the infrastructure layer, Alibaba Cloud deploys infrastructure all over the world to provide a stable and cost-effective computing power base for model training and inference services. At the data and model service platform layer, through the artificial intelligence platform PAI, we provide customers with efficient, low-cost technical support for model services across the entire "data - training - inference - AI application" life cycle. At the model layer, we continuously upgrade model architecture and evolve model capabilities to build an open-source model family of full size and full modality. At the application and solution layer, we have created a multi-modal interaction development kit with "multi-modal interaction" and "intelligent assistant" at its core, providing a 60%-80% general capability base that supports terminal manufacturers in quickly completing secondary development and productization for differentiated scenarios.

It is worth mentioning that the Tongyi multi-modal interaction development kit has already been widely deployed in edge-side scenarios, giving terminal devices the ability to "understand, see, and think". Through natural dialogue, multi-modal perception, and real-time interaction, it turns devices into an extension of the user's perception and a life assistant. The kit provides an edge-side SDK and algorithm enhancements, supports local processing such as VAD and echo cancellation, significantly reduces latency and power consumption, and offers wide hardware and system compatibility (Android, iOS, Linux, RTOS). Through a visual configuration interface, developers can manage models, prompts, knowledge bases, and agent workflows without code and deploy and test quickly. It also supports calling models on the large model service platform Bailian and allows custom agents, plugins, and third-party protocol access, building a flexible and scalable multi-modal development ecosystem.
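A minimal sketch of that kind of local VAD gating, using the open-source webrtcvad package as a stand-in for the kit's built-in VAD; send_to_cloud() is a hypothetical uplink function:

```python
import webrtcvad

# Local voice-activity detection: only frames that contain speech are
# forwarded to the cloud, cutting latency, traffic, and power. webrtcvad is
# an open-source stand-in for the SDK's built-in VAD.

vad = webrtcvad.Vad(2)   # aggressiveness from 0 (lenient) to 3 (strict)
SAMPLE_RATE = 16000      # webrtcvad supports 8/16/32/48 kHz, 16-bit mono PCM

def on_audio_frame(frame: bytes) -> None:
    # frame must be 10, 20, or 30 ms of PCM audio at SAMPLE_RATE
    if vad.is_speech(frame, SAMPLE_RATE):
        send_to_cloud(frame)  # speech: forward for cloud recognition
    # silence: dropped locally, no network or cloud cost incurred

def send_to_cloud(frame: bytes) -> None:
    # Placeholder for the real uplink (e.g., a WebSocket stream).
    pass
```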

In serving various industries, Alibaba Cloud has accumulated and opened up the industry know-how and best practices formed while serving leading customers, creating a replicable paradigm of cloud-edge collaboration that accelerates the large-scale implementation of AI capabilities across all industries.

For more highlights, click to watch the complete replay of the live session:

Yunqi is Coming! Highlights Preview

More discussions on the implementation of AI value will be presented at the 2025 Yunqi Conference. Look out for the "36Kr Pioneer AI Hardware Sub-forum" and the "Tongyi Multi-modal Interaction Technology Sub-forum" on the afternoon of September 24th.

Click the link to get your tickets and head straight to the Yunqi Conference to witness the "intelligent transformation moment" of intelligent hardware on-site.