
Stop being seduced by "AI super-brains." A University of Cambridge professor has published an article arguing that the collective intelligence of robots is the future.

Academic Headlines | 2025-09-25 15:22
True machine intelligence requires that 1 + 1 > 2.

Have you ever wondered why we can't build robots that truly adapt to the real world?

Artificial intelligence (AI) models keep growing in scale and individual capability, yet when embodied robots take on collective tasks, they either react too slowly to keep up in real time or frequently "malfunction" because they cannot adapt to new scenarios.

Today, Amanda Prorok, a professor of collective intelligence and robotics in the Department of Computer Science and Technology at the University of Cambridge in the UK, published a perspective article in the scientific journal Science Robotics, revealing the reasons for the "failure of collective intelligence" in robots:

The classic approach to robot autonomy, which focuses on individual robots operating independently, is not suitable for complex real-world environments where interaction and collaboration are essential.

In simple terms, the future of robot intelligence has never been about "a single super-brain ruling the world," but about a group of specialized partners working together.

Professor Prorok goes so far as to call this approach "fundamentally wrong": scaling laws indicate that achieving more complex behavior requires disproportionately high investment, which makes this route to robot autonomy neither scalable nor sustainable.

She suggests that researchers need to make a paradigm shift: design robot swarms to consist of diverse and specialized agents so that they can function as part of a larger system.

Article link: www.science.org/doi/10.1126/scirobotics.adv4049

Why can't a single model go far?

Most of today's most advanced robots rely on a single large, centrally controlled model that is expected to handle every task: navigation, perception, interaction, and more.

Professor Prorok argues that the current pursuit of robot autonomy rests on misconceptions. Throughout its operational lifecycle, a robot will inevitably interact with other agents. Yet to this day, collective behavior among agents is still often treated as an incidental phenomenon rather than an integral part of intelligence.

Neither physical nor virtual AI systems were designed from the outset with the need to interact with other agents, whether machines or humans. Existing cognitive frameworks and introductory AI textbooks still frame the classic AI problem as an isolated machine confronting a non-social environment.

Similarly, the traditional definition of robot autonomy positions an independent robot as an agent that can autonomously cope with the environment without relying on any external factors.

The scaling laws of deep learning show that making robot behavior more complex drives exponential growth in model size and data requirements. Typical AI products today adopt a centralized, single-agent architecture with parameter counts in the millions or even billions. In other words, making a robot just a little smarter demands many times more energy, time, and money.
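
To make the arithmetic concrete, here is a minimal sketch (illustrative constants, not figures from the paper) of how a power-law scaling curve turns small error reductions into enormous parameter budgets:

```python
# Toy power-law scaling curve: error ~ C * N**(-alpha).
# With a small exponent alpha, each further cut in error demands a
# disproportionate growth in parameter count N -- the "disproportionate
# investment" the article describes. C and alpha are made-up values.

def params_needed(target_error: float, C: float = 1.0, alpha: float = 0.1) -> float:
    """Invert error = C * N**(-alpha) to get the required parameter count N."""
    return (C / target_error) ** (1.0 / alpha)

for err in (0.20, 0.10, 0.05):
    print(f"error {err:.2f} -> ~{params_needed(err):.2e} parameters")
# Halving the error from 0.10 to 0.05 costs roughly 1,000x more parameters.
```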

Worse still, these large models are hardly practical: merely hosting the model requires hundreds of gigabytes of memory, and forward-pass latency means they can currently run only offline at low speed. They cannot serve high-frequency control, which puts many robot applications out of reach.

The figure below compares the runtime performance of four large models on two platforms. Even on the more powerful development board, only the smallest model barely meets the bar for "real-time response."

Figure | (a) Inference frequencies of the DINOv2 models on the two platforms; (b) linear fit of the model data.
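
To see why latency matters so much, here is a minimal sketch (hypothetical model names and timings, not the paper's benchmarks) that checks forward-pass latency against a control-loop budget:

```python
# Check whether a model's forward-pass latency fits a control-loop budget.
# All numbers below are hypothetical stand-ins for measured values.

CONTROL_HZ = 50  # a common rate for reactive robot control loops

measured_latency_ms = {  # hypothetical per-step inference latencies
    "small_model": 12.0,
    "base_model": 45.0,
    "large_model": 180.0,
}

budget_ms = 1000.0 / CONTROL_HZ  # 20 ms per control step at 50 Hz
for name, latency in measured_latency_ms.items():
    verdict = "meets" if latency <= budget_ms else "misses"
    print(f"{name}: {latency:.0f} ms/step -> {verdict} the {budget_ms:.0f} ms budget")
```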

Collective intelligence: 1 + 1 should be greater than 2

Professor Prorok notes that robotics research is now following the trend of integrating off-the-shelf centralized, single-agent AI models into autonomous system architectures, and this design decision is driving the boom in general-purpose robots. However, intelligence is not a single super-organism, and not every envisioned task can be completed by a single super-robot.

Since a single model won't work, can robots cooperate like humans?

The answer is yes. Instead of pursuing independent robots controlled by one general-purpose brain, scientists believe more attention should go to designing robot swarms whose members have interdependent minds and bodies, diverse forms, and specialized functions, so that they operate collaboratively as parts of a larger system.

Therefore, collective robot intelligence requires a more modular and combinatorial approach to both hardware and software design. In such an architecture, multiple models can not only learn from each other but also interact within and across different physical robots.

First of all, the core of collective intelligence is specialized division of labor.

Instead of having one model handle everything, each robot focuses on one skill, and then complex capabilities are combined through collaboration.

A group composed of agents using specialized models can achieve super-linear improvement through skill combination and integrate individual capabilities in novel and untrained ways.

This modular, combinatorial approach to autonomy lets agents and robots effectively emulate the behavior of large neural networks and dynamically reorganize at run time to meet task demands, achieving "super-linear gains": as the number of skill combinations grows, performance rises far faster than it does for a single model.

Figure | Schematic diagram of the scaling law of a robot model swarm
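
One way to read the super-linear claim is combinatorial, as in this minimal sketch (made-up numbers): with n specialized skills, the number of usable skill combinations grows much faster than n itself.

```python
# Count the distinct skill combinations available to a swarm.
# With n specialized skills and teams that combine up to k of them,
# the number of composable behaviors grows super-linearly in n.

from math import comb

def combinations_up_to(n_skills: int, k: int) -> int:
    """Count non-empty subsets of at most k skills drawn from n."""
    return sum(comb(n_skills, i) for i in range(1, k + 1))

for n in (4, 8, 16):
    print(f"{n} skills -> {combinations_up_to(n, 3)} usable combinations")
# 4 -> 14, 8 -> 92, 16 -> 696: doubling the skills far more than doubles
# the behaviors the collective can compose.
```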

Secondly, some skills can only be learned through social learning in a collective.

Beyond learning to interact from scratch during training, the collaborative learning process also gives each specialist model a deeper understanding of its own capabilities and limitations. That awareness naturally leads it to recognize when, and in which situations, it should cooperate with other models to compensate for its weaknesses and achieve more complex task goals.

Skills such as theory of mind and metacognition are not innate but are learned in a collective environment. These skills are crucial for tasks involving interaction with humans, other agents, and robots.

In addition, collective learning can improve the performance of a single agent, even when facing a single task, by accelerating the learning process and optimizing long-term results.

This advantage stems from an experience-sharing mechanism: because robots act physically rather than in simulation, collecting experimental data is often costly, risky, and sometimes outright hazardous. Borrowing others' experience catalyzes learning and, more importantly, avoids repeating dangerous behaviors. Sharing and distributing specific skills can also alleviate catastrophic forgetting.
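
As a minimal sketch of that mechanism (the Transition and SharedBuffer types here are hypothetical, not from the paper), agents could log their trials to a shared buffer so teammates can learn from them and skip known hazards:

```python
# Hypothetical experience-sharing sketch: agents write transitions to a
# shared buffer; teammates sample from it and avoid states already
# flagged as unsafe, so dangerous trials are not repeated.

import random
from dataclasses import dataclass

@dataclass
class Transition:
    agent_id: str
    state: tuple
    action: str
    reward: float
    unsafe: bool  # flag dangerous outcomes so others can avoid them

class SharedBuffer:
    def __init__(self):
        self._data = []

    def add(self, t: Transition) -> None:
        self._data.append(t)

    def sample(self, n: int) -> list:
        return random.sample(self._data, min(n, len(self._data)))

    def known_hazards(self) -> set:
        """States any teammate already found unsafe."""
        return {t.state for t in self._data if t.unsafe}

buffer = SharedBuffer()
buffer.add(Transition("robot_a", (1, 2), "grasp", -1.0, unsafe=True))
buffer.add(Transition("robot_b", (0, 0), "push", 0.5, unsafe=False))
print(buffer.known_hazards())  # robot_c can now avoid state (1, 2) untried
```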

Collective intelligence is not simply "strength in numbers"

Of course, getting robots to "team up" is not as simple as placing a few of them side by side. The field still faces many open challenges, and researchers have several hurdles to clear.

The first hurdle is technical: how to cooperate

Currently, explicit local interaction between robots usually relies on narrow-band communication networks. Yet how to design methods that help robots decide "what to communicate, when, and with whom" remains an unsolved problem.

Although some researchers have tried to use differentiable communication channels to let robots automatically generate interaction signals or use graph neural networks to plan collaborative paths, these technologies are still in their infancy. More research is needed to understand the trade-offs and limitations of these emerging methods when building robust and adaptable robot swarms.
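
As a toy illustration of the "what and when" question (random weights standing in for learned parameters; not the paper's method), a robot could pair a message head with a learned gate that decides whether to transmit at all:

```python
# Toy gated-communication sketch: each robot computes a message and a
# scalar gate; the gate decides whether to transmit, approximating the
# "what and when to communicate" decision. Weights are random stand-ins
# for parameters that would be learned end to end in practice.

import numpy as np

rng = np.random.default_rng(0)
W_msg = rng.normal(size=(8, 4))   # message head (learned in practice)
w_gate = rng.normal(size=8)       # gate head (learned in practice)

def communicate(observation):
    """Return a 4-d message, or None if the gate stays closed."""
    gate = 1.0 / (1.0 + np.exp(-observation @ w_gate))  # sigmoid in (0, 1)
    if gate < 0.5:  # below threshold: stay silent and save bandwidth
        return None
    return observation @ W_msg

obs = rng.normal(size=8)
print(communicate(obs))
```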

The second hurdle is one of design: how to implement it

Although the field is progressing rapidly, the "hybrid robot" paradigm remains loosely defined, and designing models that can handle distinct, possibly non-overlapping action domains is quite challenging.

In the "hybrid robot" approach, this problem can be transformed into an algorithm or mechanism for connecting various models. Initial exploration can draw on research results such as integrated models, mixture of experts, hypernetworks, and hierarchical learning, but the ultimate goal is to assemble robot swarms according to the underlying task requirements.

In these aspects, there are still many technical gaps, especially in realizing the real-time combination of professional robot skills and behaviors.

The third hurdle concerns standards: how to evaluate

Currently, the definition of performance remains rudimentary: it is often reduced to a proxy for the learning loss or the optimization objective of a single robot model, while richer composite measures such as team-level satisfaction and collective resilience are ignored.

Such an evaluation regime considers neither task diversity nor the team's overall ability to adapt to different collaboration scenarios, object distributions, and object types. As a result, systems that excel in controlled single-robot tests struggle in settings that demand flexible teamwork or precise role allocation.

Professor Prorok therefore suggests developing more comprehensive benchmarks to build an evaluation system that goes beyond simple individual success rates.
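
As a sketch of what team-level metrics might look like (the metric names and definitions below are invented for illustration), one could require full role coverage for team success and measure resilience as worst-case performance after losing an agent:

```python
# Hypothetical team-level metrics: success requires every required role
# to be covered, and "resilience" is the worst-case fraction of full-team
# performance retained after removing any single agent.

def team_success(role_assignments: dict, required_roles: set) -> bool:
    """True only if the team collectively covers all required roles."""
    return required_roles.issubset(set(role_assignments.values()))

def resilience(full_team_score: float, leave_one_out_scores: list) -> float:
    """Worst-case fraction of performance kept after losing one agent."""
    return min(leave_one_out_scores) / full_team_score

print(team_success({"r1": "scout", "r2": "carrier"}, {"scout", "carrier"}))  # True
print(resilience(1.0, [0.8, 0.6, 0.9]))  # 0.6: losing the weakest link hurts most
```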

Today, although applying AI technology matters, true breakthroughs still require us to resist the temptation to sidestep deep, fundamental challenges for short-term gains.

In Professor Prorok's view, future robots will not be a single large model but a well-coordinated collaborative team. When robots learn to "unite," they will truly have the ability to enter the real world.

This article is from the WeChat official account "Academic Headlines" (ID: SciTouTiao), author: Academic Headlines. Republished by 36Kr with permission.