
Led by an AlphaGo researcher, eight robotic arms collaborate without collisions. New DeepMind work is published in a Science sub-journal.

QbitAI (量子位) · 2025-09-10 10:31
Science fiction blockbusters are coming to life, and the era of multi-robotic arm collaboration is on the horizon!

A group of robotic arms are bustling about, working on their own, coordinating with each other without colliding.

A scene from a science-fiction blockbuster has really come to life. It's elegant, truly elegant.

The video shows 4 robotic arms. The simulation environment has 8: 4 are mounted on the table, and the other 4 are mounted on the ceiling.

This is RoboBallet, the latest result published in Science Robotics, a Science sub-journal, and jointly proposed by research institutions including DeepMind, Intrinsic AI, and UCL.

RoboBallet applies Graph Neural Networks (GNNs) to Reinforcement Learning, using them as both the policy network and the state-action value estimator, to solve complex problems in the collaborative motion planning of multiple robots (robotic arms).

The method can control up to 8 robotic arms simultaneously, coordinate a configuration space of up to 56 degrees of freedom, and handle up to 40 shared tasks. Each planning step takes only 0.3 milliseconds, and no restrictions are placed on how tasks are allocated or scheduled.

It's worth mentioning that the corresponding author of this paper, Matthew Lai, is a senior researcher at Google DeepMind. Since joining Google DeepMind in 2016, he has participated in star projects such as AlphaGo and AlphaZero.

Utilizing Graph Neural Networks and Reinforcement Learning

At its core, RoboBallet combines graph neural networks with reinforcement learning. It uses a GNN as the policy network and state-action value estimator to solve the joint problem of large-scale multi-robot task allocation, scheduling, and motion planning. The resulting trajectory planning is high-quality, computationally efficient, scalable, and capable of zero-shot generalization.

Specifically, in modern automated manufacturing, the core challenge lies in how to enable multiple robots to collaborate efficiently without collisions in a shared space full of obstacles to complete a large number of tasks (such as welding, assembly, etc.).

This involves three highly complex sub-problems:

  • Task Allocation: Decide which robot performs which task to minimize the total execution time.
  • Task Scheduling: Decide the execution order of tasks.
  • Motion Planning: Find a collision-free path in the joint space to move the robot's end-effector to the target pose.

When these three sub-problems are combined, the complexity increases sharply. Traditional algorithms often struggle to compute feasible solutions in real-world scenarios, and industry currently relies mainly on time-consuming, labor-intensive manual planning.
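To get a feel for how sharply the complexity grows, here is a rough back-of-the-envelope sketch in Python. It counts only the ways to split the tasks among the robots and order each robot's queue, ignoring motion planning entirely. The function name is hypothetical and the counting model is a simplifying assumption, not a formula from the paper.

```python
from math import comb, factorial


def count_ordered_allocations(num_robots: int, num_tasks: int) -> int:
    """Count the ways to give each robot an ordered queue of distinct tasks.

    Line up all tasks in any order (num_tasks! ways), then cut the line into
    num_robots consecutive, possibly empty, queues (a stars-and-bars count).
    """
    cuts = comb(num_tasks + num_robots - 1, num_robots - 1)
    return factorial(num_tasks) * cuts


for robots, tasks in [(2, 3), (4, 10), (8, 40)]:
    plans = count_ordered_allocations(robots, tasks)
    print(f"{robots} robots, {tasks} tasks: roughly {plans:.2e} allocation-and-order plans")
```

Even before a single collision-free path is computed, the largest setting in the article (8 robots, 40 tasks) already has far too many allocation and ordering choices to enumerate.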

To address this high-dimensional complexity, RoboBallet learns task and motion planning in randomly generated environments. It can then plan multi-arm trajectories for environments different from those seen during training, with arbitrary obstacle geometries, task poses, and robot positions.

To achieve this, RoboBallet innovatively models the entire scene as a graph structure at the data representation level.

The nodes in the graph represent the core entities in the scene (robots, tasks, and obstacles), while the edges represent the relationships between these entities (e.g., relative poses).

Robot nodes are connected to each other by bidirectional edges to support mutual coordination and collision avoidance. Task nodes and obstacle nodes have unidirectional edges pointing to robot nodes, which pass the environmental information required for planning to the robots (as shown in Figure c).
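The edge layout described here can be sketched in a few lines of Python. This is only an illustration of the connectivity: the class and function names are hypothetical, and the node and edge features (such as relative poses) that the real system attaches are left out.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class SceneGraph:
    """Toy observation graph: typed nodes plus directed (sender, receiver) edges."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (sender id, receiver id)


def build_scene_graph(num_robots: int, num_tasks: int, num_obstacles: int) -> SceneGraph:
    graph = SceneGraph()
    robots = [f"robot_{i}" for i in range(num_robots)]
    tasks = [f"task_{i}" for i in range(num_tasks)]
    obstacles = [f"obstacle_{i}" for i in range(num_obstacles)]

    graph.nodes.update({name: "robot" for name in robots})
    graph.nodes.update({name: "task" for name in tasks})
    graph.nodes.update({name: "obstacle" for name in obstacles})

    # Robot <-> robot: one edge in each direction for every pair of robots.
    graph.edges.extend(permutations(robots, 2))
    # Task -> robot and obstacle -> robot: information only flows toward robots.
    graph.edges.extend((src, dst) for src in tasks + obstacles for dst in robots)
    return graph


scene = build_scene_graph(num_robots=8, num_tasks=40, num_obstacles=30)
print(len(scene.nodes), len(scene.edges))  # 78 nodes, 616 edges
```

For the largest test cell mentioned in the article, that gives 8 × 7 = 56 robot-to-robot edges plus 70 × 8 = 560 edges from tasks and obstacles into robots.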

Next, RoboBallet uses Graph Neural Networks (GNN) as the policy network, handling the changing graph size through weight sharing. It takes the observation graph as input and generates commanded joint velocities for all robots at each time step. This enables the robotic arms to perform relational and combinatorial reasoning with only the raw state as input.
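The weight-sharing idea can be illustrated with a toy message-passing step in plain Python. Everything here is an assumption made for illustration (the feature size, the single round of message passing, the random weights, and all names); it is not the network from the paper. The point is only that the same small set of weights is reused on every edge and every robot node, so one policy object handles graphs of any size.

```python
import math
import random

FEATURES = 4  # hypothetical size of each node's feature vector
JOINTS = 7    # joints per arm, matching the 7-DoF arms in the article


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedPolicy:
    """One message-passing round whose weights do not depend on graph size."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_message = random_matrix(FEATURES, 2 * FEATURES, rng)  # used on every edge
        self.w_output = random_matrix(JOINTS, 2 * FEATURES, rng)     # used on every robot

    def act(self, features, edges, robots):
        """Return {robot: 7 joint velocities in [-1, 1]}; every edge must end at a robot."""
        inbox = {robot: [0.0] * FEATURES for robot in robots}
        for sender, receiver in edges:
            message = matvec(self.w_message, features[sender] + features[receiver])
            # Sum aggregation: the result does not depend on edge order or count.
            inbox[receiver] = [a + math.tanh(m) for a, m in zip(inbox[receiver], message)]
        return {
            robot: [math.tanh(v) for v in matvec(self.w_output, features[robot] + inbox[robot])]
            for robot in robots
        }


def random_scene(num_robots, num_others, rng):
    """Random features for robots and for tasks/obstacles, with edges into robots only."""
    robots = [f"robot_{i}" for i in range(num_robots)]
    others = [f"item_{i}" for i in range(num_others)]
    features = {n: [rng.uniform(-1, 1) for _ in range(FEATURES)] for n in robots + others}
    edges = [(a, b) for a in robots for b in robots if a != b]
    edges += [(o, r) for o in others for r in robots]
    return features, edges, robots


policy = SharedPolicy()
for num_robots, num_others in [(2, 3), (8, 70)]:
    features, edges, robots = random_scene(num_robots, num_others, random.Random(1))
    velocities = policy.act(features, edges, robots)
    print(num_robots, "robots ->", sum(len(v) for v in velocities.values()), "joint velocities")
```

With 8 robots the output has 8 × 7 = 56 entries, which matches the 56 degrees of freedom quoted earlier.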

In the policy learning and evaluation stage, RoboBallet trains the policy network with a fine-tuned version of the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm. The trained model generates multi-arm trajectories while implicitly solving the task allocation, scheduling, and motion planning sub-problems, which shifts the expensive computation from online planning to the offline training stage.

(Note: In this task, the robotic arms are rewarded for successfully solving tasks and avoiding collisions.)
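For readers unfamiliar with TD3, the sketch below shows how the standard algorithm computes the learning target for its two critics on a single transition: target policy smoothing plus a clipped double-Q minimum. It is generic TD3, not RoboBallet's tuned variant, and the function name, arguments, and default hyperparameters are illustrative assumptions. The "delayed" part of TD3 (updating the actor less often than the critics) is not shown.

```python
import random


def clip(value, low, high):
    return max(low, min(high, value))


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0, rng=random):
    """Bootstrapped target shared by both critics for one transition."""
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    next_action = [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             -max_action, max_action)
        for a in target_actor(next_state)
    ]
    # Clipped double-Q: use the smaller of the two target critics to curb overestimation.
    q_min = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    # No bootstrapping past the end of an episode.
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Toy stand-ins for the target networks (assumptions for the demo only).
def actor(state):
    return [0.0] * 7


def critic_a(state, action):
    return 5.0


def critic_b(state, action):
    return 3.0


print(td3_target(1.0, False, [0.0], actor, critic_a, critic_b, gamma=0.9))  # about 3.7
```

In the toy call the two critics disagree (5.0 versus 3.0), and the target uses the more pessimistic value: 1.0 + 0.9 × 3.0.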

Meanwhile, to address the problem of sparse rewards, RoboBallet also adopts the Hindsight Experience Replay method, enabling the model to learn efficiently without a manually designed reward function.
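The idea behind Hindsight Experience Replay can be sketched on a toy one-dimensional problem: an episode that failed to reach its goal is stored a second time, relabeled as if the state it actually reached had been the goal, so that it yields a non-zero reward to learn from. This is the generic "final goal" relabeling strategy with hypothetical names and a made-up scalar state; how RoboBallet defines goals and relabels them is not described in this article.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transition:
    state: float
    action: float
    achieved: float  # where the agent actually ended up after the action
    goal: float      # where it was asked to go
    reward: float


def sparse_reward(achieved: float, goal: float, tolerance: float = 0.05) -> float:
    """1.0 only when the goal is reached; otherwise no signal at all."""
    return 1.0 if abs(achieved - goal) <= tolerance else 0.0


def relabel_with_final_goal(episode):
    """Return hindsight copies that pretend the last achieved state was the goal."""
    hindsight_goal = episode[-1].achieved
    return [
        replace(t, goal=hindsight_goal, reward=sparse_reward(t.achieved, hindsight_goal))
        for t in episode
    ]


# A failed episode: the goal was 1.0, but the agent only got as far as 0.6.
goal = 1.0
positions = [0.0, 0.2, 0.4, 0.6]
episode = [
    Transition(state=s, action=0.2, achieved=nxt, goal=goal, reward=sparse_reward(nxt, goal))
    for s, nxt in zip(positions, positions[1:])
]
hindsight = relabel_with_final_goal(episode)

print(sum(t.reward for t in episode))    # 0.0: no learning signal
print(sum(t.reward for t in hindsight))  # 1.0: the final step now counts as a success
```

Both the original and the relabeled transitions would go into the replay buffer, so the sparse success/failure signal is enough and no hand-shaped reward is needed.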

For deployment, RoboBallet uses the seven-degree-of-freedom Franka Panda robotic arm and is trained in a simulated environment with random obstacles and tasks.

To verify performance, the research team ran tests in a simulated work cell containing 4 or 8 robots, 40 tasks, and 30 obstacles, and compared the results with the RRT-Connect method. It's worth mentioning that all of this can be done on a single GPU (Graphics Processing Unit), whether the multi-arm work cell is real or simulated.

The experiments show that RoboBallet performs excellently in several key indicators:

In terms of the scalability of training time, even when the number of tasks quadruples, the number of training steps required for RoboBallet to converge only increases slightly.

In terms of planning speed, at inference time each planning step takes only about 0.3 milliseconds on an NVIDIA A100, even in the largest scenario with 8 robots and 40 tasks. At a 10 Hz time step (100 milliseconds per step), that is more than 300 times faster than real time.

On a single Intel Cascade Lake CPU core, each step takes about 30 milliseconds, still about 3 times faster than real time at a 10 Hz time step. Each planning step includes one inference pass and one collision check for the entire scene.
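The speed-up figures follow directly from the time budget of a 10 Hz control loop. A quick check in Python (the function name is just for illustration):

```python
def real_time_factor(step_ms: float, control_hz: float = 10.0) -> float:
    """How many planning steps fit into one control period."""
    budget_ms = 1000.0 / control_hz  # 10 Hz -> 100 ms per time step
    return budget_ms / step_ms


print(f"GPU: {real_time_factor(0.3):.0f}x real time")   # GPU: 333x real time
print(f"CPU: {real_time_factor(30.0):.1f}x real time")  # CPU: 3.3x real time
```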

In terms of multi-agent collaboration, as the number of robots increases from 4 to 8, the average execution time is reduced by about 60%.

In terms of generalization, after the model is trained in randomly generated environments, it can be transferred zero-shot to new environments with different robot positions, obstacle geometries, and task poses without additional training.

Finally, the high speed and scalability of RoboBallet open up new capabilities such as work cell layout optimization (reducing task execution time by 33%), fault-tolerant planning, and online perception-based replanning.


This article is from the WeChat official account "QbitAI", author: henry, published by 36Kr with authorization.