New Work from Boston Dynamics: First Use of a "Coordinate System Transfer Interface" to Improve the Generalizable Hierarchical Intelligence of Robots

New Intelligence Yuan (新智元), 2025-07-29 11:12
The HEP framework is the first to introduce a coordinate system transfer interface, giving robots efficient learning from few examples and strong generalization.

The HEP framework, proposed by the RAI team of Northeastern University in the United States and Boston Dynamics, introduces the "coordinate system transfer interface". The interface couples the generalization ability of the high-level policy with the flexibility of the low-level policy, achieving efficient learning and strong generalization from little data. With its hierarchical structure, built-in generalization over spatial symmetries, and a new voxel encoder, the framework significantly improves robot performance on complex tasks.

In robotic manipulation, data scarcity and poor generalization have long held back the deployment of AI in real applications.

Most methods either rely on a large amount of data or fail to perform well when the environment changes slightly.

How can AI, like humans, robustly adapt to complex and ever-changing real scenarios with only a small number of demonstrations?

Researchers from the RAI team of Northeastern University in the United States and Boston Dynamics proposed the HEP (Hierarchical Equivariant Policy via Frame Transfer) framework, which introduces the "coordinate system transfer interface". The interface couples the generalization ability of the high-level policy with the flexibility of the low-level policy, achieving efficient learning and strong generalization from little data.

Paper link: https://openreview.net/pdf?id=nAv5ketrHq

Project code: https://codemasterzhao.github.io/HierEquiPo.github.io/

By integrating high-level generalization with low-level flexibility, the coordinate system transfer interface opens a new path toward robots that learn from few samples, remain robust, and can be deployed across multiple scenarios.

Main Contributions

1. A simple and efficient hierarchical structure: The high-level module predicts global sub-goals (key poses), and the low-level module autonomously optimizes the trajectory in local coordinates.

2. Built-in generalization over spatial symmetries: The policy is equivariant under both the T(3) (translation) and SO(2) (planar rotation) groups, which significantly reduces the number of demonstrations it needs.

3. A new voxel encoder: It combines a stacked voxel representation with an SO(2)-equivariant network to encode three-dimensional visual information efficiently, balancing detail against computational speed.

Method Overview

The HEP framework consists of three parts:

1. High-level policy: It reads the three-dimensional point cloud perceived by the robot and predicts a coarse target position, the "key pose".

2. Coordinate system transfer interface: It transforms the global point cloud and the key pose into a local coordinate system centered on the key pose, so that all subsequent processing works from this "local perspective".

3. Low-level policy: It generates a continuous, precise robot motion trajectory by running an equivariant diffusion process on the voxelized three-dimensional visual features in this local coordinate system.

Open-loop and closed-loop compatibility: The same interface supports two control modes, one-shot output (open-loop) and step-by-step feedback (closed-loop).

Lightweight and efficient: The high level only needs to predict a translation vector, which reduces the computational and learning difficulty and improves generalization. The low level focuses on details and inherits the high level's generalization through the coordinate system transfer interface.
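The three parts and the two control modes described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `high_level_policy` and `low_level_policy` are hypothetical stand-ins (a centroid and a straight-line approach) for the learned networks, and all function names are assumptions made for this example.

```python
import numpy as np

def high_level_policy(points):
    """Hypothetical stand-in for the high-level policy.
    Returns a key-pose translation of shape (3,); here simply the centroid."""
    return points.mean(axis=0)

def low_level_policy(local_points, horizon=4):
    """Hypothetical stand-in for the low-level diffusion policy.
    Returns waypoints of shape (horizon, 3) in the local frame; here a
    straight line that ends at the local origin (the key pose)."""
    start = np.array([0.0, 0.0, 0.1])
    steps = np.linspace(1.0, 0.0, horizon)[:, None]
    return steps * start

def hep_step(points, horizon=4):
    """One pass: key pose -> frame transfer -> local trajectory -> global."""
    keypose_t = high_level_policy(points)
    local_points = points - keypose_t           # global frame -> local frame
    local_traj = low_level_policy(local_points, horizon)
    return local_traj + keypose_t               # local frame -> global frame

def run_open_loop(points, horizon=4):
    """Open-loop: plan once and return the whole trajectory."""
    return hep_step(points, horizon)

def run_closed_loop(observe, n_steps=3):
    """Closed-loop: re-observe and re-plan at every step, keeping only
    the first waypoint of each new plan."""
    executed = []
    for _ in range(n_steps):
        traj = hep_step(observe(), horizon=4)
        executed.append(traj[0])
    return np.stack(executed)

if __name__ == "__main__":
    pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 3.0, 0.0]])
    print(run_open_loop(pts))
    print(run_closed_loop(lambda: pts))
```

In this sketch both modes call the same `hep_step`, which mirrors the point that one interface serves open-loop and closed-loop control; only the re-planning frequency differs.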

Core Innovations

Coordinate System Transfer Interface (Frame Transfer)

Design idea: The high-level policy provides the "reference coordinates" of the task, and the low-level policy autonomously optimizes the execution details relative to them.

This design frees the low level to act flexibly while consistently passing the high level's generalization ability and robustness to disturbances down to it, so generalization and robustness improve together.

The advantages include:

Flexibility: The low level can autonomously adjust the execution details within the local coordinate system.

Generalization: The high level's ability to adapt to global transformations is transferred to the low level without loss through the coordinate system transfer interface.

A simpler high level: It only needs to predict a translation, avoiding precise planning in the high-dimensional SE(3) space.
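The "transfer without loss" claim can be checked numerically on a toy example. The sketch below assumes a translation-equivariant high level (here, hypothetically, "pick the highest point") and a low level that sees only local coordinates; neither function is from the paper, and the paper's actual argument is the proof in its appendix.

```python
import numpy as np

def to_local(points, keypose_t):
    """Frame transfer: express the scene relative to the key pose."""
    return points - keypose_t

def to_global(traj_local, keypose_t):
    """Inverse frame transfer: map a local trajectory back to the world."""
    return traj_local + keypose_t

def high_level(points):
    """Hypothetical translation-equivariant key-pose predictor:
    returns the point with the largest z coordinate."""
    return points[np.argmax(points[:, 2])]

def low_level(local_points):
    """Hypothetical low-level policy. It is an arbitrary function of the
    local point cloud and has no built-in notion of global position."""
    offset = local_points.mean(axis=0)
    return np.stack([offset * s for s in (1.0, 0.5, 0.0)])

def policy(points):
    """Full hierarchical policy: high level, frame transfer, low level."""
    t = high_level(points)
    return to_global(low_level(to_local(points, t)), t)

rng = np.random.default_rng(0)
scene = rng.normal(size=(50, 3))
shift = np.array([0.3, -1.2, 0.5])

# Translating the scene translates the predicted trajectory by the same amount.
assert np.allclose(policy(scene + shift), policy(scene) + shift)
```

The low level never sees the shift, because the local point cloud is identical in both cases. The translation enters only through the key pose, which is how the high level's T(3) equivariance carries over to the whole system in this toy setting.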

Implementation of T(3) and SO(2) Equivariance

Schematic diagram of T(3) equivariance

Schematic diagram of SO(2) equivariance

When the scene is translated along the x, y, and z axes or rotated in the plane, the trajectory predicted by the model translates and rotates in the same way.
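Stated a little more formally, in notation chosen for this article rather than taken from the paper: let o be the observed point cloud, π the policy that maps it to a trajectory of end-effector positions, and g a transformation made of a planar rotation by angle θ (assumed here to be about the vertical z axis) followed by a translation t. Equivariance then means that transforming the input transforms the output in the same way.

```latex
g \cdot p = R_\theta\, p + t,
\qquad
R_\theta =
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{pmatrix},
\qquad
\pi(g \cdot o) = g \cdot \pi(o)
\quad \text{for all } g \in SO(2) \times T(3).
```

Here g acts on a point cloud or a trajectory by acting on each point p individually.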

High level: It uses an SO(2)-equivariant 3D U-Net to predict a discretized translation probability map, so it is both SO(2) equivariant and T(3) equivariant.

Low level: It extracts local features with the stacked voxel encoder and combines them with an SO(2)-equivariant diffusion policy, so it is SO(2) equivariant.

System: A complete proof of equivariance is given in the paper's appendix (Propositions 4.2 and 4.3). The coordinate system transformation preserves SO(2) equivariance and transfers T(3) equivariance from the high level to the low level, so the entire system is SO(2) × T(3) equivariant.
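To make the "discretized translation probability map" concrete, here is a minimal sketch of how such a map could be decoded into a key-pose translation. The grid layout, the `decode_keypose` name, and the mapping of the three grid axes to x, y, z are all assumptions for illustration; the random array stands in for network output.

```python
import numpy as np

def decode_keypose(logits, origin, voxel_size):
    """Decode a (D, H, W) score grid into a 3D translation.
    Applies a softmax over all voxels and returns the center of the most
    probable voxel. Grid axes are assumed to map to x, y, z."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = np.array(np.unravel_index(np.argmax(probs), probs.shape))
    return origin + (idx + 0.5) * voxel_size

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8, 8))   # placeholder for the U-Net output
logits[3, 4, 2] = 10.0                # a clear peak at voxel (3, 4, 2)
origin = np.zeros(3)
voxel_size = 0.05

t = decode_keypose(logits, origin, voxel_size)

# Shifting the whole map by one voxel along the first axis moves the
# decoded key pose by exactly one voxel size along that axis.
shifted = np.roll(logits, shift=1, axis=0)
t_shifted = decode_keypose(shifted, origin, voxel_size)
assert np.allclose(t_shifted - t, [voxel_size, 0.0, 0.0])
```

A convolutional predictor produces a shifted map when its input is shifted (up to boundary effects), which is the usual intuition for why a heatmap-style output yields translation equivariance at voxel resolution.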

Innovative Voxel Encoder (Stacked Voxel Representation)

Principle: The point cloud is partitioned into a voxel grid, the features of the points inside each voxel are aggregated with an equivariant PointNet, and the result is an equivariant voxel map of shape c × D × H × W.

The advantages include:

Detail preservation: Compared with traditional downsampling, it better preserves local geometric information.

Computationally friendly: The hybrid point-cloud and convolution structure balances speed and accuracy.

Equivariance: It is theoretically guaranteed to be consistent under T(3)×SO(2) transformations.
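The grouping-and-aggregation step can be sketched as follows. This is a simplified illustration under stated assumptions: the per-voxel aggregation here is a point count plus the mean offset from the voxel center (c = 4 channels), a hand-written stand-in for the learned equivariant PointNet, and `stacked_voxel_encode` is a hypothetical name.

```python
import numpy as np

def stacked_voxel_encode(points, origin, voxel_size, grid_shape):
    """Turn an (N, 3) point cloud into a (4, D, H, W) voxel map.
    Channel 0: number of points in the voxel.
    Channels 1-3: mean offset of those points from the voxel center."""
    grid = np.array(grid_shape)
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < grid), axis=1)
    pts, idx = points[inside], idx[inside]     # drop points outside the grid

    centers = origin + (idx + 0.5) * voxel_size
    offsets = pts - centers                    # local geometry per point

    flat = np.ravel_multi_index(tuple(idx.T), grid_shape)
    n_vox = int(np.prod(grid_shape))
    counts = np.zeros(n_vox)
    sums = np.zeros((n_vox, 3))
    np.add.at(counts, flat, 1.0)               # unbuffered scatter-add
    np.add.at(sums, flat, offsets)

    mean_off = sums / np.maximum(counts, 1.0)[:, None]
    feats = np.concatenate([counts[:, None], mean_off], axis=1)
    return feats.reshape(*grid_shape, 4).transpose(3, 0, 1, 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(0.0, 0.4, size=(1000, 3))
    vox = stacked_voxel_encode(cloud, np.zeros(3), 0.05, (8, 8, 8))
    print(vox.shape)
```

Unlike plain downsampling, each voxel keeps a summary of where its points sit inside the cell, which is the detail-preservation idea described above. The resulting dense grid can then be fed to 3D convolutions.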

Simulation Experiments

Dataset: 30 RLBench tasks, with each task trained using 100 demonstrations.

Comparison baselines: 3D Diffuser Actor, Chained Diffuser, Equivariant Diffusion Policy.

Open-loop results: HEP won in 28 out of 30 tasks, with an average improvement of +10%.

Closed-loop results: On 10 long-horizon tasks, HEP achieved an average improvement of +23%, significantly better than single-level methods.

Ablation Analysis

Removing the equivariant structure: Performance decreased by 24%.

Removing the coordinate system transfer: Performance decreased by 16%.

Removing the stacked voxel: Performance decreased by 10%.

These drops confirm that each module contributes to the overall performance.

Real-Robot Experiments

The Hierarchical Policy Shows Significant Advantages in Complex Long-Horizon Tasks

On a real robot, the hierarchical HEP framework learned a robust "pot-washing" task from only 30 demonstrations. The task involves multiple coordinated steps, such as moving the pot lid, adding detergent, and scrubbing, and HEP significantly outperformed non-hierarchical methods.

Coordinate System Transfer Interface: A Bridge for Transferring Generalization and Robustness

Theoretical guarantee: The paper proves that the coordinate system transfer interface transfers the high level's ability to adapt to spatial changes to the low level without loss, making the overall policy easier to extend to new scenarios.

In the Pick&Place task, HEP's low-level diffusion model generalizes from a single demonstration (one-shot learning), significantly improving data efficiency.

In perturbation tests with environmental changes and irrelevant distractor objects, HEP's success rate was up to 60% higher than that of traditional methods.

The Interface Design Brings Possibilities for Future Expansion

The coordinate system transfer interface imposes only soft constraints on the low-level policy. This preserves flexibility and also provides a natural interface for plugging in multimodal or cross-platform high-level policies, such as VLMs or cross-embodiment models, as decision planners in the future.

This article is from the WeChat official account "New Intelligence Yuan", author: LRST, published by 36Kr with authorization.