
Shanghai Jiao Tong University: Breaking the Limits, End-to-End High-Speed Obstacle Avoidance for Drones via Differentiable Physics

新智元 | 2025-07-28 11:20
A new navigation method for drone swarms from Shanghai Jiao Tong University: physics plus deep learning, at speeds up to 20 m/s.

A research team from Shanghai Jiao Tong University has proposed an end-to-end method that integrates physical modeling of drones with deep learning, achieving a lightweight, deployable, and collaborative autonomous navigation solution for drone swarms. Its robustness and maneuverability significantly outperform existing solutions.

Imagine this: in an unknown forest, an urban ruin, or even an indoor space full of obstacles, a group of drones darts through like a flock of birds, without relying on maps, communication, or expensive equipment. That vision has now become reality!

The team's method is the first to deploy a policy trained with differentiable physics onto real robots, achieving a truly "lightweight, deployable, and collaborative" end-to-end autonomous navigation solution for drone swarms, one that significantly outperforms existing solutions in robustness and maneuverability.

The work has been published online in Nature Machine Intelligence. Master's students Yuang Zhang and Yu Hu and Dr. Yunlong Song are the co-first authors, and Professors Danping Zou and Weiyao Lin are the corresponding authors.

Paper link: https://www.nature.com/articles/s42256-025-01048-0

Project link: https://henryhuyu.github.io/DiffPhysDrone_Web/

Core Concept: The Simplest Way is the Best

Traditional autonomous navigation of drones often relies on:

  • Cascaded modules requiring intricate algorithm design: high-complexity localization and mapping, trajectory planning and generation, and trajectory tracking
  • Expensive and bulky sensors + high-performance CPU/GPU computing platforms
  • Communication between multiple drones or centralized planning

The new method proposed by the research team explores a completely different approach: using a 12×16 ultra-low-resolution depth map as input and an ultra-small neural network with only 3 convolutional layers, it achieves end-to-end autonomous flight that can be deployed on a cheap embedded computing platform costing only $150.

The method abandons complex quadrotor dynamics in favor of a minimalist particle (point-mass) dynamics model, and trains the end-to-end network through a differentiable physics engine.

Finally, it achieves "train once, share weights among multiple drones" and zero-communication collaborative flight!

Stunning Performance: Dashing Through the Real World

In the single-drone scenario, the network model was deployed on a drone and tested in different real environments, including forests, urban parks, and indoor scenarios with static and dynamic obstacles.

The model's navigation success rate in unknown complex environments reaches 90%, showing stronger robustness than existing state-of-the-art methods.

In a real forest environment, the drone's flight speed reaches up to 20 m/s, twice the speed of existing imitation-learning-based solutions. Zero-shot transfer was achieved in all test environments: the system operates without GPS or visual-inertial odometry (VIO) for positioning and adapts to dynamic obstacles.

Figure 1 Multiple drones flying

In the multi-drone collaborative scenario, the network model was deployed on 6 drones to perform tasks such as flying through complex obstacles in the same direction and exchanging positions.

The policy demonstrated extremely high robustness in scenarios such as flying through doorways in the same direction, dodging dynamic obstacles, and navigating complex static obstacles. In the experiment where multiple drones fly through doorways and exchange positions, self-organizing behavior emerged without any communication or centralized planning.

Figure 2 Self-organizing collaboration of multiple drones

Figure 3 Dynamic obstacle avoidance

Key Idea: Embedding Physical Principles, Drones "Learn to Fly Themselves"

End-to-end differentiable simulation training: the policy network directly controls the drone's motion, and backpropagation runs through the physics simulator itself.

Lightweight design: the entire end-to-end network occupies only 2 MB of parameters and runs on a $150 computing platform (less than 5% of the cost of a GPU-based solution).

Efficient training: the network converges in only 2 hours on a single RTX 4090 GPU.

Figure 4 Low-cost computing platform

The overall training framework is shown in the figure below. The policy network is trained through interaction with the environment: at each time step, it receives a depth image as input and outputs control commands (thrust acceleration and yaw angle).

The differentiable physics simulator propagates the drone's point-mass motion according to the control commands and updates the state; from the new state, a new depth image is rendered and the cost function is evaluated.
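
The article omits the update equations, but a point-mass model driven by an acceleration command is compact enough to write out. Below is a minimal PyTorch sketch of one differentiable simulation step, assuming semi-implicit Euler integration and an illustrative first-order drag term; the time step and coefficients are placeholders, not the paper's values.

```python
import torch

def point_mass_step(p, v, a_cmd, dt=0.04, drag=0.1):
    """One differentiable step of a minimal point-mass model.

    p, v, a_cmd: (batch, 3) tensors of position, velocity, and commanded
    acceleration. Every operation is a plain tensor op, so gradients flow
    from the next state back into a_cmd (and hence the policy network).
    """
    a = a_cmd - drag * v       # illustrative drag term (assumption)
    v_next = v + a * dt        # semi-implicit Euler: velocity first
    p_next = p + v_next * dt   # then position, using the updated velocity
    return p_next, v_next
```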

The cost function consists of multiple sub-terms, including velocity tracking, obstacle avoidance, and smoothness. After a trajectory is collected, the gradient of the cost is computed through the chain rule (red arrows in the framework figure) and backpropagated, directly optimizing the policy parameters.
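
To make the gradient path concrete, here is a hedged sketch of a training rollout built on the `point_mass_step` sketch above: the policy maps each depth image to a command, the simulator steps forward, a cost combining velocity tracking, obstacle clearance, and smoothness accumulates, and one backward pass differentiates the entire trajectory. `render_depth` and `nearest_obstacle_distance` are hypothetical stand-ins for the renderer and scene geometry, and all weights are illustrative.

```python
import torch

def rollout_cost(policy, p, v, v_target, horizon=150, dt=0.04):
    """Accumulate a differentiable cost over one simulated trajectory."""
    cost, prev_a = 0.0, torch.zeros_like(v)
    for _ in range(horizon):
        depth = render_depth(p)                  # hypothetical depth renderer
        a_cmd = policy(depth)                    # stateless policy for brevity;
                                                 # a recurrent one also carries h
        p, v = point_mass_step(p, v, a_cmd, dt)  # differentiable physics step
        d = nearest_obstacle_distance(p)         # hypothetical geometry query
        cost = (cost
                + 1.0 * (v - v_target).pow(2).sum()       # velocity tracking
                + 5.0 * torch.relu(1.0 - d).pow(2).sum()  # obstacle proximity
                + 0.1 * (a_cmd - prev_a).pow(2).sum())    # smoothness
        prev_a = a_cmd
    return cost

# One optimization step: the whole rollout is differentiated at once.
# opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
# loss = rollout_cost(policy, p0, v0, v_target)
# opt.zero_grad(); loss.backward(); opt.step()
```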

The Training Secret of "Simplicity is Beauty"

  • Simple model: Use particle dynamics to replace complex aircraft modeling.
  • Simple image: Low-resolution rendering + explicit geometric modeling to improve simulation efficiency.
  • Simple network: Three convolutional layers + a GRU recurrent module, small and efficient (see the sketch after this list).
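
For a sense of scale, a three-convolution network plus a GRU over a 12×16 depth input fits comfortably within the stated 2 MB parameter budget. The layout below is an illustrative PyTorch sketch, not the paper's exact architecture; channel counts, hidden size, and the 4-dimensional output head (acceleration plus yaw) are assumptions.

```python
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Illustrative 3-conv + GRU policy for a 12x16 depth image."""
    def __init__(self, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 12x16 -> 6x8
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 6x8  -> 3x4
        )
        self.gru = nn.GRUCell(64 * 3 * 4, hidden)  # temporal memory across steps
        self.head = nn.Linear(hidden, 4)           # e.g. 3-axis accel + yaw rate

    def forward(self, depth, h):
        feat = self.cnn(depth).flatten(1)          # (batch, 768)
        h = self.gru(feat, h)
        return self.head(h), h

# ~0.37M parameters (~1.5 MB in fp32), under the 2 MB budget:
# sum(p.numel() for p in TinyPolicy().parameters())
```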

In addition, a local gradient-attenuation mechanism introduced during training effectively suppresses gradient explosion, allowing the drone's "focus on the present" maneuvering strategy to emerge naturally.
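
The article does not detail how the attenuation is implemented, but one standard way to decay gradients flowing backward through time, without altering the forward simulation, is the detach-mix trick sketched below (the decay factor is an assumed placeholder).

```python
def decay_gradient(x, alpha=0.8):
    """Forward pass returns x unchanged; backward pass scales the gradient
    through x by alpha, since only the alpha-weighted term stays on the
    autograd tape."""
    return alpha * x + (1.0 - alpha) * x.detach()

# Applied to the state inside the rollout, e.g. after each physics step:
#   p, v = point_mass_step(p, v, a_cmd, dt)
#   p, v = decay_gradient(p), decay_gradient(v)
# A gradient contribution from k steps in the future is attenuated by
# alpha**k, so the policy naturally "focuses on the present".
```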

Method Comparison: Reinforcement Learning, Imitation Learning, or Physics-Driven?

The current mainstream training paradigms for embodied intelligence fall into two main categories: reinforcement learning (RL) and imitation learning (IL). However, both have obvious bottlenecks in efficiency and scalability:

Reinforcement learning (such as PPO) mostly adopts a model-free strategy, completely ignoring the physical structure of the environment and the controlled object. Its policy optimization relies on sampling-based policy gradient estimation, which not only yields extremely low data efficiency but also seriously harms the convergence speed and stability of training.

Imitation learning (such as Agile [Loquercio et al. (2021)]) relies on large quantities of high-quality expert demonstrations as supervision signals. Such data are usually expensive to obtain and cannot cover all possible scenarios, which limits the model's generalization ability and scalability.

In contrast, the training framework based on a differentiable physical model proposed in this research effectively combines the advantages of physical priors and end-to-end learning.

By modeling the aircraft as a simple particle system and embedding the differentiable simulation process, gradients can be backpropagated directly to the policy network's parameters, yielding an efficient, stable, and physically consistent training process.

The research systematically compared the three methods (PPO, Agile, and the method in this research) in the experiment. The main conclusions are as follows:

Training efficiency: On the same hardware platform, this method converges in about 2 hours, far shorter than the training cycles required by PPO and Agile.

Data efficiency: Using only about 10% of the training data, this method outperforms a PPO + GRU scheme trained on the full dataset.

Convergence performance: During training, this method shows lower variance and faster performance improvement, and its convergence curve is significantly better than those of the two mainstream methods.

Deployment effect: In real or near-real obstacle avoidance tasks, this method's final success rate is significantly higher than that of PPO and Agile, showing stronger robustness and generalization.

This comparison not only verifies the effectiveness of the physics-driven approach but also shows that, given the right training method, strong intelligence does not necessarily require massive data and expensive trial and error.

Figure 5 With only about 10% of the training data, the proposed method outperforms the existing PPO + GRU approach, and its convergence behavior far exceeds existing methods

Figure 6 Comparison of obstacle avoidance success rates of model deployment

Looking Through the Fog: Exploration of Interpretability

Although end-to-end neural networks show strong performance in autonomous flight obstacle avoidance tasks, the opacity of their decision-making process is still a major obstacle in actual deployment.

To this end, the researchers used the Grad-CAM activation-map tool to visually analyze where the policy network's perceptual attention falls during flight.
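
Grad-CAM itself is a standard tool: gradients of a chosen output are backpropagated to the last convolutional feature maps, averaged per channel into weights, and combined into a coarse heatmap. Below is a minimal sketch for a policy like the `TinyPolicy` above; the choice of output scalar (here the norm of the control command) is an assumption.

```python
import torch
import torch.nn.functional as F

def grad_cam(policy, depth, h):
    """Coarse Grad-CAM heatmap over the final conv layer's feature maps."""
    feats = []
    # Capture the last Conv2d's activations with a forward hook.
    hook = policy.cnn[-2].register_forward_hook(lambda m, i, o: feats.append(o))
    cmd, _ = policy(depth, h)
    hook.remove()

    fmap = feats[0]                                 # (1, C, 3, 4)
    score = cmd.norm()                              # assumed scalar of interest
    grads = torch.autograd.grad(score, fmap)[0]     # d(score)/d(feature maps)
    weights = grads.mean(dim=(2, 3), keepdim=True)  # channel-wise importance
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))
    # Upsample the coarse map back to the input resolution for overlay.
    return F.interpolate(cam, size=depth.shape[-2:], mode="bilinear",
                         align_corners=False)
```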

Figure 7 shows the input depth maps (upper row) and their corresponding activation maps (lower row) in different flight states.

Figure 7 The activation maps show that the activated regions correlate strongly with the most dangerous obstacles

It can be observed that the network's high-response areas are concentrated near the obstacles in the flight path most likely to cause collisions, such as tree trunks and column edges. Although these "dangerous areas" received no explicit supervision during training, the network spontaneously learned to focus its attention on the regions of greatest potential risk. This result conveys two important messages:

First, the network not only succeeds at obstacle avoidance at the behavioral level; its perceptual strategy itself shows structural rationality and physical interpretability. Second, interpretability tools help us further understand the "hidden rules" behind end-to-end policies.

Thoughts and Inspirations: "Small Models" in the Era of Large Models

In an era when almost all technological paths are moving toward "bigness", foundation models, general intelligence, and scaling laws are gradually becoming articles of faith.

People talk about parameter counts, data volumes, and computing resources, as if the essence of intelligence were "the bigger, the better". "Small" has become a forgotten direction, even dismissed as "insignificant".

However, nature never follows the aesthetics of a single scale.

It has produced intelligent creatures like humans, with tens of billions of neurons, while also endowing tiny creatures such as fruit flies, ants, and bees with astonishing survival wisdom.

They rely on neither massive computing power nor high-precision sensors, yet they respond rapidly and precisely to a complex world. This "intelligence in the sense of survival" may be the dimension we most easily overlook in today's pursuit of "strong intelligence".

From this research, we can get three profound inspirations:

1) Small models have their own rationality and are even the entry point for understanding "large models"

The human cognitive system is complex and vast, but the first step toward understanding the human brain is not to model it directly; it is to return to organisms like the fruit fly, whose neural circuits are clearly mapped and whose structural mechanisms are simple. In a sense, fruit flies are not an exception in neuroscience; they are its starting point.