RO · Apr 16

Simple but Stable, Fast and Safe: Achieve End-to-end Control by High-Fidelity Differentiable Simulation

Fanxing Li, Shengyang Wang, Yuxiang Huang, Fangyu Sun, Shuyu Wu, Yufei Yan, Danping Zou, Wenxian Yu

arXiv:2604.10548 · 43.8 · h-index: 2 · Has Code

Predicted impact: top 46% in RO · last 90 days · Originality: Incremental advance

AI Analysis

For quadrotor control, this work provides a simple, stable, fast, and safe end-to-end approach that avoids dynamic infeasibility issues at high speeds, demonstrating strong real-world performance.
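The high-speed infeasibility argument can be made concrete with the standard differential-flatness relations for a quadrotor. A point-mass plan's acceleration and jerk imply a collective thrust and a roll/pitch rate, and either one can exceed what the hardware delivers. The sketch below assumes a rigid quadrotor whose thrust acts along the body z-axis. The function names, the 1 kg mass and the actuator limits are hypothetical and do not come from the paper.

```python
import math

# Hypothetical helper names and limits, not from the paper.
G = 9.81  # gravitational acceleration, m/s^2 (world z-axis points up)


def thrust_and_tilt_rate(acc, jerk, mass):
    """Collective thrust (N) and roll/pitch rate (rad/s) implied by a
    point-mass acceleration and jerk, both world-frame (x, y, z) tuples."""
    # The rotors must produce the specific force a - g, with g = (0, 0, -G).
    f = (acc[0], acc[1], acc[2] + G)
    f_norm = math.sqrt(sum(c * c for c in f))
    if f_norm < 1e-9:
        raise ValueError("free fall: thrust direction is undefined")
    thrust = mass * f_norm
    # Unit thrust axis z_b; only jerk perpendicular to it rotates the body.
    z_b = tuple(c / f_norm for c in f)
    j_par = sum(jc * zc for jc, zc in zip(jerk, z_b))
    j_perp = tuple(jc - j_par * zc for jc, zc in zip(jerk, z_b))
    # |d z_b / dt| = |j_perp| / |a - g|, the magnitude of the roll/pitch rate.
    tilt_rate = math.sqrt(sum(c * c for c in j_perp)) / f_norm
    return thrust, tilt_rate


def is_feasible(acc, jerk, mass, max_thrust, max_tilt_rate):
    """True if the command stays inside the (hypothetical) actuator limits."""
    thrust, tilt_rate = thrust_and_tilt_rate(acc, jerk, mass)
    return thrust <= max_thrust and tilt_rate <= max_tilt_rate


# Hypothetical 1 kg vehicle with 30 N max thrust and a 6 rad/s tilt-rate limit.
print(is_feasible((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0, 30.0, 6.0))   # True (hover, 9.81 N)
print(is_feasible((30.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0, 30.0, 6.0))  # False (about 31.6 N)
```

A planner that treats the vehicle as a point mass never evaluates these two quantities. A policy that outputs body rates and thrust directly is bounded by them by construction.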

The paper presents an end-to-end policy for quadrotor obstacle avoidance that directly maps depth images to low-level body-rate commands using reinforcement learning via differentiable simulation. The method achieves the highest success rate and lowest jerk among the compared baselines, with zero-shot generalization to unseen outdoor environments at speeds up to 7.5 m/s.
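The methods are ranked by jerk, but this listing does not say how jerk is measured. One common smoothness metric, assumed here for illustration, is the mean squared jerk estimated from logged positions by finite differences. The helper names below are hypothetical.

```python
# Assumed metric; the paper's exact jerk definition is not given here.


def jerk_from_positions(positions, dt):
    """Forward third differences of uniformly sampled (x, y, z) positions.

    j_k ~= (p[k+3] - 3 p[k+2] + 3 p[k+1] - p[k]) / dt**3
    """
    jerks = []
    for k in range(len(positions) - 3):
        p0, p1, p2, p3 = positions[k:k + 4]
        jerks.append(tuple((d - 3 * c + 3 * b - a) / dt ** 3
                           for a, b, c, d in zip(p0, p1, p2, p3)))
    return jerks


def mean_squared_jerk(positions, dt):
    """Average squared jerk magnitude over the trajectory (m^2/s^6)."""
    jerks = jerk_from_positions(positions, dt)
    if not jerks:
        raise ValueError("need at least four samples")
    return sum(sum(c * c for c in j) for j in jerks) / len(jerks)


# x(t) = t^3 has constant jerk 6 m/s^3, so the metric is 36.
samples = [((0.5 * k) ** 3, 0.0, 0.0) for k in range(10)]
print(mean_squared_jerk(samples, 0.5))  # 36.0
```

On real flight logs, the positions would usually be filtered first. Third differences amplify measurement noise by roughly 1/dt^3.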

Obstacle avoidance is a fundamental vision-based task that enables quadrotors to perform advanced applications. When planning trajectories, existing optimization- and learning-based approaches typically model the quadrotor as a point mass. They issue path or velocity commands that an outer-loop controller then tracks. At high speeds, however, the planned trajectories can become dynamically infeasible in actual flight, exceeding the capacity of the controller. In this paper, we propose a novel end-to-end policy that directly maps depth images to low-level body-rate commands, trained by reinforcement learning via differentiable simulation. Training in a high-fidelity simulator whose parameters are calibrated by parameter identification significantly reduces the gaps between training, simulation, and the real world. The differentiable simulator provides accurate analytical gradients, which enables efficient training of the low-level policy without expert guidance. The policy uses a lightweight, minimal inference pipeline with no explicit mapping, backbone network, primitives, recurrent structure, or backend controller. It is also trained without a curriculum or privileged guidance. Because it sends low-level commands directly to the hardware controller, the method enables full flight-envelope control and avoids the dynamic-infeasibility issue. Experimental results demonstrate that the proposed approach achieves the highest success rate and the lowest jerk among state-of-the-art baselines across multiple benchmarks. The policy also generalizes strongly. It deploys zero-shot in unseen outdoor environments at speeds of up to 7.5 m/s and flies stably through a super-dense forest. This work is released at https://github.com/Fanxing-LI/avoidance.
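The training mechanism relies on the differentiable simulator. It returns accurate analytical gradients of a rollout loss with respect to the policy parameters, so the policy can be updated by backpropagating through the simulated trajectory. The sketch below shows that mechanism under heavy simplifying assumptions:

- A hypothetical 1D double integrator stands in for the identified quadrotor model.
- A two-gain linear controller stands in for the depth-image policy.
- The loss, tracking error plus a small effort penalty, is invented for illustration.

None of this is the authors' model, network, or loss. The reverse pass is written out by hand so every chain-rule step is visible.

```python
# Hypothetical toy problem, not the paper's quadrotor model, policy, or loss.


def rollout(theta, p0, v0, goal, dt, steps, lam):
    """Simulate a 1D double integrator under a linear policy; return loss and trajectory."""
    k_p, k_v = theta
    ps, vs, us = [p0], [v0], []
    loss = 0.0
    for _ in range(steps):
        p, v = ps[-1], vs[-1]
        u = -k_p * (p - goal) - k_v * v  # hypothetical two-gain "policy"
        p_next = p + dt * v              # explicit Euler position update
        v_next = v + dt * u              # explicit Euler velocity update
        loss += (p_next - goal) ** 2 + lam * u ** 2
        ps.append(p_next)
        vs.append(v_next)
        us.append(u)
    return loss, ps, vs, us


def loss_and_grad(theta, p0=0.0, v0=0.0, goal=1.0, dt=0.1, steps=50, lam=1e-3):
    """Loss and exact dLoss/dtheta via a hand-written reverse (adjoint) pass."""
    k_p, k_v = theta
    loss, ps, vs, us = rollout(theta, p0, v0, goal, dt, steps, lam)
    g_kp = g_kv = 0.0
    gp = gv = 0.0  # adjoints dL/dp_{t+1}, dL/dv_{t+1} from later steps
    for t in reversed(range(steps)):
        gp += 2.0 * (ps[t + 1] - goal)     # cost term that reads p_{t+1} directly
        gu = dt * gv + 2.0 * lam * us[t]   # dL/du_t (u_t enters v_{t+1} and the effort cost)
        g_kp -= gu * (ps[t] - goal)        # du_t/dk_p = -(p_t - goal)
        g_kv -= gu * vs[t]                 # du_t/dk_v = -v_t
        # Propagate adjoints back to p_t and v_t through both updates and u_t.
        gp, gv = gp - k_p * gu, dt * gp + gv - k_v * gu
    return loss, (g_kp, g_kv)


def train(theta=(0.5, 0.5), lr=1e-3, iters=100):
    """Plain gradient descent on the policy gains using the simulator gradient."""
    history = []
    for _ in range(iters):
        loss, (g_kp, g_kv) = loss_and_grad(theta)
        history.append(loss)
        theta = (theta[0] - lr * g_kp, theta[1] - lr * g_kv)
    return theta, history
```

Calling `train()` runs gradient descent on the two gains. The learning rate and iteration count are untuned assumptions. The structure is the point: a forward rollout, then a reverse pass that reuses the stored trajectory. That is roughly what an automatic-differentiation framework does for a full simulator and policy network.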

