RO · Mar 22

VisFly-Lab: Unified Differentiable Framework for First-Order Reinforcement Learning of Quadrotor Control

Fanxing Li, Fangyu Sun, Tianbao Zhang, Shuyu Wu, Dexin Zuo, Yufei Yan, Wenxian Yu, Danping Zou

arXiv:2603.21123 · 5.2 · h-index: 1

Predicted impact top 77% in RO · last 90 days · Originality: Incremental advance

AI Analysis

This work gives robotics researchers a basis for more systematic development and evaluation of quadrotor control methods. It is incremental in that it builds on existing differentiable simulation methods.

The paper tackles fragmentation in first-order reinforcement learning for quadrotor control by introducing a unified differentiable framework for multi-task control. It also proposes Amended Backpropagation Through Time (ABPT) to address two bottlenecks, limited state coverage and gradient bias. ABPT shows gains in tasks with partially non-differentiable rewards and competitive performance in fully differentiable settings.

First-order reinforcement learning with differentiable simulation is promising for quadrotor control, but practical progress remains fragmented across task-specific settings. To support more systematic development and evaluation, we present a unified differentiable framework for multi-task quadrotor control. The framework is wrapped, extensible, and equipped with deployment-oriented dynamics, providing a common interface across four representative tasks: hovering, tracking, landing, and racing. We also present a suite of first-order learning algorithms and identify two practical bottlenecks of standard first-order training: limited state coverage caused by horizon initialization and gradient bias caused by partially non-differentiable rewards. To address these issues, we propose Amended Backpropagation Through Time (ABPT), which combines differentiable rollout optimization, a value-based auxiliary objective, and visited-state initialization to improve training robustness. Experimental results show that ABPT yields the clearest gains in tasks with partially non-differentiable rewards, while remaining competitive in fully differentiable settings. We further provide proof-of-concept real-world deployments showing initial transferability of policies learned in the proposed framework beyond simulation.
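
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of backpropagation through time on a differentiable rollout, combined with a simple form of visited-state initialization. It is not the authors' implementation and does not use the VisFly-Lab API. The 1D altitude-error model, the linear policy gains `k1` and `k2`, the cost weights, the reset probability `p_visited`, and the function names `rollout` and `train` are all assumptions chosen for illustration. The value-based auxiliary objective is omitted.

```python
import random

def rollout(k, z0, v0, horizon=40, dt=0.05, c_u=0.01):
    """Simulate a toy 1D altitude-error model and backpropagate through time.

    Hypothetical setup: state (z, v) is altitude error and vertical speed,
    the policy is u = -(k1*z + k2*v), and the dynamics are a point mass
    with gravity already compensated. Returns (cost, gradient, visited states).
    """
    k1, k2 = k
    zs, vs, us = [z0], [v0], []
    cost = 0.0
    # Forward pass: differentiable rollout, storing the trajectory.
    for _ in range(horizon):
        z, v = zs[-1], vs[-1]
        u = -(k1 * z + k2 * v)
        v_next = v + dt * u
        z_next = z + dt * v_next
        cost += z_next * z_next + c_u * u * u
        us.append(u)
        vs.append(v_next)
        zs.append(z_next)
    # Backward pass: propagate adjoints from the last step to the first.
    gk1 = gk2 = 0.0
    gz = gv = 0.0  # adjoints of (z_{t+1}, v_{t+1}) coming from later steps
    for t in reversed(range(horizon)):
        z, v, u, z_next = zs[t], vs[t], us[t], zs[t + 1]
        gz_next = gz + 2.0 * z_next        # stage cost on z_{t+1}
        gv_next = gv + dt * gz_next        # z_{t+1} depends on v_{t+1}
        gu = dt * gv_next + 2.0 * c_u * u  # v_{t+1} and stage cost depend on u_t
        gk1 += -z * gu                     # u_t depends on k1 through z_t
        gk2 += -v * gu                     # u_t depends on k2 through v_t
        gz = gz_next - k1 * gu             # adjoint of z_t
        gv = gv_next - k2 * gu             # adjoint of v_t
    return cost, (gk1, gk2), list(zip(zs, vs))

def train(iters=200, lr=0.01, p_visited=0.5, seed=0):
    """First-order training loop with visited-state initialization (sketch)."""
    rng = random.Random(seed)
    k = [1.0, 1.0]
    buffer = []  # states visited in earlier rollouts
    for _ in range(iters):
        if buffer and rng.random() < p_visited:
            z0, v0 = rng.choice(buffer)    # start from a previously visited state
        else:
            z0, v0 = 1.0, 0.0              # nominal reset state
        _, (g1, g2), visited = rollout(k, z0, v0)
        k[0] -= lr * g1
        k[1] -= lr * g2
        buffer.extend(visited)
    return k
```

Calling `train()` returns the two learned gains. Starting some rollouts from buffered states means the short horizon is not always anchored at the same reset state, which is one plausible reading of how visited-state initialization widens state coverage; the paper itself should be consulted for the actual scheme.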
