RO · Apr 16 · Code
Simple but Stable, Fast and Safe: Achieve End-to-end Control by High-Fidelity Differentiable Simulation
Fanxing Li, Shengyang Wang, Yuxiang Huang et al.
Obstacle avoidance is a fundamental vision-based task that quadrotors must master before they can perform advanced applications. When planning a trajectory, existing optimization-based and learning-based approaches typically treat the quadrotor as a point mass, issuing path or velocity commands that an outer-loop controller then tracks. At high speeds, however, planned trajectories sometimes become dynamically infeasible in actual flight, exceeding the capacity of the controller. In this paper, we propose a novel end-to-end policy that directly maps depth images to low-level body-rate commands, trained by reinforcement learning via differentiable simulation. The high-fidelity simulation used in training, after parameter identification, significantly reduces the gaps between training, simulation, and the real world. The analytical gradients provided by differentiable simulation are accurate, allowing the low-level policy to be trained efficiently without expert guidance. The policy employs a lightweight, minimal inference pipeline that runs without explicit mapping, backbone networks, primitives, recurrent structures, or backend controllers, and it is trained without curriculum or privileged guidance. By sending low-level commands directly to the hardware controller, the method enables control over the full flight envelope and avoids the dynamic-infeasibility issue. Experimental results demonstrate that the proposed approach achieves the highest success rate and the lowest jerk among state-of-the-art baselines across multiple benchmarks. The policy also exhibits strong generalization, deploying zero-shot in unseen outdoor environments, reaching speeds of up to 7.5 m/s, and flying stably in an extremely dense forest. This work is released at https://github.com/Fanxing-LI/avoidance.
RO · Jan 24, 2025 · Code
ABPT: Amended Backpropagation through Time with Partially Differentiable Rewards
Fanxing Li, Fangyu Sun, Tianbao Zhang et al.
Quadrotor control policies can be trained to high performance by using the exact gradients of the rewards to optimize policy parameters directly via backpropagation-through-time (BPTT). However, designing a fully differentiable reward architecture is often challenging, and partially differentiable rewards result in biased gradient propagation that degrades training performance. To overcome this limitation, we propose Amended Backpropagation-through-Time (ABPT), a novel approach that mitigates gradient bias while preserving the training efficiency of BPTT. ABPT combines 0-step and N-step returns, reducing the bias by leveraging value gradients from the learned Q-value function. Additionally, it adopts entropy regularization and state initialization mechanisms to encourage exploration during training. We evaluate ABPT on four representative quadrotor flight tasks in both the real world and simulation. Experimental results demonstrate that ABPT converges significantly faster and achieves higher final rewards than existing learning algorithms, particularly in tasks involving partially differentiable rewards. The code will be released at http://github.com/Fanxing-LI/ABPT.
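To make the idea of combining 0-step and N-step returns concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the equal weighting of the two terms, and the use of scalar values in place of differentiable tensors are all assumptions made for illustration. The abstract does not state how the two returns are weighted.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Discounted sum of N rollout rewards plus a discounted bootstrap value.

    `bootstrap_value` stands in for a learned critic's estimate at the
    state reached after the N steps (hypothetical name).
    """
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total + (gamma ** len(rewards)) * bootstrap_value


def combined_return(
    rewards: Sequence[float],
    q_start: float,
    bootstrap_value: float,
    gamma: float = 0.99,
) -> float:
    """Blend a 0-step return (critic only) with an N-step return (rollout).

    The 0.5/0.5 weighting is an assumption for this sketch, not a value
    taken from the paper.
    """
    zero_step = q_start  # 0-step return: the critic's estimate at the start state
    n_step = n_step_return(rewards, bootstrap_value, gamma)
    return 0.5 * (zero_step + n_step)


if __name__ == "__main__":
    print(combined_return([1.0, 1.0], q_start=2.0, bootstrap_value=10.0, gamma=0.5))
```

In a first-order method, both terms would be differentiated with respect to the policy parameters. The N-step term carries gradients through the simulated rollout, while the 0-step term carries gradients only through the critic, which is one way a learned value function can contribute a gradient signal where the reward itself is not differentiable.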
RO · Apr 14
E2E-Fly: An Integrated Training-to-Deployment System for End-to-End Quadrotor Autonomy
Fangyu Sun, Fanxing Li, Linzuo Zhang et al.
Training and transferring learning-based policies for quadrotors from simulation to reality remains challenging due to inefficient visual rendering, physical modeling inaccuracies, unmodeled sensor discrepancies, and the absence of a unified platform integrating differentiable physics learning into end-to-end training. While recent work has demonstrated various end-to-end quadrotor control tasks, few systems provide a systematic, zero-shot transfer pipeline, which hinders reproducibility and real-world deployment. To bridge this gap, we introduce E2E-Fly, an integrated framework featuring an agile quadrotor platform coupled with a full-stack training, validation, and deployment workflow. The training framework incorporates a high-performance simulator with support for differentiable physics learning and reinforcement learning, alongside structured reward design tailored to common quadrotor tasks. We further introduce a two-stage validation strategy using sim-to-sim transfer and hardware-in-the-loop testing, and we deploy policies onto two physical quadrotor platforms via a dedicated low-level control interface and a comprehensive sim-to-real alignment methodology encompassing system identification, domain randomization, latency compensation, and noise modeling. To the best of our knowledge, this is the first work to systematically unify differentiable physics learning with training, validation, and real-world deployment for quadrotors. Finally, we demonstrate the effectiveness of our framework by training six end-to-end control tasks and deploying them in the real world.
RO · Mar 22
StableTracker: Learning to Stably Track Target via Differentiable Simulation
Fanxing Li, Shengyang Wang, Fangyu Sun et al.
Existing FPV object tracking methods rely heavily on handcrafted modular pipelines, which incur high onboard computation and cumulative errors. While learning-based approaches have mitigated computational delays, most still generate only high-level trajectories (position and yaw). This loose coupling with a separate controller sacrifices precise attitude control: even when the target is localized precisely, accurate target estimation does not ensure that the body-fixed camera stays oriented toward the target, so tracking is likely to degrade and the target may be lost when it maneuvers aggressively. To address these challenges, we present StableTracker, a learning-based control policy that enables quadrotors to robustly follow a moving target from arbitrary viewpoints. The policy is trained using backpropagation-through-time via differentiable simulation, allowing the quadrotor to keep a fixed relative distance while maintaining the target at the center of the visual field in both horizontal and vertical directions, thereby functioning as an autonomous aerial camera. We compare StableTracker against state-of-the-art traditional algorithms and learning baselines. Simulation results demonstrate superior accuracy, stability, and generalization across varying safe distances, trajectories, and target velocities. Furthermore, real-world experiments on a quadrotor with an onboard computer validate the practicality of the proposed approach.
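The tracking objective described above (hold a fixed relative distance while keeping the target centered horizontally and vertically) can be illustrated with a small sketch. Everything here is hypothetical: the function name, the camera-frame convention (z along the optical axis, x to the right, y down), the quadratic form, and the weights are assumptions for illustration, not the loss used in the paper.

```python
import math
from typing import Tuple


def tracking_loss(
    target_in_camera: Tuple[float, float, float],
    desired_distance: float,
    w_dist: float = 1.0,
    w_angle: float = 1.0,
) -> float:
    """Penalize distance error and angular offset from the optical axis.

    `target_in_camera` is the target position in an assumed camera frame
    with z pointing forward along the optical axis.
    """
    x, y, z = target_in_camera
    distance = math.sqrt(x * x + y * y + z * z)
    # Horizontal and vertical bearing of the target relative to the optical axis.
    horizontal_angle = math.atan2(x, z)
    vertical_angle = math.atan2(y, z)
    distance_term = (distance - desired_distance) ** 2
    centering_term = horizontal_angle ** 2 + vertical_angle ** 2
    return w_dist * distance_term + w_angle * centering_term


if __name__ == "__main__":
    # Target straight ahead at the desired distance: both terms vanish.
    print(tracking_loss((0.0, 0.0, 2.0), desired_distance=2.0))
    # Target offset to the right: the centering term becomes positive.
    print(tracking_loss((1.0, 0.0, 2.0), desired_distance=2.0))
```

This version uses plain floats and only evaluates the objective. In a differentiable-simulation setting, the same expression would be written with an automatic-differentiation library so that its gradient can flow back through the simulated dynamics to the policy.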
RO · Mar 22
VisFly-Lab: Unified Differentiable Framework for First-Order Reinforcement Learning of Quadrotor Control
Fanxing Li, Fangyu Sun, Tianbao Zhang et al.
First-order reinforcement learning with differentiable simulation is promising for quadrotor control, but practical progress remains fragmented across task-specific settings. To support more systematic development and evaluation, we present a unified differentiable framework for multi-task quadrotor control. The framework is wrapped, extensible, and equipped with deployment-oriented dynamics, providing a common interface across four representative tasks: hovering, tracking, landing, and racing. We also present a suite of first-order learning algorithms and identify two practical bottlenecks of standard first-order training: limited state coverage caused by horizon initialization, and gradient bias caused by partially non-differentiable rewards. To address these issues, we propose Amended Backpropagation Through Time (ABPT), which combines differentiable rollout optimization, a value-based auxiliary objective, and visited-state initialization to improve training robustness. Experimental results show that ABPT yields the clearest gains in tasks with partially non-differentiable rewards, while remaining competitive in fully differentiable settings. We further provide proof-of-concept real-world deployments showing initial transferability of policies learned in the proposed framework beyond simulation.
RO · Mar 9
Vector Field Augmented Differentiable Policy Learning for Vision-Based Drone Racing
Yang Su, Feng Yu, Yu Hu et al.
Autonomous drone racing in complex environments requires agile, high-speed flight while maintaining reliable obstacle avoidance. Differentiable-physics-based policy learning has recently demonstrated high sample efficiency and remarkable performance across various tasks, including agile drone flight and quadruped locomotion. However, applying such methods to drone racing remains difficult, as key objectives such as gate traversal are inherently hard to express as smooth, differentiable losses. To address these challenges, we propose DiffRacing, a novel vector-field-augmented differentiable policy learning framework. DiffRacing integrates differentiable losses and vector fields into the training process to provide continuous and stable gradient signals, balancing obstacle avoidance and high-speed gate traversal. In addition, a differentiable Delta Action Model compensates for dynamics mismatch, enabling efficient sim-to-real transfer without explicit system identification. Extensive simulation and real-world experiments show that DiffRacing achieves superior sample efficiency, faster convergence, and robust flight performance, indicating that vector fields can augment traditional gradient-based policy learning with a task-specific geometric prior.