An End-to-end Flight Control Network for High-speed UAV Obstacle Avoidance based on Event-Depth Fusion
For UAVs flying at high speed through complex environments with mixed static and dynamic obstacles, this work offers a more reliable event-depth fusion approach that, in simulation, raises success rates by 10-20% over single-modality and unidirectional fusion models.
The paper proposes an end-to-end flight control network that fuses depth images and event data via bidirectional cross-attention for high-speed UAV obstacle avoidance. The method achieves a 70-80% success rate at 17 m/s in simulation, outperforming single-modality and unidirectional fusion models by 10-20%.
Achieving safe, high-speed autonomous flight in complex environments with static, dynamic, or mixed obstacles remains challenging because any single perception modality gives an incomplete view of the scene. Depth cameras are effective for static objects but suffer from motion blur at high speeds. Conversely, event cameras excel at capturing rapid motion but struggle to perceive static scenes. To exploit the complementary strengths of both sensors, we propose an end-to-end flight control network that achieves feature-level fusion of depth images and event data through a bidirectional cross-attention module. The network is trained via imitation learning, which relies on high-quality supervision. To provide that supervision, we design an efficient expert planner using Spherical Principal Search (SPS). This planner reduces computational complexity from $O(n^2)$ to $O(n)$ while generating smoother trajectories, and it achieves a success rate of over 80% at 17 m/s, nearly 20% higher than traditional planners. Simulation experiments show that our method attains a 70-80% success rate at 17 m/s across varied scenes, surpassing single-modality and unidirectional fusion models by 10-20%. These results demonstrate that bidirectional fusion effectively integrates event and depth information, enabling more reliable obstacle avoidance in complex environments with both static and dynamic objects.
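To make the idea of bidirectional cross-attention concrete, the following is a minimal, dependency-free sketch of the general technique, not the network described above. It assumes each modality has already been encoded into a short sequence of feature tokens. The single attention head, the residual additions, the token counts, the feature size, and every name (`cross_attention`, `bidirectional_fusion`, `depth_attends_event`, `event_attends_depth`) are illustrative assumptions; the abstract does not specify the encoders, the number of heads, normalization, or how the fused features are mapped to control commands.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: `queries` attend to `context`."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights


def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_fusion(depth_tokens, event_tokens, params):
    """Run cross-attention in both directions (hypothetical layout).

    Depth tokens query the event tokens, and event tokens query the depth
    tokens. A unidirectional variant would keep only one of the two calls.
    """
    depth_update, _ = cross_attention(depth_tokens, event_tokens,
                                      *params["depth_attends_event"])
    event_update, _ = cross_attention(event_tokens, depth_tokens,
                                      *params["event_attends_depth"])
    return add(depth_tokens, depth_update), add(event_tokens, event_update)


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or encoder output."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4  # assumed feature size, chosen only to keep the example small
params = {
    "depth_attends_event": [random_matrix(dim, dim, rng) for _ in range(3)],
    "event_attends_depth": [random_matrix(dim, dim, rng) for _ in range(3)],
}
depth_tokens = random_matrix(6, dim, rng)  # 6 depth tokens (assumed)
event_tokens = random_matrix(5, dim, rng)  # 5 event tokens (assumed)

depth_fused, event_fused = bidirectional_fusion(depth_tokens, event_tokens, params)
print(len(depth_fused), len(depth_fused[0]), len(event_fused), len(event_fused[0]))
# prints: 6 4 5 4
```

Each modality keeps its own token count after fusion, so either stream can still be processed separately downstream. In this sketch, dropping one of the two `cross_attention` calls gives a unidirectional variant, the kind of baseline the abstract compares against.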