CV · Mar 7

FreeFly-Thinking: Aligning Chain-of-Thought Reasoning with Continuous UAV Navigation

Jiaxu Zhou, Shaobo Wang, Zhiyuan Yang, Zhenjun Yu, Tao Li

arXiv:2603.07181v1

Predicted impact: top 34% in CV (last 90 days) · Originality: Incremental advance

AI Analysis

This work tackles the problem of enabling UAVs to navigate complex outdoor environments using natural language instructions, which is a significant challenge for autonomous aerial systems.

This paper addresses Vision-Language Navigation for UAVs in complex outdoor environments, where existing models lack explicit reasoning. The authors introduce FreeFly-thinking, an end-to-end framework that translates egocentric images and language instructions into navigation actions, demonstrating strong performance on unseen test data.

Vision-Language Navigation (VLN) aims to enable agents to understand natural language instructions and carry out appropriate navigation actions in real-world environments. Most existing work focuses on indoor settings, and little research addresses complex outdoor scenes. Current UAV Vision-and-Language Navigation models typically act as black boxes without explicit reasoning. We introduce FreeFly-thinking, an end-to-end VLN framework that converts the UAV agent's egocentric images and language instructions into a sequence of actions, inspired by the urban-architecture environment proposed by OpenFly. We first construct a UAV navigation dataset and then annotate it with natural-language chain-of-thought reasoning. We adopt a two-stage training strategy: supervised fine-tuning followed by reinforcement fine-tuning. Experiments on the unseen test set demonstrate strong performance, showing robustness and efficiency in UAV navigation.
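The abstract does not specify how the model's chain of thought is tied to an action, or which reward drives reinforcement fine-tuning. Purely as an illustration, the sketch below assumes a hypothetical output format (reasoning inside `<think>` tags, one discrete action inside `<action>` tags), a hypothetical action set, and a simple rule-based reward (a format bonus plus an action-match bonus against an expert action). The `group_advantages` helper shows GRPO-style group normalization, which is one common choice for this kind of fine-tuning; the paper may use a different algorithm. None of these names come from the paper or from OpenFly.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical discrete action set (an assumption, not the paper's or OpenFly's).
ACTIONS = {"forward", "turn_left", "turn_right", "ascend", "descend", "stop"}

# Hypothetical output format: free-form reasoning, then exactly one action token.
OUTPUT_PATTERN = re.compile(
    r"^\s*<think>(?P<think>.+?)</think>\s*<action>(?P<action>\w+)</action>\s*$",
    re.DOTALL,
)


@dataclass
class ParsedStep:
    reasoning: str
    action: str


def parse_output(text: str) -> Optional[ParsedStep]:
    """Split a completion into its chain of thought and action; None if malformed."""
    match = OUTPUT_PATTERN.match(text)
    if match is None or match.group("action") not in ACTIONS:
        return None
    return ParsedStep(match.group("think").strip(), match.group("action"))


def reward(text: str, expert_action: str) -> float:
    """Assumed rule-based reward: 0.5 for a well-formed output, +1.0 if the action matches."""
    step = parse_output(text)
    if step is None:
        return 0.0
    return 0.5 + (1.0 if step.action == expert_action else 0.0)


def group_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages: standardize rewards within one group of sampled completions."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    if std == 0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Usage: score three sampled completions for one step whose expert action is "forward".
samples = [
    "<think>The red tower is ahead; keep flying toward it.</think><action>forward</action>",
    "<think>Turn to face the road.</think><action>turn_left</action>",
    "no tags here",
]
rewards = [reward(s, "forward") for s in samples]
print(rewards)  # [1.5, 0.5, 0.0]
print(group_advantages(rewards))
```

Standardizing within a group means a completion is rewarded only relative to its siblings for the same observation and instruction, so a malformed or wrong action is pushed down even when every sample in the group is mediocre.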

