Long Range Object-Level Monocular Depth Estimation for UAVs
This work addresses the need for accurate object depth estimation on UAVs to improve autonomous flight safety. Its contribution is incremental in that it extends existing object detection frameworks rather than introducing a new architecture.
The paper tackles the problem of estimating object depth from monocular images for UAV collision avoidance. It proposes extensions to existing object detection methods, namely Sigmoid and ReLU-like encodings for depth regression and a Soft-Argmax function in the loss for depth classification, and reports improved performance over state-of-the-art approaches on the Amazon Airborne Object Tracking dataset.
Computer vision-based object detection is a key modality for advanced Detect-And-Avoid systems that allow for autonomous flight missions of UAVs. While standard object detection frameworks do not predict the actual depth of an object, this information is crucial to avoid collisions. In this paper, we propose several novel extensions to state-of-the-art methods for monocular object detection from images at long range. First, we propose Sigmoid and ReLU-like encodings when modeling depth estimation as a regression task. Second, we frame depth estimation as a classification problem and introduce a Soft-Argmax function in the calculation of the training loss. As an example, we apply the extensions to the YOLOX object detection framework. We evaluate the performance using the Amazon Airborne Object Tracking dataset. In addition, we introduce the Fitness score as a new metric that jointly assesses both object detection and depth estimation performance. Our results show that the proposed methods outperform state-of-the-art approaches with respect to both existing metrics and the proposed one.
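To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a Sigmoid encoding that maps a raw network output to a depth in (0, d_max), and a Soft-Argmax that turns per-bin classification logits into an expected depth so that a distance-based loss can be applied. The function names, the equal-width binning, the L1 loss, and the value of d_max are all illustrative assumptions; the abstract does not specify them.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_depth(raw_output, d_max):
    """Assumed Sigmoid encoding: squash a raw regression output into (0, d_max)."""
    return d_max * sigmoid(raw_output)


def make_bin_centers(d_min, d_max, num_bins):
    """Centers of equal-width depth bins over [d_min, d_max] (hypothetical binning)."""
    width = (d_max - d_min) / num_bins
    return [d_min + (i + 0.5) * width for i in range(num_bins)]


def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmax_depth(logits, bin_centers):
    """Soft-Argmax: probability-weighted mean of the bin centers.

    Unlike a hard argmax, this is differentiable with respect to the logits.
    """
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, bin_centers))


def soft_argmax_l1_loss(logits, bin_centers, target_depth):
    """Illustrative loss: absolute error between expected and true depth."""
    return abs(soft_argmax_depth(logits, bin_centers) - target_depth)


if __name__ == "__main__":
    centers = make_bin_centers(0.0, 100.0, 4)  # hypothetical 4 bins over 0..100 m
    print(centers)
    print(soft_argmax_depth([0.0, 0.0, 0.0, 0.0], centers))
    print(soft_argmax_l1_loss([0.0, 0.0, 50.0, 0.0], centers, 60.0))
    print(sigmoid_depth(0.0, d_max=1000.0))
```

A loss built on the Soft-Argmax output penalizes a prediction in proportion to how far its expected depth lies from the target, whereas a plain cross-entropy loss treats every wrong bin as equally wrong. This is one plausible motivation for such a term; the paper itself should be consulted for the exact formulation.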