Analysis and Adaptation of YOLOv4 for Object Detection in Aerial Images
This work addresses the need for accurate, computationally efficient object detection in autonomous aerial flight systems, although its contribution is incremental: it adapts an existing method to a specific domain.
The authors adapt YOLOv4 to object detection in aerial images, reaching a mean average precision (mAP) of 45.64% and an inference speed of 8.7 FPS on a Tesla K80 GPU, with high accuracy on truncated and occluded objects.
The rapid growth in the deployment of Unmanned Aerial Vehicles (UAVs) for computer vision tasks has opened numerous opportunities to make them more effective and valuable. Object detection in aerial images is challenging because objects vary widely in appearance, pose, and scale. Autonomous aerial flight systems, with their inherently limited memory and computational power, require detection algorithms that are both accurate and computationally efficient for real-time applications. This work adapts the popular YOLOv4 framework to predict objects and their locations in aerial images with high accuracy and inference speed. We used transfer learning to speed up the model's convergence on the VisDrone DET aerial object detection dataset. The trained model achieved a mean average precision (mAP) of 45.64% with an inference speed of up to 8.7 FPS on a Tesla K80 GPU, and it detected truncated and occluded objects with high accuracy. We also evaluated experimentally how varying the network resolution and the number of training epochs affects performance. A comparative study against several contemporary aerial object detectors showed that YOLOv4 performed better, suggesting that it is a more suitable detection algorithm for deployment on aerial platforms.
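To make the effect of "network resolution" concrete, the following is a minimal sketch of how the input size determines the detection grids in a YOLOv4-style detector. It assumes the standard YOLOv4 layout (three detection heads at strides 8, 16 and 32, three anchors per head, input size a multiple of 32) and the 10 VisDrone DET object categories; these details, the helper names `head_shapes` and `num_candidate_boxes`, and the example resolutions are illustrative assumptions, not settings taken from this paper.

```python
# Sketch: how a YOLOv4-style input resolution sets the detection-grid sizes.
# Assumptions (not stated in the abstract): three detection heads at strides
# 8, 16 and 32, three anchors per head, and 10 VisDrone DET object classes.

STRIDES = (8, 16, 32)
ANCHORS_PER_HEAD = 3
NUM_CLASSES = 10


def head_shapes(resolution: int) -> list[tuple[int, int, int]]:
    """Return (grid_h, grid_w, channels) for each detection head."""
    if resolution % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    # Each anchor predicts 4 box offsets, 1 objectness score and per-class scores.
    channels = ANCHORS_PER_HEAD * (5 + NUM_CLASSES)
    return [(resolution // s, resolution // s, channels) for s in STRIDES]


def num_candidate_boxes(resolution: int) -> int:
    """Total boxes predicted before confidence filtering and NMS."""
    return sum(h * w * ANCHORS_PER_HEAD for h, w, _ in head_shapes(resolution))


if __name__ == "__main__":
    for res in (416, 512, 608):  # illustrative square input sizes
        print(res, head_shapes(res), num_candidate_boxes(res))
```

Under these assumptions, raising the input from 416 to 608 pixels grows the finest grid from 52x52 to 76x76 cells and the candidate boxes from 10,647 to 22,743. Finer grids give small objects, common in drone imagery, more cells to fall into, at the cost of more computation per frame, which is one plausible reason resolution trades accuracy against inference speed.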