CV · AI · Jan 6, 2024

Real Time Human Detection by Unmanned Aerial Vehicles

arXiv:2401.03275v1 · 14 citations · h-index: 12 · 2022 International Symposium on iNnovative Informatics of Biskra (ISNIB)
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of real-time human detection for public security applications using UAVs, but it is incremental as it applies an existing YOLO model to TIR data.

The paper tackled human detection in thermal infrared (TIR) images and videos from unmanned aerial vehicles (UAVs), achieving an average precision of 72.5% at IOU=0.5 and a detection speed of 161 frames per second using the YOLOv7 model.

Object detection, which identifies instances of particular object categories in images, is one of the most important problems in computer vision and remote sensing. Multi-scenario thermal infrared (TIR) remote sensing images and videos captured by unmanned aerial vehicles (UAVs) are two crucial data sources for public security. Detecting objects in them remains difficult because targets are small, scenes are complex, resolution is low relative to visible-light video, and publicly available labeled datasets and trained models are scarce. This study proposes a UAV TIR object detection framework for images and videos. Ground-based TIR images and videos gathered with Forward-Looking Infrared (FLIR) cameras are used to train the "You Only Look Once" (YOLO) model, which is based on a CNN architecture. In the validation task, human detection reached an average precision of 72.5% at an Intersection over Union (IoU) threshold of 0.5 using the state-of-the-art YOLOv7 (YOLO version 7) model [1], at a detection speed of around 161 frames per second (FPS). The application demonstrates the usefulness of the YOLO architecture by evaluating the cross-detection performance of a YOLOv7 model on people in UAV TIR videos across the UAVs' various observation angles. This work supports the qualitative and quantitative evaluation of object detection in TIR images and videos using deep-learning models.
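To make the two reported metrics concrete, here is a minimal sketch of how an IoU threshold of 0.5 decides whether a predicted box counts as a correct detection, and what a frame rate implies for per-frame latency. It is an illustration under stated assumptions, not the paper's evaluation code: the `(x_min, y_min, x_max, y_max)` box format, the helper names `iou`, `is_true_positive`, and `ms_per_frame`, and the example coordinates are all hypothetical.

```python
from typing import Tuple

# Assumed box format (not taken from the paper): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou(pred, truth) >= threshold


def ms_per_frame(fps: float) -> float:
    """Convert a frame rate into average processing time per frame."""
    return 1000.0 / fps


# Hypothetical person box and a prediction shifted 10 px to the right.
truth = (10.0, 10.0, 50.0, 90.0)
pred = (20.0, 10.0, 60.0, 90.0)
print(round(iou(pred, truth), 2))       # 0.6
print(is_true_positive(pred, truth))    # True
print(round(ms_per_frame(161.0), 2))    # 6.21
```

Average precision at IoU = 0.5 is then computed from the precision-recall curve obtained by ranking such matched and unmatched predictions by confidence; that step is omitted here. The latency figure shows that about 161 FPS corresponds to roughly 6.2 ms per frame.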

