CVMar 24, 2021

Benchmarking Deep Trackers on Aerial Videos

arXiv:2103.12924v114 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the need for benchmarking trackers in aerial scenarios, which is incremental as it applies existing methods to new data.

The paper tackled the problem of evaluating deep learning-based visual object trackers on aerial videos, finding that they perform significantly worse compared to ground-level videos due to challenges like smaller target size and camera motion.

In recent years, deep learning-based visual object trackers have achieved state-of-the-art performance on several visual object tracking benchmarks. However, most tracking benchmarks are focused on ground level videos, whereas aerial tracking presents a new set of challenges. In this paper, we compare ten trackers based on deep learning techniques on four aerial datasets. We choose top performing trackers utilizing different approaches, specifically tracking by detection, discriminative correlation filters, Siamese networks and reinforcement learning. In our experiments, we use a subset of OTB2015 dataset with aerial style videos; the UAV123 dataset without synthetic sequences; the UAV20L dataset, which contains 20 long sequences; and DTB70 dataset as our benchmark datasets. We compare the advantages and disadvantages of different trackers in different tracking situations encountered in aerial data. Our findings indicate that the trackers perform significantly worse in aerial datasets compared to standard ground level videos. We attribute this effect to smaller target size, camera motion, significant camera rotation with respect to the target, out of view movement, and clutter in the form of occlusions or similar looking distractors near tracked object.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes