CV · Dec 8, 2020

Multi-modal Visual Tracking: Review and Experimental Comparison

arXiv:2012.04176
AI Analysis

This paper provides a comprehensive review and experimental comparison of multi-modal visual tracking methods, which is useful for researchers and practitioners in computer vision working on robust object tracking.

This paper reviews multi-modal visual tracking algorithms, specifically visible-depth (RGB-D) and visible-thermal (RGB-T) tracking, and provides a unified taxonomy. It also conducts extensive experimental comparisons of trackers on five datasets: PTB, VOT19-RGBD, GTOT, RGBT234, and VOT19-RGBT.

Visual object tracking, as a fundamental task in computer vision, has drawn much attention in recent years. To extend trackers to a wider range of applications, researchers have introduced information from multiple modalities to handle specific scenes, a promising research direction with emerging methods and benchmarks. To provide a thorough review of multi-modal tracking, we first summarize the multi-modal tracking algorithms, especially visible-depth (RGB-D) tracking and visible-thermal (RGB-T) tracking, in a unified taxonomy from different aspects. Second, we provide a detailed description of the related benchmarks and challenges. Furthermore, we conduct extensive experiments to analyze the effectiveness of trackers on five datasets: PTB, VOT19-RGBD, GTOT, RGBT234, and VOT19-RGBT. Finally, we discuss various future directions from different perspectives, including model design and dataset construction for further research.
