Multi-modal Visual Tracking: Review and Experimental Comparison
This paper provides a comprehensive review and experimental comparison of multi-modal visual tracking methods, which is useful for researchers and practitioners in computer vision working on robust object tracking.
This paper reviews multi-modal visual tracking algorithms, specifically visible-depth (RGB-D) and visible-thermal (RGB-T) tracking, and provides a unified taxonomy. It also conducts extensive experimental comparisons of trackers on five datasets: PTB, VOT19-RGBD, GTOT, RGBT234, and VOT19-RGBT.
Visual object tracking, a fundamental task in computer vision, has drawn much attention in recent years. To extend trackers to a wider range of applications, researchers have introduced information from multiple modalities to handle specific scenes, a promising research direction with emerging methods and benchmarks. To provide a thorough review of multi-modal tracking, we first summarize multi-modal tracking algorithms, especially visible-depth (RGB-D) tracking and visible-thermal (RGB-T) tracking, in a unified taxonomy from different aspects. Second, we provide a detailed description of the related benchmarks and challenges. Furthermore, we conduct extensive experiments to analyze the effectiveness of trackers on five datasets: PTB, VOT19-RGBD, GTOT, RGBT234, and VOT19-RGBT. Finally, we discuss various future directions from different perspectives, including model design and dataset construction, for further research.