Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking
This work addresses robust object tracking in RGB-thermal (RGB-T) video for applications such as surveillance. It may appear incremental because it combines existing appearance and motion cues, but it pairs them with a novel late fusion method.
The authors tackle robust RGB-T tracking by jointly modeling appearance and motion cues, and report significantly better performance than other state-of-the-art algorithms on three recent datasets.
In this study, we propose a novel RGB-T tracking framework that jointly models appearance and motion cues. First, to obtain a robust appearance model, we develop a novel late fusion method that infers fusion weight maps for the RGB and thermal (T) modalities. The fusion weights are predicted by offline-trained global and local multimodal fusion networks and are then used to linearly combine the response maps of the RGB and T modalities. Second, when the appearance cue is unreliable, we take motion cues, i.e., target and camera motions, into account to keep the tracker robust. We further propose a tracker switcher that switches flexibly between the appearance and motion trackers. Extensive results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
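The late fusion and switching steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the weight maps are passed in as plain arrays rather than predicted by the fusion networks, and the function names, the peak-response switching rule, and the `threshold` value are all assumptions made for illustration.

```python
import numpy as np


def fuse_responses(resp_rgb, resp_t, w_rgb, w_t):
    """Pixel-wise linear combination of the RGB and thermal response maps.

    All four arrays share the same shape. In the paper the weight maps come
    from offline-trained fusion networks; here they are simply given.
    """
    return w_rgb * resp_rgb + w_t * resp_t


def locate_peak(fused):
    """Return the (row, col) position of the maximum fused response."""
    idx = np.unravel_index(np.argmax(fused), fused.shape)
    return tuple(int(i) for i in idx)


def choose_tracker(fused, threshold=0.3):
    """Hypothetical switcher: trust appearance only when the peak is strong.

    The threshold rule is an assumption; the abstract does not specify how
    the switcher decides that the appearance cue is unreliable.
    """
    return "appearance" if float(fused.max()) >= threshold else "motion"


# Toy example: the two modalities disagree on the target position.
resp_rgb = np.zeros((5, 5))
resp_rgb[1, 1] = 1.0
resp_t = np.zeros((5, 5))
resp_t[3, 3] = 0.8

# Weight maps that favor the thermal modality everywhere.
w_rgb = np.full((5, 5), 0.25)
w_t = np.full((5, 5), 0.75)

fused = fuse_responses(resp_rgb, resp_t, w_rgb, w_t)
peak = locate_peak(fused)
mode = choose_tracker(fused)
```

In this toy case the thermal weights dominate, so the fused peak follows the thermal response, and the peak is strong enough for the sketch's switcher to keep the appearance tracker. A weak fused response would hand control to the motion tracker instead.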