Saliency-Enhanced Robust Visual Tracking
This work addresses robustness issues in visual tracking for applications such as surveillance and autonomous systems, but its contribution is incremental, since it builds on existing DCF trackers.
The paper tackles the problem of making visual object tracking more robust in challenging cases such as viewpoint changes. It incorporates high-level semantic information from deep salient object detection into discriminative correlation filter (DCF) based trackers, reporting consistent improvements over a baseline tracker and performance superior to state-of-the-art methods at a small computational cost (9.3 fps vs. 11 fps for the baseline).
Discriminative correlation filter (DCF) based trackers have shown considerable success in visual object tracking. These trackers often use low- to mid-level features such as histograms of oriented gradients (HOG) and mid-layer activations from convolutional neural networks (CNNs). We argue that adding semantically higher-level information to the tracked features may provide further robustness in challenging cases such as viewpoint changes. Deep salient object detection is one source of such high-level features, as it uses semantic information to highlight the important regions in a given scene. In this work, we propose an improvement over DCF based trackers that combines the filter responses computed from saliency with those computed from the other features. The combination applies an adaptive weight to the saliency-based filter responses, which is selected automatically according to the temporal consistency of visual saliency. We show that our method consistently improves a baseline DCF based tracker, especially in challenging cases, and outperforms state-of-the-art methods. Our improved tracker operates at 9.3 fps, a small computational overhead relative to the baseline, which operates at 11 fps.
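The adaptive combination described above can be illustrated with a minimal sketch. The abstract does not give the actual weighting rule, so everything here is an assumption made for illustration: temporal consistency is measured as the cosine similarity between consecutive saliency maps, the weight is that similarity scaled by a hypothetical `max_weight`, and the fused response is a simple weighted sum. The function names are likewise hypothetical and are not taken from the paper.

```python
from math import sqrt


def temporal_consistency(prev_sal, curr_sal):
    """Cosine similarity between two saliency maps (assumed consistency measure).

    For non-negative maps the result lies in [0, 1]; 0.0 is returned if
    either map is all zeros.
    """
    a = [v for row in prev_sal for v in row]
    b = [v for row in curr_sal for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def adaptive_weight(prev_sal, curr_sal, max_weight=0.5):
    """Scale a hypothetical maximum weight by the saliency consistency."""
    return max_weight * temporal_consistency(prev_sal, curr_sal)


def fuse_responses(resp_feat, resp_sal, weight):
    """Add the weighted saliency-based response to the feature-based response."""
    return [[f + weight * s for f, s in zip(row_f, row_s)]
            for row_f, row_s in zip(resp_feat, resp_sal)]


def peak_location(response):
    """Return the (row, col) of the maximum response, i.e. the predicted target position."""
    cells = [(r, c) for r, row in enumerate(response) for c in range(len(row))]
    return max(cells, key=lambda rc: response[rc[0]][rc[1]])
```

Under these assumptions, a saliency map that is stable across frames receives a large weight and can shift the response peak toward the salient region, while a saliency map that changes abruptly receives a weight near zero, so the tracker falls back to the feature-based response alone.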