Tracking the Untrackable
This work addresses a specific robustness issue in visual object tracking, relevant to applications such as surveillance and autonomous systems, and represents an incremental improvement over existing trackers.
The paper tackles visual object tracking failures during short-term occlusions by introducing HAllucinating Features to Track (HAFT), a model that forecasts the visual feature embeddings of future frames to anticipate the target's trajectory. It reports promising results on OTB100, VOT2018, LaSOT, TrackingNet, and UAV123.
Although short-term full occlusion is rare in visual object tracking, most trackers fail when it occurs. Humans, however, can still keep up with the target by anticipating its trajectory even while the target is invisible. Recent psychology research has also shown that humans build mental images of the future. Inspired by this, we present the HAllucinating Features to Track (HAFT) model, which forecasts the visual feature embeddings of future frames. The anticipated future frames focus on the movement of the target while hallucinating its occluded parts. Jointly tracking on the hallucinated and real features improves the robustness of the tracker even when the target is heavily occluded. Through extensive experimental evaluation, we achieve promising results on multiple datasets: OTB100, VOT2018, LaSOT, TrackingNet, and UAV123.
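To make the idea of joint tracking on hallucinated and real features concrete, the following is a minimal toy sketch and not the HAFT architecture. It rests on several assumptions that do not come from the paper: a constant-velocity shift of the last feature map stands in for the learned feature forecaster, plain cross-correlation stands in for the tracking head, and the two response maps are fused with a fixed weight. All function names (`response_map`, `hallucinate_next`, `joint_track`) and the `weight` parameter are hypothetical.

```python
import numpy as np


def response_map(template, search):
    """Cross-correlate a (C, h, w) template over a (C, H, W) feature map."""
    _, h, w = template.shape
    rows = search.shape[1] - h + 1
    cols = search.shape[2] - w + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return out


def peak(resp):
    """Return the (row, col) of the strongest response."""
    return tuple(int(v) for v in np.unravel_index(np.argmax(resp), resp.shape))


def hallucinate_next(template, prev_feat, last_feat):
    """Assumed stand-in for a learned forecaster: shift the last feature map
    by the displacement observed between the two previous frames.
    Note that np.roll wraps around at the borders."""
    r0, c0 = peak(response_map(template, prev_feat))
    r1, c1 = peak(response_map(template, last_feat))
    return np.roll(last_feat, shift=(r1 - r0, c1 - c0), axis=(1, 2))


def joint_track(template, real_feat, hallucinated_feat, weight=0.5):
    """Fuse responses on real and hallucinated features (assumed fixed weight)."""
    fused = ((1.0 - weight) * response_map(template, real_feat)
             + weight * response_map(template, hallucinated_feat))
    return peak(fused)
```

In this toy setting, if the target moves one cell diagonally per frame and then disappears entirely from the real features, the real response map is flat and carries no location cue, while the fused response still peaks at the extrapolated position. A learned forecaster, as described in the abstract, would replace the hand-written shift.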