Learning regression and verification networks for long-term visual tracking
This work addresses long-term visual tracking for practical systems. The contribution is incremental: it builds on existing tracking methods with a novel hybrid approach.
The authors tackle long-term visual tracking with a framework that combines an offline-trained regression network and an online-updated verification network to determine whether the target is present and to re-detect it when it is absent. The tracker achieves the best performance on the VOT2018 long-term challenge and state-of-the-art results on the OxUvA dataset.
Compared with short-term tracking, the long-term tracking task requires determining whether the tracked object is present or absent, and then either estimating an accurate bounding box if it is present or conducting image-wide re-detection if it is absent. Until now, few attempts have been made, although this task is much closer to the design of practical tracking systems. In this work, we propose a novel long-term tracking framework based on deep regression and verification networks. The offline-trained regression model is designed using object-aware feature fusion and region proposal networks to generate a series of candidates and estimate their similarity scores effectively. The verification network evaluates these candidates and outputs the optimal one as the tracked object, together with its classification score; this network is updated online to adapt to appearance variations based on new reliable observations. The similarity and classification scores are combined to obtain a final confidence value, based on which our tracker can determine the absence of the target accurately and conduct image-wide re-detection to capture the target successfully when it reappears. Extensive experiments show that our tracker achieves the best performance on the VOT2018 long-term challenge and state-of-the-art results on the OxUvA long-term dataset.
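
The decision logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two scores are combined or how the presence decision is made, so the convex-combination rule, the `weight` and `presence_threshold` values, and every name below (`Candidate`, `fuse_confidence`, `select_target`, `next_search_mode`) are assumptions made purely for illustration. Both scores are assumed to lie in [0, 1].

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Candidate:
    box: Box
    similarity: float      # from the regression network (assumed in [0, 1])
    classification: float  # from the verification network (assumed in [0, 1])


def fuse_confidence(similarity: float, classification: float,
                    weight: float = 0.5) -> float:
    """Hypothetical fusion rule: a convex combination of the two scores."""
    return weight * similarity + (1.0 - weight) * classification


def select_target(candidates: List[Candidate],
                  presence_threshold: float = 0.5
                  ) -> Tuple[Optional[Candidate], float]:
    """Pick the candidate the verifier ranks highest, then decide presence.

    Returns (candidate, confidence); the candidate is None when the target
    is judged absent. The threshold value is an assumption.
    """
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda c: c.classification)
    confidence = fuse_confidence(best.similarity, best.classification)
    if confidence < presence_threshold:
        return None, confidence
    return best, confidence


def next_search_mode(target: Optional[Candidate]) -> str:
    """Search locally while the target is present, image-wide once it is lost."""
    return "local" if target is not None else "image_wide"
```

In a tracking loop, `select_target` would run once per frame on the candidates produced for that frame, and `next_search_mode` would choose between a local search region and image-wide re-detection for the following frame.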