CV · Dec 4, 2017

Long-Term Visual Object Tracking Benchmark

arXiv:1712.01358v4 · 88 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for better evaluation of visual object tracking algorithms in real-world, long-term scenarios. Its contribution is incremental, since it primarily provides a new dataset and benchmark rather than a novel method.

The authors introduce TLP, a new long-video dataset of 50 HD videos totaling over 400 minutes, with an average sequence duration more than 20 times that of existing generic tracking datasets, and benchmark 17 state-of-the-art trackers to assess long-term tracking performance.

We propose a new long video dataset (called Track Long and Prosper - TLP) and benchmark for single object tracking. The dataset consists of 50 HD videos from real-world scenarios, encompassing a duration of over 400 minutes (676K frames), making it more than 20-fold larger in average duration per sequence and more than 8-fold larger in total covered duration than existing generic datasets for visual tracking. The proposed dataset paves the way to suitably assess long-term tracking performance and to train better deep learning architectures (avoiding or reducing augmentation, which may not reflect real-world behaviour). We benchmark 17 state-of-the-art trackers on the dataset and rank them according to tracking accuracy and run-time speed. We further present a thorough qualitative and quantitative evaluation highlighting the importance of the long-term aspect of tracking. Our most interesting observations are (a) existing short-sequence benchmarks fail to bring out the inherent differences between tracking algorithms, which widen when tracking on long sequences, and (b) the accuracy of trackers drops abruptly on challenging long sequences, suggesting the need for research efforts in the direction of long-term tracking.
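As a rough sanity check on the scale quoted in the abstract, the headline figures can be turned into per-sequence averages. This is a minimal sketch, not part of the paper: the function name `per_sequence_stats` is hypothetical, and the inputs are the rounded values from the abstract ("over 400 minutes", "676K frames"), so the results are approximate. Because the true duration exceeds 400 minutes, the implied frame rate is an upper bound.

```python
# Rounded figures quoted in the abstract (assumptions: treated as exact here).
NUM_VIDEOS = 50
TOTAL_MINUTES = 400      # the abstract says "over 400 minutes", so a lower bound
TOTAL_FRAMES = 676_000   # "676K frames"


def per_sequence_stats(num_videos, total_minutes, total_frames):
    """Return (average minutes per video, average frames per video, implied fps)."""
    avg_minutes = total_minutes / num_videos
    avg_frames = total_frames / num_videos
    implied_fps = total_frames / (total_minutes * 60)
    return avg_minutes, avg_frames, implied_fps


avg_minutes, avg_frames, implied_fps = per_sequence_stats(
    NUM_VIDEOS, TOTAL_MINUTES, TOTAL_FRAMES
)
print(f"average duration per sequence: {avg_minutes:.1f} min")
print(f"average frames per sequence:   {avg_frames:.0f}")
print(f"implied frame rate (at most):  {implied_fps:.1f} fps")
```

Under these rounded inputs, each sequence averages at least 8 minutes and roughly 13,500 frames, which gives a sense of why per-frame failures that are rare on short clips can accumulate over a single TLP video.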

Code Implementations: 1 repo
