Learning Spatial Distribution of Long-Term Trackers Scores
This work addresses the re-detection issue in long-term tracking for computer vision applications, presenting an incremental improvement over existing fusion-tracker approaches.
The paper tackles the re-detection problem in long-term visual tracking by generalizing fusion strategies to an arbitrary number of baseline trackers, using a learning phase to capture how their outcomes correlate, even when no target is present. It achieves a recall of 0.738 on LTB-50 when learning from VOT-LT2022 and 0.619 in the reverse case, with results strongly competitive with the state of the art.
Long-term tracking is a hot topic in computer vision. Competitive models are presented every year, showing steady growth in performance, mainly measured with standardized protocols such as Visual Object Tracking (VOT) and Object Tracking Benchmark (OTB). Over the last few years, the fusion-tracker strategy has been applied to overcome the known re-detection problem, and it has turned out to be an important breakthrough. Following this approach, this work aims to generalize the fusion concept to an arbitrary number of trackers used as baselines in the pipeline, leveraging a learning phase to better understand how their outcomes correlate with each other, even when no target is present. The manuscript provides evidence for a conjecture of model and data independence, yielding a recall of 0.738 on the LTB-50 dataset when learning from VOT-LT2022, and 0.619 when the two datasets are swapped. In both cases, results are strongly competitive with the state of the art, and the recall ranks first.