CVMay 6, 2021

Spatio-Temporal Matching for Siamese Visual Tracking

arXiv:2105.02408v1
AI Analysis

This work addresses the challenge of robust object tracking in computer vision, offering incremental improvements for applications like surveillance and autonomous systems.

The paper tackled the problem of ambiguous similarity matching in Siamese visual trackers by proposing a spatio-temporal matching process that leverages 4-D information, achieving state-of-the-art performance on benchmarks like OTB100, VOT2018, VOT2020, GOT-10k, and LaSOT.

Similarity matching is a core operation in Siamese trackers. Most Siamese trackers carry out similarity learning via cross correlation that originates from the image matching field. However, unlike 2-D image matching, the matching network in object tracking requires 4-D information (height, width, channel and time). Cross correlation neglects the information from channel and time dimensions, and thus produces ambiguous matching. This paper proposes a spatio-temporal matching process to thoroughly explore the capability of 4-D matching in space (height, width and channel) and time. In spatial matching, we introduce a space-variant channel-guided correlation (SVC-Corr) to recalibrate channel-wise feature responses for each spatial location, which can guide the generation of the target-aware matching features. In temporal matching, we investigate the time-domain context relations of the target and the background and develop an aberrance repressed module (ARM). By restricting the abrupt alteration in the interframe response maps, our ARM can clearly suppress aberrances and thus enables more robust and accurate object tracking. Furthermore, a novel anchor-free tracking framework is presented to accommodate these innovations. Experiments on challenging benchmarks including OTB100, VOT2018, VOT2020, GOT-10k, and LaSOT demonstrate the state-of-the-art performance of the proposed method.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes