CV · AI · MM · Jan 26, 2023

Self-Supervised RGB-T Tracking with Cross-Input Consistency

arXiv:2301.11274v1 · 1 citation · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient tracking in applications such as surveillance by reducing annotation costs, though the advance is incremental because it builds on existing Siamese correlation filter networks.

The paper tackles the problem of training RGB-T (RGB and thermal) trackers without annotated data by proposing a self-supervised method based on cross-input consistency, which outperforms seven supervised trackers on the GTOT dataset.

In this paper, we propose a self-supervised RGB-T tracking method. Different from existing deep RGB-T trackers that use a large number of annotated RGB-T image pairs for training, our RGB-T tracker is trained using unlabeled RGB-T video pairs in a self-supervised manner. We propose a novel cross-input consistency-based self-supervised training strategy based on the idea that tracking can be performed using different inputs. Specifically, we construct two distinct inputs using unlabeled RGB-T video pairs. We then track objects using these two inputs to generate results, based on which we construct our cross-input consistency loss. Meanwhile, we propose a reweighting strategy to make our loss function robust to low-quality training samples. We build our tracker on a Siamese correlation filter network. To the best of our knowledge, our tracker is the first self-supervised RGB-T tracker. Extensive experiments on two public RGB-T tracking benchmarks demonstrate that the proposed training strategy is effective. Remarkably, despite training only with a corpus of unlabeled RGB-T video pairs, our tracker outperforms seven supervised RGB-T trackers on the GTOT dataset.
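The abstract does not spell out how the two inputs are built, what form the consistency loss takes, or how samples are reweighted. The sketch below is therefore only an illustration of the general idea, not the authors' method. It assumes each of the two inputs yields a flattened tracker response map per training sample, measures disagreement as a mean squared difference, and uses a hypothetical reweighting rule that gives zero weight to the highest-error fraction of the batch. All function names and the `drop_fraction` parameter are invented for this example.

```python
from typing import List, Sequence


def consistency_errors(resp_a: Sequence[Sequence[float]],
                       resp_b: Sequence[Sequence[float]]) -> List[float]:
    """Per-sample mean squared difference between two sets of response maps.

    resp_a[i] and resp_b[i] are the (flattened) responses the tracker
    produced for sample i from the two different inputs.
    """
    if len(resp_a) != len(resp_b):
        raise ValueError("both inputs must cover the same samples")
    errors = []
    for a, b in zip(resp_a, resp_b):
        if len(a) != len(b) or len(a) == 0:
            raise ValueError("response maps must be non-empty and equal in size")
        errors.append(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
    return errors


def reweight(errors: Sequence[float], drop_fraction: float = 0.25) -> List[float]:
    """Hypothetical reweighting rule (an assumption, not from the paper).

    Samples with the largest consistency error are treated as low quality
    and get weight 0; the remaining samples share the weight equally.
    At least one sample is always kept.
    """
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one sample")
    n_drop = min(int(n * drop_fraction), n - 1)
    order = sorted(range(n), key=lambda i: errors[i])  # lowest error first
    kept = set(order[: n - n_drop])
    weight = 1.0 / len(kept)
    return [weight if i in kept else 0.0 for i in range(n)]


def cross_input_consistency_loss(resp_a: Sequence[Sequence[float]],
                                 resp_b: Sequence[Sequence[float]],
                                 drop_fraction: float = 0.25) -> float:
    """Weighted average of per-sample disagreement between the two inputs."""
    errors = consistency_errors(resp_a, resp_b)
    weights = reweight(errors, drop_fraction)
    return sum(w * e for w, e in zip(weights, errors))


# Toy batch: the two inputs agree on three samples and disagree on the fourth.
resp_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
resp_b = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]

loss_reweighted = cross_input_consistency_loss(resp_a, resp_b, drop_fraction=0.25)
loss_uniform = cross_input_consistency_loss(resp_a, resp_b, drop_fraction=0.0)
```

In this toy batch the single disagreeing sample is the one the reweighting discards, so the reweighted loss is lower than the uniform one. In a real training loop the response maps would come from a differentiable tracker and the loss would be computed in an autodiff framework rather than in plain Python.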

