CV · Dec 2, 2021

SwinTrack: A Simple and Strong Baseline for Transformer Tracking

arXiv:2112.00995v3 · 509 citations · Has Code
Originality: Incremental advance
AI Analysis

This work targets computer vision researchers who need stronger Transformer-based tracking methods. Its advance is incremental, since it builds on existing Siamese frameworks.

The paper tackles the problem of under-explored Transformer representation learning in tracking by proposing SwinTrack, a fully-attentional tracker within a Siamese framework, which sets a new record with a 0.713 SUC score on LaSOT and achieves state-of-the-art results on other benchmarks.

Recently, the Transformer has been widely explored in tracking and has shown state-of-the-art (SOTA) performance. However, existing efforts mainly focus on fusing and enhancing features generated by convolutional neural networks (CNNs). The potential of the Transformer in representation learning remains under-explored. In this paper, we aim to further unleash the power of the Transformer by proposing a simple yet efficient fully-attentional tracker, dubbed SwinTrack, within the classic Siamese framework. In particular, both representation learning and feature fusion in SwinTrack leverage the Transformer architecture, enabling better feature interactions for tracking than pure CNN or hybrid CNN-Transformer frameworks. In addition, to further enhance robustness, we present a novel motion token that embeds the historical target trajectory to improve tracking by providing temporal context. Our motion token is lightweight, with negligible computation, but brings clear gains. In our thorough experiments, SwinTrack exceeds existing approaches on multiple benchmarks. In particular, on the challenging LaSOT benchmark, SwinTrack sets a new record with a 0.713 SUC score. It also achieves SOTA results on other benchmarks. We expect SwinTrack to serve as a solid baseline for Transformer tracking and to facilitate future research. Our code and results are released at https://github.com/LitingLin/SwinTrack.
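
For readers unfamiliar with the SUC score quoted above, the following is a minimal sketch of how a success-plot AUC is commonly computed for single-object tracking. It is not taken from the SwinTrack repository or the official LaSOT toolkit. The function names (`iou`, `success_auc`), the (x, y, w, h) box format, the 21 evenly spaced thresholds, and the strict `>` comparison are assumptions based on one common convention, and official toolkits may differ in these details.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h). Assumed format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Area under the success plot (hypothetical helper).

    For each overlap threshold evenly spaced in [0, 1], compute the fraction
    of frames whose IoU is strictly greater than the threshold, then average
    those fractions over all thresholds.
    """
    if not pred_boxes or len(pred_boxes) != len(gt_boxes):
        raise ValueError("need equally sized, non-empty box sequences")
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy two-frame sequence: one exact match and one half-shifted prediction.
    predictions = [(0, 0, 10, 10), (5, 0, 10, 10)]
    ground_truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
    print(f"success AUC: {success_auc(predictions, ground_truth):.3f}")
```

Under this convention, a tracker whose boxes match the ground truth in every frame scores slightly below 1.0, because no IoU can strictly exceed the final threshold of 1.0.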

Code Implementations: 1 repo