CV | May 8, 2022

Transformer Tracking with Cyclic Shifting Window Attention

arXiv:2205.03806v1 | 216 citations | h-index: 24 | Has Code
Originality: Highly original
AI Analysis

This work addresses visual object tracking for computer vision applications, offering a novel architectural improvement over existing transformer-based approaches.

The paper tackles visual object tracking by proposing a transformer architecture with multi-scale cyclic shifting window attention, which lifts attention from the pixel level to the window level to better preserve object integrity. The method achieves state-of-the-art performance on five challenging benchmarks: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k.

The transformer architecture has shown great strength in visual object tracking thanks to its effective attention mechanism. Existing transformer-based approaches apply pixel-to-pixel attention on flattened image features and therefore ignore the integrity of objects. In this paper, we propose a new transformer architecture with multi-scale cyclic shifting window attention for visual object tracking, elevating the attention from the pixel level to the window level. The cross-window multi-scale attention aggregates attention at different scales and generates the best fine-scale match for the target object. Furthermore, the cyclic shifting strategy improves accuracy by expanding the window samples with positional information, while saving substantial computation by removing redundant calculations. Extensive experiments demonstrate the superior performance of our method, which sets new state-of-the-art records on five challenging benchmarks: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k.
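
To make the window-level idea in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`window_partition`, `cyclic_shifts`, `window_attention`), the single-scale setup, the absence of learned projections, and the choice to apply a softmax over all shifted key windows are assumptions made purely for illustration. It shows a feature map being split into windows, each key window being expanded into cyclically shifted copies, and attention being computed between whole windows rather than individual pixels.

```python
import numpy as np


def window_partition(feat, w):
    """Split an (H, W, C) feature map into non-overlapping w x w windows.

    Returns an array of shape (num_windows, w, w, C).
    """
    H, W, C = feat.shape
    assert H % w == 0 and W % w == 0, "H and W must be divisible by w"
    x = feat.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w, w, C)


def cyclic_shifts(window):
    """Return every cyclic shift of one (w, w, C) window.

    Output shape is (w * w, w, w, C); index 0 is the unshifted window.
    """
    w = window.shape[0]
    shifted = [
        np.roll(window, (dy, dx), axis=(0, 1))
        for dy in range(w)
        for dx in range(w)
    ]
    return np.stack(shifted)


def window_attention(query_feat, key_feat, w):
    """Attention weights from query windows to shifted key windows.

    Illustrative assumption: each window is flattened into one token and
    scored with scaled dot-product attention, with no learned projections.
    """
    q = window_partition(query_feat, w)
    k = window_partition(key_feat, w)
    # Expand the key set with cyclically shifted copies of each window.
    k = np.concatenate([cyclic_shifts(win) for win in k], axis=0)
    q = q.reshape(q.shape[0], -1)
    k = k.reshape(k.shape[0], -1)
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    search = rng.standard_normal((4, 4, 3))    # hypothetical search features
    template = rng.standard_normal((4, 4, 3))  # hypothetical template features
    attn = window_attention(search, template, w=2)
    print(attn.shape)  # (query windows, key windows * shifts per window)
```

With a 4 x 4 map and `w=2`, there are 4 query windows and 4 key windows with 4 shifts each, so each query window attends over 16 candidate key windows. The multi-scale aspect described in the abstract would correspond to repeating this for several window sizes, which this sketch omits.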

Code Implementations | 1 repo
