CV, AI, MM | Oct 27, 2022

ProContEXT: Exploring Progressive Context Transformer for Tracking

CMU, UW
arXiv:2210.15511v4 | 55 citations | h-index: 27
Originality: Highly original
AI Analysis

This paper addresses tracking failures in fast-changing and crowded scenes. It is an incremental improvement that applies a novel method to a known bottleneck in visual object tracking.

The paper tackles the problem of visual object tracking failing in dynamic scenes by proposing ProContEXT, a transformer-based tracker that uses progressive context encoding to exploit spatial and temporal contexts, achieving state-of-the-art performance on benchmarks like GOT-10k and TrackingNet.

Existing Visual Object Tracking (VOT) methods take only the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames. To address this, we revamp the tracking framework with the Progressive Context Encoding Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories. Specifically, ProContEXT leverages a context-aware self-attention module to encode the spatial and temporal context, refining and updating the multi-scale static and dynamic templates to progressively achieve accurate tracking. It explores the complementarity between spatial and temporal context, opening a new pathway to multi-context modeling for transformer-based trackers. In addition, ProContEXT revises the token pruning technique to reduce computational complexity. Extensive experiments on popular benchmark datasets such as GOT-10k and TrackingNet demonstrate that the proposed ProContEXT achieves state-of-the-art performance.
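
The abstract does not spell out how the token pruning works, so the following is only a minimal sketch of one plausible reading: search-region tokens are ranked by the attention they receive from the static and dynamic template tokens, and only the top fraction is kept for later layers. The function name `prune_search_tokens`, the `keep_ratio` parameter, and the toy two-dimensional tokens are all hypothetical and are not taken from the authors' code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def prune_search_tokens(template_tokens, search_tokens, keep_ratio=0.5):
    """Keep the search tokens that receive the most template attention.

    Hypothetical illustration, not the paper's implementation.
    Returns (kept_indices, kept_tokens), with indices in original order.
    """
    if not template_tokens or not search_tokens:
        raise ValueError("both token lists must be non-empty")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    scale = 1.0 / math.sqrt(len(search_tokens[0]))
    n_keep = max(1, math.ceil(keep_ratio * len(search_tokens)))

    # Each template token acts as a query over the search tokens (keys);
    # the attention weights are summed per search token.
    scores = [0.0] * len(search_tokens)
    for query in template_tokens:
        weights = softmax([dot(query, key) * scale for key in search_tokens])
        for i, w in enumerate(weights):
            scores[i] += w

    ranked = sorted(range(len(search_tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [search_tokens[i] for i in kept]


# Toy usage: static (first-frame) and dynamic (later-frame) templates are
# simply concatenated before scoring the search-region tokens.
static_template = [[1.0, 0.0], [0.9, 0.1]]
dynamic_template = [[0.8, 0.2]]
search = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.0], [-1.0, 0.0]]
kept_idx, kept_tokens = prune_search_tokens(
    static_template + dynamic_template, search, keep_ratio=0.5
)
```

In this toy case the two search tokens most aligned with the templates survive, and the remaining tokens would be skipped by subsequent attention layers, which is where the reduction in computation would come from under this assumption.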

Code Implementations: 4 repos
