CV | AI | May 26, 2025

CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features

arXiv:2505.19434v1 | 11 citations | h-index: 3 | Has Code | ICML
Originality: Incremental advance
AI Analysis

This work targets computational inefficiency in multi-modal tracking, a problem of interest to computer vision researchers, and represents an incremental improvement over existing methods.

The paper tackles the challenge of efficiently modeling spatiotemporal features in RGB-X tracking by proposing CSTrack, which integrates RGB and other modalities into compact spatial and temporal features, achieving new state-of-the-art results on mainstream benchmarks.

Effectively modeling and utilizing spatiotemporal features from RGB and other modalities (e.g., depth, thermal, and event data, denoted as X) is the core of RGB-X tracker design. Existing methods often employ two parallel branches to separately process the RGB and X input streams, requiring the model to simultaneously handle two dispersed feature spaces, which complicates both the model structure and computation process. More critically, intra-modality spatial modeling within each dispersed space incurs substantial computational overhead, limiting resources for inter-modality spatial modeling and temporal modeling. To address this, we propose a novel tracker, CSTrack, which focuses on modeling Compact Spatiotemporal features to achieve simple yet effective tracking. Specifically, we first introduce an innovative Spatial Compact Module that integrates the RGB-X dual input streams into a compact spatial feature, enabling thorough intra- and inter-modality spatial modeling. Additionally, we design an efficient Temporal Compact Module that compactly represents temporal features by constructing the refined target distribution heatmap. Extensive experiments validate the effectiveness of our compact spatiotemporal modeling method, with CSTrack achieving new SOTA results on mainstream RGB-X benchmarks. The code and models will be released at: https://github.com/XiaokunFeng/CSTrack.
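
The two ideas in the abstract can be illustrated with a small NumPy sketch. This is not the CSTrack implementation: the fusion rule (project each modality, then sum), the function names `fuse_compact`, `self_attention`, `attention_pairs`, and `target_heatmap`, the Gaussian form of the heatmap, and all sizes are assumptions chosen only to show why a single compact token sequence is cheaper to attend over than two separate streams, and what a target distribution heatmap might look like.

```python
import numpy as np


def fuse_compact(rgb_tokens, x_tokens, w_rgb, w_x):
    """Project each modality and sum into ONE token sequence.

    Hypothetical fusion rule for illustration; the paper's Spatial Compact
    Module may work differently.
    """
    return rgb_tokens @ w_rgb + x_tokens @ w_x


def self_attention(tokens):
    """Single-head self-attention without learned projections (shape demo only)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


def attention_pairs(num_tokens):
    """Number of query-key pairs scored by one self-attention layer."""
    return num_tokens * num_tokens


def target_heatmap(box, size, sigma_scale=0.25):
    """Gaussian heatmap centred on a box given as (cx, cy, w, h) in grid cells.

    An assumed stand-in for a "target distribution heatmap"; sigma_scale is
    a made-up parameter.
    """
    cx, cy, w, h = box
    ys, xs = np.mgrid[0:size, 0:size]
    sx = max(w * sigma_scale, 1e-6)
    sy = max(h * sigma_scale, 1e-6)
    return np.exp(-0.5 * (((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2))


rng = np.random.default_rng(0)
n, d = 256, 64  # tokens per modality, channel width (arbitrary)
rgb = rng.standard_normal((n, d))
x = rng.standard_normal((n, d))
w_rgb = rng.standard_normal((d, d)) / np.sqrt(d)
w_x = rng.standard_normal((d, d)) / np.sqrt(d)

compact = fuse_compact(rgb, x, w_rgb, w_x)  # one (n, d) sequence
out = self_attention(compact)

two_branch_pairs = 2 * attention_pairs(n)  # intra-modality attention per branch
joint_pairs = attention_pairs(2 * n)       # attention over concatenated streams
compact_pairs = attention_pairs(n)         # attention over the fused sequence

heatmap = target_heatmap(box=(8, 8, 4, 4), size=16)
```

Under these assumptions, the fused sequence scores `n * n` query-key pairs, half of what two separate branches score and a quarter of what attention over the concatenated streams scores. The heatmap summarizes the target location in a single `size x size` map rather than a full feature sequence per past frame.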

Code Implementations: 1 repo
