CV · Feb 22, 2024

Reading Relevant Feature from Global Representation Memory for Visual Object Tracking

arXiv:2402.14392v3 · 27 citations · h-index: 11 · NIPS
AI Analysis

This work addresses inefficiencies in visual object tracking for video analysis applications. It is an incremental improvement over prior methods.

The paper tackles the problem of redundancy in visual object tracking by proposing a relevance attention mechanism and global representation memory that adaptively selects relevant historical information, achieving competitive performance on five datasets at 71 FPS.

Reference features from a template or historical frames are crucial for visual object tracking. Prior works use all features from a fixed template or memory. However, because videos are dynamic, the historical reference information needed by different search regions at different time steps varies. Using all features in the template and memory can therefore introduce redundancy and impair tracking performance. To alleviate this issue, we propose a novel tracking paradigm, consisting of a relevance attention mechanism and a global representation memory, which adaptively helps the search region select the most relevant historical information from reference features. Specifically, the proposed relevance attention mechanism differs from previous approaches in that it can dynamically choose and build the optimal global representation memory for the current frame by accessing cross-frame information globally. Moreover, it can flexibly read the relevant historical information from the constructed memory to reduce redundancy and counteract the negative effects of harmful information. Extensive experiments validate the effectiveness of the proposed method, which achieves competitive performance on five challenging datasets at 71 FPS.
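To make the idea of reading only relevant features from a memory more concrete, here is a minimal sketch of one way such a read could look. It is not the paper's implementation: the abstract does not specify the selection rule, so the top-k selection by scaled dot-product score, the function name `relevance_read`, and the parameter `top_k` are all assumptions made for illustration.

```python
import math


def relevance_read(query, memory, top_k):
    """Read from memory using only the top_k most relevant tokens.

    query:  list[float], one search-region feature vector (hypothetical)
    memory: list[list[float]], stored reference feature vectors
    Returns (readout, kept_indices).
    """
    if not memory:
        raise ValueError("memory must not be empty")
    dim = len(query)

    # Scaled dot-product relevance score for every memory token.
    scores = [
        sum(q * m for q, m in zip(query, token)) / math.sqrt(dim)
        for token in memory
    ]

    # Assumed selection rule: keep the top_k highest-scoring tokens.
    k = max(1, min(top_k, len(memory)))
    kept = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax over the kept scores only; discarded tokens contribute nothing.
    peak = max(scores[i] for i in kept)
    exps = [math.exp(scores[i] - peak) for i in kept]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the kept memory tokens.
    readout = [
        sum(w * memory[i][d] for w, i in zip(weights, kept))
        for d in range(dim)
    ]
    return readout, kept
```

With `top_k` equal to the memory size this reduces to ordinary attention over all reference features, which is the setting the abstract describes as redundant. Smaller values drop low-relevance tokens before the weighted read.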

