CV · Nov 26, 2024

A Distractor-Aware Memory for Visual Object Tracking with SAM2

arXiv:2411.17576v2 · 70 citations · h-index: 23 · CVPR
Originality: Incremental advance
AI Analysis

This work targets tracking robustness in the presence of distractors, a known weakness in visual object tracking, and represents an incremental improvement over existing memory-based methods.

The paper tackles the problem of distractors in visual object tracking by proposing a distractor-aware memory model and introspection-based update strategy for SAM2, resulting in a tracker called SAM2.1++ that outperforms SAM2.1 and related extensions on seven benchmarks, setting a new state-of-the-art on six of them.

Memory-based trackers are video object segmentation methods that form the target model by concatenating recently tracked frames into a memory buffer and localize the target by attending the current image to the buffered frames. While already achieving top performance on many benchmarks, it was the recent release of SAM2 that placed memory-based trackers into the focus of the visual object tracking community. Nevertheless, modern trackers still struggle in the presence of distractors. We argue that a more sophisticated memory model is required, and propose a new distractor-aware memory model for SAM2 and an introspection-based update strategy that jointly addresses segmentation accuracy and tracking robustness. The resulting tracker is denoted as SAM2.1++. We also propose a new distractor-distilled DiDi dataset to better study the distractor problem. SAM2.1++ outperforms SAM2.1 and related SAM memory extensions on seven benchmarks and sets a solid new state-of-the-art on six of them.
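
The generic memory-buffer idea in the first sentence of the abstract can be illustrated with a small sketch. This is not SAM2's or SAM2.1++'s implementation: the class `FrameMemory` and its `capacity`, `add`, and `read` names are hypothetical, and plain single-head scaled dot-product attention stands in for whatever attention the real models use. It shows only the baseline behaviour (keep the most recent frames, attend the current frame's features to them) and does not model the distractor-aware memory or the introspection-based update that the paper proposes.

```python
import math
from collections import deque


class FrameMemory:
    """Hypothetical FIFO memory of per-frame (keys, values) feature lists."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest frame when a new one is appended
        self.frames = deque(maxlen=capacity)

    def add(self, keys, values):
        """Store one tracked frame: keys[i] is a feature vector, values[i] its payload."""
        if len(keys) != len(values):
            raise ValueError("keys and values must have the same length")
        self.frames.append((keys, values))

    def read(self, queries):
        """Cross-attend each query vector to every key/value pair in the buffer."""
        keys = [k for ks, _ in self.frames for k in ks]
        values = [v for _, vs in self.frames for v in vs]
        if not keys:
            raise ValueError("memory is empty")
        scale = 1.0 / math.sqrt(len(keys[0]))
        outputs = []
        for q in queries:
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
            top = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            outputs.append([
                sum(w * v[d] for w, v in zip(weights, values)) / total
                for d in range(len(values[0]))
            ])
        return outputs


# Toy usage: one 2-D feature per frame, payload 1.0 = target, 0.0 = background.
memory = FrameMemory(capacity=2)
memory.add([[1.0, 0.0]], [[1.0]])
memory.add([[0.0, 1.0]], [[0.0]])
readout = memory.read([[5.0, 0.0]])  # query resembles the target feature
```

In this toy setup the readout is dominated by the stored target frame because the query is most similar to its key. Once a third frame is added, the oldest frame is evicted regardless of its usefulness, which is the kind of naive update rule a more selective memory strategy would replace.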

Code Implementations: 1 repo
