SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2
The work addresses memory control in visual object tracking, aiming to handle distractors, occlusions, and object motion more robustly; it is incremental in that it builds on SAM 2.
The paper tackles the problem of optimizing memory updates in Segment Anything Model 2 (SAM 2) for visual object tracking by using reinforcement learning instead of hand-crafted rules. In an overfitting setup with a separate agent per video, it achieves a relative improvement over SAM 2 that is more than three times the gain achieved by existing heuristics.
Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks and has become the state of the art for visual object tracking. The model stores information from previous frames in a memory bank, enabling temporal consistency across video sequences. Recent methods augment SAM 2 with hand-crafted update rules to better handle distractors, occlusions, and object motion. We propose a fundamentally different approach that uses reinforcement learning to optimize memory updates in SAM 2 by framing memory control as a sequential decision-making problem. In an overfitting setup with a separate agent per video, our method achieves a relative improvement over SAM 2 that is more than three times the gain achieved by existing heuristics. These results reveal the untapped potential of the memory bank and highlight reinforcement learning as a powerful alternative to hand-crafted update rules for memory control in visual object tracking.
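
To make the "memory control as sequential decision-making" framing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed-capacity first-in-first-out memory bank and a binary action per frame (write the frame to memory or skip it); the abstract does not specify the state, action space, or reward. The names `MemoryBank`, `run_episode`, `always_write`, and `every_other` are hypothetical, and frames are represented by their indices rather than by real SAM 2 features.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple

# Hypothetical state: (current frame index, frame indices currently in memory).
State = Tuple[int, Tuple[int, ...]]


@dataclass
class MemoryBank:
    """Toy fixed-capacity memory bank; FIFO eviction is an assumption."""
    capacity: int = 3
    frames: Deque[int] = field(default_factory=deque)

    def add(self, frame_idx: int) -> None:
        # Evict the oldest entry when the bank is full, then store the new one.
        if len(self.frames) >= self.capacity:
            self.frames.popleft()
        self.frames.append(frame_idx)


def run_episode(num_frames: int,
                policy: Callable[[State], int],
                capacity: int = 3) -> List[int]:
    """Step through a video; the policy decides per frame whether to write.

    Action 1 means "write this frame to memory", action 0 means "skip".
    A learned agent would replace `policy`; no reward or training is shown.
    """
    bank = MemoryBank(capacity=capacity)
    for t in range(num_frames):
        state: State = (t, tuple(bank.frames))
        if policy(state) == 1:
            bank.add(t)
    return list(bank.frames)


def always_write(state: State) -> int:
    # Baseline rule: every frame enters memory.
    return 1


def every_other(state: State) -> int:
    # Stand-in for a non-trivial policy: write only even-numbered frames.
    return 1 if state[0] % 2 == 0 else 0


if __name__ == "__main__":
    print(run_episode(10, always_write))
    print(run_episode(10, every_other))
```

In this sketch the two policies leave different frames in memory at the end of the same video, which is the degree of freedom a reinforcement learning agent would be trained to exploit. A hand-crafted heuristic corresponds to a fixed `policy` function, whereas the proposed approach would learn it from a tracking-quality signal.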