CV · AI · Apr 20

OAMVOS: 2nd Report for 5th PVUW MOSE Track

arXiv:2604.22837
AI Analysis

For video object segmentation in challenging scenarios with occlusion and reappearance, this work provides an incremental improvement over existing SAM-based trackers.

The paper addresses the fragility of SAM-based dense trackers under long occlusion, fast motion, and small objects, and presents an occlusion- and reappearance-aware extension that improves memory control. The method achieves a 3.7% J&F gain on the MOSE validation set and ranks 1st in the 5th PVUW MOSE Track.

SAM-based dense trackers provide strong short-term mask propagation but remain fragile under long occlusion, fast motion, viewpoint change, and distractors. The problem is especially severe for small objects, where a few incorrect memory updates can dominate later predictions. This report presents an occlusion- and reappearance-aware extension of DAM4SAM that improves memory control rather than changing the backbone. The method augments the original SAM3 tracker with four ingredients: a reliability-aware tracking state machine, branch-based recovery, delayed DRM promotion, and a selective policy for native SAM3 memory selection. During stable tracking, the model follows the original single-path propagation process. Once confidence drops, the tracker enters an ambiguous or recovery mode, maintains a small set of candidate branches, and commits memory only after a branch is reconfirmed. For small-object disappearance and reappearance, native memory selection is temporarily bypassed so older anchors remain accessible. In addition, the first conditioning frame is explicitly preserved, and the conditioning-memory budget is moderately enlarged to improve long-gap recovery. The resulting design keeps DAM4SAM efficient in easy cases while improving robustness in sequences dominated by occlusion and reappearance.
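The memory-control idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class name, the thresholds `high` and `low`, the `confirm_frames` count, and the eviction rule are all hypothetical, since the report gives no numeric values here. The sketch also tracks a single candidate path rather than a set of branches. It shows only the gating logic: memory is written directly while tracking is stable, writes are held back once confidence drops, and held frames are committed only after several consecutive confident frames, with the first conditioning frame never evicted.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    STABLE = auto()
    AMBIGUOUS = auto()
    RECOVERY = auto()


@dataclass
class GatedMemoryTracker:
    # All values below are illustrative assumptions, not from the report.
    high: float = 0.8          # confidence needed for a trusted frame
    low: float = 0.4           # below this the target is treated as lost
    confirm_frames: int = 3    # confident frames needed to reconfirm
    budget: int = 4            # max committed frames (must be >= 1)
    state: State = State.STABLE
    pending: list = field(default_factory=list)  # frames awaiting commit
    memory: list = field(default_factory=list)   # committed frame indices

    def _commit(self, frames):
        self.memory.extend(frames)
        # Evict the oldest entry after the first one, so the initial
        # conditioning frame (memory[0]) is always preserved.
        while len(self.memory) > self.budget:
            del self.memory[1]

    def _degraded_state(self, confidence):
        return State.AMBIGUOUS if confidence >= self.low else State.RECOVERY

    def step(self, frame_idx: int, confidence: float) -> State:
        if self.state is State.STABLE:
            if confidence >= self.high:
                self._commit([frame_idx])      # normal single-path update
            else:
                self.pending = []
                self.state = self._degraded_state(confidence)
            return self.state

        # AMBIGUOUS or RECOVERY: hold updates until reconfirmed.
        if confidence >= self.high:
            self.pending.append(frame_idx)
            if len(self.pending) >= self.confirm_frames:
                self._commit(self.pending)     # delayed commit
                self.pending = []
                self.state = State.STABLE
        else:
            self.pending = []                  # discard the unconfirmed streak
            self.state = self._degraded_state(confidence)
        return self.state


if __name__ == "__main__":
    tracker = GatedMemoryTracker()
    for idx, conf in enumerate([0.9, 0.9, 0.3, 0.5, 0.85, 0.9, 0.95, 0.9]):
        print(idx, tracker.step(idx, conf).name, tracker.memory)
```

In this toy run, frames 2 and 3 never reach memory, and frames 4 to 6 are committed together only once the third confident frame arrives. A fuller version would keep several candidate branches alive during the degraded states and commit only the branch that is reconfirmed.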

