OAMVOS: 2nd Report for 5th PVUW MOSE Track
For video object segmentation in challenging scenarios with occlusion and reappearance, this work provides an incremental improvement over existing SAM-based trackers.
The paper addresses the fragility of SAM-based dense trackers under long occlusion and fast motion, particularly for small objects, and presents an occlusion- and reappearance-aware extension of DAM4SAM that improves memory control rather than changing the backbone. The method achieves a 3.7% J&F gain on the MOSE validation set and ranks 1st in the 5th PVUW MOSE Track.
SAM-based dense trackers provide strong short-term mask propagation but remain fragile under long occlusion, fast motion, viewpoint change, and distractors. The problem is especially severe for small objects, where a few incorrect memory updates can dominate later predictions. This report presents an occlusion- and reappearance-aware extension of DAM4SAM that improves memory control rather than changing the backbone. The method augments the original SAM3 tracker with four ingredients: a reliability-aware tracking state machine, branch-based recovery, delayed DRM promotion, and a selective policy for native SAM3 memory selection. During stable tracking, the model follows the original single-path propagation process. Once confidence drops, the tracker enters an ambiguous or recovery mode, maintains a small set of candidate branches, and commits memory only after a branch is reconfirmed. For small-object disappearance and reappearance, native memory selection is temporarily bypassed so older anchors remain accessible. In addition, the first conditioning frame is explicitly preserved, and the conditioning-memory budget is moderately enlarged to improve long-gap recovery. The resulting design keeps DAM4SAM efficient in easy cases while improving robustness in sequences dominated by occlusion and reappearance.
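The memory-control logic described above can be illustrated with a minimal sketch. It is not the report's implementation: the class and function names, every threshold, the confirmation window, and the evenly spaced fallback used when native selection is bypassed are all assumptions made for illustration, and the "native" policy is stood in for by a simple most-recent-first rule.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TrackState(Enum):
    STABLE = auto()
    AMBIGUOUS = auto()
    RECOVERY = auto()


@dataclass
class ReliabilityTracker:
    # All thresholds below are hypothetical, not values from the report.
    ambiguous_below: float = 0.6
    recovery_below: float = 0.3
    confirm_above: float = 0.8
    confirm_frames: int = 3
    max_branches: int = 3
    state: TrackState = TrackState.STABLE
    memory: list = field(default_factory=list)   # committed frame indices
    pending: list = field(default_factory=list)  # frames awaiting reconfirmation

    def _degraded(self, best: float) -> TrackState:
        if best < self.recovery_below:
            return TrackState.RECOVERY
        return TrackState.AMBIGUOUS

    def step(self, frame_idx: int, branch_scores: list) -> TrackState:
        """Update the state from per-branch confidence scores for one frame."""
        # Keep only a small set of candidate branches.
        kept = sorted(branch_scores, reverse=True)[: self.max_branches]
        best = kept[0] if kept else 0.0

        if self.state is TrackState.STABLE:
            if best >= self.ambiguous_below:
                self.memory.append(frame_idx)  # single-path propagation
            else:
                self.state = self._degraded(best)
                self.pending = []
            return self.state

        # Ambiguous or recovery mode: memory writes are delayed.
        if best >= self.confirm_above:
            self.pending.append(frame_idx)
            if len(self.pending) >= self.confirm_frames:
                self.memory.extend(self.pending)  # commit after reconfirmation
                self.pending = []
                self.state = TrackState.STABLE
        else:
            self.pending = []
            self.state = self._degraded(best)
        return self.state


def select_conditioning(anchors: list, budget: int, bypass_native: bool) -> list:
    """Choose conditioning frames, always preserving the first one."""
    if not anchors or budget <= 0:
        return []
    first, rest = anchors[0], anchors[1:]
    k = budget - 1
    if k <= 0 or not rest:
        return [first]
    if len(rest) <= k:
        return [first] + rest
    if not bypass_native:
        # Stand-in for native selection: keep the most recent anchors.
        return [first] + rest[-k:]
    if k == 1:
        return [first, rest[-1]]
    # Bypass: spread picks over the whole history so older anchors survive.
    n = len(rest)
    return [first] + [rest[i * (n - 1) // (k - 1)] for i in range(k)]
```

In this sketch, frames seen while the tracker is ambiguous or recovering never reach `memory` unless the best branch stays above `confirm_above` for `confirm_frames` consecutive frames, which mirrors the idea of committing memory only after reconfirmation. Raising `budget` corresponds to the moderately enlarged conditioning-memory budget.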