CV · Mar 24

Re-Prompting SAM 3 via Object Retrieval: 3rd of the 5th PVUW MOSE Track

arXiv:2603.23788 · 65.0 · h-index: 2
AI Analysis

This is an incremental improvement to video object segmentation that addresses specific challenges of the MOSEv2 benchmark.

The paper tackles complex semi-supervised video object segmentation with an automatic re-prompting framework built on SAM 3, which improves robustness to target disappearance, reappearance, and distractors. It achieves a J&F score of 51.17% and ranks 3rd in the MOSEv2 track.
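For reference, J&F as conventionally defined in DAVIS-style video object segmentation benchmarks is the mean of region similarity J (the IoU between a predicted mask M and the ground-truth mask G) and contour accuracy F (the F-measure of boundary precision P_c and boundary recall R_c), averaged over objects and frames and reported as a percentage. This is a sketch of the standard definition only; the official MOSEv2 evaluation may differ in details such as how boundary matches are computed.

```latex
\mathcal{J} = \frac{|M \cap G|}{|M \cup G|}, \qquad
\mathcal{F} = \frac{2\,P_c\,R_c}{P_c + R_c}, \qquad
\mathcal{J}\&\mathcal{F} = \tfrac{1}{2}\left(\mathcal{J} + \mathcal{F}\right)
```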

This technical report explores the MOSEv2 track of the PVUW 2026 Challenge, which targets complex semi-supervised video object segmentation. We build on SAM 3 and develop an automatic re-prompting framework that improves robustness under target disappearance and reappearance, severe transformations, and strong same-category distractors. Our method first applies the SAM 3 detector to later frames to identify same-category object candidates, and then performs DINOv3-based object-level matching against a transformation-aware target feature pool to retrieve reliable target anchors. These anchors are injected back into the SAM 3 tracker together with the first-frame mask, enabling multi-anchor propagation rather than relying solely on the initial prompt. This simple design directly addresses several core challenges of MOSEv2. Our solution achieves a J&F of 51.17% on the test set, ranking 3rd in the MOSEv2 track.
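The retrieval step described above (match detector candidates against a target feature pool, keep only confident anchors, then prompt the tracker with the first-frame mask plus those anchors) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: no real SAM 3 or DINOv3 API is called, short vectors stand in for DINOv3 object features, and every name (`Candidate`, `TargetFeaturePool`, `retrieve_anchors`, `build_tracker_prompts`) and threshold (`match_thresh`, `margin`, `novelty`) is hypothetical. In particular, the novelty-based pool update is an assumed reading of "transformation-aware", and the margin test against the runner-up is one assumed way to reject same-category distractors.

```python
import math
from dataclasses import dataclass


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Candidate:
    """Hypothetical detector output: one same-category object in a later frame."""
    frame: int
    box_id: int
    embedding: list[float]  # stand-in for a DINOv3 object-level feature


@dataclass
class TargetFeaturePool:
    """Hypothetical pool of target appearances; index 0 is the first-frame target."""
    features: list[list[float]]
    max_size: int = 8

    def score(self, emb: list[float]) -> float:
        # A candidate matches if it resembles ANY stored appearance of the target.
        return max(cosine(f, emb) for f in self.features)

    def maybe_add(self, emb: list[float], novelty: float = 0.95) -> None:
        # Assumed update rule: store only appearances not already represented,
        # so the pool accumulates views of the target as it transforms.
        if self.score(emb) < novelty:
            self.features.append(emb)
            if len(self.features) > self.max_size:
                del self.features[1]  # evict oldest added view, keep first-frame one


def retrieve_anchors(candidates, pool, match_thresh=0.7, margin=0.1):
    """Pick at most one anchor per frame, skipping frames with ambiguous distractors."""
    by_frame: dict[int, list[Candidate]] = {}
    for c in candidates:
        by_frame.setdefault(c.frame, []).append(c)

    anchors = []
    for frame in sorted(by_frame):  # process frames in temporal order
        scored = sorted(
            ((pool.score(c.embedding), c) for c in by_frame[frame]),
            key=lambda pair: pair[0],
            reverse=True,
        )
        best_score, best = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else -1.0
        # Accept only a confident match that clearly beats same-category rivals.
        if best_score >= match_thresh and best_score - runner_up >= margin:
            anchors.append(best)
            pool.maybe_add(best.embedding)
    return anchors


def build_tracker_prompts(anchors):
    """Hypothetical prompt list: first-frame mask followed by one box per anchor."""
    prompts = [(0, "first_frame_mask")]
    prompts.extend((a.frame, f"box_{a.box_id}") for a in anchors)
    return prompts


# Toy example with 3-D vectors standing in for real features.
pool = TargetFeaturePool([[1.0, 0.0, 0.0]])
candidates = [
    Candidate(5, 0, [0.9, 0.1, 0.0]),    # target, close to its first-frame look
    Candidate(5, 1, [0.0, 1.0, 0.0]),    # clear distractor
    Candidate(8, 2, [0.8, 0.0, 0.6]),    # target, partly transformed
    Candidate(9, 0, [0.7, 0.7, 0.0]),    # two look-alikes: ambiguous frame
    Candidate(9, 1, [0.7, 0.69, 0.0]),
    Candidate(12, 3, [0.3, 0.0, 0.95]),  # strongly transformed target
]
anchors = retrieve_anchors(candidates, pool)
prompts = build_tracker_prompts(anchors)
print(prompts)
```

Running it prints `[(0, 'first_frame_mask'), (5, 'box_0'), (8, 'box_2'), (12, 'box_3')]`. Frame 9 is skipped because its two candidates score almost identically (about 0.71 each), which is the distractor case. Frame 12 shows why a pool helps: against the first-frame feature alone its cosine is about 0.30, below the threshold, but it matches the appearance stored at frame 8 (about 0.81).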

