CV | Nov 17, 2025

Segment Anything Across Shots: A Method and Benchmark

arXiv:2511.13715v1 | 1 citation | h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses a practical limitation of video segmentation in real-world applications by enabling cross-shot generalization. The advance is incremental, since it builds on existing VOS methods.

This work tackles multi-shot semi-supervised video object segmentation (MVOS), where existing methods struggle with shot discontinuities. It proposes the SAAS model together with a transition-mimicking augmentation strategy, and reports state-of-the-art performance on the YouMVOS and Cut-VOS benchmarks.

This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims to segment the target object indicated by an initial mask throughout a video containing multiple shots. Existing VOS methods mainly focus on single-shot videos and struggle with shot discontinuities, which limits their real-world applicability. We propose a transition-mimicking data augmentation strategy (TMA), which enables cross-shot generalization from single-shot data and so alleviates the severe scarcity of annotated multi-shot data, and the Segment Anything Across Shots (SAAS) model, which can detect and comprehend shot transitions effectively. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions. Extensive experiments on YouMVOS and Cut-VOS demonstrate that the proposed SAAS achieves state-of-the-art performance by effectively mimicking, understanding, and segmenting across complex transitions. The code and datasets are released at https://henghuiding.com/SAAS/.
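The abstract does not spell out how TMA is implemented, so the following is only a rough illustration of the general idea of simulating a shot cut from single-shot data. It is not the authors' method: the function name `mimic_cut`, the choice of a horizontal flip plus a wrap-around shift as the "transition", and the array layout are all assumptions made for this sketch.

```python
import numpy as np


def mimic_cut(frames, masks, cut_index, rng):
    """Simulate an abrupt shot transition inside a single-shot clip.

    Hypothetical helper, not taken from the SAAS code base.
    frames: array of shape (T, H, W, C); masks: array of shape (T, H, W).
    Every frame from cut_index onward is flipped horizontally and shifted
    (with wrap-around) along the width axis; the masks receive the same
    transform so the annotations stay aligned with the pixels.
    """
    if not 0 < cut_index < len(frames):
        raise ValueError("cut_index must fall strictly inside the clip")

    out_frames = frames.copy()
    out_masks = masks.copy()

    # Random horizontal shift in pixels (assumes width > 1).
    shift = int(rng.integers(1, frames.shape[2]))

    # Axis 2 is the width axis for both frames and masks.
    out_frames[cut_index:] = np.roll(frames[cut_index:, :, ::-1], shift, axis=2)
    out_masks[cut_index:] = np.roll(masks[cut_index:, :, ::-1], shift, axis=2)
    return out_frames, out_masks
```

A training pipeline could apply such a function to ordinary single-shot clips with a randomly chosen `cut_index`, so that a model sees an appearance discontinuity while the ground-truth masks remain valid on both sides of the simulated cut.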
