CV · Sep 28, 2025

2nd Place Report of MOSEv2 Challenge 2025: Concept Guided Video Object Segmentation via SeC

arXiv:2509.23838v1 · h-index: 32
Originality: Synthesis-oriented
AI Analysis

This work addresses robustness issues in video object segmentation, which matter for applications such as video editing, but the contribution is incremental: it applies an existing method to a new dataset.

The paper tackles semi-supervised video object segmentation by evaluating the zero-shot performance of the Segment Concept (SeC) framework on the MOSEv2 dataset. SeC achieved 39.7 JFn on the test set and ranked 2nd in the Complex VOS track of the 7th Large-scale Video Object Segmentation Challenge.

Semi-supervised Video Object Segmentation aims to segment a specified target throughout a video sequence, initialized by a first-frame mask. Previous methods rely heavily on appearance-based pattern matching and thus exhibit limited robustness against challenges such as drastic visual changes, occlusions, and scene shifts. This failure is often attributed to a lack of high-level conceptual understanding of the target. The recently proposed Segment Concept (SeC) framework mitigated this limitation by using a Large Vision-Language Model (LVLM) to establish a deep semantic understanding of the object for more persistent segmentation. In this work, we evaluate its zero-shot performance on the challenging coMplex video Object SEgmentation v2 (MOSEv2) dataset. Without any fine-tuning on the training set, SeC achieved 39.7 JFn on the test set and ranked 2nd place in the Complex VOS track of the 7th Large-scale Video Object Segmentation Challenge.
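
The score quoted in the abstract appears to belong to the J&F family of video object segmentation metrics, which combine region similarity (J) with boundary accuracy (F). As a rough illustration of the region component only, the sketch below computes the Jaccard index (intersection over union) between a predicted and a ground-truth binary mask and averages it over frames. The function names and the nested-list mask representation are assumptions made for this example, and the MOSEv2-specific variant of the metric is not defined in this summary, so it is not reproduced here.

```python
from typing import Sequence

# Assumed representation: a 2D binary mask as nested sequences, 1 = object pixel.
Mask = Sequence[Sequence[int]]


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Jaccard index (IoU) between two binary masks of equal shape."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_region_similarity(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average the per-frame Jaccard index over a sequence of frames."""
    scores = [region_similarity(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores) if scores else 0.0
```

A call such as `region_similarity([[1, 1], [0, 0]], [[1, 0], [0, 0]])` compares one predicted frame against its annotation; the boundary term F and any dataset-specific adjustments would have to be added separately.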

