CV | IV | Jul 25, 2025

Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation

arXiv:2507.18944v1 | 3 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses occlusion challenges in video object segmentation for computer vision applications, offering incremental improvements in accuracy and speed.

The paper tackles semi-supervised video object segmentation, where existing methods struggle with occlusion and object interactions. It proposes OASIS, a method that improves segmentation accuracy through boundary refinement and uncertainty estimation, achieving an F value of 91.6 on DAVIS-17 and a G value of 86.6 on YouTubeVOS 2019 while running at 48 FPS on DAVIS.

Given an object mask, semi-supervised video object segmentation (SVOS) aims to track and segment the object across video frames and is a fundamental task in computer vision. Although recent memory-based methods show promise, they often struggle with scenes involving occlusion, particularly when objects interact or share highly similar features. To address these issues while meeting the real-time processing requirements of downstream applications, we propose a novel bOundary Amendment video object Segmentation method with Inherent Structure refinement, named OASIS. Specifically, a lightweight structure refinement module is proposed to enhance segmentation accuracy. By fusing rough edge priors captured by the Canny filter with stored object features, the module generates an object-level structure map and refines the representations by highlighting boundary features. Evidential learning for uncertainty estimation is introduced to further address challenges in occluded regions. OASIS maintains an efficient design, and extensive experiments on challenging benchmarks demonstrate superior performance and competitive inference speed compared with other state-of-the-art methods: an F value of 91.6 on the DAVIS-17 validation set (vs. 89.7) and a G value of 86.6 on the YouTubeVOS 2019 validation set (vs. 86.2), at a competitive speed of 48 FPS on DAVIS.
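The abstract does not spell out how the evidential uncertainty is computed. As a rough illustration only, the sketch below uses the common Dirichlet-based formulation of evidential learning, in which non-negative evidence per class yields an uncertainty (vacuity) of K / S per pixel. The function name `evidential_uncertainty`, the softplus evidence mapping, and the tensor layout are assumptions made for this example, not details taken from OASIS.

```python
import numpy as np


def evidential_uncertainty(logits):
    """Per-pixel Dirichlet uncertainty from raw class logits (hypothetical helper).

    logits: array of shape (K, H, W) with K >= 2 classes,
            e.g. background plus tracked objects (assumed layout).
    Returns (probs, uncertainty) with shapes (K, H, W) and (H, W).
    """
    logits = np.asarray(logits, dtype=np.float64)
    num_classes = logits.shape[0]

    # Softplus maps logits to non-negative evidence (an assumed choice);
    # logaddexp(0, x) = log(1 + exp(x)) is numerically stable.
    evidence = np.logaddexp(0.0, logits)

    alpha = evidence + 1.0                 # Dirichlet concentration parameters
    strength = alpha.sum(axis=0)           # Dirichlet strength S per pixel
    probs = alpha / strength               # expected class probabilities
    uncertainty = num_classes / strength   # vacuity u = K / S, in (0, 1]
    return probs, uncertainty


# Toy 2-class, 1x2 "image": the first pixel carries strong evidence for
# class 0, the second carries almost no evidence for either class.
logits = np.array([[[10.0, -10.0]],
                   [[-10.0, -10.0]]])
probs, u = evidential_uncertainty(logits)
```

In this toy input, the pixel with little evidence for any class receives an uncertainty close to 1, while the pixel with strong evidence for one class receives a much lower value. A map of this kind could in principle flag ambiguous, possibly occluded regions.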

