CV | AI | Apr 14, 2025

MASSeg : 2nd Technical Report for 4th PVUW MOSE Track

arXiv:2504.10254v1 | h-index: 23
Originality: Synthesis-oriented
AI Analysis

This report is an incremental improvement in video object segmentation that targets specific failure cases, notably occlusion and small targets.

The paper tackles challenges in complex video object segmentation, such as small object recognition and occlusion handling, by proposing an improved model, MASSeg, and an enhanced dataset, MOSE+. It achieves a J&F score of 0.8628 on the MOSE test set.
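For readers unfamiliar with the metric: in video object segmentation benchmarks, J conventionally denotes region similarity (the intersection-over-union of predicted and ground-truth masks), F denotes contour accuracy, and J&F is their arithmetic mean. The sketch below illustrates J on a toy mask pair and the averaging step using the J and F values reported in the abstract. The function names `jaccard` and `j_and_f` are hypothetical, the boundary-based F computation is omitted, and scoring two empty masks as 1.0 is an assumed convention, not something the report states.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Masks are nested lists of 0/1 values with identical shape.
    """
    pairs = [(p, g) for pred_row, gt_row in zip(pred, gt)
             for p, g in zip(pred_row, gt_row)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """J&F is the arithmetic mean of region similarity and contour accuracy."""
    return (j + f) / 2


# Toy masks (illustrative only, not data from the report).
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
toy_j = jaccard(pred, gt)  # 2 overlapping pixels out of 4 in the union

# J and F as reported in the abstract for the MOSE test set.
reported_mean = j_and_f(0.8250, 0.9007)
```

Averaging the reported J and F gives 0.86285, which agrees with the reported J&F of 0.8628 up to rounding.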

Complex video object segmentation continues to face significant challenges in small object recognition, occlusion handling, and dynamic scene modeling. This report presents our solution, which ranked second in the MOSE track of the CVPR 2025 PVUW Challenge. Building on an existing segmentation framework, we propose an improved model named MASSeg for complex video object segmentation, and construct an enhanced dataset, MOSE+, which includes typical scenarios with occlusions, cluttered backgrounds, and small target instances. During training, we combine inter-frame consistent and inconsistent data augmentation strategies to improve robustness and generalization. During inference, we design a mask output scaling strategy to better adapt to varying object sizes and occlusion levels. As a result, MASSeg achieves a J score of 0.8250, an F score of 0.9007, and a J&F score of 0.8628 on the MOSE test set.
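The abstract does not say which transforms are used, so the following is only a sketch of one plausible reading of "inter-frame consistent" versus "inconsistent" augmentation: in the consistent case one random decision is shared by every frame of a clip, and in the inconsistent case each frame draws its own. Horizontal flipping is an illustrative stand-in, and `hflip`, `augment_clip`, and the parameter `p` are hypothetical names, not the authors' implementation.

```python
import random


def hflip(image):
    """Horizontally flip a 2D image or mask stored as a list of rows."""
    return [row[::-1] for row in image]


def augment_clip(frames, masks, consistent, rng=None, p=0.5):
    """Randomly flip a clip, keeping each frame aligned with its mask.

    consistent=True : one flip decision is shared by all frames.
    consistent=False: each frame samples its own flip decision.
    """
    if rng is None:
        rng = random.Random()
    clip_flip = rng.random() < p  # decision shared across the clip
    out_frames, out_masks = [], []
    for frame, mask in zip(frames, masks):
        flip = clip_flip if consistent else rng.random() < p
        # The same decision is applied to a frame and its mask so that
        # the annotation stays aligned with the image.
        out_frames.append(hflip(frame) if flip else [row[:] for row in frame])
        out_masks.append(hflip(mask) if flip else [row[:] for row in mask])
    return out_frames, out_masks


# Toy clip: five 1x3 frames with matching masks (illustrative data only).
frames = [[[i, i + 1, i + 2]] for i in range(5)]
masks = [[[1, 0, 0]] for _ in range(5)]
same_f, same_m = augment_clip(frames, masks, consistent=True,
                              rng=random.Random(0))
mixed_f, mixed_m = augment_clip(frames, masks, consistent=False,
                                rng=random.Random(0))
```

Under this reading, consistent augmentation preserves the temporal coherence of a clip, while inconsistent augmentation deliberately perturbs frame-to-frame appearance, which may help a model cope with abrupt changes such as occlusion.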

Code Implementations: 1 repo