CV · AI · MM · Jul 25, 2023

Spectrum-guided Multi-granularity Referring Video Object Segmentation

arXiv:2307.13537v1 · 79 citations · h-index: 71 · Has Code
Originality: Highly original
AI Analysis

The paper addresses efficiency and accuracy limitations in referring video object segmentation, which matter for applications such as video editing and autonomous systems. Its contribution is an incremental but novel improvement in method.

The paper tackles the feature drift problem in referring video object segmentation (R-VOS) by proposing a Spectrum-guided Multi-granularity (SgMg) approach. SgMg achieves state-of-the-art performance, outperforming the nearest competitor by 2.8 percentage points on Ref-YouTube-VOS, and its multi-object extension runs about 3 times faster.

Current referring video object segmentation (R-VOS) techniques extract conditional kernels from encoded (low-resolution) vision-language features to segment the decoded high-resolution features. We discovered that this causes significant feature drift, which the segmentation kernels struggle to perceive during the forward computation. This drift degrades the segmentation ability of the kernels. To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks. In addition, we propose Spectrum-guided Cross-modal Fusion (SCF) to perform intra-frame global interactions in the spectral domain for effective multimodal representation. Finally, we extend SgMg to perform multi-object R-VOS, a new paradigm that enables simultaneous segmentation of multiple referred objects in a video. This makes R-VOS not only faster but also more practical. Extensive experiments show that SgMg achieves state-of-the-art performance on four video benchmark datasets, outperforming the nearest competitor by 2.8 percentage points on Ref-YouTube-VOS. Our extended SgMg enables multi-object R-VOS and runs about 3 times faster while maintaining satisfactory performance. Code is available at https://github.com/bo-miao/SgMg.
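To illustrate why operating in the spectral domain yields intra-frame global interactions, here is a minimal NumPy sketch. It is not the authors' SCF module: the function name `spectral_global_mix`, the tensor shapes, and the `text_gate` vector (a stand-in for language conditioning) are assumptions made for illustration. It relies only on the standard fact that element-wise multiplication in the frequency domain corresponds to a circular convolution over the whole frame.

```python
import numpy as np

def spectral_global_mix(feat, spectral_weight, text_gate=None):
    """Filter per-frame features in the frequency domain (illustrative only).

    feat:            (C, H, W) real-valued visual features of one frame.
    spectral_weight: (C, H, W // 2 + 1) filter applied to the 2D spectrum.
    text_gate:       optional (C,) vector; a hypothetical stand-in for
                     language conditioning, not taken from the paper.
    """
    # Real 2D FFT over the spatial axes of each channel.
    spec = np.fft.rfft2(feat, axes=(-2, -1))
    # Element-wise product here equals a frame-wide circular convolution.
    spec = spec * spectral_weight
    # Back to the spatial domain at the original resolution.
    out = np.fft.irfft2(spec, s=feat.shape[-2:], axes=(-2, -1))
    if text_gate is not None:
        # Scale each channel by the (assumed) text-derived gate.
        out = out * text_gate[:, None, None]
    return out

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
feat = rng.standard_normal((C, H, W))

# An all-ones filter leaves the features unchanged.
identity = np.ones((C, H, W // 2 + 1))
same = spectral_global_mix(feat, identity)

# A single active pixel influences locations across the whole frame.
impulse = np.zeros((C, H, W))
impulse[:, 0, 0] = 1.0
weight = rng.standard_normal((C, H, W // 2 + 1))
spread = spectral_global_mix(impulse, weight)
touched = np.count_nonzero(np.abs(spread) > 1e-12)
```

In this sketch, `touched` counts how many output locations respond to one input pixel; a spatial convolution with a small kernel would reach only a local neighbourhood, whereas a single spectral filter reaches the entire frame in one step.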

Code Implementations: 1 repo
