CV | Apr 23, 2025

RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory

arXiv:2504.16471v1 | 4 citations | h-index: 21 | ICMR
Originality: Incremental advance
AI Analysis

This work improves segmentation accuracy for applications such as robotics and autonomous systems, but the contribution is incremental, building on existing methods with enhancements.

The paper tackles RGB-D video object segmentation by integrating RGB and depth modalities to address cross-modal information exploration and object drift issues, achieving state-of-the-art performance on the latest benchmark.

RGB-Depth (RGB-D) Video Object Segmentation (VOS) aims to integrate the fine-grained texture information of RGB with the spatial geometric cues of the depth modality to boost segmentation performance. However, off-the-shelf RGB-D segmentation methods fail to fully explore cross-modal information and suffer from object drift during long-term prediction. In this paper, we propose a novel RGB-D VOS method that uses multi-store feature memory for robust segmentation. Specifically, we design a hierarchical modality selection and fusion scheme that adaptively combines features from both modalities. Additionally, we develop a segmentation refinement module that uses the Segment Anything Model (SAM) to refine the segmentation mask, so that more reliable results are stored as memory to guide subsequent segmentation. By leveraging spatio-temporal embedding and modality embedding, mixed prompts and fused images are fed into SAM to unleash its potential in RGB-D VOS. Experimental results show that the proposed method achieves state-of-the-art performance on the latest RGB-D VOS benchmark.
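To make the idea of adaptively combining features from both modalities more concrete, the following is a minimal sketch of a generic gated fusion, not the paper's actual hierarchical selection and fusion design, which the abstract does not specify. The function names (`global_average`, `modality_weights`, `fuse`) and the use of a channel's mean activation as its selection score are assumptions made purely for illustration; a real model would presumably learn these scores.

```python
import math


def global_average(feature_map):
    """Mean over all spatial positions of a 2-D feature map (list of rows)."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def modality_weights(rgb_score, depth_score):
    """Softmax over two scalar scores; returns (w_rgb, w_depth) summing to 1."""
    m = max(rgb_score, depth_score)  # subtract the max for numerical stability
    e_rgb = math.exp(rgb_score - m)
    e_depth = math.exp(depth_score - m)
    total = e_rgb + e_depth
    return e_rgb / total, e_depth / total


def fuse(rgb_feat, depth_feat):
    """Fuse two feature tensors channel by channel.

    rgb_feat and depth_feat are lists of channels, each channel a 2-D list
    of floats with identical shapes across the two modalities.
    """
    fused = []
    for rgb_ch, depth_ch in zip(rgb_feat, depth_feat):
        # Assumption: the mean activation stands in for a learned selection score.
        w_rgb, w_depth = modality_weights(
            global_average(rgb_ch), global_average(depth_ch)
        )
        fused.append([
            [w_rgb * r + w_depth * d for r, d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb_ch, depth_ch)
        ])
    return fused


if __name__ == "__main__":
    rgb = [[[1.0, 2.0], [3.0, 4.0]]]    # one channel, 2x2
    depth = [[[0.0, 0.0], [0.0, 0.0]]]  # uninformative depth channel
    print(fuse(rgb, depth))
```

In this sketch the channel with the stronger average response receives the larger weight, so an uninformative depth channel contributes less to the fused feature. Applying the same weighting at several feature scales would be one way to read "hierarchical", but that reading is also an assumption.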

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
