CV · Oct 10, 2021

Modality-Guided Subnetwork for Salient Object Detection

arXiv:2110.04904v2 · 17 citations
Originality: Incremental advance
AI Analysis

This paper addresses the extra hardware and computational costs of saliency detection in computer vision applications, though the advance is incremental because it builds on existing two-stream designs.

The paper tackles the high cost of depth sensors and computation in RGBD-based salient object detection by introducing a modality-guided subnetwork (MGSnet) that works for both RGB and RGBD data, achieving state-of-the-art performance with real-time inference for RGB models.

RGBD-based models for saliency detection have recently attracted research attention. Depth cues such as boundaries, surface normals, and shape attributes help identify salient objects in complicated scenes. However, most RGBD networks require both modalities as input and feed them separately through a two-stream design, which inevitably incurs extra costs for depth sensors and computation. To address these drawbacks, we present a novel fusion design named modality-guided subnetwork (MGSnet). It has the following advantages: 1) Our model works for both RGB and RGBD data, dynamically estimating depth when it is not available. Taking the inner workings of depth-prediction networks into account, we propose to estimate pseudo-geometry maps from the RGB input, essentially mimicking the multi-modality input. 2) Our MGSnet for RGB SOD runs in real time while achieving state-of-the-art performance compared to other RGB models. 3) The flexible and lightweight design of MGS facilitates integration into RGBD two-stream models. The introduced fusion design enables cross-modality interaction that supports further progress at minimal cost.
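The "works for both RGB and RGBD data" behaviour described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`resolve_geometry`, `luminance_pseudo_depth`, `guided_fusion`), the `weight` parameter, and the fusion formula are all hypothetical, and the channel-mean heuristic merely stands in for a learned depth-prediction network. It only shows the control flow of using a real depth map when one exists and falling back to an estimated pseudo-geometry map otherwise.

```python
from typing import Callable, List, Optional, Tuple

Pixel = Tuple[float, float, float]   # one RGB pixel, channels in [0, 1]
RGBImage = List[List[Pixel]]         # H x W grid of RGB pixels
Map2D = List[List[float]]            # H x W single-channel map


def luminance_pseudo_depth(rgb: RGBImage) -> Map2D:
    """Hypothetical stand-in for a depth-prediction network.

    Assumption: a real system would use a learned estimator; here the
    channel mean is used only so the sketch runs without dependencies.
    """
    return [[sum(px) / 3.0 for px in row] for row in rgb]


def resolve_geometry(
    rgb: RGBImage,
    depth: Optional[Map2D] = None,
    estimator: Callable[[RGBImage], Map2D] = luminance_pseudo_depth,
) -> Map2D:
    """Return the sensor depth if provided, else a pseudo-geometry map."""
    if depth is not None:
        return depth          # RGBD input: use the real depth map
    return estimator(rgb)     # RGB-only input: estimate geometry from RGB


def guided_fusion(rgb_feat: Map2D, geometry: Map2D, weight: float = 0.5) -> Map2D:
    """Placeholder fusion (an assumption, not the paper's MGS module):
    modulate RGB features by the geometry map in a residual fashion."""
    return [
        [f * (1.0 + weight * g) for f, g in zip(feat_row, geo_row)]
        for feat_row, geo_row in zip(rgb_feat, geometry)
    ]


# Usage: the same pipeline accepts RGB-only or RGBD input.
rgb = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
rgb_only_geometry = resolve_geometry(rgb)                   # estimated
rgbd_geometry = resolve_geometry(rgb, depth=[[0.5, 0.5]])   # supplied
fused = guided_fusion([[2.0, 2.0]], rgb_only_geometry)
```

Keeping the estimator behind a single `resolve_geometry` call means downstream fusion code does not need to know whether the geometry came from a sensor or from the RGB image.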

