CV · IV · Jan 5, 2022

Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation

arXiv:2201.01427v2 · 18 citations
AI Analysis

This work addresses the challenge of comprehensive multimodal fusion in RGBD segmentation, offering an incremental improvement for computer vision applications.

The paper tackles the problem of RGBD semantic segmentation by proposing an attention-based dual supervised decoder to better utilize multimodal information, achieving superior performance on NYUDv2 and SUN-RGBD datasets compared to state-of-the-art methods.

Encoder-decoder models have been widely used in RGBD semantic segmentation, and most of them are built as two-stream networks. In general, jointly reasoning over the color and geometric information in RGBD data is beneficial for semantic segmentation. However, most existing approaches fail to comprehensively utilize multimodal information in both the encoder and the decoder. In this paper, we propose a novel attention-based dual supervised decoder for RGBD semantic segmentation. In the encoder, we design a simple yet effective attention-based multimodal fusion module to deeply extract and fuse multi-level, paired complementary information. To learn more robust deep representations and richer multimodal information, we introduce a dual-branch decoder that effectively leverages the correlations and complementary cues of different tasks. Extensive experiments on the NYUDv2 and SUN-RGBD datasets demonstrate that our method achieves superior performance compared with state-of-the-art methods.
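The abstract does not spell out the internals of the attention-based fusion module, so the following is only a minimal sketch of one common form of the idea: channel attention applied to each modality before element-wise addition. The function names (`channel_gate`, `attention_fuse`) are hypothetical, and the gate here is parameter-free (a sigmoid of each channel's global average), whereas a real module would typically learn the gates with trainable layers. It is not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feat):
    """Return one gate per channel: sigmoid of the channel's global average.

    Assumption: a parameter-free stand-in for a learned channel-attention layer.
    `feat` is a list of channels, each channel a 2D list (rows of floats).
    """
    gates = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates


def attention_fuse(rgb_feat, depth_feat):
    """Reweight each modality by its own channel gates, then add element-wise."""
    rgb_gates = channel_gate(rgb_feat)
    depth_gates = channel_gate(depth_feat)
    fused = []
    for c in range(len(rgb_feat)):
        fused.append([
            [rgb_gates[c] * r + depth_gates[c] * d
             for r, d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb_feat[c], depth_feat[c])
        ])
    return fused


# Toy feature maps: 2 channels, each 2x2 (values are illustrative only).
rgb = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
depth = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
fused = attention_fuse(rgb, depth)
```

In this toy input, a channel with a stronger average response in one modality gets a larger gate, so that modality contributes more to the fused feature for that channel. In a two-stream encoder, a fusion step of this kind would be applied at several encoder levels, which is what "multi-level" refers to in the abstract.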

