CV · Apr 18, 2025

HDBFormer: Efficient RGB-D Semantic Segmentation with A Heterogeneous Dual-Branch Framework

arXiv:2504.13579v1 · 6 citations · h-index: 4 · Has Code · IEEE Signal Processing Letters
Originality: Incremental advance
AI Analysis

This paper addresses the problem of efficient RGB-D semantic segmentation for indoor scene understanding and represents an incremental improvement over existing methods.

The paper tackles the challenge of effectively integrating RGB and depth information for indoor semantic segmentation by proposing HDBFormer, a heterogeneous dual-branch framework that processes each modality differently, achieving state-of-the-art performance on NYUDepthv2 and SUN-RGBD datasets.

In RGB-D semantic segmentation for indoor scenes, a key challenge is effectively integrating the rich color information from RGB images with the spatial distance information from depth images. However, most existing methods overlook the inherent differences in how RGB and depth images express information. Processing RGB and depth images differently is essential to fully exploiting their unique and significant characteristics. To address this, we propose a novel heterogeneous dual-branch framework called HDBFormer, specifically designed to handle these modality differences. For RGB images, which contain rich detail, we employ both a basic encoder and a detail encoder to extract local and global features. For the simpler depth images, we propose LDFormer, a lightweight hierarchical encoder that efficiently extracts depth features with fewer parameters. Additionally, we introduce the Modality Information Interaction Module (MIIM), which combines transformers with large kernel convolutions so that global and local information can interact across modalities efficiently. Extensive experiments show that HDBFormer achieves state-of-the-art performance on the NYUDepthv2 and SUN-RGBD datasets. The code is available at: https://github.com/Weishuobin/HDBFormer.
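The abstract's claim that a lighter depth branch needs fewer parameters can be illustrated with simple parameter arithmetic. The sketch below is not HDBFormer's architecture: the channel widths, the use of dense 3x3 convolutions for the RGB branch, and the use of depthwise-separable convolutions for the depth branch are all assumptions chosen only to show how an asymmetric (heterogeneous) dual-branch design shifts the parameter budget toward the richer modality.

```python
# Illustrative only: every layer shape here is an assumption,
# not the configuration used by HDBFormer or LDFormer.

def conv_params(c_in, c_out, k, groups=1, bias=True):
    """Parameter count of a 2D convolution with a k x k kernel."""
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)

def standard_stage(c_in, c_out, k=3):
    """One dense k x k convolution (stand-in for a heavier RGB stage)."""
    return conv_params(c_in, c_out, k)

def lightweight_stage(c_in, c_out, k=3):
    """Depthwise k x k conv plus 1 x 1 pointwise conv (stand-in for a light depth stage)."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise

def encoder_params(stage_fn, in_channels, widths):
    """Sum the parameters of a hierarchical encoder with one stage per width."""
    total = 0
    c_prev = in_channels
    for c in widths:
        total += stage_fn(c_prev, c)
        c_prev = c
    return total

RGB_WIDTHS = [64, 128, 256, 512]    # hypothetical widths for the RGB branch
DEPTH_WIDTHS = [32, 64, 128, 256]   # hypothetical, narrower widths for the depth branch

rgb_total = encoder_params(standard_stage, 3, RGB_WIDTHS)        # 3-channel RGB input
depth_total = encoder_params(lightweight_stage, 1, DEPTH_WIDTHS)  # 1-channel depth input

print(f"RGB branch parameters:   {rgb_total}")
print(f"Depth branch parameters: {depth_total}")
```

Under these assumed shapes the depth branch is far smaller than the RGB branch. The actual savings in HDBFormer depend on its real layer design, which is documented in the linked repository rather than in this sketch.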

Code Implementations: 1 repo
