Global-Local Propagation Network for RGB-D Semantic Segmentation
This work addresses the challenge of effectively integrating depth data for indoor scene segmentation, representing an incremental improvement over existing multi-stage fusion methods.
The paper tackles the insufficient use of depth information in RGB-D semantic segmentation by proposing a Global-Local Propagation Network (GLPNet) with local and global context fusion modules, achieving state-of-the-art performance on the NYU-Depth v2 and SUN-RGBD datasets.
Depth information matters in the RGB-D semantic segmentation task because it provides geometric information that complements color images. Most existing methods use a multi-stage fusion strategy to propagate depth features to the RGB branch. However, at the very deep stage, propagation by simple element-wise addition cannot fully utilize the depth information. We propose the Global-Local Propagation Network (GLPNet) to solve this problem. Specifically, a local context fusion module (L-CFM) is introduced to dynamically align both modalities before element-wise fusion, and a global context fusion module (G-CFM) is introduced to propagate the depth information to the RGB branch by jointly modeling the multi-modal global context features. Extensive experiments demonstrate the effectiveness and complementarity of the proposed fusion modules. Embedding the two fusion modules into a two-stream encoder-decoder structure, our GLPNet achieves new state-of-the-art performance on two challenging indoor scene segmentation datasets, i.e., NYU-Depth v2 and SUN-RGBD.
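To make the contrast between plain element-wise addition and a fusion driven by global context concrete, the following is a minimal sketch and not the paper's G-CFM, whose internal design the abstract does not specify. It assumes a simple channel-gating scheme: the globally pooled RGB and depth features are concatenated, passed through one linear layer and a sigmoid, and the resulting per-channel gates scale the depth features before they are added to the RGB features. The function names `add_fusion` and `gated_fusion`, the parameters `weights` and `bias`, and the gating scheme itself are all assumptions made for illustration. Features are plain Python lists of shape (channels, positions) to keep the example dependency-free.

```python
import math


def global_pool(feat):
    """Average each channel over all spatial positions: (C, N) -> (C,)."""
    return [sum(channel) / len(channel) for channel in feat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def add_fusion(rgb, depth):
    """Baseline: propagate depth to the RGB branch by element-wise addition."""
    return [[r + d for r, d in zip(rc, dc)] for rc, dc in zip(rgb, depth)]


def gated_fusion(rgb, depth, weights, bias):
    """Hypothetical global-context fusion (an assumption, not the paper's G-CFM).

    weights: C rows of length 2C, bias: length C. The gates depend on the
    pooled context of BOTH modalities and rescale each depth channel.
    """
    # Joint global context: pooled RGB channels followed by pooled depth channels.
    context = global_pool(rgb) + global_pool(depth)
    # One linear layer plus sigmoid gives a gate in (0, 1) per channel.
    gates = [
        sigmoid(sum(w * c for w, c in zip(row, context)) + b)
        for row, b in zip(weights, bias)
    ]
    # Add the gated depth features to the RGB features.
    return [
        [r + g * d for r, d in zip(rc, dc)]
        for rc, dc, g in zip(rgb, depth, gates)
    ]


# Toy features: 2 channels, 2 spatial positions each.
rgb = [[1.0, 2.0], [3.0, 4.0]]
depth = [[0.5, 0.5], [2.0, 0.0]]

# Zero weights and bias give a gate of 0.5 for every channel.
zero_weights = [[0.0] * 4 for _ in range(2)]
zero_bias = [0.0, 0.0]

plain = add_fusion(rgb, depth)
gated = gated_fusion(rgb, depth, zero_weights, zero_bias)
```

In this sketch, addition always passes the depth features through at full strength, whereas the gated variant lets image-level statistics from both modalities decide how much of each depth channel reaches the RGB branch. In a trained network the weights would be learned rather than fixed.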