Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation
This work addresses semantic segmentation for autonomous driving or robotics by improving performance with multi-modal data, though it appears incremental as it builds on existing fusion methods.
The authors tackle RGB-D semantic segmentation by proposing LDFNet, a fusion-based network that incorporates luminance, depth, and color information. LDFNet outperforms other state-of-the-art systems on the Cityscapes dataset, and its inference speed is faster than that of most existing networks.
Semantic segmentation has made encouraging progress in recent years due to the success of deep convolutional networks. Meanwhile, depth sensors have become prevalent, so depth maps can be acquired more easily. However, few studies focus on the RGB-D semantic segmentation task, and exploiting depth information effectively to improve performance remains a challenge. In this paper, we propose a novel solution named LDFNet, which incorporates Luminance, Depth and Color information by a fusion-based network. It includes a sub-network to process depth maps and employs luminance images to assist in processing the depth information. LDFNet outperforms other state-of-the-art systems on the Cityscapes dataset, and its inference speed is faster than that of most existing networks. The experimental results show the effectiveness of the proposed multi-modal fusion network and its potential for practical applications.
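
The data flow described in the abstract (a luminance image accompanying the depth map into a dedicated sub-network, whose output is then fused with the color branch) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Rec. 601 luma weights, the helper names `luminance`, `depth_branch_input` and `fuse`, and the element-wise summation used as the fusion rule. The abstract does not state how LDFNet computes luminance or combines features.

```python
# Illustrative sketch only. Helper names, luma weights and the fusion rule
# are assumptions; the abstract does not specify them.

def luminance(rgb_pixel):
    """Return Rec. 601 luma for an (R, G, B) tuple with values in [0, 1]."""
    r, g, b = rgb_pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def depth_branch_input(rgb_image, depth_map):
    """Pair each depth value with the luminance of the same pixel.

    rgb_image: H x W nested list of (R, G, B) tuples.
    depth_map: H x W nested list of floats.
    Returns an H x W nested list of (depth, luminance) tuples, i.e. a
    two-channel input for a hypothetical depth sub-network.
    """
    if len(rgb_image) != len(depth_map):
        raise ValueError("RGB image and depth map must have the same height")
    stacked = []
    for rgb_row, depth_row in zip(rgb_image, depth_map):
        if len(rgb_row) != len(depth_row):
            raise ValueError("RGB image and depth map must have the same width")
        stacked.append([(d, luminance(p)) for p, d in zip(rgb_row, depth_row)])
    return stacked


def fuse(rgb_features, depth_features):
    """Fuse two equally sized feature lists by element-wise summation.

    Summation is an assumed fusion rule chosen for simplicity.
    """
    if len(rgb_features) != len(depth_features):
        raise ValueError("feature lists must have the same length")
    return [a + b for a, b in zip(rgb_features, depth_features)]
```

In a real network the two branches would be convolutional encoders and the fusion would operate on feature maps rather than flat lists; the sketch only shows which inputs go where.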