Incorporating Depth into both CNN and CRF for Indoor Semantic Segmentation
This work addresses indoor scene understanding for robotics and AR applications; however, it appears incremental, since it builds on existing RGB-D fusion and CRF refinement methods.
The authors tackle indoor semantic segmentation by proposing DFCN-DCRF, a neural network that integrates depth into both its CNN and CRF components, and report that it outperforms most state-of-the-art methods in comparative experiments.
To improve segmentation performance, a novel neural network architecture termed DFCN-DCRF is proposed, which combines an RGB-D fully convolutional neural network (DFCN) with a depth-sensitive fully connected conditional random field (DCRF). First, the DFCN is designed to fuse depth information into its early layers and to apply dilated convolution in its later layers for contextual reasoning. Then, the DCRF is combined with the DFCN to refine the DFCN's preliminary segmentation result. Comparative experiments show that the proposed DFCN-DCRF outperforms most state-of-the-art methods.
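The two stages described above can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the summary does not say how depth is fused or what form the depth-sensitive kernel takes. This sketch assumes (a) early fusion by concatenating depth as a fourth input channel, followed by a naive dilated convolution, and (b) a DCRF modeled as a fully connected CRF with Gaussian appearance and smoothness kernels (in the style of Krähenbühl and Koltun's DenseCRF), where a depth-difference term is added to the appearance kernel and inference uses mean-field updates with a Potts compatibility. The function names `dilated_conv2d` and `depth_dense_crf`, as well as all kernel weights and bandwidths (`theta_d` and the others), are hypothetical. The dense N x N kernel makes this sketch usable only on tiny images.

```python
import numpy as np


def dilated_conv2d(x, w, dilation=1):
    """Naive 'same'-padded 2D dilated convolution (cross-correlation, stride 1).

    x: (C_in, H, W) input; w: (C_out, C_in, k, k) kernel with odd k.
    """
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for a in range(k):
        for b in range(k):
            # Tap (a, b) reads the input shifted by a multiple of `dilation`.
            patch = xp[:, a * dilation:a * dilation + h, b * dilation:b * dilation + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, a, b], patch)
    return out


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def depth_dense_crf(unary, rgb, depth, n_iters=5, w_app=5.0, w_smooth=3.0,
                    theta_alpha=8.0, theta_beta=20.0, theta_d=0.5, theta_gamma=3.0):
    """Mean-field inference for a fully connected CRF with Potts compatibility.

    unary: (L, H, W) negative log-probabilities; rgb: (H, W, 3); depth: (H, W).
    The depth term in the appearance kernel is a hypothetical form of "depth-sensitive".
    """
    n_labels, h, wd = unary.shape
    ys, xs = np.mgrid[0:h, 0:wd]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    col = rgb.reshape(-1, 3).astype(float)
    dep = depth.reshape(-1, 1).astype(float)

    def sqdist(f):
        return ((f[:, None, :] - f[None, :, :]) ** 2).sum(axis=-1)

    # Appearance kernel: position + colour + (assumed) depth similarity.
    k_app = np.exp(-sqdist(pos) / (2 * theta_alpha**2)
                   - sqdist(col) / (2 * theta_beta**2)
                   - sqdist(dep) / (2 * theta_d**2))
    k_smooth = np.exp(-sqdist(pos) / (2 * theta_gamma**2))  # position-only smoothness
    kernel = w_app * k_app + w_smooth * k_smooth
    np.fill_diagonal(kernel, 0.0)  # a pixel sends no message to itself

    u = unary.reshape(n_labels, -1).T  # (N, L)
    q = softmax(-u)
    for _ in range(n_iters):
        msg = kernel @ q  # (N, L): kernel-weighted label support from all other pixels
        # Potts: choosing label l is penalized by the support for every other label.
        pairwise = msg.sum(axis=1, keepdims=True) - msg
        q = softmax(-u - pairwise)
    return q.T.reshape(n_labels, h, wd)


# Toy scene: uniform colour, two depth planes (columns 0-2 at 1 m, columns 3-5 at 3 m).
h, wd = 6, 6
rgb = np.full((h, wd, 3), 128.0)
depth = np.ones((h, wd))
depth[:, 3:] = 3.0
truth = (depth > 2).astype(int)  # label 0 = near plane, label 1 = far plane

# Early fusion (assumed: channel concatenation), then one dilated 3x3 layer.
rgbd = np.concatenate([rgb.transpose(2, 0, 1), depth[None]], axis=0)  # (4, H, W)
features = dilated_conv2d(rgbd, np.random.default_rng(0).normal(size=(8, 4, 3, 3)), dilation=2)

# Stand-in for the DFCN's preliminary output: mostly right, one pixel wrong.
prob = np.where(truth[None] == np.arange(2)[:, None, None], 0.7, 0.3)
prob[:, 2, 1] = [0.4, 0.6]
refined = depth_dense_crf(-np.log(prob), rgb, depth)
print(prob.argmax(0)[2, 1], refined.argmax(0)[2, 1])  # → 1 0
```

Because the colour is uniform in this toy scene, only the depth term keeps the appearance kernel from mixing the two planes, so the CRF corrects the mislabeled pixel without smearing labels across the depth boundary. A practical DenseCRF avoids the O(N²) kernel matrix by approximating the Gaussian filtering, for example with a permutohedral lattice.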