EdgeNet: Semantic Scene Completion from a Single RGB-D Image
This work addresses the challenge of generating complete 3D semantic representations from a single viewpoint, a capability that matters for robotics and augmented reality. The contribution is incremental, since it builds on existing end-to-end approaches.
The paper tackles semantic scene completion from a single RGB-D image by proposing EdgeNet, an end-to-end network that encodes colour information in 3D space using edge detection and a flipped truncated signed distance. It reports a 6.9% improvement over the state-of-the-art result on real data among end-to-end approaches.
Semantic scene completion is the task of predicting a complete 3D representation of volumetric occupancy with corresponding semantic labels for a scene from a single point of view. Previous works on semantic scene completion from RGB-D data used either depth alone or depth with colour, projecting the 2D image into the 3D volume, which results in a sparse data representation. In this work, we present a new strategy to encode colour information in 3D space using edge detection and flipped truncated signed distance. We also present EdgeNet, a new end-to-end neural network architecture capable of handling features generated from the fusion of depth and edge information. Experimental results show an improvement of 6.9% over the state-of-the-art result on real data, for end-to-end approaches.
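The encoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the edge pixels have already been detected and projected to voxel coordinates, and it uses one common form of the flipped truncated distance (magnitude 1 at the surface, decaying linearly to 0 at the truncation distance). The names `flip_truncated_distance`, `edge_volume`, and `trunc` are hypothetical.

```python
import math

def flip_truncated_distance(signed_dist, trunc):
    """Map a signed distance to a flipped, truncated value in [-1, 1].

    Assumed formulation: magnitude is 1 at the surface and falls
    linearly to 0 at the truncation distance `trunc`.
    """
    sign = -1.0 if signed_dist < 0 else 1.0
    magnitude = 1.0 - min(abs(signed_dist), trunc) / trunc
    return sign * magnitude

def edge_volume(edge_voxels, shape, trunc):
    """Dense volume of flipped truncated distances to the nearest edge voxel.

    Brute-force search for clarity. Distances are unsigned here because
    this sketch does not model visible versus occluded space.
    """
    nx, ny, nz = shape
    volume = {}
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                nearest = min(math.dist((x, y, z), e) for e in edge_voxels)
                volume[(x, y, z)] = flip_truncated_distance(nearest, trunc)
    return volume

# A single hypothetical edge voxel at the origin of a 6x1x1 grid.
edges = [(0, 0, 0)]
vol = edge_volume(edges, shape=(6, 1, 1), trunc=4.0)
print([vol[(x, 0, 0)] for x in range(6)])
# prints [1.0, 0.75, 0.5, 0.25, 0.0, 0.0]
```

A single edge voxel produces non-zero values in its whole neighbourhood, so the volume carries a denser signal than one that marks only the projected voxels themselves.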