Dilated Fully Convolutional Neural Network for Depth Estimation from a Single Image
This work improves depth estimation for 3D scene understanding, but it is incremental: it builds on existing CNN methods with specific architectural modifications.
The paper tackles the problem of depth estimation from a single image by addressing resolution loss and memory issues in traditional CNNs, resulting in predictions considerably closer to ground truth on the NYU Depth V2 dataset.
Depth prediction plays a key role in understanding a 3D scene. Several techniques have been developed over the years, among which Convolutional Neural Networks (CNNs) have recently achieved state-of-the-art performance in estimating depth from a single image. However, traditional CNNs suffer from reduced resolution and information loss caused by their pooling layers. Moreover, the large number of parameters in fully connected layers often leads to excessive memory usage. In this paper, we present an advanced Dilated Fully Convolutional Neural Network that addresses these deficiencies. By taking advantage of the exponential expansion of the receptive field in dilated convolutions, our model minimizes the loss of resolution. It also significantly reduces the number of parameters by replacing the fully connected layers with convolutional layers, making the network fully convolutional. We show experimentally on the NYU Depth V2 dataset that the depth predictions obtained from our model are considerably closer to the ground truth than those from traditional CNN techniques.
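The two architectural claims above, that dilation grows the receptive field exponentially without downsampling and that convolutional layers need far fewer parameters than fully connected ones, can be checked with standard arithmetic. The following is a minimal sketch, not the authors' implementation. It assumes stride-1 3×3 kernels whose dilation doubles at each layer, and it uses hypothetical layer sizes (a 512-channel 15×20 feature map, a 4096-unit fully connected layer) that are not taken from the paper. The helper names `dilated_conv1d`, `receptive_field`, `fc_params`, and `conv_params` are illustrative.

```python
def dilated_conv1d(x, w, dilation):
    """1D dilated convolution (cross-correlation, as in CNNs) with zero padding.

    With an odd kernel size and padding = dilation * (k - 1) // 2, the output
    has the same length as the input, so no resolution is lost.
    """
    k = len(w)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w[j] * padded[i + j * dilation] for j in range(k))
            for i in range(len(x))]


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 conv layers with the given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k - 1) * d
    return rf


def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k conv layer; independent of spatial size."""
    return k * k * in_ch * out_ch + out_ch


if __name__ == "__main__":
    # Receptive field: dilations 1, 2, 4, 8 versus four undilated 3x3 layers.
    print(receptive_field(3, [1, 2, 4, 8]))  # 31
    print(receptive_field(3, [1, 1, 1, 1]))  # 9

    # Hypothetical sizes: flattening a 512 x 15 x 20 feature map into a
    # 4096-unit FC layer versus one 3x3 conv layer with 512 output channels.
    print(fc_params(512 * 15 * 20, 4096))  # 629149696
    print(conv_params(512, 512, 3))        # 2359808
```

With dilations 1, 2, 4, ..., 2^(L-1), the receptive field of L stacked 3×3 layers is 2^(L+1) − 1, compared with 2L + 1 without dilation, while every layer keeps the input resolution. The fully connected parameter count scales with the flattened spatial size H × W, but a convolutional layer's count does not, which is the source of the memory savings described above.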