High Quality Monocular Depth Estimation via Transfer Learning
This work addresses the need for accurate depth estimation in applications such as scene understanding and reconstruction. The contribution is incremental, since it builds on existing encoder-decoder architectures and transfer learning.
The paper tackles the problem of low-resolution, blurry monocular depth estimates. It introduces a convolutional neural network that uses transfer learning to produce high-resolution depth maps from single RGB images, and it outperforms state-of-the-art methods on two datasets while using fewer parameters and training iterations.
Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction. Existing solutions for depth estimation often produce blurry, low-resolution approximations. This paper presents a convolutional neural network for computing a high-resolution depth map given a single RGB image with the help of transfer learning. Following a standard encoder-decoder architecture, we initialize our encoder with features extracted by high-performing pre-trained networks and combine this with augmentation and training strategies that lead to more accurate results. We show how, even for a very simple decoder, our method is able to achieve detailed high-resolution depth maps. Our network, with fewer parameters and training iterations, outperforms the state of the art on two datasets and also produces qualitatively better results that capture object boundaries more faithfully. Code and corresponding pre-trained weights are made publicly available.
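The abstract does not spell out what the "very simple decoder" does. As a rough illustration only, the sketch below assumes its core operation is 2x bilinear upsampling of a coarse feature map, a common minimal choice in encoder-decoder networks. The function name `upsample2x_bilinear`, the align-corners convention, and the toy input are assumptions made for this example, not details taken from the paper. A full decoder would typically interleave such upsampling with learned convolutions, which are omitted here.

```python
def upsample2x_bilinear(grid):
    """Upsample a 2D list of floats to twice its height and width.

    Hypothetical helper for illustration. It uses the "align corners"
    convention: the four corner values of the input land exactly on
    the four corners of the output.
    """
    h, w = len(grid), len(grid[0])
    out_h, out_w = 2 * h, 2 * w
    # Scale factors that map output indices back to input coordinates.
    sy = (h - 1) / (out_h - 1)
    sx = (w - 1) / (out_w - 1)

    out = []
    for i in range(out_h):
        y = i * sy
        y0 = int(y)                # input row above (or at) y
        y1 = min(y0 + 1, h - 1)    # input row below, clamped to the border
        fy = y - y0                # vertical interpolation weight
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Interpolate horizontally on both rows, then vertically.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Toy 2x2 "coarse depth map" (made-up values) upsampled to 4x4.
coarse = [[0.0, 3.0],
          [6.0, 9.0]]
fine = upsample2x_bilinear(coarse)
print(len(fine), len(fine[0]))  # 4 4
```

Applying the function repeatedly doubles the resolution at each step, which is how a decoder of this kind would bring a coarse encoder output back toward the input image size.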