Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries
This work addresses a specific bottleneck in single image depth estimation, the loss of spatial resolution in predicted depth maps, and offers incremental improvements over existing methods.
The paper tackles the problem of low spatial resolution and blurry object boundaries in single image depth estimation by proposing an improved network architecture with multi-scale feature fusion and a complementary loss function. Experimental results show that these improvements achieve higher accuracy than current state-of-the-art methods, particularly in finer resolution reconstruction of small objects and boundaries.
This paper considers the problem of single image depth estimation. The use of convolutional neural networks (CNNs) has recently brought significant advances on this problem. However, most existing methods suffer from a loss of spatial resolution in the estimated depth maps; a typical symptom is distorted and blurry reconstruction of object boundaries. In this paper, aiming at more accurate estimation with a focus on depth maps of higher spatial resolution, we propose two improvements to existing approaches. The first concerns the strategy for fusing features extracted at different scales, for which we propose an improved network architecture consisting of four modules: an encoder, a decoder, a multi-scale feature fusion module, and a refinement module. The second concerns the loss functions used in training to measure inference errors. We show that three loss terms, which measure errors in depth, gradients, and surface normals, respectively, contribute to improved accuracy in a complementary fashion. Experimental results show that these two improvements make it possible to attain higher accuracy than the current state of the art, and that the gain comes from finer-resolution reconstruction, for example, of small objects and object boundaries.
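
To make the three loss terms concrete, the following is a minimal sketch in plain Python of how errors in depth, gradients, and surface normals could be combined. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the log-scaled penalty with offset ALPHA, the forward-difference gradients, the normal direction (-gx, -gy, 1), the weights w_grad and w_normal, and all function names. Depth maps are nested lists of at least 2 x 2 values; a training implementation would use a tensor library instead.

```python
import math

ALPHA = 0.5  # assumed offset inside the log; not specified in the abstract


def log_err(e):
    """Log-scaled penalty on the magnitude of an error e (assumed form)."""
    return math.log(abs(e) + ALPHA)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def gradients(depth):
    """Forward-difference gradients (gx, gy), each of size (H-1) x (W-1)."""
    h, w = len(depth), len(depth[0])
    gx = [[depth[y][x + 1] - depth[y][x] for x in range(w - 1)] for y in range(h - 1)]
    gy = [[depth[y + 1][x] - depth[y][x] for x in range(w - 1)] for y in range(h - 1)]
    return gx, gy


def depth_loss(pred, target):
    """Per-pixel depth error; sensitive to global offsets."""
    return mean(log_err(p - t) for rp, rt in zip(pred, target) for p, t in zip(rp, rt))


def gradient_loss(pred, target):
    """Error in horizontal and vertical depth gradients; sensitive to blurred edges."""
    pgx, pgy = gradients(pred)
    tgx, tgy = gradients(target)
    return mean(
        log_err(px - tx) + log_err(py - ty)
        for rpx, rpy, rtx, rty in zip(pgx, pgy, tgx, tgy)
        for px, py, tx, ty in zip(rpx, rpy, rtx, rty)
    )


def normal_loss(pred, target):
    """One minus the cosine between surface normals (-gx, -gy, 1) of pred and target."""
    pgx, pgy = gradients(pred)
    tgx, tgy = gradients(target)
    terms = []
    for rpx, rpy, rtx, rty in zip(pgx, pgy, tgx, tgy):
        for px, py, tx, ty in zip(rpx, rpy, rtx, rty):
            dot = px * tx + py * ty + 1.0
            norm = math.sqrt(px * px + py * py + 1.0) * math.sqrt(tx * tx + ty * ty + 1.0)
            terms.append(1.0 - dot / norm)
    return mean(terms)


def total_loss(pred, target, w_grad=1.0, w_normal=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (
        depth_loss(pred, target)
        + w_grad * gradient_loss(pred, target)
        + w_normal * normal_loss(pred, target)
    )


# Usage: a sharp step edge compared with two imperfect predictions.
target = [[1.0, 1.0, 3.0, 3.0] for _ in range(3)]       # sharp step edge
shifted = [[v + 0.5 for v in row] for row in target]    # constant depth offset
blurred = [[1.0, 1.5, 2.5, 3.0] for _ in range(3)]      # smeared edge

for name, pred in [("shifted", shifted), ("blurred", blurred)]:
    print(name, depth_loss(pred, target), gradient_loss(pred, target), normal_loss(pred, target))
```

In this toy setting, the constant offset is penalized only by the depth term, since its gradients and normals match the target, while the smeared edge is also penalized by the gradient and normal terms. This is one way to read the claim that the three terms contribute in a complementary fashion; it is not a reproduction of the paper's experiments.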