Monocular Depth Estimation Using Multi Scale Neural Network And Feature Fusion
This work addresses depth estimation from single images, a key problem in computer vision for applications such as autonomous driving and robotics, though the contribution appears incremental, since it builds on existing methods with architectural improvements.
The paper tackles monocular depth estimation by introducing a novel network architecture with multi-scale feature fusion and a new loss function, and reports state-of-the-art performance with fewer parameters on the Make3D, NYU Depth V2, and KITTI datasets.
Depth estimation from monocular images is a challenging problem in computer vision. In this paper, we tackle this problem with a novel network architecture based on multi-scale feature fusion. Our network uses two different blocks. The first applies convolutions with different filter sizes and merges the resulting feature maps. The second uses dilated convolutions in place of fully connected layers, reducing computation and increasing the receptive field. We present a new loss function for training the network that combines a depth regression term, an SSIM loss term, and a multinomial logistic loss term. We train and test our network on the Make3D, NYU Depth V2, and KITTI datasets using standard depth estimation metrics, namely RMSE and the scale-invariant logarithmic error (SILog). Our network outperforms previous state-of-the-art methods with fewer parameters.
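The two blocks described above can be illustrated with a minimal single-channel NumPy sketch. This is not the authors' implementation: the abstract does not give kernel sizes, channel counts, dilation rates, or how the maps are merged, so the helper names (`conv2d_same`, `multi_scale_block`, `dilated_block`, `receptive_field`), merging by stacking along a channel axis, and dilation rates 1, 2, 4 are all assumptions made for illustration.

```python
import numpy as np

def conv2d_same(x, kernel, dilation=1):
    """Single-channel 2D cross-correlation with zero 'same' padding and optional dilation."""
    k = kernel.shape[0]                    # assumes a square, odd-sized kernel
    eff = k + (k - 1) * (dilation - 1)     # effective spatial extent of the dilated kernel
    pad = eff // 2
    xp = np.pad(x, pad)                    # zero padding keeps the output the same size as x
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # each kernel tap reads a copy of the input shifted by (i*dilation, j*dilation)
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * xp[di:di + h, dj:dj + w]
    return out

def multi_scale_block(x, kernels):
    """Hypothetical multi-scale block: parallel branches with different filter sizes,
    each followed by ReLU, merged by stacking along a new channel axis."""
    maps = [np.maximum(conv2d_same(x, k), 0.0) for k in kernels]
    return np.stack(maps, axis=0)          # shape: (num_branches, H, W)

def dilated_block(x, kernels, dilations):
    """Hypothetical dilated block: a stack of dilated convolutions with ReLU."""
    for k, d in zip(kernels, dilations):
        x = np.maximum(conv2d_same(x, k, dilation=d), 0.0)
    return x

def receptive_field(kernel_size, dilations):
    """Receptive field (stride 1) of stacked convolutions with the given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf
```

With 3×3 kernels, three stacked layers at dilations 1, 2, 4 see a 15×15 window (`receptive_field(3, [1, 2, 4])`), versus 7×7 without dilation, while each kernel still holds only 9 weights regardless of dilation. That is the trade-off the second block relies on: a wider context without the parameter cost of a fully connected layer.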
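The combined loss can be sketched as a weighted sum of its three terms. The abstract names the terms but not their exact forms or weights, so everything here is an assumption: L1 is used for the depth regression term; SSIM is computed from global image statistics (standard SSIM uses local windows); the multinomial logistic term is softmax cross-entropy over depth bins, with log-spaced bins and a per-bin logits map assumed; and the weights `w_reg`, `w_ssim`, `w_cls` are hypothetical placeholders.

```python
import numpy as np

def l1_depth_loss(pred, gt):
    """Assumed regression term: mean absolute depth error."""
    return float(np.mean(np.abs(pred - gt)))

def ssim_global(x, y, data_range, k1=0.01, k2=0.03):
    """Simplified SSIM from global means, variances, and covariance (no sliding window)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def depth_to_bins(depth, d_min, d_max, num_bins):
    """Hypothetical discretization: log-spaced bins, labels in 0..num_bins-1 (out-of-range values clamp)."""
    edges = np.logspace(np.log10(d_min), np.log10(d_max), num_bins + 1)
    return np.digitize(depth, edges[1:-1])  # interior edges only

def multinomial_logistic_loss(logits, labels):
    """Softmax cross-entropy; logits: (num_bins, H, W), labels: (H, W) integer bins."""
    z = logits - logits.max(axis=0, keepdims=True)            # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    h, w = labels.shape
    picked = log_probs[labels, np.arange(h)[:, None], np.arange(w)[None, :]]
    return float(-picked.mean())

def combined_loss(pred_depth, logits, gt, d_min, d_max, w_reg=1.0, w_ssim=1.0, w_cls=1.0):
    """Weighted sum of regression, SSIM, and classification terms (weights are placeholders)."""
    labels = depth_to_bins(gt, d_min, d_max, logits.shape[0])
    reg = l1_depth_loss(pred_depth, gt)
    ssim_term = (1.0 - ssim_global(pred_depth, gt, data_range=d_max - d_min)) / 2.0
    cls = multinomial_logistic_loss(logits, labels)
    return w_reg * reg + w_ssim * ssim_term + w_cls * cls
```

The regression and SSIM terms act on the continuous depth map, while the classification term needs a separate per-bin output, so a network trained this way would need both a depth head and a bin-logit head (again an assumption about how such a loss would be wired).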
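For reference, the two evaluation metrics can be computed as below. The SILog formula follows the scale-invariant log error of Eigen et al., scaled by 100 as in the KITTI depth-prediction benchmark (our reading of that convention). Masking out pixels with zero ground truth is an assumption suited to sparse KITTI labels, and predictions are assumed positive.

```python
import numpy as np

def rmse(pred, gt):
    """Root mean squared depth error over pixels with valid (positive) ground truth."""
    valid = gt > 0
    return float(np.sqrt(np.mean((pred[valid] - gt[valid]) ** 2)))

def silog(pred, gt):
    """Scale-invariant log error: sqrt(mean(d^2) - mean(d)^2) * 100, with d = log(pred) - log(gt)."""
    valid = gt > 0
    d = np.log(pred[valid]) - np.log(gt[valid])
    var = max(np.mean(d ** 2) - np.mean(d) ** 2, 0.0)  # clamp tiny negative rounding error
    return float(np.sqrt(var) * 100.0)
```

Because `d` shifts by a constant when every prediction is multiplied by the same factor, SILog is unaffected by global scale, e.g. `silog(2 * gt, gt)` is (numerically) zero while `rmse(2 * gt, gt)` is not.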