CV AI RO | Jan 9, 2025

A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision

Ali Rohan, Md Junayed Hasan, Andrei Petrovski

arXiv:2501.05147v1 | 8.46 citations | h-index: 4

Originality: Synthesis-oriented

AI Analysis

This review provides a comprehensive synthesis for computer vision researchers. It addresses a gap: previous reviews were either not systematic or covered only specific techniques.

The systematic literature review surveyed 59 high-quality studies on deep learning-based depth estimation, identifying 20 datasets, 29 evaluation metrics, and 35 base models. KITTI, NYU Depth V2, and Make3D were the most used datasets, and ResNet-50 was among the most-used base models.

Depth estimation (DE) provides spatial information about a scene and enables tasks such as 3D reconstruction, object detection, and scene understanding. Interest in deep learning (DL)-based methods for DE has grown in recent years. Traditional techniques rely on handcrafted features that often struggle to generalise to diverse scenes and require extensive manual tuning. In contrast, DL models for DE can automatically extract relevant features from input data, adapt to various scene conditions, and generalise well to unseen environments. Numerous DL-based methods have been developed, making it necessary to survey and synthesise the state of the art (SOTA). Previous reviews on DE have mainly focused on either monocular or stereo-based techniques rather than reviewing DE comprehensively. Furthermore, to the best of our knowledge, no systematic literature review (SLR) focuses comprehensively on DE; this study addresses that gap. First, electronic databases were searched for relevant publications, yielding 1284 publications. Using defined exclusion and quality criteria, 128 publications were shortlisted and then further filtered to select 59 high-quality primary studies. These studies were analysed to extract data and answer the defined research questions. The results show that DL methods were developed mainly for three types of DE: monocular, stereo, and multi-view. Twenty publicly available datasets were used to train, test, and evaluate DL models for DE, with KITTI, NYU Depth V2, and Make3D being the most used. Twenty-nine evaluation metrics were used to assess DE performance. Thirty-five base models were reported in the primary studies, and the five most-used base models were ResNet-50, ResNet-18, ResNet-101, U-Net, and VGG-16. Finally, the lack of ground truth data was among the most significant challenges reported by the primary studies.
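
The abstract reports 29 evaluation metrics without naming them. As an illustration only, the sketch below computes three metrics that are widely used in the depth estimation literature: absolute relative error, root mean squared error, and threshold accuracy. Whether these are among the 29 surveyed metrics is an assumption, and the function name `depth_metrics` and the sample values are hypothetical. Pixels with no ground truth (encoded here as a depth of 0) are skipped, which reflects the sparse ground truth that the primary studies cite as a challenge.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Compute AbsRel, RMSE, and threshold accuracy over valid pixels.

    pred, gt: flat sequences of depths in the same unit (e.g. metres).
    Pixels where the ground truth or the prediction is not positive are
    skipped: gt <= 0 is assumed to mean "no ground truth available", and
    pred <= 0 would cause a division by zero in the ratio below.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Root mean squared error, in the same unit as the depths.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Threshold accuracy: fraction of pixels whose ratio
    # max(pred/gt, gt/pred) falls below the threshold.
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta": delta}


# Hypothetical sample: the third pixel has no ground truth (gt == 0).
pred = [2.0, 4.0, 10.0, 3.0]
gt = [2.0, 5.0, 0.0, 2.0]
metrics = depth_metrics(pred, gt)
print(metrics)
```

Lower is better for the two error metrics, and higher is better for threshold accuracy. The threshold of 1.25 is a conventional choice, not a value taken from this review.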
