Versatile Depth Estimator Based on Common Relative Depth Estimation and Camera-Specific Relative-to-Metric Depth Conversion
The work addresses camera-specific performance degradation in monocular depth estimation for computer vision applications. It is incremental in that it builds on existing relative depth estimation methods.
The paper tackles the drop in monocular depth estimation performance that occurs when images come from different cameras. It proposes a versatile depth estimator (VDE) that combines a common relative depth estimator with camera-specific converters, adding only 1.12% more parameters per camera while achieving state-of-the-art performance in the conventional single-camera scenario.
A typical monocular depth estimator is trained for a single camera, so its performance drops severely on images taken with different cameras. To address this issue, we propose a versatile depth estimator (VDE), composed of a common relative depth estimator (CRDE) and multiple relative-to-metric converters (R2MCs). The CRDE extracts relative depth information, and each R2MC converts the relative information to predict metric depths for a specific camera. The proposed VDE can cope with diverse scenes, including both indoor and outdoor scenes, with only a 1.12% parameter increase per camera. Experimental results demonstrate that VDE supports multiple cameras effectively and efficiently and also achieves state-of-the-art performance in the conventional single-camera scenario.
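The division of labor between the shared CRDE and the per-camera R2MCs can be illustrated with a toy sketch. This is not the paper's implementation: the class names, the min-max normalization standing in for the CRDE network, and the affine (scale and shift) form of each converter are all assumptions made for illustration, since the abstract does not specify how an R2MC performs the conversion. The sketch only shows the structural idea that one common estimator is reused while a small converter is registered per camera.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Image = List[List[float]]  # toy grayscale image: rows of pixel values


class CommonRelativeDepthEstimator:
    """Hypothetical stand-in for the shared CRDE."""

    def predict(self, image: Image) -> Image:
        # Placeholder for a real network (assumption): min-max normalize
        # the pixel values so the output is a relative map in [0, 1].
        flat = [p for row in image for p in row]
        lo, hi = min(flat), max(flat)
        span = (hi - lo) or 1.0  # avoid division by zero on flat images
        return [[(p - lo) / span for p in row] for row in image]


@dataclass
class RelativeToMetricConverter:
    """Hypothetical stand-in for one camera-specific R2MC.

    The affine form (scale * relative + shift) is an assumption,
    not the conversion described in the paper.
    """
    scale: float
    shift: float

    def convert(self, relative: Image) -> Image:
        return [[self.scale * r + self.shift for r in row] for row in relative]


@dataclass
class VersatileDepthEstimator:
    """One shared estimator plus a small converter per camera."""
    crde: CommonRelativeDepthEstimator
    converters: Dict[str, RelativeToMetricConverter] = field(default_factory=dict)

    def add_camera(self, camera_id: str, converter: RelativeToMetricConverter) -> None:
        # Supporting a new camera adds only a converter; the CRDE is untouched.
        self.converters[camera_id] = converter

    def predict_metric(self, image: Image, camera_id: str) -> Image:
        relative = self.crde.predict(image)  # shared across all cameras
        return self.converters[camera_id].convert(relative)  # camera-specific


# Usage with two hypothetical cameras and made-up converter parameters.
vde = VersatileDepthEstimator(CommonRelativeDepthEstimator())
vde.add_camera("indoor_cam", RelativeToMetricConverter(scale=9.0, shift=1.0))
vde.add_camera("outdoor_cam", RelativeToMetricConverter(scale=79.0, shift=1.0))

image = [[0.0, 5.0], [10.0, 2.5]]
indoor = vde.predict_metric(image, "indoor_cam")
outdoor = vde.predict_metric(image, "outdoor_cam")
```

In this sketch the same relative map yields different metric depths depending on which converter is selected, which mirrors why adding a camera costs only the converter's parameters rather than a second full estimator.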