CamLessMonoDepth: Monocular Depth Estimation with Unknown Camera Parameters
This work addresses a practical limitation for applications in computer vision where camera parameters are unknown, enabling the use of image sequences from the wild for training.
The authors tackled the problem of monocular depth estimation without known camera parameters by proposing a method that implicitly estimates camera intrinsics along with depth and pose from monocular image sequences, achieving state-of-the-art performance on the KITTI benchmark.
Perceiving 3D information is of paramount importance in many applications of computer vision. Recent advances in monocular depth estimation have shown that such knowledge can be gained from a single camera input by training deep neural networks to predict inverse depth and pose, without the need for ground truth data. The majority of such approaches, however, require camera parameters to be fed explicitly during training. As a result, image sequences from the wild cannot be used during training. While there exist methods that also predict camera intrinsics, their performance is not on par with recent methods that take camera parameters as input. In this work, we propose a method for implicit estimation of pinhole camera intrinsics along with depth and pose, by learning from monocular image sequences alone. In addition, by utilizing efficient sub-pixel convolutions, we show that high-fidelity depth estimates can be obtained. We also embed pixel-wise uncertainty estimation into the framework to emphasize the applicability of this work in practical domains. Finally, we demonstrate that depth can be predicted accurately without prior knowledge of camera intrinsics, while outperforming existing state-of-the-art approaches on the KITTI benchmark.
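To make the role of the pinhole intrinsics concrete, the following is a minimal NumPy sketch of how an intrinsics matrix, a depth map, and a relative pose combine to map pixels from one view into another, which is the geometric relation that self-supervised depth methods of this kind typically rely on. It is an illustration under stated assumptions, not the implementation described in this work: the function names (`intrinsics_matrix`, `backproject`, `project`) and the numeric values are hypothetical, and the intrinsics are fixed numbers here, whereas the proposed method estimates them from the image sequence.

```python
import numpy as np

def intrinsics_matrix(fx, fy, cx, cy):
    """Build a 3x3 pinhole intrinsics matrix K (hypothetical helper)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def backproject(depth, K):
    """Lift each pixel (u, v) with depth d to the 3D point d * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    rays = np.linalg.inv(K) @ pixels          # one viewing ray per pixel
    return rays * depth.reshape(1, -1)        # shape (3, h * w)

def project(points, K, R, t):
    """Apply the pose (R, t) to 3D points and project them to pixel coordinates."""
    cam = R @ points + t.reshape(3, 1)
    uvw = K @ cam
    return uvw[:2] / uvw[2:3]                 # perspective division

# Illustrative values only (assumed, not taken from the paper).
K = intrinsics_matrix(fx=100.0, fy=100.0, cx=3.0, cy=2.0)
depth = np.full((4, 6), 2.0)
points = backproject(depth, K)
uv = project(points, K, np.eye(3), np.zeros(3))
```

With the identity pose used above, `uv` reproduces the original pixel grid, which is a quick sanity check that back-projection and projection are consistent. In a training setup, a non-identity pose and the resulting coordinates would be used to sample one frame into the viewpoint of another.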