Monocular Depth Prediction through Continuous 3D Loss
This work addresses the challenge of accurate and affordable depth sensing for applications such as autonomous driving and robotics. Its contribution is incremental in that it builds on existing methods.
The paper tackles the problem of monocular depth prediction by introducing a continuous 3D loss function that uses sparse LIDAR points as supervision, improving accuracy and producing more consistent 3D geometric structures compared to baselines like DORN, BTS, and Monodepth2.
This paper reports a new continuous 3D loss function for learning depth from monocular images. The dense depth prediction from a monocular image is supervised using sparse LIDAR points, which enables us to leverage available open-source datasets with camera-LIDAR sensor suites during training. Currently, accurate and affordable range sensors are not readily available: stereo cameras and LIDARs measure depth either inaccurately or sparsely and at high cost. In contrast to the current point-to-point loss evaluation approach, the proposed 3D loss treats point clouds as continuous objects; therefore, it compensates for the lack of dense ground-truth depth caused by the sparsity of LIDAR measurements. We applied the proposed loss to three state-of-the-art monocular depth prediction approaches: DORN, BTS, and Monodepth2. Experimental evaluation shows that the proposed loss improves depth prediction accuracy and produces point clouds with more consistent 3D geometric structures than all tested baselines, implying the benefit of the proposed loss for general depth prediction networks. A video demo of this work is available at https://youtu.be/5HL8BjSAY4Y.
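To make the contrast with a point-to-point loss concrete, the following is a minimal sketch of one way to treat point clouds as continuous objects: every predicted 3D point is compared with every LIDAR point through a smooth kernel, so a sparse LIDAR return also supervises nearby predictions. This is an illustration under stated assumptions, not the formulation used in the paper. The Gaussian kernel, the `length_scale` value, the pinhole back-projection, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift a dense depth map (H, W) to an (H*W, 3) point cloud.

    Assumes a pinhole camera with focal lengths fx, fy and principal
    point (cx, cy); these intrinsics are placeholders for illustration.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def kernel_correlation(a, b, length_scale=0.5):
    """Sum a Gaussian kernel over all pairs of points in a (N, 3) and b (M, 3).

    The Gaussian kernel is an assumption of this sketch.
    """
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2)).sum()


def continuous_3d_loss(pred_points, lidar_points, length_scale=0.5):
    """Negative kernel correlation, normalized by the number of LIDAR points.

    Lower values mean the predicted cloud lies closer to the sparse
    LIDAR cloud; no one-to-one point correspondence is required.
    """
    score = kernel_correlation(pred_points, lidar_points, length_scale)
    return -score / len(lidar_points)


# Toy example: a flat wall 2 m away, observed by a sparse subset of points.
true_depth = np.full((4, 5), 2.0)
dense_points = backproject(true_depth, fx=1.0, fy=1.0, cx=2.0, cy=2.0)
lidar_points = dense_points[::4]  # keep every 4th point to mimic sparsity

good_loss = continuous_3d_loss(dense_points, lidar_points)
bad_points = backproject(true_depth + 1.0, fx=1.0, fy=1.0, cx=2.0, cy=2.0)
bad_loss = continuous_3d_loss(bad_points, lidar_points)
```

In this toy setup the correct prediction yields a lower loss than the prediction that is off by one meter. A training implementation would express the same computation in a differentiable framework and would avoid the full N-by-M pairwise matrix, for example by restricting each point to a local neighborhood.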