Learning depth from monocular video sequences
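This work addresses depth estimation for applications such as autonomous driving and robotics. Its contribution is incremental: it builds on existing self-supervised methods and adds specific improvements to the training loss, the pixel motion model, and the network architecture.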
The paper tackles the problem of learning single-image depth estimation from monocular video sequences by proposing a novel training loss, a model for frame-to-frame pixel motion, and a new network architecture, achieving state-of-the-art results on the KITTI dataset in a self-supervised setting.
Learning a single-image depth estimation model from monocular video sequences is a very challenging problem. In this paper, we propose a novel training loss that enables us to include more images for supervision during training. We propose a simple yet effective model to account for frame-to-frame pixel motion. We also design a novel network architecture for single-image depth estimation. When combined, these components produce state-of-the-art results for monocular depth estimation on the KITTI dataset in the self-supervised setting.
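The abstract does not spell out the proposed loss, so the following is only a minimal sketch of the generic view-synthesis supervision that self-supervised monocular depth methods commonly build on, not the paper's own formulation. It assumes a pinhole camera with known intrinsics and a known relative pose between a target frame and a source frame. The function names `reproject` and `photometric_l1` are hypothetical, and nearest-neighbour sampling is used purely for brevity.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Map every target pixel to its location in a source frame.

    depth: (H, W) depth predicted for the target image
    K:     (3, 3) pinhole intrinsics (assumed known)
    R, t:  rotation (3, 3) and translation (3,) from target to source camera
    Returns an (H, W, 2) array of (u, v) coordinates in the source image.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, shape (3, H*W).
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)
    # Back-project to 3D points in the target camera frame.
    points = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Move the points into the source camera frame and project them.
    proj = K @ (R @ points + t.reshape(3, 1))
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(h, w, 2)

def photometric_l1(target, source, uv):
    """Mean absolute intensity difference over pixels that land inside the source image."""
    h, w = target.shape
    u = np.rint(uv[..., 0]).astype(int)
    v = np.rint(uv[..., 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped = source[v[valid], u[valid]]
    return float(np.abs(target[valid] - warped).mean())
```

In a training setup of this kind, the depth map would come from the network and the loss would be evaluated against one or more neighbouring frames, which is where a loss that admits more images for supervision would plug in. A differentiable bilinear sampler would replace the rounding step in practice.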