SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction
This work addresses accurate depth estimation without ground-truth data, which matters for applications such as autonomous driving. It appears incremental, however, because it builds on existing self-supervised approaches.
The paper tackles the problem of inaccurate depth values in self-supervised monocular depth estimation by proposing SAFENet, which leverages semantic information to improve depth prediction, achieving competitive or superior performance compared to state-of-the-art methods on the KITTI dataset.
Self-supervised monocular depth estimation has emerged as a promising method because it does not require ground-truth depth maps during training. As an alternative to the ground-truth depth map, the photometric loss provides self-supervision for depth prediction by matching the input image frames. However, the photometric loss causes various problems, resulting in less accurate depth values than supervised approaches. In this paper, we propose SAFENet, which is designed to leverage semantic information to overcome the limitations of the photometric loss. Our key idea is to exploit semantic-aware depth features that integrate semantic and geometric knowledge. To this end, we introduce multi-task learning schemes that incorporate semantic awareness into the representation of depth features. Experiments on the KITTI dataset demonstrate that our methods compete with or even outperform the state-of-the-art methods. Furthermore, extensive experiments on different datasets show better generalization ability and robustness to various conditions, such as low light or adverse weather.
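To make the idea of self-supervision "by matching the input image frames" concrete, the following is a minimal NumPy sketch of a photometric loss. It is not SAFENet's implementation: the abstract does not give the exact form of the loss, so this sketch assumes a plain L1 difference, nearest-neighbor sampling, grayscale images, and a known relative camera pose. The function names `warp_source_to_target` and `photometric_loss` are hypothetical. Practical methods typically use differentiable bilinear sampling and often combine L1 with a structural similarity term.

```python
import numpy as np


def warp_source_to_target(source, depth, K, T):
    """Reconstruct the target view by sampling pixels from the source frame.

    source: (H, W) grayscale source frame
    depth:  (H, W) predicted depth of the target frame
    K:      (3, 3) camera intrinsics (assumed shared by both frames)
    T:      (4, 4) pose mapping target-camera points into the source camera
    Returns the warped image and a boolean mask of pixels that landed
    inside the source image.
    """
    H, W = depth.shape
    # Homogeneous pixel coordinates of the target frame, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(np.float64)

    # Back-project each pixel to a 3D point using the predicted depth.
    points = (np.linalg.inv(K) @ pixels) * depth.ravel()
    points_h = np.vstack([points, np.ones((1, H * W))])

    # Move the points into the source camera and project them.
    proj = K @ (T @ points_h)[:3]
    z = proj[2]
    valid = z > 1e-6
    us = np.full(H * W, -1.0)
    vs = np.full(H * W, -1.0)
    us[valid] = proj[0, valid] / z[valid]
    vs[valid] = proj[1, valid] / z[valid]

    # Nearest-neighbor lookup (a simplification of bilinear sampling).
    ui = np.rint(us).astype(int)
    vi = np.rint(vs).astype(int)
    valid &= (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)

    warped = np.zeros(H * W)
    warped[valid] = source[vi[valid], ui[valid]]
    return warped.reshape(H, W), valid.reshape(H, W)


def photometric_loss(target, warped, valid):
    """Mean absolute intensity difference over validly warped pixels."""
    if not valid.any():
        return 0.0
    return float(np.abs(target - warped)[valid].mean())


# Toy scene: a flat wall at depth 2 and a camera shifted sideways so that
# the correct depth moves every pixel by exactly one column.
H, W = 4, 6
K = np.array([[10.0, 0.0, 2.5], [0.0, 10.0, 1.5], [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = 0.2

cols, rows = np.meshgrid(np.arange(W), np.arange(H))
source = (cols + 10 * rows).astype(np.float64)
target = source + 1.0  # equals the source shifted by one column

warped_ok, valid_ok = warp_source_to_target(source, np.full((H, W), 2.0), K, T)
warped_bad, valid_bad = warp_source_to_target(source, np.full((H, W), 1.0), K, T)
loss_ok = photometric_loss(target, warped_ok, valid_ok)
loss_bad = photometric_loss(target, warped_bad, valid_bad)
```

In this toy setup the correct depth reproduces the target frame on every valid pixel, while the wrong depth shifts the sampled pixels too far and leaves a residual. That residual is the training signal. The sketch also hints at the limitation the abstract refers to: the loss depends only on pixel intensities, so low-texture or poorly lit regions give a weak or ambiguous signal.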