Residual-Guided Learning Representation for Self-Supervised Monocular Depth Estimation
This work addresses a specific bottleneck in monocular depth estimation for applications like autonomous driving, but it is incremental as it builds on existing self-supervised methods.
The paper tackles unstable depth predictions in textureless or occluded regions in self-supervised monocular depth estimation by proposing a residual guidance loss that transfers the discriminability of auto-encoded features to the depth estimation network. Experiments on the KITTI benchmark indicate that the method is superior to, and orthogonal to, other state-of-the-art methods.
Photometric consistency loss is one of the representative objective functions for self-supervised monocular depth estimation. However, this loss often causes unstable depth predictions in textureless or occluded regions because it provides incorrect guidance there. Recent self-supervised approaches tackle this issue by using feature representations explicitly learned by auto-encoders, expecting them to be more discriminative than the input image. Despite the use of auto-encoded features, we observe that these methods do not embed features as discriminative as the auto-encoded features themselves. In this paper, we propose a residual guidance loss that enables the depth estimation network to embed discriminative features by transferring the discriminability of the auto-encoded features. We conducted experiments on the KITTI benchmark and verified our method's superiority over, and orthogonality to, other state-of-the-art methods.
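To illustrate why photometric consistency gives weak guidance in textureless regions, the following is a minimal toy sketch, not the paper's implementation. It assumes a plain L1 photometric error and replaces view synthesis with an integer horizontal shift; the names `warp_horizontal` and `photometric_l1` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def warp_horizontal(src, disparity):
    """Toy stand-in for view synthesis: shift the image by an integer disparity."""
    return np.roll(src, disparity, axis=1)

def photometric_l1(target, warped):
    """Mean absolute intensity difference (a simplified photometric error)."""
    return float(np.mean(np.abs(target - warped)))

rng = np.random.default_rng(0)
true_disp = 3

# One textured image and one textureless (constant) image.
textured_src = rng.random((8, 32))
flat_src = np.full((8, 32), 0.5)

# Target views generated with the true disparity.
textured_tgt = warp_horizontal(textured_src, true_disp)
flat_tgt = warp_horizontal(flat_src, true_disp)

# Photometric error for each candidate disparity.
candidates = list(range(7))
textured_err = [photometric_l1(textured_tgt, warp_horizontal(textured_src, d))
                for d in candidates]
flat_err = [photometric_l1(flat_tgt, warp_horizontal(flat_src, d))
            for d in candidates]

# Textured image: the error has a unique minimum at the true disparity.
best_textured = candidates[int(np.argmin(textured_err))]
# Textureless image: every candidate has the same error, so the loss
# cannot distinguish the correct disparity from an incorrect one.
flat_is_ambiguous = len(set(flat_err)) == 1

print("best disparity (textured):", best_textured)
print("textureless errors identical:", flat_is_ambiguous)
```

In this toy setting the textured image has a clear error minimum, while the constant image gives the same error for every candidate disparity. This kind of ambiguity is what motivates comparing more discriminative feature representations rather than raw intensities.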