Towards Sharper Object Boundaries in Self-Supervised Depth Estimation
The work addresses inaccurate 3D scene understanding in applications such as autonomous driving, though the contribution is incremental because it builds on existing self-supervised pipelines.
The paper tackles the problem of blurred object boundaries in monocular depth estimation by proposing a method that models per-pixel depth as a mixture distribution, achieving up to 35% higher boundary sharpness and improved point cloud quality on KITTI and VKITTIv2 datasets using only self-supervision.
Accurate monocular depth estimation is crucial for 3D scene understanding, but existing methods often blur depth at object boundaries, introducing spurious intermediate 3D points. While achieving sharp edges usually requires very fine-grained supervision, our method produces crisp depth discontinuities using only self-supervision. Specifically, we model per-pixel depth as a mixture distribution, capturing multiple plausible depths and shifting uncertainty from direct regression to the mixture weights. This formulation integrates seamlessly into existing pipelines via variance-aware loss functions and uncertainty propagation. Extensive evaluations on KITTI and VKITTIv2 show that our method achieves up to 35% higher boundary sharpness and improves point cloud quality compared to state-of-the-art baselines.
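
To illustrate why a mixture can keep edges crisp, here is a minimal sketch under stated assumptions. It assumes two components per pixel and assumes the final depth is read from the highest-weight component; the abstract states neither detail, and the function name `mixture_depth` and the toy scanline values are hypothetical. The sketch contrasts the mixture mean, which blends foreground and background into intermediate depths, with per-pixel component selection, which does not.

```python
import numpy as np

def mixture_depth(weights, means):
    """Hypothetical helper, not the paper's implementation.

    weights, means: arrays of shape (K, H, W); weights sum to 1 over K.
    Returns the mixture mean and the depth of the highest-weight component.
    """
    # Mixture mean: a weighted blend, which creates in-between depths at edges.
    expected = (weights * means).sum(axis=0)
    # Assumed read-out: pick the most likely component at each pixel.
    idx = weights.argmax(axis=0)
    selected = np.take_along_axis(means, idx[None], axis=0)[0]
    return expected, selected

# Toy 1x5 scanline crossing an edge: foreground at 5 m, background at 20 m.
means = np.stack([np.full((1, 5), 5.0), np.full((1, 5), 20.0)])
w_fg = np.array([[1.0, 0.9, 0.6, 0.1, 0.0]])  # foreground weight per pixel
weights = np.stack([w_fg, 1.0 - w_fg])

expected, selected = mixture_depth(weights, means)
print(expected)  # blended depths, with intermediate values near the edge
print(selected)  # only 5 m or 20 m, so the discontinuity stays sharp
```

In this toy case the uncertainty near the edge shows up in the weights rather than in the depth values themselves, which is the shift the abstract describes. How the actual method selects or aggregates components may differ.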