Scale-Invariant Monocular Depth Estimation via SSI Depth
This work addresses the challenge of generalizable depth estimation for real-world computational photography, though it appears incremental as it builds on existing SSI methods.
The paper tackles the problem of scale-invariant monocular depth estimation by leveraging shift-and-scale-invariant inputs and a sparse ordinal loss to improve detail generation and generalization, achieving high performance in zero-shot evaluation for computational photography applications.
Existing methods for scale-invariant monocular depth estimation (SI MDE) often struggle because the task is complex and the available datasets are limited and lack diversity, which hinders generalization to real-world scenarios. In contrast, shift-and-scale-invariant (SSI) depth estimation simplifies the task and enables training on abundant stereo datasets, and as a result achieves high performance. We present a novel approach that leverages SSI inputs to enhance SI depth estimation, streamlining the network's role and facilitating in-the-wild generalization while using only a synthetic dataset for training. Emphasizing the generation of high-resolution details, we introduce a novel sparse ordinal loss that substantially improves detail generation in SSI MDE, addressing critical limitations of existing approaches. Through in-the-wild qualitative examples and zero-shot evaluation, we substantiate the practical utility of our approach in computational photography applications, showcasing its ability to generate highly detailed SI depth maps and to generalize across diverse scenarios.
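To make the SI versus SSI distinction concrete, here is a minimal NumPy sketch of the two ambiguities. It is not the paper's code: the function names `align_scale` and `align_scale_shift` and the toy data are illustrative assumptions. An SI prediction matches the reference once one unknown scale is fitted. An SSI prediction also needs an unknown shift. SSI is often defined on inverse depth, but the sketch treats the values generically.

```python
import numpy as np

def align_scale(pred, target):
    """Fit the least-squares scale s minimizing ||s * pred - target||^2."""
    s = np.sum(pred * target) / np.sum(pred * pred)
    return s * pred

def align_scale_shift(pred, target):
    """Fit the least-squares scale s and shift t minimizing ||s * pred + t - target||^2."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)
    return s * pred + t

# Toy reference map (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
target = rng.uniform(1.0, 10.0, size=(4, 4))

si_pred = 0.5 * target         # correct up to an unknown scale
ssi_pred = 0.5 * target + 2.0  # correct up to an unknown scale and shift

si_ok = np.allclose(align_scale(si_pred, target), target)
ssi_scale_only_ok = np.allclose(align_scale(ssi_pred, target), target)
ssi_ok = np.allclose(align_scale_shift(ssi_pred, target), target)
```

Under these assumptions, scale-only alignment recovers the SI prediction but not the SSI one, which also needs the shift term. The extra unknown shift is what makes SSI estimation the simpler task, since the network has less to determine.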