CV · Mar 20, 2024

DepthFM: Fast Monocular Depth Estimation with Flow Matching

arXiv:2403.13788v2 · 91 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses monocular depth estimation for computer vision applications, offering an incremental improvement in sampling efficiency and training-data requirements.

The paper tackles the problem of monocular depth estimation by proposing a flow matching approach that improves sampling efficiency and reduces data dependency, achieving competitive zero-shot performance on standard benchmarks.

Current discriminative depth estimation methods often produce blurry artifacts, while generative approaches suffer from slow sampling due to curvatures in the noise-to-depth transport. Our method addresses these challenges by framing depth estimation as a direct transport between image and depth distributions. We are the first to explore flow matching in this field, and we demonstrate that its interpolation trajectories enhance both training and sampling efficiency while preserving high performance. While generative models typically require extensive training data, we mitigate this dependency by integrating external knowledge from a pre-trained image diffusion model, enabling effective transfer even across differing objectives. To further boost our model performance, we employ synthetic data and utilize image-depth pairs generated by a discriminative model on an in-the-wild image dataset. As a generative model, our model can reliably estimate depth confidence, which provides an additional advantage. Our approach achieves competitive zero-shot performance on standard benchmarks of complex natural scenes while improving sampling efficiency and only requiring minimal synthetic data for training.
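The idea of a direct transport between image and depth can be illustrated with a minimal flow matching sketch. It assumes the common straight-line formulation, where the path is x_t = (1 - t) * x0 + t * x1 and the regression target is the constant velocity x1 - x0. The toy vectors and all function names below are hypothetical, and the constant "oracle" velocity stands in for a trained network; the paper's actual model, latent space, and training details are not reproduced here.

```python
def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path, d x_t / dt = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the target."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, target)) / len(target)


def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-ins for an image representation (start) and a depth map (end).
image_latent = [0.2, -1.0, 0.5]
depth_latent = [1.0, 0.0, -0.5]

# Hypothetical oracle: returns the exact straight-path velocity.
# A trained network would only approximate this.
true_v = target_velocity(image_latent, depth_latent)


def oracle_velocity(x, t):
    return true_v


midpoint = interpolate(image_latent, depth_latent, 0.5)
one_step = euler_sample(image_latent, oracle_velocity, num_steps=1)
four_steps = euler_sample(image_latent, oracle_velocity, num_steps=4)
perfect_loss = flow_matching_loss(true_v, image_latent, depth_latent)
```

Because the velocity along a straight path does not change with t, a single Euler step with the exact velocity already lands on the endpoint, which is the intuition behind few-step sampling. A learned velocity field is only approximately straight, so in practice more steps may still help.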

Code Implementations: 1 repo
