CV · Mar 17

Iris: Bringing Real-World Priors into Diffusion Model for Monocular Depth Estimation

arXiv:2603.16340 · 73.7 · h-index: 15
AI Analysis

This work addresses accurate depth estimation from single images for applications such as robotics and AR/VR. It is incremental in that it builds on existing diffusion-based methods.

The paper tackles monocular depth estimation by integrating real-world priors into a diffusion model. The resulting deterministic framework, Iris, preserves fine details and generalizes strongly from synthetic to real scenes with limited training data, and the authors report significant improvements in performance.

In this paper, we propose Iris, a deterministic framework for Monocular Depth Estimation (MDE) that integrates real-world priors into the diffusion model. Conventional feed-forward methods rely on massive training data, yet still miss details. Previous diffusion-based methods leverage rich generative priors yet struggle with synthetic-to-real domain transfer. Iris, in contrast, preserves fine details, generalizes strongly from synthetic to real scenes, and remains efficient with limited training data. To this end, we introduce a two-stage Priors-to-Geometry Deterministic (PGD) schedule: the prior stage uses Spectral-Gated Distillation (SGD) to transfer low-frequency real priors while leaving high-frequency details unconstrained, and the geometry stage applies Spectral-Gated Consistency (SGC) to enforce high-frequency fidelity while refining with synthetic ground truth. The two stages share weights and are executed with a high-to-low timestep schedule. Extensive experimental results confirm that Iris achieves significant improvements in MDE performance with strong in-the-wild generalization.
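
The abstract does not say how the spectral gate is built, so the following is only a rough sketch of the general idea of frequency-gated losses, not the authors' implementation. It assumes a hard radial low-pass mask in the 2D Fourier domain and a plain mean-squared error. The function names (`radial_lowpass_mask`, `split_bands`, `low_band_distill_loss`, `high_band_consistency_loss`) and the `cutoff` parameter are hypothetical and chosen for illustration.

```python
import numpy as np

def radial_lowpass_mask(height, width, cutoff=0.1):
    """Binary mask keeping frequencies with radius <= cutoff (cycles/pixel).

    Assumption: a hard radial gate; the paper's actual gate may differ.
    """
    fy = np.fft.fftfreq(height)[:, None]  # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(width)[None, :]   # horizontal frequencies, shape (1, W)
    return (np.sqrt(fx ** 2 + fy ** 2) <= cutoff).astype(np.float64)

def split_bands(image, cutoff=0.1):
    """Split a 2D map into low- and high-frequency parts that sum to the input."""
    mask = radial_lowpass_mask(*image.shape, cutoff=cutoff)
    spectrum = np.fft.fft2(image)
    low = np.fft.ifft2(spectrum * mask).real  # mask is symmetric, so result is real
    high = image - low
    return low, high

def low_band_distill_loss(student, teacher, cutoff=0.1):
    """MSE on the low band only; high frequencies are left unconstrained."""
    s_low, _ = split_bands(student, cutoff)
    t_low, _ = split_bands(teacher, cutoff)
    return float(np.mean((s_low - t_low) ** 2))

def high_band_consistency_loss(prediction, reference, cutoff=0.1):
    """MSE on the high band only; penalizes differences in fine detail."""
    _, p_high = split_bands(prediction, cutoff)
    _, r_high = split_bands(reference, cutoff)
    return float(np.mean((p_high - r_high) ** 2))
```

Under these assumptions, a student map that differs from the teacher only by a pixel-level checkerboard incurs essentially zero low-band loss, while the high-band loss picks up the full difference. This mirrors the division of labor the abstract describes between the prior stage and the geometry stage.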

