CV | Aug 4, 2025

Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images

arXiv:2508.02323v1 | 1 citation | h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses the reliance of volumetric reconstruction methods on expensive 3D ground truth or multi-view supervision, a practical obstacle for applications like autonomous driving and robotics. The overall approach is novel, but its individual technical elements are incremental.

The paper tackles 3D scene reconstruction from a single image by distilling synthetic geometry, generated with pre-trained 2D diffusion and depth prediction models, into a feed-forward model. On the KITTI-360 and Waymo datasets, it matches or outperforms state-of-the-art baselines that use multi-view supervision.

Volumetric scene reconstruction from a single image is crucial for a broad range of applications like autonomous driving and robotics. Recent volumetric reconstruction methods achieve impressive results, but generally require expensive 3D ground truth or multi-view supervision. We propose to leverage pre-trained 2D diffusion models and depth prediction models to generate synthetic scene geometry from a single image. This can then be used to distill a feed-forward scene reconstruction model. Our experiments on the challenging KITTI-360 and Waymo datasets demonstrate that our method matches or outperforms state-of-the-art baselines that use multi-view supervision, and offers unique advantages, for example regarding dynamic scenes.
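
The distillation step described in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's architecture or loss: it assumes a single camera ray, a hypothetical `teacher_occupancy` function standing in for the geometry produced by the diffusion and depth models, and a two-parameter logistic "student" trained with binary cross-entropy on the teacher's soft occupancy targets. All names and values (`teacher_occupancy`, `distill_student`, the surface depth of 4.2, the softness of 1.0) are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def teacher_occupancy(surface_depth, sample_depths, softness=1.0):
    # Hypothetical stand-in for the teacher: soft occupancy along one ray,
    # low in front of the pseudo surface and high behind it.
    return [sigmoid((z - surface_depth) / softness) for z in sample_depths]


def normalize(z, center=5.0, scale=5.0):
    # Map depths to roughly [-1, 1] so plain gradient descent is well behaved.
    return (z - center) / scale


def distill_student(sample_depths, targets, steps=5000, lr=1.0):
    # Student: occupancy = sigmoid(w * x + b), fitted with binary
    # cross-entropy against the teacher's soft targets.
    xs = [normalize(z) for z in sample_depths]
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, targets):
            err = sigmoid(w * x + b) - t  # gradient of BCE w.r.t. the logit
            grad_w += err * x / n
            grad_b += err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def student_surface_depth(w, b, center=5.0, scale=5.0):
    # Depth at which the student's occupancy crosses 0.5.
    return center + scale * (-b / w)


sample_depths = [0.5 * i for i in range(1, 21)]  # samples at 0.5, 1.0, ..., 10.0
targets = teacher_occupancy(4.2, sample_depths)  # teacher pseudo-labels
w, b = distill_student(sample_depths, targets)
recovered_depth = student_surface_depth(w, b)
```

In this toy setting the student can represent the teacher exactly, so `recovered_depth` should land close to the teacher's surface depth. The point is only the training signal: the student never sees ground-truth geometry, only targets produced by another model.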

