CV · Mar 11, 2025

CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction

arXiv:2503.08005v2 · 2 citations · h-index: 8
AI Analysis

This work addresses a critical bottleneck in 3D reconstruction for computer vision applications, offering an incremental improvement over existing Large Reconstruction Models.

The paper tackles the problem of 3D object reconstruction from single-view images by addressing inconsistencies in multi-view images generated by 2D diffusion models, which degrade reconstruction quality. The proposed CDI3D framework integrates view interpolation to enhance consistency, significantly outperforming previous state-of-the-art methods on benchmarks with improved texture fidelity and geometric accuracy.

3D object reconstruction from a single-view image is a fundamental task in computer vision with wide-ranging applications. Recent advancements in Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content. However, challenges remain as 2D diffusion models often struggle to produce dense images with strong multi-view consistency, and LRMs tend to amplify these inconsistencies during the 3D reconstruction process. Addressing these issues is critical for achieving high-quality and efficient 3D reconstruction. In this paper, we present CDI3D, a feed-forward framework designed for efficient, high-quality image-to-3D generation with view interpolation. To tackle the aforementioned challenges, we propose to integrate 2D diffusion-based view interpolation into the LRM pipeline to enhance the quality and consistency of the generated mesh. Specifically, our approach introduces a Dense View Interpolation (DVI) module, which synthesizes interpolated images between main views generated by the 2D diffusion model, effectively densifying the input views with better multi-view consistency. We also design a tilt camera pose trajectory to capture views with different elevations and perspectives. Subsequently, we employ a tri-plane-based mesh reconstruction strategy to extract robust tokens from these interpolated and original views, enabling the generation of high-quality 3D meshes with superior texture and geometry. Extensive experiments demonstrate that our method significantly outperforms previous state-of-the-art approaches across various benchmarks, producing 3D content with enhanced texture fidelity and geometric accuracy.
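To make the idea of dense views along a tilt camera trajectory more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the number of main views, the number of interpolated views, the camera radius, or the elevation schedule, so the function name `tilt_trajectory`, all of its parameters, and the sinusoidal elevation profile are assumptions chosen purely for illustration. The sketch only computes camera positions; the actual DVI module synthesizes the in-between images with a 2D diffusion model, which is not modelled here.

```python
import math


def tilt_trajectory(num_main=4, interp_per_gap=3, radius=2.0, max_elev_deg=20.0):
    """Hypothetical tilted orbit around an object at the origin.

    Returns a list of (azimuth_deg, elevation_deg, (x, y, z), is_main) tuples.
    Main views sit at evenly spaced azimuths; `interp_per_gap` extra views
    are placed between each pair of neighbouring main views.
    """
    total = num_main * (interp_per_gap + 1)
    views = []
    for i in range(total):
        azim = 360.0 * i / total
        # Assumed tilt: elevation varies sinusoidally with azimuth, so the
        # orbit rises on one side of the object and dips on the other.
        elev = max_elev_deg * math.sin(math.radians(azim))
        a, e = math.radians(azim), math.radians(elev)
        # Spherical-to-Cartesian conversion with z as the up axis.
        pos = (
            radius * math.cos(e) * math.cos(a),
            radius * math.cos(e) * math.sin(a),
            radius * math.sin(e),
        )
        is_main = i % (interp_per_gap + 1) == 0
        views.append((azim, elev, pos, is_main))
    return views


if __name__ == "__main__":
    for azim, elev, pos, is_main in tilt_trajectory():
        tag = "main" if is_main else "interp"
        print(f"{tag:6s} azim={azim:6.1f} elev={elev:6.2f} pos={pos}")
```

With the default (assumed) settings this yields 16 camera positions, 4 of which are main views and 12 of which are interpolated positions, all at the same distance from the object but at varying elevations.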

