CV · Mar 18

World Reconstruction From Inconsistent Views

arXiv:2603.16736 · 63.13 citations · h-index: 7
Predicted impact: top 53% in CV · last 90 days
Originality: Highly original
AI Analysis

This paper addresses the challenge of building 3D-consistent environments from inconsistent video frames, with applications in 3D reconstruction and world generation.

The paper tackles 3D inconsistency across frames generated by video diffusion models, which hinders 3D world reconstruction. It proposes a method that non-rigidly aligns the frames to produce sharp pointcloud reconstructions, yielding higher-quality 3D scenes than the baselines.

Video diffusion models generate high-quality and diverse worlds; however, individual frames often lack 3D consistency across the output sequence, which makes the reconstruction of 3D worlds difficult. To this end, we propose a new method that handles these inconsistencies by non-rigidly aligning the video frames into a globally-consistent coordinate frame that produces sharp and detailed pointcloud reconstructions. First, a geometric foundation model lifts each frame into a pixel-wise 3D pointcloud, which contains unaligned surfaces due to these inconsistencies. We then propose a tailored non-rigid iterative frame-to-model ICP to obtain an initial alignment across all frames, followed by a global optimization that further sharpens the pointcloud. Finally, we leverage this pointcloud as initialization for 3D reconstruction and propose a novel inverse deformation rendering loss to create high quality and explorable 3D environments from inconsistent views. We demonstrate that our 3D scenes achieve higher quality than baselines, effectively turning video models into 3D-consistent world generators.
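To make the frame-to-model idea in the abstract concrete, here is a minimal sketch of classic rigid point-to-point ICP, in which each new frame is aligned to a growing model pointcloud and then merged into it. This is a deliberately simplified stand-in: the paper's alignment is non-rigid and is followed by a global optimization, neither of which is reproduced here. The function names (`kabsch`, `nearest_neighbors`, `icp_frame_to_model`, `fuse_frames`) are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst, both (N, 3)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def nearest_neighbors(src, model):
    """Brute-force lookup of the closest model point for each source point."""
    d2 = ((src[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
    return model[d2.argmin(axis=1)]


def icp_frame_to_model(frame, model, iters=20):
    """Rigidly align one frame's points to the model (assumption: rigid only)."""
    aligned = frame.copy()
    for _ in range(iters):
        matches = nearest_neighbors(aligned, model)
        R, t = kabsch(aligned, matches)
        aligned = aligned @ R.T + t
    return aligned


def fuse_frames(frames, iters=20):
    """Frame-to-model fusion: align each frame to the model, then merge it in."""
    model = frames[0].copy()
    for frame in frames[1:]:
        aligned = icp_frame_to_model(frame, model, iters)
        model = np.vstack([model, aligned])
    return model
```

In a rigid setting like this, one rotation and translation per frame cannot absorb per-surface inconsistencies, which is the gap the abstract's non-rigid alignment is meant to close. The brute-force neighbor search is quadratic in the number of points and is only suitable for small examples.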

