GR AI CV · Apr 1, 2025

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan

arXiv:2504.01016v1 · 17.834 citations · h-index: 13

Originality: Highly original

AI Analysis

The paper addresses a limitation of existing methods: they do not provide metrically grounded geometry for reconstruction and other downstream tasks on open-world videos. The contribution is incremental in that it builds on prior video depth estimation work.

The paper tackles the problem of achieving geometric fidelity in video depth estimation for open-world videos. It proposes GeometryCrafter, which recovers high-fidelity, temporally coherent point map sequences, enabling accurate 3D/4D reconstruction and camera parameter estimation. Evaluations show state-of-the-art 3D accuracy, temporal consistency, and generalization.

Despite remarkable advancements in video depth estimation, existing methods exhibit inherent limitations in achieving geometric fidelity through the affine-invariant predictions, limiting their applicability in reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers high-fidelity point map sequences with temporal coherence from open-world videos, enabling accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. At the core of our approach lies a point map Variational Autoencoder (VAE) that learns a latent space agnostic to video latent distributions for effective point map encoding and decoding. Leveraging the VAE, we train a video diffusion model to model the distribution of point map sequences conditioned on the input videos. Extensive evaluations on diverse datasets demonstrate that GeometryCrafter achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.
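To illustrate why affine-invariant predictions limit metrically grounded use, the following minimal sketch (not taken from the paper) recovers the unknown scale and shift of a depth prediction by least squares against reference depths, then back-projects a pixel through a pinhole camera. The function names `align_scale_shift` and `unproject`, the sample depths, and the intrinsics are all hypothetical and chosen only for illustration.

```python
def align_scale_shift(pred, ref):
    """Least-squares scale s and shift t minimizing sum((s * pred + t - ref)^2).

    Affine-invariant depth is only defined up to such an (s, t) pair,
    so reference depths are needed before it can be used metrically.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undefined")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    s = cov / var_p
    t = mean_r - s * mean_p
    return s, t


def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with the given depth through a pinhole camera."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


# Hypothetical data: reference depths and an affine-invariant prediction
# related by ref = 2 * pred + 1.
ref_depth = [2.0, 4.0, 6.0]
pred_depth = [0.5, 1.5, 2.5]

s, t = align_scale_shift(pred_depth, ref_depth)
aligned = [s * p + t for p in pred_depth]

# Assumed intrinsics (fx, fy, cx, cy), for illustration only.
intrinsics = (100.0, 100.0, 320.0, 240.0)
point_aligned = unproject(420, 240, aligned[0], *intrinsics)
point_raw = unproject(420, 240, pred_depth[0], *intrinsics)
```

In this sketch the raw prediction places the pixel at a different 3D location than the aligned one. Because the unknown shift is not a uniform rescaling, the error distorts the shape of the reconstruction rather than merely resizing it. A point map stores the 3D coordinates per pixel directly, which avoids the alignment step.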

