CV · Apr 15, 2025

Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting

arXiv:2504.11092v2 · 7 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses 4D reconstruction from limited monocular video data, with applications in computer vision and graphics. It is an incremental advance that integrates geometric and generative priors.

The paper tackles the reconstruction of 4D dynamic scenes from monocular videos. It introduces Vivid4D, which uses video inpainting to synthesize multi-view videos from the monocular input and thereby improves scene reconstruction and completion.

Reconstructing 4D dynamic scenes from casually captured monocular videos is valuable but highly challenging, as each timestamp is observed from a single viewpoint. We introduce Vivid4D, a novel approach that enhances 4D monocular video synthesis by augmenting observation views - synthesizing multi-view videos from a monocular input. Unlike existing methods that either solely leverage geometric priors for supervision or use generative priors while overlooking geometry, we integrate both. This reformulates view augmentation as a video inpainting task, where observed views are warped into new viewpoints based on monocular depth priors. To achieve this, we train a video inpainting model on unposed web videos with synthetically generated masks that mimic warping occlusions, ensuring spatially and temporally consistent completion of missing regions. To further mitigate inaccuracies in monocular depth priors, we introduce an iterative view augmentation strategy and a robust reconstruction loss. Experiments demonstrate that our method effectively improves monocular 4D scene reconstruction and completion. See our project page: https://xdimlab.github.io/Vivid4D/.
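
The warping step described in the abstract (observed views warped into new viewpoints using monocular depth, leaving holes for the inpainting model to fill) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `warp_to_novel_view`, the pinhole camera model, the shared intrinsics `K`, and the nearest-pixel splatting are all assumptions made for illustration.

```python
import numpy as np

def warp_to_novel_view(image, depth, K, R, t):
    """Forward-warp one frame into a new camera using per-pixel depth.

    Hypothetical helper, assuming a pinhole camera with intrinsics K (3x3)
    shared by both views. R (3x3) and t (3,) map source-camera coordinates
    to target-camera coordinates. Returns the warped image and a boolean
    mask that is True where no source pixel landed (regions to inpaint).
    """
    H, W = depth.shape
    n = H * W
    # Homogeneous pixel grid, shape (3, H*W), in row-major order.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(n)]).astype(np.float64)

    # Back-project to 3D in the source frame, then move to the target frame.
    pts_src = (np.linalg.inv(K) @ pix) * depth.ravel()
    pts_tgt = R @ pts_src + np.asarray(t, dtype=np.float64).reshape(3, 1)
    z = pts_tgt[2]
    proj = K @ pts_tgt

    # Project only points in front of the target camera.
    in_front = z > 1e-6
    u_t = np.full(n, -1, dtype=np.int64)
    v_t = np.full(n, -1, dtype=np.int64)
    u_t[in_front] = np.rint(proj[0, in_front] / z[in_front]).astype(np.int64)
    v_t[in_front] = np.rint(proj[1, in_front] / z[in_front]).astype(np.int64)
    inside = in_front & (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)

    # Z-buffer: for each target pixel keep the nearest source point.
    idx = np.flatnonzero(inside)
    target = v_t[idx] * W + u_t[idx]
    zbuf = np.full(n, np.inf)
    np.minimum.at(zbuf, target, z[idx])
    nearest = z[idx] == zbuf[target]

    channels = image.shape[-1]
    warped = np.zeros_like(image)
    filled = np.zeros(n, dtype=bool)
    warped.reshape(-1, channels)[target[nearest]] = (
        image.reshape(-1, channels)[idx[nearest]]
    )
    filled[target[nearest]] = True
    return warped, ~filled.reshape(H, W)
```

In this sketch the returned mask plays the role of the warping occlusions mentioned above: it marks the pixels a video inpainting model would be asked to complete. Errors in the monocular depth shift where pixels land, which is the kind of inaccuracy the iterative augmentation strategy and robust loss are meant to mitigate.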

