CV · Apr 4, 2024

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

arXiv:2404.03736v2 · 62 citations · h-index: 13 · ECCV
Originality: Incremental advance
AI Analysis

This work addresses the problem of video-to-4D generation for applications in computer vision and graphics, offering an incremental improvement over prior approaches.

The paper tackles the challenge of generating dynamic 3D objects from single-view videos by proposing SC4D, a sparse-controlled framework that decouples motion and appearance, achieving superior quality and efficiency compared to existing methods.

Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as a dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions, due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, which decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate the shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by SC4D's disentangled modeling of motion and appearance, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions.

Code Implementations: 1 repo
