Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video
This work addresses the challenge of synthesizing authentic 4D scenes for applications in computer vision and graphics. It offers a method to enhance physical realism in deformable 3D reconstruction, though the contribution is incremental because it builds on existing video-conditioned and generative model techniques.
The paper tackles the problem of generating physically consistent 4D content from a single video by reanimating deformable 3D scenes and using the original video to correct synthetic motion artifacts. It demonstrates improved geometry consistency, motion quality, and 3D tracking performance on the DAVIS and PointOdyssey datasets.
Creating deformable 3D content has gained increasing attention with the rise of text-to-image and image-to-video generative models. While these models provide rich semantic priors for appearance, they struggle to capture the physical realism and motion dynamics needed for authentic 4D scene synthesis. In contrast, real-world videos provide physically grounded geometry and articulation cues that are difficult to hallucinate. This raises a question: \textit{Can we generate physically consistent 4D content by leveraging the motion priors of real-world video?} In this work, we explore the task of reanimating deformable 3D scenes from a single video, using the original sequence as a supervisory signal to correct artifacts from synthetic motion. We introduce \textbf{Restage4D}, a geometry-preserving pipeline for video-conditioned 4D restaging. Our approach uses a video-rewinding training strategy to temporally bridge a real base video and a synthetic driving video via a shared motion representation. We further incorporate an occlusion-aware rigidity loss and a disocclusion backtracing mechanism to improve structural and geometric consistency under challenging motion. We validate Restage4D on DAVIS and PointOdyssey, demonstrating improved geometry consistency, motion quality, and 3D tracking performance. Our method not only preserves deformable structure under novel motion but also automatically corrects errors introduced by generative models, revealing the potential of video priors for the 4D restaging task. Source code and trained models will be released.
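The abstract names an occlusion-aware rigidity loss without giving its form. As a rough illustration of the general idea only, the sketch below penalizes changes in distance between neighboring 3D points across two time steps and ignores any pair in which a point is occluded at either step. The function name, the neighbor-index layout, the boolean visibility masks, and the L1 penalty are all assumptions made for this example; they are not taken from Restage4D and may differ from the paper's actual formulation.

```python
import numpy as np


def occlusion_aware_rigidity_loss(pts_a, pts_b, neighbors, visible_a, visible_b):
    """Hypothetical local-rigidity penalty between two time steps.

    pts_a, pts_b : (N, 3) float arrays, the same N points at two times.
    neighbors    : (N, K) int array, indices of K neighbors per point.
    visible_a/b  : (N,) bool arrays, True where a point is unoccluded.
    Returns the mean absolute change in neighbor distance over pairs
    whose two endpoints are visible at both time steps.
    """
    # Distance from each point to each of its K neighbors, per time step.
    d_a = np.linalg.norm(pts_a[:, None, :] - pts_a[neighbors], axis=-1)
    d_b = np.linalg.norm(pts_b[:, None, :] - pts_b[neighbors], axis=-1)

    # A point counts only if it is visible at both time steps.
    vis = visible_a & visible_b
    # A pair counts only if both of its endpoints count.
    pair_mask = vis[:, None] & vis[neighbors]

    n_pairs = pair_mask.sum()
    if n_pairs == 0:
        return 0.0  # nothing observable, so nothing to penalize
    return float((np.abs(d_a - d_b) * pair_mask).sum() / n_pairs)


def rotation_z(theta):
    """3x3 rotation about the z axis, used to build a rigid test motion."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

Under these assumptions, a purely rigid motion (rotation plus translation) leaves all neighbor distances unchanged and yields a loss of approximately zero, while a point that drifts away from its neighbors contributes only if it is marked visible at both steps. Masking by visibility is one simple way to avoid penalizing geometry whose position is unobserved.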