CV | Sep 29, 2025

PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos

Ting-Hsuan Liao, Haowen Liu, Yiran Xu, Songwei Ge, Gengshan Yang, Jia-Bin Huang

arXiv:2509.25183v1 | 10.25 citations | h-index: 18 | SIGGRAPH Asia

Originality: Incremental advance

AI Analysis

The paper addresses dynamic scene understanding and 3D content creation from casual videos. The advance is incremental: it builds on existing methods and extends them to long sequences with substantial deformation and limited view coverage.

The paper tackles the reconstruction of deformable 3D objects from unposed monocular videos. It produces high-fidelity, articulated 3D representations, and the method is reported to be robust and to generalize well across challenging scenarios.

We present PAD3R, a method for reconstructing deformable 3D objects from casually captured, unposed monocular videos. Unlike existing approaches, PAD3R handles long video sequences featuring substantial object deformation, large-scale camera movement, and limited view coverage that typically challenge conventional systems. At its core, our approach trains a personalized, object-centric pose estimator, supervised by a pre-trained image-to-3D model. This estimator guides the optimization of a deformable 3D Gaussian representation. The optimization is further regularized by long-term 2D point tracking over the entire input video. By combining generative priors and differentiable rendering, PAD3R reconstructs high-fidelity, articulated 3D representations of objects in a category-agnostic way. Extensive qualitative and quantitative results show that PAD3R is robust and generalizes well across challenging scenarios, highlighting its potential for dynamic scene understanding and 3D content creation.
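
The abstract states that the optimization is regularized by long-term 2D point tracking but gives no formula. As a rough illustration only, one common way to express such a regularizer is a reprojection error: project the deformed 3D points into each frame and penalize their pixel distance to the tracked 2D positions. The sketch below is not the authors' implementation. The function names (`project_points`, `tracking_loss`), the pinhole camera model, the squared-error form, and the one-to-one pairing of 3D points with tracks are all assumptions made for this example.

```python
import numpy as np


def project_points(points_world, R, t, focal, principal_point):
    """Project (N, 3) world points to (N, 2) pixels with a pinhole camera.

    Assumed convention (not from the paper): x_cam = R @ x_world + t.
    """
    points_cam = points_world @ R.T + t
    depth = points_cam[:, 2:3]
    return focal * points_cam[:, :2] / depth + principal_point


def tracking_loss(centers, tracks_2d, visible, R, t, focal, principal_point):
    """Mean squared pixel error between projected points and 2D tracks.

    centers:   (T, N, 3) hypothetical deformed 3D point positions per frame
    tracks_2d: (T, N, 2) tracked pixel positions from a 2D point tracker
    visible:   (T, N) boolean mask, True where the track is visible
    R, t:      (T, 3, 3) and (T, 3) per-frame camera rotation and translation
    """
    total, count = 0.0, 0
    for f in range(centers.shape[0]):
        uv = project_points(centers[f], R[f], t[f], focal, principal_point)
        sq_err = np.sum((uv - tracks_2d[f]) ** 2, axis=1)
        # Only visible tracks contribute to the loss.
        total += float(np.sum(sq_err[visible[f]]))
        count += int(np.sum(visible[f]))
    return total / max(count, 1)
```

In an actual pipeline this term would be written in a differentiable framework and added, with a weight, to the rendering loss. The NumPy version here only shows the quantity being measured.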
