CV, Sep 29, 2025

PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos

arXiv:2509.25183v1, 4 citations, h-index: 18, SIGGRAPH Asia
Originality: Incremental advance
AI Analysis

The paper addresses dynamic scene understanding and 3D content creation from casual videos. The advance is incremental: it builds on existing methods and extends them to long sequences with substantial deformation and limited view coverage.

The paper tackles the reconstruction of deformable 3D objects from unposed monocular videos. It produces high-fidelity, articulated 3D representations, and the authors report that the method is robust and generalizes well across challenging scenarios.

We present PAD3R, a method for reconstructing deformable 3D objects from casually captured, unposed monocular videos. Unlike existing approaches, PAD3R handles long video sequences featuring substantial object deformation, large-scale camera movement, and limited view coverage that typically challenge conventional systems. At its core, our approach trains a personalized, object-centric pose estimator, supervised by a pre-trained image-to-3D model. This guides the optimization of a deformable 3D Gaussian representation. The optimization is further regularized by long-term 2D point tracking over the entire input video. By combining generative priors and differentiable rendering, PAD3R reconstructs high-fidelity, articulated 3D representations of objects in a category-agnostic way. Extensive qualitative and quantitative results show that PAD3R is robust and generalizes well across challenging scenarios, highlighting its potential for dynamic scene understanding and 3D content creation.
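
The abstract says the optimization is regularized by long-term 2D point tracking, but it does not give the loss. One common way such a term is written is as a reprojection error: project the deformed 3D points into each frame and compare them with the tracked 2D positions. The sketch below illustrates that general idea only and is not PAD3R's actual formulation. The pinhole camera model, the function names `project_points` and `tracking_loss`, and the array layouts are all assumptions made for illustration.

```python
import numpy as np

def project_points(points_world, R, t, K):
    """Pinhole projection of (N, 3) world points to (N, 2) pixel coordinates.

    Assumes a standard pinhole model (hypothetical choice, not from the paper).
    """
    cam = points_world @ R.T + t      # world frame -> camera frame
    uvw = cam @ K.T                   # apply the 3x3 intrinsics
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

def tracking_loss(points_per_frame, poses, K, tracks_2d, visible):
    """Mean squared reprojection error against 2D point tracks.

    points_per_frame: (T, N, 3) deformed 3D points for each frame
    poses:            list of T (R, t) pairs, R is (3, 3), t is (3,)
    tracks_2d:        (T, N, 2) tracked pixel positions
    visible:          (T, N) boolean mask, False where a track is occluded
    """
    total, count = 0.0, 0
    for f, (R, t) in enumerate(poses):
        pred = project_points(points_per_frame[f], R, t, K)
        mask = visible[f]
        # Only visible tracks contribute to the error.
        sq_err = np.sum((pred[mask] - tracks_2d[f][mask]) ** 2, axis=1)
        total += float(sq_err.sum())
        count += int(mask.sum())
    return total / max(count, 1)
```

In a real pipeline this term would be computed in a differentiable framework and added to the rendering loss with a weight; NumPy is used here only to keep the arithmetic easy to follow.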

