GR · AI · CV · LG · Jun 11, 2025

DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos

arXiv:2506.09997v1 · 9 citations · h-index: 25
Originality: Highly original
AI Analysis

This work addresses the challenge of creating digital replicas of moving objects in real time for applications such as robotics and AR/VR, and represents a novel advance beyond static scene reconstruction.

The paper tackles real-time 3D reconstruction of dynamic scenes from monocular videos. It introduces DGS-LRM, a feed-forward method that predicts deformable 3D Gaussian splats. DGS-LRM achieves reconstruction quality comparable to optimization-based methods and outperforms the state-of-the-art predictive dynamic reconstruction method on real-world examples.

We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms. To address these challenges, we introduce several key technical contributions: an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision; a per-pixel deformable 3D Gaussian representation that is easy to learn, supports high-quality dynamic view synthesis, and enables long-range 3D tracking; and a large transformer network that achieves real-time, generalizable dynamic scene reconstruction. Extensive qualitative and quantitative experiments demonstrate that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods, while significantly outperforming the state-of-the-art predictive dynamic reconstruction method on real-world examples. Its predicted physically grounded 3D deformation is accurate and can be readily adapted to long-range 3D tracking tasks, achieving performance on par with state-of-the-art monocular video 3D tracking methods.
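To make the idea of a per-pixel deformable Gaussian representation more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the names `Camera`, `PixelGaussian`, `unproject`, `position_at`, and `track` are hypothetical, and so are the pinhole camera model and the assumption that each Gaussian stores a reference-frame center plus one 3D displacement per timestep. The sketch shows how such a representation could double as a 3D tracker. Unprojecting a pixel along its camera ray at a predicted depth gives the Gaussian's center, and adding the predicted displacement for each timestep traces that point's trajectory.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera: intrinsics plus a camera-to-world pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: List[List[float]]  # 3x3 camera-to-world rotation, row-major
    t: Vec3               # camera center in world coordinates


@dataclass
class PixelGaussian:
    """Hypothetical per-pixel Gaussian with assumed per-timestep displacements."""
    center: Vec3   # world-space position at the reference frame
    scale: Vec3
    opacity: float
    color: Vec3
    offsets: List[Vec3] = field(default_factory=list)  # offsets[k]: displacement at timestep k


def unproject(cam: Camera, u: float, v: float, depth: float) -> Vec3:
    """Lift pixel (u, v) to world space at the given depth along its camera ray."""
    ray = ((u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy, 1.0)
    p_cam = [depth * r for r in ray]
    # World point = R * p_cam + t
    x, y, z = (
        sum(cam.R[i][j] * p_cam[j] for j in range(3)) + cam.t[i] for i in range(3)
    )
    return (x, y, z)


def position_at(g: PixelGaussian, k: int) -> Vec3:
    """Deformed center at timestep k: reference center plus displacement k."""
    dx, dy, dz = g.offsets[k]
    x, y, z = g.center
    return (x + dx, y + dy, z + dz)


def track(g: PixelGaussian) -> List[Vec3]:
    """A 3D track is the sequence of deformed centers over all timesteps."""
    return [position_at(g, k) for k in range(len(g.offsets))]


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cam = Camera(fx=100.0, fy=100.0, cx=50.0, cy=50.0, R=identity, t=(0.0, 0.0, 0.0))
    g = PixelGaussian(
        center=unproject(cam, 60.0, 50.0, 2.0),
        scale=(0.01, 0.01, 0.01),
        opacity=0.9,
        color=(1.0, 0.0, 0.0),
        offsets=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)],
    )
    print(track(g))  # a point drifting along +x over three timesteps
```

In this sketch, motion is stored as explicit per-timestep offsets, so reading a track is a simple lookup rather than a query into a learned deformation field. Whether DGS-LRM parameterizes motion in exactly this way is an assumption, not something the abstract states.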
