FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction
FLAG-4D addresses the challenge of capturing complex point motions and fine-grained dynamic details in 4D reconstruction from sparse input views. The contribution is incremental in that it builds on existing 3D Gaussian and optical flow techniques.
The paper tackles novel-view synthesis for dynamic scenes by reconstructing the 4D motion of 3D Gaussian primitives, and it reports higher-fidelity, more temporally coherent reconstructions with finer detail preservation than state-of-the-art methods.
We introduce FLAG-4D, a framework for novel-view synthesis of dynamic scenes that reconstructs how 3D Gaussian primitives evolve through space and time. Existing methods typically rely on a single Multilayer Perceptron (MLP) to model temporal deformations, and they often struggle to capture complex point motions and fine-grained dynamic details consistently over time, especially from sparse input views. FLAG-4D addresses this limitation with a dual-deformation network that warps a canonical set of 3D Gaussians over time into new positions and anisotropic shapes. The network consists of an Instantaneous Deformation Network (IDN) that models fine-grained, local deformations and a Global Motion Network (GMN) that captures long-range dynamics, with the two networks refined through mutual learning. To ensure that these deformations are both accurate and temporally smooth, FLAG-4D incorporates dense motion features from a pretrained optical flow backbone. We fuse these motion cues from adjacent time frames and use a deformation-guided attention mechanism to align the flow information with the current state of each evolving 3D Gaussian. Extensive experiments demonstrate that FLAG-4D achieves higher-fidelity and more temporally coherent reconstructions with finer detail preservation than state-of-the-art methods.
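To make the data flow described above concrete, the following is a minimal NumPy sketch of a local-global dual deformation with flow-guided attention. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify layer sizes, how the adjacent-frame flow features are fused, how the two branches are combined, or the mutual learning objective. All names (`deform`, `flow_attention`, the `params` keys), the averaging fusion, the additive combination of offsets, and the randomly initialized weights are hypothetical. Only Gaussian positions are deformed here; anisotropic shape (scale and rotation) and training are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out):
    """Random weights for a two-layer MLP (untrained; illustration only)."""
    return (rng.normal(0.0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0.0, 0.1, (d_hidden, d_out)), np.zeros(d_out))

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP with a ReLU hidden activation."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def flow_attention(queries, flow_feats):
    """Each Gaussian (one query row) attends over the flow tokens.

    Assumption: flow tokens serve as both keys and values.
    """
    scores = queries @ flow_feats.T / np.sqrt(queries.shape[-1])  # (N, M)
    return softmax(scores, axis=-1) @ flow_feats                  # (N, D)

def deform(canon_xyz, t, flow_prev, flow_next, params):
    """Warp canonical Gaussian centers to time t."""
    n = canon_xyz.shape[0]
    t_col = np.full((n, 1), t)
    x_in = np.concatenate([canon_xyz, t_col], axis=1)             # (N, 4)

    # Assumed fusion of adjacent-frame flow features: a plain average.
    flow = 0.5 * (flow_prev + flow_next)                          # (M, D)

    # Global branch (GMN stand-in): coarse, long-range offset.
    d_global = mlp(x_in, *params["gmn"])                          # (N, 3)

    # Deformation-guided queries: built from the current deformed state.
    state = np.concatenate([canon_xyz + d_global, t_col], axis=1)
    ctx = flow_attention(mlp(state, *params["query"]), flow)      # (N, D)

    # Local branch (IDN stand-in): residual conditioned on flow context.
    d_local = mlp(np.concatenate([x_in, ctx], axis=1), *params["idn"])
    return canon_xyz + d_global + d_local, d_global, d_local

# Toy usage: 5 Gaussians, 8 flow tokens with 16-dimensional features.
N, M, D, H = 5, 8, 16, 32
params = {
    "gmn": init_mlp(4, H, 3),
    "query": init_mlp(4, H, D),
    "idn": init_mlp(4 + D, H, 3),
}
canon_xyz = rng.normal(size=(N, 3))
flow_prev = rng.normal(size=(M, D))  # stand-in for flow-backbone features
flow_next = rng.normal(size=(M, D))
new_xyz, d_global, d_local = deform(canon_xyz, 0.5, flow_prev, flow_next, params)
```

In this sketch the global offset is computed first so that the attention queries reflect where each Gaussian currently sits, which is one plausible reading of "deformation-guided". A mutual learning term could, for example, be added as a consistency loss between the two branches during training, but the abstract does not state its form.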