CV, Mar 5

MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

Juntong Fang, Zequn Chen, Weiqi Zhang, Donglin Di, Xuancheng Zhang, Chengmin Yang, Yu-Shen Liu

arXiv:2603.05078v1

4 citations

Originality: Highly original

AI Analysis

This work provides an efficient feedforward approach to reconstructing dynamic 4D scenes from monocular videos. It is useful for applications that need dynamic scene understanding but cannot afford the computational cost of existing optimization-based methods.

This paper addresses the challenge of reconstructing dynamic 4D scenes from monocular videos, particularly when moving objects corrupt camera pose estimation. The proposed feedforward network, MoRe, efficiently recovers dynamic 3D scenes by disentangling dynamic motion from static structure using an attention-forcing strategy, achieving high-quality dynamic reconstructions with exceptional efficiency.

Reconstructing dynamic 4D scenes remains challenging due to the presence of moving objects that corrupt camera pose estimation. Existing optimization methods alleviate this issue with additional supervision, but they are mostly computationally expensive and impractical in real-time applications. To address these limitations, we propose MoRe, a feedforward 4D reconstruction network that efficiently recovers dynamic 3D scenes from monocular videos. Built upon a strong static reconstruction backbone, MoRe employs an attention-forcing strategy to disentangle dynamic motion from static structure. To further enhance robustness, we fine-tune the model on large-scale, diverse datasets encompassing both dynamic and static scenes. Moreover, our grouped causal attention captures temporal dependencies and adapts to varying token lengths across frames, ensuring temporally coherent geometry reconstruction. Extensive experiments on multiple benchmarks demonstrate that MoRe achieves high-quality dynamic reconstructions with exceptional efficiency.
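The abstract does not spell out how grouped causal attention is built. As a rough illustration only, one common way to get causal attention over groups of unequal size is a block-causal mask: every token in frame i may attend to all tokens in frames up to and including i. The sketch below assumes that reading; the function names `grouped_causal_mask` and `masked_attention` and the single-head NumPy formulation are hypothetical and not taken from the paper.

```python
import numpy as np

def grouped_causal_mask(tokens_per_frame):
    """Boolean mask; entry [q, k] is True when query token q may attend to key token k.

    Assumption (not from the paper): tokens are grouped by frame, and a token
    sees its whole frame plus all earlier frames.
    """
    # Frame index of every token, e.g. [3, 2] -> [0, 0, 0, 1, 1].
    frame_ids = np.repeat(np.arange(len(tokens_per_frame)), tokens_per_frame)
    # Query in frame i may attend to keys in frames j <= i.
    return frame_ids[:, None] >= frame_ids[None, :]

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed positions get -inf so they receive zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Frames with different token counts: 3 tokens, then 2, then 4.
tokens_per_frame = [3, 2, 4]
mask = grouped_causal_mask(tokens_per_frame)

rng = np.random.default_rng(0)
n, d = sum(tokens_per_frame), 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = masked_attention(q, k, v, mask)
```

Because the mask is derived from per-frame token counts, the same code handles any mix of frame sizes, and the outputs for earlier frames are unaffected by tokens from later frames.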
