CVJun 23, 2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

Ziqiao Ma, Xuweiyi Chen, Shoubin Yu, Sai Bi, Kai Zhang, Chen Ziwen, Sihan Xu, Jianing Yang, Zexiang Xu, Kalyan Sunkavalli, Mohit Bansal, Joyce Chai

arXiv:2506.18890v19 citationsh-index: 19

Originality Highly original

AI Analysis

This addresses the challenge of scalable 4D reconstruction for computer vision applications, though it appears incremental as it builds on prior 3D reconstruction methods by extending to time.

The paper tackles the problem of reconstructing 4D (space-time) representations of objects from limited views and timestamps, achieving accurate and efficient reconstruction with 24-frame sequences processed in under 1.5 seconds on a single A100 GPU.

Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at some times to any view at any time? We provide an affirmative answer with 4D-LRM, the first large-scale 4D reconstruction model that takes input from unconstrained views and timestamps and renders arbitrary novel view-time combinations. Unlike prior 4D approaches, e.g., optimization-based, geometry-based, or generative, that struggle with efficiency, generalization, or faithfulness, 4D-LRM learns a unified space-time representation and directly predicts per-pixel 4D Gaussian primitives from posed image tokens across time, enabling fast, high-quality rendering at, in principle, infinite frame rate. Our results demonstrate that scaling spatiotemporal pretraining enables accurate and efficient 4D reconstruction. We show that 4D-LRM generalizes to novel objects, interpolates across time, and handles diverse camera setups. It reconstructs 24-frame sequences in one forward pass with less than 1.5 seconds on a single A100 GPU.

View on arXiv PDF

Similar