NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
The work addresses the expensive or cumbersome data requirements of 4D modeling, making 4D modeling more accessible for computer vision and AI applications, though its gain in scalability appears incremental.
The paper tackles the scalability limitation in 4D world modeling by proposing NeoVerse, a versatile model that uses in-the-wild monocular videos for 4D reconstruction and novel-trajectory video generation, achieving state-of-the-art performance on standard benchmarks.
In this paper, we propose NeoVerse, a versatile 4D world model capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a common scalability limitation in current 4D world modeling methods, caused either by expensive, specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, NeoVerse is built on a core philosophy of keeping the full pipeline scalable to diverse in-the-wild monocular videos. Specifically, NeoVerse features pose-free feed-forward 4D reconstruction, online simulation of monocular degradation patterns, and other well-aligned techniques. These designs give NeoVerse versatility and the ability to generalize across domains. In addition, NeoVerse achieves state-of-the-art performance on standard reconstruction and generation benchmarks. Our project page is available at https://neoverse-4d.github.io