GEN3D: Generating Domain-Free 3D Scenes from a Single Image
This work addresses the need for diverse 3D scenes in embodied AI and world models, though it appears incremental because it builds on existing techniques such as Gaussian splatting.
The paper tackles 3D scene generation from a single image, motivated by the dependence of existing neural 3D reconstruction methods on dense multi-view captures, and demonstrates that the method produces high-fidelity novel views with strong generalization across diverse datasets.
Despite recent advances in neural 3D reconstruction, the dependence on dense multi-view captures restricts its broader applicability. Additionally, 3D scene generation is vital for advancing embodied AI and world models, which depend on diverse, high-quality scenes for learning and evaluation. In this work, we propose Gen3d, a novel method for generating high-quality, wide-scope, and generic 3D scenes from a single image. After the initial point cloud is created by lifting the RGBD image, Gen3d maintains and expands its world model. The 3D scene is finalized by optimizing a Gaussian splatting representation. Extensive experiments on diverse datasets demonstrate the strong generalization capability and superior performance of our method in generating a world model and synthesizing high-fidelity and consistent novel views.
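To make the first step concrete, the following is a minimal sketch of lifting an RGBD image into a colored point cloud. It is not the authors' implementation: the abstract does not specify a camera model, so the sketch assumes a standard pinhole camera with intrinsics fx, fy, cx, cy, and the function name lift_rgbd_to_points and the toy 2x2 input are hypothetical.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]


def lift_rgbd_to_points(
    rgb: List[List[Color]],
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project each pixel with valid depth into camera space.

    Assumes a pinhole camera (an assumption, not stated in the abstract):
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
    where (u, v) is the pixel column/row and z is the depth value.
    """
    points: List[Point] = []
    for v, (rgb_row, depth_row) in enumerate(zip(rgb, depth)):
        for u, (color, z) in enumerate(zip(rgb_row, depth_row)):
            if z <= 0.0:
                # Treat non-positive depth as missing and skip the pixel.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, color))
    return points


# Toy 2x2 RGBD image; one pixel has no valid depth.
example_rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
example_depth = [
    [2.0, 0.0],
    [2.0, 4.0],
]
example_points = lift_rgbd_to_points(
    example_rgb, example_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5
)
```

Each returned tuple holds a 3D position and the source pixel's color. Under the same assumptions, such points could serve as the initial positions and colors of a Gaussian splatting representation.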