CV AI LG

Mar 20, 2025

A Recipe for Generating 3D Worlds From a Single Image

Katja Schwarz, Denys Rozumnyi, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder

arXiv:2503.16611v1

20.417 citations

h-index: 45

Originality: Incremental advance

AI Analysis

This work addresses the challenge of creating 3D environments for applications like VR from minimal input, though it is incremental as it builds on existing generative models.

The paper tackles the problem of generating immersive 3D worlds from a single image by framing it as an in-context learning task for 2D inpainting models, resulting in high-quality 3D environments that outperform state-of-the-art methods on multiple quantitative image quality metrics.

We introduce a recipe for generating immersive 3D worlds from a single image by framing the task as an in-context learning problem for 2D inpainting models. This approach requires minimal training and uses existing generative models. Our process involves two steps: generating coherent panoramas using a pre-trained diffusion model and lifting these into 3D with a metric depth estimator. We then fill unobserved regions by conditioning the inpainting model on rendered point clouds, requiring minimal fine-tuning. Tested on both synthetic and real images, our method produces high-quality 3D environments suitable for VR display. By explicitly modeling the 3D structure of the generated environment from the start, our approach consistently outperforms state-of-the-art, video synthesis-based methods along multiple quantitative image quality metrics.

Project Page: https://katjaschwarz.github.io/worlds/
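The lifting and rendering steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes an equirectangular panorama whose depth map stores metric distance along each viewing ray, a y-up and z-forward coordinate frame, and a simple forward-facing pinhole camera. The function names `panorama_to_point_cloud` and `render_observed_mask` are hypothetical. Pixels left `False` in the rendered mask correspond to the unobserved regions that an inpainting model would be asked to fill.

```python
import numpy as np


def panorama_to_point_cloud(depth, rgb=None):
    """Unproject an equirectangular depth map (H, W) into 3D points.

    Assumption: depth holds metric distance along each ray, not z-depth.
    """
    h, w = depth.shape
    # Pixel centers mapped to longitude in (-pi, pi) and latitude in (-pi/2, pi/2).
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both have shape (H, W)
    # Unit ray directions: x right, y up, z forward.
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    points = (dirs * depth[..., None]).reshape(-1, 3)
    colors = None if rgb is None else rgb.reshape(-1, 3)
    return points, colors


def render_observed_mask(points, cam_pos, focal, size):
    """Project points into a forward-facing pinhole camera located at cam_pos.

    Returns a boolean (size, size) mask that is True where at least one
    point lands; False pixels are the holes left for inpainting.
    """
    p = points - np.asarray(cam_pos, dtype=float)
    p = p[p[:, 2] > 1e-6]  # keep only points in front of the camera
    x = focal * p[:, 0] / p[:, 2] + size / 2.0
    y = -focal * p[:, 1] / p[:, 2] + size / 2.0
    xi = np.floor(x).astype(int)
    yi = np.floor(y).astype(int)
    inside = (xi >= 0) & (xi < size) & (yi >= 0) & (yi < size)
    mask = np.zeros((size, size), dtype=bool)
    mask[yi[inside], xi[inside]] = True
    return mask


# Toy usage: a sphere of radius 2 m seen from a camera moved 0.5 m sideways.
depth = np.full((64, 128), 2.0)
points, _ = panorama_to_point_cloud(depth)
mask = render_observed_mask(points, cam_pos=[0.5, 0.0, 0.0], focal=16.0, size=32)
```

A full pipeline would also carry colors through the projection and resolve occlusions with a depth buffer; the mask alone is enough to show where new content is needed once the viewpoint leaves the panorama center.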
