CV · AI · LG · Mar 20, 2025

A Recipe for Generating 3D Worlds From a Single Image

arXiv:2503.16611v1 · 17 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating 3D environments from minimal input for applications such as VR. The contribution is incremental because it builds on existing generative models.

The paper generates immersive 3D worlds from a single image by framing the task as an in-context learning problem for 2D inpainting models. The resulting 3D environments outperform state-of-the-art video-synthesis-based methods on multiple quantitative image quality metrics.

We introduce a recipe for generating immersive 3D worlds from a single image by framing the task as an in-context learning problem for 2D inpainting models. This approach requires minimal training and uses existing generative models. Our process involves two steps: generating coherent panoramas using a pre-trained diffusion model and lifting these into 3D with a metric depth estimator. We then fill unobserved regions by conditioning the inpainting model on rendered point clouds, requiring minimal fine-tuning. Tested on both synthetic and real images, our method produces high-quality 3D environments suitable for VR display. By explicitly modeling the 3D structure of the generated environment from the start, our approach consistently outperforms state-of-the-art, video synthesis-based methods along multiple quantitative image quality metrics. Project Page: https://katjaschwarz.github.io/worlds/
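The abstract describes lifting a generated panorama into 3D with a metric depth estimator, then rendering the resulting point cloud from new viewpoints so that an inpainting model can fill the unobserved regions. The sketch below illustrates only that geometry, under simplifying assumptions that are not from the paper: the equirectangular panorama's depth map is given directly (standing in for the depth estimator's output), the novel camera is a rotation-free pinhole camera, and each point is splatted to a single pixel. The function names `equirect_to_points` and `render_hole_mask` are hypothetical, and this is not the authors' implementation. The returned mask marks pixels that received no point, which are the regions an inpainting model would be asked to fill.

```python
import math


def equirect_to_points(depth, width, height):
    """Unproject an equirectangular depth map into 3D points (hypothetical helper).

    depth: list of `height` rows, each holding `width` metric depths measured
    along the viewing ray of that pixel (assumed, not estimated here).
    Returns (x, y, z) tuples in a frame with y up and z forward.
    """
    points = []
    for v in range(height):
        # Latitude of the pixel center: near +pi/2 at the top row, near -pi/2 at the bottom.
        lat = math.pi / 2 - (v + 0.5) / height * math.pi
        for u in range(width):
            # Longitude of the pixel center: near -pi at the left edge, near +pi at the right.
            lon = (u + 0.5) / width * 2 * math.pi - math.pi
            d = depth[v][u]
            # Unit ray direction scaled by the metric depth.
            x = d * math.cos(lat) * math.sin(lon)
            y = d * math.sin(lat)
            z = d * math.cos(lat) * math.cos(lon)
            points.append((x, y, z))
    return points


def render_hole_mask(points, cam_pos, focal, width, height):
    """Splat points into a pinhole camera at cam_pos looking down +z (no rotation).

    Returns (zbuf, mask): zbuf holds the nearest depth per pixel (math.inf if
    empty); mask is True where no point landed, i.e. a region to inpaint.
    """
    zbuf = [[math.inf] * width for _ in range(height)]
    cx, cy = width / 2, height / 2
    for x, y, z in points:
        # Express the point in the novel camera's frame (translation only).
        xc, yc, zc = x - cam_pos[0], y - cam_pos[1], z - cam_pos[2]
        if zc <= 1e-6:
            continue  # point is behind (or on) the image plane
        u = int(math.floor(focal * xc / zc + cx))
        v = int(math.floor(cy - focal * yc / zc))  # image rows grow downward
        if 0 <= u < width and 0 <= v < height and zc < zbuf[v][u]:
            zbuf[v][u] = zc  # z-buffer: keep the nearest point per pixel
    mask = [[zbuf[v][u] == math.inf for u in range(width)] for v in range(height)]
    return zbuf, mask
```

In a full pipeline each point would also carry the panorama's color, and the z-buffer would decide which color is visible. Here it only records occupancy and nearest depth, which is enough to show where the rendered view has gaps.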
