CV · Jun 9, 2025

Dreamland: Controllable World Creation with Simulator and Generative Models

arXiv:2506.08006v1 · 4 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the need for more controllable world generation for applications like scene editing and embodied AI agent training, representing an incremental advancement by integrating existing components in a novel way.

The paper tackles the limited element-wise controllability of large-scale video generative models for dynamic world creation. It proposes Dreamland, a hybrid framework that combines a physics-based simulator with pretrained generative models, and reports 50.8% improved image quality and 17.9% stronger controllability compared to baselines.

Large-scale video generative models can synthesize diverse and realistic visual content for dynamic world creation, but they often lack element-wise controllability, hindering their use in editing scenes and training embodied AI agents. We propose Dreamland, a hybrid world generation framework combining the granular control of a physics-based simulator and the photorealistic content output of large-scale pretrained generative models. In particular, we design a layered world abstraction that encodes both pixel-level and object-level semantics and geometry as an intermediate representation to bridge the simulator and the generative model. This approach enhances controllability, minimizes adaptation cost through early alignment with real-world distributions, and supports off-the-shelf use of existing and future pretrained generative models. We further construct a D3Sim dataset to facilitate the training and evaluation of hybrid generation pipelines. Experiments demonstrate that Dreamland outperforms existing baselines, with 50.8% improved image quality and 17.9% stronger controllability, and has great potential to enhance embodied agent training. Code and data will be made available.

