CV · GR · Oct 27, 2025

TRELLISWorld: Training-Free World Generation from Object Generators

arXiv:2510.23880v1 · 5 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses the need for training-free, general-purpose 3D scene generation in applications such as virtual prototyping and AR/VR, though it builds incrementally on existing object-level models.

The paper tackles text-driven 3D scene generation without scene-level training data. It repurposes text-to-3D object diffusion models as modular tile generators and blends their overlapping regions, which allows scalable synthesis of large, coherent scenes with local semantic control.

Text-driven 3D scene generation holds promise for a wide range of applications, from virtual prototyping to AR/VR and simulation. However, existing methods are often constrained to single-object generation, require domain-specific training, or lack support for full 360-degree viewability. In this work, we present a training-free approach to 3D scene synthesis by repurposing general-purpose text-to-3D object diffusion models as modular tile generators. We reformulate scene generation as a multi-tile denoising problem, where overlapping 3D regions are independently generated and seamlessly blended via weighted averaging. This enables scalable synthesis of large, coherent scenes while preserving local semantic control. Our method eliminates the need for scene-level datasets or retraining, relies on minimal heuristics, and inherits the generalization capabilities of object-level priors. We demonstrate that our approach supports diverse scene layouts, efficient generation, and flexible editing, establishing a simple yet powerful foundation for general-purpose, language-driven 3D scene construction.
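
The blending idea in the abstract (overlapping 3D regions denoised independently, then merged by weighted averaging) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scene is a dense scalar voxel grid whose every side is at least `tile` long, `denoise_tile` is a hypothetical stand-in for one denoising step of an object-level model, and the tile size, stride, and centre-peaked weight shape are arbitrary choices.

```python
import numpy as np


def tile_starts(size, tile, stride):
    """Start offsets of overlapping tiles that cover [0, size); assumes size >= tile."""
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the far edge
    return starts


def blend_weights(tile):
    """Separable 3D weight, largest at the tile centre and strictly positive."""
    ramp = 1.0 - 0.9 * np.abs(np.linspace(-1.0, 1.0, tile))  # values in [0.1, 1]
    return ramp[:, None, None] * ramp[None, :, None] * ramp[None, None, :]


def blended_step(volume, denoise_tile, tile=8, stride=4):
    """One multi-tile step: process each tile independently, then average overlaps."""
    acc = np.zeros(volume.shape, dtype=float)   # weighted sum of tile outputs
    norm = np.zeros(volume.shape, dtype=float)  # sum of weights per voxel
    w = blend_weights(tile)
    for x in tile_starts(volume.shape[0], tile, stride):
        for y in tile_starts(volume.shape[1], tile, stride):
            for z in tile_starts(volume.shape[2], tile, stride):
                sl = (slice(x, x + tile), slice(y, y + tile), slice(z, z + tile))
                acc[sl] += w * denoise_tile(volume[sl])  # hypothetical per-tile model call
                norm[sl] += w
    return acc / norm  # every voxel is covered, so norm > 0 everywhere
```

Calling `blended_step(volume, denoise_tile)` once per diffusion timestep would keep neighbouring tiles consistent throughout sampling, because each voxel in an overlap receives the weighted mean of all tile predictions that cover it. A per-tile text prompt could be passed through `denoise_tile` to mimic the local semantic control the abstract mentions.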
