CV · MM · Oct 31, 2025

HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition

arXiv:2510.27148v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficient and controllable 3D scene construction for applications in gaming, film, and virtual reality, offering an incremental improvement over existing methods.

The paper tackles the challenge of balancing scene complexity against minimal user input in 3D scene generation. It proposes HiGS, a hierarchical generative framework in which users iteratively expand a scene by selecting key semantic objects; the authors report improved layout plausibility, style consistency, and user preference over single-stage methods.

Three-dimensional scene generation holds significant potential in gaming, film, and virtual reality. However, most existing methods adopt a single-step generation process, making it difficult to balance scene complexity with minimal user input. Inspired by the human cognitive process in scene modeling, which progresses from global to local, focuses on key elements, and completes the scene through semantic association, we propose HiGS, a hierarchical generative framework for multi-step associative semantic spatial composition. HiGS enables users to iteratively expand scenes by selecting key semantic objects, offering fine-grained control over regions of interest while the model completes peripheral areas automatically. To support structured and coherent generation, we introduce the Progressive Hierarchical Spatial-Semantic Graph (PHiSSG), which dynamically organizes spatial relationships and semantic dependencies across the evolving scene structure. PHiSSG ensures spatial and geometric consistency throughout the generation process by maintaining a one-to-one mapping between graph nodes and generated objects and supporting recursive layout optimization. Experiments demonstrate that HiGS outperforms single-stage methods in layout plausibility, style consistency, and user preference, offering a controllable and extensible paradigm for efficient 3D scene construction.
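The abstract does not specify how PHiSSG is implemented, so the following is only a minimal sketch of the two properties it names: a one-to-one mapping between graph nodes and generated objects, and a recursive pass over the hierarchy. Every name here (`SceneNode`, `HierarchicalSceneGraph`, `expand`, `world_positions`) is hypothetical, the 2D offsets are an assumption, and a simple recursive position resolution stands in for the paper's layout optimization, which is not described in enough detail to reproduce.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec2 = Tuple[float, float]


@dataclass
class SceneNode:
    """One graph node; it stands for exactly one generated object."""
    node_id: int
    label: str                      # semantic label, e.g. "desk"
    parent: Optional[int] = None    # key object this node was expanded from
    offset: Vec2 = (0.0, 0.0)       # position relative to the parent (assumed 2D)
    children: List[int] = field(default_factory=list)


class HierarchicalSceneGraph:
    """Hypothetical stand-in for a hierarchical spatial-semantic graph."""

    def __init__(self) -> None:
        self.nodes: Dict[int, SceneNode] = {}
        self.objects: Dict[int, str] = {}   # node_id -> placeholder object handle
        self._next_id = 0

    def add_object(self, label: str, parent: Optional[int] = None,
                   offset: Vec2 = (0.0, 0.0)) -> int:
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent node {parent}")
        node_id = self._next_id
        self._next_id += 1
        # Node and object are created together, keeping the mapping one-to-one.
        self.nodes[node_id] = SceneNode(node_id, label, parent, offset)
        self.objects[node_id] = f"{label}#{node_id}"
        if parent is not None:
            self.nodes[parent].children.append(node_id)
        return node_id

    def expand(self, key_id: int, associated: List[Tuple[str, Vec2]]) -> List[int]:
        """One expansion step: attach associated objects to a selected key object."""
        return [self.add_object(label, key_id, off) for label, off in associated]

    def world_positions(self, root_id: int, origin: Vec2 = (0.0, 0.0)) -> Dict[int, Vec2]:
        """Recursively resolve absolute positions from the root downward."""
        node = self.nodes[root_id]
        pos = (origin[0] + node.offset[0], origin[1] + node.offset[1])
        resolved = {root_id: pos}
        for child_id in node.children:
            resolved.update(self.world_positions(child_id, pos))
        return resolved


# Usage: global to local, expanding around key objects step by step.
graph = HierarchicalSceneGraph()
room = graph.add_object("room")
(desk,) = graph.expand(room, [("desk", (2.0, 1.0))])
(lamp,) = graph.expand(desk, [("lamp", (0.5, 0.25))])
positions = graph.world_positions(room)
```

Storing each offset relative to the parent means that moving a key object moves everything expanded from it, which is one plausible reading of how a hierarchy could keep the layout consistent as the scene grows.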

