CV | AI | Nov 30, 2023

Detailed Human-Centric Text Description-Driven Large Scene Synthesis

arXiv:2311.18654v1 | 2 citations | h-index: 10
Originality: Highly original
AI Analysis

This paper addresses the challenge of generating controllable and faithful large scenes from text for AI-driven content creation, though it is incremental in that it builds on existing diffusion models.

The paper tackles the problem of synthesizing large-scale images from detailed human-centric text descriptions without user-provided controls, achieving high faithfulness, controllability, and naturalness, with significant quantitative and qualitative improvements over prior methods.

Text-driven large scene image synthesis has made significant progress with diffusion models, but controlling it remains challenging. While additional spatial controls with corresponding texts have improved the controllability of large scene synthesis, it is still challenging to faithfully reflect detailed text descriptions without user-provided controls. Here, we propose DetText2Scene, a novel text-driven large-scale image synthesis method with high faithfulness, controllability, and naturalness in a global context for detailed human-centric text descriptions. DetText2Scene consists of 1) hierarchical keypoint-box layout generation from the detailed description by leveraging a large language model (LLM), 2) a view-wise conditioned joint diffusion process to synthesize a large scene from the given detailed text with the LLM-generated grounded keypoint-box layout, and 3) pixel perturbation-based pyramidal interpolation to progressively refine the large scene for global coherence. DetText2Scene significantly outperforms prior art in text-to-large scene synthesis qualitatively and quantitatively, demonstrating strong faithfulness to detailed descriptions, superior controllability, and excellent naturalness in a global context.
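The abstract does not spell out how the view-wise joint diffusion step combines overlapping views, so the following is only a minimal sketch of one common approach: every overlapping view of a large canvas is updated independently, and the results are averaged where views overlap. The names `sliding_views`, `joint_step`, and `update_view` are hypothetical, and `update_view` merely stands in for a per-view conditioned denoiser; none of this is taken from the paper's implementation.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width)
Grid = List[List[float]]         # toy single-channel canvas


def sliding_views(height: int, width: int, view: int, stride: int) -> List[Box]:
    """Return overlapping view windows that together cover the whole canvas."""
    assert 0 < stride <= view, "stride must not exceed the view size"

    def starts(size: int) -> List[int]:
        last = max(size - view, 0)
        positions = list(range(0, last + 1, stride))
        if positions[-1] != last:
            positions.append(last)  # make sure the far edge is covered
        return positions

    vh, vw = min(view, height), min(view, width)
    return [(t, l, vh, vw) for t in starts(height) for l in starts(width)]


def joint_step(canvas: Grid, views: List[Box],
               update_view: Callable[[Grid, Box], Grid]) -> Grid:
    """Update each view independently, then average the overlapping results."""
    h, w = len(canvas), len(canvas[0])
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for box in views:
        top, left, vh, vw = box
        patch = [row[left:left + vw] for row in canvas[top:top + vh]]
        # Assumption: the box lets the callback look up the text and layout
        # conditions that apply to this particular view.
        new_patch = update_view(patch, box)
        for i in range(vh):
            for j in range(vw):
                total[top + i][left + j] += new_patch[i][j]
                count[top + i][left + j] += 1
    return [[total[i][j] / count[i][j] for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    canvas = [[1.0] * 10 for _ in range(6)]
    views = sliding_views(6, 10, view=4, stride=3)
    # Placeholder "denoiser" that simply halves every value in the view.
    halved = joint_step(canvas, views, lambda p, box: [[v * 0.5 for v in r] for r in p])
    print(len(views), halved[0][0])
```

In a real pipeline the callback would run one denoising step of a diffusion model on the view's latent, conditioned on the portion of the text and keypoint-box layout that falls inside that view, and `joint_step` would be called once per diffusion timestep.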

