CV · AI · May 24, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

arXiv:2305.15393v2 · 378 citations
Originality: Highly original
AI Analysis

This paper addresses user controllability in visual generation, a known bottleneck for AI practitioners and users, and proposes a novel method for it.

The paper aims to reduce user burden in visual generation by using Large Language Models as visual planners that generate layouts from text conditions. It reports 20-40% improvements over text-to-image models and performance comparable to that of human users in layout design for numerical and spatial correctness.

Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual generative models. We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance the visual planning skills of LLMs. LayoutGPT can generate plausible layouts in multiple domains, ranging from 2D images to 3D indoor scenes. LayoutGPT also shows superior performance in converting challenging language concepts like numerical and spatial relations to layout arrangements for faithful text-to-image generation. When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves performance comparable to that of human users in designing visual layouts for numerical and spatial correctness. Lastly, LayoutGPT achieves performance comparable to supervised methods in 3D indoor scene synthesis, demonstrating its effectiveness and potential in multiple visual domains.
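To make the idea of "in-context visual demonstrations in style sheet language" more concrete, the sketch below shows one way a 2D layout could be serialized as CSS-like rules, assembled into a few-shot prompt, and parsed back from a model's reply. The abstract does not spell out the prompt format, so everything here is an assumption for illustration: the `Box` class, the `Prompt:` / `Layout:` labels, the pixel units, and the helper names are hypothetical and are not taken from the paper's code.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """One object in a 2D layout (hypothetical structure; pixel units assumed)."""
    name: str
    left: int
    top: int
    width: int
    height: int


def box_to_css(box: Box) -> str:
    """Serialize one box as a single CSS-like rule."""
    return (
        f"{box.name} {{left: {box.left}px; top: {box.top}px; "
        f"width: {box.width}px; height: {box.height}px}}"
    )


def demo_to_text(caption: str, boxes: List[Box]) -> str:
    """Render one demonstration: a caption followed by its layout rules."""
    lines = [f"Prompt: {caption}", "Layout:"]
    lines += [box_to_css(b) for b in boxes]
    return "\n".join(lines)


def build_prompt(demos: List[Tuple[str, List[Box]]], query: str) -> str:
    """Concatenate the demonstrations, then leave the query's layout open."""
    parts = [demo_to_text(caption, boxes) for caption, boxes in demos]
    parts.append(f"Prompt: {query}\nLayout:")
    return "\n\n".join(parts)


# Matches "name {declarations}" rules in the assumed format above.
RULE = re.compile(r"([\w -]+?)\s*\{([^}]*)\}")


def parse_layout(text: str) -> List[Box]:
    """Parse CSS-like rules (e.g. from a model reply) back into boxes."""
    boxes = []
    for name, body in RULE.findall(text):
        props = {}
        for decl in body.split(";"):
            if ":" in decl:
                key, value = decl.split(":", 1)
                props[key.strip()] = int(value.strip().replace("px", ""))
        boxes.append(
            Box(name.strip(), props["left"], props["top"],
                props["width"], props["height"])
        )
    return boxes
```

In use, the string from `build_prompt` would be sent to an LLM, and the reply passed through `parse_layout` to obtain boxes for a layout-conditioned image generator. How the demonstrations are selected and which generator is used are outside this sketch.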

Code Implementations: 1 repo