CV | Mar 7, 2023

CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding

arXiv:2303.03565v2 | 20 citations | h-index: 24
AI Analysis

This paper addresses the problem of generating visually coherent and functionally plausible indoor scenes for applications such as immersive 3D experiences and training embodied agents. It proposes a novel method for a known bottleneck.

The paper tackles indoor scene synthesis by introducing an auto-regressive model that uses CLIP embeddings to incorporate instance-level visual attributes like color and style, achieving state-of-the-art results on the 3D-FRONT dataset with over 50% improvement in auto-completion metrics.

Indoor scene synthesis involves automatically picking and placing furniture appropriately on a floor plan, so that the scene looks realistic and is functionally plausible. Such scenes can serve as homes for immersive 3D experiences, or be used to train embodied agents. Existing methods for this task rely on labeled categories of furniture, e.g. bed, chair or table, to generate contextually relevant combinations of furniture. Whether heuristic or learned, these methods ignore instance-level visual attributes of objects, and as a result may produce visually less coherent scenes. In this paper, we introduce an auto-regressive scene model which can output instance-level predictions, using general purpose image embedding based on CLIP. This allows us to learn visual correspondences such as matching color and style, and produce more functionally plausible and aesthetically pleasing scenes. Evaluated on the 3D-FRONT dataset, our model achieves SOTA results in scene synthesis and improves auto-completion metrics by over 50%. Moreover, our embedding-based approach enables zero-shot text-guided scene synthesis and editing, which easily generalizes to furniture not seen during training.
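To make the embedding-based idea concrete, here is a minimal sketch of how an instance-level prediction could be turned into a concrete furniture choice: compare a predicted embedding against the embeddings of catalog items and pick the most similar one. This is an illustration under stated assumptions, not the paper's implementation. The abstract does not describe the retrieval step, so the use of cosine similarity, the function names `cosine_similarity` and `retrieve_nearest`, the catalog entries, and the 3-dimensional toy vectors (standing in for real CLIP embeddings) are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_nearest(query, catalog):
    """Return the name of the catalog item whose embedding best matches query.

    `catalog` maps an item name to its embedding vector.
    """
    return max(catalog, key=lambda name: cosine_similarity(query, catalog[name]))


# Hypothetical toy catalog: 3-D vectors stand in for real image embeddings.
catalog = {
    "oak_chair": [0.9, 0.1, 0.0],
    "white_sofa": [0.1, 0.9, 0.1],
    "black_lamp": [0.0, 0.2, 0.9],
}

# Assumed output of a scene model for the next object to place.
predicted = [0.8, 0.2, 0.1]

best = retrieve_nearest(predicted, catalog)
print(best)
```

Because the lookup depends only on embeddings, the same function would work with a query vector derived from a text prompt or with catalog items added after training, which is one way to read the abstract's claim about zero-shot text-guided synthesis and generalization to unseen furniture.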

Code Implementations: 1 repo
