CV · Nov 8, 2025

MALeR: Improving Compositional Fidelity in Layout-Guided Generation

arXiv:2511.06002v1 · h-index: 24 · ACM Trans Graph
Originality: Incremental advance
AI Analysis

The paper addresses three failure modes of layout-guided image generation: unintended subjects, out-of-distribution artifacts, and attribute leakage. The contribution is incremental, since it builds on existing methods to fix these specific bottlenecks.

The paper tackles the problem of generating compositional scenes with multiple subjects and attributes in layout-guided text-to-image models, and reports better compositional accuracy, generation consistency, and attribute binding than previous work.

Recent advances in text-to-image models have enabled a new era of creative and controllable image generation. However, generating compositional scenes with multiple subjects and attributes remains a significant challenge. To enhance user control over subject placement, several layout-guided methods have been proposed. However, these methods face numerous challenges, particularly in compositional scenes: unintended subjects often appear outside the layouts, generated images can be out-of-distribution and contain unnatural artifacts, or attributes bleed across subjects, leading to incorrect visual outputs. In this work, we propose MALeR, a method that addresses each of these challenges. Given a text prompt and corresponding layouts, our method prevents subjects from appearing outside the given layouts while keeping the generated images in-distribution. Additionally, we propose a masked, attribute-aware binding mechanism that prevents attribute leakage, enabling accurate rendering of subjects with multiple attributes, even in complex compositional scenes. Qualitative and quantitative evaluations demonstrate that our method achieves superior performance in compositional accuracy, generation consistency, and attribute binding compared to previous work. MALeR is particularly adept at generating images of scenes with multiple subjects and multiple attributes per subject.
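The abstract does not spell out how MALeR's masked, attribute-aware binding works, so the sketch below only illustrates the general idea it names, not the paper's method. It assumes a cross-attention layer where each latent pixel takes a softmax over text tokens; a subject token may only be attended to inside its layout box, and each attribute token (e.g. "red") inherits the mask of the subject it modifies (e.g. "car"), so it cannot leak into another subject's region. The functions `box_to_pixels`, `build_token_masks`, and `masked_cross_attention`, the half-open box convention, and the token indices are all hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1), half-open, latent-grid coords


def box_to_pixels(box: Box, width: int) -> Set[int]:
    """Flattened indices of the latent pixels covered by a layout box."""
    x0, y0, x1, y1 = box
    return {y * width + x for y in range(y0, y1) for x in range(x0, x1)}


def build_token_masks(
    subject_boxes: Dict[int, Box],
    attribute_of: Dict[int, int],
    width: int,
) -> Dict[int, Set[int]]:
    """Map each subject token to its box, and each attribute token to its subject's box."""
    masks = {tok: box_to_pixels(box, width) for tok, box in subject_boxes.items()}
    for attr_tok, subj_tok in attribute_of.items():
        masks[attr_tok] = masks[subj_tok]  # attribute is bound to its subject's region
    return masks


def masked_cross_attention(
    logits: List[List[float]],
    allowed_pixels: Dict[int, Set[int]],
) -> List[List[float]]:
    """Per-pixel softmax over tokens, giving zero weight to tokens masked out at that pixel.

    logits[p][t] is the query-key score between pixel p and token t. Tokens absent
    from allowed_pixels (e.g. a start token) are visible everywhere.
    """
    probs = []
    for p, row in enumerate(logits):
        visible = [t for t in range(len(row))
                   if t not in allowed_pixels or p in allowed_pixels[t]]
        if not visible:
            raise ValueError(f"pixel {p} has no visible tokens; keep one token unmasked")
        m = max(row[t] for t in visible)  # subtract the max for numerical stability
        exps = {t: math.exp(row[t] - m) for t in visible}
        z = sum(exps.values())
        probs.append([exps.get(t, 0.0) / z for t in range(len(row))])
    return probs


# Usage: 2x2 latent grid, tokens 0=<start>, 1="red", 2="car", 3="blue", 4="ball".
W = 2
masks = build_token_masks({2: (0, 0, 1, 2), 4: (1, 0, 2, 2)}, {1: 2, 3: 4}, W)
logits = [[0.0] * 5 for _ in range(W * W)]
attn = masked_cross_attention(logits, masks)
```

With uniform logits, each left-column pixel splits its attention evenly over `<start>`, "red", and "car", while "blue" and "ball" receive exactly zero there. A hard mask like this is only one option; soft penalties on attention mass outside the box are another common choice in layout-guided generation.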

