CV · Sep 20, 2025

InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention

arXiv:2509.16691v2 · 5 citations · h-index: 9 · Has Code
Originality: Highly original
AI Analysis

This work addresses the need for more accurate and controllable image generation from layouts. The contribution is incremental: it builds on existing diffusion models, adding new attention mechanisms and benchmarks.

The paper tackles the problem of suboptimal performance in Layout-to-Image generation by proposing InstanceAssemble, a novel architecture that uses instance-assembling attention for precise control with bounding boxes and multimodal content, achieving state-of-the-art results under complex layout conditions.

Diffusion models have demonstrated remarkable capabilities in generating high-quality images. Recent advances in Layout-to-Image (L2I) generation have leveraged positional conditions and textual descriptions to facilitate precise and controllable image synthesis. Despite this progress, current L2I methods still exhibit suboptimal performance. We therefore propose InstanceAssemble, a novel architecture that incorporates layout conditions via instance-assembling attention, enabling position control with bounding boxes (bbox) and multimodal content control, including text and additional visual content. Our method adapts flexibly to existing DiT-based T2I models through lightweight LoRA modules. Additionally, we propose Denselayout, a comprehensive benchmark for layout-to-image generation containing 5k images with 90k instances in total. We further introduce the Layout Grounding Score (LGS), an interpretable evaluation metric that assesses the accuracy of L2I generation more precisely. Experiments demonstrate that InstanceAssemble achieves state-of-the-art performance under complex layout conditions while remaining strongly compatible with diverse style LoRA modules. The code and pretrained models are publicly available at https://github.com/FireRedTeam/InstanceAssemble.
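
To make the layout condition concrete, here is a minimal Python sketch of how a layout instance (a description plus a bounding box) might be represented, and how the overlap between a requested box and a detected box can be measured with intersection-over-union (IoU). The `Instance` class, the normalized `(x0, y0, x1, y1)` box convention, and the `iou` helper are assumptions made for illustration. They are not taken from the InstanceAssemble code, and the abstract does not define LGS, so this should not be read as the paper's metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One layout entry: a text description plus a bounding box.

    Hypothetical structure for illustration. The box is (x0, y0, x1, y1)
    in normalized [0, 1] image coordinates, which is an assumed convention.
    """
    description: str
    bbox: tuple[float, float, float, float]


def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    # Degenerate (zero-area) boxes get an IoU of 0 rather than a division error.
    return inter / union if union > 0 else 0.0


# A requested instance and a hypothetical detection in the generated image.
requested = Instance("a red mug", (0.10, 0.20, 0.40, 0.60))
detected_bbox = (0.15, 0.25, 0.45, 0.65)
overlap = iou(requested.bbox, detected_bbox)
```

An IoU near 1 means the generated object sits where the layout asked for it, while an IoU of 0 means the boxes do not overlap at all.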

