CV | AI | Jul 16, 2025

RaDL: Relation-aware Disentangled Learning for Multi-Instance Text-to-Image Generation

arXiv:2507.11947v1 | h-index: 2 | SMC
Originality: Incremental advance
AI Analysis

RaDL addresses multi-instance generation in text-to-image models and offers an incremental improvement over existing methods.

The paper tackles multi-instance generation in text-to-image models by addressing relationship discrepancy and attribute leakage. The resulting framework, RaDL, reports significant improvements in positional accuracy and attribute handling on benchmarks such as COCO-Position, COCO-MIG, and DrawBench.

With recent advancements in text-to-image (T2I) models, effectively generating multiple instances within a single image prompt has become a crucial challenge. Existing methods, while successful in generating positions of individual instances, often struggle to account for relationship discrepancy and multiple attributes leakage. To address these limitations, this paper proposes the relation-aware disentangled learning (RaDL) framework. RaDL enhances instance-specific attributes through learnable parameters and generates relation-aware image features via Relation Attention, utilizing action verbs extracted from the global prompt. Through extensive evaluations on benchmarks such as COCO-Position, COCO-MIG, and DrawBench, we demonstrate that RaDL outperforms existing methods, showing significant improvements in positional accuracy, multiple attributes consideration, and the relationships between instances. Our results present RaDL as the solution for generating images that consider both the relationships and multiple attributes of each instance within the multi-instance image.
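The abstract describes Relation Attention only at a high level: image features are made relation-aware using action verbs taken from the global prompt. One plausible reading is a cross-attention step in which image tokens attend to verb embeddings. The sketch below illustrates that reading in NumPy; it is not the paper's implementation. The function name `relation_cross_attention`, the projection matrices, the residual connection, and all dimensions are assumptions made for illustration, and the verb embeddings are random stand-ins for whatever text encoder the authors use.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def relation_cross_attention(image_feats, verb_embeds, w_q, w_k, w_v):
    """Hypothetical sketch: image tokens attend to action-verb embeddings.

    image_feats: (N, d_img) image tokens
    verb_embeds: (M, d_txt) embeddings of verbs from the global prompt
    w_q: (d_img, d_attn), w_k: (d_txt, d_attn), w_v: (d_txt, d_img)
    """
    q = image_feats @ w_q                      # (N, d_attn)
    k = verb_embeds @ w_k                      # (M, d_attn)
    v = verb_embeds @ w_v                      # (M, d_img)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) scaled dot products
    weights = softmax(scores, axis=-1)         # each image token's weights over verbs
    # Residual connection (an assumption): add verb context to the image tokens.
    return image_feats + weights @ v, weights


# Toy inputs: 16 image tokens, 2 verbs (for example "riding", "holding").
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(16, 8))
verb_embeds = rng.normal(size=(2, 6))          # stand-in for encoder output
w_q = rng.normal(size=(8, 4))
w_k = rng.normal(size=(6, 4))
w_v = rng.normal(size=(6, 8))

out, weights = relation_cross_attention(image_feats, verb_embeds, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` sums to 1, so every image token receives a convex mixture of verb values. How RaDL actually extracts the verbs, and where this step sits relative to the instance-specific attribute branch, is not specified in the abstract.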

