CV | May 27, 2025

ISAC: Training-Free Instance-to-Semantic Attention Control for Improving Multi-Instance Generation

arXiv:2505.20935v1 | h-index: 4
Originality: Highly original
AI Analysis

This work addresses a key limitation of generative AI in creating complex scenes with multiple objects, offering a practical improvement for applications such as design and content creation.

The paper addresses the difficulty text-to-image diffusion models have with multi-instance generation. It introduces ISAC, a training-free method that resolves instance merging and omission and achieves up to 52% average multi-class accuracy and 83% average multi-instance accuracy.

Text-to-image diffusion models excel at generating single-instance scenes but struggle with multi-instance scenarios, often merging or omitting objects. Unlike previous training-free approaches that rely solely on semantic-level guidance without addressing instance individuation, our training-free method, Instance-to-Semantic Attention Control (ISAC), explicitly resolves incomplete instance formation and semantic entanglement through an instance-first modeling approach. This enables ISAC to effectively leverage a hierarchical, tree-structured prompt mechanism, disentangling multiple object instances and individually aligning them with their corresponding semantic labels. Without employing any external models, ISAC achieves up to 52% average multi-class accuracy and 83% average multi-instance accuracy by effectively forming disentangled instances. The code will be made available upon publication.

