ISAC: Training-Free Instance-to-Semantic Attention Control for Improving Multi-Instance Generation
This work addresses a key limitation of generative image models, namely their difficulty in creating complex scenes with multiple objects, and offers a practical improvement for applications such as design and content creation.
The paper addresses the difficulty text-to-image diffusion models have with multi-instance generation. It introduces ISAC, a training-free method that resolves instance merging and omission and achieves up to 52% average multi-class accuracy and 83% average multi-instance accuracy.
Text-to-image diffusion models excel at generating single-instance scenes but struggle with multi-instance scenarios, often merging or omitting objects. Unlike previous training-free approaches that rely solely on semantic-level guidance without addressing instance individuation, our training-free method, Instance-to-Semantic Attention Control (ISAC), explicitly resolves incomplete instance formation and semantic entanglement through an instance-first modeling approach. This enables ISAC to effectively leverage a hierarchical, tree-structured prompt mechanism, disentangling multiple object instances and individually aligning them with their corresponding semantic labels. Without employing any external models, ISAC achieves up to 52% average multi-class accuracy and 83% average multi-instance accuracy by effectively forming disentangled instances. The code will be made available upon publication.
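As a rough illustration of what a hierarchical, tree-structured prompt could look like, the sketch below builds a two-level tree: a root node describing only how many instances there are, and one child per semantic label. This is not the authors' implementation. The `PromptNode` class, the helper names, and the root wording ("2 objects") are hypothetical, chosen only to show the instance-first ordering the abstract describes, in which instances are formed first and semantic labels are attached afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class PromptNode:
    """One node of a hypothetical prompt tree (illustrative, not from the paper)."""
    text: str
    children: list["PromptNode"] = field(default_factory=list)


def build_prompt_tree(labels: list[str]) -> PromptNode:
    """Root carries an instance-level prompt; children carry semantic labels."""
    # Assumption: the instance level only states the count, with no class names.
    root = PromptNode(f"{len(labels)} objects")
    root.children = [PromptNode(label) for label in labels]
    return root


def prompts_by_level(root: PromptNode) -> list[list[str]]:
    """Breadth-first traversal: instance-level prompts first, then semantic ones."""
    levels = []
    frontier = [root]
    while frontier:
        levels.append([node.text for node in frontier])
        frontier = [child for node in frontier for child in node.children]
    return levels


tree = build_prompt_tree(["cat", "dog"])
levels = prompts_by_level(tree)
```

Here `levels[0]` holds the instance-level prompt and `levels[1]` holds the per-instance semantic labels. How these levels would actually steer attention during denoising is specific to ISAC and is not modelled in this sketch.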