CV | AI | LG | Feb 5, 2024

InstanceDiffusion: Instance-level Control for Image Generation

Meta AI
arXiv:2402.03290v1 | 209 citations | h-index: 30 | CVPR
Originality: Highly original
AI Analysis

InstanceDiffusion addresses the need for fine-grained control in image generation, which matters for applications such as design and content creation. It represents a strong, specific gain rather than a foundational breakthrough.

Text-to-image diffusion models lack control over individual instances in generated images. The paper addresses this with InstanceDiffusion, which adds precise instance-level control through several kinds of location specification. It outperforms previous state-of-the-art models by 20.4% AP50 for box inputs and 25.4% IoU for mask inputs on the COCO dataset.
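For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how intersection-over-union (IoU) is commonly computed for boxes and binary masks, and of the 0.5 IoU threshold that the "50" in AP50 refers to. It is not the paper's evaluation code; the function names (`box_iou`, `mask_iou`, `is_match_at_50`) and the input formats are assumptions chosen for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    # Corners of the intersection rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(m1, m2):
    """IoU of two binary masks given as equally sized 2D lists of 0/1."""
    inter = 0
    union = 0
    for row1, row2 in zip(m1, m2):
        for p, q in zip(row1, row2):
            if p and q:
                inter += 1
            if p or q:
                union += 1
    return inter / union if union else 0.0


def is_match_at_50(pred_box, gt_box):
    """A predicted box counts as correct at the AP50 threshold when IoU >= 0.5."""
    return box_iou(pred_box, gt_box) >= 0.5
```

AP50 itself averages precision over recall levels using this match criterion. That aggregation step is omitted here to keep the sketch short.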

Text-to-image diffusion models produce high quality images but do not offer control over individual instances in the image. We introduce InstanceDiffusion that adds precise instance-level control to text-to-image diffusion models. InstanceDiffusion supports free-form language conditions per instance and allows flexible ways to specify instance locations such as simple single points, scribbles, bounding boxes or intricate instance segmentation masks, and combinations thereof. We propose three major changes to text-to-image models that enable precise instance-level control. Our UniFusion block enables instance-level conditions for text-to-image models, the ScaleU block improves image fidelity, and our Multi-instance Sampler improves generations for multiple instances. InstanceDiffusion significantly surpasses specialized state-of-the-art models for each location condition. Notably, on the COCO dataset, we outperform previous state-of-the-art by 20.4% AP$_{50}^\text{box}$ for box inputs, and 25.4% IoU for mask inputs.
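The per-instance inputs described in the abstract (a free-form caption plus one or more location formats) can be pictured with a small data structure. This is only an illustrative sketch: `InstanceCondition`, its field names, and the normalized-coordinate convention are hypothetical and are not taken from the InstanceDiffusion code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), assumed normalized to [0, 1]


@dataclass
class InstanceCondition:
    """Hypothetical container for one instance's text and location conditions."""
    caption: str                                             # free-form language condition
    point: Optional[Point] = None                            # single point
    scribble: Optional[List[Point]] = None                   # ordered points along a stroke
    box: Optional[Tuple[float, float, float, float]] = None  # (x0, y0, x1, y1)
    mask: Optional[List[List[int]]] = None                   # binary segmentation mask

    def location_kinds(self) -> List[str]:
        """Names of the location formats supplied for this instance."""
        names = ("point", "scribble", "box", "mask")
        return [n for n in names if getattr(self, n) is not None]


# A scene is a global prompt plus a list of per-instance conditions.
instances = [
    InstanceCondition("a red vintage car", box=(0.1, 0.5, 0.6, 0.9)),
    InstanceCondition("a golden retriever", point=(0.8, 0.7), box=(0.7, 0.6, 0.95, 0.9)),
]
```

Making every location field optional mirrors the abstract's statement that location formats can be used singly or in combination.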

Code Implementations: 1 repo
