CV | May 8, 2025

InstanceGen: Image Generation with Instance-level Instructions

arXiv:2505.05678v3 | 6 citations | h-index: 18 | Has Code | SIGGRAPH
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating accurate images from complex prompts, which matters to users in fields such as design and content creation. It represents an incremental improvement over existing structural guidance methods.

The paper tackles the difficulty that pretrained text-to-image models have with complex prompts involving multiple objects and instance-level attributes. It proposes a technique that combines image-based structural guidance with LLM-based instance-level instructions, so that generated images adhere to all parts of the text prompt, including object counts, attributes, and spatial relations.

Despite rapid advancements in the capabilities of generative models, pretrained text-to-image models still struggle to capture the semantics conveyed by complex prompts that compound multiple objects and instance-level attributes. Consequently, we are witnessing growing interest in integrating additional structural constraints, typically in the form of coarse bounding boxes, to better guide the generation process in such challenging cases. In this work, we take the idea of structural guidance a step further by making the observation that contemporary image generation models can directly provide a plausible fine-grained structural initialization. We propose a technique that couples this image-based structural guidance with LLM-based instance-level instructions, yielding output images that adhere to all parts of the text prompt, including object counts, instance-level attributes, and spatial relations between instances.

Code Implementations: 1 repo
