Adaptive Prompt Elicitation for Text-to-Image Generation
This work addresses a challenge in text-to-image generation: general users often provide ambiguous inputs and struggle with model idiosyncrasies. It offers an incremental improvement to the existing prompt-based interaction paradigm.
The paper proposes Adaptive Prompt Elicitation (APE), which aligns text-to-image generation with user intent by asking visual queries that help users refine their prompts. In a user study, APE achieves 19.8% higher alignment without increasing workload.
Aligning text-to-image generation with user intent remains challenging for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.
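To make the idea of adaptive, information-theoretic query selection concrete, the following is a minimal sketch of one common criterion, expected information gain over a discrete set of candidate intents. It is not taken from the paper: the abstract does not state APE's exact objective, and every name here (`expected_information_gain`, `select_query`, the toy intents and queries) is a hypothetical illustration. Answers are abstract labels standing in for a user's choice among visual options.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a discrete distribution {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood, answer):
    """Bayes update of the intent distribution after observing an answer.

    likelihood[intent][answer] is the assumed probability that a user with
    that intent gives that answer.
    """
    unnorm = {h: prior[h] * likelihood[h].get(answer, 0.0) for h in prior}
    z = sum(unnorm.values())
    if z == 0:
        return dict(prior)  # impossible answer: leave beliefs unchanged
    return {h: v / z for h, v in unnorm.items()}

def expected_information_gain(prior, likelihood):
    """Prior entropy minus the expected posterior entropy over answers."""
    answers = {a for h in likelihood for a in likelihood[h]}
    gain = entropy(prior)
    for a in answers:
        p_a = sum(prior[h] * likelihood[h].get(a, 0.0) for h in prior)
        if p_a > 0:
            gain -= p_a * entropy(posterior(prior, likelihood, a))
    return gain

def select_query(prior, queries):
    """Pick the query whose answer is expected to reduce uncertainty most."""
    return max(queries, key=lambda q: expected_information_gain(prior, queries[q]))

# Toy setup (hypothetical): four candidate intents, uniform prior.
intents = [("watercolor", "warm"), ("watercolor", "cool"),
           ("photo", "warm"), ("photo", "cool")]
prior = {h: 0.25 for h in intents}

queries = {
    # Answer is fully determined by the style feature of the intent.
    "style": {h: {h[0]: 1.0} for h in intents},
    # Answer is independent of the intent, so it reveals nothing.
    "noise": {h: {"A": 0.5, "B": 0.5} for h in intents},
}

best = select_query(prior, queries)
```

In this toy setup the `style` query splits the four intents in half, while the `noise` query leaves the belief unchanged, so `select_query` prefers `style`. A real system would need to generate the candidate queries and estimate the likelihoods, for example from a language model prior as the abstract suggests; that part is outside this sketch.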