CL CV LG MM

Nov 17, 2022

Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

Ninareh Mehrabi, Palash Goyal, Apurv Verma, Jwala Dhamala, Varun Kumar, Qian Hu, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Rahul Gupta

Amazon, Georgia Tech

arXiv:2211.12503v12.38 citations

h-index: 78

Originality: Incremental advance

AI Analysis

This work targets a specific problem in text-to-image generation: making models more reliable when prompts are ambiguous. The advance is incremental, since it builds on existing systems.

The paper tackles ambiguity in text-to-image generative models by curating a benchmark dataset and proposing a framework that solicits clarifications from the user. Automatic and human evaluations show that the framework yields more faithful images aligned with human intention.

Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate a benchmark dataset covering different types of ambiguities that occur in these systems. We then propose a framework to mitigate ambiguities in the prompts given to the systems by soliciting clarifications from the user. Through automatic and human evaluations, we show the effectiveness of our framework in generating more faithful images aligned with human intention in the presence of ambiguities.
