Underspecification in Scene Description-to-Depiction Tasks
The paper addresses ethical concerns and task validity for multimodal image+text systems, but its contribution is incremental, since it offers a conceptual mapping rather than new methods.
This position paper tackles underspecification in scene description-to-depiction tasks by mapping out a conceptual framework for textual and visual ambiguity, and it proposes strategies such as generating visually ambiguous images or a set of diverse images.
Questions regarding implicitness, ambiguity and underspecification are crucial for understanding the task validity and ethical concerns of multimodal image+text systems, yet they have received little attention to date. This position paper maps out a conceptual framework to address this gap, focusing on systems that generate images depicting scenes from scene descriptions. In doing so, we account for how texts and images convey meaning differently. We outline a set of core challenges concerning textual and visual ambiguity, as well as risks that may be amplified by ambiguous and underspecified elements. We propose and discuss strategies for addressing these challenges, including generating visually ambiguous images and generating a set of diverse images.