The Infinite Index: Information Retrieval on Generative Text-To-Image Models
The paper addresses the challenge of retrieving desired images from generative models for users such as game designers, but its contribution is incremental because it builds on existing prompt engineering concepts.
The paper tackles the problem of prompt engineering for generative text-to-image models by reframing it as interactive text-based retrieval on an 'infinite index', and demonstrates this approach in a case study on image generation for game design with an expert.
Conditional generative models such as DALL-E and Stable Diffusion generate images based on a user-defined text, the prompt. Finding and refining prompts that produce a desired image has become the art of prompt engineering. Generative models do not provide a built-in retrieval model for a user's information need expressed through prompts. In light of an extensive literature review, we reframe prompt engineering for generative models as interactive text-based retrieval on a novel kind of "infinite index". We apply these insights for the first time in a case study on image generation for game design with an expert. Finally, we envision how active learning may help to guide the retrieval of generated images.
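The reframing described above can be illustrated with a minimal sketch. It is not the authors' implementation: `generate`, `retrieve`, and `interactive_session` are hypothetical names, and the "model" is a deterministic hash-based stand-in rather than DALL-E or Stable Diffusion. The sketch only assumes that a (prompt, seed) pair addresses one entry of the "infinite index", and that a user judges each batch of results and reformulates the prompt when nothing fits.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class GeneratedImage:
    prompt: str
    seed: int
    digest: str  # stand-in for pixel data (assumption: no real model is called)


def generate(prompt: str, seed: int) -> GeneratedImage:
    """Hypothetical stand-in for a text-to-image model.

    Deterministic for a given (prompt, seed), so each pair behaves like
    an address into the index.
    """
    digest = hashlib.sha256(f"{prompt}|{seed}".encode()).hexdigest()[:12]
    return GeneratedImage(prompt, seed, digest)


def retrieve(prompt: str, k: int = 4, first_seed: int = 0) -> List[GeneratedImage]:
    """Treat the prompt as a query and return k index entries for it."""
    return [generate(prompt, first_seed + i) for i in range(k)]


def interactive_session(
    initial_prompt: str,
    reformulate: Callable[[str, List[GeneratedImage]], str],
    is_relevant: Callable[[GeneratedImage], bool],
    max_rounds: int = 5,
    k: int = 4,
) -> Tuple[List[GeneratedImage], List[Tuple[str, List[GeneratedImage]]]]:
    """Query, judge, reformulate: the loop that prompt engineering amounts to."""
    prompt = initial_prompt
    history: List[Tuple[str, List[GeneratedImage]]] = []
    for _ in range(max_rounds):
        results = retrieve(prompt, k)
        history.append((prompt, results))
        relevant = [img for img in results if is_relevant(img)]
        if relevant:
            return relevant, history
        # No result satisfied the information need: refine the query.
        prompt = reformulate(prompt, results)
    return [], history


if __name__ == "__main__":
    # Toy user model (assumption): the designer wants pixel art and adds
    # that modifier to the prompt once the first batch misses.
    hits, log = interactive_session(
        "castle on a hill",
        reformulate=lambda p, _results: p + ", pixel art",
        is_relevant=lambda img: "pixel art" in img.prompt,
    )
    for used_prompt, _ in log:
        print("query:", used_prompt)
    print("relevant results:", len(hits))
```

In this sketch the relevance judgments come from a callback. The active-learning direction the abstract envisions would correspond to using such judgments to choose the next query or seeds, which is left out here.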