Best Prompts for Text-to-Image Models and How to Find Them
The work addresses the challenge of prompt engineering faced by users of generative models, but its contribution is incremental, since it builds on existing methods with human feedback.
The paper tackles the problem of optimizing text prompts for text-to-image models to enhance aesthetic appeal, presenting a human-in-the-loop genetic algorithm that improves image quality for given descriptions.
Recent progress in generative models, especially in text-guided diffusion models, has enabled the production of aesthetically pleasing imagery resembling the works of professional human artists. However, one has to carefully compose the textual description, called the prompt, and augment it with a set of clarifying keywords. Since aesthetics are challenging to evaluate computationally, human feedback is needed to determine the optimal prompt formulation and keyword combination. In this paper, we present a human-in-the-loop approach to learning the most useful combination of prompt keywords using a genetic algorithm. We also show how such an approach can improve the aesthetic appeal of images depicting the same descriptions.
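To make the idea of a genetic algorithm over prompt keywords concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify the encoding, operators, or hyperparameters, so the bit-mask encoding, truncation selection, single-point crossover, and all parameter values are assumptions. The names `evolve_keywords`, `build_prompt`, and `simulated_rating`, as well as the keyword list, are hypothetical. The `fitness` callback stands in for human feedback; in a real human-in-the-loop setting it would be replaced by ratings collected from people who view the generated images.

```python
import random


def build_prompt(description, chosen):
    """Append the chosen keywords to the base description."""
    return ", ".join([description] + chosen)


def evolve_keywords(keywords, fitness, pop_size=8, generations=10,
                    mutation_rate=0.1, seed=0):
    """Search for a keyword subset that maximizes `fitness` (assumed setup).

    `fitness` takes a list of keywords and returns a number. Here it is a
    stand-in for human aesthetic ratings of the resulting images.
    """
    rng = random.Random(seed)
    n = len(keywords)
    # Each individual is a bit mask: mask[i] == 1 keeps keywords[i].
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]

    def selected(mask):
        return [k for k, bit in zip(keywords, mask) if bit]

    def score(mask):
        return fitness(selected(mask))

    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children

    return selected(max(population, key=score))


if __name__ == "__main__":
    # Hypothetical keyword pool and a simulated rater, for illustration only.
    pool = ["highly detailed", "cinematic lighting", "blurry",
            "oil painting", "low resolution", "sharp focus"]
    liked = {"highly detailed", "cinematic lighting", "sharp focus"}

    def simulated_rating(chosen):
        # +1 for each "liked" keyword, -1 for every other keyword.
        return sum(1 if k in liked else -1 for k in chosen)

    best = evolve_keywords(pool, simulated_rating)
    print(build_prompt("a castle on a hill", best))
```

Keeping the top half of each generation unchanged means the best-rated combination found so far is never lost, which matters when every evaluation costs human effort. With real annotators, ratings would typically be cached per keyword combination so the same images are not judged twice.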