PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization
PromptSculptor addresses a real pain point for users of text-to-image models, who must otherwise refine prompts manually, though the contribution is incremental because it builds on existing multi-agent and Chain-of-Thought methods.
The paper tackles the difficulty of crafting detailed prompts for text-to-image models by proposing PromptSculptor, a multi-agent framework that automates iterative prompt optimization; the authors report significantly higher output quality and fewer refinement iterations before users are satisfied.
The rapid advancement of generative AI has democratized access to powerful tools such as Text-to-Image models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context, often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.
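The four-agent flow described in the abstract can be pictured roughly as a chain of small functions, each wrapping one call to a language model. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract names the self-evaluation and feedback-tuning agents, but the names `infer_intent` and `enrich_scene`, the `PromptState` container, the instruction strings, and the `LLM` callable type are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical interface: any function that maps an instruction string to text.
# Keeping the model behind a plain callable mirrors the model-agnostic design.
LLM = Callable[[str], str]


@dataclass
class PromptState:
    """Data passed from agent to agent (illustrative, not from the paper)."""
    user_prompt: str
    intent: str = ""
    refined: str = ""
    history: List[str] = field(default_factory=list)


def infer_intent(state: PromptState, llm: LLM) -> PromptState:
    # Assumed agent 1: Chain-of-Thought style inference of hidden context.
    state.intent = llm(
        "Think step by step about what the user wants to depict: "
        + state.user_prompt
    )
    state.history.append("intent")
    return state


def enrich_scene(state: PromptState, llm: LLM) -> PromptState:
    # Assumed agent 2: add scene, style, and background details.
    state.refined = llm(
        "Write a detailed image prompt with scene and background for: "
        + state.intent
    )
    state.history.append("scene")
    return state


def self_evaluate(state: PromptState, llm: LLM) -> PromptState:
    # Agent 3 (named in the abstract): realign the draft with the original input.
    state.refined = llm(
        "Revise the prompt so it stays faithful to the original request.\n"
        "Original: " + state.user_prompt + "\nDraft: " + state.refined
    )
    state.history.append("self_eval")
    return state


def feedback_tune(state: PromptState, llm: LLM, feedback: str) -> PromptState:
    # Agent 4 (named in the abstract): fold user feedback into the prompt.
    state.refined = llm(
        "Update the prompt according to the feedback.\n"
        "Prompt: " + state.refined + "\nFeedback: " + feedback
    )
    state.history.append("feedback")
    return state


def sculpt(user_prompt: str, llm: LLM, feedback: Optional[str] = None) -> PromptState:
    """Run the agents in order; the feedback step runs only if feedback is given."""
    state = PromptState(user_prompt)
    for agent in (infer_intent, enrich_scene, self_evaluate):
        state = agent(state, llm)
    if feedback:
        state = feedback_tune(state, llm, feedback)
    return state
```

In this sketch, `sculpt("a lonely robot", my_llm)` would return a state whose `refined` field holds the expanded prompt, which could then be passed unchanged to any T2I model; calling it again with `feedback="warmer colors"` adds one more refinement pass.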