PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space
For users of text-to-image generation models, this method improves prompt inversion by producing more interpretable and effective prompts without requiring access to model internals.
PromptEvolver introduces a genetic-algorithm-based method for prompt inversion that generates natural-language prompts achieving high-fidelity reconstructions of target images, outperforming existing methods across multiple benchmarks.
Text-to-image generation has progressed rapidly, but faithfully generating complex scenes still requires extensive trial and error to find the right prompt. In the prompt inversion task, the goal is to recover a textual prompt that can faithfully reconstruct a given target image. Existing methods frequently yield suboptimal reconstructions and produce unnatural, hard-to-interpret prompts that hinder transparency and controllability. In this work, we present PromptEvolver, a prompt inversion approach that generates natural-language prompts while achieving high-fidelity reconstructions of the target image. Our method uses a genetic algorithm to optimize the prompt, leveraging a strong vision-language model to guide the evolution process. Importantly, it works with black-box generation models, since it requires only their image outputs. Finally, we evaluate PromptEvolver across multiple prompt inversion benchmarks and show that it consistently outperforms competing methods.
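To make the evolutionary loop concrete, the following is a minimal sketch of a generic genetic algorithm over word sequences. It is not the PromptEvolver implementation: the abstract does not specify the operators, population size, or how the vision-language model guides the search. The names `evolve_prompt` and `make_toy_score` are hypothetical, and the toy score (fraction of word positions matching a known target) is only a stand-in for the black-box step of generating an image from a candidate prompt and comparing it with the target image.

```python
import random
from typing import Callable, List

Prompt = List[str]


def evolve_prompt(
    score: Callable[[Prompt], float],
    vocabulary: List[str],
    prompt_len: int = 4,      # assumed >= 2 so a crossover point exists
    pop_size: int = 20,       # assumed >= 4 so at least two parents survive
    generations: int = 60,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Prompt:
    """Generic GA: selection, one-point crossover, per-word mutation, elitism.

    `score` is treated as a black box; only its return value is used.
    """
    rng = random.Random(seed)
    population = [
        [rng.choice(vocabulary) for _ in range(prompt_len)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the better-scoring half as parents.
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        # Elitism: carry the current best prompt over unchanged.
        children = [ranked[0][:]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, prompt_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: replace each word with a random one with small probability.
            child = [
                rng.choice(vocabulary) if rng.random() < mutation_rate else w
                for w in child
            ]
            children.append(child)
        population = children
    return max(population, key=score)


def make_toy_score(target: Prompt) -> Callable[[Prompt], float]:
    """Hypothetical stand-in for 'generate an image and compare to the target'."""
    def score(prompt: Prompt) -> float:
        matches = sum(1 for a, b in zip(prompt, target) if a == b)
        return matches / len(target)
    return score


if __name__ == "__main__":
    vocab = ["red", "blue", "fox", "cat", "on", "under", "snow", "grass"]
    toy_score = make_toy_score(["red", "fox", "on", "snow"])
    best = evolve_prompt(toy_score, vocab)
    print(" ".join(best), toy_score(best))
```

In a real setting, `score` would wrap calls to the image generator and a similarity judge, and the mutation and crossover steps would need to preserve fluent natural language rather than swap words at random.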