CVAIMay 23, 2025

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning

arXiv:2505.17540v117 citationsh-index: 16
Originality Highly original
AI Analysis

This addresses the issue of under-specified prompts in text-to-image generation for users, though it is incremental as it builds on prior LLM-based methods.

The paper tackles the problem of text-to-image models failing to capture user intentions from short prompts by proposing RePrompt, a reinforcement learning framework that enhances prompts through reasoning, resulting in significant improvements in spatial layout fidelity and compositional generalization across benchmarks.

Despite recent progress in text-to-image (T2I) generation, existing models often struggle to faithfully capture user intentions from short and under-specified prompts. While prior work has attempted to enhance prompts using large language models (LLMs), these methods frequently generate stylistic or unrealistic content due to insufficient grounding in visual semantics and real-world composition. Inspired by recent advances in reasoning for language model, we propose RePrompt, a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. Instead of relying on handcrafted rules or stylistic rewrites, our method trains a language model to generate structured, self-reflective prompts by optimizing for image-level outcomes. The tailored reward models assesse the generated images in terms of human preference, semantic alignment, and visual composition, providing indirect supervision to refine prompt generation. Our approach enables end-to-end training without human-annotated data. Experiments on GenEval and T2I-Compbench show that RePrompt significantly boosts spatial layout fidelity and compositional generalization across diverse T2I backbones, establishing new state-of-the-art results.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes