CV | Jan 11, 2024

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim

arXiv:2401.05675v224.044 citations | h-index: 10 | ECCV

Originality: Highly original

AI Analysis

This work addresses the problem of over-optimization and manual weight adjustment in multi-reward RL for text-to-image generation, offering an automated solution with broad applicability in AI-driven content creation.

The paper tackles the challenge of manually tuning reward weights in multi-reward reinforcement learning for text-to-image generation. It proposes Parrot, a framework that uses multi-objective optimization to approximate Pareto-optimal trade-offs among rewards, and reports improved image quality on metrics such as aesthetics and text-image alignment.

Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights is difficult and may cause over-optimization of certain metrics. To solve this, we propose Parrot, which frames the problem as multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate Pareto optimality. Using batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among different rewards. We use this multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, which significantly improves image quality and also allows the trade-off among rewards to be controlled with a reward-related prompt during inference. Furthermore, we introduce original prompt-centered guidance at inference time, ensuring fidelity to user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.
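To make "batch-wise Pareto optimal selection" concrete, here is a minimal sketch of a standard non-dominated filter over one batch of reward vectors. The abstract does not specify how Parrot implements or uses the selection inside its RL update, so this is an illustration of the general idea only. The function names, the three-reward layout, and the reward values are all hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if reward vector a Pareto-dominates b (higher is better).

    a dominates b when a is at least as good on every reward and
    strictly better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_indices(rewards: List[Sequence[float]]) -> List[int]:
    """Indices of samples in the batch that no other sample dominates."""
    return [
        i
        for i, r_i in enumerate(rewards)
        if not any(dominates(r_j, r_i) for j, r_j in enumerate(rewards) if j != i)
    ]


# Hypothetical batch: each row is (aesthetics, human preference, alignment).
# The values are made up for illustration.
batch_rewards = [
    (0.9, 0.2, 0.5),  # dominated by the last sample
    (0.4, 0.8, 0.6),
    (0.3, 0.1, 0.4),  # dominated by the two samples above it
    (0.9, 0.2, 0.7),
]
selected = non_dominated_indices(batch_rewards)
print(selected)  # [1, 3]
```

Because the filter compares reward vectors directly, no per-reward weights are needed to decide which samples are kept. This is the property the abstract contrasts with manual weight adjustment. The pairwise comparison is quadratic in batch size, which is acceptable for small batches.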
