CV · Apr 27

POCA: Pareto-Optimal Curriculum Alignment for Visual Text Generation

Yaohou Fan, Qingzhong Wang, Yongsong Huang, Junyi Liu, Tomo Miyazaki, Shinichiro Omachi

arXiv:2604.24171 · 73.4

Predicted impact: top 38% in CV · last 90 days
Originality: Incremental advance

AI Analysis

For researchers in visual text generation, POCA provides a more stable and effective multi-reward alignment method that avoids the instability of weighted-sum approaches and improves training efficiency.

POCA addresses the trade-off between text accuracy and image coherence in visual text generation by formulating it as a multi-objective problem, using Pareto-optimal set identification and adaptive curriculum alignment. It achieves significant improvements across CLIP, HPS scores, and sentence accuracy.

Current visual text generation models struggle with the trade-off between text accuracy and overall image coherence: we find that pushing for high text accuracy can reduce aesthetic quality and instruction-following capability. Reinforcement learning can alleviate the problem by aligning with multiple rewards, but it is often unstable for text generation because existing approaches typically optimize the rewards as a weighted sum, and the weight of each reward is difficult to balance. Moreover, reinforcement learning requires a set of training prompts. A large prompt set demands more training time and computing resources, while a small one leads to poor performance, so how to select prompts for efficient training remains an open problem. In this study, we propose Pareto-Optimal Curriculum Alignment (POCA), a framework that treats alignment as a multi-objective problem by 1) identifying the Pareto-optimal set to avoid simple scalarization and 2) designing an adaptive curriculum alignment strategy that orders a multi-reward dataset using automatic difficulty assessment, which is crucial for convergence because RL methods explore in a limited-data environment. Together, these components let POCA find the Pareto-optimal set in a unified reward space, eliminating inconsistent signals and seeking the best trade-off among rewards under an easy-to-hard optimization landscape. Experimental results show that POCA significantly improves all reported metrics, including CLIP score, HPS score, and sentence accuracy.
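To make the idea of a Pareto-optimal set concrete, the following is a minimal sketch of non-dominated filtering over per-sample reward vectors. It is not the authors' implementation: the function names, the reward ordering (text accuracy, aesthetic score, CLIP score), and the sample values are all hypothetical, and higher is assumed to be better for every reward.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every reward
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal_indices(rewards: List[Sequence[float]]) -> List[int]:
    """Return indices of samples that no other sample dominates."""
    return [
        i
        for i, r in enumerate(rewards)
        if not any(dominates(other, r) for j, other in enumerate(rewards) if j != i)
    ]


# Hypothetical reward vectors: (text_accuracy, aesthetic, clip).
samples = [
    (0.9, 0.4, 0.6),  # strong text accuracy, weaker aesthetics
    (0.5, 0.8, 0.7),  # weaker text accuracy, stronger aesthetics
    (0.4, 0.7, 0.6),  # worse than sample 1 on every reward
    (0.9, 0.4, 0.5),  # matches sample 0 except for a lower CLIP score
]

front = pareto_optimal_indices(samples)
print(front)
```

In this toy example, samples 0 and 1 both stay in the set because neither beats the other on every reward. A weighted sum would have to pick one of them by fixing the weights in advance, which is the scalarization step the abstract says POCA avoids.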
