From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
This paper addresses instability in autoregressive image generation and is aimed at AI researchers; its contribution appears incremental, since it builds on existing Chain-of-Thought and Reinforcement Learning methods.
The paper tackles the poorly understood interaction between Chain-of-Thought exploration and Reinforcement Learning optimization in text-to-image generation, and proposes an entropy-guided fine-tuning strategy that achieves state-of-the-art performance on standard benchmarks.
Combining Chain-of-Thought (CoT) with Reinforcement Learning (RL) improves text-to-image (T2I) generation, yet the underlying interaction between CoT's exploration and RL's optimization remains unclear. We present a systematic entropy-based analysis that yields three key insights: (1) CoT expands the generative exploration space, while RL contracts it toward high-reward regions; (2) final reward is strongly negatively correlated with both the mean and variance of image-token entropy, highlighting the need to reduce uncertainty and instability; and (3) the entropy of the textual CoT directly governs downstream image quality, with lower-entropy CoTs leading to better generations. Motivated by these findings, we propose Entropy-Guided Group Relative Policy Optimization (EG-GRPO), a fine-tuning strategy that reallocates optimization budget by uncertainty: low-entropy tokens are excluded from reward-driven updates to preserve stability, while high-entropy tokens receive an entropy bonus that encourages structured exploration without collapse. Experiments on standard T2I benchmarks demonstrate that EG-GRPO achieves state-of-the-art performance.
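To make the reallocation idea concrete, the following is a minimal sketch of how a per-token update signal might be gated by entropy. It is not the authors' implementation: the abstract does not give the exact rule, so the single cutoff `threshold`, the additive form of the bonus, the coefficient `bonus_coef`, and all function names here are illustrative assumptions. The group-relative advantage follows the usual GRPO convention of standardizing rewards within a sampled group.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


def entropy_guided_signals(entropies, advantage, threshold, bonus_coef):
    """Per-token update signal for one sampled sequence.

    Assumed rule (hypothetical, for illustration only):
      - entropy below `threshold`: signal 0.0, so the token is
        excluded from the reward-driven update;
      - otherwise: the sequence advantage plus an additive bonus
        proportional to the token's entropy.
    """
    signals = []
    for h in entropies:
        if h < threshold:
            signals.append(0.0)
        else:
            signals.append(advantage + bonus_coef * h)
    return signals


if __name__ == "__main__":
    # Two toy next-token distributions: one confident, one uncertain.
    dists = [[0.98, 0.01, 0.01], [0.4, 0.3, 0.3]]
    entropies = [token_entropy(d) for d in dists]
    advantages = group_relative_advantages([0.2, 0.9, 0.5])
    print(entropy_guided_signals(entropies, advantages[1],
                                 threshold=0.5, bonus_coef=0.1))
```

In this sketch the confident token contributes nothing to the update, while the uncertain token receives the advantage plus a small bonus. In a real training loop these signals would weight the per-token policy-gradient terms; how the paper sets the cutoff and scales the bonus is not stated in the abstract.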