CV | Oct 15, 2025

Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation

Yifu Luo, Xinhao Hu, Keyu Fan, Haoyuan Sun, Zeyu Chen, Bo Xia, Tiantian Zhang, Yongzhe Chang, Xueqian Wang

arXiv:2510.13418v1 | 15.57 citations | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a gap in reinforcement learning for text-to-image generation, where prior methods target diffusion or autoregressive models, by focusing on masked generative models. Its contribution is an incremental advance within this specific area.

The authors apply reinforcement learning to masked generative models for text-to-image generation and propose Mask-GRPO. Applied to the base model Show-o, it outperforms state-of-the-art approaches on standard T2I benchmarks and preference alignment.

Reinforcement learning (RL) has garnered increasing attention in text-to-image (T2I) generation. However, most existing RL approaches are tailored to either diffusion models or autoregressive models, overlooking an important alternative: masked generative models. In this work, we propose Mask-GRPO, the first method to incorporate Group Relative Policy Optimization (GRPO)-based RL into this overlooked paradigm. Our core insight is to redefine the transition probability, departing from current approaches, and to formulate the unmasking process as a multi-step decision-making problem. To further enhance our method, we explore several useful strategies: removing the KL constraint, applying the reduction strategy, and filtering out low-quality samples. Using Mask-GRPO, we substantially improve the base model Show-o on standard T2I benchmarks and preference alignment, outperforming existing state-of-the-art approaches. The code is available at https://github.com/xingzhejun/Mask-GRPO
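To make the multi-step formulation more concrete, here is a minimal Python sketch of a GRPO-style objective for a masked generative model. It rests on several assumptions. First, advantages are the group-standardized rewards of several images sampled for the same prompt, as in standard GRPO. Second, the transition probability of one unmasking step is taken to be the product of the probabilities of the tokens revealed at that step. This is a hypothetical stand-in; the paper's redefined transition probability may differ. Third, the objective is the PPO-style clipped ratio with no KL penalty, mirroring the removed KL constraint. All function names are illustrative and are not taken from the Mask-GRPO code base.

```python
import math
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize the rewards of one prompt's group of samples (GRPO-style)."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def step_log_prob(token_log_probs, newly_unmasked):
    """HYPOTHETICAL transition log-probability of one unmasking step:
    the sum of log-probs of only the tokens revealed at this step."""
    return sum(lp for lp, revealed in zip(token_log_probs, newly_unmasked) if revealed)


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO/GRPO clipped objective (to be maximized) for a single step.
    No KL penalty term is included."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def trajectory_objective(new_step_logps, old_step_logps, advantage, clip_eps=0.2):
    """Average the clipped objective over all unmasking steps of one sample.
    Every step shares the sample's advantage, because the reward is
    computed only on the final image."""
    terms = [
        clipped_surrogate(new, old, advantage, clip_eps)
        for new, old in zip(new_step_logps, old_step_logps)
    ]
    return sum(terms) / len(terms)


# Usage: four images sampled for one prompt, scored by some reward model.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # [1.414, -1.414, 0.0, 0.0]
```

Treating each unmasking step as one action lets a single image-level reward credit every step of its trajectory. A training loss would be the negative of `trajectory_objective`, averaged over the group.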
