CV · AI · May 8, 2025

Flow-GRPO: Training Flow Matching Models via Online RL

arXiv:2505.05470v5 · 440 citations · h-index: 20 · Has Code
Originality: Highly original
AI Analysis

This work addresses the challenge of improving both performance and alignment in text-to-image generative models. Rather than an incremental improvement, it is a novel integration of online reinforcement learning with flow matching.

The paper tackles the problem of training flow matching models with online reinforcement learning, yielding significant gains on text-to-image tasks: GenEval accuracy for compositional generation rises from 63% to 95%, and visual text rendering accuracy rises from 59% to 92%.
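The "GRPO" in the method's name stands for Group Relative Policy Optimization. The sketch below shows the standard GRPO ingredients as an assumption about how they could apply here, not as the paper's code: rewards for a group of images generated from the same prompt are normalized within that group to form advantages, each stochastic denoising step is treated as a Gaussian action whose log-probability can be computed, and a PPO-style clipped ratio limits how far one update moves the policy. The helper names, the 1-D setting, and the clip value 0.2 are hypothetical, and any KL penalty is omitted.

```python
import math
import statistics


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward against its own group (same prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def gaussian_logprob(x_next, mean, std):
    """Log-density of one stochastic denoising step x_next ~ N(mean, std^2), 1-D for simplicity."""
    return -0.5 * ((x_next - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def clipped_surrogate(logp_new, logp_old, advantage, clip=0.2):
    """PPO-style clipped objective for one step (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(min(ratio, 1 + clip), 1 - clip)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical rewards for four images generated from one prompt.
    adv = group_advantages([0.2, 0.9, 0.5, 0.4])
    print([round(a, 3) for a in adv])
```

Because advantages are relative to the group mean, no separate value network is needed; an image is rewarded only for beating its siblings from the same prompt.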

We propose Flow-GRPO, the first method to integrate online policy gradient reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistical sampling for RL exploration; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original number of inference steps, significantly improving sampling efficiency without sacrificing performance. Empirically, Flow-GRPO is effective across multiple text-to-image tasks. For compositional generation, RL-tuned SD3.5-M generates nearly perfect object counts, spatial relations, and fine-grained attributes, increasing GenEval accuracy from $63\%$ to $95\%$. In visual text rendering, accuracy improves from $59\%$ to $92\%$, greatly enhancing text generation. Flow-GRPO also achieves substantial gains in human preference alignment. Notably, very little reward hacking occurred, meaning rewards did not increase at the cost of appreciable image quality or diversity degradation.
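The ODE-to-SDE conversion can be illustrated on a one-dimensional toy problem. This is a minimal sketch under assumptions the abstract does not state: the rectified-flow convention x_t = (1 - t)·x_0 + t·x_1 with Gaussian noise x_1 at t = 1, an analytic Gaussian velocity field standing in for the learned model v_θ, and a hypothetical noise schedule σ_t = a·sqrt(t/(1 - t)). Under that convention the score satisfies ∇log p_t(x) = -(x + (1 - t)·v(x, t))/t, so the reverse-time SDE can be written using only the velocity. With a > 0, Euler-Maruyama sampling injects fresh noise at every step yet should end at the same distribution as the deterministic ODE (a = 0), which is the marginal-matching property that makes exploration possible.

```python
import math
import random
import statistics

# Toy 1-D setup (assumption for illustration, not the paper's model):
#   data x0 ~ N(mu, s^2), noise x1 ~ N(0, 1),
#   rectified-flow interpolation x_t = (1 - t) * x0 + t * x1 (t = 1 is pure noise).


def marginal(t, mu, s):
    """Mean and variance of x_t under the linear interpolation."""
    return (1 - t) * mu, (1 - t) ** 2 * s ** 2 + t ** 2


def velocity(x, t, mu, s):
    """Exact v(x, t) = E[x1 - x0 | x_t = x] for the Gaussian toy; stands in for a learned v_theta."""
    m, var = marginal(t, mu, s)
    return -mu + (t - (1 - t) * s ** 2) / var * (x - m)


def sigma(t, a):
    """Hypothetical noise schedule sigma_t = a * sqrt(t / (1 - t)); a = 0 gives the plain ODE."""
    return a * math.sqrt(t / (1 - t))


def sample(mu=2.0, s=0.5, a=0.7, t0=0.99, n_steps=300, n_samples=3000, seed=0):
    """Euler-Maruyama integration of the reverse-time SDE from t0 down to 0.

    One step (dt > 0, eps ~ N(0, 1)):
      x_{t-dt} = x_t - [v + sigma_t^2 / (2t) * (x_t + (1 - t) v)] * dt + sigma_t * sqrt(dt) * eps
    where -(x_t + (1 - t) v) / t is the score of p_t under this interpolation.
    """
    rng = random.Random(seed)
    m0, var0 = marginal(t0, mu, s)
    # Start from the exact marginal at t0 < 1 to avoid the sigma_t singularity at t = 1.
    xs = [rng.gauss(m0, math.sqrt(var0)) for _ in range(n_samples)]
    dt = t0 / n_steps
    for k in range(n_steps):
        t = t0 - k * dt  # current time, decreasing toward dt > 0
        sig = sigma(t, a)
        next_xs = []
        for x in xs:
            v = velocity(x, t, mu, s)
            drift = v + sig ** 2 / (2 * t) * (x + (1 - t) * v)
            next_xs.append(x - drift * dt + sig * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        xs = next_xs
    return xs


if __name__ == "__main__":
    for a in (0.7, 0.0):  # stochastic SDE vs deterministic ODE
        xs = sample(a=a)
        print(a, statistics.fmean(xs), statistics.pstdev(xs))
```

Both runs should report a mean near 2.0 and a standard deviation near 0.5: individual trajectories differ, but the final distribution does not. Passing a smaller `n_steps` for training rollouts while keeping more steps at inference loosely mirrors the Denoising Reduction idea, although the abstract does not give specific step counts.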

Code Implementations: 1 repo