LG · AI · Oct 20, 2025

Fine-tuning Flow Matching Generative Models with Intermediate Feedback

arXiv:2510.18072v1 · 4 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses the problem of improving text-to-image generation for AI applications. The advance appears incremental, since it builds on existing flow matching models with specific enhancements.

The paper tackles the challenge of fine-tuning flow-based generative models with intermediate feedback by introducing AC-Flow, an actor-critic framework that achieves state-of-the-art performance in text-to-image alignment tasks and generalizes to unseen human preference models on Stable Diffusion 3.

Flow-based generative models have shown remarkable success in text-to-image generation, yet fine-tuning them with intermediate feedback remains challenging, especially for continuous-time flow matching models. Most existing approaches learn solely from outcome rewards and therefore struggle with the credit assignment problem. Alternative methods that attempt to learn a critic via direct regression on cumulative rewards often face training instabilities and model collapse in online settings. We present AC-Flow, a robust actor-critic framework that addresses these challenges through three key innovations: (1) reward shaping that provides well-normalized learning signals to enable stable intermediate value learning and gradient control, (2) a novel dual-stability mechanism that combines advantage clipping to prevent destructive policy updates with a warm-up phase that allows the critic to mature before influencing the actor, and (3) a scalable generalized critic weighting scheme that extends traditional reward-weighted methods while preserving model diversity through Wasserstein regularization. Through extensive experiments on Stable Diffusion 3, we demonstrate that AC-Flow achieves state-of-the-art performance in text-to-image alignment tasks and generalization to unseen human preference models. Our results demonstrate that even with a computationally efficient critic model, we can robustly fine-tune flow models without compromising generative quality, diversity, or stability.
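To make the stability ideas in the abstract concrete, here is a minimal Python sketch of reward normalization, advantage clipping, and a critic warm-up gate. It is an illustration under stated assumptions, not the paper's implementation: the function names, the zero-mean/unit-variance normalization, the exponential weighting, the uniform weights during warm-up, and the `clip`, `warmup_steps`, and `temperature` parameters are all hypothetical choices. The Wasserstein regularization term is not modeled.

```python
import math


def normalize_rewards(rewards, eps=1e-8):
    """Assumed form of reward shaping: zero-mean, unit-variance per batch."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_advantages(shaped_rewards, critic_values, clip=1.0):
    """Advantage = shaped reward minus critic estimate, clipped to [-clip, clip]."""
    return [
        max(-clip, min(clip, r - v))
        for r, v in zip(shaped_rewards, critic_values)
    ]


def actor_weights(advantages, step, warmup_steps, temperature=1.0):
    """Per-sample weights for the actor loss.

    Assumption: during warm-up the critic does not influence the actor,
    so every sample gets weight 1. Afterwards, samples are weighted by
    exp(advantage / temperature), in the style of reward-weighted methods.
    """
    if step < warmup_steps:
        return [1.0] * len(advantages)
    return [math.exp(a / temperature) for a in advantages]


if __name__ == "__main__":
    shaped = normalize_rewards([0.2, 0.9, 0.5, 0.1])
    critic_estimates = [0.0, 0.0, 0.0, 0.0]  # placeholder critic outputs
    adv = clipped_advantages(shaped, critic_estimates, clip=1.0)
    print(actor_weights(adv, step=0, warmup_steps=100))    # warm-up: uniform
    print(actor_weights(adv, step=200, warmup_steps=100))  # critic-weighted
```

Clipping bounds the largest weight at `exp(clip / temperature)`, which is one simple way to keep a single high-advantage sample from dominating an update.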

