CV · May 19

FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models

Yi Sun, Zhiqi Zhang, Xinhao Zhong, Yimin Zhou, Shuoyang Sun, Bin Chen, Shu-Tao Xia, Ke Xu

arXiv:2605.19739 · 31.4

Predicted impact: top 19% in CV · last 90 days
Originality: Highly original

AI Analysis

This work addresses the safety risk of generating harmful content in text-to-image flow matching models, offering a more effective and scalable erasure method compared to existing inference-time interventions or supervised fine-tuning.

FlowErase-RL reformulates concept erasure in flow matching models as a reward optimization problem using a GRPO-based framework with a dynamic dual-path reward mechanism, achieving state-of-the-art erasure performance on nudity, object, and artistic style tasks while maintaining image quality and scaling to multi-concept scenarios.

Recent advances in flow matching models have significantly improved text-to-image generation quality, but also introduce growing safety risks due to the generation of harmful or undesirable content. Existing concept erasure methods are either inference-time interventions with limited effectiveness or rely on supervised fine-tuning (SFT), which requires precisely aligned data and struggles with scalability and multi-concept settings. In this paper, we propose FlowErase-RL, the first GRPO-based framework for concept erasure in flow matching models. We reformulate concept erasure as a reward optimization problem and introduce a dynamic dual-path reward mechanism that jointly optimizes (i) a Concept Erasure (CE) reward to suppress target concepts and (ii) a Non-target Space (NS) reward to preserve generative fidelity. The two reward paths are adaptively balanced during training via a performance-driven switching strategy, enabling stable optimization without explicit supervision. Extensive experiments on nudity, object, and artistic style erasure demonstrate that our method achieves state-of-the-art erasure performance while maintaining strong image quality and semantic alignment. Moreover, it exhibits robust resistance to adversarial attacks and scales effectively to multi-concept scenarios. Our results establish a new paradigm for safe and controllable generation in flow matching models.
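
The abstract does not spell out how the two reward paths are switched, so the following is only an illustrative sketch and not the authors' implementation. It shows the standard GRPO step of standardizing rewards within a sampled group, combined with a hypothetical threshold rule for choosing between the CE and NS rewards. The function names, the `ce_threshold` parameter, and the rule itself are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def select_reward_path(ce_rewards, ns_rewards, ce_threshold=0.8):
    """Hypothetical switching rule (an assumption, not from the paper).

    While the group's mean Concept Erasure (CE) reward is below the
    threshold, train on CE; once erasure is good enough, switch to the
    Non-target Space (NS) reward to protect generation quality.
    """
    if mean(ce_rewards) < ce_threshold:
        return "CE", list(ce_rewards)
    return "NS", list(ns_rewards)


# Toy rewards for one group of four samples from the same prompt.
ce = [0.2, 0.6, 0.4, 0.8]
ns = [0.9, 0.7, 0.8, 0.6]

path, rewards = select_reward_path(ce, ns)
advantages = group_relative_advantages(rewards)
print(path, [round(a, 3) for a in advantages])
```

In this toy group the mean CE reward is below the assumed threshold, so the CE path is selected and samples with higher CE reward receive positive advantages. A real training loop would feed these advantages into the policy-gradient update of the flow matching model, which is omitted here.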
