LG | CV | GR | Oct 21, 2025

From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation

arXiv:2510.18263v1 | 3 citations | h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses a domain-specific problem in image generation, offering incremental improvements over existing reinforcement learning methods.

The paper tackles the trade-off between identity preservation and prompt adherence in subject-driven image generation by proposing Customized-GRPO, which outperforms naive GRPO baselines and strikes a better balance between the two objectives.

Subject-driven image generation models face a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). While online reinforcement learning (RL), specifically GRPO, offers a promising solution, we find that a naive application of GRPO leads to competitive degradation: the simple linear aggregation of rewards with static weights causes conflicting gradient signals and is misaligned with the temporal dynamics of the diffusion process. To overcome these limitations, we propose Customized-GRPO, a novel framework featuring two key innovations: (i) Synergy-Aware Reward Shaping (SARS), a non-linear mechanism that explicitly penalizes conflicting reward signals and amplifies synergistic ones, providing a sharper and more decisive gradient; and (ii) Time-Aware Dynamic Weighting (TDW), which aligns the optimization pressure with the model's temporal dynamics by prioritizing prompt-following in the early stages and identity preservation in the later stages. Extensive experiments demonstrate that our method significantly outperforms naive GRPO baselines, successfully mitigating competitive degradation. Our model achieves a superior balance, generating images that both preserve key identity features and accurately adhere to complex textual prompts.
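
The abstract gives no formulas for SARS or TDW, so the following is only a minimal Python sketch of how the two ideas could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the linear weight schedule, the bounds `w_lo` and `w_hi`, the `alpha` bonus/penalty term, and the convention that step 0 is the earliest (noisiest) step of the diffusion process. Rewards are normalized within a sampling group, as is usual for GRPO-style advantages.

```python
from statistics import mean, pstdev


def group_normalize(rewards, eps=1e-8):
    """Standardize rewards within one sampling group (GRPO-style advantage)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def tdw_weights(step, num_steps, w_lo=0.3, w_hi=0.7):
    """Hypothetical time-aware weights.

    Assumes step 0 is the earliest step. The identity weight grows linearly
    from w_lo to w_hi, so prompt-following dominates early and identity
    preservation dominates late.
    """
    progress = step / max(num_steps - 1, 1)
    w_identity = w_lo + (w_hi - w_lo) * progress
    return w_identity, 1.0 - w_identity


def shaped_advantage(a_id, a_prompt, w_id, w_prompt, alpha=0.5):
    """Hypothetical non-linear shaping on top of the weighted sum.

    Both signals positive: add a bonus (synergy is amplified).
    Signals disagree or are both negative: subtract a penalty, so conflicted
    samples are pushed down and clearly bad samples are pushed down harder.
    """
    linear = w_id * a_id + w_prompt * a_prompt
    overlap = min(abs(a_id), abs(a_prompt))
    if a_id > 0 and a_prompt > 0:
        return linear + alpha * overlap
    return linear - alpha * overlap


if __name__ == "__main__":
    # Made-up reward scores for a group of four sampled images.
    identity_rewards = [0.9, 0.4, 0.8, 0.2]
    prompt_rewards = [0.7, 0.9, 0.3, 0.1]
    a_id = group_normalize(identity_rewards)
    a_pr = group_normalize(prompt_rewards)
    for step in (0, 9):
        w_id, w_pr = tdw_weights(step, num_steps=10)
        shaped = [shaped_advantage(i, p, w_id, w_pr) for i, p in zip(a_id, a_pr)]
        print(step, [round(s, 3) for s in shaped])
```

With a plain weighted sum, a sample that scores high on identity and low on prompt adherence can end up with the same advantage as a mediocre sample on both axes. The extra `alpha` term in this sketch separates those cases, which is one simple way to realize the "penalize conflict, amplify synergy" behavior the abstract describes.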

