LG | AI | Mar 13, 2025

Fine-Tuning Diffusion Generative Models via Rich Preference Optimization

arXiv:2503.11720v4 | 5 citations | h-index: 42 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the opaque, limited feedback that reward-model labels provide in preference-based fine-tuning of diffusion models, and offers researchers and practitioners in generative AI a more informative alternative.

The paper introduces Rich Preference Optimization (RPO) for fine-tuning text-to-image diffusion models. RPO uses detailed critiques of generated images to build synthetic preference pairs, and the authors demonstrate the resulting datasets by fine-tuning state-of-the-art diffusion models.

We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Traditional methods, like Diffusion-DPO, often rely solely on reward model labeling, which can be opaque, offers limited insight into the rationale behind preferences, and is prone to issues such as reward hacking or overfitting. In contrast, our approach begins by generating detailed critiques of synthesized images, from which we extract reliable and actionable image editing instructions. By implementing these instructions, we create refined images, resulting in synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models. Our code is available at https://github.com/Diffusion-RLHF/RPO.
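The data flow described in the abstract (critique, then editing instructions, then a refined image, then a preference pair) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the RPO repository: every name here (`PreferencePair`, `build_preference_pairs`, and the `generate`, `critique`, `extract_instructions`, and `edit` callables) is hypothetical, the `bytes` image type is a stand-in, and the sketch assumes that the edited image is treated as the preferred sample and that instructions are applied one after another.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-in for whatever image type a real pipeline would use.
Image = bytes


@dataclass(frozen=True)
class PreferencePair:
    """One synthetic training example for preference-based fine-tuning."""
    prompt: str
    preferred: Image         # assumed: the refined (edited) image
    dispreferred: Image      # assumed: the original synthesized image
    instructions: List[str]  # editing instructions that produced the refinement


def build_preference_pairs(
    prompts: Sequence[str],
    generate: Callable[[str], Image],                    # text-to-image model (hypothetical)
    critique: Callable[[str, Image], str],               # critic producing a detailed critique
    extract_instructions: Callable[[str], List[str]],    # critique -> editing instructions
    edit: Callable[[Image, str], Image],                 # instruction-guided image editor
) -> List[PreferencePair]:
    """Turn prompts into (refined, original) preference pairs."""
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        original = generate(prompt)
        feedback = critique(prompt, original)
        instructions = extract_instructions(feedback)
        if not instructions:
            # No actionable feedback, so no informative pair can be formed.
            continue
        refined = original
        for instruction in instructions:
            # Assumption: instructions are applied sequentially.
            refined = edit(refined, instruction)
        pairs.append(PreferencePair(prompt, refined, original, list(instructions)))
    return pairs
```

Passing the models in as callables keeps the sketch independent of any particular critic or editor. The resulting pairs would then feed a preference objective such as Diffusion-DPO, which the abstract names as the baseline approach.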

