CV · AI · LG · May 22, 2024

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

arXiv:2405.13637v634 citations · h-index: 30 · Has Code · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses preference fine-tuning of text-to-image models, specifically diffusion and consistency models, and offers a more efficient alternative to RLHF. The advance is incremental, since it builds on existing DPO methods.

The paper proposes Curriculum DPO, which applies curriculum learning to DPO: generated samples are ranked by a reward model, and training pairs are presented in order of increasing difficulty, where difficulty is measured by the rank difference between the two samples in a pair. The authors report that it outperforms state-of-the-art fine-tuning methods on nine benchmarks in terms of text alignment, aesthetics, and human preference.

Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a ranking of the examples generated for each prompt is obtained by employing a reward model. Then, increasingly difficult pairs of examples are sampled and provided to a text-to-image generative (diffusion or consistency) model. Generated samples that are far apart in the ranking are considered to form easy pairs, while those that are close in the ranking form hard pairs. In other words, we use the rank difference between samples as a measure of difficulty. The sampled pairs are split into batches according to their difficulty levels, which are gradually used to train the generative model. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on nine benchmarks, outperforming the competing methods in terms of text alignment, aesthetics and human preference. Our code is available at https://github.com/CroitoruAlin/Curriculum-DPO.
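The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see their repository for that). The function names `rank_by_reward` and `curriculum_batches`, the `num_levels` parameter, and the equal-width binning of rank gaps are all assumptions made for illustration; the abstract only states that pairs far apart in the ranking are easy and pairs close in the ranking are hard.

```python
from itertools import combinations


def rank_by_reward(samples, reward_fn):
    """Stage 1 (sketch): sort the samples generated for one prompt,
    highest reward first. `reward_fn` stands in for a reward model."""
    return sorted(samples, key=reward_fn, reverse=True)


def curriculum_batches(ranked, num_levels):
    """Stage 2 (sketch): group (preferred, rejected) pairs into
    difficulty levels, ordered from easiest to hardest.

    Difficulty is driven by the rank gap: a large gap is an easy pair,
    a small gap is a hard pair. The equal-width binning used here is an
    illustrative assumption, not taken from the paper.
    """
    n = len(ranked)
    max_gap = n - 1
    levels = [[] for _ in range(num_levels)]
    for i, j in combinations(range(n), 2):  # i < j, so ranked[i] is preferred
        gap = j - i  # ranges from 1 to max_gap
        # Largest gap maps to level 0 (easy), gap of 1 to the last level (hard).
        level = (max_gap - gap) * num_levels // max_gap
        levels[level].append((ranked[i], ranked[j]))
    return levels


# Toy example: four "images" with hypothetical reward scores.
rewards = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}
ranked = rank_by_reward(list(rewards), rewards.get)
levels = curriculum_batches(ranked, num_levels=3)

for difficulty, pairs in enumerate(levels):
    print(f"level {difficulty}: {pairs}")
```

In a training loop, the levels would be consumed in order, so the model sees well-separated pairs first and near-tied pairs last. Each pair would then be passed to a DPO-style loss for the diffusion or consistency model, which this sketch does not cover.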

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
