CV | Feb 13, 2024

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

arXiv:2402.08265v2 | 56 citations | h-index: 11 | ICML
Originality: Incremental advance
AI Analysis

This work targets preference alignment in text-to-image generation. It is an incremental advance over prior methods, which ignored the sequential nature of the generation process.

The paper aligns text-to-image diffusion models with human preferences by taking a dense reward perspective that emphasizes the initial steps of the generation process. On single and multiple prompt generation tasks, its performance is competitive with the baselines.

Aligning text-to-image diffusion model (T2I) with preference has been gaining increasing research attention. While prior works exist on directly optimizing T2I by preference data, these methods are developed under the bandit assumption of a latent reward on the entire diffusion reverse chain, while ignoring the sequential nature of the generation process. This may harm the efficacy and efficiency of preference alignment. In this paper, we take on a finer dense reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. In particular, we introduce temporal discounting into DPO-style explicit-reward-free objectives, to break the temporal symmetry therein and suit the T2I generation hierarchy. In experiments on single and multiple prompt generation, our method is competitive with strong relevant baselines, both quantitatively and qualitatively. Further investigations are conducted to illustrate the insight of our approach.
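To make the idea of temporal discounting in a DPO-style objective concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's exact objective: the function name `discounted_dpo_loss`, the per-step log-probability inputs, and the defaults for `gamma` and `beta` are all hypothetical. Step index 0 is assumed to be the first step of the reverse chain, so a discount `gamma < 1` gives the initial steps more weight, and `gamma = 1` recovers a symmetric, trajectory-level DPO-style loss.

```python
import math


def discounted_dpo_loss(logp_w, ref_logp_w, logp_l, ref_logp_l,
                        gamma=0.9, beta=1.0):
    """Sketch of a temporally discounted DPO-style loss (illustrative only).

    Each argument is a list of per-step log-probabilities along the reverse
    chain, index 0 being the first (noisiest) step:
      logp_w / ref_logp_w: preferred trajectory under the trained / reference model
      logp_l / ref_logp_l: dispreferred trajectory under the trained / reference model
    """
    if not (len(logp_w) == len(ref_logp_w) == len(logp_l) == len(ref_logp_l)):
        raise ValueError("all trajectories must have the same number of steps")

    # Discounted sum of per-step log-ratio differences. gamma < 1 breaks the
    # temporal symmetry by weighting early steps more heavily.
    margin = 0.0
    for t, (pw, rw, pl, rl) in enumerate(
            zip(logp_w, ref_logp_w, logp_l, ref_logp_l)):
        margin += (gamma ** t) * ((pw - rw) - (pl - rl))

    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Usage: the same per-step advantage lowers the loss more at the first step
# than at the last step when gamma < 1.
zeros = [0.0, 0.0, 0.0, 0.0]
early = discounted_dpo_loss([1.0, 0.0, 0.0, 0.0], zeros, zeros, zeros, gamma=0.5)
late = discounted_dpo_loss([0.0, 0.0, 0.0, 1.0], zeros, zeros, zeros, gamma=0.5)
print(early < late)
```

In this sketch the discount acts only as a per-step weight inside the sigmoid margin; a real training setup would compute the per-step log-probabilities from the diffusion model and a frozen reference model.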

Code Implementations: 1 repo
