LG | CV | Dec 10, 2024

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets

MILA
arXiv:2412.07775v6 | 29 citations | h-index: 56 | ICLR
Originality: Incremental advance
AI Analysis

This paper addresses reward finetuning of diffusion models. The advance is incremental, but it improves on existing methods in both sample diversity and finetuning efficiency.

The paper tackles the problem of aligning pretrained diffusion models with reward functions while preserving diversity and prior knowledge, proposing a method that achieves fast, diversity-preserving finetuning of Stable Diffusion on realistic rewards.

While one commonly trains large diffusion models by collecting datasets on target downstream tasks, it is often desired to align and finetune pretrained diffusion models with some reward functions that are either designed by experts or learned from small-scale datasets. Existing post-training methods for reward finetuning of diffusion models typically suffer from lack of diversity in generated samples, lack of prior preservation, and/or slow convergence in finetuning. In response to this challenge, we take inspiration from recent successes in generative flow networks (GFlowNets) and propose a reinforcement learning method for diffusion model finetuning, dubbed Nabla-GFlowNet (abbreviated as $\nabla$-GFlowNet), that leverages the rich signal in reward gradients for probabilistic diffusion finetuning. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model, on different realistic reward functions.
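To make "diversity- and prior-preserving" concrete, here is a toy sketch in plain Python. It assumes a common way of framing reward finetuning, in which the finetuned model should sample from the pretrained prior reweighted by an exponentiated reward, rather than collapse onto the single highest-reward sample. The four-outcome `prior`, the `rewards` values, and the `beta` temperature are all hypothetical, and this is not the paper's training objective, only an illustration of the kind of target distribution such methods aim for.

```python
import math

def tilt(prior, rewards, beta):
    """Return the distribution proportional to prior[i] * exp(beta * rewards[i])."""
    weights = [p * math.exp(beta * r) for p, r in zip(prior, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(dist):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# Hypothetical pretrained prior over four discrete "samples" and their rewards.
prior = [0.4, 0.3, 0.2, 0.1]
rewards = [0.0, 1.0, 1.2, 0.2]

# Reward-tilted prior: mass shifts toward high reward, but every mode survives.
tilted = tilt(prior, rewards, beta=1.0)

# Pure reward maximization: all mass on the single best sample (mode collapse).
best = max(range(len(rewards)), key=rewards.__getitem__)
collapsed = [1.0 if i == best else 0.0 for i in range(len(rewards))]

print("tilted   :", [round(p, 3) for p in tilted], "entropy", round(entropy(tilted), 3))
print("collapsed:", collapsed, "entropy", round(entropy(collapsed), 3))
```

In this toy setting, a larger `beta` pushes the tilted distribution toward the collapsed one, while `beta = 0` recovers the prior exactly; the trade-off between reward and diversity is controlled by that single assumed parameter.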

