LG · Jan 28

DeRaDiff: Denoising Time Realignment of Diffusion Models

Ratnavibusena Don Shahain Manujith, Yang Zhang, Teoh Tze Tzun, Kenji Kawaguchi

arXiv:2601.20198v1 · 1 citation · h-index: 8

Originality: Highly original

AI Analysis

DeRaDiff gives practitioners an efficient way to align diffusion models with human preferences without an expensive sweep over the regularization strength.

The paper tackles the problem of choosing the regularization strength when aligning diffusion models with human preferences, a hyperparameter that is expensive to tune by sweeping. The proposed DeRaDiff method modulates the regularization strength during sampling to approximate models trained at different strengths, removing the need for costly alignment sweeps.

Recent advances align diffusion models with human preferences to increase aesthetic appeal and mitigate artifacts and biases. Such methods aim to shift the conditional output distribution toward higher rewards while not drifting far from a pretrained prior. This constraint is commonly enforced by KL (Kullback-Leibler) regularization. A central question remains: how does one choose the right regularization strength? Too high a strength limits alignment, and too low a strength leads to "reward hacking". This makes choosing the correct regularization strength highly non-trivial. Existing approaches sweep over this hyperparameter by aligning a pretrained model at multiple regularization strengths and then choosing the best one. Unfortunately, this is prohibitively expensive. We introduce DeRaDiff, a denoising-time realignment procedure that, after aligning a pretrained model once, modulates the regularization strength during sampling to emulate models trained at other regularization strengths without any additional training or finetuning. Extending decoding-time realignment from language models to diffusion models, DeRaDiff operates over iterative predictions of continuous latents by replacing the reverse-step reference distribution with a geometric mixture of an aligned posterior and a reference posterior. This gives rise to a closed-form update under common schedulers and a single tunable parameter, lambda, for on-the-fly control. Our experiments show that, across multiple text-image alignment and image-quality metrics, our method consistently provides a strong approximation of models aligned entirely from scratch at different regularization strengths. Our method thus yields an efficient way to search for the optimal strength, eliminating the need for expensive alignment sweeps and substantially reducing computational costs.
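To make the "geometric mixture" idea concrete, here is a minimal sketch of why such a mixture has a closed form when both reverse-step posteriors are Gaussian. It is not the paper's implementation. It assumes diagonal Gaussian posteriors and a mixture weight of lambda on the aligned posterior and 1 - lambda on the reference posterior; the paper's exact parameterization may differ. The function names `geometric_mixture_gaussian` and `realigned_step` are hypothetical.

```python
import numpy as np

def geometric_mixture_gaussian(mu_ref, var_ref, mu_aligned, var_aligned, lam):
    """Mean and variance of the normalized density
    p_aligned(x)**lam * p_ref(x)**(1 - lam) for diagonal Gaussians.

    Assumption: weight lam on the aligned posterior, (1 - lam) on the reference.
    """
    # Exponents of Gaussians add, so precisions combine linearly in lam.
    precision = lam / var_aligned + (1.0 - lam) / var_ref
    if np.any(precision <= 0):
        raise ValueError("lam gives a non-normalizable mixture (precision <= 0)")
    var = 1.0 / precision
    # The mean is the precision-weighted combination of the two means.
    mu = var * (lam * mu_aligned / var_aligned + (1.0 - lam) * mu_ref / var_ref)
    return mu, var

def realigned_step(rng, mu_ref, mu_aligned, var, lam):
    """Draw one reverse-step sample from the mixture, assuming both posteriors
    share the same scheduler-defined variance `var` (hypothetical helper)."""
    mu, v = geometric_mixture_gaussian(mu_ref, var, mu_aligned, var, lam)
    return mu + np.sqrt(v) * rng.standard_normal(np.shape(mu))

# Usage: with equal variances the mixture mean is a linear interpolation.
rng = np.random.default_rng(0)
mu_ref = np.array([0.0, 2.0])      # mean predicted by the pretrained model
mu_aligned = np.array([1.0, 4.0])  # mean predicted by the aligned model
sample = realigned_step(rng, mu_ref, mu_aligned, var=0.5, lam=0.25)
```

Under the shared-variance assumption, the mixture mean reduces to `lam * mu_aligned + (1 - lam) * mu_ref` and the variance is unchanged, so lam = 0 recovers the reference step and lam = 1 the aligned step. Values outside [0, 1] extrapolate and remain valid only while the combined precision stays positive.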
