CV · AI · Apr 15, 2025

ADT: Tuning Diffusion Models with Adversarial Supervision

arXiv:2504.11423v1 · 5 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This paper addresses a key limitation of diffusion models for image generation. It is an incremental advance, since it fine-tunes existing models rather than introducing a new paradigm.

The paper tackles the training-inference divergence in diffusion models, which hinders alignment between the distribution of generated samples and the training data. It proposes Adversarial Diffusion Tuning (ADT), a fine-tuning framework that uses adversarial supervision to align model outputs with the training data. The authors report significant improvements in distribution alignment and image quality on Stable Diffusion models.

Diffusion models have achieved outstanding image generation by reversing a forward noising process to approximate true data distributions. During training, these models predict diffusion scores from noised versions of true samples in a single forward pass, while inference requires iterative denoising starting from white noise. This training-inference divergence hinders the alignment between inference and training data distributions, due to potential prediction biases and error accumulation. To address this problem, we propose an intuitive but effective fine-tuning framework, called Adversarial Diffusion Tuning (ADT), which simulates the inference process during optimization and aligns the final outputs with training data through adversarial supervision. Specifically, to achieve robust adversarial training, ADT features a siamese-network discriminator with a fixed pre-trained backbone and lightweight trainable parameters, incorporates an image-to-image sampling strategy to smooth discriminative difficulties, and preserves the original diffusion loss to prevent discriminator hacking. In addition, we carefully constrain the backward-flowing path for back-propagating gradients along the inference path without incurring memory overload or gradient explosion. Finally, extensive experiments on Stable Diffusion models (v1.5, XL, and v3) demonstrate that ADT significantly improves both distribution alignment and image quality.
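The abstract says ADT keeps the original diffusion loss alongside the adversarial supervision, so that the generator cannot simply exploit the discriminator. A minimal sketch of how such a combined objective could look is given below. The abstract does not state the exact loss form or weighting, so the non-saturating GAN loss, the function names, and the `adv_weight` value are all illustrative assumptions rather than the authors' formulation.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def generator_adv_loss(fake_logits):
    """Non-saturating GAN loss for the generator (assumed form).

    fake_logits are discriminator scores for images produced by running
    the sampler; higher scores mean "looks like training data".
    """
    return sum(softplus(-z) for z in fake_logits) / len(fake_logits)


def discriminator_loss(real_logits, fake_logits):
    """Matching discriminator loss: push real scores up, fake scores down."""
    real_term = sum(softplus(-z) for z in real_logits) / len(real_logits)
    fake_term = sum(softplus(z) for z in fake_logits) / len(fake_logits)
    return real_term + fake_term


def adt_style_objective(diffusion_loss, fake_logits, adv_weight=0.1):
    """Diffusion loss plus weighted adversarial term.

    adv_weight is a hypothetical hyperparameter; keeping diffusion_loss
    in the sum mirrors the abstract's point about preventing
    discriminator hacking.
    """
    return diffusion_loss + adv_weight * generator_adv_loss(fake_logits)


if __name__ == "__main__":
    # Toy numbers only: a diffusion loss of 0.5 and three fake-image scores.
    print(adt_style_objective(0.5, [-1.0, 0.0, 2.0], adv_weight=0.1))
```

In a real setup the logits would come from the siamese-network discriminator described above and the losses would be tensors in an autodiff framework; plain floats are used here only to keep the sketch self-contained.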

