CV · Nov 18, 2024

Decoupling Training-Free Guided Diffusion by ADMM

U of Toronto
arXiv:2411.12773v1 · 5 citations · h-index: 7 · CVPR
Originality: Highly original
AI Analysis

This paper addresses the challenge of balancing the unconditional model against the guidance loss when using diffusion models for conditional generation. It offers a plug-and-play solution that is incremental in scope but improves performance in specific domains.

The paper tackles conditional generation with off-the-shelf unconditional diffusion models. It assigns the unconditional model and the guidance loss to separate variables and reformulates the task as a constrained optimization problem solved via ADMM. The reported result is a method that consistently generates high-quality samples with strong adherence to the conditioning, outperforming existing methods on tasks such as guided image generation and controllable motion synthesis.

In this paper, we consider the conditional generation problem by guiding off-the-shelf unconditional diffusion models with differentiable loss functions in a plug-and-play fashion. While previous research has primarily focused on balancing the unconditional diffusion model and the guided loss through a tuned weight hyperparameter, we propose a novel framework that distinctly decouples these two components. Specifically, we introduce two variables, $x$ and $z$, to represent the generated samples governed by the unconditional generation model and the guidance function, respectively. This decoupling reformulates conditional generation into two manageable subproblems, unified by the constraint $x = z$. Leveraging this setup, we develop a new algorithm based on the Alternating Direction Method of Multipliers (ADMM) to adaptively balance these components. Additionally, we establish the equivalence between the diffusion reverse step and the proximal operator of ADMM and provide a detailed convergence analysis of our algorithm under certain mild assumptions. Our experiments demonstrate that our proposed method, ADMMDiff, consistently generates high-quality samples while ensuring strong adherence to the conditioning criteria. It outperforms existing methods across a range of conditional generation tasks, including image generation with various forms of guidance and controllable motion synthesis.
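To make the variable splitting concrete, here is a minimal sketch of generic scaled-form ADMM on a toy scalar problem. It is not the ADMMDiff algorithm. As assumptions for illustration only, a quadratic pulling toward `a` stands in for the unconditional model's term and a weighted quadratic pulling toward `b` stands in for the guidance loss; the names `prox_quadratic` and `admm_consensus` are hypothetical. In the paper the x-update is related to the diffusion reverse step, which this toy replaces with a closed-form proximal operator.

```python
def prox_quadratic(v, target, weight, rho):
    """Closed-form argmin over y of weight/2*(y - target)^2 + rho/2*(y - v)^2."""
    return (weight * target + rho * v) / (weight + rho)


def admm_consensus(a, b, w=1.0, rho=1.0, iters=200):
    """Minimise 0.5*(x - a)^2 + 0.5*w*(z - b)^2 subject to x = z.

    x plays the role of the 'prior' variable, z the 'guidance' variable,
    and u is the scaled dual variable that enforces x = z.
    """
    x = z = u = 0.0
    for _ in range(iters):
        x = prox_quadratic(z - u, a, 1.0, rho)  # x-step: toy stand-in for the prior term
        z = prox_quadratic(x + u, b, w, rho)    # z-step: toy stand-in for the guidance loss
        u = u + x - z                           # dual ascent on the constraint residual
    return x, z, u


if __name__ == "__main__":
    x, z, u = admm_consensus(a=0.0, b=2.0, w=3.0)
    # The constrained minimiser is (a + w*b) / (1 + w); x and z should agree with it.
    print(x, z, (0.0 + 3.0 * 2.0) / (1.0 + 3.0))
```

Note that no weight between the two terms is tuned inside the loop: the dual variable `u` accumulates the disagreement between `x` and `z` and pulls both toward a common point, which is the sense in which ADMM balances the two subproblems in this toy setting.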

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
