CV · Dec 1, 2024

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling

arXiv:2412.00759v3 · 24 citations · h-index: 1 · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the need for computationally efficient alignment of diffusion models for image generation, though it appears incremental: a training-free improvement over existing methods.

The paper tackles the problem of aligning text-to-image diffusion models with human preferences without additional training. It proposes DyMO, which dynamically schedules multiple objectives during inference. The method achieves effective and robust alignment across diverse pre-trained models and metrics.

Text-to-image diffusion model alignment is critical for improving the alignment between the generated images and human preferences. While training-based methods are constrained by high computational costs and dataset requirements, training-free alignment methods remain underexplored and are often limited by inaccurate guidance. We propose a plug-and-play training-free alignment method, DyMO, for aligning the generated images and human preferences during inference. Apart from text-aware human preference scores, we introduce a semantic alignment objective for enhancing the semantic alignment in the early stages of diffusion, relying on the fact that the attention maps are effective reflections of the semantics in noisy images. We propose dynamic scheduling of multiple objectives and intermediate recurrent steps to reflect the requirements at different steps. Experiments with diverse pre-trained diffusion models and metrics demonstrate the effectiveness and robustness of the proposed method.
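
To make the idea of dynamically scheduling multiple objectives concrete, here is a minimal toy sketch. It is not the paper's implementation: the linear schedule, the 1-D "sample", the quadratic objectives, and the names `schedule_weights`, `guided_update`, and `run_guidance` are all assumptions made for illustration. It only shows the general pattern of weighting a semantic objective more heavily at early steps and a preference objective more heavily at late steps; the intermediate recurrent steps mentioned in the abstract are omitted.

```python
def schedule_weights(step, num_steps):
    """Hypothetical linear schedule (an assumption, not taken from the paper).

    The semantic weight decays from 1 to 0 while the preference weight
    rises from 0 to 1 as the step index advances.
    """
    progress = step / max(num_steps - 1, 1)
    return 1.0 - progress, progress


def guided_update(x, step, num_steps, grad_semantic, grad_preference, step_size=0.1):
    """One gradient-ascent step on the weighted sum of the two objectives."""
    w_sem, w_pref = schedule_weights(step, num_steps)
    grad = w_sem * grad_semantic(x) + w_pref * grad_preference(x)
    return x + step_size * grad


def run_guidance(num_steps=50, x0=0.0):
    """Run the toy guidance loop on a scalar stand-in for the sample."""
    # Toy objectives: -0.5 * (x - target)^2, so each gradient is -(x - target).
    grad_semantic = lambda x: -(x - 1.0)    # semantic objective peaks at x = 1
    grad_preference = lambda x: -(x - 2.0)  # preference objective peaks at x = 2
    x = x0
    for step in range(num_steps):
        x = guided_update(x, step, num_steps, grad_semantic, grad_preference)
    return x


if __name__ == "__main__":
    print(run_guidance())
```

In this toy setting the sample is first pulled toward the semantic optimum and then drifts toward the preference optimum as the weights shift. A real system would replace the scalar and the analytic gradients with a noisy latent and gradients of learned scoring functions.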

