LG AI CV ST
Jan 10, 2025

Test-time Alignment of Diffusion Models without Reward Over-optimization

arXiv:2501.05803v338.782 citations | h-index: 7 | Has Code | ICLR

Originality: Highly original

AI Analysis

The method offers a training-free way to align diffusion models with diverse downstream objectives without compromising their general capabilities, addressing a key challenge in generative AI.

The paper tackles the problem of aligning diffusion models with specific objectives without reward over-optimization, proposing a training-free, test-time method based on Sequential Monte Carlo that achieves comparable or superior target rewards while preserving diversity and generalization.

Diffusion models excel in generative tasks, but aligning them with specific objectives while maintaining their versatility remains challenging. Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. Our approach, tailored for diffusion sampling and incorporating tempering techniques, achieves comparable or superior target rewards to fine-tuning methods while preserving diversity and cross-reward generalization. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. Code is available at https://github.com/krafton-ai/DAS.
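To make the idea of tempered Sequential Monte Carlo concrete, the following is a minimal sketch on a one-dimensional toy problem. It is not the paper's method or the code in the linked repository: there is no diffusion model here, and every name (`reward`, `log_prior`, `tempered_smc`, `alpha`, the linear tempering schedule, the random-walk Metropolis move) is an assumption chosen for illustration. A standard normal stands in for the pre-trained model's distribution, and the particles are steered toward a target proportional to `prior(x) * exp(reward(x) / alpha)` by raising the reward weight gradually from 0 to 1.

```python
import math
import random


def reward(x):
    # Hypothetical reward: prefers samples near x = 2.
    return -(x - 2.0) ** 2


def log_prior(x):
    # Standard normal log-density (up to a constant); stands in for the base model.
    return -0.5 * x * x


def log_target(x, lam, alpha):
    # Intermediate target: prior(x) * exp(lam * reward(x) / alpha).
    return log_prior(x) + lam * reward(x) / alpha


def tempered_smc(num_particles=2000, num_steps=20, alpha=1.0, seed=0):
    rng = random.Random(seed)
    # Start from the prior (lam = 0).
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    prev_lam = 0.0
    for step in range(1, num_steps + 1):
        lam = step / num_steps  # linear tempering schedule (an assumption)
        # Incremental importance weights between consecutive tempered targets.
        log_w = [(lam - prev_lam) * reward(x) / alpha for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=num_particles)
        # One random-walk Metropolis move per particle, targeting the current
        # tempered distribution, to restore diversity after resampling.
        moved = []
        for x in particles:
            proposal = x + rng.gauss(0.0, 0.5)
            log_accept = log_target(proposal, lam, alpha) - log_target(x, lam, alpha)
            if math.log(1.0 - rng.random()) < log_accept:
                x = proposal
            moved.append(x)
        particles = moved
        prev_lam = lam
    return particles
```

With `alpha = 1.0`, the final target in this toy setup is a Gaussian with mean 4/3 and variance 1/3, so the sample mean of `tempered_smc()` should land between the prior mean (0) and the reward optimum (2). That compromise is the point of sampling from a reward-tilted distribution rather than maximizing the reward outright: the particles move toward high reward while staying anchored to the base distribution.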
