Test-time Alignment of Diffusion Models without Reward Over-optimization
This work provides a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities, addressing a key challenge in generative AI.
The paper tackles the problem of aligning diffusion models with specific objectives without reward over-optimization, proposing a training-free, test-time method based on Sequential Monte Carlo that achieves comparable or superior target rewards while preserving diversity and generalization.
Diffusion models excel in generative tasks, but aligning them with specific objectives while maintaining their versatility remains challenging. Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. Our approach, tailored for diffusion sampling and incorporating tempering techniques, achieves comparable or superior target rewards to fine-tuning methods while preserving diversity and cross-reward generalization. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. Code is available at https://github.com/krafton-ai/DAS.
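To make the idea of tempered Sequential Monte Carlo concrete, the following is a minimal one-dimensional sketch, not the authors' implementation (see the linked repository for that). It assumes the reward-aligned target takes the common KL-regularized form p(x) exp(r(x)/alpha), which the abstract does not spell out. The names `log_prior`, `reward`, `tempered_smc`, and all parameters are hypothetical; a standard normal stands in for the pretrained model, and a random-walk Metropolis step stands in for the diffusion-specific proposal. With this prior and reward, the target is Gaussian with mean 4/3 and variance 1/3, which gives a simple sanity check.

```python
import numpy as np


def log_prior(x):
    # Standard normal log-density up to a constant (toy stand-in for the pretrained model).
    return -0.5 * x**2


def reward(x):
    # Hypothetical reward, peaked at x = 2.
    return -((x - 2.0) ** 2)


def tempered_smc(n_particles=2000, n_steps=20, alpha=1.0, seed=0):
    """Sample from p(x) * exp(reward(x) / alpha) via tempered SMC (toy sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)  # initial particles drawn from the prior
    log_w = np.zeros(n_particles)         # log importance weights
    lambdas = np.linspace(0.0, 1.0, n_steps + 1)  # tempering schedule from 0 to 1

    for lam_prev, lam in zip(lambdas[:-1], lambdas[1:]):
        # Reweight: ratio of consecutive tempered targets at the current particles.
        log_w += (lam - lam_prev) * reward(x) / alpha
        w = np.exp(log_w - log_w.max())
        w /= w.sum()

        # Resample only when the effective sample size drops below half.
        ess = 1.0 / np.sum(w**2)
        if ess < 0.5 * n_particles:
            idx = rng.choice(n_particles, size=n_particles, p=w)
            x = x[idx]
            log_w = np.zeros(n_particles)

        # Move: one random-walk Metropolis step that leaves the current
        # tempered target p(x) * exp(lam * reward(x) / alpha) invariant.
        prop = x + 0.5 * rng.standard_normal(n_particles)
        log_acc = (log_prior(prop) + lam * reward(prop) / alpha) - (
            log_prior(x) + lam * reward(x) / alpha
        )
        accept = np.log(rng.random(n_particles)) < log_acc
        x = np.where(accept, prop, x)

    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return x, w


particles, weights = tempered_smc()
est_mean = float(np.sum(weights * particles))
est_var = float(np.sum(weights * (particles - est_mean) ** 2))
print(f"weighted mean: {est_mean:.3f}, weighted variance: {est_var:.3f}")
```

Raising the reward exponent gradually keeps the importance weights from collapsing onto a few particles, which is the general motivation for tempering. In this sketch, a smaller `alpha` pulls the samples closer to the reward peak and further from the prior.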