Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking
This work addresses preference alignment of diffusion models for image generation, an incremental advance over prior direct preference optimization methods.
The paper tackles the problem of aligning diffusion models with human preference. It identifies issues in existing direct preference optimization methods and proposes a Tailored Preference Optimization framework, which significantly improves the generation of aesthetically pleasing and human-preferred images.
Direct preference optimization (DPO) has shown success in aligning diffusion models with human preference. Previous approaches typically assume that the preference label of the final generations also holds for the noisy samples at intermediate steps, and directly apply DPO to these noisy samples for fine-tuning. However, we theoretically identify inherent issues in this assumption and its impact on the effectiveness of preference alignment. We first demonstrate these issues from two perspectives, gradient direction and preference order, and then propose a Tailored Preference Optimization (TailorPO) framework for aligning diffusion models with human preference, underpinned by theoretical insights. Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues through a simple yet efficient design. Additionally, we incorporate the gradient guidance of diffusion models into preference alignment to further enhance the optimization effectiveness. Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
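To make the idea of ranking intermediate samples concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical `stepwise_reward` callable that scores a noisy sample at its own denoising step, and it uses the standard DPO objective on scalar log-probabilities; the function names, the scalar stand-ins for samples, and the default `beta` are all illustrative assumptions.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) for a scalar x."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_*     : log-probability of each sample under the model being tuned
    ref_logp_* : log-probability of each sample under the frozen reference
    beta       : strength of the implicit KL regularisation (assumed value)
    """
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    return -log_sigmoid(beta * margin)


def rank_by_stepwise_reward(sample_a, sample_b, stepwise_reward):
    """Order two intermediate noisy samples by their own step-wise reward.

    `stepwise_reward` is a hypothetical callable. The point of the sketch is
    that the label comes from the intermediate samples themselves, not from
    the final images they eventually lead to.
    Returns (preferred, dispreferred).
    """
    r_a, r_b = stepwise_reward(sample_a), stepwise_reward(sample_b)
    return (sample_a, sample_b) if r_a >= r_b else (sample_b, sample_a)


# Toy usage: scalars stand in for noisy samples, identity stands in for the reward.
win, lose = rank_by_stepwise_reward(1.0, 3.0, lambda x: x)
loss = dpo_loss(logp_win=-1.0, logp_lose=-2.0, ref_logp_win=-1.5, ref_logp_lose=-1.5)
```

In this sketch the loss decreases as the tuned model assigns relatively more probability to the preferred sample than the reference does. If a pair is labeled in the wrong order, the same update pushes the model the wrong way, which is why the source of the intermediate-step label matters.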