CV · May 17, 2025

Self-NPO: Data-Free Diffusion Model Enhancement via Truncated Diffusion Fine-Tuning

Fu-Yun Wang, Keqiang Sun, Yao Teng, Xihui Liu, Jiale Yuan, Jiaming Song, Hongsheng Li

arXiv:2505.11777v2 · 8.42 citations · h-index: 13 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a practical bottleneck in preference optimization for diffusion models, making such optimization more accessible in domains where preference data are scarce or difficult to acquire. It is, however, an incremental advance that builds on prior negative preference optimization (NPO) methods.

The paper tackles the problem of aligning diffusion models with human preferences without costly explicit preference annotations. It introduces Self-NPO, a data-free method that achieves performance comparable to Diffusion-NPO at less than 1% of its training cost.

Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation. Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences. While existing PO methods primarily concentrate on producing favorable outputs, they often overlook the role of classifier-free guidance (CFG) in mitigating undesirable results. Diffusion-NPO addresses this gap by introducing negative preference optimization (NPO): it trains models to generate outputs opposite to human preferences, so that CFG can steer generation away from unfavorable outcomes. However, prior NPO approaches rely on costly and fragile procedures for obtaining explicit preference annotations (e.g., manual pairwise labeling or reward model training), limiting their practicality in domains where such data are scarce or difficult to acquire. In this work, we propose Self-NPO, a data-free approach to negative preference optimization based on truncated diffusion fine-tuning. Self-NPO learns directly from the model itself, eliminating the need for manual data labeling or reward model training. It is highly efficient (less than 1% of the training cost of Diffusion-NPO) and achieves performance comparable to Diffusion-NPO. We demonstrate that Self-NPO integrates seamlessly into widely used diffusion models, including SD1.5, SDXL, and CogVideoX, as well as models already optimized for human preferences, consistently enhancing both their generation quality and alignment with human preferences. Code is available at https://github.com/G-U-N/Diffusion-NPO.
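To make the CFG steering described in the abstract concrete, here is a minimal sketch of one guided denoising step. It assumes that the original weights supply the conditional branch and the NPO-tuned weights supply the negative (unconditional) branch of classifier-free guidance. The names `guided_noise`, `base_model`, `npo_model`, and the empty default negative prompt are hypothetical, and real pipelines operate on latent tensors rather than Python lists. This is an illustration of standard CFG under that assumption, not the authors' implementation.

```python
from typing import Callable, List

# Hypothetical noise predictor: (noisy latent x_t, timestep t, prompt) -> predicted noise.
NoisePredictor = Callable[[List[float], int, str], List[float]]


def guided_noise(
    base_model: NoisePredictor,
    npo_model: NoisePredictor,
    x_t: List[float],
    t: int,
    prompt: str,
    guidance_scale: float = 7.5,
    negative_prompt: str = "",
) -> List[float]:
    """Classifier-free guidance: eps_neg + w * (eps_pos - eps_neg).

    Assumption: the NPO-tuned model provides the negative branch, so the
    guided prediction is pushed away from what that model prefers.
    """
    eps_pos = base_model(x_t, t, prompt)           # conditional branch (original weights)
    eps_neg = npo_model(x_t, t, negative_prompt)   # negative branch (NPO-tuned weights)
    if len(eps_pos) != len(eps_neg):
        raise ValueError("noise predictions must have the same shape")
    return [n + guidance_scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


if __name__ == "__main__":
    # Toy stand-ins for the two networks, returning fixed noise predictions.
    base = lambda x, t, p: [1.0, 2.0]
    npo = lambda x, t, p: [0.5, 0.5]
    print(guided_noise(base, npo, [0.0, 0.0], 10, "a cat", guidance_scale=2.0))
```

Rewriting the update as `eps_pos + (w - 1) * (eps_pos - eps_neg)` shows that any guidance scale above 1 extrapolates away from the negative branch. This is why a model trained toward dispreferred outputs can improve results when placed on that side of CFG.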

