Dynamic Importance in Diffusion U-Net for Enhanced Image Synthesis
This work addresses efficiency and quality issues in image generation and editing with diffusion models and represents an incremental improvement over existing methods.
The paper tackles inefficient inference and suboptimal image quality in diffusion models by dynamically re-weighting the outputs of the Transformer blocks in the U-Net architecture. This improves the signal-to-noise ratio during sampling and enhances aesthetic quality while preserving identity consistency.
Traditional diffusion models typically employ a U-Net architecture. Previous studies have revealed the roles of attention blocks in the U-Net. However, they overlook how the importance of these blocks evolves during the inference process, which limits their further exploitation for image applications. In this study, we first prove theoretically that re-weighting the outputs of the Transformer blocks within the U-Net is a "free lunch" for improving the signal-to-noise ratio during the sampling process. Next, we propose the Importance Probe to uncover and quantify the dynamic shifts in importance of the Transformer blocks throughout the denoising process. Finally, we design an adaptive, importance-based re-weighting schedule tailored to specific image generation and editing tasks. Experimental results demonstrate that our approach significantly improves the efficiency of the inference process and enhances the aesthetic quality of the samples while maintaining identity consistency. Our method can be seamlessly integrated into any U-Net-based architecture. Code: https://github.com/Hytidel/UNetReweighting
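To make the idea of a timestep-dependent re-weighting schedule concrete, here is a minimal, framework-free sketch. It is not the authors' implementation. It assumes that "re-weighting a block's output" means scaling the residual branch of each Transformer block, h + w(t) * f(h), and that the weight w depends only on the block index and the current timestep. The names `ReweightSchedule`, `reweighted_block`, and `run_blocks`, the boost window, and the scale values are all hypothetical placeholders, not values or APIs from the paper or its repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Branch = Callable[[Vector], Vector]  # stand-in for a Transformer block's residual branch f(h)


@dataclass
class ReweightSchedule:
    """Hypothetical schedule: block `i` is scaled by scales[i] while t is in
    [t_start, t_end]; every other (block, timestep) pair keeps weight 1.0."""
    t_start: int
    t_end: int
    scales: Dict[int, float] = field(default_factory=dict)

    def weight(self, block_id: int, t: int) -> float:
        if self.t_start <= t <= self.t_end:
            return self.scales.get(block_id, 1.0)
        return 1.0


def reweighted_block(h: Vector, branch: Branch, w: float) -> Vector:
    """Assumed residual form h + w * f(h); w = 1.0 recovers the unmodified block."""
    delta = branch(h)
    return [x + w * d for x, d in zip(h, delta)]


def run_blocks(h: Vector, branches: Sequence[Branch],
               schedule: ReweightSchedule, t: int) -> Vector:
    """Apply the blocks in order, looking up each block's weight for timestep t."""
    for block_id, branch in enumerate(branches):
        h = reweighted_block(h, branch, schedule.weight(block_id, t))
    return h


# Toy usage: two "blocks" acting on a 2-element feature vector.
f0: Branch = lambda h: [0.5 * x for x in h]
f1: Branch = lambda h: [1.0 for _ in h]
sched = ReweightSchedule(t_start=600, t_end=1000, scales={0: 2.0})

print(run_blocks([1.0, 2.0], [f0, f1], sched, t=800))  # block 0 boosted: [3.0, 5.0]
print(run_blocks([1.0, 2.0], [f0, f1], sched, t=100))  # outside window:  [2.5, 4.0]
```

Keeping the weight as a pure function of (block, timestep) means the same schedule object can be queried at every denoising step without touching model parameters, which is consistent with the abstract's claim that the method plugs into any U-Net-based architecture. In a real pipeline the branches would be the U-Net's Transformer blocks, and how the weights are chosen (for example, from the Importance Probe) is left to the paper.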