Diffusion Model as Representation Learner
It addresses the problem of leveraging generative models for representation learning in computer vision, offering a novel transfer paradigm that is incremental but impactful for the field.
The paper investigates the representation power of Diffusion Probabilistic Models (DPMs) and introduces RepFusion, a knowledge transfer method that uses DPM representations for recognition tasks, achieving state-of-the-art performance on benchmarks like image classification and semantic segmentation.
Diffusion Probabilistic Models (DPMs) have recently demonstrated impressive results on various generative tasks.Despite its promises, the learned representations of pre-trained DPMs, however, have not been fully understood. In this paper, we conduct an in-depth investigation of the representation power of DPMs, and propose a novel knowledge transfer method that leverages the knowledge acquired by generative DPMs for recognition tasks. Our study begins by examining the feature space of DPMs, revealing that DPMs are inherently denoising autoencoders that balance the representation learning with regularizing model capacity. To this end, we introduce a novel knowledge transfer paradigm named RepFusion. Our paradigm extracts representations at different time steps from off-the-shelf DPMs and dynamically employs them as supervision for student networks, in which the optimal time is determined through reinforcement learning. We evaluate our approach on several image classification, semantic segmentation, and landmark detection benchmarks, and demonstrate that it outperforms state-of-the-art methods. Our results uncover the potential of DPMs as a powerful tool for representation learning and provide insights into the usefulness of generative models beyond sample generation. The code is available at \url{https://github.com/Adamdad/Repfusion}.