Deconstructing Denoising Diffusion Models for Self-Supervised Learning
This work addresses the problem of understanding and simplifying complex models used for self-supervised learning, though its contribution is incremental because it builds on existing methods.
The study deconstructs Denoising Diffusion Models, transforming them step by step into classical Denoising Autoencoders, and finds that only a few modern components are critical for self-supervised representation learning; the result is a highly simplified approach.
In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDMs), which were originally designed for image generation. Our philosophy is to deconstruct a DDM, gradually transforming it into a classical Denoising Autoencoder (DAE). This deconstructive procedure allows us to explore how various components of modern DDMs influence self-supervised representation learning. We observe that only a few modern components are critical for learning good representations, while many others are nonessential. Our study ultimately arrives at an approach that is highly simplified and to a large extent resembles a classical DAE. We hope our study will rekindle interest in a family of classical methods within the realm of modern self-supervised learning.
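To make the endpoint of the deconstruction concrete, here is a minimal sketch of a classical denoising-autoencoder objective in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the encoder and decoder are hypothetical linear maps (real models use deep networks), the corruption is assumed to be additive Gaussian noise, and all names (`corrupt`, `matvec`, `dae_loss`, `gamma`, `sigma`, `alpha_bar`) are hypothetical. The `gamma` parameter is included only to contrast the two corruption styles: a classical DAE typically adds noise to the unscaled input (`gamma = 1`), while a DDPM-style forward process scales the clean signal by `sqrt(alpha_bar)` and the noise by `sqrt(1 - alpha_bar)`.

```python
import math
import random


def corrupt(x, gamma, sigma, rng):
    """Return gamma * x + sigma * eps with eps ~ N(0, I).

    Assumption: a classical DAE uses gamma = 1 (additive Gaussian noise only);
    a DDPM-style forward process uses gamma = sqrt(alpha_bar) and
    sigma = sqrt(1 - alpha_bar) for a noise level alpha_bar in (0, 1).
    """
    return [gamma * xi + sigma * rng.gauss(0.0, 1.0) for xi in x]


def matvec(W, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def dae_loss(x, W_enc, W_dec, sigma, rng, gamma=1.0):
    """Denoising objective: encode the corrupted input, decode it,
    and measure mean squared error against the CLEAN input x."""
    x_noisy = corrupt(x, gamma, sigma, rng)
    h = matvec(W_enc, x_noisy)  # latent code: the representation kept after training
    x_hat = matvec(W_dec, h)    # reconstruction of the clean input
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)


# Hypothetical toy usage: a 2-D input with identity encoder/decoder weights.
rng = random.Random(0)
x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(dae_loss(x, identity, identity, sigma=0.0, rng=rng))  # no noise: 0.0
print(dae_loss(x, identity, identity, sigma=0.5, rng=rng))  # with noise: positive
alpha_bar = 0.9  # hypothetical DDPM-style noise level
print(dae_loss(x, identity, identity, sigma=math.sqrt(1 - alpha_bar),
               rng=rng, gamma=math.sqrt(alpha_bar)))
```

Identity weights simply pass the noisy input through, so the loss is nonzero whenever noise is added; training would adjust `W_enc` and `W_dec` to undo the corruption, and the encoder output `h` is what self-supervised learning would keep as the representation.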