LG · May 30, 2025

On Designing Diffusion Autoencoders for Efficient Generation and Representation Learning

arXiv:2506.00136v1 · h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses the trade-off between representation quality and generation efficiency in diffusion models and is aimed at researchers and practitioners in generative AI. The contribution is incremental, as it builds on existing diffusion autoencoder frameworks.

The paper aims to improve both representation learning and generative efficiency in diffusion autoencoders by connecting them with diffusion models that learn their forward (noising) process. The resulting model, DMZ, performs well on downstream tasks and generates samples with fewer denoising steps than standard diffusion models.

Diffusion autoencoders (DAs) are variants of diffusion generative models that use an input-dependent latent variable to capture representations alongside the diffusion process. These representations, to varying extents, can be used for tasks such as downstream classification, controllable generation, and interpolation. However, the generative performance of DAs relies heavily on how well the latent variables can be modelled and subsequently sampled from. Better generative modelling is also the primary goal of another class of diffusion models -- those that learn their forward (noising) process. While effective at adjusting the noise process in an input-dependent manner, they must satisfy additional constraints derived from the terminal conditions of the diffusion process. Here, we draw a connection between these two classes of models and show that certain design decisions (latent variable choice, conditioning method, etc.) in the DA framework -- leading to a model we term DMZ -- allow us to obtain the best of both worlds: effective representations as evaluated on downstream tasks, including domain transfer, as well as more efficient modelling and generation with fewer denoising steps compared to standard DMs.
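The abstract does not spell out DMZ's specific design choices, so this sketch shows only the generic diffusion-autoencoder setup it builds on, under loudly stated assumptions. A hypothetical linear `encoder` maps a clean input x0 to a latent z. A hypothetical linear noise predictor `eps_model` is conditioned on (x_t, t, z). The forward process is a fixed variance-preserving schedule, not a learned one. Sampling uses a deterministic DDIM-style update run for a small number of steps, and z is drawn from a standard normal as a stand-in for a learned latent prior. That prior is the component whose quality, per the abstract, limits DA generation.

```python
import numpy as np

# Toy dimensions (hypothetical): data dimension D, latent dimension Z
D, Z = 8, 2
T_MAX = 0.98  # avoid t = 1, where alpha_t = 0 and x0 cannot be recovered

def alpha_sigma(t):
    """Fixed variance-preserving schedule with alpha_t**2 + sigma_t**2 == 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def encoder(x0, W_enc):
    """Hypothetical deterministic encoder: input-dependent latent z = E(x0)."""
    return x0 @ W_enc

def eps_model(x_t, t, z, params):
    """Hypothetical linear noise predictor conditioned on x_t, t and z."""
    W_x, W_z, b = params
    return x_t @ W_x + z @ W_z + t[:, None] * b

def da_loss(x0, W_enc, params, rng):
    """Epsilon-prediction loss of a diffusion autoencoder on a batch x0 of shape (N, D)."""
    t = rng.uniform(0.0, T_MAX, size=x0.shape[0])
    a, s = alpha_sigma(t)
    eps = rng.standard_normal(x0.shape)
    x_t = a[:, None] * x0 + s[:, None] * eps  # fixed (not learned) forward process
    z = encoder(x0, W_enc)                     # latent computed from the clean input
    eps_hat = eps_model(x_t, t, z, params)
    return np.mean((eps - eps_hat) ** 2)

def ddim_sample(z, params, steps, rng):
    """Deterministic DDIM-style sampling conditioned on latents z of shape (N, Z)."""
    # Start from N(0, I), which approximates the marginal at T_MAX
    x = rng.standard_normal((z.shape[0], D))
    ts = np.linspace(T_MAX, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        a, s = alpha_sigma(t_cur)
        a_next, s_next = alpha_sigma(t_next)
        eps_hat = eps_model(x, np.full(z.shape[0], t_cur), z, params)
        x0_hat = (x - s * eps_hat) / a          # predicted clean sample
        x = a_next * x0_hat + s_next * eps_hat  # move to the next noise level
    return x

# Usage with random (untrained) parameters
rng = np.random.default_rng(0)
W_enc = 0.1 * rng.standard_normal((D, Z))
params = (0.1 * rng.standard_normal((D, D)),
          0.1 * rng.standard_normal((Z, D)),
          np.zeros(D))
x0 = rng.standard_normal((16, D))
loss = da_loss(x0, W_enc, params, rng)

# Generation needs z from a latent prior; a standard normal stands in here
z_prior = rng.standard_normal((4, Z))
samples = ddim_sample(z_prior, params, steps=10, rng=rng)
```

At generation time, the only input besides noise is z. A poorly modelled latent prior therefore degrades samples however well the denoiser is trained. The `steps` argument controls the number of denoising steps, which is the quantity behind the efficiency claim. This toy makes no claim about how DMZ behaves at any step count.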

Code Implementations: 1 repo