CV · AI · Oct 14, 2024

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

arXiv:2410.10733v8 · 190 citations · h-index: 24 · Has Code · ICLR
Originality: Highly original
AI Analysis

This work targets the efficiency bottleneck of high-resolution diffusion models for image generation, reporting large training and inference speedups without a drop in accuracy.

The paper accelerates high-resolution diffusion models by introducing the Deep Compression Autoencoder (DC-AE), which reaches spatial compression ratios of up to 128x while maintaining reconstruction quality. On ImageNet 512x512 with UViT-H on an H100 GPU, DC-AE yields a 19.1x inference speedup and a 17.9x training speedup with a better FID than the widely used SD-VAE-f8 autoencoder.
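To make the compression ratios concrete, the short sketch below computes the latent grid size for a 512x512 image. It assumes the ratio applies to each spatial side (as in SD-VAE-f8, where a 512x512 image maps to a 64x64 latent); the helper name `latent_grid` is hypothetical and not taken from the paper or its code.

```python
def latent_grid(image_size: int, ratio: int) -> tuple[int, int]:
    """Return (side length, number of spatial positions) of the latent grid.

    Assumption: `ratio` is the per-side downsampling factor of the autoencoder.
    """
    if image_size % ratio != 0:
        raise ValueError("image_size must be divisible by ratio")
    side = image_size // ratio
    return side, side * side


if __name__ == "__main__":
    for ratio in (8, 32, 64, 128):
        side, positions = latent_grid(512, ratio)
        print(f"f{ratio}: {side}x{side} latent, {positions} spatial positions")
```

Under this assumption, moving from 8x to 64x compression shrinks the latent grid from 64x64 to 8x8, which is where the reduction in diffusion-model compute comes from. The actual token count also depends on the diffusion model's patch size, which this sketch ignores.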

We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phase training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at https://github.com/mit-han-lab/efficientvit.
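The Residual Autoencoding idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the channel ordering, the grouping used for channel averaging, and the `learned` callable (a stand-in for trainable layers) are all assumptions made for illustration. The sketch shows a non-parametric space-to-channel shortcut that a downsampling block could learn a residual on top of.

```python
import numpy as np


def space_to_channel(x: np.ndarray, r: int = 2) -> np.ndarray:
    """Rearrange (C, H, W) into (C*r*r, H/r, W/r) without losing any values."""
    c, h, w = x.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    x = x.reshape(c, h // r, r, w // r, r)
    x = x.transpose(0, 2, 4, 1, 3)  # -> (C, r, r, H/r, W/r)
    return x.reshape(c * r * r, h // r, w // r)


def channel_average(x: np.ndarray, out_channels: int) -> np.ndarray:
    """Reduce channels by averaging consecutive groups (assumed grouping)."""
    c, h, w = x.shape
    assert c % out_channels == 0, "channels must divide evenly into groups"
    return x.reshape(out_channels, c // out_channels, h, w).mean(axis=1)


def residual_downsample(x, learned, out_channels: int, r: int = 2):
    """Output = learned(x) + shortcut, so `learned` only models a residual."""
    shortcut = channel_average(space_to_channel(x, r), out_channels)
    return learned(x) + shortcut


if __name__ == "__main__":
    x = np.arange(2 * 4 * 4, dtype=np.float64).reshape(2, 4, 4)
    # Hypothetical stand-in for trainable layers: contributes nothing,
    # so the block output equals the non-parametric shortcut.
    zero_branch = lambda t: np.zeros((4, t.shape[1] // 2, t.shape[2] // 2))
    y = residual_downsample(x, zero_branch, out_channels=4)
    print(y.shape)
```

With a zero learned branch the block already passes a downsampled copy of the input through, which is one way to read the abstract's claim that learning residuals eases optimization at high compression ratios.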

Code Implementations: 1 repo
