CV · Nov 19, 2024

Diffusion Product Quantization

arXiv:2411.12306v1 · h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses the large size of diffusion models, which makes them costly to deploy. The advance is incremental, since it builds on existing quantization techniques.

The paper tackles extreme compression of diffusion models, reducing model size while maintaining performance. It compresses the model to as low as 1 bit, a more than 24-fold reduction in model size, and reports competitive generative results with DiT on ImageNet.

In this work, we explore the quantization of diffusion models in extreme compression regimes to reduce model size while maintaining performance. We begin by investigating classical vector quantization but find that diffusion models are particularly susceptible to quantization error, with the codebook size limiting generation quality. To address this, we introduce product quantization, which offers improved reconstruction precision and larger capacity, both of which are crucial for preserving the generative capabilities of diffusion models. Furthermore, we propose a method to compress the codebook by evaluating the importance of each vector and removing redundancy, ensuring that the model size remains within the desired range. We also introduce an end-to-end calibration approach that adjusts assignments during the forward pass and optimizes the codebook using the DDPM loss. By compressing the model to as low as 1 bit (a more than 24-fold reduction in model size), we achieve a balance between compression and quality. We apply our compression method to the DiT model on ImageNet and consistently outperform other quantization approaches, demonstrating competitive generative performance.
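To make the product quantization idea in the abstract concrete, the following is a minimal NumPy sketch of textbook product quantization applied to a weight matrix. It is not the paper's implementation: the function names (`kmeans`, `product_quantize`, `reconstruct`, `index_bits_per_weight`), the choice of one codebook per sub-space, and the plain Lloyd's k-means fit are all assumptions made for illustration. The paper's codebook compression and end-to-end calibration with the DDPM loss are not shown.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative helper). Returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (requires k <= len(x)).
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every sub-vector to every centroid.
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(0)
    # Final assignment against the updated centroids.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, d2.argmin(1)

def product_quantize(w, num_subspaces, codebook_size):
    """Split each row of w into sub-vectors and fit one codebook per sub-space."""
    rows, dim = w.shape
    assert dim % num_subspaces == 0, "row length must divide evenly into sub-spaces"
    sub_dim = dim // num_subspaces
    codebooks = np.empty((num_subspaces, codebook_size, sub_dim), dtype=w.dtype)
    codes = np.empty((rows, num_subspaces), dtype=np.int64)
    for m in range(num_subspaces):
        chunk = w[:, m * sub_dim:(m + 1) * sub_dim]
        codebooks[m], codes[:, m] = kmeans(chunk, codebook_size, seed=m)
    return codebooks, codes

def reconstruct(codebooks, codes):
    """Rebuild an approximate weight matrix by looking up each stored index."""
    parts = [codebooks[m][codes[:, m]] for m in range(codebooks.shape[0])]
    return np.concatenate(parts, axis=1)

def index_bits_per_weight(codebook_size, sub_dim):
    """Index cost only; ignores codebook storage and any unquantized layers."""
    return np.log2(codebook_size) / sub_dim
```

Under these assumptions, each sub-vector of length `sub_dim` is stored as a single index of `log2(codebook_size)` bits, so a codebook of 256 entries over sub-vectors of length 8 costs 1 bit per weight for the indices. The codebooks themselves add overhead on top of that, which is one reason a method might also compress the codebook, as the abstract describes.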

