CV AI Dec 9, 2023

Efficient Quantization Strategies for Latent Diffusion Models

Yuewei Yang, Xiaoliang Dai, Jialiang Wang, Peizhao Zhang, Hongbo Zhang

arXiv:2312.05431v1 13.618 citations h-index: 29

Originality: Incremental advance

AI Analysis

This work addresses the critical need for compact generative models for edge deployment, but it is incremental as it builds on existing PTQ methods by adapting them to LDMs.

The paper tackles the problem of deploying large Latent Diffusion Models (LDMs) on edge devices. It proposes a Post Training Quantization (PTQ) strategy that uses Signal-to-Quantization-Noise Ratio (SQNR) to identify the parts of the model most sensitive to quantization and to reduce the noise there: a global step quantizes sensitive blocks at higher precision, and local treatments target quantization-sensitive and time-sensitive modules.

Latent Diffusion Models (LDMs) capture the dynamic evolution of latent variables over time, blending patterns and multimodality in a generative system. Despite the proficiency of LDMs in various applications, such as text-to-image generation, facilitated by robust text encoders and a variational autoencoder, the critical need to deploy large generative models on edge devices compels a search for more compact yet effective alternatives. Post Training Quantization (PTQ), a method to compress the operational size of deep learning models, encounters challenges when applied to LDMs due to temporal and structural complexities. This study proposes a quantization strategy that efficiently quantizes LDMs, leveraging Signal-to-Quantization-Noise Ratio (SQNR) as a pivotal metric for evaluation. By treating the quantization discrepancy as relative noise and identifying the sensitive parts of a model, we propose an efficient quantization approach encompassing both global and local strategies. The global quantization process mitigates relative quantization noise by initiating higher-precision quantization on sensitive blocks, while local treatments address specific challenges in quantization-sensitive and time-sensitive modules. The outcomes of our experiments reveal that the implementation of both global and local treatments yields a highly efficient and effective Post Training Quantization (PTQ) of LDMs.
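To make the SQNR metric concrete, here is a minimal sketch that treats the quantization discrepancy as noise relative to a full-precision signal. It uses the textbook definition, SQNR in dB = 10 log10(E[x^2] / E[(x - x_q)^2]); the abstract does not spell out the paper's exact formulation, so this may differ in detail. The symmetric per-tensor quantizer, the function names `fake_quantize` and `sqnr_db`, and the random stand-in "activations" are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def fake_quantize(x, num_bits):
    """Quantize then dequantize with a symmetric uniform per-tensor grid.

    This quantizer is an illustrative assumption, not the paper's scheme.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return x.copy()  # an all-zero tensor has nothing to quantize
    scale = max_abs / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale


def sqnr_db(reference, quantized):
    """Signal-to-quantization-noise ratio in decibels (higher is better)."""
    signal_power = np.mean(reference ** 2)
    noise_power = np.mean((reference - quantized) ** 2)
    if noise_power == 0.0:
        return float("inf")  # no quantization error at all
    return float(10.0 * np.log10(signal_power / noise_power))


# Hypothetical stand-in for the full-precision output of one model block.
rng = np.random.default_rng(0)
activations = rng.standard_normal(10_000)

# Compare how much relative noise each bit-width introduces.
sqnr_by_bits = {
    bits: sqnr_db(activations, fake_quantize(activations, bits))
    for bits in (4, 8, 16)
}
for bits, value in sqnr_by_bits.items():
    print(f"{bits}-bit SQNR: {value:.1f} dB")
```

Under these assumptions, a block whose output SQNR falls sharply at a given bit-width would be a candidate for the higher-precision treatment that the abstract describes for sensitive blocks.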
