CV · Aug 31, 2024

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

arXiv:2409.00492v1 · 2 citations · h-index: 41
Originality: Incremental advance
AI Analysis

This work addresses the issue of model inaccessibility for resource-limited users by providing a more efficient compression method, though it is incremental as it builds on existing post-training quantization techniques.

The paper compresses large text-to-image diffusion models with vector quantization to make them more accessible in resource-limited environments. At around 3 bits, the compressed models achieve image quality and textual alignment similar to those of previous 4-bit compression techniques.

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in size and already contain billions of parameters. As a result, state-of-the-art text-to-image models are becoming less accessible in practice, especially in resource-limited environments. Post-training quantization (PTQ) tackles this issue by compressing the pretrained model weights into lower-bit representations. Recent diffusion quantization techniques primarily rely on uniform scalar quantization, providing decent performance for models compressed to 4 bits. This work demonstrates that more versatile vector quantization (VQ) may achieve higher compression rates for large-scale text-to-image diffusion models. Specifically, we tailor vector-based PTQ methods to recent billion-scale text-to-image models (SDXL and SDXL-Turbo), and show that diffusion models of 2B+ parameters compressed to around 3 bits using VQ exhibit image quality and textual alignment similar to those of previous 4-bit compression techniques.
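
To make the contrast between uniform scalar quantization and vector quantization concrete, here is a minimal sketch in NumPy. It is not the method from the paper, whose abstract does not spell out the vector-based PTQ procedure. It assumes plain k-means VQ over groups of consecutive weights, a per-tensor min/max grid for the scalar baseline, and a 16-bit codebook. All function names and parameters (`uniform_scalar_quantize`, `vector_quantize`, `bits_per_weight`, `dim`, `codebook_bits`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def uniform_scalar_quantize(w, bits):
    """Round every weight to one of 2**bits evenly spaced levels (per-tensor min/max grid)."""
    steps = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / steps
    codes = np.round((w - lo) / scale)
    return codes * scale + lo


def _assign(vecs, codebook):
    """Index of the nearest codebook entry (squared Euclidean distance) for each vector."""
    d = (vecs ** 2).sum(1, keepdims=True) - 2.0 * vecs @ codebook.T + (codebook ** 2).sum(1)
    return d.argmin(1)


def vector_quantize(w, dim, codebook_bits, iters=20, seed=0):
    """Assumed illustration: k-means over groups of `dim` consecutive weights."""
    rng = np.random.default_rng(seed)
    vecs = w.reshape(-1, dim)
    k = 2 ** codebook_bits
    # Initialise the codebook from randomly chosen weight vectors.
    codebook = vecs[rng.choice(len(vecs), size=k, replace=False)].copy()
    for _ in range(iters):
        idx = _assign(vecs, codebook)
        for j in range(k):
            members = vecs[idx == j]
            if len(members):  # leave empty clusters unchanged
                codebook[j] = members.mean(0)
    idx = _assign(vecs, codebook)
    return codebook[idx].reshape(w.shape), codebook, idx


def bits_per_weight(n_weights, dim, codebook_bits, codebook_dtype_bits=16):
    """Average storage cost: index bits shared by `dim` weights plus codebook overhead."""
    index_bits = codebook_bits / dim
    overhead = (2 ** codebook_bits) * dim * codebook_dtype_bits / n_weights
    return index_bits + overhead


# Toy stand-in for one weight matrix; real diffusion weights are not Gaussian noise.
w = np.random.default_rng(0).standard_normal((256, 256))
w_sq = uniform_scalar_quantize(w, bits=4)
w_vq, codebook, idx = vector_quantize(w, dim=2, codebook_bits=6)
print("scalar 4-bit MSE:", float(np.mean((w - w_sq) ** 2)))
print("VQ MSE:", float(np.mean((w - w_vq) ** 2)))
print("VQ bits per weight:", bits_per_weight(w.size, dim=2, codebook_bits=6))
```

In this sketch each pair of weights shares one 6-bit index, so the index cost is 3 bits per weight, and the codebook adds a small overhead that shrinks as the tensor grows. The reconstruction errors printed on random data say nothing about image quality for SDXL; they only show how the two schemes are compared at different bit budgets.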

