LG · AI · CL · Jun 25, 2024

CDQuant: Greedy Coordinate Descent for Accurate LLM Quantization

arXiv:2406.17542v3 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the computational and storage constraints of deploying LLMs, offering an incremental improvement over existing quantization methods like GPTQ.

The paper tackles the problem of compressing large language models (LLMs) for deployment. It introduces CDQuant, a post-training quantization method that uses greedy coordinate descent to minimize the layer-wise reconstruction loss. CDQuant consistently outperforms GPTQ in 2-4 bit weight quantization and yields further gains when it replaces the GPTQ component of other state-of-the-art techniques.

Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks. However, their deployment is often constrained by their substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-training quantization (PTQ) method, has proven highly effective for compressing LLMs, sparking a wave of research that leverages GPTQ as a core component. Recognizing the pivotal role of GPTQ in the PTQ landscape, we introduce CDQuant, a simple and scalable alternative to GPTQ with improved performance. CDQuant uses greedy coordinate descent to minimize the layer-wise reconstruction loss and thereby obtain high-quality quantized weights. Our algorithm is easy to implement and scales efficiently to models with hundreds of billions of parameters. We perform extensive evaluation on the Gemma and PaLM2 model families and demonstrate that CDQuant consistently outperforms GPTQ in 2-4 bit weight quantization. Moreover, CDQuant improves the performance of state-of-the-art PTQ techniques such as QuIP and FrameQuant when used as a replacement for their GPTQ component, resulting in further gains in quality.
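To make the idea of greedy coordinate descent on a layer-wise reconstruction loss concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a single weight vector `w`, a fixed quantization grid, and the loss (q - w)^T H (q - w) with H = X^T X built from calibration inputs X. The names `greedy_cd_quantize`, `reconstruction_loss`, and `grid` are hypothetical and chosen only for illustration.

```python
import numpy as np


def reconstruction_loss(q, w, H):
    """Layer-wise loss (q - w)^T H (q - w), i.e. ||X q - X w||^2 when H = X^T X."""
    e = q - w
    return float(e @ H @ e)


def greedy_cd_quantize(w, H, grid, max_iters=1000, tol=1e-12):
    """Illustrative greedy coordinate descent onto a fixed grid (assumed setup)."""
    w = np.asarray(w, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Start from round-to-nearest: each weight snaps to its closest grid value.
    q = grid[np.abs(w[:, None] - grid[None, :]).argmin(axis=1)]
    g = H @ (q - w)       # half the gradient of the loss at q
    diag = np.diag(H)
    for _ in range(max_iters):
        # delta[i, k]: step taken if coordinate i is moved to grid[k].
        delta = grid[None, :] - q[:, None]
        # Exact change in loss for every single-coordinate move.
        change = 2.0 * delta * g[:, None] + delta**2 * diag[:, None]
        i, k = np.unravel_index(change.argmin(), change.shape)
        if change[i, k] >= -tol:
            break         # no single-coordinate move lowers the loss
        g += delta[i, k] * H[:, i]   # keep the gradient in sync with q
        q[i] = grid[k]
    return q


# Usage on synthetic data (all values here are made up for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))        # stand-in for calibration activations
w = rng.uniform(-1.0, 1.0, size=16)  # stand-in for one column of weights
H = X.T @ X
grid = np.linspace(-1.0, 1.0, 4)     # four levels, i.e. 2-bit quantization
q_rtn = grid[np.abs(w[:, None] - grid[None, :]).argmin(axis=1)]
q_cd = greedy_cd_quantize(w, H, grid)
print("round-to-nearest loss:", reconstruction_loss(q_rtn, w, H))
print("greedy CD loss:       ", reconstruction_loss(q_cd, w, H))
```

Because every accepted move strictly lowers the loss, the result in this sketch is never worse than its round-to-nearest starting point. How the real method picks the grid, initializes, and batches updates is not specified in the abstract and is left out here.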

