LG · CL · May 5, 2025

Radio: Rate-Distortion Optimization for Large Language Model Compression

arXiv:2505.03031v1 · ICML

Originality: Incremental advance

AI Analysis

This work addresses the challenge of reducing compute costs and environmental impact when deploying LLMs. It appears incremental, however: it builds on existing quantization methods and adds a new theoretical perspective.

The paper tackles the problem of compressing large language models (LLMs) for deployment on resource-limited devices. It proposes a quantization technique based on rate-distortion optimization that scales to models with hundreds of billions of parameters and allows flexible post-training compression to a user-specified size or accuracy.

In recent years, the compression of large language models (LLMs) has emerged as a key problem in facilitating LLM deployment on resource-limited devices, reducing compute costs, and mitigating the environmental footprint due to large-scale AI infrastructure. Here, we establish the foundations of LLM quantization from a rate-distortion theory perspective and propose a quantization technique based on simple rate-distortion optimization. Our technique scales to models containing hundreds of billions of weight parameters and offers users the flexibility to compress models, post-training, to a model size or accuracy specified by the user.

