LG, CL · Feb 23, 2025

Compression Scaling Laws: Unifying Sparsity and Quantization

arXiv:2502.16440v1 · 10 citations · h-index: 52
Originality: Incremental advance
AI Analysis

The paper provides a unified framework for comparing and combining compression techniques, which is useful for researchers and practitioners optimizing model efficiency, though it builds incrementally on existing scaling-law work.

The paper investigates how weight and activation quantization and weight sparsity affect scaling laws in large language models. It shows that quantization follows the same 'effective parameter' scaling pattern as sparsity: weight-only quantization achieves strong parameter-efficiency multipliers, while full quantization of both weights and activations shows diminishing returns at low bitwidths.

We investigate how different compression techniques -- such as weight and activation quantization, and weight sparsity -- affect the scaling behavior of large language models (LLMs) during pretraining. Building on previous work showing that weight sparsity acts as a constant multiplier on model size in scaling laws, we demonstrate that this "effective parameter" scaling pattern extends to quantization as well. Specifically, we establish that weight-only quantization achieves strong parameter efficiency multipliers, while full quantization of both weights and activations shows diminishing returns at lower bitwidths. Our results suggest that different compression techniques can be unified under a common scaling law framework, enabling principled comparison and combination of these methods.
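The "effective parameter" idea in the abstract can be illustrated with a minimal sketch. It assumes a Chinchilla-style loss of the form L(N, D) = E + A / (eff * N)^alpha + B / D^beta, where `eff` is the compression multiplier on model size. The functional form, the default constants (taken from the published Chinchilla fit, not from this paper), and the multiplier value used in the usage note are all assumptions for illustration; the paper's own fitted values are not reproduced here.

```python
def effective_params(n_params: float, eff: float) -> float:
    """Dense-equivalent parameter count of a compressed model.

    `eff` is a hypothetical efficiency multiplier: a compressed model with
    n_params parameters is assumed to behave like a dense model with
    eff * n_params parameters.
    """
    return eff * n_params


def scaling_loss(
    n_params: float,
    n_tokens: float,
    eff: float = 1.0,
    E: float = 1.69,      # irreducible loss (illustrative constant)
    A: float = 406.4,     # model-size coefficient (illustrative constant)
    B: float = 410.7,     # data-size coefficient (illustrative constant)
    alpha: float = 0.34,  # model-size exponent (illustrative constant)
    beta: float = 0.28,   # data-size exponent (illustrative constant)
) -> float:
    """Chinchilla-style loss with a multiplier applied to the model size."""
    n_eff = effective_params(n_params, eff)
    return E + A / n_eff**alpha + B / n_tokens**beta
```

Under these assumptions, a 1B-parameter model with a hypothetical multiplier of 0.5 has the same predicted loss as a dense 500M-parameter model trained on the same number of tokens: `scaling_loss(1e9, 2e10, eff=0.5)` equals `scaling_loss(5e8, 2e10)`. Comparing compression methods then reduces to comparing their fitted multipliers.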

