LG · Jan 30

Float8@2bits: Entropy Coding Enables Data-Free Model Compression

arXiv:2601.22787v1 · 1 citation · h-index: 11
Originality: Highly original
AI Analysis

EntQuant enables practical, fast, and robust compression of large models, addressing a bottleneck in accessibility and efficiency for deployment.

The paper tackles extreme post-training model compression below 4 bits without any dependence on calibration data. It reports state-of-the-art results by decoupling numerical precision from storage cost via entropy coding, and it compresses a 70B-parameter model in under 30 minutes.

Post-training compression is currently divided into two contrasting regimes. On the one hand, fast, data-free, and model-agnostic methods (e.g., NF4 or HQQ) offer maximum accessibility but suffer from functional collapse at extreme bit-rates below 4 bits. On the other hand, techniques that leverage calibration data or extensive recovery training achieve superior fidelity but impose high computational costs and face uncertain robustness under data distribution shifts. We introduce EntQuant, the first framework to unite the advantages of these distinct paradigms. By matching the performance of data-dependent methods with the speed and universality of data-free techniques, EntQuant enables practical utility in the extreme compression regime. Our method decouples numerical precision from storage cost via entropy coding, compressing a 70B-parameter model in less than 30 minutes. We demonstrate that EntQuant not only achieves state-of-the-art results on standard evaluation sets and models, but also retains functional performance on more complex benchmarks with instruction-tuned models, all at modest inference overhead.
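The idea of decoupling numerical precision from storage cost can be illustrated with a small sketch. This is not the EntQuant algorithm, which the abstract does not specify. It is a toy example under stated assumptions: synthetic Gaussian weights, a uniform 8-bit integer grid, and hypothetical helper names (`quantize_uniform`, `empirical_entropy_bits`). It shows that values held in an 8-bit container can have an empirical entropy far below 8 bits per value, and that entropy is the approximate cost an ideal entropy coder would pay.

```python
import numpy as np


def empirical_entropy_bits(symbols: np.ndarray) -> float:
    """Shannon entropy, in bits per symbol, of the empirical symbol distribution."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def quantize_uniform(weights: np.ndarray, step: float, n_bits: int = 8) -> np.ndarray:
    """Round to a uniform grid and clip to the signed n_bits integer range.

    Illustrative quantizer only; the paper's numerical format is not assumed here.
    """
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return np.clip(np.round(weights / step), lo, hi).astype(np.int32)


rng = np.random.default_rng(0)
# Synthetic stand-in for a weight tensor (assumption: roughly Gaussian weights).
weights = rng.normal(0.0, 1.0, size=100_000)

# Both tensors fit the same 8-bit container, but use different grid resolutions.
coarse = quantize_uniform(weights, step=1.0)
fine = quantize_uniform(weights, step=0.25)

h_coarse = empirical_entropy_bits(coarse)
h_fine = empirical_entropy_bits(fine)

print(f"nominal container width: 8 bits")
print(f"entropy at step=1.00:    {h_coarse:.2f} bits/weight")
print(f"entropy at step=0.25:    {h_fine:.2f} bits/weight")
```

In this toy setting, the storage cost after ideal entropy coding tracks the entropy of the symbol distribution rather than the width of the numerical format, so the grid resolution becomes the knob that trades fidelity against compressed size.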

