LG · AI · CL · IT · Sep 19, 2023

Language Modeling Is Compression

arXiv:2309.10668v2 · 256 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work provides a compression-based perspective for analyzing scaling laws and in-context learning in AI, offering insights for researchers in machine learning and data compression.

The paper evaluates large language models as general-purpose compressors, showing that Chinchilla 70B compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, outperforming the domain-specific compressors PNG (58.5%) and FLAC (30.3%).
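The percentages above are compression rates: compressed size divided by raw size. The following is a minimal sketch of how such a rate can be computed, both for an off-the-shelf compressor (gzip) and for a predictive model, assuming an ideal arithmetic coder that spends about -log2 p(x_i | x_<i) bits per symbol. The predictor here is a toy Laplace-smoothed byte counter chosen for illustration, not a language model, and the function names are hypothetical.

```python
import gzip
import math
from collections import Counter


def gzip_rate(data: bytes) -> float:
    """Compressed size divided by raw size (lower is better)."""
    return len(gzip.compress(data)) / len(data)


def model_code_length_bits(data: bytes) -> float:
    """Ideal code length in bits under an adaptive order-0 byte model.

    Each symbol costs -log2 p(symbol | symbols seen so far), which is what
    an arithmetic coder driven by this predictor would approach.
    """
    counts = Counter()
    bits = 0.0
    for seen, byte in enumerate(data):
        # Add-one (Laplace) smoothing over the 256 possible byte values.
        p = (counts[byte] + 1) / (seen + 256)
        bits += -math.log2(p)
        counts[byte] += 1
    return bits


def model_rate(data: bytes) -> float:
    """Ideal compressed size in bits divided by raw size in bits."""
    return model_code_length_bits(data) / (8 * len(data))


if __name__ == "__main__":
    sample = b"abab" * 200
    print(f"gzip rate:  {gzip_rate(sample):.3f}")
    print(f"model rate: {model_rate(sample):.3f}")
```

A better predictor assigns higher probability to the symbols that actually occur, which lowers the summed code length and therefore the rate.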

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
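The last claim, that any compressor can serve as a conditional generative model, can be illustrated with a short sketch. The idea assumed here is that a continuation which compresses well together with the context is one the compressor implicitly considers likely. This greedy variant picks the candidate byte giving the shortest gzip output; it is an illustration under that assumption, not necessarily the procedure used in the paper, and the function names are hypothetical.

```python
import gzip


def next_byte_by_gzip(context: bytes, alphabet: bytes) -> int:
    """Return the candidate byte whose addition yields the shortest gzip output."""
    def cost(b: int) -> int:
        return len(gzip.compress(context + bytes([b])))
    # Ties are broken in favor of the earliest byte in `alphabet`.
    return min(alphabet, key=cost)


def generate(context: bytes, alphabet: bytes, n: int) -> bytes:
    """Greedily extend `context` by n bytes and return only the new bytes."""
    out = context
    for _ in range(n):
        out += bytes([next_byte_by_gzip(out, alphabet)])
    return out[len(context):]


if __name__ == "__main__":
    prompt = b"the cat sat on the mat. the cat sat on the "
    print(generate(prompt, b"abcdefghijklmnopqrstuvwxyz .", 10))
```

Sampling candidates with probability proportional to 2 raised to the negative compressed length, rather than always taking the minimum, would give a stochastic generator instead of a greedy one.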

Code Implementations: 1 repo
