LG · DIS-NN · Mar 23, 2023

The Quantization Model of Neural Scaling

Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark

arXiv:2303.13506v3 · 36.1152 citations · h-index: 86 · Has Code

Originality: Incremental advance

AI Analysis

The paper offers a theoretical explanation for neural scaling phenomena. The contribution is foundational in scope but incremental in originality, since it builds on existing empirical scaling-law observations.

The paper aims to explain neural scaling laws. It proposes the Quantization Model, which attributes both the power-law drop in loss and the emergence of new capabilities to the learning of discrete chunks of knowledge called quanta. It tests the model on toy datasets and, tentatively, on language models by analyzing their gradients.

We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, where network knowledge and skills are "quantized" into discrete chunks ($\textbf{quanta}$). We show that when quanta are learned in order of decreasing use frequency, then a power law in use frequencies explains observed power law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model gradients, we automatically decompose model behavior into a diverse set of skills (quanta). We tentatively find that the frequency at which these quanta are used in the training distribution roughly follows a power law corresponding with the empirical scaling exponent for language models, a prediction of our theory.
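The central claim of the abstract, that a power law in quanta use frequencies yields power-law scaling of the loss, can be illustrated with a small numerical sketch. The setup below is an assumption made for illustration, not the authors' code: quanta use frequencies are taken to be p_k ∝ k^-(α+1), a model that has learned the n most frequent quanta incurs zero loss on them and a fixed unit loss on every unlearned quantum, and the function names and default values are hypothetical. Under those assumptions the expected loss should fall off roughly as n^-α.

```python
import numpy as np


def quanta_frequencies(num_quanta, alpha):
    """Assumed Zipf-like use frequencies p_k proportional to k^-(alpha + 1),
    normalized to sum to 1. Index 0 is the most frequent quantum."""
    k = np.arange(1, num_quanta + 1, dtype=np.float64)
    p = k ** -(alpha + 1.0)
    return p / p.sum()


def loss_after_learning(p, n, loss_unlearned=1.0, loss_learned=0.0):
    """Expected loss once the n most frequent quanta are learned.

    Illustrative assumption: each sample relies on one quantum, and the
    per-sample loss depends only on whether that quantum has been learned.
    """
    return loss_learned * p[:n].sum() + loss_unlearned * p[n:].sum()


def fitted_exponent(alpha, num_quanta=1_000_000, ns=(10, 30, 100, 300, 1000)):
    """Fit the slope of log(loss) against log(n) and return its negation."""
    p = quanta_frequencies(num_quanta, alpha)
    losses = np.array([loss_after_learning(p, n) for n in ns])
    slope, _ = np.polyfit(np.log(ns), np.log(losses), 1)
    return -slope


if __name__ == "__main__":
    for alpha in (0.5, 1.0):
        print(f"alpha = {alpha}: fitted scaling exponent = {fitted_exponent(alpha):.3f}")
```

The fitted exponent should come out close to the chosen α, with small deviations caused by the finite number of quanta and by the discrete sum. The sketch covers only the ordering-by-frequency argument; it does not model training dynamics or the gradient-based discovery of quanta described in the abstract.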
