LG · AI · Jun 28, 2024

Information-Theoretic Foundations for Neural Scaling Laws

arXiv:2407.01456v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper gives machine learning researchers a clearer theoretical footing for scaling laws, though the advance is incremental because it builds on existing empirical work.

The paper tackles the lack of rigorous theoretical support for neural scaling laws by developing information-theoretic foundations, showing that the optimal relation between data and model size is linear up to logarithmic factors for data from an infinite-width two-layer neural network.

Neural scaling laws aim to characterize how out-of-sample error behaves as a function of model and training dataset size. Such scaling laws guide allocation of computational resources between model and data processing to minimize error. However, existing theoretical support for neural scaling laws lacks rigor and clarity, entangling the roles of information and optimization. In this work, we develop rigorous information-theoretic foundations for neural scaling laws. This allows us to characterize scaling laws for data generated by a two-layer neural network of infinite width. We observe that the optimal relation between data and model size is linear, up to logarithmic factors, corroborating large-scale empirical investigations. Concise yet general results of the kind we establish may bring clarity to this topic and inform future investigations.
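To make the compute-allocation question concrete, here is a minimal sketch in Python. It does not reproduce the paper's bounds. It assumes a generic power-law error model, `a / n**alpha + b / t**beta`, and a compute budget equal to `n * t`, where `n` (model size), `t` (dataset size), the coefficients, and the function name `optimal_allocation` are all hypothetical choices made for illustration. Under these assumptions with equal exponents, the error-minimizing split keeps `t` proportional to `n` as the budget grows, which is the kind of linear relation the abstract describes.

```python
def optimal_allocation(compute, alpha=0.5, beta=0.5, a=1.0, b=1.0, steps=2000):
    """Grid-search the model size n that minimizes an ASSUMED error model.

    Assumptions (illustrative, not taken from the paper):
      error(n, t) = a / n**alpha + b / t**beta
      compute budget constraint: n * t = compute
    Returns a tuple (error, n, t) for the best split found.
    """
    best = None
    # Search n on a log-spaced grid between 1 and the full budget.
    for i in range(1, steps):
        n = compute ** (i / steps)
        t = compute / n  # spend the remaining budget on data
        err = a / n**alpha + b / t**beta
        if best is None or err < best[0]:
            best = (err, n, t)
    return best


if __name__ == "__main__":
    for compute in (1e6, 1e8, 1e10):
        err, n, t = optimal_allocation(compute)
        print(f"compute={compute:.0e}  n={n:.3g}  t={t:.3g}  t/n={t / n:.3g}")
```

With the default symmetric settings, the ratio `t / n` stays close to 1 across budgets. Changing `alpha` or `beta` so that they differ breaks the proportionality, which shows that the linear relation depends on the shape of the assumed error model.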

