LG · ML · Feb 8, 2021

Learning Curve Theory

arXiv:2102.04074v1 · 92 citations
AI Analysis

This work provides a theoretical framework for understanding the empirical scaling laws observed in machine learning. Explaining these laws is a foundational problem for the field.

This paper introduces a theoretical model to explain the observed power-law scaling of learning curves with respect to data size $n$, where error decreases as $n^{-\beta}$. The model can exhibit any exponent $\beta>0$, so it can be used to examine whether these power laws are universal or depend on the data distribution.

Recently a number of empirical "universal" scaling law papers have been published, most notably by OpenAI. 'Scaling laws' refers to power-law decreases of training or test error w.r.t. more data, larger neural networks, and/or more compute. In this work we focus on scaling w.r.t. data size $n$. Theoretical understanding of this phenomenon is largely lacking, except in finite-dimensional models for which error typically decreases with $n^{-1/2}$ or $n^{-1}$, where $n$ is the sample size. We develop and theoretically analyse the simplest possible (toy) model that can exhibit $n^{-\beta}$ learning curves for arbitrary power $\beta>0$, and determine whether power laws are universal or depend on the data distribution.
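As a rough illustration of how a toy model can produce an $n^{-\beta}$ learning curve, here is a minimal sketch under assumptions that the text above does not state. It assumes a learner that simply memorizes every feature it has seen, features drawn from a truncated Zipf distribution $\theta_i \propto i^{-(\alpha+1)}$, and error defined as the probability that a fresh sample shows an unseen feature. The function names, the choice $\alpha = 1$, and the truncation at 200,000 features are all hypothetical choices made for this example.

```python
import math

def zipf_probs(alpha, num_features):
    """Truncated Zipf distribution: theta_i proportional to i^-(alpha+1).
    (Assumed data distribution, chosen for illustration only.)"""
    weights = [i ** -(alpha + 1.0) for i in range(1, num_features + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_error(theta, n):
    """Expected error of a memorizing learner after n i.i.d. samples:
    the probability that a fresh sample's feature was never seen."""
    return sum(t * (1.0 - t) ** n for t in theta)

def loglog_slope(ns, errs):
    """Least-squares slope of log(error) against log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

alpha = 1.0                          # hypothetical Zipf exponent
theta = zipf_probs(alpha, 200_000)   # truncation is an approximation
ns = [100, 200, 400, 800, 1600]
errs = [expected_error(theta, n) for n in ns]

# The fitted exponent beta is the negated slope on a log-log plot.
beta_hat = -loglog_slope(ns, errs)
print(f"estimated beta = {beta_hat:.3f}")
```

Under these assumptions, a standard integral approximation of the sum gives $\beta = \alpha/(1+\alpha)$, so the fitted exponent for $\alpha = 1$ should land near $0.5$. Changing `alpha` changes the exponent, which is one way to see that in this sketch the power depends on the data distribution.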

