LG | DIS-NN | ML | Apr 26, 2024

An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem

arXiv:2404.17563v3 | 14 citations | h-index: 6 | NIPS
Originality: Incremental advance
AI Analysis

This paper provides a theoretical framework for understanding emergence and scaling laws in deep learning, a topic foundational to AI research. The advance is incremental, since it builds on existing concepts of skills and scaling.

The authors address emergence in deep learning with an exactly solvable model in which each new ability is represented as a basis function. They derive analytic expressions for emergence and for scaling laws; using a single fit parameter, these match simulations of a two-layer neural network trained on multitask sparse parity.

Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute. We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
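To make the benchmark concrete, here is a minimal sketch of how a multitask sparse parity dataset with power-law task frequencies could be generated. The abstract does not specify the construction, so everything here is an assumption for illustration: the function name `make_multitask_sparse_parity`, the layout (one-hot task-control bits followed by random data bits), the default sizes, and the frequency exponent `-(alpha + 1)`.

```python
import numpy as np

def make_multitask_sparse_parity(n_samples, n_tasks=5, n_bits=32, k=3,
                                 alpha=0.6, seed=0):
    """Hypothetical generator; all defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)

    # Each task (skill) owns a fixed random subset of k data-bit positions.
    subsets = np.stack([rng.choice(n_bits, size=k, replace=False)
                        for _ in range(n_tasks)])

    # Assumed power-law task frequencies: p_i proportional to i^-(alpha + 1).
    ranks = np.arange(1, n_tasks + 1, dtype=float)
    probs = ranks ** (-(alpha + 1.0))
    probs /= probs.sum()

    # Sample which task each example belongs to, and its random data bits.
    tasks = rng.choice(n_tasks, size=n_samples, p=probs)
    bits = rng.integers(0, 2, size=(n_samples, n_bits))

    # One-hot control bits tell the model which task is active.
    control = np.eye(n_tasks, dtype=np.int64)[tasks]
    X = np.concatenate([control, bits], axis=1)

    # Label = parity of the k data bits in the active task's subset.
    rows = np.arange(n_samples)[:, None]
    y = bits[rows, subsets[tasks]].sum(axis=1) % 2
    return X, y, tasks, subsets, probs
```

With a construction like this, rare tasks contribute few training examples, which is one way to picture why skills would be acquired one after another as training time, data size, or model size grows.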
