A Resource Model For Neural Scaling Law
This work provides a theoretical framework for characterizing neural scaling in AI, which could aid in diagnosing and optimizing neural networks, though it appears incremental as it builds on existing empirical observations.
The paper tackles the problem of understanding neural scaling laws by proposing a resource model where composite tasks are decomposed into subtasks competing for neurons. The model successfully replicates the scaling law of Chinchilla models, predicting performance improvements as model size increases.
Neural scaling laws characterize how model performance improves as the model size scales up. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and hence can be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to each subtask). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to the number of neurons allocated to it. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask grow uniformly as models get larger, keeping the ratios of acquired resources constant. We hypothesize that these findings hold generally and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.
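To illustrate how the two empirical findings combine, the following is a minimal sketch rather than the paper's implementation. It assumes each subtask loss takes the form c_i / N_i, with N_i = r_i * N neurons allocated out of N in total. The function name `composite_loss`, the ratios r_i, and the coefficients c_i are hypothetical values chosen for illustration only. Under these assumptions the composite loss is (sum_i c_i / r_i) / N, so it is inversely proportional to the total number of neurons.

```python
def composite_loss(total_neurons, ratios, coeffs):
    """Composite loss under the two assumed findings.

    Finding (1): subtask i has loss c_i / N_i.
    Finding (2): N_i = r_i * N, with the ratios r_i fixed as N grows.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("allocation ratios must sum to 1")
    return sum(c / (r * total_neurons) for r, c in zip(ratios, coeffs))


# Hypothetical allocation ratios and loss coefficients for three subtasks.
ratios = [0.5, 0.3, 0.2]
coeffs = [1.0, 2.0, 0.5]

# Doubling the total neuron count should halve the composite loss.
for n in (100, 200, 400):
    loss = composite_loss(n, ratios, coeffs)
    print(f"N={n:4d}  loss={loss:.6f}  loss*N={loss * n:.6f}")
```

The product loss*N stays constant across model sizes in this sketch, which is the 1/N dependence on the number of neurons implied by the two findings. Relating neuron count to parameter count requires further assumptions that are not made here.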