LG · CV · Nov 6, 2023

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

arXiv:2311.03233v3 · 4 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses the practical challenge of scaling deep learning models efficiently by optimizing how compute is used; it is an incremental advance, since it builds on existing scaling laws.

The paper tackles compute-optimal model training by introducing adaptive models that change shape during training, which significantly reduces the compute required to reach a target performance compared with static models.

In recent years, the state of the art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: investing more computational resources (optimally) leads to better performance, and even predictably so; neural scaling laws have been derived that accurately forecast the performance of a network for a desired level of compute. This leads to the notion of a 'compute-optimal' model, i.e., a model that allocates a given level of compute during training optimally to maximize performance. In this work, we extend the concept of optimality by allowing for an 'adaptive' model, i.e., a model that can change its shape during training. By doing so, we can design adaptive models that optimally traverse between the underlying scaling laws and outpace their 'static' counterparts, leading to a significant reduction in the required compute to reach a given target performance. We show that our approach generalizes across modalities and different shape parameters.
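The claim that an adaptive model can traverse between scaling laws and beat any single static shape can be illustrated with a toy calculation. This is a minimal sketch, not the authors' method: it assumes made-up power laws of the form L(C) = E + A·C^(-alpha) for two hypothetical shapes ("small" and "large"), and further assumes that switching shape is free and keeps the current loss. Under those assumptions, the cheapest path to a target loss spends, at each loss level, the compute of whichever shape has the smallest marginal cost |dC/dL|, so its total can never exceed that of the best fixed shape. The names `ScalingLaw`, `static_cost` and `adaptive_cost` are illustrative only.

```python
# Toy sketch (hypothetical, NOT the paper's method). Assumptions:
#   * each shape k follows L_k(C) = E_k + A_k * C**(-alpha_k);
#   * switching shape costs nothing and preserves the current loss.
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingLaw:
    """Hypothetical power law L(C) = E + A * C**(-alpha) for one model shape."""
    name: str
    E: float      # irreducible loss of this shape
    A: float      # scale coefficient
    alpha: float  # power-law exponent

    def compute_at(self, loss: float) -> float:
        """Invert the law: compute needed to reach `loss` (requires loss > E)."""
        return (self.A / (loss - self.E)) ** (1.0 / self.alpha)

    def marginal_cost(self, loss: float) -> float:
        """|dC/dL|: compute spent per unit of loss reduction at level `loss`."""
        a = 1.0 / self.alpha
        return a * self.A ** a * (loss - self.E) ** (-a - 1.0)


def static_cost(law: ScalingLaw, l_start: float, l_target: float) -> float:
    """Compute to lower the loss from l_start to l_target with one fixed shape."""
    return law.compute_at(l_target) - law.compute_at(l_start)


def adaptive_cost(laws, l_start, l_target, steps=100_000):
    """Follow, at every loss level, the shape with the smallest marginal cost.

    Midpoint-rule integral of min_k |dC_k/dL| over [l_target, l_start].
    Returns the total compute and the schedule of (loss, shape) switch points.
    """
    dl = (l_start - l_target) / steps
    total, schedule = 0.0, []
    for i in range(steps):
        loss = l_start - (i + 0.5) * dl
        best = min(laws, key=lambda law: law.marginal_cost(loss))
        total += best.marginal_cost(loss) * dl
        # Record a switch point whenever the cheapest shape changes.
        if not schedule or schedule[-1][1] != best.name:
            schedule.append((round(loss, 3), best.name))
    return total, schedule


# Made-up example: a cheap shape with a higher loss floor, and a costlier one.
small = ScalingLaw("small", E=2.5, A=5.0, alpha=0.5)
large = ScalingLaw("large", E=2.0, A=20.0, alpha=0.5)

if __name__ == "__main__":
    total, schedule = adaptive_cost([small, large], l_start=4.0, l_target=2.6)
    for law in (small, large):
        print(f"static {law.name}: {static_cost(law, 4.0, 2.6):.1f}")
    print(f"adaptive: {total:.1f}, schedule: {schedule}")
```

With these invented parameters the schedule starts on the small shape and switches to the large one near a loss of 2.83; the adaptive total comes to roughly 749 compute units versus about 1011 for the best static shape. Real shape changes mid-training are not free, so actual savings depend on how well the model keeps its loss across a switch.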
