LG, AI · Sep 13, 2022

Revisiting Neural Scaling Laws in Language and Vision

arXiv:2209.06640v2 · 164 citations · h-index: 44
Originality: Incremental advance
AI Analysis

This work addresses a challenge for machine learning researchers and practitioners: reliably predicting the performance gains from scaling. It is an incremental advance, building on existing scaling-law methods.

The authors tackle the problem of predicting the benefits of scaling in deep learning by proposing a more rigorous methodology based on extrapolation loss rather than on best-fitting parameters. They show that their recipe for estimating scaling-law parameters extrapolates more accurately across several domains, including image classification, neural machine translation, and language modeling.
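To make the difference between interpolation loss and extrapolation loss concrete, here is a minimal sketch under stated assumptions. It is not the authors' recipe. It uses a synthetic, noise-free learning curve ε(x) = 2x^(-0.3) + 0.5 (hypothetical values, not data from the paper), fits a pure power law ε(x) = βx^c by least squares in log-log space on the six smallest scales, and uses mean squared error as the loss (also an assumption). The fit is close on the scales it has seen but drifts on the larger, held-out scales, because the assumed curve levels off toward a nonzero floor that a pure power law cannot represent.

```python
import math

def fit_pure_power_law(xs, ys):
    """Fit y = beta * x**c by ordinary least squares on (log x, log y)."""
    us = [math.log(x) for x in xs]
    vs = [math.log(y) for y in ys]
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    c = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / sum((u - mu) ** 2 for u in us)
    beta = math.exp(mv - c * mu)
    return beta, c

def mse(params, xs, ys):
    """Mean squared error of the fitted curve on the given points."""
    beta, c = params
    return sum((beta * x ** c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical noise-free learning curve: loss at data sizes 2**10 .. 2**20.
xs = [2 ** k for k in range(10, 21)]
ys = [2.0 * x ** -0.3 + 0.5 for x in xs]

# Split by scale: fit on the small-scale points only, hold out the larger ones.
train_x, train_y = xs[:6], ys[:6]
test_x, test_y = xs[6:], ys[6:]

params = fit_pure_power_law(train_x, train_y)
interpolation_loss = mse(params, train_x, train_y)  # quality of fit on seen scales
extrapolation_loss = mse(params, test_x, test_y)    # error when predicting larger scales
print(f"interpolation MSE = {interpolation_loss:.1e}")
print(f"extrapolation MSE = {extrapolation_loss:.1e}")
```

If you reported only `interpolation_loss`, this fit would look adequate. The held-out score is what exposes the mismatch.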

The remarkable progress in deep learning in recent years is largely driven by improvements in scale, where bigger models are trained on larger datasets for longer schedules. To predict the benefit of scale empirically, we argue for a more rigorous methodology based on the extrapolation loss, instead of reporting the best-fitting (interpolating) parameters. We then present a recipe for estimating scaling law parameters reliably from learning curves. We demonstrate that it extrapolates more accurately than previous methods in a wide range of architecture families across several domains, including image classification, neural machine translation (NMT) and language modeling, in addition to tasks from the BIG-Bench evaluation benchmark. Finally, we release a benchmark dataset comprising 90 evaluation tasks to facilitate research in this domain.
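The methodology above compares candidate scaling-law estimators by how well they extrapolate. That comparison can be sketched in a few lines, again under labeled assumptions and without reproducing the authors' recipe. The sketch fits two forms on the small scales of a hypothetical noise-free curve: a pure power law, and a power law plus a constant asymptote, ε(x) = βx^c + ε∞. It then selects whichever form has the lower mean squared error on the held-out larger scales. Finding ε∞ by a simple grid search, with β and c solved by least squares in log-log space for each candidate value, is an illustrative choice made here, not something stated in the source.

```python
import math

def fit(xs, ys, eps_grid):
    """Fit y = beta * x**c + eps_inf. For each candidate eps_inf, solve (beta, c)
    by least squares on (log x, log(y - eps_inf)); keep the candidate with the
    smallest squared error on the training points."""
    us = [math.log(x) for x in xs]
    mu = sum(us) / len(us)
    best = None
    for eps in eps_grid:
        vs = [math.log(y - eps) for y in ys]
        mv = sum(vs) / len(vs)
        c = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / sum((u - mu) ** 2 for u in us)
        params = (math.exp(mv - c * mu), c, eps)
        sse = sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, params)
    return best[1]

def predict(params, x):
    beta, c, eps = params
    return beta * x ** c + eps

def extrapolation_mse(params, xs, ys):
    """Mean squared error on the held-out (larger-scale) points."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical noise-free learning curve with a nonzero floor of 0.5.
xs = [2 ** k for k in range(10, 21)]
ys = [2.0 * x ** -0.3 + 0.5 for x in xs]
train_x, train_y, test_x, test_y = xs[:6], ys[:6], xs[6:], ys[6:]

y_min = min(train_y)
candidates = {
    # eps_inf pinned to 0: a pure power law
    "pure power law": fit(train_x, train_y, [0.0]),
    # eps_inf searched on a grid in [0, min(train_y)) so log(y - eps_inf) stays defined
    "power law + constant": fit(train_x, train_y, [y_min * i / 200 for i in range(200)]),
}
scores = {name: extrapolation_mse(p, test_x, test_y) for name, p in candidates.items()}
best_form = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: extrapolation MSE = {score:.1e}")
print("selected:", best_form)
```

The held-out scores alone drive the selection here. Judging by in-sample error would understate how far the pure power law drifts at larger scales.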

Code Implementations: 1 repo
