LG AI | May 29, 2025

Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks

Dongwoo Lee, Dong Bok Lee, Steven Adriaensen, Juho Lee, Sung Ju Hwang, Frank Hutter, Seon Joo Kim, Hae Beom Lee

arXiv:2505.23032v37.11 citations | h-index: 12 | Has Code | ICML

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable, uncertainty-aware scaling predictions to support decisions in deep learning, such as resource allocation, though its contribution is incremental in that it applies Bayesian methods to this domain.

The paper tackles the problem of predicting neural scaling laws with uncertainty quantification, using a Bayesian framework based on Prior-data Fitted Networks (PFNs), and demonstrates superior performance in data-limited scenarios such as Bayesian active learning.

Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow a power law and have proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both existing point estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
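The idea of sampling synthetic scaling curves from a prior can be illustrated with a minimal sketch. This is not the prior designed in the paper: the functional form y = a * x^(-b) + c, the parameter ranges, and the helper names `sample_power_law_curve` and `prior_interval` are all assumptions chosen for illustration. The sketch only draws curves and summarizes their spread at the largest scale; a PFN would be trained on many such samples to approximate a posterior predictive, which is not shown here.

```python
import random

def sample_power_law_curve(rng, xs, noise_std=0.0):
    """Draw one synthetic curve y = a * x**(-b) + c from a toy prior.

    The parameter ranges below are hypothetical, not taken from the paper.
    """
    a = rng.uniform(0.5, 2.0)   # scale of the reducible term
    b = rng.uniform(0.1, 1.0)   # decay exponent
    c = rng.uniform(0.0, 0.5)   # irreducible floor
    return [a * x ** (-b) + c + rng.gauss(0.0, noise_std) for x in xs]

def prior_interval(curves, index, lower=0.05, upper=0.95):
    """Empirical quantile interval of the sampled curves at one x position."""
    values = sorted(curve[index] for curve in curves)
    low = values[int(lower * (len(values) - 1))]
    high = values[int(upper * (len(values) - 1))]
    return low, high

rng = random.Random(0)
xs = [2.0 ** k for k in range(1, 11)]   # scales 2, 4, ..., 1024
curves = [sample_power_law_curve(rng, xs) for _ in range(1000)]
low, high = prior_interval(curves, index=len(xs) - 1)
print(f"90% prior interval at x={xs[-1]:.0f}: [{low:.3f}, {high:.3f}]")
```

The interval printed here reflects only the toy prior. Conditioning on observed points at small scales, which is what the Bayesian framework in the paper is for, would narrow it.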
