Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks
This work addresses the need for reliable, uncertainty-aware scaling predictions for decision-making in deep learning, such as resource allocation, though it is incremental in applying Bayesian methods to this domain.
The paper tackles the problem of predicting neural scaling laws with uncertainty quantification, using a Bayesian framework based on Prior-data Fitted Networks, and demonstrates superior performance in data-limited scenarios such as Bayesian active learning.
Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling behavior often follows a power law and have proposed several variants of power-law functions to predict performance at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world decision-making problems such as determining the expected performance improvement achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both existing point estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
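To make the idea of sampling synthetic scaling curves from a prior concrete, the sketch below draws noisy saturating power-law curves of the form y = a * x^(-b) + c. It is only an illustration: the functional form, the parameter ranges, the noise level, and the names `power_law` and `sample_synthetic_curve` are assumptions made for this example, and the prior actually designed in the paper is not described in this abstract and may differ substantially (for instance, by covering non-power-law behavior).

```python
import random


def power_law(x, a, b, c):
    """Saturating power law y = a * x**(-b) + c (assumed form, for illustration)."""
    return a * x ** (-b) + c


def sample_synthetic_curve(xs, rng, noise_std=0.01):
    """Draw one synthetic scaling curve from a hypothetical prior.

    The uniform ranges below are arbitrary illustrative choices,
    not values taken from the paper.
    """
    a = rng.uniform(0.5, 5.0)   # scale of the reducible error
    b = rng.uniform(0.1, 1.0)   # power-law exponent
    c = rng.uniform(0.0, 0.5)   # irreducible error floor
    clean = [power_law(x, a, b, c) for x in xs]
    # Additive Gaussian observation noise on each point
    noisy = [y + rng.gauss(0.0, noise_std) for y in clean]
    return {"params": (a, b, c), "clean": clean, "noisy": noisy}


rng = random.Random(0)  # fixed seed for reproducibility
# 20 log-spaced "scales" (e.g. dataset or model sizes) from 10 to 10,000
xs = [10 ** (1 + 3 * i / 19) for i in range(20)]
curves = [sample_synthetic_curve(xs, rng) for _ in range(8)]
```

In a PFN-style setup, many such curves would be generated on the fly, each split into observed small-scale points and held-out large-scale points, and a network would be trained to predict a distribution over the held-out values. That training loop is omitted here.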