LG · CL · Oct 19, 2025

Zero-Shot Performance Prediction for Probabilistic Scaling Laws

arXiv:2510.16743v1 · 1 citation · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses the problem of reducing computational overhead for NLP researchers and practitioners, though it appears incremental as it builds on existing probabilistic methods and scaling law concepts.

The paper tackles the problem of predicting learning curves for NLP models to reduce computational costs, formulating it as a multitask learning problem with a two-layer hierarchy and using latent variable multi-output Gaussian Processes to enable zero-shot prediction. The approach is validated on three small-scale NLP datasets with up to 30 learning curves, showing it can provide predictions close to ground truth scaling laws.
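The core modelling idea is that tasks are coupled through a shared covariance, so a task with no observations borrows strength from related ones. The minimal NumPy sketch below illustrates it. It is not the authors' implementation. It uses a flat coregionalization-style kernel k(x, x') · B[t, t'], where B is built from per-task latent vectors `h`. It also omits the two-layer hierarchy and fixes `h` by hand instead of learning it. All function names and the toy data are hypothetical.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def task_cov(h):
    """Task covariance B[t, t'] = exp(-0.5 * ||h_t - h_t'||^2) from latent task vectors h."""
    d2 = ((h[:, None, :] - h[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2)

def mogp_predict(x_obs, t_obs, y_obs, x_new, t_new, h, lengthscale=1.0, noise=1e-2):
    """Posterior mean and variance of a multi-output GP with kernel k(x, x') * B[t, t']."""
    B = task_cov(h)
    K = rbf(x_obs, x_obs, lengthscale) * B[np.ix_(t_obs, t_obs)] + noise * np.eye(len(x_obs))
    Ks = rbf(x_new, x_obs, lengthscale) * B[np.ix_(t_new, t_obs)]
    L = np.linalg.cholesky(K)                                # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # alpha = K^-1 y
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    # Prior variance is B[t, t] because rbf(x, x) = 1; subtract the explained part.
    var = B[t_new, t_new] - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical toy data: tasks 0 and 1 are observed at 4 inputs, task 2 is never observed.
x = np.arange(4.0)
x_obs = np.concatenate([x, x])
t_obs = np.array([0] * 4 + [1] * 4)
y_obs = np.concatenate([np.ones(4), -np.ones(4)])   # centred performance values
h = np.array([[0.0], [3.0], [0.1]])                 # task 2 sits next to task 0 in latent space

x_new, t_new = np.array([1.5, 10.0]), np.array([2, 2])
mean, var = mogp_predict(x_obs, t_obs, y_obs, x_new, t_new, h)

# Moving task 2's latent vector next to task 1 changes which curve it borrows from.
h_alt = np.array([[0.0], [3.0], [2.9]])
mean_alt, _ = mogp_predict(x_obs, t_obs, y_obs, x_new, t_new, h_alt)
```

Task 2 has no observations, so its prediction comes entirely from its latent-space proximity to the observed tasks. It tracks task 0's curve under `h` and task 1's under `h_alt`. Its variance grows back toward the prior far from any observed input (x = 10).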

The prediction of learning curves for Natural Language Processing (NLP) models enables informed decision-making to meet specific performance objectives, while reducing computational overhead and lowering the costs associated with dataset acquisition and curation. In this work, we formulate the prediction task as a multitask learning problem, where each task's data is modelled as being organized within a two-layer hierarchy. To model the shared information and dependencies across tasks and hierarchical levels, we employ latent variable multi-output Gaussian Processes, which account for task correlations and support zero-shot prediction of learning curves (LCs). We demonstrate that this approach facilitates the development of probabilistic scaling laws at lower cost. With an active learning strategy, LCs can be queried to reduce predictive uncertainty and provide predictions close to ground truth scaling laws. We validate our framework on three small-scale NLP datasets with up to 30 LCs. These are obtained from nanoGPT models, from bilingual translation using mBART and Transformer models, and from multilingual translation using M2M100 models of varying sizes.
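The active learning step can be sketched with plain uncertainty sampling. Fit a GP to the learning-curve points queried so far, then query the candidate dataset size where the posterior variance is largest. A power law is then fitted to the posterior mean in log-log space. This is a hypothetical single-task illustration, not the paper's acquisition function or model. The synthetic ground truth `loss = 5 * n^-0.3`, the candidate grid, and all function names are assumptions.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_cand, lengthscale=1.0, noise=1e-4):
    """Zero-mean GP posterior mean and variance at the candidate inputs."""
    K = rbf(x_obs, x_obs, lengthscale) + noise * np.eye(len(x_obs))
    Ks = rbf(x_cand, x_obs, lengthscale)
    mean = Ks @ np.linalg.solve(K, y_obs)
    # Prior variance is 1 for this kernel; subtract k_i^T K^-1 k_i for each candidate.
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def uncertainty_sampling(x_cand, oracle, init_idx, n_queries):
    """Repeatedly query the candidate with the largest posterior variance."""
    queried = list(init_idx)
    max_vars = []
    for _ in range(n_queries):
        x_obs = x_cand[queried]
        # `oracle` stands in for training a model at that dataset size (hypothetical).
        _, var = gp_posterior(x_obs, oracle(x_obs), x_cand)
        max_vars.append(float(var.max()))
        queried.append(int(np.argmax(var)))
    return queried, max_vars

# Synthetic ground truth (an assumption for illustration): loss = 5 * n^-0.3,
# modelled in log10 space, where a power law becomes a straight line.
def true_log_loss(log_n):
    return np.log10(5.0) - 0.3 * log_n

log_n = np.linspace(2.0, 6.0, 41)   # candidate dataset sizes from 1e2 to 1e6
queried, max_vars = uncertainty_sampling(log_n, true_log_loss, [0, 40], n_queries=3)

# Fit the power law loss = a * n^-b to the GP posterior mean over the whole grid.
mean, _ = gp_posterior(log_n[queried], true_log_loss(log_n[queried]), log_n)
slope, intercept = np.polyfit(log_n, mean, 1)
a, b = 10 ** intercept, -slope
```

Fitting the posterior mean gives only a point estimate of the scaling law. Fitting each of several posterior samples instead would give a distribution over the exponent `b`, which is closer in spirit to a probabilistic scaling law.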
