ML, LG, Feb 16

Universal priors: solving empirical Bayes via Bayesian inference and pretraining

arXiv:2602.15136v1, 2 citations, h-index: 2
Originality: Incremental advance
AI Analysis

This paper provides a theoretical foundation for using pretrained models in empirical Bayes, explaining how such models adapt to unknown test distributions. It is rated incremental because it builds on prior empirical findings.

The paper theoretically justifies that a transformer pretrained on synthetic data performs well on empirical Bayes problems, showing that training under universal priors yields a near-optimal regret bound of $\widetilde{O}(\frac{1}{n})$ uniformly across test distributions.

We theoretically justify the recent empirical finding of [Teh et al., 2025] that a transformer pretrained on synthetically generated data achieves strong performance on empirical Bayes (EB) problems. We take an indirect approach to this question: rather than analyzing the model architecture or training dynamics, we ask why a pretrained Bayes estimator, trained under a prespecified training distribution, can adapt to arbitrary test distributions. Focusing on Poisson EB problems, we identify the existence of universal priors such that training under these priors yields a near-optimal regret bound of $\widetilde{O}(\frac{1}{n})$ uniformly over all test distributions. Our analysis leverages the classical phenomenon of posterior contraction in Bayesian statistics, showing that the pretrained transformer adapts to unknown test distributions precisely through posterior contraction. This perspective also explains the phenomenon of length generalization, in which the test sequence length exceeds the training length, as the model performs Bayesian inference using a generalized posterior.
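For readers unfamiliar with the Poisson EB setting, the following minimal sketch illustrates the classical setup only; it is not the paper's pretrained estimator. It assumes latent means $\theta_i$ drawn from a prior and observations $X_i \sim \mathrm{Poisson}(\theta_i)$, and it uses a Gamma prior purely as an illustrative choice because the Bayes rule then has a closed form. The function names (`nb_marginal_pmf`, `bayes_posterior_mean`, `robbins_estimates`) are hypothetical. The sketch relies on the standard identity $E[\theta \mid X = x] = (x+1) f(x+1) / f(x)$, where $f$ is the marginal pmf of $X$, and on the classical Robbins estimator, which replaces $f$ with empirical frequencies.

```python
import math
from collections import Counter


def nb_marginal_pmf(x, shape, rate):
    """Marginal pmf of X when theta ~ Gamma(shape, rate) and X | theta ~ Poisson(theta).

    This is a negative binomial pmf; the Gamma prior is an assumption made
    here for illustration, not a prior taken from the paper.
    """
    log_p = (
        math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
        + shape * math.log(rate / (rate + 1.0))
        - x * math.log(rate + 1.0)
    )
    return math.exp(log_p)


def bayes_posterior_mean(x, shape, rate):
    """Oracle Bayes estimate E[theta | X = x] via (x + 1) f(x + 1) / f(x).

    For the Gamma prior this equals the closed form (x + shape) / (rate + 1).
    """
    return (x + 1) * nb_marginal_pmf(x + 1, shape, rate) / nb_marginal_pmf(x, shape, rate)


def robbins_estimates(xs):
    """Classical Robbins estimator: plug empirical counts N(x) into the same identity.

    It needs no knowledge of the prior, which is the defining feature of an
    empirical Bayes procedure.
    """
    counts = Counter(xs)
    return [(x + 1) * counts[x + 1] / counts[x] for x in xs]
```

In this setting, the regret of an estimator is commonly taken to be its excess mean squared error over the oracle Bayes rule (here `bayes_posterior_mean`), which knows the prior; the paper's exact normalization of the regret may differ from this informal description.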

