LG ML · Feb 14, 2025

Solving Empirical Bayes via Transformers

arXiv:2502.09844v213.010 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses a foundational statistical problem with potential broad impact in data analysis, though it is incremental as it adapts existing AI tools to a classical setting.

The paper tackles the Poisson empirical Bayes problem by applying transformers to estimate high-dimensional mean vectors, showing that small transformer models outperform classical non-parametric maximum likelihood in runtime and validation loss on synthetic and real-world datasets.

This work applies modern AI tools (transformers) to one of the oldest statistical problems: estimating Poisson means in the empirical Bayes (Poisson-EB) setting. In Poisson-EB, a high-dimensional mean vector $θ$ (with iid coordinates sampled from an unknown prior $π$) is estimated on the basis of $X\sim\mathrm{Poisson}(θ)$. A transformer model is pre-trained on a set of synthetically generated pairs $(X,θ)$ and learns to do in-context learning (ICL) by adapting to the unknown $π$. Theoretically, we show that a sufficiently wide transformer can achieve vanishing regret with respect to an oracle estimator that knows $π$ as the dimension grows to infinity. Practically, we discover that even very small models (100k parameters) outperform the best classical algorithm (non-parametric maximum likelihood, or NPMLE) both in runtime and in validation loss, which we compute on out-of-distribution synthetic data as well as on real-world datasets (NHL hockey, MLB baseball, BookCorpusOpen). Finally, by using linear probes, we confirm that the transformer's EB estimator appears to work internally in a way that differs from both the NPMLE and Robbins' estimator.
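To make the Poisson-EB setup concrete, here is a minimal sketch of two classical reference points mentioned in the abstract: Robbins' estimator, $\hat θ(x) = (x+1)\,N(x+1)/N(x)$ with $N(k)$ the number of observations equal to $k$, and an oracle Bayes estimator that knows $π$. This is not the paper's code. The Gamma prior is an assumption chosen only because its posterior mean has a closed form, and the function names `robbins_estimator` and `gamma_oracle` are hypothetical.

```python
import numpy as np


def robbins_estimator(x):
    """Robbins' estimator: (x + 1) * N(x + 1) / N(x), with N(k) = #{i : x_i = k}."""
    x = np.asarray(x, dtype=np.int64)
    # Pad so that counts[x + 1] is defined even for the largest observed value.
    counts = np.bincount(x, minlength=x.max() + 2)
    # counts[x] >= 1 for every observed x, so the division is always safe.
    return (x + 1) * counts[x + 1] / counts[x]


def gamma_oracle(x, shape, rate):
    """Posterior mean of theta given x when theta ~ Gamma(shape, rate) (assumed prior)."""
    return (np.asarray(x) + shape) / (1.0 + rate)


rng = np.random.default_rng(0)
n = 50_000                # dimension of the mean vector
shape, rate = 2.0, 1.0    # assumed prior parameters, for illustration only

theta = rng.gamma(shape, 1.0 / rate, size=n)  # iid coordinates drawn from the prior
x = rng.poisson(theta)                        # one Poisson observation per coordinate

mse_mle = np.mean((x - theta) ** 2)                             # naive estimate: theta_hat = x
mse_robbins = np.mean((robbins_estimator(x) - theta) ** 2)
mse_oracle = np.mean((gamma_oracle(x, shape, rate) - theta) ** 2)

print(f"MLE MSE:     {mse_mle:.4f}")
print(f"Robbins MSE: {mse_robbins:.4f}")
print(f"Oracle MSE:  {mse_oracle:.4f}")
print(f"Robbins regret vs. oracle: {mse_robbins - mse_oracle:.4f}")
```

The gap between an estimator's error and the oracle's error is the regret the abstract refers to. Robbins' estimator uses only the empirical counts, so it needs no knowledge of the prior, whereas the oracle here is given the prior parameters directly.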
