ML · LG · May 7

Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors

arXiv:2605.06413 · 11.6
Predicted impact: top 40% in ML · last 90 days
Originality: Highly original
AI Analysis

For practitioners using PFNs in sequential decision-making, this work provides a method to avoid failure modes of total-variance exploration by enabling epistemic-only acquisition.

The paper addresses the problem of separating epistemic and aleatoric uncertainty in Prior-Fitted Networks (PFNs) for sequential decision-making. By training a decoupled PFN with separate heads for the latent signal and the observation noise, the authors improve acquisition in active learning and Bayesian optimization, and a decoupled model obtains the best average rank in both the HPO and synthetic BO benchmarks.
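The difference between total-variance and epistemic-only acquisition can be illustrated with a minimal sketch. It assumes the two heads each return a per-candidate variance; the function name `select_query` and the numbers below are hypothetical and are not taken from the paper.

```python
def select_query(epistemic_var, aleatoric_var, mode="epistemic"):
    """Return the index of the candidate with the largest acquisition score.

    epistemic_var: per-candidate variance of the latent signal (assumed head output).
    aleatoric_var: per-candidate observation-noise variance (assumed head output).
    mode: "epistemic" scores by epistemic variance only;
          "total" scores by epistemic + aleatoric variance.
    """
    if len(epistemic_var) != len(aleatoric_var):
        raise ValueError("variance lists must have the same length")
    if mode == "epistemic":
        scores = list(epistemic_var)
    elif mode == "total":
        scores = [e + a for e, a in zip(epistemic_var, aleatoric_var)]
    else:
        raise ValueError("mode must be 'epistemic' or 'total'")
    # Index of the maximum score (first one on ties).
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical head outputs for three candidate points.
# Candidate 2 is very noisy but already well understood.
epistemic_var = [0.05, 0.40, 0.10]
aleatoric_var = [0.02, 0.03, 2.00]

idx_epistemic = select_query(epistemic_var, aleatoric_var, mode="epistemic")
idx_total = select_query(epistemic_var, aleatoric_var, mode="total")
print(idx_epistemic, idx_total)
```

In this toy setting the total-variance rule is drawn to the noisy candidate, where further queries reduce little uncertainty about the latent signal, while the epistemic-only rule picks the candidate whose latent value is least known.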

Prior-Fitted Networks (PFNs) amortize Bayesian prediction by meta-learning over a synthetic task prior, but their standard output is a posterior predictive distribution over noisy observations. For sequential decision-making, such as active learning and Bayesian optimization, acquisition should prioritize epistemic uncertainty about the latent signal rather than irreducible aleatoric observation noise. We show that this epistemic-aleatoric split is not identifiable in general from the posterior predictive distribution alone, even when that distribution is known exactly. We then exploit a distinctive advantage of PFNs: because the synthetic data-generating process is under our control, each task can contain an explicit latent signal and noise function, and the generator can provide query-level labels for both the noiseless target and the observation-noise variance. We use these labels to train a decoupled PFN with separate latent-signal and aleatoric heads. The observation-level predictive is induced by convolving the latent signal distribution with the learned noise model. Empirically, epistemic-only acquisition mitigates the failure mode of total-variance exploration in noisy and heteroscedastic settings. In matched comparisons, decoupled models usually improve over tuned observation-level baselines, with the clearest gains in HPO; in broader sweeps, a decoupled model obtains the best average rank in both HPO and synthetic BO.
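The convolution step and the non-identifiability claim can be made concrete with a small worked example. It assumes an additive-noise model with Gaussian latent-signal and noise distributions; this is an illustrative assumption and not necessarily the parameterization used in the paper, and the symbols $f$, $\varepsilon$, $\mu$, $\sigma_e^2$, $\sigma_a^2$ and $\mathcal{D}$ are introduced here.

```latex
% Assumed model: additive noise, independent of the latent signal given x.
\begin{align}
  y &= f(x) + \varepsilon, \qquad
  f(x) \mid \mathcal{D} \sim \mathcal{N}\bigl(\mu(x), \sigma_e^2(x)\bigr), \qquad
  \varepsilon \mid x \sim \mathcal{N}\bigl(0, \sigma_a^2(x)\bigr), \\
  % Observation-level predictive: latent distribution convolved with the noise model.
  p(y \mid x, \mathcal{D})
    &= \int p(y \mid f, x)\, p(f \mid x, \mathcal{D})\, \mathrm{d}f
     = \mathcal{N}\bigl(y;\ \mu(x),\ \sigma_e^2(x) + \sigma_a^2(x)\bigr).
\end{align}
% Two decompositions with the same predictive N(0, 1) at a query x:
\begin{align}
  \text{(A)}\quad & \mu = 0,\ \sigma_e^2 = 1,\ \sigma_a^2 = 0
    && \text{(all uncertainty is epistemic)}, \\
  \text{(B)}\quad & \mu = 0,\ \sigma_e^2 = 0,\ \sigma_a^2 = 1
    && \text{(all uncertainty is aleatoric)}.
\end{align}
```

Under these assumptions only the sum $\sigma_e^2 + \sigma_a^2$ appears in the predictive, so cases (A) and (B) cannot be told apart from the predictive alone. Separate supervision for the noiseless target and the noise variance, as the abstract describes, is one way to pin down each term.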

