LG · AI · ML · Feb 10, 2025

Amortized In-Context Bayesian Posterior Estimation

arXiv:2502.06601v1 · 14 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses the computational cost of re-running Bayesian inference whenever new data arrive, a problem for researchers and practitioners in fields such as simulation-based inference. The advance is incremental: it builds on existing amortization and in-context learning methods.

The paper compares amortized in-context Bayesian posterior estimation methods across optimization objectives and architectures. It finds that the reverse KL estimator performs best on predictive tasks, especially when combined with a transformer architecture and normalizing flows.
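
For readers unfamiliar with the term, the two standard amortized objectives can be written as follows. This is a sketch in assumed notation, not taken from the paper: \theta denotes the model parameters, \mathcal{D} a context data set, and q_\varphi(\theta \mid \mathcal{D}) the amortized estimator with weights \varphi.

```latex
\begin{align}
\mathcal{L}_{\text{forward}}(\varphi)
  &= \mathbb{E}_{p(\mathcal{D})}\,
     \mathrm{KL}\big(p(\theta \mid \mathcal{D}) \,\|\, q_\varphi(\theta \mid \mathcal{D})\big)
   = -\,\mathbb{E}_{p(\theta)\,p(\mathcal{D} \mid \theta)}
     \big[\log q_\varphi(\theta \mid \mathcal{D})\big] + \text{const}, \\
\mathcal{L}_{\text{reverse}}(\varphi)
  &= \mathbb{E}_{p(\mathcal{D})}\,
     \mathrm{KL}\big(q_\varphi(\theta \mid \mathcal{D}) \,\|\, p(\theta \mid \mathcal{D})\big)
   = \mathbb{E}_{p(\mathcal{D})}\,\mathbb{E}_{q_\varphi(\theta \mid \mathcal{D})}
     \big[\log q_\varphi(\theta \mid \mathcal{D}) - \log p(\mathcal{D}, \theta)\big] + \text{const}.
\end{align}
```

In both lines the constant does not depend on \varphi. The forward form needs only samples from the joint p(\theta)\,p(\mathcal{D} \mid \theta), whereas the reverse form needs the joint density \log p(\mathcal{D}, \theta) to be evaluable at samples drawn from q_\varphi.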

Bayesian inference provides a natural way of incorporating prior beliefs and assigning a probability measure to the space of hypotheses. Current solutions rely on iterative routines like Markov Chain Monte Carlo (MCMC) sampling and Variational Inference (VI), which need to be re-run whenever new observations are available. Amortization, through conditional estimation, is a viable strategy to alleviate such difficulties and has been the guiding principle behind simulation-based inference, neural processes and in-context methods using pre-trained models. In this work, we conduct a thorough comparative analysis of amortized in-context Bayesian posterior estimation methods from the lens of different optimization objectives and architectural choices. Such methods train an amortized estimator to perform posterior parameter inference by conditioning on a set of data examples passed as context to a sequence model such as a transformer. In contrast to language models, we leverage permutation invariant architectures as the true posterior is invariant to the ordering of context examples. Our empirical study includes generalization to out-of-distribution tasks, cases where the assumed underlying model is misspecified, and transfer from simulated to real problems. Subsequently, it highlights the superiority of the reverse KL estimator for predictive problems, especially when combined with the transformer architecture and normalizing flows.
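
The idea of a permutation-invariant amortized estimator trained with a reverse KL objective can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not the paper's implementation: a mean-pooling (DeepSets-style) encoder stands in for the transformer, a diagonal Gaussian stands in for the normalizing flow, the toy model is theta ~ N(0, I) with x_i ~ N(theta, I), the weights are random rather than trained, and all function names are hypothetical.

```python
import numpy as np


def init_params(rng, d_in, d_hidden, d_theta):
    """Random (untrained) weights for the toy estimator; hypothetical helper."""
    scale = 0.1
    return {
        "W": scale * rng.standard_normal((d_in, d_hidden)),
        "b": np.zeros(d_hidden),
        "W_mu": scale * rng.standard_normal((d_hidden, d_theta)),
        "b_mu": np.zeros(d_theta),
        "W_sig": scale * rng.standard_normal((d_hidden, d_theta)),
        "b_sig": np.zeros(d_theta),
    }


def encode_context(x, params):
    """Embed each context example, then mean-pool over the set dimension."""
    h = np.tanh(x @ params["W"] + params["b"])  # (n, d_hidden)
    return h.mean(axis=0)                       # (d_hidden,), order-independent


def amortized_posterior(x, params):
    """Map a context set to the mean and std of a diagonal Gaussian q(theta | x)."""
    s = encode_context(x, params)
    mu = s @ params["W_mu"] + params["b_mu"]
    log_sigma = s @ params["W_sig"] + params["b_sig"]
    return mu, np.exp(log_sigma)


def log_joint(x, theta):
    """log p(x, theta) for the assumed toy model: theta ~ N(0, I), x_i ~ N(theta, I)."""
    log_prior = -0.5 * np.sum(theta**2 + np.log(2 * np.pi))
    log_lik = -0.5 * np.sum((x - theta) ** 2 + np.log(2 * np.pi))
    return log_prior + log_lik


def reverse_kl_estimate(x, params, rng, n_samples=64):
    """Monte Carlo estimate of E_q[log q(theta | x) - log p(x, theta)]."""
    mu, sigma = amortized_posterior(x, params)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    theta = mu + sigma * eps  # reparameterized samples from q
    log_q = -0.5 * np.sum(eps**2 + 2 * np.log(sigma) + np.log(2 * np.pi), axis=1)
    log_p = np.array([log_joint(x, t) for t in theta])
    return float(np.mean(log_q - log_p))


rng = np.random.default_rng(0)
params = init_params(rng, d_in=2, d_hidden=8, d_theta=2)
x = rng.standard_normal((5, 2)) + 1.0   # five context examples
x_shuffled = x[rng.permutation(5)]      # same set, different order

mu_a, sigma_a = amortized_posterior(x, params)
mu_b, sigma_b = amortized_posterior(x_shuffled, params)
print(np.allclose(mu_a, mu_b), np.allclose(sigma_a, sigma_b))

loss = reverse_kl_estimate(x, params, np.random.default_rng(1))
print(loss)
```

Because the encoder pools over the set dimension, reordering the context leaves the predicted posterior unchanged, which mirrors the invariance of the true posterior noted in the abstract. In practice the loss would be minimized over the weights with a gradient-based optimizer and averaged over many data sets; that step is omitted here.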

