LG ML · Dec 20, 2021

Transformers Can Do Bayesian Inference

Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, Frank Hutter

arXiv:2112.10510v738.7328 citations · Has Code

Originality: Highly original

AI Analysis

This work provides a general and efficient framework for Bayesian inference in machine learning, enabling uncertainty quantification and prior specification with deep learning, though it builds incrementally on existing in-context learning techniques.

The paper tackles the challenge of applying deep learning to Bayesian methods by introducing Prior-Data Fitted Networks (PFNs), which use in-context learning to approximate posteriors given a prior distribution over tasks. It reports over 200-fold speedups in multiple setups compared to current methods, along with strong results in diverse areas such as Gaussian process regression and few-shot image classification.

Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.
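The data-generation loop described in the abstract (draw a task from the prior, draw data points and labels from it, mask one label) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy prior over 1-D linear functions, the noise level, and the names `sample_task` and `sample_training_example` are assumptions made for this example. The network itself and its training step are omitted.

```python
import random


def sample_task(rng):
    """Draw one task from a toy prior (assumed): a 1-D linear function
    whose slope and intercept are standard-normal."""
    slope = rng.gauss(0.0, 1.0)
    intercept = rng.gauss(0.0, 1.0)
    return lambda x: slope * x + intercept


def sample_training_example(rng, n_points=6, noise_std=0.1):
    """Build one (context, query_x, target_y) example.

    1. Draw a task (function) from the prior.
    2. Draw data points and noisy labels from that task.
    3. Hold out ("mask") one label as the prediction target.
    """
    f = sample_task(rng)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    ys = [f(x) + rng.gauss(0.0, noise_std) for x in xs]
    held_out = rng.randrange(n_points)
    context = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i != held_out]
    return context, xs[held_out], ys[held_out]


rng = random.Random(0)
batch = [sample_training_example(rng) for _ in range(4)]
for context, query_x, target_y in batch:
    # A PFN-style model would take (context, query_x) as input and be
    # trained to assign high probability to target_y.
    print(len(context), round(query_x, 3), round(target_y, 3))
```

Each example pairs a set-valued input (the context) with a single held-out label. Training a model to predict that label across many such draws is what the abstract calls restating posterior approximation as a supervised learning problem.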
