ML | LG | Aug 6, 2024

Exchangeable Sequence Models Quantify Uncertainty Over Latent Concepts

arXiv:2408.03307v3 | 10 citations | h-index: 20
Originality: Incremental advance
AI Analysis

This work provides a method for agents to articulate uncertainty. The advance is incremental because it adapts existing sequence models to a Bayesian framework.

The paper tackles the problem of quantifying uncertainty over latent concepts in intelligent agents by showing that pre-trained sequence models can perform probabilistic reasoning over exchangeable data points, forming and sharpening beliefs as more information is gathered, with the sequence prediction loss controlling uncertainty quality.

Intelligent agents must be able to articulate their own uncertainty. In this work, we show that pre-trained sequence models are naturally capable of probabilistic reasoning over exchangeable data points -- forming informed beliefs and sharpening them as they gather more information. A sequence model learns the relationship between observations, which differs from typical Bayesian models that quantify uncertainty over latent parameters through priors and likelihoods (e.g., topic models). Despite the apparent difference, we illustrate how exchangeable sequence modeling provides a valid Bayesian model by going back to De Finetti's classical predictive view of probabilistic reasoning: uncertainty comes from data that has not been observed yet, rather than latent parameters. From this perspective, pre-training autoregressive models is equivalent to formulating informed beliefs based on prior observations ("empirical Bayes"), and forward generation is equivalent to simulating instantiations of an environment ("posterior inference"). In particular, exchangeable sequence models can explicitly perform statistical inference; epistemic uncertainty over latent environments is captured by variation in predicted future observations. Formally, we show the sequence prediction loss controls the quality of uncertainty quantification, and propose several approaches for encoding exchangeability in sequence model architectures: data augmentation, regularization, and causal masking.
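The predictive view described in the abstract can be illustrated with a toy example. The sketch below is not the paper's architecture: it assumes a Polya urn (Beta-Bernoulli) one-step predictive rule as a stand-in for a trained exchangeable sequence model, and the names `predictive_prob`, `simulate_latent_means`, and `epistemic_spread` are hypothetical. Forward generation from the observed context simulates possible futures, and the variation of the long-run mean across simulated futures serves as a proxy for epistemic uncertainty over the latent success rate.

```python
import random
import statistics


def predictive_prob(ones, n, a=1.0, b=1.0):
    """One-step predictive P(next = 1 | ones successes in n observations).

    Assumption: a Polya urn with pseudo-counts a and b stands in for a
    learned autoregressive sequence model.
    """
    return (a + ones) / (a + b + n)


def simulate_latent_means(observed, horizon=500, n_paths=300, seed=0):
    """Forward-generate n_paths futures of length horizon from the context.

    Returns the mean of each simulated future; the spread of these means
    approximates the posterior over the latent success rate.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_paths):
        ones, n = sum(observed), len(observed)
        future_ones = 0
        for _ in range(horizon):
            # Sample the next observation from the current predictive,
            # then condition on it (autoregressive generation).
            x = 1 if rng.random() < predictive_prob(ones, n) else 0
            ones += x
            n += 1
            future_ones += x
        means.append(future_ones / horizon)
    return means


def epistemic_spread(observed, **kwargs):
    """Standard deviation of simulated long-run means."""
    return statistics.pstdev(simulate_latent_means(observed, **kwargs))


if __name__ == "__main__":
    short_context = [1, 0, 1, 1, 0]
    long_context = [1] * 120 + [0] * 80
    print("spread after 5 observations:  ", epistemic_spread(short_context))
    print("spread after 200 observations:", epistemic_spread(long_context))
```

In this toy setting the spread is expected to shrink as the context grows, which mirrors the abstract's statement that beliefs sharpen as more information is gathered. No latent parameter is sampled explicitly; the uncertainty is read off entirely from generated observations.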

