ME · ST · ML · Dec 12, 2019

Diagnosing model misspecification and performing generalized Bayes' updates via probabilistic classifiers

arXiv:1912.05810v1 · 12 citations
Originality: Incremental advance
AI Analysis

The paper addresses a fundamental issue in Bayesian statistics and offers researchers and practitioners a practical tool for handling model misspecification. The advance is incremental because it builds on existing tempering methods.

The paper tackles the problem of model misspecification in Bayesian inference, where posteriors can become overly concentrated on incorrect parameters, by using probabilistic classifiers to estimate the divergence between the model and true data-generating process, enabling diagnostics and generalized Bayesian updates without access to the true model.

Model misspecification is a long-standing enigma of the Bayesian inference framework, as posteriors tend to become overly concentrated on ill-informed parameter values in the large-sample limit. Tempering of the likelihood has been established as a safer way to update from prior to posterior in the presence of model misspecification. At one extreme, tempering ignores the data altogether; at the other, it recovers the standard Bayes' update, which assumes that no misspecification is present. However, it remains an open issue how best to recognize misspecification and choose a suitable level of tempering without access to the true generating model. Here we show how probabilistic classifiers can be employed to resolve this issue. Training a probabilistic classifier to discriminate between simulated and observed data provides an estimate of the ratio between the model likelihood and the likelihood of the data under the unobserved true generative process, within the discriminatory abilities of the classifier. The expectation of the logarithm of this ratio with respect to the data-generating process gives an estimate of the negative Kullback-Leibler divergence between the statistical generative model and the true generative distribution. Using a set of canonical examples, we show that this divergence provides a useful misspecification diagnostic, a model comparison tool, and a method to inform a generalised Bayesian update in the presence of misspecification for likelihood-based models.
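The classifier-based estimate described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a one-dimensional toy problem, a hand-rolled logistic regression standing in for the probabilistic classifier, and equal numbers of simulated and observed training points so that the classifier's log-odds approximate log p_model(x) / p_true(x). All function and variable names (fit_logistic_1d, estimate_neg_kl, sim_bad, sim_good) are hypothetical.

```python
import math
import random


def fit_logistic_1d(xs, ys, n_iter=15):
    """Fit P(y=1 | x) = sigmoid(a*x + b) by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient and Hessian of the negative log-likelihood.
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            w = p * (1.0 - p)
            ga += (p - y) * x
            gb += p - y
            haa += w * x * x
            hab += w * x
            hbb += w
        # Newton step: solve the 2x2 system H @ step = g.
        det = haa * hbb - hab * hab
        a -= (hbb * ga - hab * gb) / det
        b -= (haa * gb - hab * ga) / det
    return a, b


def estimate_neg_kl(observed, simulated):
    """Estimate E_true[log p_model(x) / p_true(x)] = -KL(true || model).

    Label 1 = simulated from the model, label 0 = observed data.
    Assumes len(simulated) >= len(observed) // 2.
    """
    half = len(observed) // 2
    train_obs, test_obs = observed[:half], observed[half:]
    sim = simulated[:half]  # balanced classes: log-odds approximate log p/q
    xs = sim + train_obs
    ys = [1.0] * len(sim) + [0.0] * len(train_obs)
    a, b = fit_logistic_1d(xs, ys)
    # Average the estimated log-ratio over held-out observed data.
    return sum(a * x + b for x in test_obs) / len(test_obs)


rng = random.Random(0)
n = 20000
# Toy assumption: the true process is N(0, 1).
observed = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Misspecified model N(1, 1): analytic KL(true || model) = 0.5.
sim_bad = [rng.gauss(1.0, 1.0) for _ in range(n // 2)]
# Well-specified model N(0, 1): analytic KL = 0.
sim_good = [rng.gauss(0.0, 1.0) for _ in range(n // 2)]

est_misspecified = estimate_neg_kl(observed, sim_bad)
est_well_specified = estimate_neg_kl(observed, sim_good)
print("misspecified model, estimated -KL:", est_misspecified)
print("well-specified model, estimated -KL:", est_well_specified)
```

In this toy setting the log-ratio is exactly linear in x, so logistic regression is a well-matched classifier. For richer models the estimate is only as good as the classifier's ability to discriminate, which is the caveat the abstract states. An estimate near zero suggests the classifier cannot tell simulated from observed data, while a clearly negative value signals misspecification.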

