ML LG ST CO May 20, 2024

General bounds on the quality of Bayesian coresets

arXiv:2405.11780v27.53 citations h-index: 12 NIPS

Originality: Incremental advance

AI Analysis

This work addresses a gap in the theory of Bayesian coresets, which are used to speed up posterior inference on large-scale data. It is incremental in that it extends existing analysis rather than introducing a new paradigm.

The authors tackle the limited theoretical analysis of Bayesian coresets by deriving general upper and lower bounds on the KL divergence of coreset approximations. The bounds apply to a wide range of models, not only to restrictive settings such as exponential families. The lower bounds are used to explain the poor performance of importance sampling-based construction methods, and the upper bounds to analyze subsample-optimize methods. Experiments with multimodal, unidentifiable, and heavy-tailed posterior distributions demonstrate the flexibility of the theory.

Bayesian coresets speed up posterior inference in the large-scale data regime by approximating the full-data log-likelihood function with a surrogate log-likelihood based on a small, weighted subset of the data. But while Bayesian coresets and methods for their construction are applicable in a wide range of models, existing theoretical analyses of the posterior inferential error incurred by coreset approximations apply only in restrictive settings, namely exponential family models or models with strong log-concavity and smoothness assumptions. This work presents general upper and lower bounds on the Kullback-Leibler (KL) divergence of coreset approximations that reflect the full range of applicability of Bayesian coresets. The lower bounds require only mild model assumptions typical of Bayesian asymptotic analyses, while the upper bounds require the log-likelihood functions to satisfy a generalized subexponentiality criterion that is weaker than conditions used in earlier work. The lower bounds are applied to obtain fundamental limitations on the quality of coreset approximations, and to provide a theoretical explanation for the previously observed poor empirical performance of importance sampling-based construction methods. The upper bounds are used to analyze the performance of recent subsample-optimize methods. The flexibility of the theory is demonstrated in validation experiments involving multimodal, unidentifiable, heavy-tailed Bayesian posterior distributions.
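To make the coreset idea in the abstract concrete, here is a minimal sketch on a toy model that is not taken from the paper: unit-variance Gaussian observations with a standard normal prior on the mean, chosen only because the posterior and the KL divergence are available in closed form. The function names (`gaussian_posterior`, `kl_gaussian`), the uniform subsampling with weights N/M, and the direction of the KL divergence are all illustrative assumptions, not the authors' construction or their bounds.

```python
import numpy as np


def gaussian_posterior(x, w):
    """Posterior N(mean, var) of theta for the toy model (an assumption):
    prior theta ~ N(0, 1), likelihood x_n ~ N(theta, 1), and a weighted
    log-likelihood sum_n w_n * log p(x_n | theta)."""
    precision = 1.0 + np.sum(w)
    mean = np.sum(w * x) / precision
    return mean, 1.0 / precision


def kl_gaussian(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate Gaussians."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


rng = np.random.default_rng(0)
N, M = 10_000, 100  # full data size and coreset size (illustrative values)
x = rng.normal(loc=1.0, scale=1.0, size=N)

# Full-data posterior: every point has weight 1.
full_mean, full_var = gaussian_posterior(x, np.ones(N))

# Hypothetical coreset: uniform subsample of M points, each weighted N / M,
# so the surrogate log-likelihood has the same total weight as the full one.
idx = rng.choice(N, size=M, replace=False)
weights = np.full(M, N / M)
core_mean, core_var = gaussian_posterior(x[idx], weights)

# KL divergence from the coreset posterior to the full posterior.
kl = kl_gaussian(core_mean, core_var, full_mean, full_var)
print(f"full posterior:    mean={full_mean:.4f}, var={full_var:.2e}")
print(f"coreset posterior: mean={core_mean:.4f}, var={core_var:.2e}")
print(f"KL(coreset || full) = {kl:.4f}")
```

Because the weights sum to N, the two posteriors share the same variance in this toy model, and the KL divergence reduces to the squared difference in posterior means divided by twice the posterior variance. The subsampling error in the mean is therefore amplified by a factor of order N, which gives some intuition for why plain subsampling with fixed weights can approximate the posterior poorly. The paper's actual bounds cover far more general models than this sketch.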
