MEAILGSPSTFeb 13, 2024

Perturbative partial moment matching and gradient-flow adaptive importance sampling transformations for Bayesian leave one out cross-validation

arXiv:2402.08151v31 citationsh-index: 31
AI Analysis

This work addresses a computational bottleneck for Bayesian modelers by providing efficient methods for cross-validation, though it is incremental as it builds on existing transformation techniques.

The paper tackles the problem of unstable importance sampling weights in Bayesian leave-one-out cross-validation by introducing perturbative transformations, specifically partial moment matching and gradient flow methods, which successfully stabilize the weights without requiring model refitting.

Importance sampling (IS) allows one to approximate leave one out (LOO) cross-validation for a Bayesian model, without refitting, by inverting the Bayesian update equation to subtract a given data point from a model posterior. For each data point, one computes expectations under the corresponding LOO posterior by weighted averaging over the full data posterior. This task sometimes requires weight stabilization in the form of adapting the posterior distribution via transformation. So long as one is successful in finding a suitable transformation, one avoids refitting. To this end, we motivate the use of bijective perturbative transformations of the form $T(\boldsymbolθ)=\boldsymbolθ + h Q(\boldsymbolθ),$ for $0<h\ll 1,$ and introduce two classes of such transformations: 1) partial moment matching and 2) gradient flow evolution. The former extends prior literature on moment-matching under the recognition that adaptation for LOO is a small perturbation on the full data posterior. The latter class of methods define transformations based on relaxing various statistical objectives: in our case the variance of the IS estimator and the KL divergence between the transformed distribution and the statistics of the LOO fold. Being model-specific, the gradient flow transformations require evaluating Jacobian determinants. While these quantities are generally readily available through auto-differentiation, we derive closed-form expressions in the case of logistic regression and shallow ReLU activated neural networks. We tested the methodology on an $n\ll p$ dataset that is known to produce unstable LOO IS weights.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes