LG · May 8

Direct Bethe Free Energy Minimization for Bayesian Neural Network

arXiv:2605.084463.2

AI Analysis

For practitioners of Bayesian deep learning, this work offers a principled alternative to variational inference that eliminates the structural Jensen gap and enables single-pass empirical Bayes, though the improvements are incremental and limited to last-layer Gaussian approximations.

The paper proposes training Bayesian neural networks by directly minimizing the Bethe free energy, which yields analytically tractable losses for last-layer Gaussian posteriors and removes the Jensen gap inherent in variational methods. On 8 UCI regression and 12 UCI classification benchmarks, the method is competitive with standard reference methods at single-pass cost, and joint empirical Bayes matches grid-search cross-validation of prior precision on nearly all datasets.
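To make the single-pass empirical Bayes idea concrete, here is a minimal NumPy sketch for the Gaussian-likelihood, last-layer case. It assumes the objective is the standard log marginal likelihood of a Bayesian linear model on fixed features, which is textbook material rather than the paper's own derivation or code. The feature matrix `Phi`, the known noise precision `beta`, the step size, and all function names are illustrative assumptions; in the paper's setting the features would come from the deterministic layers of the network.

```python
import numpy as np

def posterior(Phi, y, alpha, beta):
    """Gaussian posterior N(m, A^{-1}) over the last-layer weights."""
    D = Phi.shape[1]
    A = alpha * np.eye(D) + beta * Phi.T @ Phi   # posterior precision
    m = beta * np.linalg.solve(A, Phi.T @ y)     # posterior mean
    return m, A

def log_evidence(Phi, y, alpha, beta):
    """Log marginal likelihood log p(y | alpha, beta), weights integrated out."""
    N, D = Phi.shape
    m, A = posterior(Phi, y, alpha, beta)
    resid = y - Phi @ m
    _, logdet_A = np.linalg.slogdet(A)
    return (0.5 * D * np.log(alpha) + 0.5 * N * np.log(beta)
            - 0.5 * beta * resid @ resid - 0.5 * alpha * m @ m
            - 0.5 * logdet_A - 0.5 * N * np.log(2 * np.pi))

def log_evidence_direct(Phi, y, alpha, beta):
    """Same quantity computed from y ~ N(0, I/beta + Phi Phi^T / alpha)."""
    N = Phi.shape[0]
    C = np.eye(N) / beta + Phi @ Phi.T / alpha
    _, logdet_C = np.linalg.slogdet(C)
    return -0.5 * (y @ np.linalg.solve(C, y) + logdet_C + N * np.log(2 * np.pi))

def grad_log_alpha(Phi, y, alpha, beta):
    """Derivative of the log evidence with respect to log(alpha)."""
    D = Phi.shape[1]
    m, A = posterior(Phi, y, alpha, beta)
    return 0.5 * (D - alpha * (m @ m) - alpha * np.trace(np.linalg.inv(A)))

def fit_alpha(Phi, y, beta, alpha0=1.0, lr=0.1, steps=500):
    """Plain gradient ascent on log(alpha); no grid search, no outer loop."""
    log_alpha = np.log(alpha0)
    for _ in range(steps):
        log_alpha += lr * grad_log_alpha(Phi, y, np.exp(log_alpha), beta)
    return float(np.exp(log_alpha))

# Synthetic data: random features stand in for penultimate-layer activations.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))
w_true = 0.5 * rng.normal(size=5)
beta = 25.0                                      # assumed known noise precision
y = Phi @ w_true + rng.normal(size=50) / np.sqrt(beta)

alpha_hat = fit_alpha(Phi, y, beta)
print("learned prior precision:", alpha_hat)
print("log evidence at alpha=1:", log_evidence(Phi, y, 1.0, beta))
print("log evidence at alpha_hat:", log_evidence(Phi, y, alpha_hat, beta))
```

Because the prior precision enters the objective differentiably, it can be updated by the same kind of gradient step as any other parameter. In this toy example the posterior is recomputed in closed form at each step; the paper instead optimizes weights, covariance, and hyperparameters jointly.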

We propose training Bayesian neural networks by directly minimizing the Bethe free energy rather than maximizing a variational lower bound. On tree-structured factor graphs the Bethe free energy is exact; deterministic layers drop out of the objective and are trained by standard backpropagation, so the framework accommodates any mixture of probabilistic and deterministic subgraphs without modification. Restricting the weight posterior to a last-layer Gaussian yields analytically tractable losses: for a Gaussian likelihood the Bethe loss equals the exact marginal likelihood, and for a probit likelihood it reduces to a closed form via the probit-Gaussian convolution. Both objectives sit strictly between MAP and the ELBO ($L_{\text{MAP}} \leq L_{\text{Bethe}} \leq L_{\text{ELBO}}$), removing the structural Jensen gap that no choice of variational family can close. The Z-consistent prior formulation makes the prior precision a differentiable parameter, enabling empirical Bayes (joint optimization of weights, covariance, and hyperparameters) in a single gradient pass, with no cross-validation or outer loop. All variants admit a closed-form predictive at MAP-equivalent inference cost, in contrast to ensemble and sampling-based methods. On 8 UCI regression and 12 UCI classification benchmarks evaluated under a single shared hyperparameter regime, Bethe is competitive with standard reference methods at single-pass cost. Independently, joint single-pass empirical Bayes matches grid-search cross-validation of the prior precision on essentially all dataset-variant combinations, eliminating the outer hyperparameter loop without measurable cost. Isolated optimization gaps on a few datasets reflect numerical rather than principled limitations of the framework.
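The probit-Gaussian convolution and the Jensen gap mentioned in the abstract can be checked numerically. The sketch below uses the standard identity $\mathbb{E}_{a \sim \mathcal{N}(\mu, \sigma^2)}[\Phi(a)] = \Phi\big(\mu / \sqrt{1 + \sigma^2}\big)$ and compares the log of that expectation with the expected log-likelihood that appears in an ELBO data term. It illustrates the generic inequality $\log \mathbb{E}[f] \geq \mathbb{E}[\log f]$ for a single logit; treating these two quantities as stand-ins for the paper's Bethe and ELBO data terms is an assumption. The function names and the values of `mu` and `sigma` are illustrative.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF, elementwise; erfc keeps the lower tail accurate."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    return np.array([0.5 * math.erfc(-v / math.sqrt(2.0)) for v in z])

def gauss_expect(f, mu, sigma, n=64):
    """E[f(a)] for a ~ N(mu, sigma^2) by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n)
    return float(np.sum(w * f(mu + math.sqrt(2.0) * sigma * x)) / math.sqrt(math.pi))

def probit_evidence(mu, sigma):
    """Closed form of E[Phi(a)]: Phi(mu / sqrt(1 + sigma^2))."""
    return float(norm_cdf(mu / math.sqrt(1.0 + sigma ** 2))[0])

mu, sigma = 0.5, 1.0                      # illustrative logit mean and std

closed = probit_evidence(mu, sigma)       # analytic convolution
numeric = gauss_expect(norm_cdf, mu, sigma)

log_of_mean = math.log(closed)            # log E[Phi(a)]
mean_of_log = gauss_expect(lambda a: np.log(norm_cdf(a)), mu, sigma)  # E[log Phi(a)]
jensen_gap = log_of_mean - mean_of_log    # nonnegative by Jensen's inequality

print("closed form vs quadrature:", closed, numeric)
print("Jensen gap:", jensen_gap)
```

The gap is positive whenever `sigma > 0`, whatever Gaussian is chosen for the logit, which matches the abstract's point that it is structural rather than a property of a particular variational family.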
