ML AI LG
Dec 31, 2023

A Compact Representation for Bayesian Neural Networks By Removing Permutation Symmetry

arXiv:2401.00611v1
9.26 citations
h-index: 17
Has Code

Originality: Incremental advance

AI Analysis

This work improves interpretability and utility of BNNs for safety-critical applications by resolving a known bottleneck in sampling methods, though it is incremental as it builds on existing rebasin techniques.

The paper addresses the problem of interpreting Bayesian neural network (BNN) samples drawn with Hamiltonian Monte Carlo (HMC). Because of permutation symmetry, summary statistics such as the sample mean and variance are meaningless in weight space. The authors quantify the role of permutations with a number-of-transpositions metric and use the rebasin method to summarize HMC samples into a compact representation that gives a meaningful uncertainty estimate for each weight. This enables direct comparison across inference methods and efficient pruning of networks trained without an explicit Bayesian framework.
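To make the idea of counting transpositions concrete, here is a minimal sketch in Python. It assumes the standard group-theoretic definition, the minimum number of pairwise swaps needed to turn a permutation into the identity, which equals the number of elements minus the number of cycles. The function name `count_transpositions` is hypothetical, and the paper's exact metric may be defined differently.

```python
def count_transpositions(perm):
    """Minimum number of swaps needed to sort `perm`.

    `perm` is a permutation of 0..n-1 given as a list, e.g. the order in
    which hidden units of one network map onto those of another.
    The result equals n minus the number of cycles in the permutation.
    """
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if seen[start]:
            continue
        # Walk the cycle that contains `start`, marking each element.
        cycles += 1
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
    return n - cycles


print(count_transpositions([0, 1, 2, 3]))  # identity: 0
print(count_transpositions([1, 0, 2, 3]))  # a single swap: 1
print(count_transpositions([1, 2, 3, 0]))  # one 4-cycle: 3
```

Under this assumed definition, a value of 0 means two networks already agree on the ordering of their hidden units, and larger values mean more units are out of place.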

Bayesian neural networks (BNNs) are a principled approach to modeling predictive uncertainties in deep learning, which are important in safety-critical applications. Since exact Bayesian inference over the weights in a BNN is intractable, various approximate inference methods exist, among which sampling methods such as Hamiltonian Monte Carlo (HMC) are often considered the gold standard. While HMC provides high-quality samples, it lacks interpretable summary statistics because its sample mean and variance are meaningless in neural networks due to permutation symmetry. In this paper, we first show that the role of permutations can be meaningfully quantified by a number-of-transpositions metric. We then show that the recently proposed rebasin method allows us to summarize HMC samples into a compact representation that provides a meaningful explicit uncertainty estimate for each weight in a neural network, thus unifying sampling methods with variational inference. We show that this compact representation allows us to compare trained BNNs directly in weight space across sampling methods and variational inference, and to efficiently prune neural networks trained without explicit Bayesian frameworks by exploiting uncertainty estimates from HMC.
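The toy example below illustrates why permutation symmetry breaks weight-space averages and how aligning hidden units repairs them. It is a sketch under stated assumptions, not the paper's procedure: the network is a hypothetical one-hidden-layer tanh model without biases, the helper names (`forward`, `permute`, `align`) are invented for illustration, and an exhaustive search over permutations stands in for the rebasin method, which is only feasible for a handful of hidden units.

```python
import math
from itertools import permutations


def forward(w_in, w_out, x):
    """One-hidden-layer tanh network with scalar input and output, no biases."""
    return sum(v * math.tanh(w * x) for w, v in zip(w_in, w_out))


def permute(w_in, w_out, perm):
    """Reorder hidden units; the function the network computes is unchanged."""
    return [w_in[i] for i in perm], [w_out[i] for i in perm]


def align(w_in, w_out, ref_in, ref_out):
    """Brute-force stand-in for rebasin: pick the hidden-unit ordering whose
    weights are closest (squared distance) to a reference network."""
    def dist(p):
        a, b = permute(w_in, w_out, p)
        return sum((x - y) ** 2 for x, y in zip(a + b, ref_in + ref_out))

    best = min(permutations(range(len(w_in))), key=dist)
    return permute(w_in, w_out, best)


# Two "samples" that are the same network up to a reordering of hidden units.
ref_in, ref_out = [1.0, -2.0, 0.5], [0.3, 0.7, -1.1]
sample_in, sample_out = permute(ref_in, ref_out, [2, 0, 1])

# Both weight settings compute the same output.
same_function = abs(forward(ref_in, ref_out, 0.8)
                    - forward(sample_in, sample_out, 0.8)) < 1e-12

# Averaging the raw weights mixes unrelated hidden units.
naive_mean_in = [(a + b) / 2 for a, b in zip(ref_in, sample_in)]

# Averaging after alignment recovers the shared weights.
aligned_in, aligned_out = align(sample_in, sample_out, ref_in, ref_out)
aligned_mean_in = [(a + b) / 2 for a, b in zip(ref_in, aligned_in)]

print(same_function)
print(naive_mean_in)
print(aligned_mean_in)
```

The naive mean describes a network that neither sample represents, whereas the mean taken after alignment matches the weights both samples share. The same reasoning applies to per-weight variances, which is what makes a per-weight uncertainty estimate meaningful once the symmetry is removed.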
