Differentiable PAC-Bayes Objectives with Partially Aggregated Neural Networks
This work addresses the difficulty of training stochastic neural networks, offering incremental improvements in gradient estimation and bound tightness that are of interest to researchers in PAC-Bayesian learning.
The paper tackles the challenge of training stochastic neural networks in a PAC-Bayesian setting by introducing partially-aggregated estimators. These enable lower-variance gradient estimates and a directly optimizable, differentiable objective with a generalization guarantee that is twice as tight as that of Letarte et al. (2019) on a similar network type.
We make three related contributions motivated by the challenge of training stochastic neural networks, particularly in a PAC-Bayesian setting: (1) we show how averaging over an ensemble of stochastic neural networks enables a new class of partially-aggregated estimators; (2) we show that these lead to provably lower-variance gradient estimates for non-differentiable signed-output networks; (3) we reformulate a PAC-Bayesian bound for these networks to derive a directly optimisable, differentiable objective and a generalisation guarantee, without using a surrogate loss or loosening the bound. This bound is twice as tight as that of Letarte et al. (2019) on a similar network type. We show empirically that these innovations make training easier and lead to competitive guarantees.
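The idea behind contributions (1) and (2) can be illustrated with a minimal sketch. It assumes a two-layer sign-activation network whose weights are independent Gaussians with unit variance and means `means1` (hidden layer) and `m2` (output layer). This architecture and every name below (`sample_hidden`, `aggregated_value`, and so on) are hypothetical choices for illustration, not the paper's exact construction. A fully stochastic estimate samples every layer and outputs -1 or +1, so it is piecewise constant in the weight means and has zero pathwise gradient almost everywhere. A partially-aggregated estimate samples only the hidden layer and integrates out the final layer in closed form. With `w2 ~ N(m2, I)`, the pre-activation `w2 . h` is distributed as `N(m2 . h, ||h||^2)`, so `E[sign(w2 . h)] = erf(m2 . h / (sqrt(2) ||h||))`. Because this replaces a random quantity by its conditional expectation, the law of total variance guarantees it cannot increase variance, and the result is differentiable in `m2`.

```python
import math
import random
import statistics


def sign(v):
    # Signed activation; exact zeros have probability zero under Gaussian weights.
    return 1.0 if v >= 0 else -1.0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sample_hidden(x, means1, rng):
    """Draw each row of W1 from N(row_mean, I) and return h = sign(W1 x)."""
    return [sign(dot([mu + rng.gauss(0.0, 1.0) for mu in row], x)) for row in means1]


def full_sample_output(x, means1, m2, rng):
    """Fully stochastic estimate: sample every layer; the output is -1 or +1."""
    h = sample_hidden(x, means1, rng)
    w2 = [mu + rng.gauss(0.0, 1.0) for mu in m2]
    return sign(dot(w2, h))


def aggregated_value(h, m2):
    """E[sign(w2 . h)] for w2 ~ N(m2, I), using w2 . h ~ N(m2 . h, ||h||^2)."""
    scale = math.sqrt(2.0) * math.sqrt(dot(h, h))
    return math.erf(dot(m2, h) / scale)


def aggregated_grad_m2(h, m2):
    """Gradient of aggregated_value(h, m2) with respect to m2, holding h fixed."""
    scale = math.sqrt(2.0) * math.sqrt(dot(h, h))
    z = dot(m2, h) / scale
    # d/du erf(u) = 2/sqrt(pi) * exp(-u^2), and du/dm2 = h / scale.
    coeff = 2.0 / math.sqrt(math.pi) * math.exp(-z * z) / scale
    return [coeff * hi for hi in h]


def partially_aggregated_output(x, means1, m2, rng):
    """Sample the hidden layer only; average the final sign layer in closed form."""
    return aggregated_value(sample_hidden(x, means1, rng), m2)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.5, -1.0, 2.0]
    means1 = [[0.2, -0.1, 0.3], [-0.4, 0.5, 0.1], [0.0, 0.3, -0.2], [0.6, -0.2, 0.0]]
    m2 = [0.3, -0.2, 0.5, 0.1]
    full = [full_sample_output(x, means1, m2, rng) for _ in range(5000)]
    agg = [partially_aggregated_output(x, means1, m2, rng) for _ in range(5000)]
    print("means:", statistics.mean(full), statistics.mean(agg))
    print("variances:", statistics.pvariance(full), statistics.pvariance(agg))
```

Both estimators target the same expected network output, but the aggregated one has far smaller sample variance, and `aggregated_grad_m2` supplies a usable gradient where the fully sampled sign output has none. This sketch only aggregates the last layer and only differentiates with respect to its means. It does not reproduce the paper's gradient estimator for earlier layers or its PAC-Bayesian objective.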