A Principled Bayesian Framework for Training Binary and Spiking Neural Networks
This work addresses the challenge of training binary and spiking neural networks robustly and efficiently, which matters for low-power AI applications. It may appear incremental, since it builds on existing gradient estimators and variational inference methods.
The authors tackle the training of binary and spiking neural networks, where existing methods are often heuristic and sensitive to hyperparameters. They propose a Bayesian framework that achieves state-of-the-art performance without normalisation layers, as demonstrated on CIFAR-10, DVS Gesture, and SHD.
We propose a Bayesian framework for training binary and spiking neural networks that achieves state-of-the-art performance without normalisation layers. Unlike commonly used surrogate gradient methods -- often heuristic and sensitive to hyperparameter choices -- our approach is grounded in a probabilistic model of noisy binary networks, enabling fully end-to-end gradient-based optimisation. We introduce importance-weighted straight-through (IW-ST) estimators, a unified class generalising straight-through and relaxation-based estimators. We characterise the bias-variance trade-off in this family and derive a bias-minimising objective implemented via an auxiliary loss. Building on this, we introduce Spiking Bayesian Neural Networks (SBNNs), a variational inference framework that uses posterior noise to train Binary and Spiking Neural Networks with IW-ST. This Bayesian approach minimises gradient bias, regularises parameters, and introduces dropout-like noise. By linking low-bias conditions, vanishing gradients, and the KL term, we enable training of deep residual networks without normalisation. Experiments on CIFAR-10, DVS Gesture, and SHD show our method matches or exceeds existing approaches without normalisation or hand-tuned gradients.
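The bias that the abstract refers to can be illustrated on a single noisy binary neuron. The sketch below is not the IW-ST estimator itself, whose exact form is not given here. It assumes a plain straight-through estimator with a sigmoid-derivative surrogate, a neuron that fires with probability sigmoid(a), and two hypothetical toy losses. Because the neuron has only two outcomes, both the exact gradient of the expected loss and the expectation of the straight-through estimate can be computed in closed form by enumeration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def exact_grad(loss, a):
    """Gradient w.r.t. a of E[loss(b)], where b ~ Bernoulli(sigmoid(a))."""
    p = sigmoid(a)
    return p * (1.0 - p) * (loss(1.0) - loss(0.0))


def expected_st_grad(dloss_db, a):
    """Expected straight-through estimate: E[dloss/db(b)] * d sigmoid(a)/da.

    Assumption: the backward pass replaces the derivative of the hard
    threshold with the derivative of the firing probability.
    """
    p = sigmoid(a)
    dp_da = p * (1.0 - p)
    return dp_da * (p * dloss_db(1.0) + (1.0 - p) * dloss_db(0.0))


def st_bias(loss, dloss_db, a):
    """Bias of the straight-through estimator for a given pre-activation a."""
    return expected_st_grad(dloss_db, a) - exact_grad(loss, a)


# Hypothetical toy losses, chosen only for illustration.
TARGET = 0.2


def linear_loss(b):
    return 3.0 * b


def linear_loss_grad(b):
    return 3.0


def quadratic_loss(b):
    return (b - TARGET) ** 2


def quadratic_loss_grad(b):
    return 2.0 * (b - TARGET)


if __name__ == "__main__":
    for a in (0.0, 1.5):
        print(
            f"a={a}: "
            f"linear bias={st_bias(linear_loss, linear_loss_grad, a):.6f}, "
            f"quadratic bias={st_bias(quadratic_loss, quadratic_loss_grad, a):.6f}"
        )
```

Under these assumptions the estimator is unbiased for the linear loss at any pre-activation. For the quadratic loss the bias equals sigmoid'(a) · (2·sigmoid(a) − 1), which vanishes only when the firing probability is 0.5. This is the kind of low-bias condition that a bias-minimising objective would aim to encourage.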