Stochastic Weight Sharing for Bayesian Neural Networks
This work addresses the problem of making Bayesian Neural Networks (BNNs) practical for large-scale computer vision applications. It is a novel method for a known bottleneck rather than a paradigm shift.
The paper tackles the computational burden and convergence issues of Bayesian Neural Networks (BNNs) by introducing a stochastic weight-sharing method that compresses model parameters by approximately 50x and reduces model size by 75%. This enables efficient Bayesian training of large-scale models such as ResNet-101 and Vision Transformer (ViT) while maintaining accuracy and uncertainty estimates comparable to the state of the art.
While offering a principled framework for uncertainty quantification in deep learning, the employment of Bayesian Neural Networks (BNNs) is still constrained by their increased computational requirements and by convergence difficulties when training very deep, state-of-the-art architectures. In this work, we reinterpret weight-sharing quantization techniques from a stochastic perspective in the context of training and inference with BNNs. Specifically, we leverage 2D adaptive Gaussian distributions, Wasserstein distance estimations, and alpha blending to encode the stochastic behaviour of a BNN in a lower-dimensional, soft Gaussian representation. Through extensive empirical investigation, we demonstrate that our approach reduces the computational overhead inherent in Bayesian learning by several orders of magnitude, enabling the efficient Bayesian training of large-scale models such as ResNet-101 and Vision Transformer (ViT). On various computer vision benchmarks, including CIFAR10, CIFAR100, and ImageNet1k, our approach compresses model parameters by approximately 50x and reduces model size by 75%, while achieving accuracy and uncertainty estimations comparable to the state of the art.
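To make two of the ingredients named above concrete, the following is a minimal sketch, not the paper's implementation. It assumes Gaussians with diagonal covariance, for which the 2-Wasserstein distance has the closed form sqrt(||m_a - m_b||^2 + ||s_a - s_b||^2), and it treats alpha blending as a plain convex combination. The function names `w2_gaussian_diag` and `alpha_blend` are hypothetical, and how the paper actually combines these pieces is not stated in the abstract.

```python
import math


def w2_gaussian_diag(mean_a, std_a, mean_b, std_b):
    """2-Wasserstein distance between two diagonal-covariance Gaussians.

    Assumption: covariances are diagonal, so the distance reduces to the
    Euclidean distance between the stacked (mean, std) vectors.
    """
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    std_term = sum((sa - sb) ** 2 for sa, sb in zip(std_a, std_b))
    return math.sqrt(mean_term + std_term)


def alpha_blend(value_a, value_b, alpha):
    """Convex combination alpha * a + (1 - alpha) * b, with alpha in [0, 1]."""
    return alpha * value_a + (1.0 - alpha) * value_b


# Hypothetical example: compare one weight's Gaussian with a shared Gaussian,
# then softly mix the two means instead of hard-assigning the weight.
distance = w2_gaussian_diag([0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [1.0, 5.0])
blended_mean = alpha_blend(2.0, 4.0, 0.25)
```

A small distance would suggest that a weight's distribution is well represented by the shared Gaussian, while a large one flags it as a poor fit; the blend illustrates a soft rather than hard assignment.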