Masked Bayesian Neural Networks: Computation and Optimality
This work addresses the need to reduce complexity in large DNNs for practitioners, though it appears incremental as it builds on existing Bayesian neural network methods.
The authors tackle the problem of simplifying complex deep neural networks by proposing a novel sparse Bayesian neural network that uses masking variables to achieve node-wise sparsity. On benchmark datasets, the method yields well-condensed architectures whose prediction accuracy and uncertainty quantification are similar to those of large DNNs.
As data size and computing power increase, deep neural network (DNN) architectures have grown larger and more complex, and there is a growing need to simplify them. In this paper, we propose a novel sparse Bayesian neural network (BNN) that searches for a good DNN of appropriate complexity. We place a masking variable at each node; these variables can turn off nodes according to the posterior distribution, yielding a node-wise sparse DNN. We devise a prior distribution such that the posterior distribution has theoretical optimality properties (i.e., minimax optimality and adaptiveness), and we develop an efficient MCMC algorithm. By analyzing several benchmark datasets, we illustrate that the proposed BNN performs well compared to existing methods, in the sense that it discovers well-condensed DNN architectures whose prediction accuracy and uncertainty quantification are similar to those of large DNNs.
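To make the node-wise masking idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a fully connected ReLU network in which each hidden node's output is multiplied by a binary mask, and it shows that a masked network is equivalent to a smaller dense network with the switched-off nodes removed. The function names `masked_forward` and `prune`, the placement of the mask after the activation, and the random masks are all illustrative assumptions; in the paper the masks would be drawn from the posterior via MCMC, which is not reproduced here.

```python
import numpy as np


def masked_forward(x, weights, biases, masks):
    """Forward pass of a ReLU MLP whose hidden nodes are gated by 0/1 masks.

    weights[l] has shape (n_in, n_out); masks[l] has one entry per node of
    hidden layer l. Placing the mask after the activation is an assumption.
    """
    h = x
    for W, b, m in zip(weights[:-1], biases[:-1], masks):
        h = np.maximum(h @ W + b, 0.0) * m  # a masked node outputs exactly zero
    return h @ weights[-1] + biases[-1]  # linear output layer, no mask


def prune(weights, biases, masks):
    """Remove masked-off nodes, returning the equivalent smaller network."""
    weights = [W.copy() for W in weights]
    biases = [b.copy() for b in biases]
    for l, m in enumerate(masks):
        keep = np.asarray(m).astype(bool)
        weights[l] = weights[l][:, keep]          # drop outgoing columns of layer l
        biases[l] = biases[l][keep]
        weights[l + 1] = weights[l + 1][keep, :]  # drop matching rows of layer l + 1
    return weights, biases


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [4, 8, 8, 2]  # hypothetical layer widths
    weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [rng.normal(size=b) for b in sizes[1:]]
    # Random masks stand in for a posterior draw (illustration only).
    masks = [rng.integers(0, 2, size=n).astype(float) for n in sizes[1:-1]]

    x = rng.normal(size=(5, sizes[0]))
    small_w, small_b = prune(weights, biases, masks)
    all_on = [np.ones(W.shape[1]) for W in small_w[:-1]]

    same = np.allclose(
        masked_forward(x, weights, biases, masks),
        masked_forward(x, small_w, small_b, all_on),
    )
    print("pruned network matches masked network:", same)
    print("hidden widths after pruning:", [W.shape[1] for W in small_w[:-1]])
```

Because a masked node contributes nothing to the next layer, dropping its column in one weight matrix and the matching row in the next leaves the output unchanged, which is why node-wise sparsity translates directly into a narrower architecture.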