Self-Compression in Bayesian Neural Networks
This paper addresses the challenge of deploying efficient models on edge devices, though it appears incremental because it builds on existing Bayesian methods for compression.
The paper tackles the problem of high computation and storage costs in machine learning models for edge deployment by proposing that Bayesian neural networks can automatically identify redundant parameters for compression, achieving the same accuracy with reduced memory usage.
Machine learning models have achieved human-level performance on various tasks. This success comes at a high cost in computation and storage, which makes machine learning algorithms difficult to deploy on edge devices. Typically, one has to sacrifice some accuracy in exchange for efficiency, measured as reduced memory usage and energy consumption. Current methods compress networks either by reducing the precision of the parameters or by eliminating redundant ones. In this paper, we offer a new insight into network compression through the Bayesian framework. We show that Bayesian neural networks automatically discover redundancy in model parameters, thus enabling self-compression. This self-compression is linked to the propagation of uncertainty through the layers of the network. Our experimental results show that the network architecture can be successfully compressed by deleting parameters identified by the network itself while retaining the same level of accuracy.
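To make the idea of deleting parameters that the network itself marks as redundant more concrete, the following is a minimal sketch and not the paper's procedure. It assumes a mean-field Gaussian posterior with one mean and one standard deviation per weight, and it uses a signal-to-noise ratio |mean| / std as the redundancy criterion; the abstract does not state which criterion the authors use. The function names, the threshold, and the numeric values are all hypothetical.

```python
def snr_keep_mask(means, stds, threshold=1.0):
    """Return True for weights whose |mean| / std exceeds the threshold.

    ASSUMPTION: signal-to-noise ratio is used as the redundancy criterion.
    The source abstract does not specify the criterion.
    """
    eps = 1e-12  # guards against division by zero for a degenerate std
    return [abs(m) / max(s, eps) > threshold for m, s in zip(means, stds)]


def compress(means, stds, threshold=1.0):
    """Zero out the posterior means of weights flagged as redundant."""
    mask = snr_keep_mask(means, stds, threshold)
    pruned = [m if keep else 0.0 for m, keep in zip(means, mask)]
    kept_fraction = sum(mask) / len(mask)
    return pruned, kept_fraction


# Hypothetical posterior parameters for six weights (illustrative values only).
means = [0.9, -0.02, 0.5, 0.01, -1.2, 0.03]
stds = [0.1, 0.5, 0.2, 0.4, 0.3, 0.6]

pruned, kept_fraction = compress(means, stds, threshold=1.0)
print(pruned)         # weights with low signal-to-noise ratio are set to 0.0
print(kept_fraction)  # fraction of weights that survive pruning
```

In this toy setting, weights whose posterior mean is small relative to its uncertainty are removed, while confidently non-zero weights are kept. Whether accuracy is preserved after such pruning has to be checked empirically on the model at hand.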