Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization
This work addresses the energy cost of inference in large neural networks. It offers a method that improves compressibility and accuracy over existing quantization techniques, though its advance over prior weight-sharing approaches is incremental.
The paper tackles weight-sharing quantization in neural networks by proposing a probabilistic framework that uses Bayesian neural networks and a variational relaxation to assign weights to cluster centers based on position-specific uncertainty distributions. On ImageNet with DeiT-Tiny, it achieves 1.6% higher top-1 accuracy than the state-of-the-art quantization method while reducing the model's more than 5 million weights to only 296 unique values.
Weight-sharing quantization has emerged as a technique to reduce energy expenditure during inference in large neural networks by constraining their weights to a limited set of values. However, existing methods for weight-sharing quantization often treat weights according to their value alone, neglecting the unique role that weight position plays. This paper proposes a probabilistic framework based on Bayesian neural networks (BNNs) and a variational relaxation to identify which weights can be moved to which cluster centre, and to what degree, based on their individual position-specific learned uncertainty distributions. We introduce a new initialisation setting and a regularisation term which allow for the training of BNNs under complex dataset-model combinations. By leveraging the flexibility of weight values captured through a probability distribution, we enhance noise resilience and downstream compressibility. Our iterative clustering procedure demonstrates superior compressibility and higher accuracy compared to state-of-the-art methods on both ResNet models and the more complex transformer-based architectures. In particular, our method improves on the top-1 accuracy of the state-of-the-art quantization method by 1.6% on ImageNet using DeiT-Tiny, with its 5 million+ weights now represented by only 296 unique values.
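The idea of moving a weight to a cluster centre only when its own uncertainty allows it can be illustrated with a minimal sketch. This is not the paper's procedure: the function name `fix_weights`, the threshold multiplier `k`, and the hand-picked means, standard deviations, and centres are all hypothetical, and the rule used here (snap a weight to its nearest centre if that centre lies within `k` standard deviations of the weight's mean) is an assumed simplification. In the paper the uncertainties are learned through BNN training, whereas here they are simply given.

```python
def fix_weights(mu, sigma, centres, k=1.0):
    """Snap each weight to its nearest centre if that centre lies within
    k standard deviations of the weight's mean; otherwise leave it free.

    Hypothetical rule for illustration only, not the paper's algorithm.
    """
    new_weights, fixed = [], []
    for m, s in zip(mu, sigma):
        # Nearest candidate centre for this weight's mean value.
        nearest = min(centres, key=lambda c: abs(m - c))
        # The per-weight uncertainty decides whether the move is tolerated.
        can_fix = abs(m - nearest) <= k * s
        new_weights.append(nearest if can_fix else m)
        fixed.append(can_fix)
    return new_weights, fixed


# Assumed toy values: per-weight means, per-weight standard deviations,
# and a small set of shared cluster centres.
mu = [0.24, 0.26, -0.49, 0.10]
sigma = [0.05, 0.001, 0.02, 0.2]
centres = [-0.5, 0.0, 0.25]

new_w, fixed = fix_weights(mu, sigma, centres)
print(new_w)
print(fixed)
```

The first two weights sit at almost the same distance from the centre 0.25, yet only the first is moved, because the second has a much smaller standard deviation. A purely value-based rule would treat the two identically, which is the distinction the abstract draws between value and position.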