LG | ML | Feb 25, 2020

Training Binary Neural Networks using the Bayesian Learning Rule

arXiv:2002.10778v4 | 47 citations
AI Analysis

This work provides a principled justification for existing methods of training binary neural networks, which are computation-efficient and hardware-friendly.

The paper addresses the discrete optimization problem that arises when training binary neural networks. It proposes a principled approach based on the Bayesian learning rule, which obtains state-of-the-art performance and enables uncertainty estimation for continual learning to avoid catastrophic forgetting.

Neural networks with binary weights are computation-efficient and hardware-friendly, but their training is challenging because it involves a discrete optimization problem. Surprisingly, ignoring the discrete nature of the problem and using gradient-based methods, such as the Straight-Through Estimator, still works well in practice. This raises the question: are there principled approaches which justify such methods? In this paper, we propose such an approach using the Bayesian learning rule. The rule, when applied to estimate a Bernoulli distribution over the binary weights, results in an algorithm which justifies some of the algorithmic choices made by the previous approaches. The algorithm not only obtains state-of-the-art performance, but also enables uncertainty estimation for continual learning to avoid catastrophic forgetting. Our work provides a principled approach for training binary neural networks which justifies and extends existing approaches.
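
To make the Straight-Through Estimator mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's Bayesian learning rule algorithm. It only illustrates the heuristic the abstract refers to: the forward pass uses binarized weights, while the gradient is applied to real-valued latent weights as if the sign function were the identity. The function names, the toy regression task, the learning rate, and the clipping of latent weights to [-1, 1] are all assumptions made for illustration.

```python
from itertools import product


def sign(v):
    """Binarize a real number to +1.0 or -1.0 (zero maps to +1.0)."""
    return 1.0 if v >= 0.0 else -1.0


def train_ste(latent, inputs, targets, lr=0.1, steps=20):
    """Fit binary weights to a linear regression task with a straight-through update."""
    latent = list(latent)
    n = len(inputs)
    for _ in range(steps):
        # Forward pass uses the binarized weights, not the latent ones.
        binary = [sign(z) for z in latent]
        errors = [sum(b * x for b, x in zip(binary, row)) - y
                  for row, y in zip(inputs, targets)]
        for j in range(len(latent)):
            # Gradient of the mean squared error with respect to binary[j].
            grad = 2.0 * sum(e * row[j] for e, row in zip(errors, inputs)) / n
            # Straight-through step: treat d(sign)/d(latent) as 1 and apply
            # the gradient to the latent weight, then clip to [-1, 1].
            latent[j] = max(-1.0, min(1.0, latent[j] - lr * grad))
    return [sign(z) for z in latent]


# Hypothetical toy task: targets are generated by a known binary weight vector,
# and the inputs are all 16 sign patterns of length 4.
true_weights = [1.0, -1.0, 1.0, -1.0]
inputs = [list(row) for row in product([-1.0, 1.0], repeat=4)]
targets = [sum(w * x for w, x in zip(true_weights, row)) for row in inputs]

learned = train_ste([0.5, 0.5, -0.5, -0.5], inputs, targets)
print(learned)  # [1.0, -1.0, 1.0, -1.0]
```

On this toy task the two latent weights that start with the wrong sign are pushed across zero within a few steps, after which the gradient vanishes and the binarized weights match the generating ones. The sketch keeps point estimates only; the abstract's approach instead estimates a Bernoulli distribution over the binary weights, which is what makes uncertainty estimation possible.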

Code Implementations | 4 repos