ML · LG · Jul 9, 2021

The Bayesian Learning Rule

arXiv:2107.04562v4 · 118 citations
AI Analysis

This work provides a foundational unification of diverse ML algorithms, potentially benefiting researchers and practitioners across optimization, deep learning, and graphical models.

The authors unify many machine learning algorithms under a single Bayesian learning rule derived from Bayesian principles. They show that classical and modern algorithms such as ridge regression, Newton's method, stochastic-gradient descent, and Dropout are specific instances of this rule, which helps generalize existing algorithms and design new ones.

We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
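
To make the abstract's key idea concrete, here is a minimal sketch, not the authors' code. It assumes a one-dimensional Gaussian candidate q = N(m, 1/s) and a delta approximation, in which expectations under q are replaced by values at the mean m. Under those assumptions, the natural-gradient update is commonly written as a precision update followed by a preconditioned mean update, and a step size of 1 gives a Newton step. The function name `blr_gaussian_step`, the toy data, and the penalty `delta` are all hypothetical, chosen only for illustration.

```python
def blr_gaussian_step(m, s, grad, hess, rho):
    """One update of a 1-D Gaussian candidate q = N(m, 1/s).

    Assumption: delta approximation, so the expected gradient and Hessian
    under q are replaced by their values at the mean m.
    """
    s_new = (1.0 - rho) * s + rho * hess(m)  # precision update
    m_new = m - rho * grad(m) / s_new        # mean update, scaled by the new precision
    return m_new, s_new


# Toy scalar ridge regression (made-up data, for illustration only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]
delta = 0.5  # ridge penalty strength (hypothetical value)


def grad(theta):
    # Gradient of 0.5 * sum((y - x*theta)^2) + 0.5 * delta * theta^2
    return sum(x * (x * theta - y) for x, y in zip(xs, ys)) + delta * theta


def hess(theta):
    # Hessian of the same loss; constant because the loss is quadratic
    return sum(x * x for x in xs) + delta


# With rho = 1, a single step is a Newton step and solves the quadratic exactly.
m_newton, s_newton = blr_gaussian_step(m=0.0, s=1.0, grad=grad, hess=hess, rho=1.0)
ridge_closed_form = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + delta)

# With rho < 1, the same rule takes damped steps toward the same solution.
m_damped, s_damped = 0.0, 1.0
for _ in range(200):
    m_damped, s_damped = blr_gaussian_step(m_damped, s_damped, grad, hess, rho=0.1)
```

In this sketch, `m_newton` matches `ridge_closed_form` after one step, which shows in the simplest case how ridge regression and Newton's method can fall out of a single update. Replacing the Hessian with cheaper curvature estimates would be one example of the "further approximations" the abstract mentions, though that is not shown here.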

