MLLGOCApr 27, 2021

Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions

arXiv:2104.12949v3
AI Analysis

This is an incremental improvement for optimization in machine learning, offering a new algorithmic perspective on the stochastic Newton method.

The paper tackled the problem of minimizing the average of log-convex functions by contextualizing it as sequential Bayesian inference, resulting in a novel optimization algorithm that incorporates historical gradients and Hessians, with conditions established for momentum-like behavior.

To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively-specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes