Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions
This is an incremental improvement for optimization in machine learning, offering a new algorithmic perspective on the stochastic Newton method.
The paper tackled the problem of minimizing the average of log-convex functions by contextualizing it as sequential Bayesian inference, resulting in a novel optimization algorithm that incorporates historical gradients and Hessians, with conditions established for momentum-like behavior.
To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively-specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.