LGMar 28, 2015

A Variance Reduced Stochastic Newton Method

arXiv:1503.08316v452 citations
AI Analysis

This addresses a bottleneck in optimization for machine learning practitioners by enhancing the practical performance of stochastic Quasi-Newton methods, though it is incremental as it builds on existing variance reduction techniques.

The paper tackles the problem of high variance in stochastic Quasi-Newton methods for large-scale convex loss minimization, proposing Vite, a novel algorithm that uses first-order variance reduction to achieve geometric convergence with constant step-size for smooth strongly convex functions, and empirically shows improvements over existing methods.

Quasi-Newton methods are widely used in practise for convex loss minimization problems. These methods exhibit good empirical performance on a wide variety of tasks and enjoy super-linear convergence to the optimal solution. For large-scale learning problems, stochastic Quasi-Newton methods have been recently proposed. However, these typically only achieve sub-linear convergence rates and have not been shown to consistently perform well in practice since noisy Hessian approximations can exacerbate the effect of high-variance stochastic gradient estimates. In this work we propose Vite, a novel stochastic Quasi-Newton algorithm that uses an existing first-order technique to reduce this variance. Without exploiting the specific form of the approximate Hessian, we show that Vite reaches the optimum at a geometric rate with a constant step-size when dealing with smooth strongly convex functions. Empirically, we demonstrate improvements over existing stochastic Quasi-Newton and variance reduced stochastic gradient methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes