OCDSLGMLJan 9, 2019

The Lingering of Gradients: Theory and Applications

arXiv:1901.02871v24 citations
Originality Incremental advance
AI Analysis

This work addresses efficiency improvements in optimization algorithms for machine learning practitioners, offering incremental theoretical and practical gains.

The paper tackles the problem of refining time complexity analysis for first-order methods by accounting for gradient 'lingering', where subsequent gradient computations are faster after an initial one, leading to improved convergence rates such as from 1/T to exp(-T^{1/3}) for gradient descent. It demonstrates applications in revenue management with 4.6m users achieving 10^{-6} error in 6 dataset passes and in SVM problems with two orders of magnitude better accuracy than state-of-the-art.

Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the `lingering' of gradients: once a gradient is computed at $x_k$, the additional time to compute gradients at $x_{k+1},x_{k+2},\dots$ may be reduced. We show how this improves the running time of several first-order methods. For instance, if the `additional time' scales linearly with respect to the traveled distance, then the `convergence rate' of gradient descent can be improved from $1/T$ to $\exp(-T^{1/3})$. On the application side, we solve a hypothetical revenue management problem on the Yahoo! Front Page Today Module with 4.6m users to $10^{-6}$ error using only 6 passes of the dataset; and solve a real-life support vector machine problem to an accuracy that is two orders of magnitude better comparing to the state-of-the-art algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes