OC, LG | Dec 30, 2021

Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size

arXiv:2112.14872v1 | 1 citation
Originality: Incremental advance
AI Analysis

The result is a theoretical improvement for optimization methods in machine learning, though it appears incremental because it builds on existing adaptive variants such as Adagrad and Adam.

The paper tackles the problem of establishing a fast convergence rate for stochastic gradient descent with adaptive step size, and proves local quadratic convergence for problems such as matrix inversion.

Establishing a fast rate of convergence for optimization methods is crucial to their applicability in practice. With the increasing popularity of deep learning over the past decade, stochastic gradient descent and its adaptive variants (e.g. Adagrad, Adam, etc.) have become prominent methods of choice for machine learning practitioners. While a large number of works have demonstrated that these first order optimization methods can achieve sub-linear or linear convergence, we establish local quadratic convergence for stochastic gradient descent with adaptive step size for problems such as matrix inversion.
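To make "quadratic convergence for matrix inversion" concrete, the sketch below runs the classical Newton-Schulz iteration, X_{k+1} = X_k (2I - A X_k), for which the residual R_k = I - A X_k satisfies R_{k+1} = R_k^2. This is a well-known deterministic scheme chosen only for illustration; it is not claimed to be the algorithm analysed in the paper, and the function names (`newton_schulz_inverse`, `residual_norm`, `matmul`) are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def residual_norm(a, x):
    """Frobenius norm of the residual I - A X."""
    ax = matmul(a, x)
    n = len(a)
    return math.sqrt(sum(((1.0 if i == j else 0.0) - ax[i][j]) ** 2
                         for i in range(n) for j in range(n)))


def newton_schulz_inverse(a, steps=12):
    """Approximate the inverse of a nonsingular square matrix A.

    Illustrative sketch only (assumption: not the paper's method).
    Returns the iterate and the residual norm after each step.
    """
    n = len(a)
    # Start from X_0 = A^T / (||A||_1 * ||A||_inf), a standard choice that
    # keeps the spectral radius of I - A X_0 below 1.
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = 1.0 / (norm_1 * norm_inf)
    x = [[a[j][i] * scale for j in range(n)] for i in range(n)]

    history = [residual_norm(a, x)]
    for _ in range(steps):
        ax = matmul(a, x)
        # correction = 2I - A X
        correction = [[(2.0 if i == j else 0.0) - ax[i][j]
                       for j in range(n)] for i in range(n)]
        x = matmul(x, correction)
        history.append(residual_norm(a, x))
    return x, history


if __name__ == "__main__":
    inverse, history = newton_schulz_inverse([[4.0, 1.0], [2.0, 3.0]])
    for step, err in enumerate(history):
        print(step, err)
```

Printing `history` shows the residual norm roughly squaring at every step once it drops below 1, until it reaches floating-point precision. A linearly convergent method would instead shrink the error by a fixed factor per step.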

