ML, LG, OC, CO · Feb 1, 2017

On SGD's Failure in Practice: Characterizing and Overcoming Stalling

arXiv:1702.00317v2
AI Analysis

The paper addresses stalling, a crippling and generic limitation of SGD and its variants in practice, and thereby makes these methods more practical for minimization tasks.

The paper tackles the problem of SGD stalling before it reaches the empirical risk minimizer, which happens even in simple linear regression with unity condition number. It generalizes an existing framework that deters stalling while retaining convergence guarantees.

Stochastic Gradient Descent (SGD) is widely used in machine learning problems to efficiently perform empirical risk minimization, yet, in practice, SGD is known to stall before reaching the actual minimizer of the empirical risk. SGD stalling has often been attributed to its sensitivity to the conditioning of the problem; however, as we demonstrate, SGD will stall even when applied to a simple linear regression problem with unity condition number for standard learning rates. Thus, in this work, we numerically demonstrate and mathematically argue that stalling is a crippling and generic limitation of SGD and its variants in practice. Once we have established the problem of stalling, we generalize an existing framework for hedging against its effects, which (1) deters SGD and its variants from stalling, (2) still provides convergence guarantees, and (3) makes SGD and its variants more practical methods for minimization.
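
The phenomenon described in the abstract can be illustrated with a small experiment. The sketch below is not the paper's experiment and does not implement its framework. It assumes a one-dimensional least-squares problem with features equal to +1 or -1 (so the Hessian of the empirical risk is exactly 1 and the condition number is trivially unity), Gaussian label noise, and a constant learning rate of 0.1. All function names, data sizes, and seeds are hypothetical choices made for illustration. Under these assumptions the SGD iterate keeps fluctuating at a roughly fixed distance from the empirical risk minimizer instead of converging to it, which is one simple way an SGD run can stop making progress.

```python
import random


def make_data(n=200, w_true=2.0, noise=1.0, seed=0):
    """Synthetic 1-D regression data with features in {-1, +1} (assumed setup)."""
    rng = random.Random(seed)
    xs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    ys = [w_true * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def erm_minimizer(xs, ys):
    """Exact minimizer of (1/2n) * sum((w*x - y)^2); valid because x*x == 1."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)


def sgd_distances(xs, ys, w_hat, lr=0.1, steps=5000, seed=1):
    """Run constant-step SGD from w = 0 and record |w - w_hat| after each step."""
    rng = random.Random(seed)
    w = 0.0
    dists = []
    for _ in range(steps):
        i = rng.randrange(len(xs))          # sample one data point uniformly
        grad = (w * xs[i] - ys[i]) * xs[i]  # gradient of 0.5 * (w*x_i - y_i)^2
        w -= lr * grad
        dists.append(abs(w - w_hat))
    return dists


xs, ys = make_data()
w_hat = erm_minimizer(xs, ys)
dists = sgd_distances(xs, ys, w_hat)

# Average distance to the minimizer over the last 1000 iterations.
tail_mean = sum(dists[-1000:]) / 1000
print(f"ERM minimizer: {w_hat:.4f}")
print(f"mean |w - w_hat| over last 1000 steps: {tail_mean:.4f}")
```

Shrinking the learning rate lowers the plateau in this toy setting but also slows the initial progress, so a fixed iteration budget still ends short of the minimizer.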

