LG, OC, ML
Jul 1, 2020

On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

arXiv:2007.00534v1

19 citations
AI Analysis

This paper addresses convergence inefficiencies in optimization that matter to machine learning practitioners, though the contribution is incremental because it builds on prior diagnostic methods.

The paper tackles inefficient step-size scheduling in Stochastic Gradient Descent by proposing a new statistical procedure that detects the transition from the transient phase to the stationary phase, and it reports state-of-the-art performance on synthetic and real-world datasets.

Constant step-size Stochastic Gradient Descent exhibits two phases: a transient phase during which iterates make fast progress towards the optimum, followed by a stationary phase during which iterates oscillate around the optimal point. In this paper, we show that efficiently detecting this transition and appropriately decreasing the step size can lead to fast convergence rates. We analyse the classical statistical test proposed by Pflug (1983), based on the inner product between consecutive stochastic gradients. Even in the simple case where the objective function is quadratic we show that this test cannot lead to an adequate convergence diagnostic. We then propose a novel and simple statistical procedure that accurately detects stationarity and we provide experimental results showing state-of-the-art performance on synthetic and real-world datasets.
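
To make the inner-product idea concrete, here is a minimal Python sketch of a Pflug-style diagnostic on a least-squares problem. It is an illustration under stated assumptions, not the procedure proposed in the paper: the function name `pflug_sgd`, the burn-in length, the shrink factor, and the restart rule are all hypothetical choices. The sketch accumulates the inner products of consecutive stochastic gradients and, once the running sum turns negative after the burn-in, treats that as a sign of stationarity and decreases the step size.

```python
import numpy as np


def pflug_sgd(X, y, step=0.05, shrink=0.5, burn_in=50, n_iters=2000, seed=0):
    """Constant step-size SGD on least squares with a Pflug-style restart rule.

    All hyperparameters (step, shrink, burn_in) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    running_sum = 0.0   # sum of <g_k, g_{k+1}> since the last restart
    count = 0           # number of inner products accumulated so far
    prev_grad = None
    restarts = []       # iterations at which the step size was decreased

    for t in range(n_iters):
        i = rng.integers(n)
        # Stochastic gradient of 0.5 * (x_i . theta - y_i)^2
        grad = (X[i] @ theta - y[i]) * X[i]

        if prev_grad is not None:
            running_sum += float(prev_grad @ grad)
            count += 1

        if count >= burn_in and running_sum < 0:
            # Negative running sum: consecutive gradients tend to point in
            # opposing directions, read here as oscillation around the optimum.
            step *= shrink
            restarts.append(t)
            running_sum, count, prev_grad = 0.0, 0, None
        else:
            prev_grad = grad

        theta = theta - step * grad

    return theta, step, restarts
```

Calling `pflug_sgd(X, y)` on synthetic data and inspecting `restarts` shows when the diagnostic fires. According to the abstract, a test of this kind cannot lead to an adequate convergence diagnostic even for quadratic objectives, so the sketch is best read as a baseline to compare against rather than a recommended schedule.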
