LG · NA · OC · ML · Oct 18, 2016

Big Batch SGD: Automated Inference using Adaptive Batch Sizes

arXiv:1610.05792v4 · 63 citations
Originality: Incremental advance
AI Analysis

This paper is an incremental advance on a practical problem for machine learning practitioners: selecting stepsizes adaptively and stopping automatically during optimization.

The paper tackles the problem of noisy gradient approximations in stochastic gradient descent (SGD). It proposes 'big batch' SGD schemes that adaptively increase the batch size to keep the signal-to-noise ratio of the gradient estimate nearly constant, which enables automated learning rate selection and removes the need for stepsize decay.

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative "big batch" SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective. The high fidelity gradients enable automated learning rate selection and do not require stepsize decay. Big batch methods are thus easily automated and can run with little or no oversight.
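The idea of growing the batch until the gradient signal dominates the noise can be illustrated with a minimal sketch. This is not the authors' algorithm: the least-squares problem, the helper names (`make_problem`, `per_sample_grads`, `big_batch_sgd`), the threshold `theta`, the doubling rule `growth`, and the fixed learning rate `lr` are all assumptions chosen for illustration. The paper itself automates the learning rate, which this sketch omits for brevity.

```python
import numpy as np

def make_problem(n=2000, d=5, noise_std=0.1, seed=0):
    # Hypothetical noisy least-squares data: y = X @ w_true + noise.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])[:d]
    y = X @ w_true + noise_std * rng.standard_normal(n)
    return X, y, w_true

def per_sample_grads(w, X, y):
    # Gradient of 0.5 * (x_i @ w - y_i)**2 for each row i; shape (batch, d).
    residual = X @ w - y
    return residual[:, None] * X

def big_batch_sgd(X, y, batch_size=8, theta=0.5, lr=0.1,
                  steps=300, growth=2, seed=1):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    history = []  # batch size actually used at each step
    for _ in range(steps):
        while True:
            idx = rng.choice(n, size=batch_size, replace=False)
            g_i = per_sample_grads(w, X[idx], y[idx])
            g = g_i.mean(axis=0)
            # Noise: estimated variance of the batch-mean gradient,
            # i.e. the per-sample variance (summed over coordinates)
            # divided by the batch size.
            noise = g_i.var(axis=0, ddof=1).sum() / batch_size
            # Signal: squared norm of the batch-mean gradient.
            signal = g @ g
            # Assumed test: accept the batch once noise <= theta^2 * signal,
            # or once the batch already covers the whole data set.
            if noise <= theta**2 * signal or batch_size >= n:
                break
            batch_size = min(n, growth * batch_size)
        history.append(batch_size)
        w = w - lr * g  # fixed step size; no decay schedule
    return w, history
```

Calling `big_batch_sgd(*make_problem()[:2])` returns the final iterate and the batch size used at each step. In this sketch the batch size never shrinks, so the recorded sizes show how the batch grows as the iterates approach the solution and the true gradient becomes small relative to the sampling noise.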

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
