OC LG MLApr 9, 2019

On the Adaptivity of Stochastic Gradient-Based Optimization

arXiv:1904.04480v316.827 citations

Originality Incremental advance

AI Analysis

This work addresses the problem of making optimization algorithms more accessible and effective for practitioners in machine learning by reducing the need for specialized knowledge about problem regimes, though it is incremental in improving upon existing adaptive methods.

The paper tackles the gap between theoretical optimality and practical usability in stochastic gradient optimization by introducing the SCSG method, which adapts to strong convexity and target accuracy without requiring regime-specific tuning, achieving strictly better theoretical complexity than existing adaptive algorithms.

Stochastic-gradient-based optimization has been a core enabling methodology in applications to large-scale problems in machine learning and related areas. Despite the progress, the gap between theory and practice remains significant, with theoreticians pursuing mathematical optimality at a cost of obtaining specialized procedures in different regimes (e.g., modulus of strong convexity, magnitude of target accuracy, signal-to-noise ratio), and with practitioners not readily able to know which regime is appropriate to their problem, and seeking broadly applicable algorithms that are reasonably close to optimality. To bridge these perspectives it is necessary to study algorithms that are adaptive to different regimes. We present the stochastically controlled stochastic gradient (SCSG) method for composite convex finite-sum optimization problems and show that SCSG is adaptive to both strong convexity and target accuracy. The adaptivity is achieved by batch variance reduction with adaptive batch sizes and a novel technique, which we referred to as geometrization, which sets the length of each epoch as a geometric random variable. The algorithm achieves strictly better theoretical complexity than other existing adaptive algorithms, while the tuning parameters of the algorithm only depend on the smoothness parameter of the objective.

View on arXiv PDF

Similar