OC IT LG MLSep 23, 2019

Geometry, Computation, and Optimality in Stochastic Optimization

arXiv:1909.10455v411.311 citations

Originality Highly original

AI Analysis

This work provides theoretical insights into when specific optimization methods are optimal, benefiting researchers in machine learning and optimization by clarifying computational trade-offs based on geometric properties.

The paper investigates how problem geometry affects the optimality of stochastic and adaptive gradient methods in optimization, showing that diagonally pre-conditioned stochastic gradient methods are minimax optimal for quadratically convex constraint sets, with sub-optimality increasing as constraints deviate from this geometry, such as for ℓ_p-balls with p < 2.

We study computational and statistical consequences of problem geometry in stochastic and online optimization. By focusing on constraint set and gradient geometry, we characterize the problem families for which stochastic- and adaptive-gradient methods are (minimax) optimal and, conversely, when nonlinear updates -- such as those mirror descent employs -- are necessary for optimal convergence. When the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal. We provide quantitative converses showing that the ``distance'' of the underlying constraints from quadratic convexity determines the sub-optimality of subgradient methods. These results apply, for example, to any $\ell_p$-ball for $p < 2$, and the computation/accuracy tradeoffs they demonstrate exhibit a striking analogy to those in Gaussian sequence models.

View on arXiv PDF

Similar