Stochastic Smoothing for Nonsmooth Minimizations: Accelerating SGD by Exploiting Structure
This addresses a central problem in machine learning for optimizing nonsmooth functions, offering a novel solution with proven optimal rates.
The paper tackles the stochastic minimization of nonsmooth convex loss functions, such as in SVMs, by proposing ANSGD, which achieves the optimal O(1/t) convergence rate and significantly outperforms prior subgradient descent methods in empirical tests.
In this work we consider the stochastic minimization of nonsmooth convex loss functions, a central problem in machine learning. We propose a novel algorithm called Accelerated Nonsmooth Stochastic Gradient Descent (ANSGD), which exploits the structure of common nonsmooth loss functions to achieve optimal convergence rates for a class of problems including SVMs. It is the first stochastic algorithm that can achieve the optimal O(1/t) rate for minimizing nonsmooth loss functions (with strong convexity). The fast rates are confirmed by empirical comparisons, in which ANSGD significantly outperforms previous subgradient descent algorithms including SGD.