Stochastic Dual Coordinate Ascent with Adaptive Probabilities
This work addresses optimization efficiency for machine learning practitioners, but it is incremental as it builds on existing SDCA methods.
The paper tackles the problem of solving regularized empirical risk minimization by introducing AdaSDCA, an adaptive variant of stochastic dual coordinate ascent that changes probability distributions over dual variables, achieving a provably better complexity bound than fixed-distribution methods, with a practical variant AdaSDCA+ outperforming non-adaptive methods in experiments.
This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems. Our modification consists in allowing the method adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is of a theoretical character as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods.