On Optimal Probabilities in Stochastic Coordinate Descent Methods
This work addresses efficiency improvements in optimization algorithms for machine learning and computational mathematics, though it is incremental as it builds on existing coordinate descent methods.
The paper tackles the problem of optimizing coordinate selection probabilities in stochastic coordinate descent methods, showing that non-uniform selection can reduce iteration count by an order of magnitude compared to uniform or full-update strategies.
We propose and analyze a new parallel coordinate descent method---`NSync---in which at each iteration a random subset of coordinates is updated, in parallel, allowing for the subsets to be chosen non-uniformly. We derive convergence rates under a strong convexity assumption, and comment on how to assign probabilities to the sets to optimize the bound. The complexity and practical performance of the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration---with optimal probabilities---may require less iterations, both in theory and practice, than the strategy of updating all coordinates at every iteration.