Faster Coordinate Descent via Adaptive Importance Sampling
This work addresses optimization efficiency for large-scale machine learning problems, but the contribution is incremental: it builds on existing coordinate descent frameworks.
The paper improves coordinate descent methods for huge-scale convex optimization by introducing adaptive rules for the random selection of updates. The rules are based on dual residual or primal-dual gap estimates, and the paper reports improvements over state-of-the-art methods.
Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically characterize the performance of our selection rules, demonstrate improvements over the state of the art, and extend our theory and algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and Lasso confirms that the practice follows the theory.
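To make the idea of an adaptive selection rule concrete, here is a minimal sketch of randomized coordinate descent for the Lasso in which the sampling distribution is recomputed at every iteration. It is not the paper's algorithm. As a stand-in for the dual residual or gap estimates, it scores each coordinate by the size of its exact coordinate-wise step and mixes that distribution with the uniform one. The names adaptive_cd_lasso, lasso_objective and the parameter mix are hypothetical, and the matrix is assumed to be given as a list of nonzero columns.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t * |.| (scalar soft-thresholding)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(cols, b, x, lam):
    """0.5 * ||A x - b||^2 + lam * ||x||_1, with A stored as a list of columns."""
    m = len(b)
    r = [sum(cols[j][k] * x[j] for j in range(len(x))) - b[k] for k in range(m)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def adaptive_cd_lasso(cols, b, lam, n_iters=500, mix=0.5, seed=0):
    """Coordinate descent with a sampling distribution that changes each iteration.

    Assumption (illustrative only): the per-coordinate score is the size of the
    exact coordinate-wise step, used here in place of a dual residual or
    primal-dual gap estimate.
    """
    rng = random.Random(seed)
    n, m = len(cols), len(b)
    col_sq = [sum(v * v for v in c) for c in cols]  # assumes no all-zero column
    x = [0.0] * n
    r = [-v for v in b]  # residual A x - b at x = 0

    for _ in range(n_iters):
        # Exact coordinate-wise minimizer for every coordinate. This full pass
        # is done for clarity and costs as much as a full gradient evaluation.
        target = []
        for j in range(n):
            g = sum(cols[j][k] * r[k] for k in range(m))
            target.append(soft_threshold(x[j] - g / col_sq[j], lam / col_sq[j]))

        score = [abs(target[j] - x[j]) for j in range(n)]
        total = sum(score)
        if total < 1e-12:  # every coordinate is already coordinate-wise optimal
            break

        # Adaptive distribution: part uniform, part proportional to the score.
        probs = [mix / n + (1.0 - mix) * s / total for s in score]
        i = rng.choices(range(n), weights=probs, k=1)[0]

        # Apply the exact update on coordinate i and keep the residual in sync.
        delta = target[i] - x[i]
        x[i] = target[i]
        for k in range(m):
            r[k] += delta * cols[i][k]
    return x
```

Setting mix=1.0 recovers uniform sampling, which gives a simple baseline to compare against. Because the scores are recomputed from scratch at every iteration, this sketch spends a full pass over the data per update; a practical adaptive scheme would need cheaper estimates for the sampling step to pay off.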