OC DC NA ML · Dec 20, 2013

Accelerated, Parallel and Proximal Coordinate Descent

arXiv:1312.5799v2 · 389 citations

Originality: Highly original

AI Analysis

This paper addresses efficiency bottlenecks in large-scale optimization for machine learning and data analysis, and proposes a new method with practical speed-ups.

The paper tackles the problem of minimizing a sum of convex functions, each of which depends on only a few coordinates, by proposing APPROX, a stochastic coordinate descent method that is accelerated, parallel and proximal. When the number of processors equals the number of coordinates, it achieves a convergence rate of 2ω̄L̄R²/(k+1)².
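As a quick sanity check on what the stated rate implies, the sketch below evaluates the bound 2ω̄L̄R²/(k+1)² on the expected optimality gap after k iterations and inverts it to find how many iterations guarantee a target accuracy ε. The function names and the sample values (ω̄ = 4, L̄ = 1, R = 1, ε = 10⁻⁴) are hypothetical, chosen only for illustration.

```python
import math


def approx_rate_bound(omega_bar, L_bar, R, k):
    """Stated bound 2*omega_bar*L_bar*R^2/(k+1)^2 (case: processors = coordinates)."""
    return 2.0 * omega_bar * L_bar * R**2 / (k + 1) ** 2


def iterations_for_accuracy(omega_bar, L_bar, R, eps):
    """Smallest integer k >= 0 with approx_rate_bound(omega_bar, L_bar, R, k) <= eps."""
    return max(0, math.ceil(math.sqrt(2.0 * omega_bar * L_bar * R**2 / eps) - 1))


# Hypothetical example values.
print(iterations_for_accuracy(omega_bar=4, L_bar=1.0, R=1.0, eps=1e-4))  # 282
```

Because the bound decays like 1/k², the guaranteed iteration count grows only like √(ω̄L̄R²/ε). Halving ω̄, for instance, shrinks it by a factor of about √2, which is one way to see why a dependence on the average rather than the maximum degree of separability pays off.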

We propose a new stochastic coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is an average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of existing accelerated coordinate descent methods. The fact that the method depends on the average degree of separability, and not on the maximum degree of separability, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel stochastic coordinate descent algorithms based on the concept of ESO.
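To make the structure of the method concrete, here is a minimal NumPy sketch of an APPROX-style iteration on a hypothetical lasso instance, F(x) = ½‖Ax − b‖² + λ‖x‖₁, where each row of A plays the role of one function in the sum and ω_j is the number of nonzeros in row j. It assumes τ-nice sampling (τ distinct coordinates chosen uniformly at random), the three-sequence update y, z, x with α₀ = τ/n and α_{k+1}² = (1 − α_{k+1})α_k², and ESO weights of the form v_i = Σ_j (1 + (ω_j − 1)(τ − 1)/max(1, n − 1)) A_ji². These forms are our recollection of the paper and should be checked against it. Unlike the efficient implementation the abstract describes, this naive version performs full-dimensional vector operations (it forms y in every iteration), so it illustrates the update rules only. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(u, t):
    """Elementwise prox of t*|.|: sign(u) * max(|u| - t, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)


def objective(A, b, lam, x):
    """F(x) = 0.5*||Ax - b||^2 + lam*||x||_1."""
    r = A @ x - b
    return 0.5 * r @ r + lam * np.abs(x).sum()


def eso_weights(A, tau):
    """Assumed ESO weights for f(x) = 0.5*||Ax - b||^2 under tau-nice sampling:
    v_i = sum_j (1 + (omega_j - 1)(tau - 1)/max(1, n - 1)) * A_ji^2,
    with omega_j = nonzeros in row j (each row's own degree, not the maximum)."""
    n = A.shape[1]
    omega = np.count_nonzero(A, axis=1)
    beta = 1.0 + (omega - 1) * (tau - 1) / max(1, n - 1)
    return beta @ (A ** 2)


def approx_lasso(A, b, lam, tau, iters, seed=0):
    """Naive (full-vector) APPROX-style sketch; assumes no zero columns in A."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    v = eso_weights(A, tau)
    x = np.zeros(n)
    z = x.copy()
    alpha = tau / n                                   # alpha_0 = tau/n
    for _ in range(iters):
        y = (1 - alpha) * x + alpha * z               # full-dimensional (naive)
        S = rng.choice(n, size=tau, replace=False)    # tau-nice sampling
        g = A[:, S].T @ (A @ y - b)                   # partial gradient on S
        c = n * alpha * v[S] / tau                    # curvature of z-subproblem
        z_new = z.copy()
        z_new[S] = soft_threshold(z[S] - g / c, lam / c)
        x = y + (n * alpha / tau) * (z_new - z)       # z_new - z is zero off S
        z = z_new
        alpha = (np.sqrt(alpha**4 + 4 * alpha**2) - alpha**2) / 2
    return x


def ista(A, b, lam, iters):
    """Proximal gradient (ISTA) baseline with step 1/||A||_2^2, for a reference value."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
F_ref = objective(A, b, 0.1, ista(A, b, 0.1, 5000))
for tau in (1, 3, 10):
    gap = objective(A, b, 0.1, approx_lasso(A, b, 0.1, tau, 5000)) - F_ref
    print(f"tau={tau}: F(x_k) - F_ref = {gap:.2e}")
```

With τ = 1 the weights reduce to the coordinate Lipschitz constants ‖A_{:,i}‖², giving serial accelerated coordinate descent; with τ = n every coordinate is updated and the scheme becomes an accelerated full proximal gradient method with diagonal stepsizes 1/v_i. For rows with ω_j < n, the factor 1 + (ω_j − 1)(τ − 1)/max(1, n − 1) is smaller than it would be with the maximum degree in its place, which is what permits larger steps.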
