Accelerated Parallel Optimization Methods for Large Scale Machine Learning
This work addresses scalability issues in machine learning optimization for applications with high-dimensional data. The contribution is incremental, since it builds on existing methods such as BOOM and Shotgun.
The paper tackles inefficient optimization in large-scale machine learning by combining parallelism with Nesterov's acceleration to design faster algorithms for L1-regularized loss. Its accelerated version of Shotgun improves the convergence rate from O(1/t) to O(1/t^2).
The growing amount of high-dimensional data in machine learning applications calls for more efficient and scalable optimization algorithms. In this work, we combine two techniques, parallelism and Nesterov's acceleration, to design faster algorithms for L1-regularized loss. We first simplify BOOM, a variant of gradient descent, and study it in a unified framework. This allows us not only to propose a refined measure of sparsity that improves BOOM, but also to show that BOOM is provably slower than FISTA. Moving on to parallel coordinate descent methods, we then propose an efficient accelerated version of Shotgun, improving the convergence rate from $O(1/t)$ to $O(1/t^2)$. Our algorithm has a more concise form and analysis than previous work, and it also allows several related works to be studied in a unified way.
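For readers unfamiliar with the FISTA baseline mentioned above, the following is a minimal sketch of textbook FISTA applied to the lasso objective $\frac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$. It is not the paper's accelerated Shotgun or BOOM. The function names `soft_threshold` and `fista_lasso`, the dense list-of-lists matrix format, and the use of the squared Frobenius norm as a (conservative) Lipschitz bound are all assumptions made for illustration.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * |.|: shrink v toward zero by tau."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def fista_lasso(A, b, lam, iters=1000):
    """Sketch of FISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    A is a non-zero dense matrix given as a list of rows (an assumption
    of this sketch); b is a list of targets; lam >= 0.
    """
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of
    # A^T A, so it is a valid (if loose) Lipschitz constant for the gradient.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n_cols   # current iterate
    y = x[:]             # extrapolated point
    t = 1.0              # momentum parameter
    for _ in range(iters):
        # Gradient of the smooth part at y: A^T (A y - b).
        residual = [
            sum(A[i][j] * y[j] for j in range(n_cols)) - b[i]
            for i in range(n_rows)
        ]
        grad = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Proximal gradient step with step size 1/L.
        x_new = [
            soft_threshold(y[j] - grad[j] / L, lam / L) for j in range(n_cols)
        ]
        # Nesterov momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [
            x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j])
            for j in range(n_cols)
        ]
        x, t = x_new, t_new
    return x
```

With `A` equal to the identity, the minimizer is simply the soft-thresholded target vector, which makes a convenient sanity check. The momentum sequence `t` is what yields the $O(1/t^2)$ rate on the objective, compared with $O(1/t)$ for the plain proximal gradient method.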