LGAug 5, 2016

Communication-Efficient Parallel Block Minimization for Kernel Machines

arXiv:1608.02010v12 citations
Originality Highly original
AI Analysis

This addresses the problem of slow kernel machine training for researchers and practitioners dealing with large-scale datasets, representing a strong specific gain rather than a foundational advancement.

The paper tackles the computational challenge of speeding up kernel machines by developing a parallel block minimization framework, achieving a significant improvement over existing parallel solvers, such as obtaining 96% accuracy in 20 seconds on a dataset where others require over 2000 seconds for 95% accuracy.

Kernel machines often yield superior predictive performance on various tasks; however, they suffer from severe computational challenges. In this paper, we show how to overcome the important challenge of speeding up kernel machines. In particular, we develop a parallel block minimization framework for solving kernel machines, including kernel SVM and kernel logistic regression. Our framework proceeds by dividing the problem into smaller subproblems by forming a block-diagonal approximation of the Hessian matrix. The subproblems are then solved approximately in parallel. After that, a communication efficient line search procedure is developed to ensure sufficient reduction of the objective function value at each iteration. We prove global linear convergence rate of the proposed method with a wide class of subproblem solvers, and our analysis covers strongly convex and some non-strongly convex functions. We apply our algorithm to solve large-scale kernel SVM problems on distributed systems, and show a significant improvement over existing parallel solvers. As an example, on the covtype dataset with half-a-million samples, our algorithm can obtain an approximate solution with 96% accuracy in 20 seconds using 32 machines, while all the other parallel kernel SVM solvers require more than 2000 seconds to achieve a solution with 95% accuracy. Moreover, our algorithm can scale to very large data sets, such as the kdd algebra dataset with 8 million samples and 20 million features.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes