LG ML | Jul 18, 2018

Newton-ADMM: A Distributed GPU-Accelerated Optimizer for Multiclass Classification Problems

Chih-Hao Fang, Sudhir B Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama

arXiv:1807.07132v33.54 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient distributed optimization in machine learning, particularly for classification tasks, by reducing communication overhead and leveraging GPU acceleration. The advance is incremental: it combines existing methods (Newton-type solvers and ADMM) in a new way rather than introducing a new technique.

The paper tackles slow convergence and high communication costs in distributed optimization for multiclass classification. It proposes Newton-ADMM, a distributed GPU-accelerated optimizer that integrates a Newton-type solver with ADMM. The authors report better generalization performance, significantly faster distributed time to solution, and improved scaling on large platforms.

First-order optimization methods, such as stochastic gradient descent (SGD) and its variants, are widely used in machine learning applications due to their simplicity and low per-iteration costs. However, they often require larger numbers of iterations, with associated communication costs in distributed environments. In contrast, Newton-type methods, while having higher per-iteration costs, typically require a significantly smaller number of iterations, which directly translates to reduced communication costs. In this paper, we present a novel distributed optimizer for classification problems, which integrates a GPU-accelerated Newton-type solver with the global consensus formulation of the Alternating Direction Method of Multipliers (ADMM). By leveraging the communication efficiency of ADMM, a GPU-accelerated inexact-Newton solver, and an effective spectral penalty parameter selection strategy, we show that our proposed method (i) yields better generalization performance on several classification problems; (ii) significantly outperforms state-of-the-art methods in distributed time to solution; and (iii) offers better scaling on large distributed platforms.
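The idea in the abstract, global consensus ADMM with an inexact Newton solver for each worker's subproblem, can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: binary logistic regression stands in for the multiclass problem, the penalty parameter rho is fixed rather than chosen by the spectral strategy, NumPy on the CPU replaces the GPU solver, and the workers are simulated by a sequential loop. All function names and parameter values are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def conjugate_gradient(hess_vec, b, iters=10, tol=1e-12):
    """Approximately solve H p = b using only Hessian-vector products."""
    p = np.zeros_like(b)
    r = b.copy()          # residual b - H p, with p = 0
    d = r.copy()          # search direction
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Hd = hess_vec(d)
        alpha = rs / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p

def local_newton_update(A, y, x, z, u, rho, newton_steps=3):
    """Inexact Newton steps on f_i(x) + (rho/2) * ||x - z + u||^2,
    where f_i is the mean logistic loss on this worker's shard (assumption)."""
    n = A.shape[0]
    for _ in range(newton_steps):
        p_hat = sigmoid(A @ x)
        grad = A.T @ (p_hat - y) / n + rho * (x - z + u)
        w = p_hat * (1.0 - p_hat)  # diagonal of the logistic Hessian weights
        hess_vec = lambda v: A.T @ (w * (A @ v)) / n + rho * v
        x = x - conjugate_gradient(hess_vec, grad)  # no line search in this sketch
    return x

def newton_admm(shards, dim, rho=1.0, lam=0.1, admm_iters=30):
    """Consensus ADMM with g(z) = (lam/2) * ||z||^2 and a fixed penalty rho."""
    N = len(shards)
    xs = [np.zeros(dim) for _ in range(N)]
    us = [np.zeros(dim) for _ in range(N)]  # scaled dual variables
    z = np.zeros(dim)
    for _ in range(admm_iters):
        # Local step: each worker solves its own subproblem independently.
        xs = [local_newton_update(A, y, x, z, u, rho)
              for (A, y), x, u in zip(shards, xs, us)]
        # Global step: the only point where workers would communicate.
        avg = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        z = N * rho * avg / (lam + N * rho)
        # Dual step: accumulate each worker's disagreement with the consensus.
        us = [u + x - z for x, u in zip(xs, us)]
    return z

# Synthetic usage example: 400 points split across 4 simulated workers.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
A = rng.normal(size=(400, 5))
y = (A @ w_true > 0).astype(float)
shards = [(A[i::4], y[i::4]) for i in range(4)]
z = newton_admm(shards, dim=5)
accuracy = np.mean((A @ z > 0) == (y > 0.5))
```

In this sketch the workers exchange only one vector of model size per ADMM iteration (the average in the global step), while the Newton and conjugate-gradient work stays local to each shard. That split is the source of the communication savings the abstract describes.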
