PF | LG | ML | May 2, 2019

On Linear Learning with Manycore Processors

arXiv:1905.00626v6
Originality: Incremental advance
AI Analysis

This work addresses performance bottlenecks in machine learning training on manycore processors. It reports significant speedups, but the advance is incremental, since it optimizes existing methods.

The paper tackles efficient training of generalized linear models on manycore processors by proposing a novel parallelism approach called Heterogeneous Tasks on Homogeneous Cores (HTHC), and demonstrates an order-of-magnitude improvement over the state of the art for Lasso on dense data.

A new generation of manycore processors is on the rise that offers dozens and more cores on a chip and, in a sense, fuses host processor and accelerator. In this paper we target the efficient training of generalized linear models on these machines. We propose a novel approach for achieving parallelism which we call Heterogeneous Tasks on Homogeneous Cores (HTHC). It divides the problem into multiple fundamentally different tasks, which themselves are parallelized. For evaluation, we design a detailed, architecture-cognizant implementation of our scheme on a recent 72-core Knights Landing processor that is adaptive to the cache, memory, and core structure. Our library efficiently supports dense and sparse datasets as well as 4-bit quantized data for further possible gains in performance. We show benchmarks for Lasso and SVM with different data sets against straightforward parallel implementations and prior software. In particular, for Lasso on dense data, we improve the state-of-the-art by an order of magnitude.
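For readers unfamiliar with the Lasso benchmark mentioned in the abstract, the following is a minimal single-threaded sketch of a Lasso solver. The abstract does not say which solver the library uses, so cyclic coordinate descent is an assumption here, chosen because it is a common baseline for Lasso. The names `soft_threshold` and `lasso_cd` are hypothetical and are not taken from the paper's code. The sketch does not attempt the HTHC task split, the cache-aware layout, or the 4-bit quantization.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal operator of t * |.|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(columns, y, lam, epochs=100):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent.

    columns: the feature columns of X, each a list of floats with len(y) entries.
    This is an illustrative, sequential sketch, not the paper's implementation.
    """
    w = [0.0] * len(columns)
    residual = list(y)  # residual = y - X w, and w starts at zero
    sq_norms = [sum(v * v for v in col) for col in columns]
    for _ in range(epochs):
        for j, col in enumerate(columns):
            if sq_norms[j] == 0.0:
                continue  # an all-zero column cannot reduce the objective
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(c * r for c, r in zip(col, residual)) + sq_norms[j] * w[j]
            new_wj = soft_threshold(rho, lam) / sq_norms[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync so each update costs O(n).
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_wj
    return w
```

With orthonormal columns each coordinate reduces to a single soft-thresholding step, which makes the routine easy to check by hand: calling `lasso_cd([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)` shrinks the first weight from 3 to 2 and sets the second to zero.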

Code Implementations: 1 repo