LGMLMay 22, 2017

Parallel Stochastic Gradient Descent with Sound Combiners

arXiv:1705.08030v111 citations
AI Analysis

This addresses the scalability issue for machine learning practitioners using SGD on large datasets, though it is incremental as it builds on prior parallelization methods.

The paper tackles the problem of parallelizing stochastic gradient descent (SGD) while maintaining sequential semantics to avoid poor convergence, proposing SYMSGD, which achieves up to 11x speedup over a sequential baseline on 16 cores and is 2.2x faster on average than HOGWILD!.

Stochastic gradient descent (SGD) is a well known method for regression and classification tasks. However, it is an inherently sequential algorithm at each step, the processing of the current example depends on the parameters learned from the previous examples. Prior approaches to parallelizing linear learners using SGD, such as HOGWILD! and ALLREDUCE, do not honor these dependencies across threads and thus can potentially suffer poor convergence rates and/or poor scalability. This paper proposes SYMSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model in addition to a model combiner, which allows local models to be combined to produce the same result as what a sequential SGD would have produced. This paper evaluates SYMSGD's accuracy and performance on 6 datasets on a shared-memory machine shows upto 11x speedup over our heavily optimized sequential baseline on 16 cores and 2.2x, on average, faster than HOGWILD!.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes