OC · LG · May 29, 2019

Accelerated Sparsified SGD with Error Feedback

arXiv:1905.12224v2 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses communication bottlenecks in distributed optimization for machine learning, offering an incremental improvement over existing sparsified SGD methods.

The paper targets the slower early convergence that compression error causes in sparsified SGD with error feedback, which is a practical problem when early stopping is used. It proposes an accelerated variant based on Nesterov's method that can need less per-iteration communication than non-accelerated methods to maintain the same convergence rate as vanilla SGD, and it reports numerical experiments that support the analysis.

A stochastic gradient method for synchronous distributed optimization is studied. To reduce communication cost, we focus on compressing the communicated gradients. Several works have shown that sparsified stochastic gradient descent (SGD) with error feedback asymptotically achieves the same rate as (non-sparsified) parallel SGD. However, from the viewpoint of non-asymptotic behavior, the compression error may cause slower convergence than non-sparsified SGD in early iterations. This is problematic in practice, since early stopping is often adopted to maximize the generalization ability of learned models. To improve on the previous results, we propose and theoretically analyse a sparsified stochastic gradient method with an error feedback scheme combined with Nesterov's acceleration. It is shown that the per-iteration communication cost necessary for maintaining the same rate as vanilla SGD can be smaller than that of non-accelerated methods in convex and even in nonconvex optimization problems. This indicates that our proposed method makes better use of compressed information than previous methods. Numerical experiments are provided and empirically validate our theoretical findings.
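To make the error feedback idea in the abstract concrete, here is a minimal sketch of one communication round of sparsified SGD with error feedback. It is not the paper's accelerated algorithm: Nesterov's acceleration is omitted, and top-k sparsification is an assumed choice of compressor, since the abstract does not say which sparsifier is used. The names `top_k`, `error_feedback_step`, and `server_average` are hypothetical and chosen only for illustration.

```python
import numpy as np


def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest (assumes 1 <= k <= v.size)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest magnitudes
    out[idx] = v[idx]
    return out


def error_feedback_step(grad, error, k, lr):
    """One worker's step: compress the scaled gradient plus the carried-over residual.

    Returns the sparse message to communicate and the new residual.
    Nothing is discarded: message + new_error == error + lr * grad.
    """
    corrected = error + lr * grad     # add back what earlier rounds left out
    message = top_k(corrected, k)     # only k entries are sent (assumed compressor)
    new_error = corrected - message   # remember what was not sent this round
    return message, new_error


def server_average(x, messages):
    """Synchronous update: subtract the average of the workers' sparse messages."""
    return x - np.mean(messages, axis=0)
```

Because each worker keeps the unsent part of its update in `new_error` and adds it back on the next round, dropped coordinates are delayed rather than lost. This is the mechanism behind the asymptotic rate the abstract mentions, while the delay itself is the compression error that can slow early iterations.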
