LG · AI · OC · Oct 1, 2021

Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum

arXiv:2110.00625v1
Originality: Incremental advance
AI Analysis

This work addresses distributed training efficiency for nonconvex optimization, likely benefiting machine learning practitioners, but it appears incremental as it builds on existing momentum and model averaging methods.

The paper aims to accelerate distributed stochastic descent for nonconvex optimization. It introduces block momentum, a momentum term applied at the global learner level in model averaging approaches. Experimental results show that block momentum accelerates training and achieves better results.

The momentum method has been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many nice properties. We propose a momentum method for such model averaging approaches. At each individual learner level, traditional stochastic gradient descent is applied. At the meta-level (global learner level), one momentum term is applied, and we call it block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training, but also achieves better results.
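The abstract does not spell out the update rule, so the following is only a minimal sketch of how K-step averaging with a global momentum term might look. It makes several assumptions not taken from the paper: the momentum form `v = mu * v + (avg - w)`, the toy objective `f(w) = 0.5 * w**2` (convex, chosen only so the sketch runs without dependencies), and every function name, parameter name, and default value.

```python
import random


def noisy_grad(w, rng, noise):
    # Stochastic gradient of the toy objective f(w) = 0.5 * w**2.
    # The objective is a hypothetical stand-in, not the paper's setting.
    return w + rng.gauss(0.0, noise)


def local_sgd(w, k_steps, lr, rng, noise):
    # Each learner runs K plain SGD steps starting from the global model.
    for _ in range(k_steps):
        w -= lr * noisy_grad(w, rng, noise)
    return w


def block_momentum_train(w0, n_learners=4, k_steps=8, rounds=30,
                         lr=0.05, mu=0.5, noise=0.1, seed=0):
    rng = random.Random(seed)
    w, v = w0, 0.0
    for _ in range(rounds):
        # Learner level: independent K-step SGD runs (simulated sequentially).
        local = [local_sgd(w, k_steps, lr, rng, noise)
                 for _ in range(n_learners)]
        avg = sum(local) / n_learners
        # Global level: treat the averaged displacement as a "block gradient"
        # and apply one momentum term to it (assumed form).
        v = mu * v + (avg - w)
        w = w + v
    return w


if __name__ == "__main__":
    print(block_momentum_train(5.0))
```

With `mu=0.0` the global step reduces to `w = avg`, which is plain K-step averaging, so the momentum term is the only difference between the two schemes in this sketch.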

