LG NE · Jan 17, 2021

Guided parallelized stochastic gradient descent for delay compensation

arXiv:2101.07259v1 · 28 citations
Originality: Incremental advance
AI Analysis

This work addresses delay compensation in parallel SGD for deep learning, offering an incremental improvement over existing methods.

The paper tackles the high variance caused by delay in parallel SGD algorithms for training deep neural networks by proposing guided SGD (gSGD) to compensate for this delay, achieving classification accuracy close to sequential SGD on some benchmark datasets.

The stochastic gradient descent (SGD) algorithm and its variations have been used effectively to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice because it optimizes the error function sequentially. This has led to the development of parallel SGD algorithms, such as asynchronous SGD (ASGD) and synchronous SGD (SSGD), to train deep neural networks. However, these parallel algorithms introduce high variance due to the delay in parameter (weight) updates. We address this delay in our proposed algorithm and try to minimize its impact. We employ guided SGD (gSGD), which encourages consistent examples to steer the convergence by compensating for the unpredictable deviation caused by the delay. Its convergence rate is similar to that of A/SSGD; however, some additional (parallel) processing is required to compensate for the delay. The experimental results demonstrate that our proposed approach mitigates the impact of delay on classification accuracy. The guided approach with SSGD clearly outperforms A/SSGD and even achieves accuracy close to sequential SGD for some benchmark datasets.
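To make the delay problem concrete, the following is a minimal sketch of SGD with stale gradients on a toy quadratic. It illustrates only the effect of delayed updates, not the gSGD compensation itself, whose details are not given in the abstract. The function name `delayed_sgd`, the objective, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def delayed_sgd(delay, steps=200, lr=0.05, seed=0):
    """Minimize f(w) = 0.5 * ||w||^2 using noisy gradients that are `delay` steps stale.

    Returns the mean distance from the optimum (w = 0) over the last 50 steps.
    This is a toy model of update delay, not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    w = np.ones(2)
    history = [w.copy()]  # history[t] holds the weights before update t
    for t in range(steps):
        # A delayed worker computes its gradient at weights from `delay` steps ago.
        stale_w = history[max(0, t - delay)]
        grad = stale_w + 0.01 * rng.standard_normal(2)  # gradient of f plus noise
        w = w - lr * grad
        history.append(w.copy())
    tail = np.array(history[-50:])
    return float(np.mean(np.linalg.norm(tail, axis=1)))


if __name__ == "__main__":
    for d in (0, 5, 40):
        print(f"delay={d:2d}  mean tail distance from optimum={delayed_sgd(d):.4f}")
```

With no delay the iterates settle near the optimum, while a large delay leaves them oscillating far from it under the same learning rate. This is the kind of deviation that a delay-compensation scheme such as gSGD aims to reduce.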

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
