LG IT SPSep 23, 2024

Novel Gradient Sparsification Algorithm via Bayesian Inference

Ali Bereyhi, Ben Liang, Gary Boudreau, Ali Afana

arXiv:2409.14893v16.46 citationsh-index: 16

Originality Incremental advance

AI Analysis

This work addresses convergence issues in distributed machine learning for practitioners, but it is incremental as it builds on existing Top-k sparsification methods.

The paper tackles the problem of error accumulation in Top-k gradient sparsification for distributed gradient descent, which can deteriorate convergence, by proposing a novel algorithm called RegTop-k that controls learning rate scaling; numerical experiments show it achieves about 8% higher accuracy than standard Top-k at 0.1% sparsification on ResNet-18 with CIFAR-10.

Error accumulation is an essential component of the Top-$k$ sparsification method in distributed gradient descent. It implicitly scales the learning rate and prevents the slow-down of lateral movement, but it can also deteriorate convergence. This paper proposes a novel sparsification algorithm called regularized Top-$k$ (RegTop-$k$) that controls the learning rate scaling of error accumulation. The algorithm is developed by looking at the gradient sparsification as an inference problem and determining a Bayesian optimal sparsification mask via maximum-a-posteriori estimation. It utilizes past aggregated gradients to evaluate posterior statistics, based on which it prioritizes the local gradient entries. Numerical experiments with ResNet-18 on CIFAR-10 show that at $0.1\%$ sparsification, RegTop-$k$ achieves about $8\%$ higher accuracy than standard Top-$k$.

View on arXiv PDF

Similar