LGNAOCMay 17, 2024

Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance

arXiv:2405.11095v1h-index: 6
Originality Incremental advance
AI Analysis

This work addresses communication bottlenecks in distributed optimization for machine learning, offering a novel compression technique that is incremental but improves upon existing methods by controlling variance and handling sparse gradients.

The paper tackles the problem of high communication costs in distributed stochastic gradient descent by proposing FO-SGD, a compressed gradient method that uses one-bit quantization and a randomized transform to flatten gradients, achieving SGD-like convergence guarantees under mild conditions.

We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas: (i) a one-bit quantization procedure leveraging the technique of dithering, and (ii) a randomized fast Walsh-Hadamard transform to flatten the stochastic gradient before quantization. As a result, the approximation of the true gradient in this scheme is biased, but it prevents commonly encountered algorithmic problems, such as exploding variance in the one-bit compression regime, deterioration of performance in the case of sparse gradients, and restrictive assumptions on the distribution of the stochastic gradients. In fact, we show SGD-like convergence guarantees under mild conditions. The compression technique can be used in both directions of worker-server communication, therefore admitting distributed optimization with full communication compression.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes