LG, OC · May 8, 2022

Federated Random Reshuffling with Compression and Variance Reduction

arXiv:2205.03914v2 · 12.4 · 13 citations · h-index: 67

Originality: Incremental advance

AI Analysis

This work addresses efficiency and convergence challenges in federated learning for distributed machine learning systems, representing an incremental improvement over existing methods.

The paper aims to improve the efficiency and convergence of federated learning with Random Reshuffling by introducing three new algorithms: compressed FedRR and two variance-reduced extensions. One extension removes the dependence on the compression parameter; the other eliminates the variance at the optimum that comes from shuffling. The theoretical results are corroborated by experiments on synthetic and real datasets.
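In this line of work, the "compression parameter" typically refers to the variance bound ω of an unbiased compressor C, meaning E[C(x)] = x and E‖C(x) − x‖² ≤ ω‖x‖². As a minimal sketch, assuming Rand-k sparsification as the compressor (a standard choice, not necessarily the one used in the paper), the code below keeps k randomly chosen coordinates and rescales them by d/k, which gives ω = d/k − 1. The names `rand_k` and `omega` are hypothetical.

```python
import random

def rand_k(x, k, rng):
    """Rand-k sparsifier (assumed compressor): keep k coordinates chosen
    uniformly without replacement and scale them by d/k so that E[C(x)] = x."""
    d = len(x)
    keep = set(rng.sample(range(d), k))
    return [xi * d / k if i in keep else 0.0 for i, xi in enumerate(x)]

def omega(d, k):
    # For Rand-k, E||C(x) - x||^2 = (d/k - 1) * ||x||^2 exactly, so omega = d/k - 1.
    return d / k - 1

# Usage: averaging many compressed copies of x approaches x (unbiasedness).
rng = random.Random(0)
x = [1.0, -2.0, 3.0, 0.5]
n = 20000
mean = [0.0] * len(x)
for _ in range(n):
    for i, v in enumerate(rand_k(x, 2, rng)):
        mean[i] += v / n
print(mean, omega(len(x), 2))
```

A larger k means weaker compression and a smaller ω; k = d recovers the identity map with ω = 0.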

Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization. Due to its superior practical performance, it is embedded and often set as default in standard machine learning software. Under the name FedRR, this method was recently shown to be applicable to federated learning (Mishchenko et al., 2021), with superior performance when compared to common baselines such as Local SGD. Inspired by this development, we design three new algorithms to improve FedRR further: compressed FedRR and two variance-reduced extensions, one for taming the variance coming from shuffling and the other for taming the variance due to compression. The variance reduction mechanism for compression allows us to eliminate dependence on the compression parameter, and applying the additional controlled linear perturbations for Random Reshuffling introduced by Malinovsky et al. (2021) helps to eliminate variance at the optimum. We provide the first analysis of compressed local methods under standard assumptions, without bounded-gradient assumptions, and for heterogeneous data, overcoming the limitations of the compression operator. We corroborate our theoretical results with experiments on synthetic and real data sets.
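The following is a minimal, hypothetical sketch of how compressed FedRR with a compression variance-reduction step could be organized; it is not the authors' algorithm. Assumptions: each client holds least-squares samples f(x) = ½(aᵀx − b)², runs one local Random Reshuffling epoch with step γ (every sample visited once, in a fresh random order), and the server treats (x − y_m)/(γn) as client m's pseudo-gradient and steps with size η on the average. Compression uses Rand-k. For compression variance reduction we swap in a DIANA-style learned shift h_m: the client sends C(g_m − h_m) and both sides update h_m ← h_m + α·C(g_m − h_m); α = 0 gives plain compressed FedRR. The shuffling control variates of Malinovsky et al. (2021) are not included. All function and parameter names are illustrative.

```python
import random

def rand_k(v, k, rng):
    # Unbiased Rand-k sparsifier (assumed compressor): keep k coords, scale by d/k.
    d = len(v)
    keep = set(rng.sample(range(d), k))
    return [vi * d / k if i in keep else 0.0 for i, vi in enumerate(v)]

def grad_ls(x, a, b):
    # Gradient of the assumed per-sample loss 0.5 * (a.x - b)^2.
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [r * ai for ai in a]

def local_rr_epoch(x, data, gamma, rng):
    # One Random Reshuffling epoch: each local sample is used exactly once.
    y = list(x)
    order = list(range(len(data)))
    rng.shuffle(order)
    for i in order:
        a, b = data[i]
        y = [yi - gamma * gi for yi, gi in zip(y, grad_ls(y, a, b))]
    return y

def compressed_fedrr(clients, x0, rounds, gamma, eta, k, alpha, seed=0):
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)
    # Shifts h_m; in a real deployment the server mirrors them from the messages q.
    shifts = [[0.0] * d for _ in clients]
    for _ in range(rounds):
        estimates = []
        for m, data in enumerate(clients):
            y = local_rr_epoch(x, data, gamma, rng)
            g = [(xi - yi) / (gamma * len(data)) for xi, yi in zip(x, y)]  # pseudo-gradient
            h = shifts[m]
            q = rand_k([gi - hi for gi, hi in zip(g, h)], k, rng)  # compressed message
            estimates.append([hi + qi for hi, qi in zip(h, q)])  # unbiased estimate of g
            shifts[m] = [hi + alpha * qi for hi, qi in zip(h, q)]  # DIANA-style shift update
        avg = [sum(col) / len(clients) for col in zip(*estimates)]
        x = [xi - eta * gi for xi, gi in zip(x, avg)]
    return x

# Usage: two clients whose samples are all interpolated by x* = [1, -2].
clients = [
    [([1.0, 0.0], 1.0), ([0.0, 1.0], -2.0)],
    [([1.0, 1.0], -1.0), ([1.0, -1.0], 3.0)],
]
results = {
    alpha: compressed_fedrr(clients, [0.0, 0.0], rounds=300, gamma=0.1,
                            eta=0.5, k=1, alpha=alpha, seed=1)
    for alpha in (0.0, 0.5)
}
print(results)
```

In this toy problem every sample is interpolated by x*, so the pseudo-gradients vanish there and both variants converge to it. On heterogeneous data without interpolation, the α = 0 variant would typically settle in a neighborhood whose size grows with ω, which is the kind of dependence the compression variance reduction is meant to remove.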

