LG Feb 20, 2022

Personalized Federated Learning with Exact Stochastic Gradient Descent

Sotirios Nikoloutsopoulos, Iordanis Koutsopoulos, Michalis K. Titsias

arXiv:2202.09848v2 6.99 citations

Originality: Incremental advance

AI Analysis

This work addresses energy efficiency in federated learning for mobile devices, though it appears incremental as it builds on existing personalized federated learning methods.

The paper tackles personalized federated learning by proposing an SGD-type algorithm that reduces per-client computational cost in energy-limited mobile regimes. It proves convergence at a rate of O(1/√T) and reports superior performance on multi-class classification datasets compared to baselines such as FedAvg and FedPer.

We propose a Stochastic Gradient Descent (SGD)-type algorithm for Personalized Federated Learning which can be particularly attractive for mobile energy-limited regimes due to its low per-client computational cost. The model to be trained includes a set of common weights shared by all clients, and a set of personalized weights that are specific to each client. At each optimization round, randomly selected clients perform multiple full gradient-descent updates over their client-specific weights to optimize the loss function on their own datasets, without updating the common weights. This procedure is energy-efficient since it has low computational cost per client. At the final update of each round, each client computes the joint gradient over both the client-specific and the common weights and returns the gradient of the common weights to the server, which allows an exact SGD step over the full set of weights to be performed in a distributed manner. For the overall optimization scheme, we rigorously prove convergence, even in non-convex settings such as those encountered when training neural networks, with a rate of $\mathcal{O} \left (\frac{1}{\sqrt{T}} \right )$ with respect to communication rounds $T$. In practice, the proposed algorithm, PFLEGO, exhibits substantially lower per-round wall-clock time, used as a proxy for energy. Our theoretical guarantees translate to superior performance in practice against baselines such as FedAvg and FedPer, as evaluated on several multi-class classification datasets, in particular Omniglot, CIFAR-10, MNIST, Fashion-MNIST, and EMNIST.
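The round structure described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the toy model (a shared linear feature map `theta` followed by a per-client linear head `w`), the squared loss, the plain averaging of the returned gradients, and every function and parameter name below are assumptions made for the example.

```python
import numpy as np

def loss_and_grads(theta, w, X, y, need_theta_grad=True):
    """Toy model (an assumption): y_hat = (X @ theta) @ w with squared loss."""
    feats = X @ theta                      # shared features, shape (n, k)
    resid = feats @ w - y                  # residuals, shape (n,)
    n = len(y)
    loss = 0.5 * np.mean(resid ** 2)
    grad_w = feats.T @ resid / n           # gradient w.r.t. the personal head
    # The shared-weight gradient is only needed at the last local step.
    grad_theta = np.outer(X.T @ resid, w) / n if need_theta_grad else None
    return loss, grad_theta, grad_w

def client_round(theta, w, X, y, local_steps, lr_w):
    """Local steps on the personal weights only, then one joint gradient."""
    w = w.copy()
    for _ in range(local_steps - 1):
        # Cheap updates: the common weights theta stay fixed.
        _, _, grad_w = loss_and_grads(theta, w, X, y, need_theta_grad=False)
        w -= lr_w * grad_w
    # Final update: joint gradient over personal and common weights.
    _, grad_theta, grad_w = loss_and_grads(theta, w, X, y)
    w -= lr_w * grad_w
    return w, grad_theta                   # only grad_theta goes to the server

def server_round(theta, heads, data, rng, clients_per_round,
                 local_steps, lr_w, lr_theta):
    """Sample clients, collect common-weight gradients, take one SGD step."""
    selected = rng.choice(len(data), size=clients_per_round, replace=False)
    grad_sum = np.zeros_like(theta)
    for i in selected:
        X, y = data[i]
        heads[i], grad_theta = client_round(theta, heads[i], X, y,
                                            local_steps, lr_w)
        grad_sum += grad_theta
    # Averaging over the sampled clients is an assumed aggregation rule.
    return theta - lr_theta * grad_sum / clients_per_round
```

Calling `server_round` repeatedly, with `heads` holding one personal weight vector per client and `data` holding each client's `(X, y)` pair, plays the role of the communication rounds $T$. Note that a client sends back only the gradient of the common weights; its personal weights never leave the device.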
