LG · Apr 21, 2021

Gradient Masked Federated Optimization

arXiv:2104.10322v1 · 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses a key limitation of federated learning when client data is heterogeneous, though it is an incremental improvement over existing methods.

The paper tackles the problem of poor generalization in Federated Averaging (FedAVG) when applied to new clients with different data distributions, proposing a modification using masked gradients that achieves better out-of-distribution accuracy, especially with non-identically distributed data.

Federated Averaging (FedAVG) has become the most popular federated learning algorithm due to its simplicity and low communication overhead. We use simple examples to show that FedAVG tends to sew together the optima across the participating clients. These sewed optima exhibit poor generalization when used on a new client with a new data distribution. Inspired by the invariance principles of Arjovsky et al. (2019) and Parascandolo et al. (2020), we focus on learning a model that is locally optimal across the different clients simultaneously. We propose a modification to the FedAVG algorithm that includes masked gradients (the AND-mask from Parascandolo et al., 2020) across the clients and uses them to carry out an additional server model update. We show that this algorithm achieves better out-of-distribution accuracy than FedAVG, especially when the data is non-identically distributed across clients.
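
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the server receives one weight vector and one gradient vector per client, averages the weights uniformly as in FedAVG, and then takes one extra step along an AND-masked gradient, where a component is kept only if the clients' gradient signs agree. The names `and_mask` and `server_round`, the agreement threshold `tau`, and the step size `server_lr` are hypothetical choices made for illustration. Where the gradients are evaluated and how clients are weighted are details the abstract does not specify.

```python
from typing import List

Vector = List[float]


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def and_mask(client_grads: List[Vector], tau: float = 1.0) -> Vector:
    """Average client gradients, zeroing components whose signs disagree.

    A component is kept when |mean of signs| >= tau (tau is an assumed
    hyperparameter); tau = 1.0 demands agreement from every client.
    """
    n = len(client_grads)
    masked = []
    for components in zip(*client_grads):
        agreement = abs(sum(sign(g) for g in components)) / n
        mean_grad = sum(components) / n
        masked.append(mean_grad if agreement >= tau else 0.0)
    return masked


def server_round(
    client_weights: List[Vector],
    client_grads: List[Vector],
    server_lr: float = 0.1,
    tau: float = 1.0,
) -> Vector:
    """One server round: FedAVG-style averaging, then a masked-gradient step."""
    n = len(client_weights)
    # Uniform average of client models (a simplification; FedAVG usually
    # weights clients by their number of samples).
    averaged = [sum(ws) / n for ws in zip(*client_weights)]
    # Additional server update using only the sign-consistent directions.
    masked = and_mask(client_grads, tau)
    return [w - server_lr * g for w, g in zip(averaged, masked)]


if __name__ == "__main__":
    weights = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
    grads = [[1.0, -2.0, 0.5], [3.0, 2.0, 0.5]]
    print(and_mask(grads))
    print(server_round(weights, grads, server_lr=0.5))
```

In this toy input the two clients disagree on the sign of the second gradient component, so the mask zeroes it and the extra server step moves the model only along the first and third coordinates.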

