Optimizing the Communication-Accuracy Trade-off in Federated Learning with Rate-Distortion Theory
This work addresses the communication cost of federated learning systems, offering an incremental improvement over prior compression methods.
The paper tackles the communication bottleneck in federated learning by proposing a method to reduce the average communication cost that is near-optimal in many use cases and outperforms existing compression techniques such as Top-K and QSGD on the Stack Overflow next-word prediction benchmark.
A significant bottleneck in federated learning (FL) is the network communication cost of sending model updates from client devices to the central server. We present a comprehensive empirical study of the statistics of model updates in FL, as well as the role and benefits of various compression techniques. Motivated by these observations, we propose a novel method to reduce the average communication cost that is near-optimal in many use cases and outperforms Top-K, DRIVE, 3LC and QSGD on Stack Overflow next-word prediction, a realistic and challenging FL benchmark. We achieve this by analyzing the problem with rate-distortion theory and proposing distortion as a reliable proxy for model accuracy. Distortion can be used more effectively to optimize the trade-off between model performance and communication cost across clients. We demonstrate empirically that, despite the non-i.i.d. nature of federated learning, the rate-distortion frontier is consistent across datasets, optimizers, clients and training rounds.
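To make the rate-distortion framing concrete, the following sketch scores two compression schemes on the same axes: rate (bits per coordinate sent) and distortion (error introduced into the update). It is an illustration under stated assumptions, not the paper's method. The update is synthetic i.i.d. Gaussian data, and distortion is assumed to be MSE normalized by the update's mean squared value. Top-K is assumed to send float32 values plus fixed-width indices. The quantizer is a deterministic uniform quantizer that sends two float32 range values as side information; it is a simplification and differs from QSGD, which uses stochastic rounding. All function names are hypothetical.

```python
import math
import random


def quantize_uniform(update, bits):
    """Deterministic uniform scalar quantization onto 2**bits evenly spaced
    levels spanning [min(update), max(update)] (a simplification, not QSGD)."""
    lo, hi = min(update), max(update)
    if hi == lo:
        return list(update)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((x - lo) / step) * step for x in update]


def top_k(update, k):
    """Keep the k largest-magnitude coordinates and zero out the rest."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(update)]


def normalized_distortion(update, compressed):
    """Assumed distortion measure: MSE divided by the update's mean squared value
    (the 1/d factors cancel, so this is total squared error over total energy)."""
    energy = sum(x * x for x in update)
    if energy == 0.0:
        return 0.0
    err = sum((x - y) ** 2 for x, y in zip(update, compressed))
    return err / energy


def rate_uniform(d, bits):
    """Bits per coordinate: `bits` per value plus two float32 range values."""
    return (d * bits + 2 * 32) / d


def rate_top_k(d, k):
    """Bits per coordinate: k float32 values plus k fixed-width indices."""
    index_bits = math.ceil(math.log2(d))
    return k * (32 + index_bits) / d


def pareto_frontier(points):
    """Return the (rate, distortion) points not dominated by any other point,
    sorted by increasing rate."""
    frontier = []
    best = math.inf
    for rate, dist in sorted(points):
        # Keep a point only if it beats every cheaper (or equal-rate) point seen so far.
        if dist < best:
            frontier.append((rate, dist))
            best = dist
    return frontier


if __name__ == "__main__":
    rng = random.Random(0)
    d = 10_000
    update = [rng.gauss(0.0, 1.0) for _ in range(d)]  # synthetic stand-in for a model update
    points = []
    for bits in (1, 2, 4, 8):
        q = quantize_uniform(update, bits)
        points.append((rate_uniform(d, bits), normalized_distortion(update, q)))
    for k in (100, 500, 2000):
        s = top_k(update, k)
        points.append((rate_top_k(d, k), normalized_distortion(update, s)))
    for rate, dist in pareto_frontier(points):
        print(f"rate={rate:.3f} bits/coord  distortion={dist:.5f}")
```

Distortion is measured directly on the update, without training a model. This lets schemes with very different mechanisms (sparsification and quantization) be placed on a single rate-distortion frontier and compared cheaply.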