SalientGrads: Sparse Models for Communication Efficient and Data Aware Distributed Federated Training
This work addresses communication bottlenecks in federated learning for resource-limited edge clients, offering an incremental improvement over existing sparse training methods.
The paper tackles communication inefficiency in federated learning by proposing SalientGrads, a method that selects a data-aware sparse subnetwork before training and transmits only highly sparse gradients, reducing wall-clock communication time.
Federated learning (FL) enables training a model on decentralized data held at client sites while preserving privacy, since the data are never collected centrally. However, one of the significant challenges of FL is the limited computation and low communication bandwidth of resource-limited edge client nodes. To address this, several solutions have recently been proposed, including transmitting sparse models and iteratively learning dynamic masks. Yet many of these methods rely on transmitting the model weights throughout the entire training process because they are based on ad hoc or random pruning criteria. In this work, we propose SalientGrads, which simplifies sparse training by choosing a data-aware subnetwork before training, based on model-parameter saliency scores that are calculated from the local client data. Moreover, only highly sparse gradients are transmitted between the server and the client models during training, unlike most methods, which share the entire dense model in each round. We also demonstrate the efficacy of our method in a real-world federated learning application and report an improvement in wall-clock communication time.
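The two ideas above, a saliency-based mask fixed before training and sparse gradient exchange during training, can be illustrated with a toy sketch. It assumes a SNIP-style saliency score |w · ∂L/∂w| (the abstract says only that scores are computed from local client data), a hypothetical server step that averages normalized client scores into one global mask, and a FedSGD-style server update that averages the sparse client gradients. None of these details are taken from the paper, and every function name here is illustrative.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def saliency_scores(weights: Sequence[float], grads: Sequence[float]) -> List[float]:
    """Assumed SNIP-style saliency |w_j * dL/dw_j|, computed on a client's local batch."""
    return [abs(w * g) for w, g in zip(weights, grads)]


def aggregate_scores(client_scores: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical server step: average per-client scores after normalizing each to sum 1."""
    n = len(client_scores)
    agg = [0.0] * len(client_scores[0])
    for scores in client_scores:
        total = sum(scores) or 1.0  # guard against a client whose scores are all zero
        for j, s in enumerate(scores):
            agg[j] += s / total / n
    return agg


def top_k_mask(scores: Sequence[float], density: float) -> List[bool]:
    """Keep the round(density * N) highest-scoring parameters (at least one)."""
    k = max(1, int(round(density * len(scores))))
    keep = set(heapq.nlargest(k, range(len(scores)), key=lambda j: scores[j]))
    return [j in keep for j in range(len(scores))]


def apply_mask(weights: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Zero out pruned parameters so the model starts sparse."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]


def pack_sparse_grad(grad: Sequence[float], mask: Sequence[bool]) -> List[Tuple[int, float]]:
    """Client upload: only the unmasked gradient entries, as (index, value) pairs."""
    return [(j, g) for j, (g, m) in enumerate(zip(grad, mask)) if m]


def apply_sparse_update(
    weights: Sequence[float], packed_grads: Sequence[List[Tuple[int, float]]], lr: float
) -> List[float]:
    """Hypothetical server step: average the sparse client gradients, take one SGD step."""
    acc: Dict[int, float] = {}
    for packed in packed_grads:
        for j, g in packed:
            acc[j] = acc.get(j, 0.0) + g
    new_weights = list(weights)
    for j, g_sum in acc.items():
        new_weights[j] -= lr * g_sum / len(packed_grads)
    return new_weights


# Toy run with two clients and a 4-parameter "model".
weights = [1.0, -2.0, 0.5, 4.0]
scores = aggregate_scores([
    saliency_scores(weights, [0.1, 0.5, 0.0, 0.0]),   # client A's local gradients
    saliency_scores(weights, [0.0, 0.25, 0.0, 0.5]),  # client B's local gradients
])
mask = top_k_mask(scores, density=0.5)
print(mask)  # → [False, True, False, True]

sparse_weights = apply_mask(weights, mask)
uploads = [
    pack_sparse_grad([0.3, -0.2, 0.7, 0.1], mask),  # client A, one training round
    pack_sparse_grad([0.0, 0.4, 0.0, -0.3], mask),  # client B, one training round
]
print(uploads)  # → [[(1, -0.2), (3, 0.1)], [(1, 0.4), (3, -0.3)]]
print(apply_sparse_update(sparse_weights, uploads, lr=0.5))
```

Because the mask is fixed once before training, every later round exchanges only the masked entries. With a mask of density d over N parameters, each upload carries about dN (index, value) pairs instead of N dense values, so at d = 0.05 a round sends roughly 2 × 0.05N = 0.1N numbers, about 10% of a dense exchange, ignoring any index compression.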