Adaptive Differential Filters for Fast and Communication-Efficient Federated Learning
This work addresses communication efficiency in federated learning, a critical bottleneck for distributed AI systems. It appears incremental, however, because it builds on existing sparsity and differential update techniques.
The paper tackles communication overhead in federated learning by proposing an adaptive scaling method for convolutional filters. The method compensates for sparse updates, adapts local models to new data domains, and increases update sparsity. The authors report up to a 377-fold reduction in transmitted data, together with improved model performance and faster convergence.
Federated learning (FL) scenarios inherently generate a large communication overhead by frequently transmitting neural network updates between clients and the server. To minimize the communication cost, introducing sparsity in conjunction with differential updates is a commonly used technique. However, sparse model updates can slow down convergence or unintentionally skip certain aspects of an update, e.g., learned features, if error accumulation is not properly addressed. In this work, we propose a new scaling method operating at the granularity of convolutional filters which 1) compensates for highly sparse updates in FL processes, 2) adapts the local models to new data domains by enhancing some features in the filter space while diminishing others, and 3) encourages additional sparsity in updates and thus achieves higher compression ratios, i.e., savings in the overall data transfer. Compared to unscaled updates and previous work, experimental results on different computer vision tasks (Pascal VOC, CIFAR10, Chest X-Ray) and neural networks (ResNets, MobileNets, VGGs) in unidirectional, bidirectional, and partial-update FL settings show that the proposed method improves the performance of the central server model while converging faster and reducing the total amount of transmitted data by up to 377 times.
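To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a sparse differential update and of scaling at the granularity of convolutional filters. It is an illustration under stated assumptions, not the authors' implementation: the function names, the top-k magnitude criterion, and the `(out_channels, in_channels, kh, kw)` weight layout are all hypothetical choices, and the abstract does not say how the scaling factors are learned, so they are simply passed in as an array.

```python
import numpy as np


def sparsify_top_k(delta, keep_ratio):
    """Zero all but the largest-magnitude entries of `delta`.

    Assumption: top-k magnitude selection stands in for whatever
    sparsification rule the paper actually uses.
    """
    magnitudes = np.abs(delta).ravel()
    k = max(1, int(round(keep_ratio * magnitudes.size)))
    # Value of the k-th largest magnitude; entries below it are dropped.
    kth = magnitudes.size - k
    threshold = np.partition(magnitudes, kth)[kth]
    return delta * (np.abs(delta) >= threshold)


def differential_update(old_weights, new_weights, keep_ratio):
    """Sparse difference between the locally trained and the received weights."""
    return sparsify_top_k(new_weights - old_weights, keep_ratio)


def apply_filter_scaling(weights, scales):
    """Multiply each output filter by its own scalar.

    `weights` has the assumed layout (out_channels, in_channels, kh, kw);
    `scales` has shape (out_channels,). A scale above 1 enhances a filter,
    a scale below 1 diminishes it.
    """
    return weights * scales.reshape(-1, 1, 1, 1)


def compression_ratio(sparse_delta):
    """Total entries divided by non-zero entries (ignores index overhead)."""
    nonzero = np.count_nonzero(sparse_delta)
    return sparse_delta.size / nonzero if nonzero else float("inf")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    received = rng.normal(size=(8, 3, 3, 3))           # weights sent by the server
    trained = received + 0.1 * rng.normal(size=received.shape)
    update = differential_update(received, trained, keep_ratio=0.05)
    print("non-zero entries:", np.count_nonzero(update))
    print("compression ratio:", compression_ratio(update))
```

The sketch counts only non-zero values, so its ratio is not comparable to the 377-fold figure, which refers to the total amount of data transmitted over the whole training process.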