Federated Learning for Inference at Anytime and Anywhere
This work addresses efficient and scalable inference in federated learning for applications that require anytime, anywhere deployment. The contribution is incremental, since it builds on existing adapter methods.
The paper tackles the challenge of adapting pre-trained Transformer models in federated learning (FL). It proposes a lightweight attention-based adapter module that enables fast, communication-efficient learning and supports heterogeneous data and devices. The method achieves competitive performance on benchmarks such as CIFAR-100, FEMNIST, and SpeechCommandsv2.
Federated learning has been predominantly concerned with collaborative training of deep networks from scratch, and especially with the many challenges that arise, such as communication cost, robustness to heterogeneous data, and support for diverse device capabilities. However, there is no unified framework that addresses all these problems together. This paper studies the challenges and opportunities of exploiting pre-trained Transformer models in FL. In particular, we propose to efficiently adapt such pre-trained models by injecting a novel attention-based adapter module at each transformer block that both modulates the forward pass and makes an early prediction. Training only the lightweight adapter via FL leads to fast and communication-efficient learning even in the presence of heterogeneous data and devices. Extensive experiments on standard FL benchmarks, including CIFAR-100, FEMNIST, and SpeechCommandsv2, demonstrate that this simple framework provides fast and accurate FL while supporting heterogeneous device capabilities, efficient personalization, and scalable-cost anytime inference.
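The communication pattern and the anytime-inference idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the attention-based adapter itself is not modeled, the aggregation rule is assumed to be a dataset-size-weighted average in the style of FedAvg (the abstract does not state which rule is used), and all names (`split_trainable`, `fedavg_adapters`, `anytime_predict`, the `adapter.` key prefix) are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def split_trainable(model: Params, adapter_prefix: str = "adapter.") -> Params:
    """Keep only adapter parameters; the frozen backbone is never communicated."""
    return {k: v for k, v in model.items() if k.startswith(adapter_prefix)}


def fedavg_adapters(client_updates: List[Params], client_sizes: List[int]) -> Params:
    """Average adapter parameters, weighting each client by its dataset size.

    Assumption: a FedAvg-style weighted mean; the source does not specify this.
    """
    total = sum(client_sizes)
    averaged: Params = {}
    for key in client_updates[0]:
        dim = len(client_updates[0][key])
        averaged[key] = [
            sum(update[key][i] * n for update, n in zip(client_updates, client_sizes)) / total
            for i in range(dim)
        ]
    return averaged


def anytime_predict(exit_logits: List[List[float]], budget_blocks: int) -> int:
    """Return the class predicted by the deepest early exit within the budget.

    exit_logits[d] holds the logits produced by the early-prediction head
    after transformer block d + 1 (assumed: one exit per block).
    """
    depth = max(1, min(budget_blocks, len(exit_logits)))
    logits = exit_logits[depth - 1]
    return max(range(len(logits)), key=logits.__getitem__)


# Usage: only the two adapter values travel, not the 1000 backbone values.
model = {"backbone.w": [0.0] * 1000, "adapter.0.w": [1.0, 2.0]}
payload = split_trainable(model)

client_a = {"adapter.0.w": [1.0, 3.0]}  # trained on 1 local example
client_b = {"adapter.0.w": [3.0, 5.0]}  # trained on 3 local examples
global_adapter = fedavg_adapters([client_a, client_b], [1, 3])
print(global_adapter)  # {'adapter.0.w': [2.5, 4.5]}

# A device that can afford only two blocks uses the second exit.
exits = [[0.2, 0.1, 0.0], [0.1, 0.5, 0.2], [0.0, 0.3, 0.9]]
print(anytime_predict(exits, budget_blocks=2))  # 1
```

In this sketch a weaker device simply passes a smaller `budget_blocks`, which is one way to read "scalable-cost anytime inference": the same shared model serves every device, and depth is chosen at inference time.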