Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
This paper addresses the challenge of personalizing lightweight prompts in federated learning for vision-language models, offering an incremental improvement over the traditional paradigm in which each client downloads only a single globally aggregated model.
The paper tackles federated prompt learning for vision-language models by proposing a personalized framework in which clients download multiple pre-aggregated prompts as experts, improving alignment with local image data. Experiments on 9 datasets under various federated settings indicate that the approach is effective, though the abstract reports no concrete numbers.
Federated prompt learning brings the robust representation learning ability of CLIP-like Vision-Language Models (VLMs) to federated learning through prompt learning. However, current federated prompt learning methods are habitually restricted to the traditional FL paradigm, where the participating clients are generally only allowed to download a single globally aggregated model from the server. While this is justifiable for training full-sized models under federated settings, in this work we argue that the paradigm is ill-suited for lightweight prompts. By allowing clients to download multiple pre-aggregated prompts as fixed non-local experts, we propose Personalized Federated Mixture of Adaptive Prompts (pFedMoAP), a novel FL framework that personalizes the prompt learning process through the lens of Mixture of Experts (MoE). pFedMoAP implements a local attention-based gating network that learns to generate enhanced text features for better alignment with local image data, benefiting from both local and downloaded non-local adaptive prompt experts. Extensive experiments on 9 datasets under various federated settings demonstrate the efficacy of the proposed pFedMoAP algorithm. The code is available at https://github.com/ljaiverson/pFedMoAP.
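To make the gating idea concrete, the following is a minimal sketch of how attention over prompt experts could produce an enhanced text feature per class. It is not the authors' implementation: the function names (`gate_text_features`, `classify`), the use of the image feature as the attention query, and the parameter-free scaled dot-product scoring are all assumptions made for illustration. The gating network described in the abstract is learned, whereas this sketch has no trainable parameters, and it assumes each expert's prompt has already been encoded into one text feature per class.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit L2 norm (assumes v is not all zeros).
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def gate_text_features(image_feat, expert_text_feats):
    """Mix per-class text features from several prompt experts.

    expert_text_feats[e][c] is the text feature that expert e produces
    for class c. By convention in this sketch (an assumption), expert 0
    is the local prompt and the rest are downloaded non-local experts.
    """
    d = len(image_feat)
    num_classes = len(expert_text_feats[0])
    enhanced, weights_per_class = [], []
    for c in range(num_classes):
        keys = [expert[c] for expert in expert_text_feats]
        # Scaled dot-product attention: the image feature is the query,
        # each expert's text feature serves as both key and value.
        weights = softmax([dot(image_feat, k) / math.sqrt(d) for k in keys])
        mixed = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]
        enhanced.append(normalize(mixed))
        weights_per_class.append(weights)
    return enhanced, weights_per_class

def classify(image_feat, expert_text_feats):
    # CLIP-style prediction: cosine similarity between the image feature
    # and each class's enhanced text feature, then argmax.
    image_feat = normalize(image_feat)
    enhanced, _ = gate_text_features(image_feat, expert_text_feats)
    logits = [dot(image_feat, t) for t in enhanced]
    return max(range(len(logits)), key=logits.__getitem__)

# Toy example: 2 classes, 2-dimensional features, one local and one
# non-local expert (all values are made up for illustration).
local_expert = [[1.0, 0.0], [0.0, 1.0]]
nonlocal_expert = [[0.8, 0.6], [0.6, 0.8]]
image = [1.0, 0.0]
prediction = classify(image, [local_expert, nonlocal_expert])
```

In a trainable version, the query, keys, and values would typically pass through learned projections, and only the local prompt and the gating network would receive gradients while the downloaded experts stay fixed, as the abstract describes.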