FLIP: Towards Comprehensive and Reliable Evaluation of Federated Prompt Learning
This work addresses the need for reliable evaluation of federated learning in privacy-sensitive applications, but its contribution is incremental: it benchmarks existing methods rather than proposing new ones.
The paper tackles the problem of evaluating federated prompt learning algorithms for vision-language models by introducing FLIP, a comprehensive framework that tests 8 methods across 4 protocols and 12 datasets, showing that prompt learning maintains strong generalization with minimal resource use.
The increasing emphasis on privacy and data security has driven the adoption of federated learning, a decentralized approach to training machine learning models without sharing raw data. Prompt learning, which fine-tunes the prompt embeddings of pretrained models, offers significant advantages in federated settings by reducing computational cost and communication overhead while leveraging the strong performance and generalization capabilities of vision-language models such as CLIP. This paper addresses the intersection of federated learning and prompt learning, particularly for vision-language models. In this work, we introduce a comprehensive framework, named FLIP, to evaluate federated prompt learning algorithms. FLIP assesses the performance of 8 state-of-the-art federated prompt learning methods across 4 federated learning protocols and 12 open datasets, considering 6 distinct evaluation scenarios. Our findings demonstrate that prompt learning maintains strong generalization performance in both in-distribution and out-of-distribution settings with minimal resource consumption. This work highlights the effectiveness of federated prompt learning in environments characterized by data scarcity, unseen classes, and cross-domain distributional shifts. We open-source the code for all implemented algorithms in FLIP to facilitate further research in this domain.
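To make the communication argument concrete, here is a minimal sketch of one server-side aggregation round in which clients exchange only their learned prompt embeddings while the pretrained backbone stays frozen and local. It is not FLIP's implementation: the function name `fedavg_prompts`, the prompt shape `(n_ctx, dim)`, and the choice of FedAvg-style weighted averaging are all assumptions made for illustration.

```python
import numpy as np


def fedavg_prompts(client_prompts, client_sizes):
    """Average per-client prompt embeddings, weighted by local sample count.

    client_prompts: list of arrays, each of shape (n_ctx, dim) (hypothetical shape)
    client_sizes:   list of local dataset sizes, one per client
    """
    prompts = np.stack(client_prompts)               # (num_clients, n_ctx, dim)
    weights = np.asarray(client_sizes, dtype=np.float64)
    weights = weights / weights.sum()                # normalize to sum to 1
    # Contract the client axis: sum_k weights[k] * prompts[k]
    return np.tensordot(weights, prompts, axes=1)    # (n_ctx, dim)


# Toy example: two clients with illustrative prompt shape (4 tokens, 8 dims).
clients = [np.zeros((4, 8)), np.ones((4, 8))]
sizes = [1, 3]
global_prompt = fedavg_prompts(clients, sizes)

# Each client uploads only the prompt tensor, not the backbone weights.
payload_floats = global_prompt.size                  # 4 * 8 = 32 values per round
```

In this sketch the per-round payload grows with the prompt size only, which is the property the abstract refers to as reduced communication overhead; the actual aggregation rules benchmarked in FLIP may differ.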