LG · AI · Feb 26, 2025

FAA-CLIP: Federated Adversarial Adaptation of CLIP

Yihang Wu, Ahmad Chaddad, Christian Desrosiers, Tareef Daqqaq, Reem Kateb

arXiv:2503.05776v113.012 citations · h-index: 7 · Has Code · IEEE Internet of Things Journal

Originality: Incremental advance

AI Analysis

This work addresses efficient and robust adaptation of pre-trained models in federated settings, particularly for medical applications, though it is incremental as it builds on existing CLIP and domain adaptation techniques.

The paper tackles the challenges of using large vision-language models like CLIP in federated learning, including high communication costs and data heterogeneity, by proposing FAA-CLIP, which reduces parameters transferred by 90% and improves generalization on medical datasets by up to 15% over baselines.

Despite the remarkable performance of vision-language models (VLMs) such as Contrastive Language-Image Pre-training (CLIP), the large size of these models is a considerable obstacle to their use in federated learning (FL) systems, where the parameters of local client models must be transferred to a global server for aggregation. Another challenge in FL is the heterogeneity of data across clients, which degrades the generalization performance of the solution. In addition, VLMs pre-trained on natural images generalize poorly to medical datasets, which suggests a domain gap. To address these issues, we introduce a novel method for the Federated Adversarial Adaptation (FAA) of CLIP. Our method, named FAA-CLIP, handles the large communication costs of CLIP by using a lightweight feature adaptation module (FAM) for aggregation, effectively adapting this VLM to each client's data while greatly reducing the number of parameters to transfer. By keeping CLIP frozen and updating only the FAM parameters, our method is also computationally efficient. Unlike existing approaches, FAA-CLIP directly addresses the problem of domain shift across clients via a domain adaptation (DA) module. This module employs a domain classifier to predict whether a given sample comes from the local client or the global server, allowing the model to learn domain-invariant representations. Extensive experiments on six datasets containing both natural and medical images demonstrate that FAA-CLIP generalizes well on both natural and medical datasets compared to recent FL approaches. Our code is available at https://github.com/AIPMLab/FAA-CLIP.
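The communication saving described in the abstract comes from sending only the adapter parameters to the server while the frozen backbone stays on each client. The sketch below illustrates that idea only. It is not the authors' implementation: the FedAvg-style weighting by client sample count, the function names aggregate_adapters and communicated_fraction, and the parameter name "fam.weight" are all assumptions made for illustration, and the abstract does not state which aggregation rule FAA-CLIP uses.

```python
from typing import Dict, List

# Hypothetical representation: each adapter tensor is a flat list of floats.
Params = Dict[str, List[float]]


def aggregate_adapters(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average adapter parameters across clients, weighted by sample count.

    Assumption: FedAvg-style weighting. Frozen backbone weights are never
    passed in, so they are never transmitted or averaged.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name in client_params[0]:
        averaged = [0.0] * len(client_params[0][name])
        for params, size in zip(client_params, client_sizes):
            weight = size / total
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        global_params[name] = averaged
    return global_params


def communicated_fraction(n_adapter: int, n_backbone: int) -> float:
    """Share of all model parameters that is sent when only the adapter is shared."""
    return n_adapter / (n_adapter + n_backbone)


# Two hypothetical clients holding 1 and 3 samples respectively.
clients = [{"fam.weight": [1.0, 2.0]}, {"fam.weight": [3.0, 4.0]}]
merged = aggregate_adapters(clients, client_sizes=[1, 3])
print(merged["fam.weight"])
```

In a real system the lists would be tensors from the adapter's state dictionary, and the server would send the merged adapter back to every client before the next round.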

