CVOct 21, 2025

FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning

Yubin Zheng, Pak-Hei Yeung, Jing Xia, Tianjie Ju, Peng Tang, Weidong Qiu, Jagath C. Rajapakse

arXiv:2510.18837v13.6h-index: 9MM

Originality Incremental advance

AI Analysis

This work addresses domain adaptation challenges in federated learning for image recognition, offering a method to improve model performance in multi-domain scenarios, though it is incremental in building on existing prompt tuning and federated learning techniques.

The paper tackles the problem of domain shift and label heterogeneity in federated learning by proposing FedDEAP, an adaptive dual-prompt tuning framework for CLIP, which enhances generalization across multiple domains, as demonstrated through experiments on four datasets.

Federated learning (FL) enables multiple clients to collaboratively train machine learning models without exposing local data, balancing performance and privacy. However, domain shift and label heterogeneity across clients often hinder the generalization of the aggregated global model. Recently, large-scale vision-language models like CLIP have shown strong zero-shot classification capabilities, raising the question of how to effectively fine-tune CLIP across domains in a federated setting. In this work, we propose an adaptive federated prompt tuning framework, FedDEAP, to enhance CLIP's generalization in multi-domain scenarios. Our method includes the following three key components: (1) To mitigate the loss of domain-specific information caused by label-supervised tuning, we disentangle semantic and domain-specific features in images by using semantic and domain transformation networks with unbiased mappings; (2) To preserve domain-specific knowledge during global prompt aggregation, we introduce a dual-prompt design with a global semantic prompt and a local domain prompt to balance shared and personalized information; (3) To maximize the inclusion of semantic and domain information from images in the generated text features, we align textual and visual representations under the two learned transformations to preserve semantic and domain consistency. Theoretical analysis and extensive experiments on four datasets demonstrate the effectiveness of our method in enhancing the generalization of CLIP for federated image recognition across multiple domains.

View on arXiv PDF

Similar