AI CV LG | Sep 30, 2025

Zero-Shot Decentralized Federated Learning

Alessio Masano, Matteo Pennisi, Federica Proietto Salanitri, Concetto Spampinato, Giovanni Bellitto

arXiv:2509.26462v1 | h-index: 18 | IJCNN

Originality: Highly original

AI Analysis

This work addresses scalability and privacy issues in federated learning for vision-language models, offering a solution for real-world decentralized applications.

The paper proposes ZeroDFL, a fully decentralized framework for zero-shot adaptation in federated learning. It removes the central server and cuts communication overhead by 118x relative to FedTPG, while matching or outperforming state-of-the-art federated prompt learning methods on nine image classification datasets.

CLIP has revolutionized zero-shot learning by enabling task generalization without fine-tuning. While prompting techniques like CoOp and CoCoOp enhance CLIP's adaptability, their effectiveness in Federated Learning (FL) remains an open challenge. Existing federated prompt learning approaches, such as FedCoOp and FedTPG, improve performance but face generalization issues, high communication costs, and reliance on a central server, limiting scalability and privacy. We propose Zero-shot Decentralized Federated Learning (ZeroDFL), a fully decentralized framework that enables zero-shot adaptation across distributed clients without a central coordinator. ZeroDFL employs an iterative prompt-sharing mechanism, allowing clients to optimize and exchange textual prompts to enhance generalization while drastically reducing communication overhead. We validate ZeroDFL on nine diverse image classification datasets, demonstrating that it consistently outperforms, or remains on par with, state-of-the-art federated prompt learning methods. More importantly, ZeroDFL achieves this performance in a fully decentralized setting while reducing communication overhead by 118x compared to FedTPG. These results highlight that our approach not only enhances generalization in federated zero-shot learning but also improves scalability, efficiency, and privacy preservation, paving the way for decentralized adaptation of large vision-language models in real-world applications.
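
The abstract describes an iterative prompt-sharing mechanism but gives no algorithmic detail, so the following is only a toy sketch of what serverless prompt exchange could look like, not the authors' implementation. Everything in it is an assumption made for illustration: prompts are plain lists of floats, local prompt tuning is replaced by a step toward a client-specific target vector, each client sends its prompt to a fixed number of random peers, and received prompts are merged by simple averaging. The names local_update, average, run_rounds, and fanout are hypothetical.

```python
import random


def local_update(prompt, target, lr=0.5):
    """Stand-in for local prompt tuning (assumption): step toward a client-specific target."""
    return [p + lr * (t - p) for p, t in zip(prompt, target)]


def average(prompts):
    """Element-wise mean of a non-empty list of equal-length prompt vectors."""
    n = len(prompts)
    return [sum(vals) / n for vals in zip(*prompts)]


def run_rounds(targets, rounds=20, fanout=2, seed=0):
    """Simulate peer-to-peer prompt sharing with no central server.

    targets: one vector per client, standing in for that client's local data.
    fanout:  number of random peers each client sends its prompt to per round
             (must be smaller than the number of clients).
    """
    rng = random.Random(seed)
    n = len(targets)
    dim = len(targets[0])
    prompts = [[0.0] * dim for _ in range(n)]

    for _ in range(rounds):
        # 1. Every client optimizes its own prompt locally (simulated).
        prompts = [local_update(p, t) for p, t in zip(prompts, targets)]

        # 2. Every client sends its prompt directly to `fanout` random peers.
        inbox = [[] for _ in range(n)]
        for sender in range(n):
            others = [c for c in range(n) if c != sender]
            for peer in rng.sample(others, fanout):
                inbox[peer].append(prompts[sender])

        # 3. Every client merges its own prompt with whatever it received.
        prompts = [average([prompts[c]] + inbox[c]) for c in range(n)]

    return prompts


if __name__ == "__main__":
    client_targets = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    for client_id, prompt in enumerate(run_rounds(client_targets)):
        print(client_id, [round(v, 3) for v in prompt])
```

In this toy setup only the prompt vectors travel between clients, and no node sees all of them at once, which is the general property the abstract attributes to a decentralized design. The actual exchange schedule and aggregation rule used by ZeroDFL may differ.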
