CL, CV · Apr 1, 2024

Open-Vocabulary Federated Learning with Multimodal Prototyping

arXiv:2404.01232v2 · 31 citations · h-index: 15 · NAACL
AI Analysis

This work addresses a practical limitation of federated learning in real-world applications, where the training and test label spaces may differ. The contribution is incremental, however, because it builds on existing pre-trained vision-language models.

The paper tackles open-vocabulary queries in federated learning, where a new user's queries may involve classes never seen during training. It proposes Federated Multimodal Prototyping (Fed-MP) to adapt pre-trained vision-language models and reports that the approach is effective across various datasets.

Existing federated learning (FL) studies usually assume that the training and test label spaces are identical. In real-world applications, however, this assumption is too idealistic to hold. A new user may issue queries involving data from unseen classes, and such open-vocabulary queries would directly defeat these FL systems. In this work, we therefore explicitly focus on the under-explored open-vocabulary challenge in FL: for a new user, the global server should understand their query even when it involves arbitrary unknown classes. To address this problem, we leverage pre-trained vision-language models (VLMs). In particular, we present a novel adaptation framework tailored to VLMs in the FL setting, named Federated Multimodal Prototyping (Fed-MP). Fed-MP adaptively aggregates the local model weights based on lightweight client residuals and makes predictions with a novel multimodal prototyping mechanism. Fed-MP exploits the knowledge learned from the seen classes and makes the adapted VLM robust to unseen categories. Our empirical evaluation on various datasets validates the effectiveness of Fed-MP.
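To make the two mechanisms named in the abstract more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' Fed-MP implementation. The residual-norm softmax weighting in `aggregate_residuals`, the text/image blend weight `alpha` in `build_prototypes`, and all function names are hypothetical choices for illustration only. Prediction follows the familiar CLIP-style zero-shot pattern: pick the class whose prototype has the highest cosine similarity with the image feature.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def aggregate_residuals(residuals: List[Vector], temperature: float = 1.0) -> Vector:
    """HYPOTHETICAL adaptive aggregation: each client reports a lightweight
    residual vector; clients with smaller residual norms receive larger weights
    via a softmax over -norm / temperature (shifted by the minimum for stability)."""
    norms = [math.sqrt(sum(x * x for x in r)) for r in residuals]
    n_min = min(norms)
    scores = [math.exp(-(n - n_min) / temperature) for n in norms]
    total = sum(scores)
    weights = [s / total for s in scores]
    dim = len(residuals[0])
    return [sum(w * r[i] for w, r in zip(weights, residuals)) for i in range(dim)]


def build_prototypes(
    text_emb: Dict[str, Vector],
    image_protos: Dict[str, Vector],
    alpha: float = 0.5,
) -> Dict[str, Vector]:
    """HYPOTHETICAL multimodal prototypes: seen classes (those with an image
    prototype) blend normalized text and image embeddings with weight alpha;
    unseen classes fall back to their text embedding alone."""
    protos: Dict[str, Vector] = {}
    for name, t in text_emb.items():
        proto = l2_normalize(t)
        if name in image_protos:
            v = l2_normalize(image_protos[name])
            proto = l2_normalize([alpha * x + (1 - alpha) * y for x, y in zip(proto, v)])
        protos[name] = proto
    return protos


def predict(image_feat: Vector, protos: Dict[str, Vector]) -> str:
    """Open-vocabulary prediction: return the class with the most similar prototype."""
    return max(protos, key=lambda c: cosine(image_feat, protos[c]))


if __name__ == "__main__":
    # Toy 3-d embeddings; "zebra" has a text embedding but no image prototype (unseen).
    text = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "zebra": [0.0, 0.0, 1.0]}
    images = {"cat": [0.9, 0.1, 0.0], "dog": [0.1, 0.9, 0.0]}
    protos = build_prototypes(text, images)
    print(predict([0.1, 0.2, 0.95], protos))  # expected: zebra
    print(aggregate_residuals([[0.0, 0.0], [3.0, 4.0]]))  # dominated by the small residual
```

Because unseen classes keep a text-only prototype, the classifier can still score a class name that no client ever labeled, which is the open-vocabulary behavior the abstract targets. Seen classes additionally benefit from image evidence gathered during training.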

Code Implementations: 1 repo
