CV · Nov 24, 2024

Modality Alignment Meets Federated Broadcasting

Yuting Ma, Shengeng Tang, Xiaohua Xu, Lechao Cheng

arXiv:2411.15837v13.72 citations · h-index: 13

Originality: Incremental advance

AI Analysis

This paper addresses data heterogeneity in federated learning on edge devices; the contribution is an incremental advance.

This paper tackles the challenge of maintaining performance in federated learning with heterogeneous data by introducing a framework that uses modality alignment with text encoders on the server and image encoders on local devices, achieving improved generalization and robustness in experiments on benchmark datasets.

Federated learning (FL) has emerged as a powerful approach to safeguard data privacy by training models across distributed edge devices without centralizing local data. Despite advances in homogeneous data scenarios, maintaining consistent performance between the global model and local clients over heterogeneous data remains challenging, because variations in data distribution degrade model convergence and increase computational costs. This paper introduces a novel FL framework leveraging modality alignment, where a text encoder resides on the server and image encoders operate on local devices. Inspired by multi-modal learning paradigms like CLIP, this design aligns cross-client learning by treating server-client communication as a form of multi-modal broadcasting. We initialize with a pre-trained model to mitigate overfitting and update only selected parameters through low-rank adaptation (LoRA) to balance computational cost and performance. Local models train independently and communicate updates to the server, which aggregates parameters via a query-based method, facilitating cross-client knowledge sharing and improving performance under extreme heterogeneity. Extensive experiments on benchmark datasets demonstrate the framework's efficacy in maintaining generalization and robustness, even in highly heterogeneous settings.
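
The LoRA step mentioned in the abstract can be illustrated with a minimal sketch. It shows only the generic LoRA idea: a frozen pre-trained weight W plus a trainable low-rank correction, W + (alpha / r) * B A. The abstract does not say which layers receive adapters, what rank is used, or how the updates are packaged for the server, so the function names, shapes, and values below are hypothetical and this is not the authors' implementation.

```python
# Generic LoRA sketch in pure Python (illustrative assumptions only;
# names, shapes, and values are hypothetical, not taken from the paper).

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: frozen pre-trained weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Count trainable parameters for full fine-tuning versus LoRA."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}

# Toy example: a 2x3 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # shape (r=1, d_in=3)
B = [[0.0], [0.0]]      # shape (d_out=2, r=1); zero init makes the adapter a no-op

# With B initialised to zero, the effective weight equals the frozen weight.
assert lora_effective_weight(W, A, B, alpha=1.0) == W
```

Under this reading, a client would train and transmit only the small matrices A and B rather than the full weight, which is where the savings in computation and communication would come from.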
