Adaptive Prototype Knowledge Transfer for Federated Learning with Mixed Modalities and Heterogeneous Tasks
This addresses privacy-preserving collaborative learning for clients with diverse data modalities and tasks, though it appears incremental as it builds on existing prototype-based methods.
The paper tackles the problem of multimodal federated learning with mixed modalities and heterogeneous tasks by proposing an adaptive prototype-based framework (AproMFL) that transfers knowledge without unified labels and dynamically adjusts aggregation weights, achieving accuracy improvements of 0.42% to 6.09% and recall gains of 1.6% to 3.89% over baselines on highly heterogeneous datasets.
Multimodal Federated Learning (MFL) with mixed modalities enables unimodal and multimodal clients to collaboratively train models while ensuring clients' privacy. As a representative sample of local data, prototypes offer an approach with low resource consumption and no reliance on prior knowledge for MFL with mixed modalities. However, existing prototype-based MFL methods assume unified labels across clients and identical tasks per client, which is impractical in MFL with mixed modalities. In this work, we propose an Adaptive prototype-based Multimodal Federated Learning (AproMFL) framework for mixed modalities to address the aforementioned issues. Our AproMFL transfers knowledge through adaptively-constructed prototypes without unified labels. Clients adaptively select prototype construction methods in line with labels; server converts client prototypes into unified multimodal prototypes and cluster them to form global prototypes. To address model aggregation issues in task heterogeneity, we develop a client relationship graph-based scheme to dynamically adjust aggregation weights. Furthermore, we propose a global prototype knowledge transfer loss and a global model knowledge transfer loss to enable the transfer of global knowledge to local knowledge. Experimental results show that AproMFL outperforms four baselines on three highly heterogeneous datasets ($α=0.1$) and two heterogeneous tasks, with the optimal results in accuracy and recall being 0.42%~6.09% and 1.6%~3.89% higher than those of FedIoT (FedAvg-based MFL), respectively.