LG · AI · Jul 25, 2023

FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

arXiv:2307.13214v2 · 35 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses the challenge of exploiting multimodal data in federated learning for applications such as human activity recognition. It is an incremental advance, building on existing FL and distillation methods.

The paper tackles limited labeled data and modality discrepancy in multimodal federated learning by proposing FedMEKT, a distillation-based embedding knowledge transfer framework. On three human activity recognition datasets, FedMEKT achieves superior global encoder performance under linear evaluation while reducing communication costs.

Federated learning (FL) is a decentralized machine learning paradigm in which multiple clients collaboratively train a generalized global model without sharing their private data. Most existing works propose FL systems for single-modal data only, which limits their potential to exploit valuable multimodal data for future personalized applications. Furthermore, the majority of FL approaches still rely on labeled data at the client side, which is scarce in real-world applications because users cannot annotate their own data. In light of these limitations, we propose a novel multimodal FL framework that employs a semi-supervised learning approach to leverage the representations from different modalities. To bring this concept into a system, we develop a distillation-based multimodal embedding knowledge transfer mechanism, namely FedMEKT, which allows the server and clients to exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset. FedMEKT iteratively updates the generalized global encoders with the joint embedding knowledge from the participating clients. To address the modality discrepancy and the labeled data constraint in existing FL systems, FedMEKT comprises three components: local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning. Through extensive experiments on three multimodal human activity recognition datasets, we demonstrate that FedMEKT achieves superior global encoder performance on linear evaluation and guarantees user privacy for personal data and model parameters while demanding less communication cost than other baselines.
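
The idea of exchanging embedding knowledge over a shared proxy dataset, rather than exchanging model parameters, can be illustrated with a minimal sketch. This is not the FedMEKT implementation: it assumes a single modality, linear encoders in place of multimodal autoencoders, plain averaging of client embeddings, and a mean squared error distillation loss. All function names (`embed`, `aggregate_embeddings`, `distill_step`, `run_demo`) and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def embed(weights, x):
    """Linear encoder (an assumption): maps inputs (n, d) to embeddings (n, k)."""
    return x @ weights


def aggregate_embeddings(client_weights, proxy_x):
    """Average the clients' embeddings of the shared proxy data.

    Only embeddings of the proxy data are combined here; client weights
    and private data are never pooled.
    """
    return np.mean([embed(w, proxy_x) for w in client_weights], axis=0)


def distill_loss(global_w, proxy_x, target):
    """Mean over proxy samples of the squared distance to the target embedding."""
    diff = embed(global_w, proxy_x) - target
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def distill_step(global_w, proxy_x, target, lr=0.1):
    """One gradient descent step on distill_loss with respect to global_w."""
    n = proxy_x.shape[0]
    residual = embed(global_w, proxy_x) - target
    grad = 2.0 * proxy_x.T @ residual / n
    return global_w - lr * grad


def run_demo(num_clients=3, steps=200, seed=0):
    """Distill a global encoder from averaged client embeddings; return the loss before and after."""
    rng = np.random.default_rng(seed)
    proxy_x = rng.normal(size=(32, 8))  # small shared proxy dataset (synthetic)
    client_weights = [rng.normal(size=(8, 4)) for _ in range(num_clients)]
    target = aggregate_embeddings(client_weights, proxy_x)

    global_w = np.zeros((8, 4))
    loss_before = distill_loss(global_w, proxy_x, target)
    for _ in range(steps):
        global_w = distill_step(global_w, proxy_x, target)
    loss_after = distill_loss(global_w, proxy_x, target)
    return loss_before, loss_after


if __name__ == "__main__":
    before, after = run_demo()
    print(f"distillation loss before: {before:.4f}, after: {after:.6f}")
```

In this sketch each client contributes a matrix of proxy-set embeddings, whose size depends on the proxy set and the embedding dimension rather than on the number of model parameters. That is one plausible reading of why an embedding-based exchange can cost less communication than parameter averaging. The sketch leaves out the knowledge sent back from server to clients, the multimodal autoencoders, and the classifier stage described in the abstract.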

