LG · AI · Aug 3, 2025

MHARFedLLM: Multimodal Human Activity Recognition Using Federated Large Language Model

Asmit Bandyopadhyay, Rohit Basu, Tanmay Sen, Swagatam Das

arXiv:2508.01701v17.12 citations · h-index: 1

Originality: Incremental advance

AI Analysis

The work addresses the need for robust and accurate HAR in applications such as fitness tracking and healthcare, though it appears incremental because it integrates existing techniques such as multimodal fusion and federated learning.

This work tackles Human Activity Recognition (HAR) with a multimodal federated learning framework that combines depth cameras, pressure mats, and accelerometers, achieving a centralized F1 Score of 0.934 and a federated F1 Score of 0.881.

Human Activity Recognition (HAR) plays a vital role in applications such as fitness tracking, smart homes, and healthcare monitoring. Traditional HAR systems often rely on a single modality, such as motion sensors or cameras, which limits robustness and accuracy in real-world environments. This work presents FedTime-MAGNET, a novel multimodal federated learning framework that advances HAR by combining heterogeneous data sources: depth cameras, pressure mats, and accelerometers. At its core is the Multimodal Adaptive Graph Neural Expert Transformer (MAGNET), a fusion architecture that uses graph attention and a Mixture of Experts to generate unified, discriminative embeddings across modalities. To capture complex temporal dependencies, a lightweight T5 encoder-only architecture is customized and adapted within this framework. Extensive experiments show that FedTime-MAGNET significantly improves HAR performance, achieving a centralized F1 Score of 0.934 and a strong federated F1 Score of 0.881. These results demonstrate the effectiveness of combining multimodal fusion, time series LLMs, and federated learning for building accurate and robust HAR systems.
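To make the federated part of the abstract concrete, here is a minimal sketch of a server-side aggregation step. The abstract does not say which aggregation rule FedTime-MAGNET uses, so this assumes plain sample-size-weighted averaging in the style of FedAvg. The function name `federated_average`, the parameter names, and the toy client weights are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its number of samples.

    Assumption: FedAvg-style aggregation; the source does not confirm this rule.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of each parameter entry across all clients.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients (for example, two homes with their own sensors).
clients = [
    {"encoder": [1.0, 2.0], "head": [0.0]},
    {"encoder": [3.0, 4.0], "head": [1.0]},
]
sizes = [100, 300]  # number of local training samples per client (made up)

global_weights = federated_average(clients, sizes)
print(global_weights)
```

In this sketch the client with 300 samples pulls the global parameters three times as hard as the client with 100, which is the usual motivation for size-weighted averaging when clients hold unequal amounts of data. Raw sensor data never leaves the clients; only parameters are exchanged.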
