HC · Apr 10

Lightweight and Generalizable Multi-Sensor Human Activity Recognition via Cascaded Fusion and Style-Augmented Decomposition

Wang Chenglong, Zhuo Yan, Ding Wenbo, Chen Xinlei

arXiv:2604.08910 · 5.7 · h-index: 4

Predicted impact top 94% in HC · last 90 days

Originality: Incremental advance

AI Analysis

This work provides a practical solution for wearable human activity recognition on resource-constrained devices, though it is incremental as it builds on the existing decomposition-extraction-fusion paradigm.

The paper tackles high computational complexity and limited robustness in wearable human activity recognition by proposing a lightweight framework with cascaded fusion and style-augmented decomposition. On the Realdisp and Skoda benchmarks, it reports state-of-the-art accuracy and macro-F1 scores while reducing computational overhead by more than 30% compared to attention-based baselines.

Wearable Human Activity Recognition (WHAR) is a prominent research area within ubiquitous computing, whose core lies in effectively modeling intra- and inter-sensor spatio-temporal relationships from multi-modal time series data. Existing methods either suffer from high computational complexity due to attention-based fusion or lack robustness to data variations during feature extraction. To address these issues, we propose a lightweight and generalizable framework that retains the core "decomposition-extraction-fusion" paradigm while introducing two key innovations. First, we replace the computationally expensive Attention and Cross-Variable Fusion (CVF) modules with a Cascaded Fusion Block (CFB), which achieves efficient feature interaction without explicit attention weights through the operational process of "compression-recursion-concatenation-fusion". Second, we integrate a MixStyle-based data augmentation module before the Local Temporal Feature Extraction (LTFE) and Global Temporal Aggregation (GTA) stages. By mixing the mean and variance of different samples within a batch and introducing random coefficients to perturb the data distribution, the model's generalization ability is enhanced without altering the core information of the data. The proposed framework maintains sensor-level, variable-level, and channel-level independence during the decomposition phase, and achieves efficient feature fusion and robust feature extraction in subsequent processes. Experiments on two benchmark datasets (Realdisp, Skoda) demonstrate that our model outperforms state-of-the-art methods in both accuracy and macro-F1 score, while reducing computational overhead by more than 30% compared to attention-based baselines. This work provides a practical solution for WHAR applications on resource-constrained wearable devices.
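The MixStyle step described in the abstract (mixing per-sample mean and variance within a batch using random coefficients) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `mixstyle`, the parameters `alpha` and `eps`, the `(batch, channels, time)` layout, the choice to compute statistics along the time axis, and the Beta-distributed mixing coefficient (borrowed from the original MixStyle formulation) are all assumptions made for illustration.

```python
import numpy as np

def mixstyle(x, alpha=0.1, eps=1e-6, rng=None):
    """Illustrative MixStyle-like augmentation for sensor time series.

    x is assumed to have shape (batch, channels, time). The layout,
    alpha, and eps are hypothetical choices, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = x.shape[0]

    # Per-sample, per-channel "style" statistics along the time axis.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(x.var(axis=-1, keepdims=True) + eps)

    # Remove the style, keeping the normalized signal shape.
    x_norm = (x - mu) / sigma

    # Random mixing coefficients, one per sample (assumed Beta(alpha, alpha)).
    lam = rng.beta(alpha, alpha, size=(batch, 1, 1))

    # Pair each sample with another sample from the same batch.
    perm = rng.permutation(batch)

    # Interpolate the statistics of each sample with those of its partner.
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1.0 - lam) * sigma[perm]

    # Re-apply the mixed style to the unchanged normalized signal.
    return x_norm * sigma_mix + mu_mix
```

Under these assumptions, only the per-channel offset and scale of each window change, while its standardized waveform stays the same, which matches the abstract's claim that the perturbation does not alter the core information of the data. Such augmentation is normally applied during training only.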
