Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
This work addresses the problem of efficient and reliable HAR for users of wearable devices. It argues that transformers are poorly suited to this domain because of data scarcity and resource constraints, and it is an incremental study that challenges existing assumptions.
The paper investigates transformer-based approaches for sensor-based human activity recognition (HAR) and finds that they consistently underperform non-transformer methods: they demand more computation, degrade significantly when quantized, and are less robust to adversarial attacks.
Transformers have excelled in natural language processing and computer vision, paving the way for their adoption in sensor-based Human Activity Recognition (HAR). Previous studies show that transformers outperform their counterparts only when they harness abundant data or employ compute-intensive optimization algorithms. However, neither scenario is viable in sensor-based HAR, because data in this field are scarce and training and inference must often run on resource-constrained devices. Our investigation of various implementations of transformer-based and non-transformer-based HAR using wearable sensors, encompassing more than 500 experiments, corroborates these concerns. We observe that transformer-based solutions impose higher computational demands, consistently yield inferior performance, and suffer significant performance degradation when quantized to fit resource-constrained devices. Additionally, transformers are less robust to adversarial attacks, posing a potential threat to user trust in HAR.
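
To illustrate what "quantized" means here, the sketch below applies symmetric per-tensor int8 quantization to a handful of weights and measures the resulting rounding error. This is an assumption made for illustration only: the abstract does not state which quantization scheme the experiments use, and the function names and weight values are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (illustrative scheme only)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; use 1.0 if every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point values."""
    return [v * scale for v in q]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    """Largest absolute difference between original and restored weights."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weights, chosen only to show the effect.
weights = [0.02, -0.5, 0.31, 1.27, -0.004]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
error = max_abs_error(weights, restored)
```

Each weight moves by at most half a quantization step (`scale / 2`), and small weights such as `-0.004` collapse to zero. How strongly such perturbations affect accuracy depends on the model, which is the kind of sensitivity the abstract reports for transformers.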