Self-supervised Learning Method Using Transformer for Multi-dimensional Sensor Data Processing
This work is an incremental improvement in human activity recognition from sensor data.
The paper tackles human activity recognition from sensor data with an enhanced Transformer that adds three features for n-dimensional numerical input (linear-layer embedding, binning-based pre-processing, and a linear transformation in the output layer), reporting 10%-15% higher accuracy than a vanilla Transformer across five datasets.
We developed a deep learning algorithm for human activity recognition using sensor signals as input. In this study, we built a pretrained language model based on the Transformer architecture, which is widely used in natural language processing. By leveraging this pretrained model, we aimed to improve performance on the downstream task of human activity recognition. While this task can be addressed using a vanilla Transformer, we propose an enhanced n-dimensional numerical processing Transformer that incorporates three key features: embedding n-dimensional numerical data through a linear layer, binning-based pre-processing, and a linear transformation in the output layer. We evaluated the effectiveness of our proposed model across five different datasets. Compared to the vanilla Transformer, our model demonstrated 10%-15% improvements in accuracy.
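The three features named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the equal-width binning scheme, the bin count, the value range, the layer sizes, and the names bin_signal and Linear are all assumptions made for illustration, and the Transformer encoder itself is replaced by an identity stand-in.

```python
import numpy as np

def bin_signal(x, n_bins=16, lo=-1.0, hi=1.0):
    """Binning-based pre-processing (assumed scheme): snap each value
    to the centre of one of n_bins equal-width bins on [lo, hi]."""
    edges = np.linspace(lo, hi, n_bins + 1)
    # digitize returns 1-based bin indices; shift and clip out-of-range values
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    centres = (edges[:-1] + edges[1:]) / 2
    return centres[idx]

class Linear:
    """Plain affine layer y = xW + b (hypothetical helper, untrained)."""
    def __init__(self, d_in, d_out, rng):
        self.W = rng.normal(0.0, d_in ** -0.5, size=(d_in, d_out))
        self.b = np.zeros(d_out)

    def __call__(self, x):
        return x @ self.W + self.b

rng = np.random.default_rng(0)
seq_len, n_dims, d_model = 128, 6, 32   # assumed sizes, e.g. 3-axis accel + 3-axis gyro

embed = Linear(n_dims, d_model, rng)    # feature 1: linear embedding of n-dim samples
head = Linear(d_model, n_dims, rng)     # feature 3: linear transformation at the output

window = rng.uniform(-1.0, 1.0, size=(seq_len, n_dims))  # synthetic sensor window
binned = bin_signal(window)             # feature 2: binning-based pre-processing
tokens = embed(binned)                  # one d_model-sized token per time step
hidden = tokens                         # stand-in for the Transformer encoder
output = head(hidden)                   # back to n_dims numerical values per step

print(tokens.shape, output.shape)
```

Each time step becomes a single token because the linear layer embeds the whole n-dimensional sample at once, rather than looking up a discrete vocabulary entry as a text Transformer would.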