Homogenization of Existing Inertial-Based Datasets to Support Human Activity Recognition
This work addresses a data bottleneck for researchers in HAR, enabling better model training, but it is incremental as it focuses on data integration rather than novel algorithmic breakthroughs.
The paper tackles the problem of insufficient and heterogeneous publicly available inertial signal datasets for human activity recognition (HAR), proposing a platform to integrate and homogenize these datasets to provide large, high-quality, and enriched data for the scientific community.
Several techniques have been proposed for recognizing activities of daily living from signals. Deep learning techniques applied to inertial signals have proven effective, achieving significant classification accuracy. Recently, research on human activity recognition (HAR) has been almost entirely model-centric. It has been shown that the number of training samples and their quality are critical for obtaining deep learning models that perform well independently of their architecture and that are more robust to intraclass variability and interclass similarity. Unfortunately, publicly available datasets do not always contain high-quality data or a sufficiently large and diverse set of samples (e.g., in number of subjects, types of activity performed, and duration of trials). Furthermore, the datasets are heterogeneous with respect to one another and therefore cannot be trivially combined into a larger set. The final aim of our work is to define and implement a platform that integrates datasets of inertial signals, making large datasets of homogeneous signals available to the scientific community, enriched, when possible, with context information (e.g., characteristics of the subjects and device position). The platform's main focus is data quality, which is essential for training efficient models.
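To illustrate why heterogeneous datasets cannot be trivially combined, the following is a minimal sketch of one possible homogenization step. It is not the platform's actual implementation: the function `homogenize`, the 50 Hz target rate, the supported unit strings, and the `LABEL_MAP` vocabulary are all hypothetical, and the sketch assumes that the datasets differ only in sampling rate, acceleration unit, and label naming.

```python
import numpy as np

G = 9.80665  # standard gravity in m/s^2

# Hypothetical mapping from dataset-specific labels to a shared vocabulary.
LABEL_MAP = {"walk": "walking", "Walking": "walking", "sit": "sitting"}


def homogenize(signal, src_hz, unit, target_hz=50.0):
    """Convert an (n_samples, n_axes) accelerometer array to m/s^2
    and resample it to target_hz by linear interpolation."""
    signal = np.asarray(signal, dtype=float)
    if unit == "g":
        signal = signal * G
    elif unit != "m/s^2":
        raise ValueError(f"unsupported unit: {unit}")

    # Timestamps of the source samples and of the resampled output.
    t_src = np.arange(len(signal)) / src_hz
    duration = (len(signal) - 1) / src_hz
    n_out = int(round(duration * target_hz)) + 1
    t_dst = np.arange(n_out) / target_hz

    # Interpolate each axis independently onto the common time grid.
    return np.column_stack(
        [np.interp(t_dst, t_src, signal[:, k]) for k in range(signal.shape[1])]
    )


# Two toy recordings of one second each, from hypothetical datasets A and B.
a = homogenize(np.ones((101, 3)), src_hz=100.0, unit="g")           # 100 Hz, in g
b = homogenize(np.full((21, 3), G), src_hz=20.0, unit="m/s^2")      # 20 Hz, in m/s^2
merged = np.concatenate([a, b])                                     # now comparable
labels = [LABEL_MAP[name] for name in ("walk", "Walking")]
```

After this step both recordings share the same rate and unit, so they can be concatenated into a single training set. Linear interpolation is chosen here only for brevity; downsampling real signals would normally call for an anti-aliasing filter first.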