AI · CV · LG · Nov 23, 2020

Yet it moves: Learning from Generic Motions to Generate IMU data from YouTube videos

arXiv:2011.11600v1 · 6 citations
AI Analysis

This research provides a method to generate synthetic IMU data from videos, which could significantly reduce the data collection burden for researchers and developers working on Human Activity Recognition.

This paper addresses the scarcity of labeled wearable sensor data for Human Activity Recognition (HAR) by proposing a method to generate synthetic IMU data (acceleration and gyro norms) from online videos. The authors report that HAR systems trained on this simulated data come to within around 10% of the mean F1 score of systems trained on real sensor data, and that this gap can be closed either by calibrating with a small amount of real data or by generating a larger volume of synthetic data.

Human activity recognition (HAR) using wearable sensors has benefited much less from recent advances in Machine Learning than fields such as computer vision and natural language processing. This is to a large extent due to the lack of large-scale repositories of labeled training data. In our research we aim to facilitate the use of online videos, which exist in ample quantity for most activities and are much easier to label than sensor data, to simulate labeled wearable motion sensor data. In previous work we already demonstrated some preliminary results in this direction, focusing on very simple, activity-specific simulation models and a single sensor modality (acceleration norm) \cite{10.1145/3341162.3345590}. In this paper we show how we can train a regression model on generic motions for both accelerometer and gyro signals and then apply it to videos of the target activities to generate synthetic IMU data (acceleration and gyro norms) that can be used to train and/or improve HAR models. We demonstrate that systems trained on simulated data generated by our regression model can come to within around 10% of the mean F1 score of a system trained on real sensor data. Furthermore, we show that the advantage of real sensor data can eventually be equalized, either by including a small amount of real sensor data for model calibration or by simply leveraging the fact that, in general, we can generate much more simulated data from video than we can collect as real sensor data.
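The abstract's regression targets are signal norms, not individual sensor axes. As a minimal sketch of what such a norm is (not the authors' code), the snippet below computes the per-sample Euclidean magnitude of three-axis readings. The function name `signal_norm` and the sample values are illustrative assumptions. The same function applies to accelerometer and gyro readings alike.

```python
import math


def signal_norm(samples):
    """Return the Euclidean norm of each (x, y, z) reading in `samples`."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


# Hypothetical accelerometer readings in m/s^2 (illustrative values only,
# not taken from the paper's data).
accel = [(0.0, 0.0, 9.81), (3.0, 4.0, 0.0), (1.0, 2.0, 2.0)]
accel_norm = signal_norm(accel)

for reading, magnitude in zip(accel, accel_norm):
    print(reading, "->", round(magnitude, 3))
```

Because the norm does not change when the sensor frame is rotated, it does not depend on how the device is oriented on the body. This is one plausible reason to use it as a target when the signal has to be estimated from video.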

