SP AI LG · Oct 18, 2024

UniMTS: Unified Pre-training for Motion Time Series

Xiyuan Zhang, Diyan Teng, Ranak Roy Chowdhury, Shuheng Li, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang

arXiv:2410.19818v1 · 13.033 citations · h-index: 9 · Has Code · NIPS

Originality: Incremental advance

AI Analysis

This work addresses the challenge of building scalable models for human activity analysis from motion data, with applications in healthcare and IoT. The advance is incremental: it builds on existing contrastive learning and graph network techniques.

The paper tackles the poor generalizability of motion time series models, which stems from limited datasets and from variation in device factors and activities. It introduces UniMTS, a unified pre-training procedure that outperforms the best baselines by 340% in the zero-shot setting, 16.3% in the few-shot setting, and 9.2% in the full-shot setting across 18 benchmark datasets.

Motion time series collected from mobile and wearable devices such as smartphones and smartwatches offer significant insights into human behavioral patterns, with wide applications in healthcare, automation, IoT, and AR/XR due to their low-power, always-on nature. However, given security and privacy concerns, building large-scale motion time series datasets remains difficult, preventing the development of pre-trained models for human activity analysis. Typically, existing models are trained and tested on the same dataset, leading to poor generalizability across variations in device location, device mounting orientation and human activity type. In this paper, we introduce UniMTS, the first unified pre-training procedure for motion time series that generalizes across diverse device latent factors and activities. Specifically, we employ a contrastive learning framework that aligns motion time series with text descriptions enriched by large language models. This helps the model learn the semantics of time series to generalize across activities. Given the absence of large-scale motion time series data, we derive and synthesize time series from existing motion skeleton data with all-joint coverage. Spatio-temporal graph networks are utilized to capture the relationships across joints for generalization across different device locations. We further design rotation-invariant augmentation to make the model agnostic to changes in device mounting orientations. Our model shows exceptional generalizability across 18 motion time series classification benchmark datasets, outperforming the best baselines by 340% in the zero-shot setting, 16.3% in the few-shot setting, and 9.2% in the full-shot setting.
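Two of the ideas in the abstract can be illustrated with a small sketch: rotating every sample of a 3-axis series by one random rotation (so a model cannot rely on a fixed device mounting orientation), and a symmetric contrastive loss that pulls each time-series embedding toward its paired text embedding. This is not the authors' implementation. The function names, the quaternion-based sampling, the temperature value of 0.1, and the toy embeddings are all assumptions made for illustration; the paper's actual augmentation and loss may differ in detail.

```python
import math
import random


def random_rotation_matrix(rng):
    """Draw a 3x3 rotation matrix from a random unit quaternion (hypothetical helper)."""
    # Normalizing four Gaussian samples yields a uniformly distributed unit quaternion.
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate_series(series, rot):
    """Apply the same rotation to every (x, y, z) sample in a series."""
    return [
        tuple(sum(rot[i][j] * sample[j] for j in range(3)) for i in range(3))
        for sample in series
    ]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [p / norm for p in v]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(series_emb, text_emb, temperature=0.1):
    """Symmetric contrastive loss over paired embeddings (temperature is an assumed value)."""
    s = [_normalize(v) for v in series_emb]
    t = [_normalize(v) for v in text_emb]
    logits = [[_dot(a, b) / temperature for b in t] for a in s]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the series-to-text and text-to-series directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


if __name__ == "__main__":
    rng = random.Random(0)
    accel = [(0.0, 0.0, 9.81), (1.0, 2.0, 3.0)]  # toy accelerometer samples
    print(rotate_series(accel, random_rotation_matrix(rng)))
    print(contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch a fresh rotation would be drawn per training example, so the orientation seen by the model changes while the magnitude of each sample stays the same. The loss is low when each series embedding is most similar to its own text embedding and high when the pairs are mismatched.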
