Investigating Enhancements to Contrastive Predictive Coding for Human Activity Recognition
This work addresses the challenge of annotation scarcity in human activity recognition, offering incremental improvements to self-supervised learning for wearable applications.
The authors tackled the problem of learning representations from unlabeled wearable sensor data for human activity recognition by enhancing Contrastive Predictive Coding, resulting in substantial improvements on four of six datasets and outperforming both supervised and self-supervised baselines when very little labeled data is available.
Annotations for activities are hard to obtain, whereas collecting data from wearables is comparatively straightforward. This contrast has driven significant interest in techniques that use large quantities of unlabeled data to learn representations. Contrastive Predictive Coding (CPC) is one such method: it learns effective representations by leveraging properties of time-series data to set up a contrastive future timestep prediction task. In this work, we propose enhancements to CPC by systematically investigating the encoder architecture, the aggregator network, and the future timestep prediction, resulting in a fully convolutional architecture that improves parallelizability. Across sensor positions and activities, our method shows substantial improvements on four of six target datasets, demonstrating its ability to empower a wide range of application scenarios. Further, in the presence of very limited labeled data, our technique significantly outperforms both supervised and self-supervised baselines, positively impacting situations where collecting only a few seconds of labeled data may be possible. This is promising, as CPC does not require specialized data transformations or reconstructions for learning effective representations.
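To make the contrastive future timestep prediction task concrete, the following is a minimal sketch of an InfoNCE-style loss of the kind commonly used with CPC. It is not the authors' implementation: the function name `info_nce_loss`, the linear predictor `W`, the use of the other windows in the batch as negatives, and all array shapes are assumptions made for illustration, and the encoder and aggregator networks are replaced by random arrays.

```python
import numpy as np

def info_nce_loss(context, future, W):
    """InfoNCE-style loss for a single prediction step (illustrative sketch).

    context: (B, C) aggregator outputs c_t, one per window in the batch
    future:  (B, Z) encoder outputs z_{t+k} for the matching future timestep
    W:       (C, Z) hypothetical linear predictor for step k
    """
    pred = context @ W            # (B, Z) predicted future embeddings
    logits = pred @ future.T      # (B, B); entry [i, j] scores prediction i against sample j
    # Subtract the row-wise maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The true future of window i sits on the diagonal; the rest of the row are negatives.
    return -np.mean(np.diag(log_probs))

# Toy usage with random stand-ins for the encoder and aggregator outputs.
rng = np.random.default_rng(0)
context = rng.normal(size=(8, 16))
future = rng.normal(size=(8, 32))
W = rng.normal(size=(16, 32)) * 0.1
print(info_nce_loss(context, future, W))
```

With an uninformative predictor (all scores equal) the loss equals the log of the batch size, and it approaches zero as the predicted embedding scores its own future timestep far above the negatives. Minimizing it therefore pushes the representations to carry information that is predictive of upcoming sensor readings.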