Learning behavioral context recognition with multi-stream temporal convolutional networks
This work addresses the need for automatic and unobtrusive sensing in applications like assisted living and fitness tracking, and represents an incremental improvement in multi-modal learning for context recognition.
The paper tackles the problem of recognizing diverse behavioral contexts from raw multi-modal sensor data using a multi-stream temporal convolutional network, achieving an optimal recognition rate on a highly imbalanced and sparsely labeled dataset without manual feature engineering.
Everyday smart devices (such as smartphones and wearables) are increasingly equipped with sensors that provide immense amounts of information about a person's daily life, such as behavior and context. The automatic and unobtrusive sensing of behavioral context can help develop solutions for assisted living, fitness tracking, sleep monitoring, and several other fields. Towards this goal, we raise the question: can a machine learn to recognize a diverse set of contexts and activities in real-life settings through joint learning from raw multi-modal signals (e.g. accelerometer, gyroscope and audio)? In this paper, we propose a multi-stream temporal convolutional network to address the problem of multi-label behavioral context recognition. A four-stream network architecture handles learning from each modality, with a contextualization module that incorporates the extracted representations to infer a user's context. Our empirical evaluation suggests that a deep convolutional network trained end-to-end achieves an optimal recognition rate. Furthermore, the presented architecture can be extended to include similar sensors for performance improvements, and it handles missing modalities through multi-task learning, without any manual feature engineering, on a highly imbalanced and sparsely labeled dataset.
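To make the multi-stream idea concrete, the following is a minimal sketch in plain Python, not the architecture evaluated in the paper. It assumes one temporal convolution per modality stream, global max pooling over time, and a linear sigmoid head standing in for the contextualization module. The stream name `other_sensor`, the helper names, zero-filling of a missing modality, and the masked loss over unlabeled contexts are all illustrative assumptions; the abstract itself attributes missing-modality handling to multi-task learning and does not describe these details.

```python
import math
import random

# Hypothetical stream names; "other_sensor" is a placeholder for the fourth
# stream, which the abstract does not name.
MODALITIES = ["accelerometer", "gyroscope", "audio", "other_sensor"]


def conv1d_relu(signal, kernels):
    """Valid 1D temporal convolution (cross-correlation) followed by ReLU."""
    feature_maps = []
    for kernel in kernels:
        k = len(kernel)
        feature_maps.append([
            max(0.0, sum(kernel[j] * signal[t + j] for j in range(k)))
            for t in range(len(signal) - k + 1)
        ])
    return feature_maps


def stream_features(signal, kernels):
    """One modality stream: convolution, then global max pooling over time."""
    return [max(fmap) for fmap in conv1d_relu(signal, kernels)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_contexts(inputs, stream_kernels, head_weights, head_bias):
    """Concatenate per-stream features and emit one probability per label.

    A modality whose input is None is zero-filled (an assumption made here,
    not the paper's multi-task mechanism).
    """
    features = []
    for name in MODALITIES:
        kernels = stream_kernels[name]
        signal = inputs.get(name)
        if signal is None:
            features.extend([0.0] * len(kernels))
        else:
            features.extend(stream_features(signal, kernels))
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(head_weights, head_bias)
    ]


def masked_bce(probs, labels):
    """Binary cross-entropy averaged over labeled entries only.

    Labels set to None (unlabeled contexts) are skipped, one assumed way to
    cope with sparse multi-label annotations.
    """
    terms = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y in zip(probs, labels)
        if y is not None
    ]
    return sum(terms) / len(terms) if terms else 0.0


def init_model(n_filters=4, kernel_size=5, n_labels=3, seed=0):
    """Randomly initialised, untrained parameters for the sketch."""
    rng = random.Random(seed)
    stream_kernels = {
        m: [[rng.uniform(-0.5, 0.5) for _ in range(kernel_size)]
            for _ in range(n_filters)]
        for m in MODALITIES
    }
    n_features = n_filters * len(MODALITIES)
    head_weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                    for _ in range(n_labels)]
    head_bias = [0.0] * n_labels
    return stream_kernels, head_weights, head_bias


# Usage: one window of synthetic signals with the audio stream missing.
rng = random.Random(1)
kernels, weights, bias = init_model()
window = {m: [rng.gauss(0.0, 1.0) for _ in range(50)] for m in MODALITIES}
window["audio"] = None
probs = predict_contexts(window, kernels, weights, bias)
loss = masked_bce(probs, [1, None, 0])  # the second context is unlabeled
print(probs, loss)
```

Each stream sees only its own modality, so adding a sensor in this sketch amounts to adding one entry to `MODALITIES` and widening the head, which mirrors the extensibility claim in spirit only.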