PerceptionNet: A Deep Convolutional Neural Network for Late Sensor Fusion
This work addresses a bottleneck in Human Activity Recognition (HAR) for context-aware applications, but its contribution is incremental: it builds on existing deep learning approaches and adds a specific fusion technique.
The paper tackles the problem of motion sensor fusion and feature extraction for Human Activity Recognition (HAR) by introducing PerceptionNet, a deep CNN that applies late 2D convolution to multimodal time-series data, resulting in an average accuracy improvement of over 3% compared to state-of-the-art methods.
Human Activity Recognition (HAR) based on motion sensors has drawn considerable attention in recent years, since perceiving a person's state enables context-aware applications to adapt their services to users' needs. However, motion sensor fusion and feature extraction have not yet reached their full potential and remain an open issue. In this paper, we introduce PerceptionNet, a deep Convolutional Neural Network (CNN) that applies a late 2D convolution to multimodal time-series sensor data in order to automatically extract efficient features for HAR. We evaluate our approach on two publicly available HAR datasets to demonstrate that the proposed model effectively fuses multimodal sensors and improves HAR performance. In particular, PerceptionNet surpasses state-of-the-art HAR methods based on (i) hand-crafted features, (ii) deep CNNs exploiting early fusion approaches, and (iii) Long Short-Term Memory (LSTM) networks, by more than 3% in accuracy on average.
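To make the idea of a late 2D convolution concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes a window of 6 sensor channels (for example, three accelerometer and three gyroscope axes) by 8 time steps, a single temporal filter shared across channels, and a single 3x2 fusion filter. The layer count, kernel sizes, weights, and the function names are all illustrative assumptions; the abstract does not specify them. The point it shows is only the ordering: each channel is first filtered along time on its own, and channels are mixed only afterwards by a 2D kernel whose height spans several channels.

```python
def conv1d_valid(row, kernel):
    """Valid (unpadded) 1D cross-correlation of one sensor channel along time."""
    k = len(kernel)
    return [sum(row[t + i] * kernel[i] for i in range(k))
            for t in range(len(row) - k + 1)]


def conv2d_valid(grid, kernel):
    """Valid (unpadded) 2D cross-correlation over a (channels x time) grid."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(grid) - kh + 1
    out_w = len(grid[0]) - kw + 1
    return [[sum(grid[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def late_fusion_features(window, time_kernel, fusion_kernel):
    """Hypothetical two-stage feature extractor illustrating late fusion."""
    # Stage 1: temporal convolution + ReLU, applied to each channel separately.
    # No information crosses between sensor channels here.
    per_channel = [[max(0.0, v) for v in conv1d_valid(row, time_kernel)]
                   for row in window]
    # Stage 2: the late 2D convolution. The kernel height (3 rows) spans
    # several channels, so this is the first point where sensors are fused.
    return conv2d_valid(per_channel, fusion_kernel)


# Toy input (assumed values): 6 channels x 8 time steps.
window = [[float(c + t) for t in range(8)] for c in range(6)]
time_kernel = [1.0, 1.0, 1.0]                        # assumed 1x3 temporal filter
fusion_kernel = [[1.0, 1.0] for _ in range(3)]       # assumed 3x2 fusion filter

features = late_fusion_features(window, time_kernel, fusion_kernel)
print(len(features), len(features[0]))  # 4 5
print(features[0][0])                   # 45.0
```

An early-fusion network would instead apply a kernel spanning multiple channels in its first layer. In a real model each stage would hold many learned filters, and a deep learning framework would replace these hand-written loops.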