Concurrent Activity Recognition with Multimodal CNN-LSTM Structure
The paper addresses scalable, deployable concurrent activity recognition for applications such as smart environments; the contribution is incremental, since it builds on existing CNN-LSTM methods.
The paper tackles concurrent activity recognition from multimodal sensor data by using a CNN-LSTM structure for feature extraction and a single classifier for output, achieving performance comparable to domain-specific systems on three datasets.
We introduce a system that recognizes concurrent activities from real-world data captured by multiple sensors of different types. Recognition proceeds in two steps. First, we extract spatial and temporal features from the multimodal data: each data type is fed into a convolutional neural network (CNN) that extracts spatial features, followed by a long short-term memory (LSTM) network that extracts temporal information from the sensory data. Second, the extracted features are fused and passed to a single classifier that produces a binary output vector whose elements indicate whether the corresponding activity types are currently in progress. We tested our system on three datasets from different domains, recorded with different sensors, and achieved performance comparable to existing systems designed specifically for those domains. Our system is the first to address concurrent activity recognition from multisensory data with a single model, which is scalable, simple to train, and easy to deploy.
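The two-step pipeline can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, the fusion operator, or the training loss, so the 1D temporal convolution, single-layer LSTM per modality, fusion by concatenation, independent sigmoid per activity, and binary cross-entropy loss are all hypothetical choices, as are the function names (`conv1d_relu`, `lstm_last_hidden`, `branch_features`, `predict`) and the example modalities. Weights are random, so the sketch shows only the data flow and the shape of the binary output vector, not learned behavior.

```python
import numpy as np

# Minimal sketch; layer sizes, concatenation fusion, and BCE loss are
# hypothetical assumptions, not the paper's stated design.
rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_relu(x, kernels):
    """'Valid' 1D convolution over time followed by ReLU.
    x: (T, C_in), kernels: (K, C_in, C_out) -> (T - K + 1, C_out)."""
    k = kernels.shape[0]
    windows = [np.einsum("kc,kco->o", x[t:t + k], kernels)
               for t in range(x.shape[0] - k + 1)]
    return np.maximum(np.stack(windows), 0.0)


def lstm_last_hidden(x, W, U, b):
    """Single-layer LSTM over x: (T, D); returns the final hidden state (H,)."""
    hidden = U.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in x:
        # Input, forget, candidate-cell, and output gate pre-activations.
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def init_branch(c_in, c_out=16, k=5, hidden=8):
    """Random weights for one per-modality CNN -> LSTM branch (hypothetical sizes)."""
    return {
        "kernels": rng.normal(0, 0.1, (k, c_in, c_out)),
        "W": rng.normal(0, 0.1, (c_out, 4 * hidden)),
        "U": rng.normal(0, 0.1, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
    }


def branch_features(x, p):
    """Step 1 for one modality: spatial features (CNN), then temporal features (LSTM)."""
    return lstm_last_hidden(conv1d_relu(x, p["kernels"]), p["W"], p["U"], p["b"])


def predict(modalities, branches, W_out, b_out, threshold=0.5):
    """Step 2: fuse branch features by concatenation, then one sigmoid per activity."""
    fused = np.concatenate([branch_features(x, p) for x, p in zip(modalities, branches)])
    probs = sigmoid(fused @ W_out + b_out)
    return probs, (probs >= threshold).astype(int)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Multi-label loss: mean of independent per-activity binary cross-entropies."""
    p = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))


# Hypothetical example: an accelerometer stream (50 steps x 3 axes) and an
# audio-feature stream (40 steps x 13 coefficients), with 4 activity types.
modalities = [rng.normal(size=(50, 3)), rng.normal(size=(40, 13))]
branches = [init_branch(3), init_branch(13)]
W_out = rng.normal(0, 0.05, (2 * 8, 4))  # 2 branches x hidden size 8 = 16 fused features
b_out = np.array([4.0, 4.0, -4.0, -4.0])  # biases chosen so two activities are "on"

probs, labels = predict(modalities, branches, W_out, b_out)
loss = binary_cross_entropy(probs, np.array([1, 1, 0, 0]))
print(labels)  # several activities may be active at once
```

Using an independent sigmoid per activity, rather than a softmax over activities, is what allows several entries of the output vector to be 1 at the same time; a softmax would force the activities to compete for a single label.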