CVJun 27, 2023

A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System

arXiv:2306.15765v16 citationsh-index: 34
Originality Incremental advance
AI Analysis

This work addresses the problem of accurate human activity recognition for applications like surveillance or healthcare, but it is incremental as it builds on existing fusion and neural network techniques.

The paper tackles multimodal human activity recognition by proposing a two-stream decision-level fusion system that combines vision and inertial sensor data, achieving accuracies of 96.9% to 98.7% on four benchmark datasets, which surpass current state-of-the-art methods.

This paper presents a novel multimodal human activity recognition system. It uses a two-stream decision level fusion of vision and inertial sensors. In the first stream, raw RGB frames are passed to a part affinity field-based pose estimation network to detect the keypoints of the user. These keypoints are then pre-processed and inputted in a sliding window fashion to a specially designed convolutional neural network for the spatial feature extraction followed by regularized LSTMs to calculate the temporal features. The outputs of LSTM networks are then inputted to fully connected layers for classification. In the second stream, data obtained from inertial sensors are pre-processed and inputted to regularized LSTMs for the feature extraction followed by fully connected layers for the classification. At this stage, the SoftMax scores of two streams are then fused using the decision level fusion which gives the final prediction. Extensive experiments are conducted to evaluate the performance. Four multimodal standard benchmark datasets (UP-Fall detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD) are used for experimentations. The accuracies obtained by the proposed system are 96.9 %, 97.6 %, 98.7 %, and 95.9 % respectively on the UP-Fall Detection, UTDMHAD, Berkeley-MHAD, and C-MHAD datasets. These results are far superior than the current state-of-the-art methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes