Vision and Inertial Sensing Fusion for Human Action Recognition: A Review
This is an incremental review paper that surveys existing methods for fusing vision and inertial sensing to improve human action recognition accuracy in applications such as surveillance and assistive living.
This paper reviews research on fusing vision and inertial sensing for human action recognition, finding that combined sensing improves accuracy compared to individual modalities. It categorizes existing work by fusion approaches, features, classifiers, and datasets, and discusses challenges for real-world deployment.
Human action recognition is used in many applications, such as video surveillance, human-computer interaction, assistive living, and gaming. Many papers in the literature have shown that fusing vision and inertial sensing improves recognition accuracy compared to using each sensing modality individually. This paper surveys the papers in which vision and inertial sensing are used simultaneously within a fusion framework to perform human action recognition. The surveyed papers are categorized in terms of fusion approaches, features, classifiers, and the multimodality datasets considered. Challenges and possible future directions for deploying the fusion of these two sensing modalities under realistic conditions are also stated.