Robust Event Detection based on Spatio-Temporal Latent Action Unit using Skeletal Information
This work addresses robust fall detection for health monitoring, but it is incremental as it builds on existing dictionary learning methods with specific improvements.
The paper tackles fall event detection from RGBD video by proposing a novel dictionary learning method that represents events as latent spatio-temporal atoms and filters outlier frames via a gradual online algorithm. Among the dictionary learning methods compared, it achieves the best precision and accuracy on a subset of the NTU RGB+D dataset that includes 209 fall videos, and it retains the highest accuracy and lowest variance as the noise ratio increases.
This paper proposes a novel dictionary learning approach for detecting event actions from skeletal information extracted from RGBD video. An event action is represented by several latent atoms, each composed of latent spatial and temporal attributes. We demonstrate the method on fall event detection. The skeleton frames are first clustered with an initial K-means step. Each skeleton frame is then assigned a varying weight parameter and fed into our Gradual Online Dictionary Learning (GODL) algorithm. During training, outlier frames are gradually filtered out by reducing their weights, which are inversely proportional to a cost. To strictly distinguish the event action from similar actions and to robustly acquire its action unit, we build a latent unit temporal structure for each sub-action. We evaluate the proposed method on part of the NTU RGB+D dataset, comprising 209 fall videos, 405 ground-lift videos, 420 sit-down videos, and 280 videos of 46 other actions. We report the accuracy, recall, and precision achieved. Our approach achieves the best precision and accuracy for human fall event detection compared with existing dictionary learning methods. As the noise ratio increases, our method retains the highest accuracy and the lowest variance.
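The weighting idea described above (K-means initialisation, then per-frame weights that shrink as the cost grows) can be illustrated with a small sketch. This is not the GODL algorithm itself: the abstract does not give its cost function or update rule, so the sketch assumes a much simpler centroid-style stand-in in which each atom is a weighted mean of its assigned frames, the cost is the squared distance to the nearest atom, and the weight follows the assumed rule w = 1 / (1 + cost / median cost). The function names `kmeans_init` and `gradual_weighted_atoms` are hypothetical.

```python
import numpy as np


def kmeans_init(X, k, n_iter=10, seed=0):
    """Plain K-means, used here only to initialise the atoms."""
    rng = np.random.default_rng(seed)
    atoms = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance from every frame to every atom, shape (n_frames, k).
        dists = np.linalg.norm(X[:, None, :] - atoms[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                atoms[j] = members.mean(axis=0)
    return atoms


def gradual_weighted_atoms(X, k, n_rounds=5, eps=1e-8, seed=0):
    """Simplified stand-in for gradual outlier down-weighting.

    Assumptions (not taken from the paper): the cost is the squared
    distance to the nearest atom, and the weight rule is
    w = 1 / (1 + cost / median_cost).
    """
    X = np.asarray(X, dtype=float)
    atoms = kmeans_init(X, k, seed=seed)
    weights = np.ones(len(X))
    for _ in range(n_rounds):
        dists = np.linalg.norm(X[:, None, :] - atoms[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        cost = dists.min(axis=1) ** 2
        # High-cost frames (likely outliers) receive weights close to zero.
        scale = np.median(cost) + eps
        weights = 1.0 / (1.0 + cost / scale)
        for j in range(k):
            mask = labels == j
            if weights[mask].sum() > 0:
                # Re-estimate each atom as the weighted mean of its frames.
                atoms[j] = np.average(X[mask], axis=0, weights=weights[mask])
    return atoms, weights
```

Calling `gradual_weighted_atoms(X, k)` on an array of flattened skeleton frames (one row per frame) returns the atoms and the final per-frame weights. Frames far from every atom end up with weights near zero, so they contribute little to the atom updates, which mirrors the gradual filtering described in the abstract without reproducing its exact formulation.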