Spatio-Temporal Covariance Descriptors for Action and Gesture Recognition
This work addresses robust action and gesture recognition for video analysis applications, reporting improved performance over several recent methods on three benchmark datasets.
The authors tackle action and gesture recognition with a method based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection, and report superior performance on the UCF sport, CK+ facial expression and Cambridge hand gesture datasets compared to several recent state-of-the-art techniques.
We propose a new action and gesture recognition method based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection approach that takes into account the curved space formed by the descriptors. The weighted projection is then exploited during boosting to create a final multiclass classification algorithm that employs the most useful spatio-temporal regions. We also show how the descriptors can be computed quickly through the use of integral video representations. Experiments on the UCF sport, CK+ facial expression and Cambridge hand gesture datasets indicate superior performance of the proposed method compared to several recent state-of-the-art techniques. The proposed method is robust and does not require additional processing of the videos, such as foreground detection, interest-point detection or tracking.
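As a rough illustration of how integral video representations can speed up descriptor computation, the sketch below builds 3-D summed-volume tables for each per-pixel feature and for each pairwise feature product, then recovers the covariance matrix of any axis-aligned spatio-temporal box from a fixed number of table lookups. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`integral_volume`, `box_sum`, `build_integrals`, `region_covariance`), the half-open box convention and the choice of per-pixel features are all hypothetical, and the abstract does not specify which features the descriptors use.

```python
from itertools import product

def integral_volume(vol):
    """Summed-volume table with a zero border; vol is indexed [t][y][x]."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    S = [[[0.0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t, y, x in product(range(T), range(H), range(W)):
        # 3-D inclusion-exclusion over the seven already-filled neighbours.
        S[t + 1][y + 1][x + 1] = (
            vol[t][y][x]
            + S[t][y + 1][x + 1] + S[t + 1][y][x + 1] + S[t + 1][y + 1][x]
            - S[t][y][x + 1] - S[t][y + 1][x] - S[t + 1][y][x]
            + S[t][y][x]
        )
    return S

def box_sum(S, box):
    """Sum over the half-open box [t0,t1) x [y0,y1) x [x0,x1) from 8 lookups."""
    t0, t1, y0, y1, x0, x1 = box
    return (
        S[t1][y1][x1]
        - S[t0][y1][x1] - S[t1][y0][x1] - S[t1][y1][x0]
        + S[t0][y0][x1] + S[t0][y1][x0] + S[t1][y0][x0]
        - S[t0][y0][x0]
    )

def build_integrals(features):
    """features: list of d feature volumes (assumed inputs), each [t][y][x]."""
    d = len(features)
    first = [integral_volume(f) for f in features]
    second = [[None] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            # Element-wise product of feature i and feature j.
            prod = [
                [[a * b for a, b in zip(row_i, row_j)]
                 for row_i, row_j in zip(frame_i, frame_j)]
                for frame_i, frame_j in zip(features[i], features[j])
            ]
            second[i][j] = second[j][i] = integral_volume(prod)
    return first, second

def region_covariance(first, second, box):
    """Sample covariance (d x d) of the features inside the box.

    The box must contain at least two pixels.
    """
    t0, t1, y0, y1, x0, x1 = box
    n = (t1 - t0) * (y1 - y0) * (x1 - x0)
    d = len(first)
    p = [box_sum(first[i], box) for i in range(d)]
    return [
        [(box_sum(second[i][j], box) - p[i] * p[j] / n) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
```

Once `build_integrals` has been run on a video, each call to `region_covariance` costs a number of lookups that depends only on the number of features, not on the size of the region, which is what makes it practical to evaluate many candidate spatio-temporal regions. The resulting matrices are symmetric positive semi-definite, so they do not lie in an ordinary vector space, consistent with the abstract's remark about the curved space formed by the descriptors.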