Sparse Coding of Shape Trajectories for Facial Expression and Action Recognition
This work addresses the challenge of handling nonlinear shape trajectories in computer vision for applications like facial expression and action recognition, representing an incremental improvement over existing techniques.
The paper tackles the problem of analyzing time-varying geometric data from human landmarks for behavior understanding. It applies sparse coding and dictionary learning to shape trajectories on nonlinear manifolds and achieves results competitive with state-of-the-art methods on action and expression recognition datasets.
The detection and tracking of human landmarks in video streams has become more reliable, partly owing to the availability of affordable RGB-D sensors. The analysis of such time-varying geometric data plays an important role in automatic human behavior understanding. However, suitable shape representations, as well as their temporal evolutions, termed trajectories, often lie on nonlinear manifolds. This nonlinearity imposes an additional constraint on the use of conventional machine learning techniques. As a solution, this paper adapts the well-known Sparse Coding and Dictionary Learning approach to study time-varying shapes on the Kendall shape spaces of 2D and 3D landmarks. We illustrate effective coding of 3D skeletal sequences for action recognition and of 2D facial landmark sequences for macro- and micro-expression recognition. To overcome the inherent nonlinearity of the shape spaces, both intrinsic and extrinsic solutions are explored. As the main result, coding shape trajectories gives rise to more discriminative time series with suitable computational properties, including sparsity and vector space structure. Extensive experiments conducted on commonly used datasets demonstrate the competitiveness of the proposed approaches with respect to the state of the art.
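To make the pipeline described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple linearization: each landmark configuration is centered and scaled to a Kendall pre-shape, rotationally aligned to a reference shape, and mapped to the tangent space at that reference, where an ordinary Euclidean sparse coder (plain ISTA here) is applied. All function names, the choice of reference shape, and the parameters `lam` and `n_iter` are assumptions made for illustration.

```python
import numpy as np

def to_preshape(landmarks):
    """Remove translation and scale: center, then normalize to unit Frobenius norm."""
    x = np.asarray(landmarks, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)
    norm = np.linalg.norm(x)
    if norm == 0:
        raise ValueError("degenerate configuration: all landmarks coincide")
    return x / norm

def align_rotation(x, ref):
    """Rotate pre-shape x onto ref (orthogonal Procrustes restricted to det = +1)."""
    u, _, vt = np.linalg.svd(x.T @ ref)
    s = np.eye(x.shape[1])
    s[-1, -1] = np.sign(np.linalg.det(u @ vt))  # exclude reflections
    return x @ (u @ s @ vt)

def log_map(ref, x):
    """Tangent vector at ref pointing to x on the unit pre-shape sphere."""
    inner = np.clip(np.sum(ref * x), -1.0, 1.0)
    theta = np.arccos(inner)
    if theta < 1e-12:
        return np.zeros_like(ref)
    return (theta / np.sin(theta)) * (x - inner * ref)

def sparse_code_ista(signal, dictionary, lam=0.1, n_iter=200):
    """Solve min_w 0.5 * ||signal - dictionary @ w||^2 + lam * ||w||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(dictionary, 2) ** 2)  # 1 / Lipschitz constant
    w = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        z = w - step * (dictionary.T @ (dictionary @ w - signal))
        w = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return w

def code_trajectory(frames, ref, dictionary, lam=0.1):
    """Encode each frame of a landmark sequence as a sparse vector (one row per frame)."""
    codes = []
    for frame in frames:
        shape = align_rotation(to_preshape(frame), ref)
        tangent = log_map(ref, shape).ravel()
        codes.append(sparse_code_ista(tangent, dictionary, lam=lam))
    return np.stack(codes)
```

Under these assumptions, a sequence of T frames with k landmarks in m dimensions becomes a T-by-n matrix of sparse codes, where n is the number of dictionary atoms and the dictionary has shape (k*m, n). That matrix is an ordinary vector-valued time series and can be passed to a standard temporal classifier.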