Emotion recognition in talking-face videos using persistent entropy and neural networks
This work addresses emotion recognition for applications in human-computer interaction. Its contribution is incremental: it builds on existing methods, and its novelty lies in how those tools are combined.
The authors tackled emotion recognition from talking-face videos by combining audio and image data into a topological signature, which was then classified by a neural network into eight emotions, achieving competitive results that outperform other state-of-the-art methods.
The automatic recognition of a person's emotional state has become a very active research field that involves scientists specialized in different areas such as artificial intelligence, computer vision, and psychology. Our main objective in this work is to develop a novel approach, using persistent entropy and neural networks as main tools, to recognise and classify emotions from talking-face videos. Specifically, we combine audio-signal and image-sequence information to compute a topological signature (a 9-dimensional vector) for each video. We prove that small changes in the video produce small changes in the signature. These topological signatures are then fed to a neural network to distinguish between the following emotions: neutral, calm, happy, sad, angry, fearful, disgust, and surprised. The results are promising and competitive, outperforming other state-of-the-art works found in the literature.
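As a minimal sketch of the main tool named above, persistent entropy is commonly defined as the Shannon entropy of the normalized bar lengths of a persistence barcode. The Python below assumes a barcode given as finite (birth, death) pairs and uses the natural logarithm. The function name, the example barcodes, and the perturbation size are hypothetical choices for illustration. How the 9-dimensional signature is assembled from the audio and image data is not specified here, so that step is not reproduced.

```python
import math


def persistent_entropy(barcode):
    """Shannon entropy of the normalized bar lengths of a barcode.

    `barcode` is a list of (birth, death) pairs with death > birth.
    Assumption: all bars are finite and the natural logarithm is used.
    """
    lengths = [death - birth for birth, death in barcode]
    if not lengths or any(length <= 0 for length in lengths):
        raise ValueError("barcode must contain bars of positive finite length")
    total = sum(lengths)
    # p_i = l_i / L, entropy = -sum_i p_i * log(p_i)
    return -sum((l / total) * math.log(l / total) for l in lengths)


# Hypothetical barcodes, for illustration only.
base = [(0.0, 1.0), (0.0, 0.5), (0.2, 0.9)]
# Shift every death time slightly to mimic a small change in the input.
perturbed = [(birth, death + 1e-3) for birth, death in base]

entropy_base = persistent_entropy(base)
entropy_perturbed = persistent_entropy(perturbed)
print(entropy_base, entropy_perturbed, abs(entropy_base - entropy_perturbed))
```

Two sanity checks follow directly from the definition: a barcode with a single bar has entropy 0, and a barcode with n bars of equal length has entropy log(n), the maximum for n bars. The perturbed example only illustrates the kind of stability the abstract refers to; it is not a proof of it.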