AS SD · Nov 15, 2021

Biologically inspired speech emotion recognition

arXiv:2111.08112v1 · 51 citations

Originality: Incremental advance

AI Analysis

This paper addresses the problem of accurately recognizing emotions in speech for applications such as human-computer interaction. The contribution appears incremental, as it builds on existing biological and neural network concepts.

The paper tackles speech emotion recognition by avoiding explicit feature extraction and combining a source-filter model with a liquid state machine, and it reports very good classification performance on Emo-DB.

Conventional feature-based classification methods do not apply well to automatic recognition of speech emotions, mostly because the precise set of spectral and prosodic features that is required to identify the emotional state of a speaker has not been determined yet. This paper presents a method that operates directly on the speech signal, thus avoiding the problematic step of feature extraction. Furthermore, this method combines the strengths of the classical source-filter model of human speech production with those of the recently introduced liquid state machine (LSM), a biologically-inspired spiking neural network (SNN). The source and vocal tract components of the speech signal are first separated and converted into perceptually relevant spectral representations. These representations are then processed separately by two reservoirs of neurons. The output of each reservoir is reduced in dimensionality and fed to a final classifier. This method is shown to provide very good classification performance on the Berlin Database of Emotional Speech (Emo-DB). This framework appears very promising for efficiently solving many other problems in speech processing.
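
The two-reservoir stage described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the leaky integrate-and-fire update, the weight scales, the firing-rate readout, and the PCA step are all assumptions chosen for brevity, and every function name here is hypothetical. The sketch also assumes the source and vocal tract spectral representations have already been computed as (time, channels) arrays, so the source-filter separation itself is not shown.

```python
import numpy as np

def make_reservoir(n_inputs, n_neurons, rng, density=0.1):
    """Build fixed random input and recurrent weights (assumed scales)."""
    w_in = rng.normal(0.0, 1.0, size=(n_neurons, n_inputs))
    mask = rng.random((n_neurons, n_neurons)) < density
    w_rec = rng.normal(0.0, 0.5, size=(n_neurons, n_neurons)) * mask
    np.fill_diagonal(w_rec, 0.0)  # no self-connections
    return w_in, w_rec

def run_reservoir(frames, w_in, w_rec, leak=0.9, threshold=1.0):
    """Drive leaky integrate-and-fire neurons with a (time, channels) array.

    Returns each neuron's firing rate over the whole input, used here as
    a simple stand-in for the reservoir's output state.
    """
    n_neurons = w_in.shape[0]
    v = np.zeros(n_neurons)       # membrane potentials
    spikes = np.zeros(n_neurons)  # spikes emitted at the previous step
    counts = np.zeros(n_neurons)
    for x in frames:
        v = leak * v + w_in @ x + w_rec @ spikes
        spikes = (v >= threshold).astype(float)
        v = np.where(spikes > 0, 0.0, v)  # reset neurons that fired
        counts += spikes
    return counts / len(frames)

def utterance_features(source_spec, filter_spec, res_source, res_filter):
    """Process the two representations in separate reservoirs, then concatenate."""
    return np.concatenate([
        run_reservoir(source_spec, *res_source),
        run_reservoir(filter_spec, *res_filter),
    ])

def pca_reduce(features, n_components):
    """Project (n_utterances, n_features) data onto its top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

In this sketch, the reduced vectors returned by `pca_reduce` would be passed to whatever final classifier is chosen; the abstract does not name one, so none is assumed here.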
