AS · AI · CL · SD · May 23, 2018

ASR-based Features for Emotion Recognition: A Transfer Learning Approach

arXiv:1805.09197v3 · 1102 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of emotion recognition in spontaneous speech for affective computing applications and represents an incremental improvement over existing feature sets.

The paper tackles emotion recognition from speech by using a neural Automatic Speech Recognition (ASR) system as a feature extractor, and shows that these features outperform the eGeMAPS feature set in predicting the valence and arousal dimensions.
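The core idea, reading features from the hidden layers of a speech network rather than from a hand-crafted set, can be illustrated with a minimal sketch. It assumes a toy stack of tanh layers with random weights standing in for a trained ASR model; `encoder_layer_outputs`, `pool_over_time`, `utterance_features`, the layer sizes, and the mean/standard-deviation pooling are all hypothetical choices, not the paper's architecture. The sketch only shows how variable-length frame sequences become fixed-size utterance vectors that a valence/arousal regressor could consume.

```python
import numpy as np

def encoder_layer_outputs(frames, weights):
    """Run a toy stack of layers over a (T, D_in) frame matrix.

    Each hypothetical layer is a linear map followed by tanh. The output of
    every layer is kept so that any depth can serve as a feature extractor.
    Returns a list of (T, D_l) arrays, ordered from first to last layer.
    """
    outputs = []
    h = frames
    for W in weights:
        h = np.tanh(h @ W)
        outputs.append(h)
    return outputs

def pool_over_time(layer_output):
    """Collapse a variable-length (T, D) sequence into a fixed 2*D vector
    by concatenating the per-dimension mean and standard deviation."""
    return np.concatenate([layer_output.mean(axis=0), layer_output.std(axis=0)])

def utterance_features(frames, weights, layer):
    """Fixed-size feature vector for one utterance, read from one layer."""
    return pool_over_time(encoder_layer_outputs(frames, weights)[layer])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 40-dim filterbank input, layers of width 64, 64, 32.
    dims = [40, 64, 64, 32]
    weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]
    short = rng.normal(size=(120, 40))  # stand-in for a short utterance
    long = rng.normal(size=(480, 40))   # stand-in for a longer utterance
    print(utterance_features(short, weights, layer=0).shape)   # (128,)
    print(utterance_features(long, weights, layer=-1).shape)   # (64,)
```

Mean and standard-deviation pooling is only one common way to obtain a fixed-length vector; this summary does not state how the paper aggregates frame-level activations.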

During the last decade, the applications of signal processing have improved drastically with deep learning. However, areas of affective computing such as emotional speech synthesis or emotion recognition from spoken language remain challenging. In this paper, we investigate the use of a neural Automatic Speech Recognition (ASR) system as a feature extractor for emotion recognition. We show that these features outperform the eGeMAPS feature set in predicting the valence and arousal emotional dimensions, which means that the audio-to-text mapping learned by the ASR system contains information related to the emotional dimensions in spontaneous speech. We also examine the relationship between the first layers (closer to speech) and the last layers (closer to text) of the ASR and valence/arousal.
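Comparing how well different ASR layers relate to valence and arousal can be illustrated with a simple probing setup. The sketch below assumes per-layer utterance-level feature matrices are already available (for example, time-pooled activations of each layer), fits one ridge regressor per layer, and reports held-out Pearson correlation for each emotional dimension. Ridge regression, Pearson r, the train/test split, and the helper names `fit_ridge`, `predict`, `pearson`, and `probe_layers` are assumptions for illustration; the demo data are synthetic and say nothing about the paper's results.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised bias column.
    X: (N, D) features, Y: (N, K) targets (e.g. K=2 for valence, arousal)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the bias term
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ Y)

def predict(W, X):
    """Apply ridge weights (including the bias row) to new features."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ W

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def probe_layers(layer_features, Y, n_train, alpha=1.0):
    """Fit one ridge probe per layer on the first n_train utterances and return
    an (n_layers, K) array of Pearson r on the held-out utterances."""
    scores = []
    for X in layer_features:
        W = fit_ridge(X[:n_train], Y[:n_train], alpha)
        P = predict(W, X[n_train:])
        scores.append([pearson(P[:, k], Y[n_train:, k]) for k in range(Y.shape[1])])
    return np.array(scores)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(200, 2))  # synthetic (valence, arousal) labels
    informative = Y @ rng.normal(size=(2, 16)) + 0.1 * rng.normal(size=(200, 16))
    noise = rng.normal(size=(200, 16))  # carries no label information
    # Rows: layers; columns: valence, arousal.
    print(probe_layers([informative, noise], Y, n_train=150).round(2))
```

A linear probe keeps the comparison about what each layer encodes rather than about the capacity of the regressor; in a real study, cross-validation over speakers would give a more reliable estimate than a single split.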

