AS SD · Mar 3, 2018

An Ensemble Framework of Voice-Based Emotion Recognition System for Films and TV Programs

arXiv:1803.01122v1 · 6.6 · 27 citations

Originality: Incremental advance

AI Analysis

This work addresses emotion recognition for AI products in noisy environments, but it is an incremental advance because it builds on existing ensemble and multi-task learning approaches.

The paper tackles voice-based emotion recognition under noisy, real-world conditions such as films and TV programs, and reports a 29.5% relative improvement over the best baseline system on the MEC 2017 corpus.
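A relative improvement is measured against the baseline's own score, not in absolute percentage points. The sketch below shows the arithmetic; the function name and the two scores are hypothetical and chosen only for illustration, since the listing does not state which metric or which baseline value the 29.5% figure refers to.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Return the relative gain of `proposed` over `baseline` as a fraction.

    A return value of 0.295 corresponds to a 29.5% relative improvement.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (proposed - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical scores, NOT the MEC 2017 baseline or the paper's result.
    baseline_score = 0.40
    proposed_score = 0.518
    gain = relative_improvement(baseline_score, proposed_score)
    # An absolute gain of 11.8 points on a 40-point baseline is ~29.5% relative.
    print(f"relative improvement: {gain:.1%}")
```

Under these made-up numbers the absolute gain is only 11.8 points, which is why a relative figure should always be read together with the baseline it is computed from.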

Employing a voice-based emotion recognition function in artificial intelligence (AI) products will improve the user experience. Most existing research focuses only on speech collected under controlled conditions, and the scenarios evaluated in that research were well controlled. Conventional approaches may fail when background noise or non-speech fillers are present. In this paper, we propose an ensemble framework that combines several types of audio features. The framework incorporates gender and speaker information through multi-task learning, which allows it to capture as much emotional information as possible. The framework is evaluated on the Multimodal Emotion Challenge (MEC) 2017 corpus, which is close to real-world conditions. The proposed framework outperforms the best baseline system by 29.5% (relative improvement).
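The abstract names two ingredients: multi-task learning with gender and speaker as auxiliary tasks, and an ensemble over models built on different audio features. The pure-Python sketch below illustrates one common way to realize each; it is not the authors' implementation. The weighted-sum loss, the task weights, late fusion by probability averaging, and all function names and numbers are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class index under softmax(logits)."""
    return -math.log(softmax(logits)[label])


def multitask_loss(
    head_logits: Dict[str, Sequence[float]],
    labels: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task cross-entropy losses (assumed formulation).

    Emotion is the main task; gender and speaker are auxiliary heads whose
    losses would also shape a shared encoder during training.
    """
    return sum(
        weights[task] * cross_entropy(head_logits[task], labels[task])
        for task in head_logits
    )


def ensemble_predict(member_probs: List[Sequence[float]]) -> int:
    """Late fusion by averaging class probabilities (assumed fusion rule)
    across feature-specific models; returns the winning class index."""
    n_classes = len(member_probs[0])
    avg = [
        sum(p[c] for p in member_probs) / len(member_probs)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c])


if __name__ == "__main__":
    # Hypothetical outputs of one shared-encoder network with three heads.
    logits = {
        "emotion": [2.0, 0.5, -1.0],
        "gender": [0.3, -0.3],
        "speaker": [1.0, 0.0, 0.0, -1.0],
    }
    labels = {"emotion": 0, "gender": 0, "speaker": 0}
    weights = {"emotion": 1.0, "gender": 0.3, "speaker": 0.3}  # hypothetical
    print("multi-task loss:", round(multitask_loss(logits, labels, weights), 4))

    # Hypothetical emotion posteriors from three models, each trained on a
    # different audio feature set.
    members = [
        [0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3],
        [0.5, 0.4, 0.1],
    ]
    print("ensemble prediction:", ensemble_predict(members))
```

Down-weighting the auxiliary heads keeps emotion as the primary objective while still letting gender and speaker labels regularize the shared representation; averaging probabilities is only one of several possible fusion rules (majority voting or a learned meta-classifier are alternatives).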

