LG AI HC · Sep 9, 2021

Accounting for Variations in Speech Emotion Recognition with Nonparametric Hierarchical Neural Network

Lance Ying, Amrit Romana, Emily Mower Provost

arXiv:2109.04316v1 · 1.6

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of handling variations in emotional expression in speech emotion recognition systems. The contribution appears incremental because it builds on existing hierarchical and multitask learning approaches.

The paper tackles speech emotion recognition by proposing a Nonparametric Hierarchical Neural Network (NHNN) that does not require domain labels. In the authors' experiments, it generally outperforms state-of-the-art models in within-corpus and cross-corpus tests.

In recent years, deep-learning-based speech emotion recognition models have outperformed classical machine learning models. Previously, neural network designs, such as Multitask Learning, have accounted for variations in emotional expressions due to demographic and contextual factors. However, existing models face a few constraints: 1) they rely on a clear definition of domains (e.g., gender, noise condition) and the availability of domain labels; 2) they often attempt to learn domain-invariant features while emotion expressions can be domain-specific. In the present study, we propose the Nonparametric Hierarchical Neural Network (NHNN), a lightweight hierarchical neural network model based on Bayesian nonparametric clustering. In comparison to Multitask Learning approaches, the proposed model does not require domain/task labels. In our experiments, the NHNN models generally outperform models with similar levels of complexity and state-of-the-art models in within-corpus and cross-corpus tests. Through clustering analysis, we show that the NHNN models are able to learn group-specific features and bridge the performance gap between groups.
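
To make "Bayesian nonparametric clustering" without domain labels concrete, here is a minimal sketch. The abstract does not say which clustering algorithm the NHNN uses, so this example substitutes DP-means, a hard-assignment simplification of a Dirichlet process mixture, and should not be read as the authors' method. The function name `dp_means`, the penalty parameter `lam`, and the toy 2-D points (standing in for utterance-level features) are all assumptions made for illustration.

```python
def dp_means(points, lam, max_iter=100):
    """Illustrative DP-means clustering (not the NHNN itself).

    A point opens a new cluster when its squared distance to the
    nearest centroid exceeds `lam`, so the number of clusters is
    inferred from the data instead of being fixed in advance.
    """
    dim = len(points[0])
    n = len(points)
    # Start with one cluster located at the global mean.
    centroids = [[sum(p[d] for p in points) / n for d in range(dim)]]
    assignments = [0] * n

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from every existing cluster: open a new one here.
                centroids.append(list(p))
                k = len(centroids) - 1
            if assignments[i] != k:
                assignments[i] = k
                changed = True

        # Recompute centroids and drop clusters that lost all members.
        members = {}
        for i, k in enumerate(assignments):
            members.setdefault(k, []).append(points[i])
        old_ids = sorted(members)
        remap = {old: new for new, old in enumerate(old_ids)}
        assignments = [remap[k] for k in assignments]
        centroids = [
            [sum(p[d] for p in members[old]) / len(members[old]) for d in range(dim)]
            for old in old_ids
        ]
        if not changed:
            break
    return assignments, centroids


# Toy data: two well-separated groups, with no group labels supplied.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
assignments, centroids = dp_means(points, lam=4.0)
print(len(centroids), assignments)
```

In a hierarchical model of the kind the abstract describes, cluster assignments like these could be used to select group-specific parameters on top of shared ones. How the NHNN actually couples clustering to the network is not stated in the abstract, so that link is only suggested here.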
