SD AI CL LG

Dec 17, 2023

Investigating Salient Representations and Label Variance in Dimensional Speech Emotion Analysis

Vikramjit Mitra, Jingping Nie, Erdrin Azemi

arXiv:2312.16180v1

12.411 citations

h-index: 28

ICASSP

Originality: Incremental advance

AI Analysis

This work addresses efficiency and robustness issues in speech emotion analysis, offering incremental improvements for researchers and practitioners in affective computing.

The paper tackles the high computational cost of dimensional speech emotion recognition by identifying lower-dimensional subspaces within pre-trained representations such as BERT and HuBERT that achieve similar performance with reduced model complexity. It also improves generalization by modeling label uncertainty derived from grader opinion variance.

Representations derived from models such as BERT (Bidirectional Encoder Representations from Transformers) and HuBERT (Hidden units BERT) have helped to achieve state-of-the-art performance in dimensional speech emotion recognition. Despite their large dimensionality, and even though these representations are not tailored for emotion recognition tasks, they are frequently used to train large speech emotion models with high memory and computational costs. In this work, we show that there exist lower-dimensional subspaces within these pre-trained representational spaces that offer a reduction in downstream model complexity without sacrificing performance on emotion estimation. In addition, we model label uncertainty in the form of grader opinion variance, and demonstrate that such information can improve the model's generalization capacity and robustness. Finally, we compare the robustness of the emotion models against acoustic degradations and observe that the reduced-dimensional representations retain performance similar to that of the full-dimensional representations, without significant regression in dimensional emotion performance.
