CL | AI | Dec 10, 2023

Speech and Text-Based Emotion Recognizer

arXiv:2312.11503v1
Originality: Synthesis-oriented
AI Analysis

This work addresses dataset limitations in affective computing for researchers, but it is incremental as it focuses on improving existing methods through data handling and model experimentation.

The paper tackled the problem of scarce and imbalanced datasets in Speech Emotion Recognition by building a balanced corpus from public datasets using data augmentation and experimenting with architectures, resulting in a multi-modal speech and text-based model achieving a UA+WA of 157.57 compared to a baseline of 119.66.

Affective computing is a field of study that focuses on developing systems and technologies that can understand, interpret, and respond to human emotions. Speech Emotion Recognition (SER), in particular, has received considerable attention from researchers in recent years. However, in many cases, the publicly available datasets used for training and evaluation are scarce and imbalanced across the emotion labels. In this work, we focused on building a balanced corpus from these publicly available datasets by combining them and by employing various speech data augmentation techniques. Furthermore, we experimented with different architectures for speech emotion recognition. Our best system, a multi-modal speech and text-based model, achieves a UA (Unweighted Accuracy) + WA (Weighted Accuracy) of 157.57, compared to the baseline algorithm's 119.66.
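The abstract reports a combined UA + WA score without defining the two terms here. As a minimal sketch, assuming the definitions conventionally used in SER (WA as overall accuracy across all utterances, UA as the mean of per-class recalls, both in percent so that the sum ranges from 0 to 200), the metric could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy in percent: correct predictions over all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall in percent: every class counts equally."""
    totals = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return 100.0 * sum(recalls) / len(recalls)


def ua_plus_wa(y_true, y_pred):
    """Combined score (assumed convention): UA + WA, maximum 200."""
    return unweighted_accuracy(y_true, y_pred) + weighted_accuracy(y_true, y_pred)


# Toy, imbalanced example (hypothetical labels, not data from the paper).
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "sad", "sad"]
print(weighted_accuracy(y_true, y_pred))    # overall accuracy
print(unweighted_accuracy(y_true, y_pred))  # mean per-class recall
print(ua_plus_wa(y_true, y_pred))
```

On imbalanced data the two numbers diverge: WA is dominated by the majority class, while UA gives each emotion label equal weight, which is presumably why the sum is reported as a single figure.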

