Color-based Emotion Representation for Speech Emotion Recognition

Ryotaro Nagase, Ryoichi Takashima, Yoichi Yamashita

arXiv:2602.16256v11.2h-index: 2

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of representing emotions in speech more effectively for applications such as human-computer interaction, though it appears incremental because it applies existing methods to a new representation.

The paper tackles the limited diversity and interpretability of emotion labels in speech emotion recognition by representing emotions as continuous scores over three color attributes: hue, saturation, and value. It demonstrates a relationship between these attributes and emotions in speech, and develops color attribute regression models whose performance improved under multitask learning with emotion classification.

Speech emotion recognition (SER) has traditionally relied on categorical or dimensional labels. However, this technique is limited in representing both the diversity and interpretability of emotions. To overcome this limitation, we focus on color attributes, such as hue, saturation, and value, to represent emotions as continuous and interpretable scores. We annotated an emotional speech corpus with color attributes via crowdsourcing and analyzed them. Moreover, we built regression models for color attributes in SER using machine learning and deep learning, and explored the multitask learning of color attribute regression and emotion classification. As a result, we demonstrated the relationship between color attributes and emotions in speech, and successfully developed color attribute regression models for SER. We also showed that multitask learning improved the performance of each task.
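To make the multitask setup described in the abstract more concrete, the following is a minimal sketch of a joint objective that combines a color attribute regression loss with an emotion classification loss. It is not the authors' implementation. The function names, the weighting parameter `alpha`, the scaling of hue, saturation, and value to [0, 1], and the treatment of hue as a circular quantity are all assumptions made for illustration.

```python
import math


def hue_distance(h_pred, h_true):
    """Shortest distance between two hues on a circle of circumference 1.

    Assumption: hue is scaled to [0, 1) and wraps around, so 0.95 and 0.05
    are 0.1 apart rather than 0.9 apart.
    """
    d = abs(h_pred - h_true) % 1.0
    return min(d, 1.0 - d)


def color_regression_loss(pred_hsv, true_hsv):
    """Mean squared error over (hue, saturation, value), hue being circular."""
    dh = hue_distance(pred_hsv[0], true_hsv[0])
    ds = pred_hsv[1] - true_hsv[1]
    dv = pred_hsv[2] - true_hsv[2]
    return (dh ** 2 + ds ** 2 + dv ** 2) / 3.0


def cross_entropy(logits, label):
    """Cross-entropy of one example, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multitask_loss(pred_hsv, true_hsv, logits, label, alpha=0.5):
    """Weighted sum of the regression and classification losses.

    `alpha` is a hypothetical trade-off weight, not a value from the paper.
    """
    reg = color_regression_loss(pred_hsv, true_hsv)
    cls = cross_entropy(logits, label)
    return alpha * reg + (1.0 - alpha) * cls


if __name__ == "__main__":
    # One utterance: predicted vs. annotated color, plus emotion logits.
    loss = multitask_loss(
        pred_hsv=(0.95, 0.6, 0.7),
        true_hsv=(0.05, 0.5, 0.8),
        logits=[2.0, 0.5, -1.0],
        label=0,
    )
    print(f"joint loss: {loss:.4f}")
```

In a real system both terms would be computed over mini-batches by a deep learning framework and the shared encoder would receive gradients from both tasks; the sketch only shows how the two objectives might be combined into one scalar.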
