SD, AI, AS | Oct 3, 2023

Prompting Audios Using Acoustic Properties For Emotion Representation

arXiv:2310.02298v3 | 10 citations | h-index: 13
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of capturing emotional diversity in audio for applications such as emotion recognition. The advance is incremental because it builds on existing contrastive learning methods.

The paper addresses the limitation of representing emotions as discrete variables by proposing acoustic prompts generated from audio properties such as pitch and intensity. It reports improved performance in Emotion Audio Retrieval and a 3.8% relative accuracy gain in Speech Emotion Recognition on the RAVDESS dataset.
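As an illustration of how an acoustic prompt might be built from measured properties, here is a minimal Python sketch. It is not the authors' implementation: the bin thresholds (`HYPOTHETICAL_THRESHOLDS`), the low/medium/high labels, the prompt wording, and the function names `level_for` and `acoustic_prompt` are all assumptions made for the example. Only the four properties themselves (pitch, intensity, speech rate, articulation rate) come from the paper's abstract.

```python
from typing import Dict, List, Optional

# Hypothetical bin edges for each acoustic property. The source does not
# state thresholds, so these values are placeholders for illustration.
HYPOTHETICAL_THRESHOLDS: Dict[str, List[float]] = {
    "pitch": [150.0, 250.0],          # Hz
    "intensity": [55.0, 70.0],        # dB
    "speech rate": [3.0, 5.0],        # syllables per second
    "articulation rate": [4.0, 6.0],  # syllables per second, pauses excluded
}
LEVELS = ["low", "medium", "high"]


def level_for(value: float, edges: List[float]) -> str:
    """Return the label of the bin that `value` falls into."""
    for label, edge in zip(LEVELS, edges):
        if value < edge:
            return label
    return LEVELS[-1]


def acoustic_prompt(features: Dict[str, float], emotion: Optional[str] = None) -> str:
    """Turn measured acoustic properties into a natural language description.

    Properties missing from `features` are skipped. The template wording
    is an assumption, not the phrasing used in the paper.
    """
    parts = [
        f"{level_for(features[name], edges)} {name}"
        for name, edges in HYPOTHETICAL_THRESHOLDS.items()
        if name in features
    ]
    description = ", ".join(parts)
    if emotion:
        return f"{emotion} speech with {description}"
    return f"speech with {description}"
```

For example, `acoustic_prompt({"pitch": 280.0, "intensity": 75.0}, emotion="angry")` yields `"angry speech with high pitch, high intensity"`. In practice the feature values would come from a pitch and intensity extractor run over each utterance.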

Emotions lie on a continuum, but current models treat emotions as a finite-valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions, we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs. We use acoustic properties that are correlated with emotion, such as pitch, intensity, speech rate, and articulation rate, to automatically generate prompts, i.e., 'acoustic prompts'. We use a contrastive learning objective to map speech to its respective acoustic prompts. We evaluate our model on Emotion Audio Retrieval (EAR) and Speech Emotion Recognition (SER). Our results show that the acoustic prompts significantly improve the model's performance in EAR across various Precision@K metrics. In SER, we observe a 3.8% relative accuracy improvement on the RAVDESS dataset.
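The abstract names a contrastive learning objective but does not give its form. One common choice for pairing audio with text is a symmetric, CLIP-style cross-entropy over cosine similarities. The sketch below assumes that form, along with the temperature value and the function names; it may not match the paper's actual loss. It uses plain Python lists in place of real encoder outputs.

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_on_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(
    audio_embeddings: Sequence[Sequence[float]],
    prompt_embeddings: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumed value, not taken from the source
) -> float:
    """Average of audio-to-prompt and prompt-to-audio cross-entropy.

    Row i of each input is assumed to be a matched audio/prompt pair;
    every other row in the batch acts as a negative.
    """
    audio = [normalize(v) for v in audio_embeddings]
    prompts = [normalize(v) for v in prompt_embeddings]
    logits = [
        [sum(a * p for a, p in zip(a_vec, p_vec)) / temperature for p_vec in prompts]
        for a_vec in audio
    ]
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed)
    )
```

With this form, a batch whose audio embeddings align with their own prompts gives a loss near zero, while a batch whose pairs are swapped gives a large loss. That is the signal that pulls each utterance toward its acoustic prompt during training.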

