SD LG AS | Dec 27, 2024

Mouth Articulation-Based Anchoring for Improved Cross-Corpus Speech Emotion Recognition

Shreya G. Upadhyay, Ali N. Salman, Carlos Busso, Chi-Chun Lee

arXiv:2412.19909v14.93 citations | h-index: 8 | ICASSP

Originality: Incremental advance

AI Analysis

The paper addresses the problem of variable acoustic features in speech emotion recognition for practical applications, but the advance is incremental: it shifts the focus to articulatory gestures rather than introducing a new paradigm.

The study tackled cross-corpus speech emotion recognition by focusing on emotion-specific articulatory gestures instead of acoustic features, revealing that mouth articulatory gestures improve emotion recognition across different settings.

Cross-corpus speech emotion recognition (SER) plays a vital role in numerous practical applications. Traditional approaches to cross-corpus emotion transfer often concentrate on adapting acoustic features to align with different corpora, domains, or labels. However, acoustic features are inherently variable and error-prone due to factors like speaker differences, domain shifts, and recording conditions. To address these challenges, this study adopts a novel contrastive approach by focusing on emotion-specific articulatory gestures as the core elements for analysis. By shifting the emphasis to the more stable and consistent articulatory gestures, we aim to enhance emotion transfer learning in SER tasks. Our research uses the CREMA-D and MSP-IMPROV corpora as benchmarks and reveals valuable insights into the commonality and reliability of these articulatory gestures. The findings highlight the potential of mouth articulatory gestures as a better constraint for improving emotion recognition across different settings or domains.
