More Similar than Dissimilar: Modeling Annotators for Cross-Corpus Speech Emotion Recognition
This work addresses the challenge of personalizing speech emotion recognition systems to individual annotators in real-world applications. The contribution is incremental, since it builds on existing multi-annotator modeling.
The paper tackles the problem of adapting speech emotion recognition models to new annotators without requiring extensive labeled data from each one. It leverages inter-annotator similarity to identify a similar, previously seen annotator and uses that annotator's predictions for the new annotator, yielding significant performance improvements over other off-the-shelf approaches.
Speech emotion recognition systems often predict a consensus value generated from the ratings of multiple annotators. However, these models have limited ability to predict the annotation of any one person. Alternatively, models can learn to predict the annotations of all annotators. Adapting such models to new annotators is difficult, as each new annotator must provide sufficient labeled training data. We propose to leverage inter-annotator similarity by using a model pre-trained on a large annotator population to identify a similar, previously seen annotator. Given a new, previously unseen annotator and limited enrollment data, we can make predictions for a similar annotator, enabling off-the-shelf annotation of unseen data in target datasets and providing a mechanism for extremely low-cost personalization. We demonstrate that our approach significantly outperforms other off-the-shelf approaches, paving the way for lightweight emotion adaptation that is practical for real-world deployment.
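The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes ratings are numeric, that the pre-trained multi-annotator model has already produced one prediction per seen annotator for each enrollment sample, and that similarity is measured with the concordance correlation coefficient (CCC), which is a choice made here for illustration. The function names `ccc`, `select_similar_annotator`, and `predict_for_new_annotator` are hypothetical.

```python
import numpy as np


def ccc(x, y):
    """Concordance correlation coefficient between two 1-D rating vectors.

    Assumption: CCC is used here only as an example similarity measure.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    cov = ((x - mean_x) * (y - mean_y)).mean()
    denom = x.var() + y.var() + (mean_x - mean_y) ** 2
    # Two identical constant vectors give denom == 0; treat that as no signal.
    return 0.0 if denom == 0 else 2.0 * cov / denom


def select_similar_annotator(seen_preds, enrollment_labels):
    """Pick the seen annotator whose predictions best match the new annotator.

    seen_preds: array of shape (num_seen_annotators, num_enrollment_samples),
        the model's per-annotator predictions on the enrollment samples.
    enrollment_labels: the new annotator's own ratings of those samples.
    Returns the index of the best-matching seen annotator and all scores.
    """
    seen_preds = np.asarray(seen_preds, dtype=float)
    scores = np.array([ccc(row, enrollment_labels) for row in seen_preds])
    return int(np.argmax(scores)), scores


def predict_for_new_annotator(target_preds, annotator_idx):
    """Reuse the selected seen annotator's predictions on target data.

    target_preds: array of shape (num_seen_annotators, num_target_samples).
    """
    return np.asarray(target_preds, dtype=float)[annotator_idx]


# Toy data: three seen annotators, four enrollment samples.
seen_preds = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [2.0, 2.0, 3.0, 3.0],
])
enrollment_labels = [1.0, 2.0, 3.0, 4.0]
best_idx, scores = select_similar_annotator(seen_preds, enrollment_labels)

# Predictions of every seen annotator on two unseen target samples.
target_preds = np.array([[5.0, 1.5], [0.5, 4.5], [3.0, 2.0]])
new_annotator_preds = predict_for_new_annotator(target_preds, best_idx)
```

No model update is needed in this sketch: the enrollment data is used only to choose an index, which is what makes the personalization low-cost. A different similarity measure (for example, mean absolute error) could be substituted without changing the overall structure.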