SD · LG · AS · Jan 5, 2021

Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition

arXiv:2101.01356v2 · 14 citations
AI Analysis

This work tackles the problem of multilingual speech emotion recognition for less popular languages where large training corpora are unavailable, offering an incremental improvement.

This paper addresses the challenge of multilingual speech emotion recognition (SER) with limited data by reframing it as a few-shot learning problem. The authors propose F-MAML, a modification of the MAML algorithm, which outperforms the original MAML on the EmoFilm dataset.

In this paper, we analyze the feasibility of applying few-shot learning to the speech emotion recognition (SER) task. Current speech emotion recognition models work exceptionally well but fail when the input is multilingual. Moreover, such models perform adequately only when the training corpus is vast. The availability of a large training corpus is a significant problem for languages that are less popular or obscure. We attempt to address both multilingualism and the lack of available data by recasting SER as a few-shot learning problem. We suggest relaxing the assumption that all N classes in an N-way K-shot problem be new, and define an N+F way problem, where N and F are the number of emotion classes and predefined fixed classes, respectively. We propose this modification to the Model-Agnostic Meta-Learning (MAML) algorithm and call the new model F-MAML. This modification performs better than the original MAML on the EmoFilm dataset.
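To make the N+F way, K-shot setup more concrete, here is a minimal Python sketch of how one training episode might be sampled. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_episode`, its parameters, and the choice to include every fixed class in every episode alongside N freshly sampled classes are all hypothetical readings of the abstract.

```python
import random
from collections import defaultdict


def sample_episode(labels, fixed_classes, n_new, k_shot, q_query, rng):
    """Sample one (N+F)-way K-shot episode.

    ASSUMPTION: the F fixed classes appear in every episode, while the
    N remaining classes are drawn at random per episode. The abstract
    does not spell out this detail.

    Returns (support, query), each a list of (example_index, label).
    """
    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    # Draw the N per-episode classes from everything that is not fixed.
    candidates = sorted(c for c in by_class if c not in fixed_classes)
    new_classes = rng.sample(candidates, n_new)
    episode_classes = list(fixed_classes) + new_classes

    support, query = [], []
    for c in episode_classes:
        # Disjoint support and query examples for each class.
        picked = rng.sample(by_class[c], k_shot + q_query)
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query


if __name__ == "__main__":
    # Toy labels: 5 hypothetical classes with 6 examples each.
    toy_labels = [c for c in ["neutral", "anger", "joy", "fear", "sad"]
                  for _ in range(6)]
    s, q = sample_episode(toy_labels, ["neutral"], n_new=2, k_shot=2,
                          q_query=3, rng=random.Random(0))
    print(len(s), "support examples;", len(q), "query examples")
```

In a MAML-style loop, the support set of each episode would drive the inner-loop adaptation and the query set the outer-loop meta-update; that part is omitted here because the abstract gives no details on it.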

Code Implementations: 1 repo