CL · LG · SD · AS · Apr 27, 2024

Usefulness of Emotional Prosody in Neural Machine Translation

arXiv:2404.17968v1 · 1 citation · h-index: 14 · Speech prosody
Originality: Incremental advance
AI Analysis

This work addresses translation quality for applications involving speech data, but it is an incremental advance: it builds on existing methods for adding external information to NMT.

The authors tackled the problem of improving neural machine translation by incorporating automatically recognized emotional prosody from speech, showing that adding emotion information, particularly arousal, leads to better translations.

Neural Machine Translation (NMT) is the task of translating text from one language to another using a trained neural network. Several existing works incorporate external information (e.g. sentiment, politeness, gender) into NMT models to improve or control the predicted translations. In this work, we propose to improve translation quality by adding another external source of information: the automatically recognized emotion in the voice. This work is motivated by the assumption that each emotion is associated with a specific lexicon, and that these lexicons can overlap across emotions. Our proposed method follows a two-stage procedure. First, we select a state-of-the-art Speech Emotion Recognition (SER) model to predict dimensional emotion values for all input audio in the dataset. Then, we add these predicted emotions as source tokens at the beginning of the input texts used to train our NMT model. We show that integrating emotion information, especially arousal, into NMT systems leads to better translations.
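The second stage described above, prepending predicted emotions as source tokens, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `EmotionScores` container, the `<arousal_high>` token format, the [0, 1] score range, and the 0.5 threshold are all assumptions made for illustration, as are the valence and dominance dimensions (the abstract names only arousal). The SER model itself is not shown; its outputs are taken as given.

```python
from dataclasses import dataclass


@dataclass
class EmotionScores:
    """Hypothetical container for dimensional SER outputs, assumed to lie in [0, 1]."""
    arousal: float
    valence: float
    dominance: float


def bucket(value: float, threshold: float = 0.5) -> str:
    """Discretize a continuous score into a coarse label (the threshold is an assumption)."""
    return "high" if value >= threshold else "low"


def tag_source(text: str, scores: EmotionScores, dims=("arousal",)) -> str:
    """Prepend one emotion token per selected dimension to the source sentence."""
    tokens = [f"<{dim}_{bucket(getattr(scores, dim))}>" for dim in dims]
    return " ".join(tokens + [text])


if __name__ == "__main__":
    scores = EmotionScores(arousal=0.8, valence=0.3, dominance=0.5)
    print(tag_source("hello there", scores))
    print(tag_source("hello there", scores, dims=("arousal", "valence")))
```

In a real pipeline, tokens like these would normally be registered with the NMT tokenizer as special symbols so that subword segmentation does not split them.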
