SD CL LG AS
Sep 6, 2022

Read it to me: An emotionally aware Speech Narration Application

arXiv:2209.02785v1
h-index: 1

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of generating emotionally varied audio for applications such as narration, but the contribution is incremental: it builds on existing methods and has limited scope.

The paper tackles emotional style transfer on audio using a MelGAN-VC architecture for emotion-pair transfers. It finds that 'sad' audio is generated better than 'happy' or 'anger', which the authors attribute to people expressing sadness in similar ways.

In this work we attempt emotional style transfer on audio. In particular, we explore the MelGAN-VC architecture for various emotion-pair transfers. The generated audio is then classified using an LSTM-based audio emotion classifier. We find that "sad" audio is generated well compared with "happy" or "anger", as people tend to express sadness in similar ways.
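
The abstract describes scoring generated audio with an emotion classifier. One way this kind of evaluation could be tallied is sketched below, assuming each generated clip has a target emotion and a classifier-predicted label. The function name `transfer_success_rate` and the example labels are hypothetical and invented for illustration only; they are not the paper's code or results.

```python
from collections import defaultdict


def transfer_success_rate(pairs):
    """Return, per target emotion, the fraction of clips whose predicted
    label matches the target.

    pairs: iterable of (target_emotion, predicted_emotion) tuples.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for target, predicted in pairs:
        totals[target] += 1
        if predicted == target:
            hits[target] += 1
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}


# Made-up classifier outputs (assumption, not data from the paper).
example_pairs = [
    ("sad", "sad"),
    ("sad", "sad"),
    ("happy", "sad"),
    ("happy", "happy"),
    ("anger", "happy"),
    ("anger", "anger"),
]

rates = transfer_success_rate(example_pairs)
for emotion, rate in rates.items():
    print(f"{emotion}: {rate:.2f}")
```

A per-emotion rate like this makes it easy to compare how reliably each target emotion is recognised after transfer, which is the kind of comparison the abstract draws between "sad" and the other emotions.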

