CL · May 31, 2021

Emotional Voice Conversion: Theory, Databases and ESD

arXiv:2105.14762v2 · 276 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for a multi-speaker, cross-lingual emotional speech database for voice conversion research, but its contribution is incremental because it builds on existing work.

The paper reviews emotional voice conversion research and existing databases, then introduces the ESD database with 350 parallel utterances from 20 speakers covering 5 emotions, totaling over 29 hours of speech, and implements state-of-the-art systems as case studies.

In this paper, we first provide a review of the state-of-the-art emotional voice conversion research, and the existing emotional speech databases. We then motivate the development of a novel emotional speech database (ESD) that addresses the increasing research need. With this paper, the ESD database is now made available to the research community. The ESD database consists of 350 parallel utterances spoken by 10 native English and 10 native Chinese speakers and covers 5 emotion categories (neutral, happy, angry, sad and surprise). More than 29 hours of speech data were recorded in a controlled acoustic environment. The database is suitable for multi-speaker and cross-lingual emotional voice conversion studies. As case studies, we implement several state-of-the-art emotional voice conversion systems on the ESD database. This paper provides a reference study on ESD in conjunction with its release.

Code Implementations: 1 repo
