SD, CL, AS · Apr 5, 2021

StarGAN-based Emotional Voice Conversion for Japanese Phrases

arXiv:2104.01807v1 · 8 citations
AI Analysis

This work addresses emotional voice conversion (EVC) for Japanese, but it is incremental: it directly applies an existing method, StarGAN-VC, to a new task.

The paper tackles emotional voice conversion for Japanese phrases by applying StarGAN-VC with only minimal processing of fundamental frequency and aperiodicity, and evaluates the converted speech subjectively through emotion classification and similarity scores.

This paper shows that StarGAN-VC, a spectral envelope transformation method for non-parallel many-to-many voice conversion (VC), is capable of emotional VC (EVC). Although StarGAN-VC has been shown to enable speaker identity conversion, its capability for EVC of Japanese phrases has not yet been clarified. In this paper, we describe the direct application of StarGAN-VC to an EVC task with minimal processing of the fundamental frequency (F0) and aperiodicity. Through subjective evaluation experiments, we evaluated how well our StarGAN-EVC system achieves EVC for Japanese phrases. The subjective evaluation consisted of emotion classification and mean opinion scores (MOS) of neutrality and similarity. In addition, the interdependence between the source and target emotional domains was investigated in terms of EVC quality.
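The abstract does not spell out what "minimal F0 and aperiodicity processing" involves. A common choice in StarGAN-VC pipelines is to convert only the spectral envelope with the network, map F0 with a log-Gaussian normalized transformation, and pass aperiodicity through unchanged. The sketch below assumes exactly that, with log-F0 statistics computed per emotional domain (e.g. neutral to angry); the function names `log_f0_stats` and `convert_f0` are hypothetical and not taken from the paper.

```python
import numpy as np


def log_f0_stats(f0: np.ndarray) -> tuple[float, float]:
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0).

    Assumes the domain has at least two distinct voiced F0 values,
    otherwise the standard deviation is zero.
    """
    log_f0 = np.log(f0[f0 > 0])
    return float(log_f0.mean()), float(log_f0.std())


def convert_f0(f0_src: np.ndarray,
               src_stats: tuple[float, float],
               tgt_stats: tuple[float, float]) -> np.ndarray:
    """Log-Gaussian normalized F0 transformation (an assumed, not confirmed, choice).

    log f0_tgt = (log f0_src - mu_src) / sigma_src * sigma_tgt + mu_tgt.
    Unvoiced frames (F0 == 0) are left at 0.
    """
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    f0_out = np.zeros_like(f0_src, dtype=float)
    voiced = f0_src > 0
    f0_out[voiced] = np.exp(
        (np.log(f0_src[voiced]) - mu_s) / sigma_s * sigma_t + mu_t
    )
    return f0_out


def convert_aperiodicity(ap: np.ndarray) -> np.ndarray:
    """Under the same assumption, aperiodicity is passed through unmodified."""
    return ap


# Hypothetical F0 tracks (Hz) for one neutral and one angry utterance.
neutral_f0 = np.array([0.0, 120.0, 130.0, 125.0, 0.0])
angry_f0 = np.array([0.0, 180.0, 210.0, 195.0, 0.0])
converted = convert_f0(neutral_f0, log_f0_stats(neutral_f0), log_f0_stats(angry_f0))
```

Because the mapping is a linear shift and scale in the log domain, it moves the pitch mean and range toward the target emotion while preserving the source contour's shape; the unvoiced frames in `converted` stay at 0. In practice the statistics would be estimated over all training utterances of each emotional domain rather than a single utterance.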
