SD · LG · AS · Nov 30, 2021

CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer

arXiv:2111.15159v1 · 13 citations
Originality: Incremental advance
AI Analysis

This work addresses emotional voice conversion for speech processing applications, representing an incremental improvement over existing methods.

The researchers tackled emotional voice conversion by proposing a CycleGAN-based model with a transformer to capture frame intra-relations, achieving higher emotion strength and quality compared to baselines like ACVAE and CycleGAN.

In this study, we explore the transformer's ability to capture intra-relations among frames by augmenting the receptive field of models. Concretely, we propose a CycleGAN-based model with the transformer and investigate its performance on the emotional voice conversion task. In the training procedure, we adopt curriculum learning to gradually increase the frame length, so that the model first sees short segments and eventually the entire utterance. The proposed method was evaluated on the Japanese emotional speech dataset and compared to several baselines (ACVAE, CycleGAN) with objective and subjective evaluations. The results show that our proposed model is able to convert emotion with higher strength and quality.
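
The curriculum idea in the abstract (training on short segments first, then on progressively longer ones) can be illustrated with a minimal sketch. The abstract does not give the actual schedule, so everything here is an assumption made for illustration: the linear growth rule, the 32 and 512 frame bounds, and the helper names `frame_length_schedule` and `random_crop`.

```python
import random


def frame_length_schedule(epoch, total_epochs, min_frames=32, max_frames=512):
    """Return the training segment length (in frames) for a given epoch.

    Hypothetical linear schedule: grows from min_frames at epoch 0 to
    max_frames at the final epoch. The paper may use a different rule.
    """
    if total_epochs <= 1:
        return max_frames
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return int(round(min_frames + progress * (max_frames - min_frames)))


def random_crop(features, n_frames, rng=random):
    """Crop a random window of n_frames from a sequence of per-frame features.

    If the utterance is shorter than the window, it is returned whole.
    """
    if len(features) <= n_frames:
        return features
    start = rng.randrange(len(features) - n_frames + 1)
    return features[start:start + n_frames]


if __name__ == "__main__":
    # Dummy utterance: 400 frames, each a 3-dimensional feature vector.
    utterance = [[float(i)] * 3 for i in range(400)]
    for epoch in range(0, 11, 5):
        n = frame_length_schedule(epoch, total_epochs=11)
        segment = random_crop(utterance, n)
        print(epoch, n, len(segment))
```

With a schedule of this kind, early epochs expose the model only to short windows, while the last epochs use windows long enough that most utterances are passed in whole.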

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.