AS, LG, SD · Apr 15, 2021

Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels

arXiv:2104.07283v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses voice conversion for emotional expression, presenting an incremental improvement with a novel network architecture.

The paper tackles the problem of F0 transformation for expressive voice conversion by proposing an end-to-end framework using a single neural network with a convolutional wavelet kernel module for multi-scale F0 representation and an adversarial module for emotion transformation, achieving results directly from raw F0 signals.

This paper presents an end-to-end framework for F0 transformation in the context of expressive voice conversion. A single neural network is proposed, in which a first module learns an F0 representation over different temporal scales and a second, adversarial module learns the transformation from one emotion to another. The first module is composed of a convolution layer with wavelet kernels, so that the various temporal scales of F0 variation can be efficiently encoded. The single decomposition/transformation network makes it possible to learn, in an end-to-end manner, the F0 decomposition that is optimal with respect to the transformation, directly from the raw F0 signal.
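To make the idea of a convolution layer with wavelet kernels concrete, the following is a minimal sketch, not the authors' implementation. It assumes fixed Mexican-hat (Ricker) wavelets at dyadic scales and a plain NumPy convolution; the function names `mexican_hat` and `wavelet_conv_layer`, the choice of wavelet, the scales, and the synthetic F0 contour are all illustrative assumptions. In the paper the decomposition is learned jointly with the transformation, which this sketch does not attempt.

```python
import numpy as np

def mexican_hat(scale, half_width=4.0):
    """Discrete Mexican-hat (Ricker) wavelet at a given scale, in frames.

    Assumption: the wavelet family and truncation width are illustrative
    choices, not taken from the paper.
    """
    n = int(np.ceil(half_width * scale))
    t = np.arange(-n, n + 1) / scale
    psi = (1.0 - t**2) * np.exp(-0.5 * t**2)
    psi -= psi.mean()                      # enforce zero mean after truncation
    return psi / np.sqrt(np.sum(psi**2))   # normalize to unit energy

def wavelet_conv_layer(f0, scales):
    """Convolve a 1-D F0 contour with one wavelet kernel per scale.

    Returns an array of shape (len(scales), len(f0)); each row is the
    response at one temporal scale.
    """
    f0 = np.asarray(f0, dtype=float)
    rows = []
    for s in scales:
        kernel = mexican_hat(s)
        if len(kernel) > len(f0):
            raise ValueError("F0 contour is shorter than the kernel at scale %r" % s)
        rows.append(np.convolve(f0, kernel, mode="same"))
    return np.stack(rows)

# Hypothetical F0 contour: 200 frames oscillating around 120 Hz.
frames = np.arange(200)
f0 = 120.0 + 20.0 * np.sin(2 * np.pi * frames / 50.0)

features = wavelet_conv_layer(f0, scales=[1, 2, 4, 8, 16])
print(features.shape)  # (5, 200)
```

Small scales respond to fast, local F0 movements and large scales to slower, phrase-level trends. In a learned version, the kernel parameters would be trained together with the adversarial module rather than fixed in advance.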

