SD LG AS

Mar 30, 2022

Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE

Ziang Long, Yunling Zheng, Meng Yu, Jack Xin

arXiv:2203.16037v2

4.17 citations

Originality: Incremental advance

AI Analysis

This work addresses voice conversion for unseen speakers. It is an incremental advance: it builds on existing VAE methods, adding a self-attention layer in the decoder and RGSM weight regularization.

The paper tackles zero-shot many-to-many voice conversion by enhancing a VAE model with self-attention and structural regularization, achieving a 28.3% gain in speaker classification accuracy on unseen speakers while slightly improving voice quality as measured by MOSNet scores.

The variational auto-encoder (VAE) is an effective neural network architecture for disentangling a speech utterance into speaker identity and linguistic content latent embeddings, then generating an utterance for a target speaker from that of a source speaker. This is done by concatenating the identity embedding of the target speaker with the content embedding of the source speaker uttering a desired sentence. In this work, we propose to improve VAE models with self-attention and structural regularization (RGSM). Specifically, we found a suitable location in the VAE's decoder to add a self-attention layer, which incorporates non-local information when generating a converted utterance and hides the source speaker's identity. We apply the relaxed group-wise splitting method (RGSM) to regularize network weights, which remarkably enhances generalization performance. In experiments on the zero-shot many-to-many voice conversion task on the VCTK data set, with the self-attention layer and the relaxed group-wise splitting method, our model achieves a gain in speaker classification accuracy on unseen speakers of 28.3% while slightly improving converted voice quality in terms of MOSNet scores. Our encouraging findings point to future research on integrating a greater variety of attention structures into the VAE framework while controlling model size and overfitting, to advance zero-shot many-to-many voice conversion.
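
The two mechanisms the abstract describes, concatenating a target speaker's identity embedding with the source content embedding and applying self-attention so each frame can draw on non-local context, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the dimensions, the random weights, the single-head attention, the residual connection, and the names `concat_embeddings` and `self_attention` are all assumptions chosen for illustration, and RGSM is not attempted here.

```python
import numpy as np

def concat_embeddings(content, speaker):
    """Attach one speaker vector to every content frame.

    content: (T, d_c) frame-level content embedding of the source utterance
    speaker: (d_s,)   utterance-level identity embedding of the target speaker
    returns: (T, d_c + d_s)
    """
    tiled = np.tile(speaker, (content.shape[0], 1))
    return np.concatenate([content, tiled], axis=1)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the T frames.

    Returns the attended features and the (T, T) attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

# Hypothetical sizes and random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
T, d_c, d_s = 6, 4, 3
content_src = rng.normal(size=(T, d_c))   # from the source utterance
speaker_tgt = rng.normal(size=d_s)        # from the target speaker

decoder_in = concat_embeddings(content_src, speaker_tgt)

d = d_c + d_s
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
attended, weights = self_attention(decoder_in, w_q, w_k, w_v)

# Residual connection (an assumption): every frame now mixes in
# information from all other frames, not just its neighbours.
out = decoder_in + attended
```

In a trained model the projection matrices would be learned and the attention layer would sit at a specific depth inside the decoder; the sketch only shows the data flow.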
