AS LG SD · Sep 9, 2019

DNN-based cross-lingual voice conversion using Bottleneck Features

arXiv:1909.03974v2 · 6 citations

Originality: Incremental advance

AI Analysis

The paper addresses voice conversion between speakers of different languages, but the advance is incremental because it builds on existing bottleneck-feature and DNN methods.

The paper tackles cross-lingual voice conversion by proposing a DNN-based framework using bottleneck features from a deep auto-encoder to map speaker-independent features to target spectral features, and it outperforms a GMM baseline on data from three Indian languages.

Cross-lingual voice conversion (CLVC) is a challenging task because the source and target speakers speak different languages. This paper proposes a CLVC framework based on bottleneck features and a deep neural network (DNN). In the proposed method, the bottleneck features extracted from a deep auto-encoder (DAE) are used to represent speaker-independent features of speech signals from different languages. A DNN model is trained to learn the mapping between the bottleneck features and the corresponding spectral features of the target speaker. The proposed method can capture the speaker-specific characteristics of a target speaker, and hence requires no speech data from the source speaker during training. The performance of the proposed method is evaluated using data from three Indian languages: Telugu, Tamil and Malayalam. The experimental results show that the proposed method outperforms the baseline Gaussian mixture model (GMM)-based CLVC approach.
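The two-stage pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the synthetic "spectral" frames, the helper names (`MLP`, `make_frames`, `bottleneck`, `convert`) and the plain gradient-descent training are all assumptions made for illustration, and nothing in this toy setup guarantees that the bottleneck is actually speaker-independent. It only shows the data flow: an auto-encoder is trained on pooled speech, a second network maps bottleneck features to the target speaker's spectral features, and conversion passes source frames through both without using source data in training.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, BN = 20, 8                      # assumed spectral and bottleneck sizes
MIX = rng.normal(size=(4, DIM))      # hypothetical "content to spectrum" mixing

class MLP:
    """Tiny fully connected net: tanh hidden layers, linear output, MSE loss."""

    def __init__(self, sizes):
        self.W = [rng.normal(0, 1 / np.sqrt(m), (m, n))
                  for m, n in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x, upto=None):
        """Return the activations of each layer, stopping after `upto` layers."""
        acts = [x]
        n_layers = len(self.W) if upto is None else upto
        for i in range(n_layers):
            z = acts[-1] @ self.W[i] + self.b[i]
            is_output = i == len(self.W) - 1
            acts.append(z if is_output else np.tanh(z))
        return acts

    def train_step(self, x, y, lr):
        """One full-batch gradient-descent step; returns the pre-update MSE."""
        acts = self.forward(x)
        loss = float(np.mean((acts[-1] - y) ** 2))
        grad = 2.0 * (acts[-1] - y) / acts[-1].size   # dLoss/d(output)
        for i in reversed(range(len(self.W))):
            gW, gb = acts[i].T @ grad, grad.sum(axis=0)
            if i > 0:  # back-propagate through the tanh of the previous layer
                grad = (grad @ self.W[i].T) * (1.0 - acts[i] ** 2)
            self.W[i] -= lr * gW
            self.b[i] -= lr * gb
        return loss

def make_frames(n, speaker_offset):
    """Synthetic stand-in for spectral frames: shared content plus a speaker shift."""
    content = rng.normal(size=(n, 4))
    return np.tanh(content @ MIX) + speaker_offset

def train(net, x, y, steps=300, lr=0.1):
    return [net.train_step(x, y, lr) for _ in range(steps)]

def bottleneck(dae, x):
    """Activations of the DAE's narrow middle layer (after two layers)."""
    return dae.forward(x, upto=2)[-1]

def convert(dae, mapper, source_frames):
    """Source frames -> bottleneck features -> target-speaker spectral features."""
    return mapper.forward(bottleneck(dae, source_frames))[-1]

# Stage 1: auto-encoder on pooled speech (no source-speaker data involved).
pooled = np.vstack([make_frames(200, rng.normal(size=DIM)) for _ in range(3)])
dae = MLP([DIM, 32, BN, 32, DIM])
dae_losses = train(dae, pooled, pooled)

# Stage 2: map the target speaker's bottleneck features to their own spectra.
target = make_frames(300, rng.normal(size=DIM))
mapper = MLP([BN, 32, DIM])
map_losses = train(mapper, bottleneck(dae, target), target)

# Conversion: frames from an unseen source speaker.
source = make_frames(50, rng.normal(size=DIM))
converted = convert(dae, mapper, source)
print(converted.shape)  # (50, 20)
```

In a real system the frames would be spectral features extracted from recorded speech rather than random data, and the converted features would be passed to a vocoder; both steps are outside the scope of this sketch.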
